Computational protocol: Expression Divergence of Chemosensory Genes between Drosophila sechellia and Its Sibling Species and Its Implications for Host Shift

[…] For paired-end mRNA-seq library preparation, we used TruSeq v2 kits from Illumina. We used 4 µg of total RNA as input for mRNA enrichment with oligo-dT beads, followed by cation-catalyzed fragmentation for 7.5 min at 94 °C. The mRNA fragments were converted into double-stranded cDNA by random priming, followed by end repair and A-tailing. The fragments were then ligated to barcoded paired-end adaptors, subjected to ten cycles of polymerase chain reaction (PCR) amplification, and purified with Ampure XP beads (Beckman Agencourt). The absolute concentrations of the libraries were determined with a Qubit fluorometer (Invitrogen), and the libraries were profiled on a BioAnalyzer 2100 with the High Sensitivity DNA Kit (Agilent). The six barcoded cDNA libraries were pooled at an equimolar ratio after quantitative PCR normalization (KAPA Library Quantification Kits) and loaded into three lanes of a flow cell. Paired-end 2 × 100 nt multiplexed sequencing was conducted on an Illumina HiSeq2000, yielding an average of 0.5 lanes of sequencing reads per library. The raw sequencing data reported in this work have been deposited in NCBI GEO under accession numbers GSE67587, GSE67861 and GSE67862 for D. sechellia (Tuson, #14021-0248.25), D. sechellia (k-s10; #14021-0248.25) and D. simulans (k-s05; #14021-0251.194), respectively. The raw read data of D. melanogaster from […] were also included in the analysis. Reference genome versions r5.48 (D. melanogaster), r1.3 (D. simulans), and r1.3 (D. sechellia) were used. For RNA-seq analysis, the 100-bp paired-end reads were mapped to the reference genome with TopHat 2.0.6, allowing up to two mismatches per read. Gene expression levels were measured in Fragments Per Kilobase of exon per Million fragments mapped (FPKM) by Cufflinks with bias correction ("--multi-read-correct" and "--frag-bias-correct"). [...]
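
The mapping and quantification steps above can be sketched as two command lines. The Python sketch below only builds the argument lists and does not run anything. The source states just the two-mismatch allowance and the two Cufflinks correction flags; every file path, the function names, and the use of "--GTF" to quantify against a reference annotation are illustrative assumptions, not the authors' recorded invocation.

```python
import shlex
from typing import List


def tophat_command(index_prefix: str, reads_1: str, reads_2: str,
                   out_dir: str, mismatches: int = 2) -> List[str]:
    """Build a TopHat 2 argument list for one paired-end library."""
    return [
        "tophat",
        "--read-mismatches", str(mismatches),  # two mismatches, as stated in the text
        "--output-dir", out_dir,
        index_prefix,  # Bowtie index prefix of the reference genome (hypothetical path)
        reads_1,       # mate 1 FASTQ (hypothetical path)
        reads_2,       # mate 2 FASTQ (hypothetical path)
    ]


def cufflinks_command(alignments: str, annotation_gtf: str,
                      genome_fasta: str, out_dir: str) -> List[str]:
    """Build a Cufflinks argument list with both bias corrections enabled."""
    return [
        "cufflinks",
        "--GTF", annotation_gtf,               # assumption: quantify annotated genes only
        "--frag-bias-correct", genome_fasta,   # fragment bias correction needs the genome FASTA
        "--multi-read-correct",                # reweight reads that map to multiple loci
        "--output-dir", out_dir,
        alignments,
    ]


# Hypothetical file names for one D. sechellia library.
map_cmd = tophat_command("index/dsec_r1.3", "lib1_1.fastq", "lib1_2.fastq", "tophat_lib1")
# TopHat writes its alignments to accepted_hits.bam inside the output directory.
quant_cmd = cufflinks_command("tophat_lib1/accepted_hits.bam", "dsec_r1.3.gtf",
                              "dsec_r1.3.fasta", "cufflinks_lib1")

print(shlex.join(map_cmd))
print(shlex.join(quant_cmd))
```

Keeping each command as a list rather than a single string means it can be passed to `subprocess.run` unchanged, without shell quoting problems.
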
Gene annotation tables for the three species were retrieved from FlyBase (version 2012_06), with annotation versions 5.48, 1.3 and 1.3 for D. melanogaster, D. simulans and D. sechellia, respectively. The orthologous gene set (OGS) was manually curated: genes duplicated in any of the species were excluded, so that only genes present as a single copy in all three species were retained. Potential sequencing bias between samples was normalized by the upper-quartile method implemented in NOISeq. NOISeq requires technical replicates to perform the calculation. To identify differentially expressed genes (DEGs), we used the sequencing data from the three lanes as technical replicates. As Dmel_TW had RNA-seq data from only a single sequencing lane for each sex, we generated technical replicates of Dmel_TW artificially in the software for comparison, using the NOISeq-sim function of NOISeq. […]
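
The two data-preparation ideas in this paragraph, single-copy ortholog selection and upper-quartile normalization, can be illustrated with a short Python sketch. It is a generic illustration under stated assumptions, not the NOISeq implementation (NOISeq is an R package) and not the authors' curation procedure. The input layouts, function names, and the rescaling to the mean upper quartile are all hypothetical choices made here for clarity.

```python
from typing import Dict, List


def single_copy_orthologs(groups: Dict[str, Dict[str, List[str]]],
                          species: List[str]) -> Dict[str, Dict[str, str]]:
    """Keep ortholog groups that have exactly one gene in every species."""
    kept = {}
    for group_id, members in groups.items():
        if all(len(members.get(sp, [])) == 1 for sp in species):
            kept[group_id] = {sp: members[sp][0] for sp in species}
    return kept


def percentile(values: List[float], q: float) -> float:
    """Percentile with linear interpolation between order statistics (0 <= q <= 1)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def upper_quartile_normalize(counts: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Scale each sample so that all samples share the same upper quartile.

    counts maps sample name -> per-gene values, with genes in the same order
    in every sample. Genes that are zero in all samples are ignored when the
    quartiles are computed. Assumes every sample has a non-zero upper quartile.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    expressed = [g for g in range(n_genes) if any(counts[s][g] > 0 for s in samples)]
    q3 = {s: percentile([counts[s][g] for g in expressed], 0.75) for s in samples}
    mean_q3 = sum(q3.values()) / len(q3)
    return {s: [x * mean_q3 / q3[s] for x in counts[s]] for s in samples}


# Toy example: sample B was sequenced twice as deeply as sample A.
toy = {"A": [0, 10, 20, 30, 40], "B": [0, 20, 40, 60, 80]}
print(upper_quartile_normalize(toy))
```

In the toy example the two samples differ only by sequencing depth, so they become identical after scaling. Real samples will of course still differ wherever expression differs.
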

Pipeline specifications

Software tools: TopHat, Cufflinks, NOISeq
Databases: FlyBase
Application: RNA-seq analysis
Organisms: Drosophila melanogaster, Drosophila sechellia, Drosophila simulans