Protocol publication

[…] We used 10–20 individuals for RNA extraction per biological replicate, with two replicates for P. caryaefallax, P. subelliptica, P. caryaemagna, P. caryaecaulis, P. quercus and D. vitifoliae, and one replicate for P. foveata, P. foveola and P. caryaevenae. Whole bodies of all insect specimens were homogenized in RLT buffer (Qiagen) and then processed for total RNA extraction using the RNeasy Mini kit (Qiagen) following the protocol provided. RNA integrity was examined using a fragment analyzer, and samples with an RNA integrity number (RIN) > 7 were used for sequencing. The mRNA library construction and RNA sequencing were performed at the Genomics Core, Washington State University, Spokane, Washington. Briefly, mRNA molecules were enriched using oligo-dT beads and libraries were constructed for paired-end sequencing on the Illumina HiSeq platform. Raw reads were adaptor-trimmed and filtered to a minimum quality score of 30 over 95% of the read. A single transcriptome reference was generated for each taxon by assembling the filtered reads in Trinity with default settings (version 2.1.1), and the assembled sequences were subsequently clustered at a minimum identity of 95% using CD-HIT-EST, included in the CD-HIT package (version 4.6.1).

Raw RNA reads of five Aphididae species, including the green peach aphid (Myzus persicae) and four plant-gall related species (Pemphigus obesinymphae, Pe. populicaulis, Tamalia coweni and T. inquilinus), were downloaded from the NCBI database (BioProject # PRJNA296778 for M. persicae, BioProject # PRJNA301746 for the two Pemphigus species, and BioProject # PRJNA297665 for the two Tamalia species). Unlike Pe. obesinymphae, Pe. populicaulis and T. coweni, which induce galls in plant tissues, T. inquilinus does not induce galls but inhabits galls induced by other galling insects. De novo transcriptome references were generated for these five species using Trinity and CD-HIT-EST as described above. The M. persicae and draft D. vitifoliae genomes available from BIPAA were used to compare results from the de novo assembled transcriptomes and to help assess how closely transcript counts matched the true number of annotated genes; however, only the M. persicae sequences used in this study were taken from the available genome.

[...] Amino acid transporters in the APC (TC # 2.A.3) and AAAP (TC # 2.A.18) families were annotated for all phylloxerid and downloaded aphid sequences following previously described methods. All bioinformatics tools used here were run with default settings unless explicitly stated otherwise. Briefly, the longest open reading frames (ORFs) for all transcripts were predicted and translated into protein sequences using a stand-alone PERL script. The protein sequences were searched against the Pfam domain database (Pfam 29.0) for the functional domains PF03024 (APC) and PF01490 (AAAP) (E-value < 0.001) using the HMMSCAN program included in the HMMER software suite (version 3.1b1). Transcripts with HMMER APC or AAAP hits were subsequently verified by BLAST searches (E-value < 0.001) against the NCBI non-redundant protein database. We excluded transcripts derived from possible plant tissue contaminants or from other organisms that co-inhabit the galls induced by Phylloxeridae species, as well as transcripts encoding non-APC or non-AAAP proteins such as Na-K-Cl cotransporters, by retaining only transcripts whose best BLAST hits were hemipteran APC or AAAP members.

Because RNA sequencing and assembly approaches assign a unique sequence ID to each splice variant and truncated transcript encoded by the same gene locus, the identified amino acid transporter transcripts were subsequently collapsed into putative representative loci following previously described methods. The genomes of M. persicae and D. vitifoliae were used to map amino acid transporter transcripts to genome scaffold locations using BLASTN searches.
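As an illustration of the ORF-prediction step mentioned above, the following is a minimal Python sketch, not the stand-alone PERL script used in the study. It assumes that an ORF runs from the first ATG of a stop-free stretch to the next in-frame stop codon (or to the end of a truncated transcript), that all six reading frames are searched, and that the standard genetic code applies. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: nuclear code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def longest_orf(transcript):
    """Return the longest Met-initiated protein found in any of the six frames."""
    best = ""
    seq = transcript.upper()
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            # Each '*'-delimited segment is a stretch without stop codons.
            for segment in translate(strand[frame:]).split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best):
                    best = segment[start:]
    return best


example = longest_orf("CCATGGCTTAAGG")  # forward frame 3 reads ATG GCT TAA
```

Other ORF definitions (for example stop-to-stop, or requiring a terminal stop codon) would give different results for truncated transcripts, so the choice here is only one reasonable convention.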
Transcripts mapping to the same location were collapsed into the one encoding the longest ORF or, when partially overlapping or non-overlapping, merged into a single locus. To recover all possible amino acid transporters (AATs) that are encoded by the genomes but were not identified from the M. persicae and D. vitifoliae de novo transcriptome assemblies, we performed BLAST searches (E-value < 0.001) using APC or AAAP transcripts of M. persicae and D. vitifoliae as queries against their respective genome databases, and any sequences recovered were subsequently verified against the NCBI non-redundant protein database as described above. For the remaining species lacking draft genome sequences, we: 1) collapsed transcripts having the same Trinity component number into the one encoding the longest ORF, and 2) collapsed closely related transcripts into the one encoding the longest ORF if their pairwise synonymous substitution rate (Ks), determined using PAML (version 4.8), was less than 0.25 or if the two transcripts overlapped by less than 50 bp, as previously performed. All chosen representative transcripts were translated into the longest protein sequences in Blast2GO Pro. Amino acid transporters encoded by the Acyrthosiphon pisum and Drosophila melanogaster genomes had been annotated and reported previously.

[...] We used DNA sequences of three protein-coding mitochondrial genes, cytochrome c oxidase subunit I (COI), cytochrome c oxidase subunit II (COII) and cytochrome b (CYTB), to resolve the phylogenetic relationships among the nine Phylloxera species and six Aphididae species described above.
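The collapsing rules for species without a draft genome, described earlier in this section, can be illustrated with a short sketch. It assumes that ORF lengths, Trinity component labels and pairwise Ks values have already been computed elsewhere (for example, Ks with PAML); the transcript IDs, the input dictionaries and the function name are hypothetical. The 50-bp overlap rule is left out because its exact handling is not specified here.

```python
def collapse_transcripts(orf_lengths, components, pairwise_ks, ks_cutoff=0.25):
    """Group transcripts into putative loci and keep the longest ORF per locus.

    orf_lengths: {transcript_id: ORF length}
    components:  {transcript_id: Trinity component label}
    pairwise_ks: {(transcript_a, transcript_b): Ks value}
    """
    parent = {t: t for t in orf_lengths}

    def find(t):
        # Union-find lookup with path compression.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    def union(a, b):
        parent[find(a)] = find(b)

    # Rule 1: transcripts sharing a Trinity component belong to one locus.
    first_in_component = {}
    for transcript, component in components.items():
        if component in first_in_component:
            union(transcript, first_in_component[component])
        else:
            first_in_component[component] = transcript

    # Rule 2: closely related transcripts (Ks below the cutoff) are merged.
    for (a, b), ks in pairwise_ks.items():
        if ks < ks_cutoff:
            union(a, b)

    # Keep the transcript with the longest ORF as each locus representative.
    best = {}
    for transcript, length in orf_lengths.items():
        root = find(transcript)
        if root not in best or length > orf_lengths[best[root]]:
            best[root] = transcript
    return sorted(best.values())


representatives = collapse_transcripts(
    orf_lengths={"c1_seq1": 300, "c1_seq2": 450, "c2_seq1": 500, "c3_seq1": 420},
    components={"c1_seq1": "c1", "c1_seq2": "c1", "c2_seq1": "c2", "c3_seq1": "c3"},
    pairwise_ks={("c1_seq2", "c2_seq1"): 0.10, ("c2_seq1", "c3_seq1"): 0.80},
)
```

Merging is transitive in this sketch: if A joins B and B joins C, all three become one locus. Whether the original workflow treated chains of similarity this way is an assumption.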
COI and COII are widely used to infer insect phylogeny at a variety of taxonomic levels, from closely related species to orders, and CYTB is fast-evolving and thus useful for the phylogenetic analysis of closely related taxa.

The DNA sequences of these three genes were either retrieved from the de novo transcriptomes we assembled or downloaded from the GenBank database (accession # FJ411411.1 for three A. pisum genes; accession # NC_029727.1 for three M. persicae genes; accession # AM748716.1 for Pe. obesinymphae COII). The three gene sequences (COI, COII and CYTB) of each taxon were concatenated into a single sequence and then aligned using MAFFT (version 7.130) with the ‘auto’ setting. Poorly aligned and divergent regions were eliminated on the Gblocks server with default settings. The best-fit nucleotide substitution model, determined in MEGA6, was GTR + G + I. The maximum likelihood method was then run in MEGA6 to construct the phylogenetic trees with 1000 bootstrap replicates.

Phylogenetic analyses of AATs were performed using putative APC protein sequences and AAAP arthropod expanded clade sequences, respectively. AAAP members comprise the arthropod and non-arthropod expanded clades, between which the sequences are highly divergent. The arthropod expanded clade was so designated because of its multiple gene duplications in the common ancestor of arthropods, in contrast to those AAAPs that fall outside this clade. Two A. pisum Na-K-Cl transporters (ACYPI001649 and ACYPI007138) and two human SLC36 proteins (SLC36A1 and SLC36A2), which were previously used as outgroups for the phylogenetic analyses of APC and AAAP members, respectively, were used likewise in this study. Sequences were aligned using MAFFT with the ‘auto’ setting and the alignments were trimmed using trimAl (version 1.2) with a gap threshold of 0.25.
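To make the gap-threshold trimming concrete, here is a minimal Python sketch of column filtering. It is not trimAl itself; it assumes the threshold is read as the minimum fraction of sequences that must be non-gap in a column for that column to be kept, which matches our understanding of trimAl's gap-threshold option but should be checked against its documentation. The function name and the toy alignment are hypothetical.

```python
def trim_by_gap_threshold(alignment, gap_threshold=0.25):
    """Drop alignment columns in which too few sequences carry a residue.

    alignment: {sequence_name: aligned sequence}, all of equal length.
    A column is kept when (non-gap sequences / all sequences) >= gap_threshold.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all aligned sequences must have the same length")

    keep = [
        col for col in range(length)
        if sum(alignment[name][col] != "-" for name in names) / len(names)
        >= gap_threshold
    ]
    return {name: "".join(alignment[name][col] for col in keep) for name in names}


toy_alignment = {
    "seq1": "MK-A-",
    "seq2": "MK---",
    "seq3": "M--A-",
    "seq4": "MR---",
    "seq5": "M-L--",
}
trimmed = trim_by_gap_threshold(toy_alignment, gap_threshold=0.25)
```

In the toy alignment, the third column (one residue in five sequences) and the all-gap fifth column fall below the threshold and are removed, while the fourth column (two residues in five) is retained.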
We used MEGA6 to determine the best-fit models of protein evolution, which were LG + G + F for APC proteins and LG + G for AAAP proteins. Because the LG model is not available in the phylogenetics program MrBayes, we chose the WAG + G + F model for APC and the WAG + G model for AAAP arthropod expanded clade proteins, and ran the analyses in MrBayes (version 3.2.1) with two runs of four chains each until the standard deviation of split frequencies between runs dropped below 0.05. The first 25% of generations were discarded and the remaining generations were used to build a 50% majority-rule consensus tree. Lastly, we used the same alignments to perform a maximum likelihood inference (RAxML-HPC2 on XSEDE) in the CIPRES computing environment for comparison and to generate a consensus topology.

To test whether transcript counts within each gene family differed by life history while accounting for phylogenetic dependence, we used a phylogenetic generalized least squares (PGLS) model (counts ~ life history) with a Brownian correlation structure and a phylogenetic tree built from the mitochondrial sequences generated above. Unique AAT sequences were counted for each gene family for each insect used in this study (see Fig. ) and, as a conservative approach, combined with known counts from three free-living Sternorrhyncha to increase sample size before assessing differences. The mitochondrial sequences for the additional insects were obtained from NCBI (NC_030055.1, Bactericera cockerelli; KU877168.1, Bemisia tabaci; KP692637.1 and AY691419.1, Planococcus citri). Sequences were aligned and concatenated, conserved mitochondrial regions were identified using Gblocks, and the final tree was built using RAxML in the CIPRES environment, as described above. The PGLS model was run in the R computing environment using the library ‘picante’. […]
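The PGLS idea described above can be sketched in plain Python. This is an illustration only and not the R analysis used in the study: it assumes a rooted tree with branch lengths, a Brownian-motion covariance in which two tips covary by the branch length they share from the root, and life history coded as a 0/1 dummy variable. The tree, the counts and all function names are hypothetical, and the sketch estimates coefficients only (no standard errors or p-values).

```python
def brownian_covariance(tree, root):
    """Return (tips, V), where V[i][j] is the root-to-MRCA path length of tips i, j.

    tree: {internal_node: [(child, branch_length), ...]}; tips have no entry.
    """
    shared = {}

    def visit(node, depth):
        children = tree.get(node, [])
        if not children:  # tip: variance equals its root-to-tip distance
            shared[(node, node)] = depth
            return [node]
        groups = [visit(child, depth + length) for child, length in children]
        for i, left in enumerate(groups):  # tips in different subtrees split here
            for right in groups[i + 1:]:
                for a in left:
                    for b in right:
                        shared[(a, b)] = shared[(b, a)] = depth
        return [tip for group in groups for tip in group]

    tips = visit(root, 0.0)
    return tips, [[shared[(a, b)] for b in tips] for a in tips]


def solve(a, b):
    """Solve a @ x = b (b given as a list of rows) by Gauss-Jordan elimination."""
    n = len(a)
    m = [[float(v) for v in a[i]] + [float(v) for v in b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def pgls(counts, life_history, v):
    """GLS estimate of (intercept, slope) for counts ~ life_history under covariance v."""
    n = len(counts)
    design = [[1.0, float(x)] for x in life_history]
    vinv_x = solve(v, design)                  # V^-1 X
    vinv_y = solve(v, [[y] for y in counts])   # V^-1 y
    xtvx = [[sum(design[k][i] * vinv_x[k][j] for k in range(n)) for j in range(2)]
            for i in range(2)]
    xtvy = [[sum(design[k][i] * vinv_y[k][0] for k in range(n))] for i in range(2)]
    beta = solve(xtvx, xtvy)
    return beta[0][0], beta[1][0]


# Hypothetical four-taxon tree: ((A:1,B:1):1,(C:1,D:1):1)
toy_tree = {"root": [("n1", 1.0), ("n2", 1.0)],
            "n1": [("A", 1.0), ("B", 1.0)],
            "n2": [("C", 1.0), ("D", 1.0)]}
tips, cov = brownian_covariance(toy_tree, "root")
intercept, slope = pgls([10, 4, 10, 4], [0, 1, 0, 1], cov)
```

In practice, an established implementation in R or another statistics package would be preferable, since it also provides standard errors, significance tests and checks for a singular covariance matrix, none of which this sketch handles.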

Pipeline specifications

Software tools: Trinity, CD-HIT, HMMER hmmscan, HMMER, BLASTN, PAML, Blast2GO, MAFFT, Gblocks, MEGA, trimAl, MrBayes, RAxML, Picante
Databases: Pfam
Organisms: Bacteria
Diseases: Gallstones
Chemicals: Amino Acids