Computational protocol: Evolutionary history of host use, rather than plant phylogeny, determines gene expression in a generalist butterfly

Protocol publication

[…] The methods used for caterpillar RNA isolation, quality assessment and sequencing are described in an additional file (see Additional file ). The RNA sequences obtained were assembled de novo using Trinity []. Redundant sequences in the resulting transcriptome assembly (TA) were clustered using CD-HIT 4.5.4 (Cluster Database at High Identity with Tolerance) [], and this clustered assembly was used thereafter. Using BLAST (Basic Local Alignment Search Tool; National Center for Biotechnology Information, Bethesda, MD), the TA sequences were compared at the amino acid level to a non-redundant predicted gene set (PGS) of the nearest genome, Heliconius melpomene. The non-redundant PGS was generated by clustering the full H. melpomene PGS with CD-HIT 4.5.4 on default settings.

To measure the quality of the Trinity-generated TA, these BLAST-inferred orthologs were used to determine the fraction of the PGS that was assembled and the length of the assembled region. The Ortholog Hit Ratio (OHR), defined as the length of the aligned TA sequence divided by the full length of its best-hit ortholog, has previously been used as a measure of TA quality []. Two OHR metrics were generated. First, the longest TA sequence for each of the 12,591 H. melpomene genes was determined (longest OHR). Second, the full length of each of these genes covered by multiple TA sequences was determined (sum OHR).

Gene expression was quantified as the summed per-ortholog number of RNA-Seq reads, using an in-house Python script (Python Software Foundation, DE) and the mapping software NextGenMap 0.4.10 [] on default parameter settings. Gene ontology was obtained by lifting over the annotation of the non-redundant H. melpomene PGS to the BLAST-inferred orthologs. The methodology for gene annotation and enrichment is also described in an additional file (see Additional file ). […]
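The two OHR metrics and the per-ortholog read summation can be illustrated with a minimal Python sketch. This is not the authors' in-house script. The function names, the tuple layout of the hits, and the choice to compute sum OHR as the union of aligned intervals (so overlapping contigs are not counted twice) are all assumptions made for illustration. Coordinates are taken to be 1-based, inclusive positions on the H. melpomene protein, as a tabular BLAST report would give them.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def ortholog_hit_ratios(hits, gene_lengths):
    """Return {gene_id: (longest_ohr, sum_ohr)}.

    hits: iterable of (contig_id, gene_id, subject_start, subject_end),
          one best-hit alignment per contig (assumed layout).
    gene_lengths: {gene_id: full length of the ortholog}, in the same
          units as the coordinates.
    """
    per_gene = defaultdict(list)
    for _contig, gene, start, end in hits:
        per_gene[gene].append((min(start, end), max(start, end)))

    ratios = {}
    for gene, intervals in per_gene.items():
        length = gene_lengths[gene]
        # Longest OHR: the single longest aligned region for this gene.
        longest = max(end - start + 1 for start, end in intervals)
        # Sum OHR (assumed): union of all aligned regions for this gene.
        covered = sum(end - start + 1
                      for start, end in merge_intervals(intervals))
        ratios[gene] = (longest / length, covered / length)
    return ratios


def sum_counts_per_ortholog(contig_counts, contig_to_gene):
    """Sum mapped read counts over all contigs assigned to each ortholog."""
    totals = defaultdict(int)
    for contig, count in contig_counts.items():
        gene = contig_to_gene.get(contig)
        if gene is not None:  # contigs without an ortholog are skipped
            totals[gene] += count
    return dict(totals)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    hits = [("c1", "g1", 1, 50), ("c2", "g1", 41, 100), ("c3", "g2", 10, 19)]
    lengths = {"g1": 200, "g2": 100}
    print(ortholog_hit_ratios(hits, lengths))
    counts = {"c1": 10, "c2": 5, "c3": 7, "c4": 3}
    mapping = {contig: gene for contig, gene, _s, _e in hits}
    print(sum_counts_per_ortholog(counts, mapping))
```

In the toy data, gene g1 is hit by two overlapping contigs, so its sum OHR exceeds its longest OHR. A gene hit by a single contig has identical values for both metrics.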

Pipeline specifications

Software tools: Trinity, CD-HIT, BLASTN, NextGenMap
Application: RNA-seq analysis