Computational protocol: Shoot transcriptome of the giant reed, Arundo donax

[…] Sequencing was performed at the National Center for Genome Resources (Santa Fe, NM, USA) using the standard Illumina RNA library preparation protocol and a single lane of HiSeq 100-base paired-end sequencing. A total of 181,972,782 paired-end Illumina raw reads were produced, and their quality was assessed using FastQC version 0.10.1 []. The first 12 bases of all reads were trimmed using seqtk version 4.19 [] to remove sequencing biases. Contigs were assembled de novo with trans-ABySS version 1.4.8 and Velvet-Oases version 0.2.08 using k-mer sizes of 49, 53, 59 and 63. This yielded 368,848 and 1,477,609 transcripts (≥200 bp) from trans-ABySS and Velvet-Oases, respectively. The trans-ABySS transcripts were further merged using CAP3 at 99.9% sequence overlap identity, resulting in 43,822 merged contigs and 249,590 unmerged transcripts. Velvet-Oases has been shown to produce longer assembled transcripts overall than other assemblers. We also found that Velvet-Oases can produce spurious isoforms, which can be removed by selecting representative transcripts for each locus. We screened assembled transcripts against Poaceae proteins (NCBI NR) and defined as ‘high confidence genes’ those transcripts with sequence identity ≥30% and coverage ≥70% of a known Poaceae gene. We classified as ‘low confidence genes’ those transcripts with partial or no hits to known Poaceae genes that were assembled by both the trans-ABySS and Velvet-Oases pipelines with 100% sequence identity and reciprocal transcript coverage greater than 90%. We report a total of 103,081 A. donax transcripts, of which 27,491 and 75,590 are high and low confidence genes, respectively ( and A). More than 70% of the high confidence genes were functionally annotated, while only 34.55% of the low confidence genes had partial hits to known and domain-containing Poaceae genes (A). We used AutoFACT version 3.4 to functionally annotate transcripts ().
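The high and low confidence definitions above amount to a two-step decision rule, which the following minimal Python sketch illustrates. It is not the authors' code: the `TranscriptEvidence` record, its field names, and the `"unclassified"` label are hypothetical, and it assumes that the per-transcript identity and coverage values have already been extracted from the protein search and from the cross-assembler comparison.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscriptEvidence:
    """Hypothetical per-transcript summary; field names are illustrative only."""
    # Best hit against Poaceae proteins (None when there is no hit).
    protein_identity_pct: Optional[float] = None
    protein_coverage_pct: Optional[float] = None
    # Agreement between the trans-ABySS and Velvet-Oases assemblies
    # (None when the transcript was produced by only one assembler).
    cross_assembler_identity_pct: Optional[float] = None
    reciprocal_coverage_pct: Optional[float] = None


def classify(ev: TranscriptEvidence) -> str:
    """Apply the thresholds stated in the text: identity >= 30% and
    coverage >= 70% for high confidence; otherwise 100% identity and
    > 90% reciprocal coverage between assemblers for low confidence."""
    has_full_hit = (
        ev.protein_identity_pct is not None
        and ev.protein_coverage_pct is not None
        and ev.protein_identity_pct >= 30.0
        and ev.protein_coverage_pct >= 70.0
    )
    if has_full_hit:
        return "high"

    supported_by_both = (
        ev.cross_assembler_identity_pct is not None
        and ev.reciprocal_coverage_pct is not None
        and ev.cross_assembler_identity_pct >= 100.0
        and ev.reciprocal_coverage_pct > 90.0
    )
    if supported_by_both:
        return "low"

    # Assumption: transcripts meeting neither rule are simply not reported.
    return "unclassified"
```

For example, a transcript with a 45% identity hit covering 82% of a Poaceae protein would be labelled `"high"`, while one with only a partial hit but identical copies in both assemblies would be labelled `"low"`.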
The relative abundance of the top 20 KEGG pathways among high confidence genes, compared with the low confidence gene set, is shown in B. Relative to the low confidence set, the high confidence genes showed 1.86-, 1.71- and 1.58-fold increases in the number of genes assigned to the spliceosome, purine metabolism and peroxisome pathways, respectively (B). C shows the top Gene Ontology annotations found among high and low confidence genes. Interestingly, two genes with copper ion binding and transport functions were found only among the high confidence genes, while genes involved in nutrient reservoir activity and reproductive growth were found only among the low confidence genes (C). The resources generated in this study will facilitate comparative transcriptomic analyses of invasive plant species. […]
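One way to arrive at a fold increase of the kind reported above is to compare each pathway's share of annotated genes in the two sets. The sketch below assumes this normalisation by set size; the text does not state exactly how the fold values were computed, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def relative_abundance(pathway_assignments: Iterable[str]) -> Dict[str, float]:
    """Fraction of gene-to-pathway assignments that fall in each pathway."""
    counts = Counter(pathway_assignments)
    total = sum(counts.values())
    return {pathway: n / total for pathway, n in counts.items()}


def fold_change(high: Iterable[str], low: Iterable[str]) -> Dict[str, float]:
    """Ratio of relative abundance (high / low) for pathways seen in both sets.

    Assumption: pathways absent from either set are skipped rather than
    given a pseudocount.
    """
    high_freq = relative_abundance(high)
    low_freq = relative_abundance(low)
    return {
        pathway: high_freq[pathway] / low_freq[pathway]
        for pathway in high_freq
        if pathway in low_freq
    }


# Toy input, not data from the study: one pathway label per annotated gene.
toy_high = ["spliceosome"] * 3 + ["peroxisome"] * 1
toy_low = ["spliceosome"] * 1 + ["peroxisome"] * 3
toy_result = fold_change(toy_high, toy_low)
```

With the toy input, `toy_result["spliceosome"]` is 0.75 / 0.25, a threefold enrichment in the high confidence set.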

Pipeline specifications

Software tools FastQC, Seqtk, Trans-ABySS, Oases, CAP3, AutoFACT
Databases KEGG
Application Transcription analysis