
Pipeline publication

[…] to generate a BLAST database with MAKEBLASTDB from the NCBI BLAST 2.2.26 package, the mitochondrial and plastid contigs were identified by BLAST homology searches using the mitochondrial (GenBank accession number NC_017841) and plastid (GenBank accession number NC_008100) genomes as queries, and were separated from the nuclear contigs. Putative contaminants were assessed by homology searches against the NCBI non-redundant database.

RNA-Seq reads were filtered using a sliding-window quality approach with Sickle (Bioinformatics Core, University of California, Davis) under the default parameters, and overall read quality was reassessed with FastQC after filtering. Illumina adapter sequences were then removed from the filtered reads using custom Perl scripts, and poly(A) tails were removed with TrimEST from the EMBOSS 6.4.0 package. The filtered transcriptome reads were assembled with Trinity's Inchworm module using a maximum RAM allowance of 90 GB (--JM 90G) on 8 processing cores (2 Intel Xeon E5506 CPUs at 2.13 GHz). Contigs were then filtered by size, and those of at least 250 bp were retained for downstream analyses. Transcriptomic contigs were mapped onto the genomic contigs with GMAP version 2014-01-21 using the default parameters.

Nuclear contigs at least 500 bp long were sorted by size and renumbered incrementally using custom Perl scripts. These contigs were then processed with the MAKER 2.11 annotation pipeline, using the Chlorella gene model as implemented in AUGUSTUS 2.5.5. The resulting GFF annotation files were processed, curated, and converted to GenBank annotation files using custom Perl scripts. Putative functions were assigned by homology searches against the Pfam database (E-value threshold of 1E-30).
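The sliding-window read filtering mentioned above can be illustrated with a short Python sketch. It is not Sickle's code: the Phred+33 quality encoding, the window of 10% of the read length, the mean-quality threshold of 20, the 20-bp minimum retained length, the rule of cutting the 3' end at the start of the first failing window, and the function names are all assumptions made for illustration.

```python
# Minimal sliding-window quality trimmer, loosely modelled on the approach Sickle uses.
# ASSUMPTIONS (not from the source): Phred+33 encoding, window = 10% of read length,
# mean-quality threshold 20, minimum kept length 20 bp, and a simplified 3' rule that
# cuts at the start of the first window whose mean quality drops below the threshold.
from typing import List, Optional, Tuple


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Convert an ASCII-encoded quality string to integer Phred scores."""
    return [ord(c) - offset for c in qual]


def window_mean(scores: List[int], i: int, win: int) -> float:
    """Mean Phred score of the window that starts at index i."""
    return sum(scores[i:i + win]) / win


def sliding_window_trim(seq: str, qual: str, q_threshold: float = 20,
                        window_frac: float = 0.1,
                        min_len: int = 20) -> Optional[Tuple[str, str]]:
    """Return the trimmed (seq, qual) pair, or None if the read should be discarded."""
    scores = phred_scores(qual)
    n = len(scores)
    if n == 0:
        return None
    win = max(1, int(n * window_frac))
    last = n - win  # start index of the last full-length window

    # 5' end: move to the first window whose mean quality reaches the threshold.
    start = next((i for i in range(last + 1)
                  if window_mean(scores, i, win) >= q_threshold), None)
    if start is None:
        return None  # no window is good enough: discard the whole read

    # 3' end: cut where the first later window falls below the threshold.
    end = next((i for i in range(start, last + 1)
                if window_mean(scores, i, win) < q_threshold), n)

    if end - start < min_len:
        return None  # too short after trimming
    return seq[start:end], qual[start:end]


if __name__ == "__main__":
    read = "ACGT" * 7 + "AC"        # 30-bp read
    qual = "I" * 25 + "#" * 5        # Q40 bases followed by five Q2 bases
    print(sliding_window_trim(read, qual))
```

With this 30-bp read the window is 3 bp, and the first window whose mean falls below Q20 starts at index 24 (0-based), so the function keeps the first 24 bases. Reads with no passing window, or that end up shorter than the minimum length, are dropped entirely rather than kept as fragments.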
Transposable elements were searched for with RepeatMasker using Repbase version 20130422.

Illumina reads from the mitochondrial and plastid genomes were first filtered out of the total dataset with bowtie 0.12.9, using the --un and --al flags against indexes built from the organelle sequences. The remaining nuclear reads were then mapped with bowtie against the 5,666 contigs (≥500 bp) with the -S flag, and the c […]
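The excerpt breaks off mid-sentence, so what the authors derived from the bowtie alignments is not stated here. As a hedged illustration of one thing the SAM output requested with -S makes possible, the Python sketch below counts aligned reads per contig. The function name, the choice to skip unmapped, secondary and supplementary records, and the use of @SQ header lines to list contigs with no hits are assumptions, not part of the described pipeline.

```python
# Count aligned reads per reference contig from SAM-formatted text.
# ASSUMPTIONS: input is SAM with @SQ header lines (as bowtie writes with -S);
# unmapped (0x4), secondary (0x100) and supplementary (0x800) records are skipped
# so that each read is counted at most once.
from typing import Dict, Iterable

UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800


def reads_per_contig(sam_lines: Iterable[str]) -> Dict[str, int]:
    """Return {contig name: number of primary aligned reads}."""
    counts: Dict[str, int] = {}
    for raw in sam_lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # @SQ lines list every reference; seed them so contigs without hits report 0.
            if line.startswith("@SQ"):
                tags = dict(f.split(":", 1) for f in line.split("\t")[1:])
                counts.setdefault(tags["SN"], 0)
            continue  # other header lines (@HD, @PG, ...) carry no alignments
        fields = line.split("\t")
        flag, rname = int(fields[1]), fields[2]  # FLAG and RNAME columns
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY) or rname == "*":
            continue
        counts[rname] = counts.get(rname, 0) + 1
    return counts


if __name__ == "__main__":
    example = [
        "@SQ\tSN:contig_1\tLN:1200",
        "@SQ\tSN:contig_2\tLN:800",
        "r1\t0\tcontig_1\t10\t255\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    ]
    print(reads_per_contig(example))
```

Seeding the counts from @SQ lines means contigs that attract no nuclear reads still appear with a count of 0, which is useful when scanning all 5,666 contigs for gaps in coverage.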

Pipeline specifications

Software tools: FastQC, EMBOSS, Trinity, GMAP, AUGUSTUS