Computational protocol: Genomic Analysis and Isolation of RNA Polymerase II Dependent Promoters from Spodoptera frugiperda

[…] For RNA isolation, 1 x 10^6 Sf21 cells were grown and harvested at 2 h, 24 h and 48 h after initial passaging of the cell culture to 0.5 x 10^6 cells/mL. RNA was isolated using an RNeasy Mini Kit 50 (Qiagen). The poly(A)+ mRNA fraction was isolated with oligo(dT) magnetic beads. Libraries were prepared from mRNA using the ScriptSeq v2 RNA-Seq Library Preparation Kit (Epicentre Biotechnologies). The libraries were sequenced on a HiSeq 2500 (Illumina) for 51 cycles following standard protocols. Image analysis to generate FASTQ files was done with the Genome Analyser Pipeline Analysis software 1.8.2 (Illumina). Quality control and adapter clipping of the FASTQ sequences were done using the fastq-mcf tool of ea-utils []. The Trinity package was used for further analysis of the mRNA transcripts []. A total of 30,405 transcripts with a predicted open reading frame (ORF) >100 nucleotides were assembled. Transcript quantification was done with RSEM []. The average RSEM value was 17, while 77 transcripts showed an RSEM value higher than 1000. BLAST+ was used to identify 11,625 protein-coding regions among the 30,405 transcripts.

[...] The Sf21 DNA was sequenced with Illumina technology using two libraries: a 2x104 bp paired-end library of ~280 bp inserts and a 2x94 bp mate-pair library of ~4500 bp inserts. An initial assembly was produced with the paired-end data. SGA [] (version 0.9.43) was used for read correction and filtering, which yielded ~78.3e6 read pairs that were used as input to SOAPdenovo2 [] (version r233) to perform contig assembly, scaffolding and gap closing. The mate-pair data were processed with FLASH [] (version 1.2.6) and all overlapping read pairs were discarded. The resulting ~8.7e6 pairs were used with SOAPdenovo2 for scaffolding the paired-end assembly.
Both paired-end and mate-pair data were used for a final gap-closing step (SOAPdenovo2). Restricting to scaffolds of at least 300 bp, the resulting draft assembly comprises 51,304 scaffolds totalling 466.7 Mb with an N50 of 133.8 kb (statistics computed with QUAST [] (version 2.3)). The completeness of the assembly was assessed with CEGMA [] (version 2.4), which detected 99.19% of ultra-conserved Core Eukaryotic Genes (CEGs) in complete copies, suggesting a very high degree of completeness. Recently, a less complete Sf21 draft genome sequence was published, comprising 358 Mb of sequence with an N50 of 53.8 kb and 73.79% complete CEG hits []. The results of the Sf21 genome assembly are shown in . […]
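
The scaffold statistics quoted above (minimum scaffold size, total length, N50) can be illustrated with a short Python sketch. This is not the QUAST implementation; it only shows the usual definition of N50, that is, the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembly length. The function names and the toy scaffold lengths are hypothetical and are not taken from the Sf21 assembly.

```python
from typing import Iterable, List


def filter_min_length(lengths: Iterable[int], min_len: int = 300) -> List[int]:
    """Keep only scaffolds of at least min_len bp (300 bp mirrors the cutoff in the text)."""
    return [n for n in lengths if n >= min_len]


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of scaffold lengths.

    Scaffolds are sorted from longest to shortest and their lengths are
    accumulated; N50 is the length at which the running sum first reaches
    half of the total length.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total == 0:
        raise ValueError("N50 is undefined for an empty assembly")
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the running sum always reaches the total")


# Hypothetical toy scaffold lengths in bp (illustration only).
toy_lengths = [100, 250, 300, 400, 500, 800, 1000]
kept = filter_min_length(toy_lengths)   # drops the 100 bp and 250 bp scaffolds
print(len(kept), sum(kept), n50(kept))  # scaffold count, total length, N50
```

In practice the lengths would be read from the assembly FASTA file; the filter is applied before computing the statistics, as in the text, because short scaffolds change both the scaffold count and the total length.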

Pipeline specifications

Software tools: SOAPdenovo, QUAST, CEGMA
Application: De novo sequencing analysis
Organisms: Spodoptera frugiperda
Diseases: Virus Diseases