Computational protocol: The genomes of Crithidia bombi and C. expoeki, common parasites of bumblebees

Protocol publication

[…] We sequenced the full genome of Crithidia bombi using a combination of sequencing runs on the Roche 454 FLX Titanium and Pacific Biosciences PacBio RS platforms (starting in 2009; both at the Functional Genomics Center Zurich, FGCZ) and on the Illumina GA2 at GATC-Biotech (Konstanz, Germany). We used a total of 11 fragment libraries constructed for the 454 platform (9 paired-end libraries and 2 single-end libraries). In total, we generated 7,127,289 sequence reads with a mean length of 446 bp on the 454 platform. We produced one single-molecule real-time (SMRT) library for the PacBio platform according to the manufacturer's recommendations (Pacific Biosciences), slightly modified in that we had to start with 10–20 μg of DNA rather than the suggested 5 μg, sheared with g-tubes from Covaris (pn 520079). We then sequenced the library at the FGCZ on six SMRT cells according to FGCZ's protocols. Our PacBio sequencing generated 270,958 sequences with a mean length of 2,517 bp. For the Illumina platform, we constructed and sequenced four fragment libraries, producing 65,082,902 single-end reads of 76 bp length. Reads containing adapters were trimmed with cutadapt. Quality filtering and trimming were done with ConDeTri. We used the Illumina reads to error-correct the PacBio reads with the pacBioToCA module of the WGS-Assembler version 7.

We sequenced the full genome of Crithidia expoeki with the Pacific Biosciences PacBio RS platform at the FGCZ. One SMRT library was constructed and sequenced on 9 SMRT cells, generating 381,293 sequences with a mean length of 7,181 bp; trimming was done within a local installation of the Pacific Biosciences SMRT Portal version 2.3.0. […]

We assembled the Crithidia bombi genome in two steps.
First, we assembled all Roche 454 sequence reads using the runAssembly command-line interface of the 454 GS de novo assembler version 2.7 with default settings, except for minimum overlap length (set to 40 bp), minimum overlap identity (set to 95%) and minimum contig length (set to 100 bp). The resulting assembly contained 265 scaffolds, with a scaffold N50 of 658 kb and a total size of 32.1 Mb. In a second step, we error-corrected the PacBio sequence reads with pacBioToCA from the WGS-Assembler version 7 using the Illumina sequence reads. The resulting corrected reads were then used to improve and extend the 454/Roche contigs from the first step using the software PBjelly version 12.9.14. To optimize the parameters of the assembly tools and to assess the quality of the final assembly, we used the CEGMA tool to count the core eukaryotic genes. Higher numbers of complete proteins indicate a more complete and accurate assembly.

For Crithidia expoeki, we assembled the PacBio reads with a local installation of the PacBio SMRT Portal using the 'RS_HGAP_Assembly.2' assembly protocol after filtering subreads to a minimum length of 500 bp and a minimum quality of 0.75, with a seed read length of 8,000 bp. A total of 367,242 reads with a mean length of 7,446 bp remained after filtering. We then ran the Celera Assembler with the following settings: genome size = 35 Mb, target coverage = 25, overlapper error rate = 0.07, overlapper min length = 50, overlapper k-mer = 16. Finally, we polished the assembly with Quiver, using only the unambiguously mapped reads. We manually inspected the assembly and removed 73 scaffolds with less than 10x coverage.

To assess the completeness of the assembly, we ran BUSCO v2.0.1 with the protist ensemble database downloaded from the BUSCO website. The option "--long" was set to turn on the Augustus optimization mode. BUSCO was run on the final genome assemblies of C. bombi and C. expoeki and, for comparison, also on the genome of L. major.

All reads are deposited in the European Nucleotide Archive (ENA) under accession numbers PRJEB21108 (C. bombi) and PRJEB21109 (C. expoeki). […]
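
The protocol reports a scaffold N50 alongside read counts and mean read lengths. As a rough illustration of how such summary figures relate to the underlying data, the sketch below computes an N50 from a list of sequence lengths and an approximate sequencing depth from a read count and mean read length. It is not the authors' code: the function names n50 and expected_depth are hypothetical, the toy scaffold lengths are invented, and using the 32.1 Mb first-step assembly size as the genome size is an assumption made only for this example.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total number of assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths given")


def expected_depth(n_reads, mean_read_length, genome_size):
    """Approximate sequencing depth as total read bases / genome size."""
    return n_reads * mean_read_length / genome_size


# Toy scaffold lengths (hypothetical values, not taken from the paper).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print("toy N50:", n50(toy_lengths))

# 454 read count and mean length as reported in the text; the genome
# size is ASSUMED to equal the 32.1 Mb size of the first-step assembly.
depth_454 = expected_depth(7_127_289, 446, 32_100_000)
print("approximate 454 depth: %.0fx" % depth_454)
```

Because the mean read length is itself a rounded figure, the depth obtained this way is only an estimate, and a real N50 would be computed from the scaffold lengths in the assembly FASTA file rather than from a hand-written list.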

Pipeline specifications

Software tools: cutadapt, ConDeTri, Newbler, PBSuite, CEGMA, HGAP, Celera assembler, BUSCO, AUGUSTUS
Application: WGS analysis
Organisms: Toxoplasma gondii, Apis mellifera, Leishmania, Leptomonas pyrrhocoris, Leishmania major, Trypanosoma brucei
Diseases: Infection
Chemicals: Titanium