Computational protocol: The Agassiz’s desert tortoise genome provides a resource for the conservation of a threatened species

Protocol publication

[…] DNA sequence reads were trimmed to eliminate nucleotide biases and remove adaptors with Trimmomatic v0.27. We retained all reads ≥ 37 bp with a quality score ≥ 28. Illumina sequencing errors were corrected using SOAPec v2.01, and overlapping reads from the 200 bp libraries were joined to form single-end reads using FLASH v1.2.8. We compared the outputs of different de Bruijn graph assemblers, including ABySS v1.5.2, SOAPdenovo2, and Platanus v1.2.1, for both contig and scaffold assembly, in addition to SSPACE v3.0 for scaffold assembly; we then closed gaps using the GapCloser v1.12 module from SOAP. The assemblies with the highest scaffold contiguity (as measured by N50) and a reasonable total length given the expected genome size were selected for further analysis. We evaluated completeness of the assemblies by their estimated gene content, using the Conserved Eukaryotic Genes Mapping Approach (CEGMA v2.5), which calculated the proportion of 248 core eukaryotic genes present in the genome assembly, and Benchmarking Universal Single Copy Orthologs (BUSCO v1.22), which calculated the proportion of a vertebrate-specific set of 3,023 conserved genes that were either complete, fragmented, or missing. We further improved the raw assembly by RNA scaffolding. In brief, we used TransDecoder v2.0 (https://transdecoder.github.io/) to search the assembled transcripts (described below) for open reading frames (ORFs) and retained only sequences that produced a significant BLAST hit to a protein sequence in the UniProtKB/Swiss-Prot database. This filtered gene set was mapped to the genome assembly using BLAT, and the identified scaffolds were merged using L_RNA_scaffolder. […]
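The assembly-selection step described above (highest scaffold N50 among assemblies of plausible total length) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `n50` and `pick_assembly`, the 20% length tolerance, and the toy scaffold lengths are all assumptions made for illustration, since the protocol does not state how "reasonable total length" was judged.

```python
from typing import Dict, Iterable, List, Optional


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that scaffolds of length >= L
    together cover at least half of the total assembly length."""
    ordered: List[int] = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("need at least one scaffold with positive length")
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the loop always returns")


def pick_assembly(
    assemblies: Dict[str, List[int]],
    expected_size: int,
    tolerance: float = 0.2,  # hypothetical cutoff, not taken from the protocol
) -> Optional[str]:
    """Among assemblies whose total length lies within `tolerance` of the
    expected genome size, return the name of the one with the highest N50.
    Returns None if no assembly passes the length filter."""
    plausible = {
        name: lengths
        for name, lengths in assemblies.items()
        if abs(sum(lengths) - expected_size) <= tolerance * expected_size
    }
    if not plausible:
        return None
    return max(plausible, key=lambda name: n50(plausible[name]))


if __name__ == "__main__":
    # Toy scaffold lengths; real input would be parsed from FASTA files.
    toy = {
        "assembler_a": [100, 50, 50],
        "assembler_b": [500, 400, 100],  # far too long for the expected size
        "assembler_c": [150, 40, 10],
    }
    print(pick_assembly(toy, expected_size=200))
```

In practice the scaffold lengths would come from each assembler's FASTA output, and completeness metrics such as the CEGMA and BUSCO proportions would be weighed alongside N50 rather than replaced by it.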

Pipeline specifications

Software tools: Trimmomatic, SOAPec, ABySS, SOAPdenovo, Platanus, SSPACE, CEGMA, BUSCO, TransDecoder, BLAT, L_RNA_scaffolder
Databases: UniProt UniProtKB
Application: De novo sequencing analysis
Organisms: Gallus gallus
Chemicals: Amino Acids