Computational protocol: Complete Chloroplast Genome of Tanaecium tetragonolobum: The First Bignoniaceae Plastome

Protocol publication

[…] Illumina adaptors and barcodes were removed from raw reads. The clean reads were then filtered for quality using a custom Perl script that trimmed reads from the ends until there were three consecutive bases with a Phred quality score >20. Reads with a median quality score of 21 or less, with more than three uncalled bases, or shorter than 40 bp were removed from the dataset. The chloroplast genome of T. tetragonolobum was reconstructed using a combination of de novo and reference-guided assemblies. Clean, high-quality sequence reads were assembled de novo using Velvet 2.3, with a k-mer length of 71. A reference-guided assembly was performed in YASRA 2.32 using Olea europaea L. (Oleaceae, Lamiales; GenBank accession number NC_013707) as the reference. Contigs produced de novo were compared by BLAST against the chloroplast reference genome in order to exclude contigs of nuclear origin. Contigs with coverage below 10x were eliminated, likely leading to the exclusion of contigs of mitochondrial origin as well. The remaining de novo and reference-guided contigs were assembled into larger contigs in Sequencher 5.3.2 (Gene Codes Inc., Ann Arbor, MI) based on an overlap of at least 20 bp and 98% similarity. Any discrepancies between de novo and reference-guided contigs were corrected by searching the high-quality read pool using the UNIX ‘grep’ function. The ‘grep’ function was also used to find reads that could fill any gaps between contigs that did not assemble in the initial set of analyses (i.e., a genome walking technique). We then applied Jellyfish to create a 20-mer count look-up table that was used as a basis to check the quality of the T. tetragonolobum chloroplast genome sequence. Genome coverage was also analyzed using Jellyfish, which indicated 127-fold genome coverage.

The chloroplast genome of T. tetragonolobum was annotated using DOGMA (Dual Organellar GenoMe Annotator), with manual corrections for potential changes in the start and stop codons, as well as intron positions, based on comparisons to homologous genes in other plastomes. Transfer RNA genes were identified with DOGMA and the tRNAscan-SE program ver. 1.23. We used CpBase to determine the functional classification of the chloroplast genes. A circular representation of the T. tetragonolobum chloroplast genome was made using the GenomeVx tool. The whole nucleotide sequence of the T. tetragonolobum plastome, along with gene annotations, was deposited in GenBank (accession number KR534325). The short read library of T. tetragonolobum is available from the ENA read archive under accession number ERS717260.

[...] The software mVISTA was used in Shuffle-LAGAN mode to compare the complete cp genome of T. tetragonolobum with three representative chloroplast genomes of other species of Lamiales: Boea hygrometrica (Bunge) R. Br. (Gesneriaceae; NC_016468), Olea europaea (Oleaceae; NC_013707), and Sesamum indicum L. (Pedaliaceae; NC_016433). The closely related but basal species Nicotiana tabacum L. (Solanaceae; Solanales; NC_001879) was used as reference in the comparative analyses.

In order to examine variation in the evolutionary rates of chloroplast genes, we calculated the non-synonymous substitution rates (Ka), synonymous substitution rates (Ks), and their ratio (Ka/Ks) using Model Averaging in the KaKs_Calculator program. Protein-coding sequences from T. tetragonolobum and three Lamiales species (B. hygrometrica, O. europaea, and S. indicum) were aligned using the software MAFFT v.7. The corresponding genes of N. tabacum were used as reference in the alignments.

[...] We used the online REPuter software to identify and locate forward, palindrome, reverse, and complement sequences with a length of n ≥ 30 bp and a sequence identity ≥ 90%.
To assess the number of repeats in other chloroplast genomes, we ran the same REPuter analyses against the chloroplast genomes of the other three Lamiales species that were used in the comparative analyses. Simple sequence repeats (SSRs) were identified using the online software WebSat and Gramene Ssrtool. We applied a threshold of seven for mononucleotide repeats, four for dinucleotide repeats, and three for tri-, tetra-, penta-, and hexanucleotide repeats. Additionally, a potential set of microsatellite markers was identified for T. tetragonolobum. Primers were designed with the software PRIMER3 by setting the product size range from 100 to 250 bp, primer size from 18 to 24 bp, GC content from 40 to 60%, and 1°C as the maximum difference between the melting temperatures of the left and right primers. To identify variation in the set of chloroplast SSR markers designed for T. tetragonolobum, we searched for the same loci in the cp genomes of Boea hygrometrica, Olea europaea, and Sesamum indicum. […]
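
The read-cleaning criteria stated at the start of the protocol can be illustrated with a short sketch. This is not the authors' custom Perl script, which is not reproduced in the source. It is a minimal Python approximation that assumes Phred+33 quality encoding and interprets the trimming rule as "remove bases from each end until the outermost three bases all have a quality above 20". The function names (trim_ends, keep_read, clean_read) are hypothetical.

```python
from statistics import median


def phred33(qual_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(c) - 33 for c in qual_string]


def trim_ends(seq, quals, min_q=20, run=3):
    """Trim both ends until `run` consecutive bases all have quality > min_q."""
    n = len(seq)
    start = None
    for i in range(n - run + 1):
        if all(q > min_q for q in quals[i:i + run]):
            start = i
            break
    if start is None:
        # No acceptable window anywhere: the whole read is discarded.
        return "", []
    end = n
    for j in range(n, run - 1, -1):
        if all(q > min_q for q in quals[j - run:j]):
            end = j
            break
    return seq[start:end], quals[start:end]


def keep_read(seq, quals, min_median=21, max_n=3, min_len=40):
    """Apply the three removal criteria: length, uncalled bases, median quality."""
    if len(seq) < min_len:
        return False
    if seq.upper().count("N") > max_n:
        return False
    # Reads with a median quality of 21 or less are removed.
    return median(quals) > min_median


def clean_read(seq, qual_string):
    """Return the trimmed sequence, or None if the read fails the filters."""
    seq, quals = trim_ends(seq, phred33(qual_string))
    return seq if keep_read(seq, quals) else None
```

For example, a 50 bp read with two low-quality bases at each end would be trimmed to 46 bp and kept, while a read whose bases all have quality 21 would be removed by the median criterion.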
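
The 'grep'-based read search used for resolving discrepancies and filling gaps amounts to pulling every read that contains a short seed sequence taken from a contig end. The sketch below shows the idea in Python rather than with grep itself. Searching the reverse complement as well is an assumption made here (the source does not say whether both strands were searched), and find_reads is a hypothetical helper name.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_reads(reads, seed):
    """Return reads containing the seed on either strand (assumed behaviour)."""
    seed = seed.upper()
    rc = reverse_complement(seed)
    return [r for r in reads if seed in r.upper() or rc in r.upper()]
```

Reads returned for a seed at the end of a contig can then be inspected to see how the sequence continues beyond the contig, which is the basis of the genome walking step.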
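
The SSR thresholds above can be made concrete with a small scanner. This is only a sketch and not the algorithm used by WebSat or Gramene Ssrtool. It assumes the thresholds are minimum numbers of repeat units (seven for mononucleotides, four for dinucleotides, three for tri- to hexanucleotides) and it only reports perfect repeats. The names MIN_REPEATS, is_primitive, and find_ssrs are hypothetical.

```python
# Minimum number of repeat units per motif length (assumed interpretation).
MIN_REPEATS = {1: 7, 2: 4, 3: 3, 4: 3, 5: 3, 6: 3}


def is_primitive(unit):
    """True if the motif is not itself a repeat of a shorter motif (e.g. not 'ATAT')."""
    k = len(unit)
    return not any(k % d == 0 and unit == unit[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        i = 0
        while i + k <= len(seq):
            unit = seq[i:i + k]
            # Skip motifs with ambiguous bases or that reduce to a shorter motif.
            if set(unit) - set("ACGT") or not is_primitive(unit):
                i += 1
                continue
            # Count how many times the motif repeats in tandem from position i.
            n = 1
            while seq[i + n * k:i + (n + 1) * k] == unit:
                n += 1
            if n >= min_rep:
                hits.append((i, unit, n))
                i += n * k  # continue after the repeat tract
            else:
                i += 1
    return sorted(hits)
```

Positions are 0-based. Flanking sequence around each reported tract could then be passed to a primer design tool with the size, GC, and melting temperature settings listed in the protocol.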

Pipeline specifications

Software tools: Velvet, Sequencher, Jellyfish, DOGMA, tRNAscan-SE, GenomeVx, mVISTA, LAGAN, MAFFT
Applications: Genome annotation, Nucleotide sequence alignment, Genome data visualization
Organisms: Tanaecium tetragonolobum
Chemicals: Nucleotides