Computational protocol: Chloroplast Genome Sequence of Lagerstroemia guilinensis (Lythraceae, Myrtales), a Species Endemic to the Guilin Limestone Area in Guangxi Province, China

Protocol publication

[…] Lagerstroemia is the most economically valuable genus in Lythraceae due to its utility as an ornamental plant. The genus is composed of about 55 species. Lagerstroemia guilinensis is a 2-m-tall shrub with a distribution documented only around Xishan Park in Guilin City, China. Due to its narrow and limited distribution, L. guilinensis is at a higher risk of extinction than other, more broadly distributed Lagerstroemia species. L. guilinensis is found growing only on limestone mountains and blooms from May until July. Molecular research has been done to identify Lagerstroemia cultivars and interspecific hybrids, but complete genome-level research on Lagerstroemia is lacking. We acquired L. guilinensis (ZAFU 1507144) samples from Xishan Park of Guilin City, Guangxi Province, China, to complete its chloroplast (cp) genome sequence.

Chloroplast genomes have a highly conserved circular, quadripartite DNA structure ranging from 120 to 165 kb, with conserved gene order, sequence similarity across the land plants, uniparental inheritance, and low recombination rates compared to nuclear genomes. Plant cp genomes provide a valuable resource of markers for phylogenetics, DNA barcoding, and biogeography among populations. With the dramatically reduced cost of next-generation sequencing, it has become more convenient to sequence whole cp genomes. More than 900 complete land plant cp genomes can be accessed at the NCBI database.

The raw Illumina reads generated for this report were trimmed by quality score using Trimmomatic version 0.3. The trimmed reads from L. guilinensis were assembled de novo using CLC Genomics Workbench version 7 with the default settings (CLC bio). After merging Illumina and Sanger sequence data, the whole cp genome for L. guilinensis was found to be 152,074 bp.
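The quality-trimming step can be illustrated with a minimal sketch. The protocol does not state which Trimmomatic steps or thresholds were used, so the window size of 4 and the minimum mean quality of 15 below are assumptions chosen only for illustration, and the function names are hypothetical. The code shows the general sliding-window idea on Phred+33 qualities; it is a simplification, not Trimmomatic's own implementation.

```python
def phred33_to_scores(quality_string):
    """Decode a FASTQ Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def sliding_window_trim(seq, quals, window=4, min_avg=15):
    """Cut the read at the start of the first window whose mean quality
    falls below min_avg. window and min_avg are illustrative assumptions,
    not the settings used in the protocol."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    for start in range(0, len(quals) - window + 1):
        win = quals[start:start + window]
        if sum(win) / window < min_avg:
            return seq[:start], quals[:start]
    # No failing window (or read shorter than the window): keep the read whole.
    return seq, quals


# Hypothetical read: six good bases followed by a low-quality tail.
read_seq = "ACGTACGTAC"
read_quals = phred33_to_scores("??????++++")  # '?' decodes to 30, '+' to 10
trimmed_seq, trimmed_quals = sliding_window_trim(read_seq, read_quals)
```

In this sketch the read is cut where the first all-low-quality window begins, so only the six high-quality bases are kept.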
The final cp genome was annotated by DOGMA with manual adjustment of the exon-intron junctions.

We elucidated the genomic characteristics of this species: the cp genome was 152,074 bp in length, with 37.6% overall G+C content. The genome structure was highly similar to that of other land plants, consisting of two inverted repeats (IRs) (25,677 bp each), a large single-copy (LSC) region (83,811 bp), and a small single-copy (SSC) region (16,909 bp). Of the 112 unique genes (78 protein-coding genes, 4 rRNAs, and 30 tRNAs), 82 genes are located in the LSC region (60 protein-coding genes and 22 tRNA genes), 13 genes are located in the SSC region (12 protein-coding genes and 1 tRNA gene), and 17 genes are located in the IR regions (6 protein-coding genes, 4 rRNA genes, and 7 tRNA genes). Sixteen genes were found to have introns, with 5 tRNA genes having a single intron each (trnA-GUC, trnG-UCC, trnI-GAU, trnK-UUU, trnL-UAA, and trnV-UAC), eight protein-coding genes having a single intron each (atpF, ndhA, ndhB, petB, petD, rpl16, rpoC1, and rps16), and three protein-coding genes having two introns each (clpP, rps12, and ycf3). […]
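The reported region lengths and gene tallies can be cross-checked with simple arithmetic. The sketch below uses only the numbers given in the text; the function and variable names are hypothetical, and the `gc_content` helper shows how an overall G+C percentage could be computed from a sequence, which is not necessarily how the authors obtained their 37.6% figure.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total length of a circular cp genome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


def gc_content(seq):
    """Percentage of G and C among the unambiguous bases of a sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


# Region lengths in bp, as reported in the text.
total_bp = quadripartite_length(lsc=83_811, ssc=16_909, ir=25_677)

# Unique genes per region as (protein-coding, rRNA, tRNA), from the text.
genes_by_region = {
    "LSC": (60, 0, 22),
    "SSC": (12, 0, 1),
    "IR": (6, 4, 7),
}
protein_total = sum(counts[0] for counts in genes_by_region.values())
rrna_total = sum(counts[1] for counts in genes_by_region.values())
trna_total = sum(counts[2] for counts in genes_by_region.values())
unique_genes = protein_total + rrna_total + trna_total

print(total_bp, protein_total, rrna_total, trna_total, unique_genes)
```

The region lengths sum to the reported 152,074 bp, and the per-region tallies add up to 78 protein-coding genes, 4 rRNAs, and 30 tRNAs, for 112 unique genes.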

Pipeline specifications