Computational protocol: The first complete chloroplast genome sequences of Ulmus species by de novo sequencing: Genome comparative and taxonomic position analysis

Protocol publication

[…] After filtering the raw data to remove low-quality data (Phred score cutoff of 30), we obtained high-quality data. First, we used SOAPdenovo 2.01 (http://soap.genomics.org.cn/soapdenovo.html) to perform the initial assembly and obtain the contig sequences. Then, we used BLAT 36 (http://www.jurgott.org/linkage/LinkagePC.html) to align the contigs against the chloroplast reference genomes of closely related species (Morus notabilis, KP939360.1; Morus mongolica, KM491711.2) and obtain the relative positions of the contig sequences. Based on these relative positions, we joined the contigs and corrected assembly errors, which yielded the framework maps of the complete chloroplast genomes. We used GapCloser 1.12 (part of the SOAPdenovo package) (https://sourceforge.net/projects/soapdenovo2/files/GapCloser) to fill the gaps in the framework sequence using high-quality short sequences, and then used generation sequencing to fill and confirm the remaining gaps and suspicious regions. Finally, we verified the junctions between the large single-copy (LSC), small single-copy (SSC), and inverted repeat (IR) regions to obtain the complete circular chloroplast genome sequence. The chloroplast genome sequences were annotated with CpGAVAS (http://www.herbalgenomics.org/0506/cpgavas/analyzer/home) and DOGMA, and then manually corrected.
[...] To analyze the environmental pressure during the evolution of the different elms, KaKs_Calculator 2.0 (https://sourceforge.net/projects/kakscalculator2) was used to calculate the Ka and Ks values of genes with SNP differences. Codon preference was analyzed and plotted with R.
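The text does not say which codon preference statistic was computed in R. One common choice is relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The following is a minimal Python sketch of that calculation, not the authors' R code. The function name `rscu` and the use of the standard codon assignments are assumptions made for illustration.

```python
from collections import Counter

# Standard codon assignments, laid out in the usual TCAG order (assumed here).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def rscu(cds_sequences):
    """Return {codon: RSCU} for an iterable of in-frame coding sequences."""
    counts = Counter()
    for seq in cds_sequences:
        seq = seq.upper().replace("U", "T")
        # Walk the sequence codon by codon, ignoring a trailing partial codon.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1

    # Group codons into synonymous families by amino acid.
    families = {}
    for codon, amino_acid in CODON_TABLE.items():
        families.setdefault(amino_acid, []).append(codon)

    result = {}
    for amino_acid, family in families.items():
        if amino_acid == "*":
            continue  # stop codons are left out of this sketch
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid never observed
        expected = total / len(family)
        for codon in family:
            result[codon] = counts[codon] / expected
    return result
```

An RSCU above 1 means the codon is used more often than expected under equal usage, and a value below 1 means it is used less often. Within each observed amino acid the values sum to the number of synonymous codons.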
We conducted a collinearity analysis of the Ulmus chloroplast genomes against published chloroplast genomes of other plants, including tobacco (Nicotiana tabacum NC_007500.1), Arabidopsis thaliana (NC_000932), poplar (Populus NC_009143) and mulberry (Moraceae NC_025772), using GSV (http://cas-bioinfo.cas.unt.edu/gsv/homepage.php). First, all chloroplast genome sequences were compared pairwise with BLAST (http://www.jurgott.org/linkage/LinkagePC.html). Then, aligned fragments with similarity above 80% and a matching length longer than 100 bp were selected for drawing with GSV. To determine the phylogenetic positions of the Ulmus species, we selected 42 other species with sequences published in NCBI and used the shared chloroplast protein-coding genes to explore the evolution of the Ulmus chloroplast genomes and to verify their taxonomic status with MEGA 6.0. CGView Server (http://stothard.afns.ualberta.ca/cgview_server/index.html) was used to analyze the genetic variation among the chloroplast genomes of five Ulmus species. […]
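The screening step described above (similarity above 80%, matching length above 100 bp) can be sketched in Python as follows. The source does not state the BLAST output format or how similarity was measured. This sketch assumes the 12-column tabular output (`-outfmt 6`) and treats the percent identity column as the similarity. The function name `filter_hits` and its parameters are hypothetical.

```python
import csv

# Default column order of BLAST+ tabular output (-outfmt 6); assumed here.
FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(lines, min_identity=80.0, min_length=100):
    """Keep hits with identity > min_identity (%) and length > min_length (bp).

    `lines` is any iterable of tab-separated text lines, such as an open file.
    """
    kept = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        identity = float(hit["pident"])
        length = int(hit["length"])
        if identity > min_identity and length > min_length:
            kept.append(hit)
    return kept
```

Passing an open file handle for the pairwise BLAST results would return only the fragments that meet both thresholds. How those fragments are then formatted for GSV is not covered here.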

Pipeline specifications