Computational protocol: Comparative Analysis of the Complete Plastomes of Apostasia wallichii and Neuwiedia singapureana (Apostasioideae) Reveals Different Evolutionary Dynamics of IR/SSC Boundary among Photosynthetic Orchids

Protocol publication

[…] The raw sequencing reads were quality-trimmed with an error probability of <0.05 and assembled de novo using CLC Genomics Workbench 6.0.1 (CLC Bio, Aarhus, Denmark). Contigs with a sequencing depth of <30× were discarded. Because the remaining contigs could derive not only from the chloroplast genome but also from the nuclear or mitochondrial genome, they were searched with NCBI BLASTN against the plastome sequence of A. odorata (NC_030722). Matched contigs with E-values of <10⁻¹⁰ were designated plastomic contigs. The gaps between plastomic contigs were closed by amplifying them with specific primers and directly sequencing the amplicons. Junctions between the LSC/SSC and IRs were amplified and confirmed by PCR assays. Genes were annotated using DOGMA and tRNAscan-SE 1.21. The exact boundaries of the annotated genes were confirmed by alignment with their orthologous genes from published orchid plastomes. The plastome sequences of A. odorata (NC_030722) and N. singapureana (KM244735) have been published in NCBI GenBank. However, detailed analyses of plastome composition and structure and comparisons with those of other orchid species have not yet been conducted. Therefore, in this study, they were downloaded and compared with our two newly sequenced plastomes of A. wallichii (LC199394) and N. singapureana (LC199503) using the mVISTA software. The N. singapureana plastome (KM244735) was used as the reference. [...] The size and location of repeat sequences, including palindromic, reverse, and direct repeats, within the newly sequenced plastomes of A. wallichii (LC199394) and N. singapureana (LC199503) were identified using the REPuter software. The repeats were identified under the following conditions: (1) Hamming distance of 3, (2) sequence identity ≥90%, and (3) repeat size ≥30 bp.
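The contig-selection step described above (discard contigs below 30× depth, then keep those with a BLASTN hit against the reference plastome at E-value <10⁻¹⁰) can be illustrated with a minimal Python sketch. This is not the authors' code: the `Contig` record, its field names, and the example values are hypothetical. It assumes that depth and best-hit E-value have already been extracted from the assembler and BLASTN outputs.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_DEPTH = 30.0    # contigs with <30x sequencing depth are discarded
MAX_EVALUE = 1e-10  # a BLASTN hit must have an E-value below this threshold


@dataclass
class Contig:
    """Hypothetical record holding the two values the filter needs."""
    name: str
    depth: float                  # mean sequencing depth of the contig
    best_evalue: Optional[float]  # best BLASTN E-value, None if no hit


def select_plastomic_contigs(contigs: List[Contig]) -> List[Contig]:
    """Keep contigs that pass both the depth and the E-value criteria."""
    return [
        c for c in contigs
        if c.depth >= MIN_DEPTH
        and c.best_evalue is not None
        and c.best_evalue < MAX_EVALUE
    ]


# Made-up example values, for illustration only.
example = [
    Contig("contig_1", depth=450.0, best_evalue=0.0),     # kept
    Contig("contig_2", depth=12.0, best_evalue=1e-50),    # depth too low
    Contig("contig_3", depth=80.0, best_evalue=1e-5),     # weak hit
    Contig("contig_4", depth=95.0, best_evalue=None),     # no hit
]
kept_names = [c.name for c in select_plastomic_contigs(example)]
print(kept_names)
```

Applying the depth filter first mirrors the order given in the text. Because both criteria must hold, the order does not change the result.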
Simple sequence repeat (SSR) elements were detected using GMATo, with the “Min-length” criteria for mononucleotide, dinucleotide, and multi-nucleotide SSRs set to greater than 8, 5, and 3 units, respectively. [...] For mutational hotspot selection, at least two complete plastomes from different species within the study genus should be available. However, only the plastome of N. singapureana has been sequenced in the genus Neuwiedia. Therefore, the plastomes of A. wallichii and A. odorata were used to screen for the most informative regions for the genus Apostasia. The sequences of protein-coding genes and non-coding loci, including intergenic spacers and introns, were retrieved from the plastomes of A. wallichii and A. odorata. Non-coding loci <150 bp were discarded. Sequence alignments were conducted using MUSCLE as implemented in MEGA 5.2. The sequences of protein-coding genes were aligned with the Align Codons option using the default parameters. The sequences of non-coding loci were first aligned with the default parameters and then realigned with the “Refining” option. Gaps located at the 5′- and 3′-ends of the alignments were excluded. The resulting alignments were used to count the pairwise nucleotide substitutions and the insertion and deletion (InDel) events using DnaSP v5. The sequence variability (SV) was calculated, following previously published methods, as: SV% = (the number of nucleotide mutations + the number of InDel events)/(the number of conserved sites + the number of nucleotide mutations + the number of InDel events) × 100. [...] The sequences of 66 protein-coding genes were retrieved from the 45 orchid plastomes and the outgroup plastome of Lilium longiflorum (Supplementary Table). The sequence alignments of these genes were performed separately and then concatenated to generate a data set with 44,898 characters.
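The SV formula above reduces to a small calculation once the three counts are available for a locus. The following Python sketch is illustrative only: the function name and the example counts are hypothetical, and it assumes the counts were already obtained from the alignment (for example, from DnaSP output).

```python
def sequence_variability(n_substitutions: int, n_indels: int,
                         n_conserved: int) -> float:
    """Return SV% = (substitutions + InDels) /
    (conserved sites + substitutions + InDels) * 100."""
    if min(n_substitutions, n_indels, n_conserved) < 0:
        raise ValueError("counts must be non-negative")
    variable = n_substitutions + n_indels
    total = n_conserved + variable
    if total == 0:
        raise ValueError("alignment has no sites to evaluate")
    return 100.0 * variable / total


# Made-up counts for one non-coding locus, for illustration only:
# 12 substitutions, 3 InDel events, 485 conserved sites.
sv = sequence_variability(n_substitutions=12, n_indels=3, n_conserved=485)
print(f"SV = {sv:.2f}%")
```

Note that each InDel event counts once regardless of its length, which is how the formula in the text is worded.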
The synonymous (ds) and non-synonymous (dn) substitution rates were estimated with the CodeML program of PAML 4.8. The parameters were set as follows: seqtype = 1, runmode = -2. The SV values for each non-coding locus of 10 genera from five orchid subfamilies, Epidendroideae (Cymbidium, Phalaenopsis, Masdevallia, Dendrobium, and Bletilla), Orchidoideae (Goodyera), Cypripedioideae (Paphiopedilum and Cypripedium), Vanilloideae (Vanilla), and Apostasioideae (Apostasia), were calculated in a previous study and in this study. To evaluate the plastome-wide variation of substitution rates among orchid species, the synonymous (ds) and non-synonymous (dn) substitution rates for each protein-coding gene of those 10 genera were estimated in this study. The selective pressure on the 66 genes from the 45 orchid species was also examined. First, a maximum likelihood tree was constructed using RAxML 8.0.2 based on the concatenated data set with the GTRGAMMA model. The genes of L. longiflorum were used as the outgroup. Second, the selective pressure on the 66 genes was analyzed using the site models in the CodeML program. LRT P-values were determined for three pairs of site models: M1a vs. M2a, M0 vs. M3, and M7 vs. M8. [...] Mesquite v. 3.02 was employed to evaluate the correlation between the shift of IR boundaries and the retained ndh gene length. The IR boundary used for the comparison was the region from the 5′ end of ycf1 to the junction between the IR and SSC (Figure). Statistical analyses with Spearman and Mann–Whitney tests were performed using SPSS Statistics 20.0. […]
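The likelihood-ratio tests (LRTs) for the three site-model pairs can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' procedure. The degrees of freedom (2 for M1a vs. M2a, 4 for M0 vs. M3, 2 for M7 vs. M8) are the values conventionally used for these comparisons and are not given in the text. The log-likelihoods are made-up numbers standing in for values read from CodeML output. The chi-square tail probability uses the closed form that holds for even degrees of freedom, so no external library is needed.

```python
import math

# Conventional degrees of freedom for each comparison (an assumption;
# the text does not state them).
MODEL_PAIR_DF = {"M1a vs. M2a": 2, "M0 vs. M3": 4, "M7 vs. M8": 2}


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability, valid for even df only:
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!"""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def lrt_pvalue(lnl_null: float, lnl_alt: float, df: int):
    """Return (test statistic, P-value) for a nested-model comparison."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf_even_df(stat, df)


# Hypothetical log-likelihoods for one gene, for illustration only.
stat, p = lrt_pvalue(lnl_null=-1005.0, lnl_alt=-1000.0,
                     df=MODEL_PAIR_DF["M7 vs. M8"])
print(f"2*delta(lnL) = {stat:.2f}, P = {p:.4g}")
```

A small P-value indicates that the alternative model, which allows an additional class of sites, fits the gene significantly better than the null model.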

Pipeline specifications