Computational protocol: Selection for High Oridonin Yield in the Chinese Medicinal Plant Isodon (Lamiaceae) Using a Combined Phylogenetics and Population Genetics Approach

Protocol publication

[…] Sequences were assembled in Geneious Pro 5.4.3 (Biomatters, Auckland, New Zealand). In addition to 18 samples newly sequenced for this paper, 40 samples were included from a previous study to provide phylogenetic context and to help with the analysis of patterns of oridonin evolution. Coleus xanthanthus C. Y. Wu & Y. C. Huang was used as the outgroup, using GenBank sequences from another study. A list of all sequences and associated GenBank identification numbers is provided in . Sequences were first aligned using MUSCLE as implemented in Geneious, then manually edited in Jalview. Ambiguously aligned regions, such as polynucleotide repeats and regions at the ends of sequences, were identified by eye and excluded in Mesquite version 2.74.

Incongruence between the different gene regions was tested using the incongruence length difference (ILD) test as implemented in PAUP* 4.0b10. The ILD test was conducted using the heuristic search method with 1000 homogeneity replicates of 1000 random addition sequence replicates each, saving one tree per addition sequence replicate.

Maximum parsimony (MP) analyses were done in PAUP using parsimony ratchet commands created in PAUPRat. Each PAUPRat analysis was done with 1000 ratchet iterations, with 15% of the characters perturbed on each iteration. The PAUPRat commands were run five separate times, and the most parsimonious trees from each run were concatenated in PAUP to make a strict consensus tree. Support for the MP trees was assessed using parsimony bootstrap. Each bootstrap analysis consisted of 1000 bootstrap replicates, each consisting of 100 heuristic search random addition sequence replicates, saving one tree per replicate. The bootstrap analysis was done five times.
The bootstrap trees from each analysis were concatenated in PAUP without excluding duplicate trees, and a majority rule consensus tree was constructed using tree weights to provide bootstrap support values.

The best model of evolution for maximum likelihood (ML) and Bayesian analyses was determined using jModelTest 0.1.1. Based on the Akaike information criterion, the TrN+Γ model was selected for the chloroplast data and the GTR+Γ model was selected for the ITS data. The ML analyses were conducted in PAUP and in the online version of RAxML as implemented through the Cyberinfrastructure for Phylogenetic Research (CIPRES) web portal, using parameters specified by jModelTest. For ML analyses conducted in PAUP, the heuristic search method was used with 100 random addition sequence replicates. Bayesian analyses were conducted using MrBayes 3.1. Each Bayesian analysis consisted of four Markov chains running for 1,000,000 generations, sampling every 100 generations. Because the exact jModelTest specifications could not be implemented in MrBayes, a general model (i.e., GTR+Γ) with estimated parameters was used for both the chloroplast and ITS datasets. The trees and parameters corresponding to the first 30% of samples were discarded as burn-in before calculating the best tree and posterior probability support values. [...]

Statistical tests of population genetic variation were conducted within GENALEX v6.41. First, the effect of pooling parents and offspring was tested to see if it increased Wright’s inbreeding coefficient (FIS) for each population. Because FIS increased significantly in several populations, analyses of population genetic variation involved only parents. Each microsatellite locus was tested for deviation from Hardy-Weinberg equilibrium (HWE) using chi-square tests. Due to the large number of tests (n = 99), the nominal level of statistical significance (α = 0.05) was adjusted by Dunn-Šidák correction, 1 − (1 − α)^(1/n), to 0.00052.
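The adjusted significance level quoted above can be reproduced with a few lines of Python. This is only a minimal sketch of the arithmetic, not GENALEX's implementation; the function name `sidak_alpha` is a hypothetical label chosen for illustration.

```python
def sidak_alpha(alpha, n_tests):
    """Per-test significance level under the Dunn-Sidak correction.

    Implements 1 - (1 - alpha)^(1/n), the formula given in the text.
    The function name is illustrative, not taken from any package.
    """
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)


# Values from the protocol: nominal alpha = 0.05 across n = 99 HWE tests.
adjusted = sidak_alpha(0.05, 99)
print(f"{adjusted:.5f}")  # prints 0.00052
```

With n = 1 the function returns the nominal α unchanged, which is a quick sanity check on the formula.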
The number of alleles (NA), observed (HO) and expected heterozygosity (HE), and fixation index (F = 1−(HO/HE)) were averaged across loci for each population. Using the combined microsatellite data from the maternal parent and offspring, the Multi-locus Mating System Program (MLTR) was employed to calculate the maternal parent inbreeding coefficient (Fmat) and the biparental inbreeding rate (Tm−Ts). The biparental inbreeding rate is estimated here as the difference between the multi-locus population out-crossing rate (Tm) and the single-locus out-crossing rate (Ts).

The spatial pattern of microsatellite variation was examined in order to identify distinct population clusters and the relationship between genetic distance and geographic distance. A principal coordinate analysis (PCoA) was first employed to summarize microsatellite genetic variation among all individuals. The eigenvectors of the PCoA were calculated from a covariance matrix with data standardization using the program GENALEX. The clustering of individuals from each population was examined based on the first two principal coordinates. A Bayesian clustering algorithm implemented in the program TESS v2.3.1 was also employed, using the admixture model with correlated allele frequencies to account for any migrants in the dataset, following the recommendations of Francois and Durand. TESS was run by setting the cluster (“k”) value incrementally from two to seven, with 20 independent runs at each k value. A burn-in period of 75,000 sweeps was followed by MCMC sampling for 500,000 sweeps. The optimal k value was determined by examining the deviance information criterion (DIC), and the 20 independent runs at this value of k were summarized using the program CLUMPP with the Greedy algorithm. The program DISTRUCT was used to graphically display the output.

Microsatellite genetic data were also examined for genetic isolation by distance (IBD) and compared to chemical distance.
A matrix of genetic distance was compared to matrices of Euclidean and log-transformed Euclidean geographic distance. Genetic distance was also compared to matrices of the untransformed and log-transformed absolute difference in oridonin amounts (% dry wt.) in both parents and offspring. A Mantel test of matrix correspondence was conducted using GENALEX, with statistical significance assessed after 999 permutations. […]
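The logic of a Mantel test with 999 permutations can be sketched in plain Python as follows. This is an illustrative version that assumes a Pearson correlation on the upper triangles and a one-tailed p-value; it is not GENALEX's code, and all function names and the toy coordinates are hypothetical.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(dist_a, dist_b, permutations=999, seed=0):
    """Return (observed r, one-tailed p) for two square distance matrices.

    Rows and columns of dist_b are permuted together, which keeps the
    matrix a valid distance matrix while breaking its link to dist_a.
    """
    rng = random.Random(seed)
    n = len(dist_a)
    flat_a = upper_triangle(dist_a)
    r_obs = pearson(flat_a, upper_triangle(dist_b))
    at_least_as_large = 1  # the observed arrangement counts once
    order = list(range(n))
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[dist_b[order[i]][order[j]] for j in range(n)]
                    for i in range(n)]
        if pearson(flat_a, upper_triangle(permuted)) >= r_obs:
            at_least_as_large += 1
    return r_obs, at_least_as_large / (permutations + 1)


# Toy example (made-up positions, not data from the study).
sites = [0.0, 1.0, 3.0, 6.0, 10.0]
geographic = [[abs(a - b) for b in sites] for a in sites]
genetic = [[0.1 * d for d in row] for row in geographic]
r, p = mantel_test(genetic, geographic, permutations=999)
print(r, p)
```

To mirror the log-transformed comparison described above, the geographic matrix could be transformed element-wise (for example with `math.log` on the non-zero entries) before calling `mantel_test`.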

Pipeline specifications

Software tools Geneious, MUSCLE, Jalview, Mesquite, jModelTest, PAUP*, RAxML, MrBayes
Applications Phylogenetics, Nucleotide sequence alignment
Diseases Neoplasms
Chemicals Diterpenes