Computational protocol: Infrageneric Phylogeny and Temporal Divergence of Sorghum (Andropogoneae, Poaceae) Based on Low-Copy Nuclear and Plastid Sequences

Protocol publication

[…] We sampled 79 accessions of 28 species in Sorghum, covering the morphological diversity and the geographic ranges of five subgenera, plus the monotypic genus Cleistachne, together with seven species in six allied genera as outgroups. Seeds were obtained from the International Livestock Research Institute (ILRI), the International Crops Research Institute for the Semi-Arid Tropics (IS), and the United States Department of Agriculture (USDA). Leaf material was obtained from seedlings and from dry herbarium specimens deposited at CANB, IBSC, K, and US.

Two low-copy nuclear (LCN) genes, phosphoenolpyruvate carboxylase 4 (Pepc4) and granule-bound starch synthase I (GBSSI), were chosen for this study. The housekeeping Pepc4 gene encodes the PEPC enzyme responsible for the preliminary carbon assimilation in C4 photosynthesis, whereas the GBSSI gene encodes the GBSSI enzyme for amylose synthesis in plants and prokaryotes. These two LCN genes have been used for accurate phylogenetic assessments in Poaceae. They are predominantly low-copy in Poaceae, making it possible to establish orthology and track homoeologues arising by allopolyploidy. Based on genome-wide studies of cereal crops, these two LCN genes appear to be on different chromosomes; thus each of the LCN markers can provide an independent phylogenetic estimate.

Genomic DNA was extracted with the DNeasy Plant Mini Kit (Qiagen, Valencia, CA, USA) following the manufacturer’s instructions. The two LCN markers were amplified using the primers and protocols given in the original publication. PCR products were purified by the PEG method. Cycle sequencing reactions were conducted in 10 µL volumes containing 0.25 µL of BigDye v.3.1, 0.5 µL of primer, 1.75 µL of sequencing buffer (5×) and 1.0 µL of purified PCR product.
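
The listed cycle sequencing components do not add up to the full 10 µL, so a fill volume is needed. The sketch below computes it from the stated volumes; that the remainder is water is an assumption, since the excerpt does not name the diluent, and the function name is hypothetical.

```python
# Per-reaction volumes in microlitres, as stated in the protocol text.
COMPONENTS_UL = {
    "BigDye v.3.1": 0.25,
    "primer": 0.5,
    "5x sequencing buffer": 1.75,
    "purified PCR product": 1.0,
}
TOTAL_UL = 10.0


def fill_volume(components, total):
    """Return the volume needed to bring the listed components up to `total`.

    Assumption: the remainder is made up with water (not stated in the source).
    """
    used = sum(components.values())
    if used > total:
        raise ValueError("components exceed the total reaction volume")
    return total - used


print(sum(COMPONENTS_UL.values()))           # 3.5
print(fill_volume(COMPONENTS_UL, TOTAL_UL))  # 6.5
```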
For accessions that failed direct sequencing, the purified PCR products were cloned into pCR4-TOPO vectors and transformed into Escherichia coli TOP10 competent cells following the protocol of the TOPO TA Cloning Kit (Invitrogen, Carlsbad, CA, USA). Transformed cells were plated and grown for 16 h on LB agar with X-Gal (Promega, Madison, WI, USA) and ampicillin (Sigma, St. Louis, MO, USA). We initially picked a small number of colonies and added more where needed; in total, eight to 24 colonies per individual were selected by blue-white screening to assess allelic sequences and PCR errors. Inserts were sequenced with primers T7 and T3 on an ABI PRISM 3730XL DNA Analyzer (Applied Biosystems, Foster City, CA, USA).

Cloned sequences of nuclear loci were initially aligned with MUSCLE v.3.8.31 and adjusted in Se-Al v.2.0a11 (http://tree.bio.ed.ac.uk/software/seal/). Subsequently, the corrected clones were assembled into individual-specific alignments that were analyzed separately under a maximum parsimony optimality criterion with the default parsimony settings in PAUP* v.4.0b10. The resulting trees were used to determine the unique alleles present in each individual. Alleles were recognized when one or more clones from a given individual were united by one or more characters. After all sequence clones for a given allele had been identified, the sequences were combined in a single project in Sequencher v.5.2.3 (Gene Codes Corp., Ann Arbor, Michigan, USA) and manually edited using a “majority-rule” criterion to form a final consensus allele sequence. Instances of PCR error were easily identified because they never occurred in more than one sequence. Newly obtained consensus sequences of 62 Pepc4 alleles and 76 GBSSI alleles were submitted to GenBank (http://ncbi.nlm.nih.gov/genbank).

Three plastid markers (ndhA intron, rpl32-trnL, and rps16 intron) were amplified and sequenced to estimate lineage ages in Sorghum.
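
For the cloned nuclear sequences, the “majority-rule” consensus step described above can be illustrated with a short sketch. The authors edited clones manually in Sequencher; the code below is only an automated analogue under stated assumptions: the clones of one allele are already aligned to equal length, and a variant seen in exactly one clone is treated as a putative PCR error. The function name and the example sequences are hypothetical.

```python
from collections import Counter


def majority_consensus(clones):
    """Column-wise majority consensus for aligned clones of a single allele.

    Returns the consensus string and a list of (clone_index, position, base)
    for variants seen in exactly one clone (putative PCR errors).
    Ties are resolved in favour of the base encountered first.
    """
    if not clones:
        raise ValueError("no clone sequences given")
    length = len(clones[0])
    if any(len(seq) != length for seq in clones):
        raise ValueError("clones must be aligned to the same length")

    consensus = []
    singletons = []
    for pos in range(length):
        counts = Counter(seq[pos].upper() for seq in clones)
        majority_base, _ = counts.most_common(1)[0]
        consensus.append(majority_base)
        for idx, seq in enumerate(clones):
            base = seq[pos].upper()
            if base != majority_base and counts[base] == 1:
                singletons.append((idx, pos, base))
    return "".join(consensus), singletons


# Hypothetical example: four clones, one with a single-clone substitution.
example = ["ACGTACGT", "ACGTACGT", "ACGAACGT", "ACGTACGT"]
print(majority_consensus(example))  # ('ACGTACGT', [(2, 3, 'A')])
```

A variant shared by two or more clones is not flagged here, which mirrors the observation that PCR errors never occurred in more than one sequence.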
Primer sequences and amplification protocols for the plastid markers are listed in the original publication. PCR products were purified by the PEG method. Cycle sequencing reactions were conducted in 10 µL volumes and were run on an ABI PRISM 3730XL DNA Analyzer. Both strands were assembled in Sequencher v.5.2.3. Sequences were initially aligned using the multiple alignment routine of MUSCLE v.3.8.31 and then adjusted manually in Se-Al v.2.0a11. The Pepc4, GBSSI, and combined plastid matrices were submitted to TreeBASE (http://purl.org/phylo/treebase/phylows/study/TB2:S15625). [...]

Each data set was analyzed with maximum likelihood (ML) using GARLI v.0.96 and with Bayesian inference (BI) using MrBayes v.3.2.1. The substitution model for each data partition was selected by the Akaike Information Criterion (AIC) as implemented in Modeltest v.3.7; the best-fit model for each data set is listed in the original publication. The ML topology was estimated using the best-fit model, and ML bootstrap support (MLBS) of internal nodes was determined from 1000 bootstrap replicates in GARLI v.0.96, with runs set for an unlimited number of generations and automatic termination following 10,000 generations without a significant topology change (lnL increase of 0.01). The output file containing the best trees for the bootstrap-reweighted data was then read into PAUP* v.4.0b10, where the majority-rule consensus tree was constructed to calculate bootstrap support values.

BI analyses were conducted in MrBayes v.3.2.1 using the best-fit model for the Pepc4 and GBSSI loci. Each analysis consisted of two independent runs of 40 million generations, with trees sampled every 1000 generations. The first 10 million generations of each run (25% of the samples) were conservatively discarded as burn-in. The 50% majority-rule consensus trees were constructed from the remaining pooled trees (c. 60,000), which were also used to calculate the Bayesian posterior probabilities (PP) for internal nodes using the “sumt” command. The AWTY (Are We There Yet?) approach was used to explore the convergence of paired MCMC runs in the BI analysis. The stationarity of the two runs was inspected with cumulative plots displaying the posterior probabilities of splits at selected increments over an MCMC run, and convergence was visualized with comparative plots displaying the posterior probabilities of all splits for paired MCMC runs.

The nuclear data were used to help determine bi-parental contributions, and multiple alleles were present for most polyploid taxa. The nuclear data therefore could not be combined with the plastid dataset, which provided the maternal phylogenetic framework. We rooted the Pepc4 tree using species of Apluda, Bothriochloa, Chrysopogon, Dichanthium and Sorghastrum as outgroups, and rooted the GBSSI tree using species of Bothriochloa, Dichanthium, Microstegium and Sorghastrum as outgroups, because clean GBSSI sequences of Apluda and Chrysopogon could not be isolated in the laboratory. The choice of outgroups was confirmed by phylogenetic proximity (the monophyletic ingroup being supported), genetic proximity (short branch lengths being observed) and base compositional similarity (ingroup-like GC%). [...]

For molecular dating analyses using the plastid markers, a strict molecular clock model was rejected at a significance level of 0.05 (IL = 686.7024, d.f. = 60, P = 0.025) based on a likelihood ratio test. A Bayesian relaxed clock model was implemented in BEAST v.1.7.4 to estimate lineage ages in Sorghum.
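
As a generic illustration of a likelihood ratio test for a strict clock, the sketch below compares a clock-constrained and an unconstrained log-likelihood and evaluates the statistic against a chi-square distribution. It assumes the common convention of n − 2 degrees of freedom for n taxa; the log-likelihood values are hypothetical and are not taken from the study, and the function names are illustrative only.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    with y = x / 2, starting from Q(1, y) = exp(-y) for even df
    or Q(1/2, y) = erfc(sqrt(y)) for odd df.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    target = df / 2.0
    while a < target:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def clock_lrt(lnl_clock, lnl_free, n_taxa):
    """Likelihood ratio test of a strict clock against an unconstrained model.

    Assumption: df = n_taxa - 2, the usual convention for this test.
    Returns (test statistic, degrees of freedom, p-value).
    """
    stat = 2.0 * (lnl_free - lnl_clock)
    df = n_taxa - 2
    return stat, df, chi2_sf(stat, df)


# Hypothetical log-likelihoods for a 62-taxon alignment (df = 60).
stat, df, p = clock_lrt(lnl_clock=-10045.0, lnl_free=-10000.0, n_taxa=62)
print(stat, df, p < 0.05)  # 90.0 60 True
```

A p-value below the chosen significance level rejects the strict clock, which is the situation that motivates a relaxed clock model.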
Three plastid markers were partitioned using BEAUti v.1.7.4 (within BEAST) with the best-fit model determined by Modeltest v.3.7.

The Andropogoneae crown age was estimated at 17.1 ± 4.1 Mya and within this confidence interval, although the most reliable fossils of subfamily Panicoideae are the petrified vegetative parts from the Ricardo Formation in California, now dated to approximately 12.5 Mya. Because lineages may have originated earlier than the fossil record indicates, the Sorghum stem age was assigned a normal prior distribution (mean 17.1, SD 4.1). A Yule prior (Speciation: Yule Process) was employed. An uncorrelated lognormal relaxed clock model was used, which permits evolutionary rates to vary along branches according to a lognormal distribution. Following optimal operator adjustment, as suggested by output diagnostics from preliminary BEAST runs, two independent MCMC runs of 40 million generations each were performed, sampling every 1000 generations and discarding the first 25% of the samples as burn-in. All parameters had a potential scale reduction factor close to one, indicating that the posterior distribution had been adequately sampled. Convergence between the two runs was checked using the “cumulative” and “compare” functions implemented in AWTY. A 50% majority-rule consensus of the retained posterior trees (c. 60,000) from the two runs was obtained using TreeAnnotator v.1.7.4 (within BEAST) with a PP limit of 0.5 and mean lineage heights. […]
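
The figure of c. 60,000 retained trees follows from the stated MCMC settings, which are the same for the MrBayes and BEAST analyses. A minimal sketch of the arithmetic, with a hypothetical function name:

```python
def retained_samples(generations, sample_every, burnin_fraction, n_runs):
    """Return (samples per run, burn-in samples per run, pooled retained samples)."""
    per_run = generations // sample_every
    burnin = int(per_run * burnin_fraction)
    pooled = (per_run - burnin) * n_runs
    return per_run, burnin, pooled


# Settings stated in the text: 40 million generations, sampled every 1000,
# first 25% discarded, two independent runs.
per_run, burnin, pooled = retained_samples(40_000_000, 1000, 0.25, 2)
print(per_run, burnin, pooled)  # 40000 10000 60000
print(burnin * 1000)            # 10000000 generations of burn-in per run
```

The count is approximate because MCMC programs typically also log the starting state, which adds one sample per run.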

Pipeline specifications

Software tools MUSCLE, Se-Al, Sequencher, PhyloWS, GARLI, MrBayes, Modeltest, AWTY, BEAST
Databases TreeBASE
Applications Phylogenetics, Nucleotide sequence alignment
Organisms Sorghum bicolor