Computational protocol: Species Detection and Identification in Sexual Organisms Using Population Genetic Theory and DNA Sequences

Protocol publication

[…] DNA sequences were downloaded from GenBank, except for liverworts, for which Jochen Heinrich provided sequences already aligned. Alignments were checked and sequences trimmed when necessary in MacClade, after which PAUP* was used to make phylogenetic trees, which were visualized in PAUP* and Dendroscope. When it was necessary to correct sequence differences for multiple hits, evolutionary models were selected with ModelTest. Pairwise distances were then listed in PAUP* and copied into Excel spreadsheets for calculation of K/θ ratios.

Using the K/θ ratio to determine the probability that two samples come from different evolutionary species involves six steps, of which numbers 2–5 are calculated by formulae pasted into the Excel spreadsheet:

1. Find statistically well-supported pairs of sister clades using standard phylogenetic methods such as bootstrapped neighbor-joining, maximum likelihood, or Bayesian inference. I use neighbor-joining trees with ≥70% bootstrap support. Each pair is then tested separately, starting at the tips of the tree, to determine the probability that the clades are samples from independently evolving species; this continues until species are found in the following steps.
2. For each clade in a pair, estimate nucleotide diversity π by the mean pairwise difference d between sequences multiplied by the sample size correction n/(n-1), where n is the number of sequences in the clade.
3. For each clade, estimate θ = 2Neμ by π/(1 - 4π/3). When d = 0, I use a non-zero estimate of π by assuming that one pairwise difference is not zero but instead is 1/L, where L is the sequence length; then π = 2/(Ln(n-1)).
4. Calculate K, the mean pairwise sequence difference between the two clades, corrected for multiple hits.
5. Calculate K/θ for the pair of clades. The values of θ for the two clades may differ, in which case I use the larger value of θ to get a conservative estimate of K/θ.
6. Using the K/θ ratio and the numbers n1 and n2 of individuals in the two clades, find the probability that the individuals were sampled from populations that have been evolving independently long enough to become reciprocally monophyletic. This is best done using a table available on request from Noah Rosenberg or me; alternatively, the values can be estimated from Figure 6 in .

For clades that are members of a non-bifurcating tree such as a polytomy (A, B, C) or ladder ((A,B)C), I compare A and B first, then compare C to whichever one of those is closest to C. […]
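
The spreadsheet arithmetic in steps 2–5 can be sketched in Python as a cross-check. This is a minimal illustration, not the original Excel formulae: the function names and the example distances are hypothetical, and the pairwise distances are assumed to have been exported from PAUP* already (with K already corrected for multiple hits). Step 6 is not covered, because it relies on the lookup table or figure mentioned above.

```python
from statistics import mean


def nucleotide_diversity(within_distances, n, seq_length):
    """Step 2: pi = d * n/(n-1), where d is the mean pairwise difference.

    within_distances: pairwise distances among the n sequences of one clade.
    seq_length: alignment length L, used only for the d = 0 case of step 3.
    """
    if n < 2:
        raise ValueError("a clade needs at least two sequences")
    d = mean(within_distances)
    if d == 0:
        # Step 3 fallback: treat one pairwise difference as 1/L instead of 0,
        # which gives pi = 2 / (L * n * (n - 1)).
        return 2.0 / (seq_length * n * (n - 1))
    return d * n / (n - 1)


def theta_from_pi(pi):
    """Step 3: theta = pi / (1 - 4*pi/3)."""
    return pi / (1.0 - 4.0 * pi / 3.0)


def k_theta_ratio(k, theta_1, theta_2):
    """Step 5: divide K by the larger theta for a conservative ratio."""
    return k / max(theta_1, theta_2)


# Hypothetical example values, for illustration only.
clade_a = [0.002, 0.004, 0.002, 0.004, 0.002, 0.004]  # 4 sequences, 6 pairs
clade_b = [0.0, 0.0, 0.0]                              # 3 identical sequences
alignment_length = 600                                 # L, in sites
k_between = 0.05                                       # step 4, from PAUP*

theta_a = theta_from_pi(nucleotide_diversity(clade_a, 4, alignment_length))
theta_b = theta_from_pi(nucleotide_diversity(clade_b, 3, alignment_length))
print("K/theta =", k_theta_ratio(k_between, theta_a, theta_b))
```

Taking the larger θ in `k_theta_ratio` mirrors the conservative choice described in step 5; the resulting ratio, together with n1 and n2, is what would be looked up in step 6.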

Pipeline specifications

Software tools: MacClade, Dendroscope, ModelTest-NG
Application: Phylogenetics
Organisms: Panthera pardus, Marchantia polymorpha, Homo sapiens