Computational protocol: Influence of Introgression and Geological Processes on Phylogenetic Relationships of Western North American Mountain Suckers (Pantosteus, Catostomidae)

Protocol publication

[…] Sequences were edited using Chromas Lite 2.0 (Technelysium, Tewantin, Queensland, Australia) and imported into BioEdit 7.0.5.2. Protein-coding sequences were aligned by eye and translated to amino acids in MEGA 4.0.2 to check for unexpected frame shifts or stop codons. Editing produced an alignment of 8055 base pairs: the complete sequences of the nine genes plus 22 bp of COIII. Maximum likelihood (ML) analyses were performed with RAxML 7.2.8 on the CIPRES cluster at the San Diego Supercomputer Center, using 1000 bootstrap replicates and calculating the final best ML tree under the GTRGAMMA model. Maximum parsimony (MP) analysis was conducted in PAUP* 4.0b10 using a heuristic search with 1000 random-addition replicates and TBR branch swapping. Tree lengths reported for MP include both parsimony-informative and uninformative characters. Node support for MP was estimated from 1000 bootstrap replicates, each using a heuristic search with 10 random additions and TBR branch swapping. The ML tree from the DNA analysis presented in this study was deposited in TreeBASE, accession number TB2:S15337 (http://purl.org/phylo/treebase/phylows/study/TB2:S15337). Average between-species genetic distances were calculated in MEGA as uncorrected p-distances (the proportion of nucleotide sites at which two sequences differ) for each lineage within Pantosteus, for Catostomus as a whole, and for the remaining outgroups as a whole. [...] Eighty-seven morphological characters and eight fossil occurrences were taken from . Calibrating the rate of evolution requires assigning fossils to specific branches on the basis of synapomorphies shared by the fossils and recent taxa. This is achieved by cladistic interpretation of the apomorphic characters of each fossil; the fossils are then represented as terminal taxa on specific branches.
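The translation check described above can be sketched in Python. This is a minimal illustration, not MEGA's procedure: it assumes the vertebrate mitochondrial genetic code (NCBI translation table 2, chosen because the genes are mtDNA), an alignment whose first column is a first codon position, and gaps coded as `-`. The function name `check_coding_row` is hypothetical.

```python
import re
from itertools import product

# Vertebrate mitochondrial code (NCBI table 2), assumed because the genes are mtDNA.
# Codons are enumerated with bases in TCAG order, first position varying slowest.
BASES = "TCAG"
VERT_MITO_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), VERT_MITO_AA)}


def check_coding_row(row: str) -> list[str]:
    """Flag likely frame shifts and internal stop codons in one aligned coding sequence."""
    problems = []
    # Internal gap runs whose length is not a multiple of 3 shift the reading frame.
    # Leading and trailing gaps (missing sequence ends) are ignored.
    for run in re.findall(r"-+", row.strip("-")):
        if len(run) % 3:
            problems.append(f"gap run of length {len(run)} breaks the reading frame")
    # Translate complete codons in alignment coordinates; codons containing
    # gaps or ambiguity codes are absent from the table and are skipped.
    codons = [row[i:i + 3].upper() for i in range(0, len(row) - 2, 3)]
    # The final codon may legitimately be a stop, so only earlier codons are checked.
    for number, codon in enumerate(codons[:-1], start=1):
        if CODON_TABLE.get(codon) == "*":
            problems.append(f"internal stop codon {codon} at codon {number}")
    return problems


# Hypothetical toy sequence: AGA codes for arginine in the standard code
# but is a stop codon in the vertebrate mitochondrial code.
print(check_coding_row("ATGAGAAAATAA"))  # prints ['internal stop codon AGA at codon 2']
```

The choice of genetic code matters here: under the standard code the same toy sequence would pass, which is why the table is fixed explicitly rather than left implicit.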
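The averaged p-distances described above can be illustrated with a short sketch. The source does not say how gaps or ambiguous bases were treated; skipping any site that is missing in either sequence (pairwise deletion) is an assumption here, as are the function names and the toy sequences.

```python
from itertools import combinations
from statistics import mean

MISSING = set("-?N")  # gaps and unknown bases, excluded pairwise (an assumption)


def p_distance(a: str, b: str) -> float:
    """Proportion of compared sites at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in MISSING or y in MISSING:
            continue  # pairwise deletion: skip sites missing in either sequence
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def mean_between_species(group: dict[str, str]) -> float:
    """Average p-distance over all species pairs in a group (one sequence per species)."""
    return mean(p_distance(group[s1], group[s2]) for s1, s2 in combinations(sorted(group), 2))


# Hypothetical three-species group with toy sequences.
toy = {"sp1": "ACGTACGT", "sp2": "ACGTACGA", "sp3": "ACGAACGA"}
print(round(mean_between_species(toy), 4))  # prints 0.1667
```

A group must contain at least two species for the average to be defined; with a single species `statistics.mean` raises an error on the empty set of pairs.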
Confidence limits on estimated times of cladistic branching have two primary sources of error: (1) fossils are unlikely to sample a lineage soon after its divergence from its sister lineage, and (2) sister lineages do not necessarily acquire their new apomorphies immediately after their initial genetic isolation. Ideally, fossils of both sister lineages would contribute to estimating the age of a node, but both are rarely available in the record. For these reasons it is not obvious which end of the stem branch of a crown group should be assigned the age represented by the fossil. We chose to correct the age of the oldest fossil bearing synapomorphies of the lineage in question with the method of , which uses the density of the fossil record of that lineage to estimate the age of origin of its stem branch. The number of million-year time horizons, or independent localities, in which fossils of the lineage occur is entered into an equation that estimates the probability that the oldest fossil represents the first (unobserved) appearance of the lineage:

$$\alpha = (1 - C_1)^{-1/(H-1)} - 1$$

where $\alpha$ is the confidence interval expressed as a fraction of the total known stratigraphic range, $C_1$ is the confidence level (50% or 95%), and $H$ is the number of known fossil horizons. The 50% interval was used as the mid-point of the distribution when calculating the node calibration points, which were then entered into the BEAST analyses. The ages of the oldest fossils, their identities, apomorphies, calibration points, locations, number of horizons, stratigraphy, and catalog numbers are given below, following best practice recommendations by . An important assumption is that the fish of interest could have lived in the depositional environments below the oldest fossil, so that the absence of fossils possessing the synapomorphies of a specific lineage can be interpreted as absence of that lineage at that time.
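The correction can be illustrated with the classical confidence interval on the base of a stratigraphic range, $\alpha = (1 - C_1)^{-1/(H-1)} - 1$, which assumes that fossil horizons are randomly distributed through the lineage's true range. The sketch below computes $\alpha$ from the confidence level and the number of horizons, then extends the oldest fossil age by $\alpha$ times the known range. The function names are hypothetical, and the example ages are invented for illustration; they are not the study's fossils.

```python
def ci_fraction(confidence: float, horizons: int) -> float:
    """alpha = (1 - C1) ** (-1 / (H - 1)) - 1, as a fraction of the known stratigraphic range."""
    if horizons < 2:
        raise ValueError("at least two fossil horizons are required")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence level must lie strictly between 0 and 1")
    return (1.0 - confidence) ** (-1.0 / (horizons - 1)) - 1.0


def corrected_origin(oldest_ma: float, youngest_ma: float, confidence: float, horizons: int) -> float:
    """Extend the oldest fossil age (Ma) further back by alpha times the known range."""
    known_range = oldest_ma - youngest_ma
    return oldest_ma + ci_fraction(confidence, horizons) * known_range


# Hypothetical lineage: 8 fossil horizons spanning 10.0 to 2.0 Ma.
for level in (0.5, 0.95):
    print(level, round(corrected_origin(10.0, 2.0, level, 8), 2))
```

Two properties are worth noting: the 95% level pushes the estimated origin much further back than the 50% level, and adding horizons shrinks $\alpha$, so a densely sampled lineage receives only a small correction.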
The probability that the earliest fossil occurrence represents the first occurrence following cladogenesis is then used to correct the age estimate. The corrected age is applied to the node joining the taxa whose shared morphological synapomorphies support their sister-group relationship (). BEAST 1.7.1 was used to estimate divergence times of mtDNA lineages based on the corrected fossil ages. Input files were generated with BEAUti 1.7.1. The dataset was trimmed to a single representative per species or lineage, because mixing within- and between-species data complicates dating: within- and between-species rates are estimated under different processes (S. Ho, pers. comm.). The analysis used an uncorrelated lognormal relaxed molecular clock, a birth-death speciation tree prior, and a GTR+I+G substitution model (identified using the AIC in Modeltest 3.7, ). The topology was constrained to match the ML tree. Calibrations were based on eight time horizons, with ages of occurrence estimated for two nodes, with two date estimates in each, based on . These two dates, 5.5 Ma and 12.7 Ma, were each given a lognormal prior with a standard deviation of 1. BEAST analyses were run for 50 million generations, with parameters logged every 10,000 generations. Multiple runs were conducted to check for stationarity and to confirm that independent runs converged on similar results. The log and tree files from four runs were combined in LogCombiner 1.7.1 with a 10% burn-in. The combined log file was examined in Tracer 1.5, and the combined tree file was summarized with TreeAnnotator 1.7.1, with mean values placed on the maximum clade credibility tree. The diagnoses of recent and fossil forms, with specimen disposition, for taxa in the morphological phylogenetic analysis are detailed in .
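Combining runs after a burn-in can be sketched as follows. This illustrates the idea only and is not LogCombiner's implementation: it assumes each run is a list of (state, value) pairs for one logged parameter and that the burn-in is a fraction of the final MCMC state. The function names and the synthetic runs are hypothetical. Comparing per-run means is a crude convergence check; Tracer provides fuller diagnostics such as effective sample size.

```python
from statistics import mean


def post_burnin(run, burnin_fraction=0.1):
    """Keep samples whose MCMC state is at or beyond burnin_fraction of the final state.

    run: list of (state, value) pairs in sampling order.
    """
    cutoff = burnin_fraction * run[-1][0]
    return [value for state, value in run if state >= cutoff]


def combine_runs(runs, burnin_fraction=0.1):
    """Pool post-burn-in samples from several runs and report each run's mean."""
    kept = [post_burnin(run, burnin_fraction) for run in runs]
    pooled = [value for samples in kept for value in samples]
    run_means = [mean(samples) for samples in kept]
    return pooled, run_means


# Synthetic runs following the logging scheme above (50 million generations,
# sampled every 10,000). Each run is a different constant, so the per-run means
# disagree visibly; converged runs would give similar means.
runs = [[(state, float(i)) for state in range(0, 50_000_001, 10_000)] for i in range(4)]
pooled, run_means = combine_runs(runs)
print(len(pooled), run_means)  # prints 18004 [0.0, 1.0, 2.0, 3.0]
```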
Identities, localities, and curatorial data for 347 western catostomin skeletons can be accessed at the University of Michigan Museum of Zoology Fish Division website. The morphological tree was estimated with PAUP* using 87 osteological and other morphological traits, some of them newly defined, which are listed in matrix format in . Inferred instances of introgression are plotted on the tree, based on the observed scatter of incongruent morphological and molecular character states. Alternative hypotheses (such as convergence and retained ancestral polymorphisms) were examined by contrasting the molecular and morphological trees. High bootstrap support values were used to identify strongly supported nodes in each analysis, and conflict between such nodes on the two trees allowed alternative hypotheses to be rejected. […]
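Tree length under parsimony, counting both informative and uninformative characters as in the MP analyses, can be sketched with Fitch's algorithm for unordered characters. Assumptions: a rooted, fully bifurcating tree written as nested tuples, one state per taxon per character with no missing data or polymorphisms, and equal character weights; PAUP* handles these cases more generally. The function names, tree, and matrix are hypothetical.

```python
from collections import Counter


def fitch_steps(tree, states):
    """Return (possible ancestral states, number of changes) for one character.

    tree: rooted binary tree as nested 2-tuples of taxon names.
    states: mapping from taxon name to its character state.
    """
    if isinstance(tree, str):  # a terminal taxon
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_steps(left, states)
    right_set, right_steps = fitch_steps(right, states)
    shared = left_set & right_set
    if shared:  # the children can agree, so no extra change is needed
        return shared, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1  # one extra change


def is_informative(column):
    """Parsimony-informative: at least two states each occur in two or more taxa."""
    return sum(1 for count in Counter(column).values() if count >= 2) >= 2


def tree_length(tree, matrix, informative_only=False):
    """Sum Fitch steps over characters; matrix maps taxon name -> string of character states."""
    taxa = list(matrix)
    n_chars = len(matrix[taxa[0]])
    total = 0
    for i in range(n_chars):
        column = {taxon: matrix[taxon][i] for taxon in taxa}
        if informative_only and not is_informative(column.values()):
            continue
        total += fitch_steps(tree, column)[1]
    return total


# Hypothetical four-taxon tree and three-character matrix.
tree = (("A", "B"), ("C", "D"))
matrix = {"A": "001", "B": "002", "C": "110", "D": "100"}
print(tree_length(tree, matrix), tree_length(tree, matrix, informative_only=True))  # prints 4 1
```

The gap between the two totals shows why it matters that reported MP tree lengths include uninformative characters: autapomorphies add steps to every tree equally without affecting which tree is shortest.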
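The comparison of strongly supported nodes between the two trees can be sketched as follows. Bootstrap support is taken as the fraction of replicate trees containing a clade, and two clades conflict when they share taxa but neither contains the other. This only identifies conflicting nodes; weighing introgression against convergence or retained polymorphism remains an interpretive step. The 0.7 support threshold, the function names, and the replicate trees are hypothetical.

```python
from collections import Counter


def clades(tree):
    """Non-trivial clades (frozensets of taxa) of a rooted tree written as nested tuples."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members

    found.discard(walk(tree))  # the root clade contains every taxon and is uninformative
    return found


def clade_support(trees):
    """Fraction of trees (e.g. bootstrap replicates) containing each clade."""
    counts = Counter(clade for tree in trees for clade in clades(tree))
    return {clade: n / len(trees) for clade, n in counts.items()}


def incompatible(a, b):
    """Two clades conflict when they overlap but neither contains the other."""
    return bool(a & b) and not (a <= b or b <= a)


def strong_conflicts(support_a, support_b, threshold=0.7):
    """Pairs of clades, each at or above threshold in its own analysis, that conflict."""
    strong_a = [c for c, s in support_a.items() if s >= threshold]
    strong_b = [c for c, s in support_b.items() if s >= threshold]
    return [(a, b) for a in strong_a for b in strong_b if incompatible(a, b)]


# Hypothetical replicate trees from two analyses of the same four taxa.
molecular = [((("A", "B"), "C"), "D")] * 3 + [((("A", "C"), "B"), "D")]
morphological = [((("B", "C"), "A"), "D")] * 4
print(strong_conflicts(clade_support(molecular), clade_support(morphological)))
```

Here the molecular clade (A, B) at 75% support conflicts with the morphological clade (B, C) at 100%, while the shared clade (A, B, C) is compatible with both.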

Pipeline specifications

Software tools BioEdit, MEGA, RAxML, PhyloWS, BEAST, Modeltest, PAUP*
Databases TreeBASE
Application Phylogenetics