Computational protocol: Short-wavelength sensitive opsin (SWS1) as a new marker for vertebrate phylogenetics

Protocol publication

[…] Sixty-two vertebrate SWS1 opsin nucleotide and amino acid sequences were retrieved from GenBank; accession numbers for all sequences used in the analyses presented here are provided in Table . SWS1 coding sequences range in length from 1005 (salmonids) to 1056 (pig) nucleotides, with very few indels (only 6 indels in complete coding sequences in the entire alignment; see Table and ). All SWS1 opsin genes identified so far have four introns at highly conserved homologous positions (located at amino acid positions 120, 176, 231, and 311 in the macaque sequence []). The first two introns are generally short, ranging in length from 70–76 bp in fish (Dimidiochromis compressiceps) to 283–324 bp in mammals (Macaca fascicularis), whereas the second two introns tend to be longer (120–143 bp in D. compressiceps, 627–979 bp in M. fascicularis) [,]. Only one copy of SWS1 has been found in all taxa investigated so far, with the exception of the smelt (Plecoglossus altivelis), in which the second copy may be due to a unique duplication specific to this lineage of fish []. Only one smelt sequence was included in our analyses, because analyses including the second sequence showed that it was strongly monophyletic with the first and had no other effect on the phylogeny (results not shown).

Sampling within the vertebrate groups was as follows: one lamprey (Geotria australis); 17 actinopterygians (all of which were teleosts); four lissamphibians (referred to in the text as amphibians); 13 birds; three squamates; and 23 mammals (Table ). The amino acid sequences were aligned using ClustalX []. This amino acid alignment was then used to produce an equivalently aligned nucleotide sequence alignment.

[...] Phylogenetic analyses were performed using PAUP* v4b10 [] for the maximum parsimony (MP) and maximum likelihood (ML) methods, and MrBayes version 3.1 [] for the Bayesian analyses. For the MP analysis, all characters were assigned equal weight.
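The alignment step described above, in which the amino acid alignment is used to produce an equivalently aligned nucleotide alignment, can be sketched as a simple codon-threading routine. This is a minimal illustration, not the authors' tool: the function names `thread_codons` and `back_translate_alignment` are hypothetical, and the sketch assumes each coding sequence is in frame, starts at the first codon, and has at most one trailing stop codon.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Thread an unaligned coding sequence onto its aligned protein sequence.

    Each residue consumes one codon; each protein gap becomes three
    nucleotide gap characters, so codon boundaries are preserved.
    """
    n_residues = len(aligned_protein.replace(gap, ""))
    # Assumption: the CDS may carry one trailing stop codon with no residue.
    if len(cds) not in (3 * n_residues, 3 * n_residues + 3):
        raise ValueError("CDS length does not match the protein length")
    pieces = []
    pos = 0
    for ch in aligned_protein:
        if ch == gap:
            pieces.append(gap * 3)
        else:
            pieces.append(cds[pos:pos + 3])
            pos += 3
    return "".join(pieces)


def back_translate_alignment(protein_aln: dict, cds_seqs: dict) -> dict:
    """Apply thread_codons to every sequence, keyed by sequence name."""
    return {
        name: thread_codons(aligned, cds_seqs[name])
        for name, aligned in protein_aln.items()
    }
```

For example, threading `ATGAAACTG` onto the aligned protein `MK-L` yields `ATGAAA---CTG`, so that the gap introduced at the amino acid level spans exactly one codon in the nucleotide alignment.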
Heuristic searches, with random addition of taxa and TBR branch swapping, were performed with 10000 random-addition sequences. A strict consensus tree was calculated from the equally most parsimonious trees found. To assess support for internal branches, bootstrap analyses [] of 1000 replicates, with 10 random-addition sequences for each replicate, were performed.

ModelTest [] was used to perform a series of nested likelihood ratio tests to determine which of the nucleotide models tested best fit the data. This model was then used in the subsequent model-based phylogenetic analyses (likelihood and Bayesian). Heuristic ML analyses were conducted with TBR branch swapping (10 random-addition replicates), and bootstrap analyses with 100 replicates were used to assess the robustness of the clades recovered [].

The Bayesian analyses were run for two million generations with default priors, sampling the chains every 100 generations. To ensure that our analyses were not trapped in local optima, four independent Markov Chain Monte Carlo (MCMC) runs were performed (with default heating values). Stationarity was assumed when the cumulative posterior probabilities of all clades stabilized. The first 5000 trees were considered 'burn-in' and discarded, and the remaining trees were saved. The associated Bayesian posterior probabilities were calculated from the sample points after the MCMC algorithm started to converge.

[...] Parameters such as base frequencies, substitution rate frequencies, among-site rate variation (α), and invariant sites (I) were all estimated on the ML phylogeny using maximum likelihood methods under the GTR+I+Γ model [-] as implemented in PAUP*. Chi-squared tests of base compositional homogeneity were also performed in PAUP* [].
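To illustrate what a chi-squared test of base compositional homogeneity computes, the sketch below builds a taxa-by-base contingency table from an alignment and returns the Pearson chi-squared statistic with its degrees of freedom. It is an assumption-laden illustration rather than a reimplementation of PAUP*: the function name `base_composition_chi2` is hypothetical, gaps and ambiguity codes are simply ignored, and details such as whether constant sites are excluded may differ from the original analysis.

```python
from collections import Counter

BASES = "ACGT"


def base_composition_chi2(seqs: dict) -> tuple:
    """Return (chi2, df) for homogeneity of base composition across taxa.

    seqs maps taxon name -> nucleotide sequence. Characters other than
    A, C, G, T (gaps, ambiguity codes) are ignored in this sketch.
    """
    table = []
    for seq in seqs.values():
        counts = Counter(seq.upper())
        table.append([counts[b] for b in BASES])

    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    if grand_total == 0:
        raise ValueError("no A/C/G/T characters found")

    chi2 = 0.0
    for row, r_tot in zip(table, row_totals):
        for observed, c_tot in zip(row, col_totals):
            expected = r_tot * c_tot / grand_total
            if expected > 0:  # skip bases or taxa with no observations
                chi2 += (observed - expected) ** 2 / expected

    used_taxa = sum(1 for r in row_totals if r > 0)
    used_bases = sum(1 for c in col_totals if c > 0)
    df = (used_taxa - 1) * (used_bases - 1)
    return chi2, df
```

With all four bases present, the degrees of freedom are 3 × (number of taxa − 1). A p-value can then be obtained from the chi-squared distribution, for instance with `scipy.stats.chi2.sf(chi2, df)` if SciPy is available.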
Because estimates of invariant sites (I) can be problematic, particularly in reduced data partitions with insufficient data [], the number of invariant sites was also calculated by a simple count of the observed constant sites in our data set, as implemented in MEGA3 []. […]
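The simple count of observed constant sites mentioned above can be sketched as follows. This is a hedged illustration only: the function name `count_constant_sites` is hypothetical, and the gap handling (skipping any column that contains a gap or missing-data character, similar in spirit to a complete-deletion option) is an assumption, not a documented description of what MEGA3 did in the original analysis.

```python
def count_constant_sites(alignment: dict, missing: str = "-?N") -> tuple:
    """Count alignment columns in which every sequence has the same state.

    alignment maps taxon name -> aligned sequence (all of equal length).
    Columns containing any character in `missing` are skipped (assumption).
    Returns (number of constant columns, number of columns compared).
    """
    seqs = [s.upper() for s in alignment.values()]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")

    skip = set(missing.upper())
    constant = 0
    compared = 0
    for column in zip(*seqs):
        states = set(column)
        if states & skip:
            continue  # column has a gap or missing-data character
        compared += 1
        if len(states) == 1:
            constant += 1
    return constant, compared
```

Dividing the first returned value by the second gives the observed proportion of constant sites, which is an upper bound on the proportion of truly invariant sites, since some sites that are free to vary will not have changed in the sampled taxa.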

Pipeline specifications

Software tools Clustal W, MrBayes, ModelTest-NG, PAUP*, MEGA
Application Phylogenetics