Protocol publication

[…] Sequences were edited and aligned manually using BioEdit. For the ITS sequences, the electropherograms were carefully inspected for intra-individual polymorphic sites (IPS), i.e. sites showing more than one signal. These were labelled with NC-IUPAC ambiguity codes. Polymorphic positions within ITS sequences at which both bases were also detected separately in different accessions elsewhere in the alignment were considered additive polymorphic sites (APS).

We believe that the observed intra-individual variation did not arise from PCR errors, because numerous accessions of P. hieracioides and of other taxa possessed homogenized ITS sequences. The geographic pattern of homogeneous and heterogeneous ITS sequences was clearly non-random, indicating genuine variation rather than erroneous signals caused by inaccurate amplification. Although intra-individual ITS variation could be resolved by cloning, this approach may in turn amplify numerous ITS sequence types that do not represent relevant variability. Incomplete concerted evolution and/or recombination, processes operating on multiple-copy regions such as ITS, increase the number of unwanted sequence types, making the ITS phylogenetic analysis even more complex. Furthermore, with our current sampling involving hundreds of accessions exhibiting intra-individual ITS polymorphism, cloning would not be a practical solution. Because most of the sequences could be obtained by direct sequencing and polymorphic sites could be designated by IUPAC ambiguity codes, we decided to analyse two ITS datasets separately, one including and one excluding the individuals with APS (see below).

Altogether we generated three alignments. The first one was designated ITS_1 and comprised 117 sequences without APS.
The second one was designated ITS_2 and comprised 216 sequences from all accessions analysed for the ITS region, including all sequences with APS. This second alignment was constructed because intra-individual polymorphic sites, especially APS, may significantly influence the hierarchical structure of the phylogenetic trees. Finally, the third alignment, cpDNA, comprised all plastid sequences (219 accessions).

To detect possible incongruence between the markers, we employed the incongruence length difference (ILD) test, as implemented in the “partition homogeneity” test of PAUP* (1000 replicates, Mulpars off, outgroups included). Although the topologies of the ITS and cpDNA trees were shown to be incongruent, statistical support, especially at their basal nodes, was comparatively poor, leaving their main branching patterns unsupported. We therefore decided to take advantage of the synergistic effect of combining datasets and also generated a concatenated alignment. This alignment, ITS_cpDNA, contained the 117 ITS sequences without APS and the corresponding cpDNA sequences.

Indels were not coded separately but treated as missing data. The aligned datasets were analysed independently using the following phylogenetic approaches:

1. Maximum parsimony (MP) phylogenetic analysis was performed with the heuristic search option in PAUP* version 4.0b10. The following settings were used: accelerated character transformation (ACCTRAN), gaps treated as missing data, single-site polymorphisms treated as uncertainties, tree construction with stepwise addition, 1000 bootstrap replicates with random taxon addition, tree bisection-reconnection (TBR) branch swapping, and retention of multiple trees found during branch swapping (MULTREES option in effect). Identical sequences were merged in MacClade version 4.0 PPC to reduce computation time.
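
The merging of identical sequences mentioned above can be illustrated with a minimal Python sketch. It is not MacClade's procedure: it assumes a hypothetical input format (a dictionary mapping accession names to aligned sequences) and treats gaps and ambiguity codes as literal characters, so two sequences are merged only if they match character for character.

```python
from collections import defaultdict


def collapse_identical(alignment):
    """Group accessions whose aligned sequences are exactly identical.

    `alignment` is a hypothetical input: a dict mapping accession name
    to aligned sequence. Returns a list of (sequence, [accession names])
    pairs in the order in which each distinct sequence was first seen.
    """
    groups = defaultdict(list)
    for name, seq in alignment.items():
        # Compare case-insensitively; gaps and ambiguity codes are kept literal.
        groups[seq.upper()].append(name)
    return list(groups.items())


# Toy data, invented purely for illustration.
example = {
    "acc1": "ACGT-A",
    "acc2": "ACGTTA",
    "acc3": "acgt-a",   # same as acc1 apart from letter case
    "acc4": "ACGYTA",   # Y (C or T) is not merged with T in acc2
}
collapsed = collapse_identical(example)
for seq, names in collapsed:
    print(seq, names)
```

Keeping the list of accession names for each retained sequence makes it possible to map every tip of the resulting tree back to all accessions it represents.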
Clade support was calculated via bootstrap analyses using 10000 re-samplings done with the fast heuristic search in PAUP*. Bootstrap support was categorized according to the following criteria: strong (>85%), moderate (70%–85%), weak (50%–69%), or poor (<50%).

2. Bayesian inference (BI) was run in MrBayes version 3.1.2, using the Markov chain Monte Carlo (MCMC) algorithm. Bayesian analyses were performed on the CIPRES Portal version 1.15. Prior to the Bayesian analyses, the most appropriate nucleotide substitution models were chosen using the Akaike Information Criterion (AIC) in jMODELTEST version 0.0.1. Evolutionary models were calculated for each part of the datasets separately. Specifically, the ITS datasets contained partitions corresponding to the ITS1 and ITS2 spacers and the 5.8S rRNA gene, while the concatenated dataset included, besides the three ITS region partitions, a fourth partition represented by the cpDNA region. The following models or model combinations were found to be the most appropriate for the datasets studied: (1) ITS_1 and ITS_2 datasets: the SYM + G model for the ITS1 and ITS2 sequences and the K80 model for the 5.8S rRNA gene sequences; (2) cpDNA dataset: the TVM + I + G model; (3) concatenated ITS_cpDNA dataset: the SYM + G model for ITS1 and ITS2, the K80 model for the 5.8S rRNA gene sequences, and the TVM + I + G model for the cpDNA partition.

All BI analyses were run with four independent Metropolis-coupled MCMC chains (three heated and one cold chain) for 10 to 25 million generations and sampled every 1000th generation. The first 25% of sampled trees were regarded as ‘burn-in’ trees and were discarded prior to reconstruction of a 50% majority-rule consensus tree. Stationarity was confirmed by checking convergence diagnostic parameters.
Specifically, the average standard deviation of split frequencies was lower than 0.01 in all cases; the plots of generations versus log probability of the observed data showed no obvious trends; and the Potential Scale Reduction Factor (PSRF) approached 1. Finally, topologies and node posterior probability values were compared among the runs. The topologies were stable for all datasets, with only minute differences in the branching pattern of terminal clades. Nodes with posterior probability (PP) values of 0.90 and above were regarded as significant, and those with PP values below 0.90 as non-significant.

3. Net-like approaches were used to identify and display potentially contradictory signals in the datasets. All alignments, including ITS_2 with its intra-individual polymorphic sites, were analysed using the neighbour-net analysis in SplitsTree version 4.10 with uncorrected P-distance and default settings. To visualize the relationships among cpDNA haplotypes and to detect possible ancestral polymorphism, the cpDNA dataset was subjected to haplotype network analysis based on the parsimony method, using TCS version 1.21 with parsimonious connections limited to 30 steps when creating the network.

4. To specify precisely the amount of phylogenetically informative signal in the datasets, we analysed the spectrum of supporting nucleotide positions.
There are three groups of supporting positions recognized by Wägele and Rödding: (1) symmetrical or binary positions have two different character states in the functional outgroup and ingroup and thus support both groups of a split equally; (2) asymmetrical positions support only one group, which possesses the same nucleotide at a particular position, while the other group harbours different and more than one character state at this position; (3) noisy positions have the same character state in all sequences of the functional ingroup but also in at least one sequence of the functional outgroup, and thus represent convergences or chance similarities between ingroup and outgroup, or alternatively ingroup autapomorphies.

5. We also applied a Bayesian coalescent-based approach to estimate a species tree, employing *BEAST as implemented in the program BEAST version 1.7.4. Two input files, one with the 117 homogeneous ITS sequences (ITS_1) and the second containing the corresponding cpDNA sequences, were used for the BEAST analysis. Populations assigned to particular taxa were used as OTUs. The input file for *BEAST was created in BEAUti version 1.7.4 with the following settings: two data partitions (corresponding to the two loci), the best-fit evolutionary model for each partition as determined by jMODELTEST, an uncorrelated lognormal clock, a Yule process model for the species tree prior, and other parameters as default. Four independent MCMC analyses were each run for 120 million generations, sampling every 1000th generation. Another MCMC analysis was run with settings suitable for a subsequent JML analysis (see below), i.e. with a piecewise constant population size model and 40 million generations, sampling every 1000th generation.
The computer program Tracer version 1.5 was used to check convergence of all parameters to the stationary distribution in each run, and TreeAnnotator version 1.7.4 was employed to set the burn-in (discarding the first 30000 trees) and to calculate the maximum clade credibility tree.

6. We performed statistical tree topology tests on ML gene trees inferred from the ITS_1, cpDNA and ITS_cpDNA alignments to find out whether the discrepancies between the topologies of particular gene trees and the coalescent species tree are statistically significant. To this end, following the topology of the coalescent species tree, we enforced the following constraints on the gene trees: a basal position of P. olympica within the subsection Hieracioides; a sister relationship of P. nuristanica and P. japonica; and monophyly of P. hispidissima and P. hieracioides. Constrained trees were built under the substitution model specified for each alignment above, using the maximum likelihood (ML) criterion and a heuristic search with the TBR swapping algorithm and 10 random sequence addition replicates. The site-wise likelihoods for the best unconstrained ML tree and all constrained trees were calculated in raxmlGUI version 1.3 and then compared using the approximately unbiased, weighted Shimodaira-Hasegawa, and weighted Kishino-Hasegawa tests as implemented in the package CONSEL version 0.1j. A p-value of <0.05 was chosen for rejection of the null hypothesis that the log likelihoods of the constrained and best unconstrained trees are not significantly different.

7. To test whether hybridization influenced species relationships and could be the source of gene tree incongruence, we employed the program JML. This software uses a posterior distribution of species trees, population sizes and branch lengths to simulate replicate sequence datasets under the coalescent with no migration.
The minimum pairwise sequence distance between sequences of two species is evaluated on the simulated datasets and compared to the one estimated from the original data (i.e. from the ITS or cpDNA dataset). This procedure, the posterior predictive test, is a good predictor of hybridization events that disturb the bifurcating species tree model. Two separate JML analyses were run to simulate sequence replicates for the ITS and cpDNA datasets. For these analyses, the 40,000 species trees resulting from the JML-specified *BEAST analysis were used. Settings for the simulations were: (1) relative mutation rate as inferred from the log file generated during the *BEAST analyses (set to 1.016 and 0.385 for the ITS and the cpDNA simulation, respectively); (2) heredity scalar (2 and 1 for the ITS and cpDNA simulation, respectively); and (3) the appropriate model of sequence evolution for each marker. In each analysis, 9,000 trees were removed as burn-in and every 10th tree was used for simulations. Based on the original sequence data files, minimum pairwise sequence distances between all pairs of species were calculated, together with the exact probabilities of observing these distances in simulations under the assumption of no migration. All pairwise sequence distances with a p-value <0.05 were recorded as potential cases of hybridization. […]
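
The arithmetic behind the JML step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not JML's implementation: the function names are hypothetical, the distance is a plain uncorrected p-distance over unambiguous sites, and the p-value is assumed to be the fraction of simulated minimum distances that are less than or equal to the observed one. The toy sequences and simulated distances are invented for the example.

```python
def p_distance(a, b):
    """Uncorrected p-distance over sites where both sequences have A, C, G or T."""
    valid = set("ACGT")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in valid and y in valid:
            compared += 1
            diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def min_pairwise_distance(seqs_a, seqs_b):
    """Smallest distance between any sequence of species A and any of species B."""
    return min(p_distance(a, b) for a in seqs_a for b in seqs_b)


def thinned_indices(n_trees, burnin, step):
    """Indices of posterior trees kept after burn-in removal and thinning."""
    return range(burnin, n_trees, step)


def posterior_predictive_p(observed, simulated):
    """Assumed definition: share of simulated minimum distances <= observed."""
    return sum(d <= observed for d in simulated) / len(simulated)


# Tree counts stated in the text: 40,000 trees, 9,000 burn-in, every 10th tree.
kept = thinned_indices(40000, 9000, 10)
print(len(kept))

# Toy data, invented for illustration only.
species_a = ["ACGTACGTAC"]
species_b = ["ACGTACGTAA", "TCGTACGTTA"]
observed = min_pairwise_distance(species_a, species_b)
simulated = [0.0, 0.1, 0.2, 0.3, 0.4, 0.2, 0.3, 0.5, 0.1, 0.6]
p_value = posterior_predictive_p(observed, simulated)
print(observed, p_value)
```

With the burn-in and thinning stated in the text, 3,100 species trees remain for simulation. A small p-value means the observed minimum distance between two species is shorter than expected under the coalescent without migration, which is the pattern recorded as potential hybridization.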

Pipeline specifications

Software tools: BioEdit, MrBayes, CIPRES Science Gateway, jModelTest, SplitsTree, BEAST, raxmlGUI, CONSEL
Application: Phylogenetics
Organisms: Physiculus japonicus