Computational protocol: Embracing discordance: Phylogenomic analyses provide evidence for allopolyploidy leading to cryptic diversity in a Mediterranean Campanula (Campanulaceae) clade

Protocol publication

[…] Quality filtering of Illumina reads was carried out using cutadapt (Martin ) and sickle (Joshi ) to remove adapter sequences and trim low‐quality nucleotides. Default parameters were used. The HybPiper pipeline (v.1.0; Johnson et al. ) was then used to assemble loci. This pipeline uses BWA (Li and Durbin ) to align reads to target sequences and SPAdes (Bankevich et al. ) to assemble these reads into contigs. If multiple contigs that contained sequences representing at least 75% of the original bait length were found, these were flagged as potential paralogs and all copies were removed from downstream analyses. A further filtering step was conducted by manual inspection of gene trees (see Phylogenetic analysis below) to remove paralogous loci that may have been missed by the first filtering step.

Consensus contigs were aligned to the original probe sequences. The resulting loci were not trimmed to the original probe length, however. This allowed the sequences to extend into putative intronic regions. After quality filtering and removal of potential paralogous loci, 130 loci contained orthologous sequences for all 109 sampled taxa (no missing data).

Plastomes were assembled in a similar way, using Trachelium caeruleum (Haberle et al. ) as a reference. Aligned contigs were trimmed to the plastome reference length.

[...] Individual gene and plastome alignments were constructed using MAFFT (v.7.245; Katoh et al. , ). Plastomes were considered as a single locus for all subsequent analyses. We estimated individual nuclear gene trees as well as a concatenated phylogeny using maximum likelihood (ML) with the program RAxML (v.7.3.2; Stamatakis ). The ML searches were run using 10 distinct starting trees and 1000 bootstrap replicates to measure support. PartitionFinder (v.2.0.0; Lanfear et al. ) was used to infer the optimal partitioning schemes and models of molecular evolution for the alignments using the rcluster search option.

Initial results indicated a number of samples with inconsistent and contradictory phylogenetic placement (referred to as rogue taxa) present in the nuclear dataset. We used RogueNaRok (Aberer et al. ) to identify such OTUs. This analysis, optimized for support using a majority rule consensus threshold, identified four individuals of C. erinus as rogue samples. These accessions were removed from all datasets and the ML analyses were rerun as above.

[...] The relatively young age of the C. erinus complex suggests lineage sorting has the potential to confound results from concatenation approaches (Crowl et al. ). We, therefore, utilized recently developed coalescent methods to estimate a species tree for the clade in this study.

ASTRAL‐II (v.4.10.0; Mirarab and Warnow ), which estimates the species tree that maximizes the number of shared quartet trees given a set of gene trees, has been found to be consistent and accurate in simulations compared to alternative coalescent approaches (Mirarab et al. ). The 130 ML gene trees (best trees) inferred using RAxML were used as input, and local posterior probabilities were estimated to provide support for relationships. With respect to individuals being assigned to “species” in the allele table, two approaches were taken: (1) A population tree was estimated by assigning individuals to separate populations; (2) C. erinus individuals were assigned to eastern‐Mediterranean (octoploid) and western‐Mediterranean (tetraploid) lineages while C. drabifolia and C. creutzburgii populations were kept separate, as suggested by the ML analyses.

Additionally, we used SVDquartets (Chifman and Kubatko ) implemented in PAUP* (v.4.0a147; Swofford ) to verify results generated by ASTRAL‐II. A coalescent approach originally intended for SNP data, SVDquartets has been shown to perform well on multilocus datasets despite violating the assumption that sites are independent (Chifman and Kubatko ). We used the concatenated nuclear data matrix as input, evaluated 100,000 random quartets, and assessed support using 100 bootstrap replicates.

[...] Though current species‐tree methods assume no migration between populations (Heled and Drummond ; Bryant et al. ), concordance analyses can still recover primary phylogenetic signal in the presence of gene flow (Larget et al. ). We, therefore, summarized topological concordance among loci using BUCKy (v.1.4.4; Baum ; Ané et al. ; Larget et al. ). Due to computational constraints, it was necessary to reduce our molecular dataset to the eight major lineages recovered in previous analyses. We chose individuals at random to represent the eight lineages. A second iteration of this, carried out using a different random sampling of individuals, verified the results were not affected by which representative samples were present in the dataset. Individual gene trees for this analysis were estimated using MrBayes (v.3.2; Ronquist et al. ). To test the impact of the discordance parameter (alpha), independent analyses were run using alpha = 1, alpha = 10, and alpha = 1000. All analyses were run with four Markov chain Monte Carlo (MCMC) chains for 1 million generations. Burn‐in was set to 10%. […]
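The paralog-flagging rule from the assembly step (more than one contig covering at least 75% of the bait length) can be illustrated with a minimal Python sketch. This is not HybPiper's own code: the function name `flag_paralogs`, the locus names, and the toy lengths are hypothetical, and contig length is used here as a simple stand-in for the length of sequence matching the bait.

```python
from typing import Dict, List


def flag_paralogs(contig_lengths: Dict[str, List[int]],
                  bait_lengths: Dict[str, int],
                  min_fraction: float = 0.75) -> List[str]:
    """Return loci with more than one contig covering >= min_fraction of the bait.

    Assumption: contig length approximates the amount of bait-matching sequence.
    """
    flagged = []
    for locus, lengths in contig_lengths.items():
        threshold = min_fraction * bait_lengths[locus]
        long_contigs = [n for n in lengths if n >= threshold]
        if len(long_contigs) > 1:
            flagged.append(locus)
    return sorted(flagged)


# Hypothetical toy data: contig lengths per locus and the bait length of each locus.
contigs = {"locus_A": [980, 450], "locus_B": [800, 790], "locus_C": [300]}
baits = {"locus_A": 1000, "locus_B": 1000, "locus_C": 400}

paralogs = flag_paralogs(contigs, baits)
# All copies of a flagged locus are dropped from downstream analyses.
kept = [locus for locus in contigs if locus not in paralogs]

print(paralogs)  # ['locus_B']
print(kept)      # ['locus_A', 'locus_C']
```

Only `locus_B` is flagged, because it is the only locus with two contigs at or above the 750 bp threshold. The manual gene-tree inspection described in the protocol is a separate step that this sketch does not cover.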
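The two individual-to-"species" assignment schemes used for ASTRAL‐II can be written out as a mapping file. The sketch below assumes the one-line-per-species `species:individual_1,individual_2` layout described in the ASTRAL documentation; the sample identifiers and group names are hypothetical placeholders, not the accessions used in the study.

```python
from collections import defaultdict
from typing import Dict


def build_mapping(assignments: Dict[str, str]) -> str:
    """Format an individual -> group assignment as 'group:ind1,ind2' lines."""
    groups = defaultdict(list)
    for individual, group in assignments.items():
        groups[group].append(individual)
    lines = [f"{group}:{','.join(sorted(members))}"
             for group, members in sorted(groups.items())]
    return "\n".join(lines) + "\n"


# Hypothetical assignment following approach (2): C. erinus individuals grouped
# into eastern and western lineages, other populations kept separate.
assignments = {
    "erinus_E1": "erinus_east",
    "erinus_E2": "erinus_east",
    "erinus_W1": "erinus_west",
    "drabifolia_1": "drabifolia_pop1",
}

mapping_text = build_mapping(assignments)
print(mapping_text, end="")
```

For approach (1), the same function would be called with each individual assigned to its own population label instead of a lineage label.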
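The representative sampling for the BUCKy analysis (one randomly chosen individual per lineage, repeated with a second random draw) could be sketched as follows. The lineage and sample names are hypothetical, and the seeds are arbitrary; the protocol does not state how the random choice was made.

```python
import random
from typing import Dict, List


def pick_representatives(lineages: Dict[str, List[str]], seed: int) -> Dict[str, str]:
    """Choose one individual per lineage, reproducibly for a given seed."""
    rng = random.Random(seed)
    return {name: rng.choice(sorted(members))
            for name, members in sorted(lineages.items())}


# Hypothetical data: eight lineages with three sampled individuals each.
lineages = {f"lineage_{i}": [f"L{i}_sample_{j}" for j in range(1, 4)]
            for i in range(1, 9)}

first_draw = pick_representatives(lineages, seed=1)
second_draw = pick_representatives(lineages, seed=2)  # independent second iteration

for name in sorted(lineages):
    print(name, first_draw[name], second_draw[name])
```

Running the downstream concordance analysis on both draws and comparing the results mirrors the check described above, namely that the choice of representatives does not drive the outcome.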

Pipeline specifications

Software tools cutadapt, BWA, SPAdes, MAFFT, PartitionFinder, ASTRAL, SVDquartets, BUCKy, MrBayes
Applications Phylogenetics, Population genetic analysis
Organisms Campanula erinus