Computational protocol: Phylogenetic utility of ribosomal genes for reconstructing the phylogeny of five Chinese satyrine tribes (Lepidoptera, Nymphalidae)

Protocol publication

[…] Sequence chromatograms were checked carefully using Chromas Pro software (Technelysium Pty Ltd., Tewantin, Australia). Each protein-coding sequence was translated to confirm and assign codon positions in Primer Premier version 5.00 (Premier Biosoft International, Palo Alto, CA). Sequences were aligned using MAFFT version 7.037 with the auto strategy and, where necessary, adjusted manually in MEGA version 6.06. Base frequencies and the numbers of variable and parsimony-informative sites were also calculated in MEGA version 6.06. We tested the homogeneity of base frequencies across taxa for each gene with a chi-square test in PAUP* 4.0b10. The ambiguously aligned regions of the two non-protein-coding ribosomal genes (16S rDNA and 28S rDNA) were retained because these positions might contain information that is useful for phylogenetic reconstruction. We tested for substitutional saturation in each data partition using the Iss (index of substitutional saturation) statistic in DAMBE version 5.3.74. In this test, if Iss is smaller than Iss.c (the critical Iss), we can infer that the sequences have experienced little substitutional saturation.

Maximum likelihood (ML) analysis was performed using the raxmlGUI version 1.3 interface of RAxML version 7.2.6. The best-fit substitution model for each gene partition was determined by jModelTest version 2.1.4 under the Akaike Information Criterion (AIC). Clade support was assessed using the ML + rapid bootstrap algorithm with 1000 bootstrap replicates.

Bayesian inference (BI) analyses were conducted in MrBayes 3.1.2. The best-fit partitioning scheme and partition-specific substitution models were selected from 16 subsets, defined by gene and codon position of the six genes used, with the ‘greedy’ algorithm of PartitionFinder v1.1 under the Bayesian information criterion (BIC).
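The two information criteria named above (AIC for the per-gene models, BIC for the partitioning scheme) can be illustrated with a minimal sketch. This is not the internal code of jModelTest or PartitionFinder. The function names, the candidate models, and all log-likelihood and parameter values below are hypothetical and chosen only to show how the scores are compared.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 lnL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 lnL (lower is better)."""
    return n_params * math.log(n_sites) - 2 * log_likelihood


def best_model(candidates: dict, n_sites: int, criterion: str = "bic") -> str:
    """Return the name of the candidate with the lowest score.

    `candidates` maps a model name to (log-likelihood, number of free parameters).
    """
    def score(item):
        _, (lnl, k) = item
        return aic(lnl, k) if criterion == "aic" else bic(lnl, k, n_sites)

    return min(candidates.items(), key=score)[0]


# Hypothetical values for illustration only (not taken from the study).
candidates = {
    "HKY": (-5000.0, 5),
    "GTR+G": (-4990.0, 10),
}
print(best_model(candidates, n_sites=1000, criterion="aic"))
print(best_model(candidates, n_sites=1000, criterion="bic"))
```

Because BIC penalizes each extra parameter by ln(n) rather than by 2, it tends to favor simpler models than AIC on long alignments, so the two criteria need not select the same model.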
Two independent MCMC runs were performed either for 300,000 generations or until the average standard deviation of split frequencies fell below 0.01, sampling every 100 generations. After the first 25% of the sampled trees were discarded as burn-in, the remaining trees were summarized in a 50% majority-rule consensus tree with posterior probability (PP) values. For the BI analyses, two datasets were used: the full six-gene dataset and the non-COI + Cytb + COII-3rds dataset (with the 3rd codon positions removed). Comparing them allowed us to examine the phylogenetic utility of the 3rd positions of COI + Cytb + COII, because these sites have suffered substantial saturation (see the results). [...]

We used phylogenetic informativeness (PI) profiles to quantify the relative contribution of each partition to the resulting tree. The peak of the PI profile is taken to predict the maximum phylogenetic informativeness of the corresponding partition. The PI profiles were generated with PhyDesign, which requires the aligned sequences and an ultrametric tree as input files. In the sequence file, the eight partitions identified by PartitionFinder v1.1 were applied. The ultrametric tree was generated with BEAST version 1.7.5, also using the eight partitions and corresponding models determined by PartitionFinder v1.1. […]
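Two of the data-handling steps described above, removing 3rd codon positions and summarizing post-burn-in trees by clade frequency, can be sketched as follows. This is a simplified illustration under stated assumptions, not the MrBayes implementation: the function names are hypothetical, each sampled tree is represented as a plain set of clades (each clade a frozenset of taxon names) rather than a parsed tree file, and a clade is kept only when its frequency is strictly greater than the threshold.

```python
from collections import Counter


def strip_third_positions(seq: str) -> str:
    """Return a coding sequence with every 3rd codon position removed.

    Assumes the sequence is in frame, so index 0 is a 1st codon position.
    """
    return "".join(base for i, base in enumerate(seq) if i % 3 != 2)


def majority_rule_clades(sampled_trees, burnin_fraction=0.25, threshold=0.5):
    """Discard the burn-in, then return clades with frequency above the threshold.

    The returned dict maps each retained clade to its frequency among the
    post-burn-in trees, which plays the role of a posterior probability.
    """
    n_burnin = int(len(sampled_trees) * burnin_fraction)
    kept = sampled_trees[n_burnin:]
    counts = Counter(clade for tree in kept for clade in tree)
    return {
        clade: count / len(kept)
        for clade, count in counts.items()
        if count / len(kept) > threshold
    }


# Hypothetical toy data for illustration only.
ab = frozenset({"A", "B"})
cd = frozenset({"C", "D"})
ac = frozenset({"A", "C"})
trees = [{ac}, {ac}, {ab, cd}, {ab, cd}, {ab}, {ab}, {ab}, {ac}]

print(strip_third_positions("ATGGCCTAA"))
print(majority_rule_clades(trees))
```

With eight sampled trees and a 25% burn-in, the first two trees are dropped and clade frequencies are computed over the remaining six.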

Pipeline specifications