Computational protocol: Ancient DNA from Giant Panda (Ailuropoda melanoleuca) of South-Western China Reveals Genetic Diversity Loss during the Holocene

Similar protocols

Protocol publication

[…] Each fragment obtained was compared to the sequences available in GenBank using NCBI’s online Nucleotide BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi, default settings) in order to validate the data. The newly determined DNA sequences were deposited in GenBank (Accession Nos. KF386262-3, KP306766-73).We also downloaded short read data of 49 modern giant pandas from the European Nucleotide Archive (ENA), accession number SRA053353 []. We used Cutadapt v1.4.2 [] to trim Illumina adapter sequences from all reads and discarded sequences shorter than 30 bp. We used Flash v1.2.10 [] to merge paired end reads and selected all non-overlapping pairs of trimmed forward and reverse reads, which were then mapped against a reference mitochondrial genome (GenBank accession number of FM177761.1) by using the “aln” and “sampe” algorithms in BWA v0.7.8-r455 with default parameters []. We used SAMtools v0.1.19-44428cd [] to exclude sequences with a map quality score less than 30 (“view”), sort the alignment (“sort”) and collapse sequences with identical 5’ mapping coordinates (“rmdup”). Finally, a consensus sequence was generated for each individual, based on maximum effective read depth [] using ANGSD v0.916 []. Besides assembling the NGS short reads, we retrieved three assembled giant panda mitochondrial genomes from GenBank [,,] which, together with the sequences from our two individuals, provided a total of 54 giant panda mitochondrial sequences. The sequences were aligned with ClustalX v1.81 [] and the alignments were carefully checked manually. Haplotype identification was carried out by inputting all the variable nucleotide sites among the 54 combined sequences to DnaSP6 []. In addition, we aligned the panda sequences with those from 200 additional representatives (4 American black bears, 7 Asian black bears, 2 sloth bears, 120 brown bears, 30 cave bears, 2 spectacled bears, 32 polar bears, 2 sun bears, and 1 American giant short-faced bear) of the family Ursidae [] and one dog (Canis lupus familiaris, GenBank accession number of AB499817.1) as an outgroup for sequence alignment and further phylogenetic analyses (). [...] Phylogenetic relationships among the 54 sampled giant panda sequences, as well as their position within the larger Ursid phylogeny, were inferred by maximum likelihood in RAxML-HPC v.8 [] on the CIPRES portal []. Canis lupus familiaris served as an outgroup to root the tree. The D-loop sequences were not used for this analysis due to the possibility of saturation and ambiguous alignment of divergent intergeneric sequences. Optimal alignment partitions were selected from all possible combinations of 12s, 16s and the individual codon positions of the protein coding cytb and ND1 genes under the Bayesian Information Criterion using PartitionFinder v1.1.1 [] with the greedy search algorithm and linked branch lengths. Only the GTR + G substitution model was considered. Five hundred rapid maximum likelihood bootstrap replicates were carried out using the GTR + CAT substitution model for each partition, which approximates the GTR + G substitution model, with a final maximum likelihood search for the best scoring tree (-f a option). To further compare ancient giant panda and previously analyzed extant giant panda haplotypes, we aligned our newly obtained D-loop sequences against the 655 bp D-loop data set from previous studies [,]. We used this alignment to reconstruct a median-joining network and thus to investigate the phylogeographical relationship among haplotypes based on maximum-parsimony as implemented in NETWORK v4.6.1.0 []. [...] Changes in female effective population size over the timespan of the mitochondrial phylogeny were inferred by Bayesian skyline plot analysis [] in BEAST 1.8.2 []. The skyline plot was time-calibrated using the calibrated radiocarbon dates of the two ancient sequences and by applying an informative prior on the per-lineage substitution rate.The substitution rate for the panda lineage averaged across the 12s, 16s, cytb and ND1 gene sequences was estimated by aligning sequences from seven living and two extinct representatives of the Ursidae to the sequence of a modern giant panda and computing a time-calibrated phylogeny. Four node calibration priors based on fossils and other evidence were applied, following the approach used in a recent investigation of the Ursidae []. The monophyly of each descendent clade was enforced for the analysis. These node calibrations comprised (see [] and references therein):A uniform prior on the basal divergence of the Ursidae with an upper limit of 12 million years, representing the age of a well-documented fossil representative of the Ailuropodinae, and a lower limit of 20 million years based on a previous molecular dating study.A uniform prior on the divergence of the Tremarctinae and Ursinae clades of 7 to 14 million years, based on the fossil Tremarctine bear Plionarctos.A uniform prior on the basal divergence of the Ursinae of 4.3 to 6 million years, based on reported ages of Ursus minimus.A uniform prior on the divergence of brown and polar bears of 0.48 to 1.1 million years, based on previous studies. This prior was applied to the common ancestor of a polar bear and a Finnish brown bear, likely representing the initial divergence of these respective species, rather than more recent gene flow events.A uniform prior on the basal divergence of the Ursidae with an upper limit of 12 million years, representing the age of a well-documented fossil representative of the Ailuropodinae, and a lower limit of 20 million years based on a previous molecular dating study.A uniform prior on the divergence of the Tremarctinae and Ursinae clades of 7 to 14 million years, based on the fossil Tremarctine bear Plionarctos.A uniform prior on the basal divergence of the Ursinae of 4.3 to 6 million years, based on reported ages of Ursus minimus.A uniform prior on the divergence of brown and polar bears of 0.48 to 1.1 million years, based on previous studies. This prior was applied to the common ancestor of a polar bear and a Finnish brown bear, likely representing the initial divergence of these respective species, rather than more recent gene flow events.Optimal partitions and substitution models were selected as described for the maximum likelihood phylogenetic analysis, considering all substitution models available in BEAUti v.1.8.2 (part of the BEAST v1.8.2 distribution []). A lognormal relaxed-clock model was utilized, with an uninformative uniform prior on the mean substitution rate of 0 to 100 percent per million years, both representing implausibly extreme values. The default exponential prior was retained for the ucld.stdev parameter. A speciation birth-death process was used for the tree prior. The Markov Chain Monte Carlo (MCMC) ran for sufficient length to achieve convergence and adequate sampling (ESS > 200) of all parameters as determined using the program Tracer v1.6 []. The posterior sample of the lineage-specific substitution rate for the Ailuropoda lineage was extracted from the posterior sample of trees using TreeStat v1.8.2 and assessed in Tracer.Demographic analysis of the 54 sampled giant panda sequences was carried out using a piecewise constant coalescent Bayesian Skyline model with 10 groups. The giant panda D-loop sequences were included in this analysis as they could be aligned unambiguously and substantial saturation is unlikely at the population level. As we failed to recover several sections of the D-loop from the ancient samples, all homologous positions of the D-loop alignment containing missing data were removed. To facilitate the application of the substitution rate for the Ailuropoda lineage estimated by the calibrated Ursidae analysis, optimal substitution models were selected for the partitions selected for that analysis with the addition of the D-loop as a separate partition. A single relaxed-clock model was then used for the 12s, 16s, cytb and ND1 partitions, with a normal prior applied to the per lineage substitution rate approximating the posterior estimate generated by the calibrated Ursidae analysis. A separate, unlinked relaxed-clock model was specified for the D-loop, with the substitution rate estimated within an uninformative, uniform prior between 0 and 2 × 10−7 substitutions per site per year. Default exponential priors were used for the ucld.stdev parameters of both relaxed-clock models. The MCMC chain was run as described previously, and the maximum clade credibility tree was extracted and annotated with relevant summary statistics using TreeAnnotator, with node heights scaled to the median of the posterior sample. The Bayesian skyline reconstruction was generated in Tracer. […]

Pipeline specifications

Software tools cutadapt, BWA, SAMtools, ANGSD, Clustal W, DnaSP, RAxML, CIPRES Science Gateway, PartitionFinder, BEAST, TreeStat
Application Phylogenetics
Organisms Ailuropoda melanoleuca
Diseases Giant Cell Tumors