Computational protocol: Population dynamics of rhesus macaques and associated foamy virus in Bangladesh

Protocol publication

[…] DNA fragment sizes, allele frequencies, and observed (Ho) and expected (He) heterozygosity assuming equilibrium conditions were determined using Cervus v3.0.3. Missing data (no amplification) were reported by locus but not considered in the analyses.

We used a model-based clustering method for multilocus genotype data (13 microsatellites), as implemented in Structure v2.3.3, to infer population structure and assign individuals to populations. The model assumes K populations (where K is inferred from the data), each of which is characterized by a set of allele frequencies at each locus. Individuals are probabilistically assigned to populations, or jointly to two or more populations if their microsatellite data indicate that they are admixed. We evaluated the observed genetic diversity at different K values (K=2–10), and each K value was run independently 10 times with a burn-in period of 10 000 iterations followed by 10 000 further iterations. Because individuals may have mixed ancestry, we applied the admixture model with correlated allele frequencies, in which each individual draws some fraction of its genome from each of the K populations. Structure Harvester Web v0.6.92 was used to compute Delta K values. R 2.15.1 (http://www.R-project.org) and ggplot2 (v0.9.1) were used to graphically display the results.

[...] Sequences were aligned using MUSCLE v3.8.31; the relatively small number of apparent insertion/deletion events led to an alignment with <0.3% gaps. Amino-acid translation was performed using EMBOSS v6.4.0.0. A large phylogenetic tree was built from the entire gag nucleic acid sequence alignment using FastTree v2.1.3 with a Jukes–Cantor model. The tips of this tree were colored by sampling location using a Python script together with the Bio.Phylo module of the Biopython library. This colored tree was rendered using Archaeopteryx v0.957 beta (http://www.phylosoft.org/archaeopteryx/).
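
The Delta K step mentioned in the clustering paragraph above can be illustrated with a minimal sketch. It uses the usual definition of Delta K: the absolute second difference of the mean log-likelihood Ln P(D) across neighbouring K values, divided by the standard deviation of Ln P(D) among replicate runs at K. The function name `delta_k` and all numbers below are hypothetical, and taking the second difference on per-K means is an assumption that may differ in detail from what Structure Harvester does internally.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Return {K: Delta K} for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps each K to the list of Ln P(D) values from its replicate runs.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    scores = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # Delta K is undefined at the smallest and largest K
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        spread = stdev(lnp_by_k[k])
        if spread > 0:  # skip K values whose replicates are all identical
            scores[k] = second_diff / spread
    return scores


# Hypothetical Ln P(D) values (three replicates per K), for illustration only.
runs = {
    2: [-5200.0, -5210.0, -5190.0],
    3: [-4800.0, -4805.0, -4795.0],
    4: [-4790.0, -4760.0, -4820.0],
    5: [-4785.0, -4700.0, -4870.0],
}
scores = delta_k(runs)
best_k = max(scores, key=scores.get)
```

With runs at K=2–10, as in the protocol, this definition yields Delta K values only for K=3–9, because the two boundary values lack a neighbour on one side.
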
Additionally, a NeighborNet splits network was constructed in SplitsTree from LogDet distances.

Nucleic acid sequences were clustered using UCLUST v1.1 at 97.5% identity, producing a total of 32 clusters. Three of these clusters (containing five sequences from animals MBG132, MBG189 and MBG110) clustered separately from other larger clusters as a result of G to A hypermutation (a more complete analysis is in preparation for publication). When the putative hypermutation sites within these clusters were removed from the alignments, these sequences no longer clustered separately. Thus, these sequences were considered part of the clusters with which they clustered when these hypermutation sites were excluded; we define a viral ‘strain’ in the context of this study to be one of these clusters. For each macaque, the number of unique strains to which that animal's sequences had been assigned was computed. Based on these counts, macaques were identified as either singly or multiply infected.

When the resulting clusters were ordered by size, six were substantially larger than the rest. Each of these six was associated with a single geographic region; Bormi alone had two clusters associated with it. We named each of these strains by the geographic location where it dominated; in the case of Bormi, the two clusters were arbitrarily designated Bormi1 and Bormi2. This assignment of viral strains is used in the rest of the present paper, furnishing the labeling in [...].

An iterative method was employed to detect recombinant strains among these major viral strains. Two representative sequences were selected from each strain (four for the Dhamrai strain) for recombination analysis with GARD, using the HKY+Γ model. The trees generated from each non-recombinant alignment segment were inspected, and the strain with the highest observed discordance between trees was removed.
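
The control flow of this iterative removal can be sketched as a simple loop. This is only an outline under stated assumptions: `has_recombination` and `most_discordant` are hypothetical caller-supplied stand-ins for a GARD run and for the inspection of segment trees, they are not part of any real tool's interface, and the guard that stops at two remaining strains is an added safety assumption.

```python
def prune_recombinants(strains, has_recombination, most_discordant):
    """Repeatedly drop the most discordant strain until no signal remains.

    has_recombination(strains) -> bool and most_discordant(strains) -> str
    are hypothetical stand-ins supplied by the caller.
    """
    remaining = list(strains)
    removed = []
    # Stop when the test is negative or too few strains remain to compare.
    while len(remaining) > 2 and has_recombination(remaining):
        worst = most_discordant(remaining)
        remaining.remove(worst)
        removed.append(worst)
    return remaining, removed


# Toy stand-ins: pretend strains "X" and "Y" carry the recombination signal.
flagged = ["X", "Y"]


def toy_test(strains):
    return any(s in flagged for s in strains)


def toy_pick(strains):
    return next(s for s in strains if s in flagged)


kept, dropped = prune_recombinants(["A", "B", "X", "C", "Y", "D"], toy_test, toy_pick)
```

In practice each call to the test would correspond to a full recombination analysis on the reduced alignment, so the loop body is the expensive part.
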
This removal step was repeated until GARD detected no recombination within the alignment. This analysis removed both strains from the Bormi region, and left four strains found in the Charmaguria, Dhamrai, Dokhalo and Karamjal monkeys, which were identified as non-recombinant. Each step of the GARD results was confirmed by a PHI test analysis as implemented in SplitsTree; this test showed identical results with a P-value of less than 0.05 for each step. These four identified non-recombinant strains will be referred to as the ‘parental’ strains, which form a subset of the above-defined ‘core’ strains.

The parental strains were used as genotype representatives for cBrother v2.0 analyses, where a representative from each cluster was evaluated for recombinant relationships to these potential parental strains. Two independent cBrother runs of 1.1 million generations were performed for each cluster's representative sequence, with the initial 10% discarded as burn-in, and sampling every 1000 generations. Convergence was assessed using the Gelman–Rubin diagnostic included with the cBrother distribution. Portions of evaluated sequences that were found to be descended from a particular parental strain at 90% posterior probability or higher for at least 200 contiguous base pairs were identified as being partially descended from that parental strain.

For each of Bormi1 and Bormi2, which we had identified as recombinant strains in our GARD and PHI test analysis, two representative sequences were aligned with two representatives from each of the parental strains. These alignments were split at the corresponding breakpoints identified from the above cBrother analysis (304 and 496, respectively). Each of the resulting four alignments was analyzed using PhyML 3.0 under an HKY85 model with 100 bootstrap replicates, and trees were rendered using FigTree v1.3.1 (http://tree.bio.ed.ac.uk/software/figtree).
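
The criterion used above to call a region partially descended from a parental strain (posterior probability of at least 90% over at least 200 contiguous base pairs) amounts to a run-length scan. The sketch below assumes that per-site posterior probabilities for one parental strain are available as a plain list; that input format, the function name `descended_segments` and the example values are all hypothetical and do not reflect cBrother's actual output files.

```python
def descended_segments(posteriors, threshold=0.90, min_length=200):
    """Return half-open (start, end) site ranges where the posterior stays at
    or above `threshold` for at least `min_length` contiguous sites."""
    segments = []
    run_start = None
    for i, p in enumerate(posteriors):
        if p >= threshold:
            if run_start is None:
                run_start = i  # a new high-posterior run begins here
        else:
            if run_start is not None and i - run_start >= min_length:
                segments.append((run_start, i))
            run_start = None
    # Close a run that extends to the end of the sequence.
    if run_start is not None and len(posteriors) - run_start >= min_length:
        segments.append((run_start, len(posteriors)))
    return segments


# Hypothetical per-site posteriors for one parental strain.
example = [0.95] * 300 + [0.50] * 50 + [0.92] * 150 + [0.10] * 10 + [0.99] * 250
segments = descended_segments(example)
```

Here the 150-site run is discarded because it is shorter than the 200 bp minimum, even though every site in it exceeds the posterior threshold.
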
The resulting PhyML trees put the recombinant sequences in the same positions as those inferred by cBrother.

Within-host pairwise distances were computed using R v2.15.1 (http://www.R-project.org) and the ape package's dist.dna function with the ‘raw’ model (normalized Hamming distance). [...] was also created using R and ape. […]
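
The within-host distance computation can be illustrated with a short sketch. The protocol used R and ape; the example below is written in Python purely for illustration and is not the authors' code. It computes the proportion of differing sites for every pair of aligned sequences from the same animal. The animal labels and sequences are hypothetical, and ape's handling of gaps and ambiguous bases is not reproduced: every aligned column is compared as-is.

```python
from itertools import combinations


def raw_distance(a, b):
    """Proportion of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def within_host_distances(seqs_by_host):
    """Map each host to the list of distances over all pairs of its sequences."""
    return {
        host: [raw_distance(a, b) for a, b in combinations(seqs, 2)]
        for host, seqs in seqs_by_host.items()
    }


# Hypothetical aligned sequences grouped by animal.
aligned = {
    "animal_A": ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC"],
    "animal_B": ["ACGTACGTAC", "TCGTACGTAA"],
}
distances = within_host_distances(aligned)
```

An animal with a single sequence gets an empty list, since there are no pairs to compare.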

Pipeline specifications

Software tools: Cervus, Structure Harvester, Ggplot2, MUSCLE, EMBOSS, FastTree, Bio::Phylo, Biopython, NeighborNet, SplitsTree, UCLUST, PhyML, FigTree, APE
Applications: Miscellaneous, Phylogenetics, Population genetic analysis, Nucleotide sequence alignment
Organisms: Macaca mulatta, Homo sapiens, Rabbit fibroma virus
Diseases: HIV Infections