Computational protocol: HIV-1 subtype C envelope characteristics associated with divergent rates of chronic disease progression

Protocol publication

[…] The full-length envelopes were sequenced in the forward and reverse directions using the ABI Prism Big Dye Terminator Version 3.1 cycle sequencing kit (Applied Biosystems, Foster City, CA), with primers spaced approximately 300 bp apart across the entire envelope. Sequences were then resolved on the ABI 3130 XL genetic analyzer. Contigs were assembled and edited using Sequencher v 4.8 (Genecodes, Ann Arbor, MI). The sequences were aligned using Clustal W and manually edited in the Genetic Data Environment (GDE 2.2).
For phylogenetic analysis, subtype reference strains were obtained from the Los Alamos HIV sequence database (http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html). Phylogenetic trees were generated in PAUP*4.0b10 using the TVM + I + G model of substitution, as selected by MODELTEST 3.7. Trees were rooted with the homologous region of a Group O reference (O.CM.96). Maximum likelihood (ML) trees of sequences from individual patients were also constructed using the appropriate evolutionary model (as determined by MODELTEST 3.7) and rooted with the "Best-fit root" as determined by Path-O-Gen v1.2. All trees were bootstrapped with 1,000 replicates. Trees were viewed with FigTree v1.1.2.
The approximate time of HIV-1 infection prior to study enrollment was estimated using BEAST (Bayesian Evolutionary Analysis Sampling Trees) version 1.4.8 (http://beast.bio.ed.ac.uk). BEAUti was used to generate the .xml input file for BEAST. The GTR substitution model with estimated base frequencies and a gamma + invariant sites model of site heterogeneity was used. A relaxed, uncorrelated lognormal molecular clock model was chosen. The MCMC (Markov chain Monte Carlo) chain length was set at 30,000,000 to give an effective sample size (ESS) > 170.
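The ESS criterion mentioned above can be illustrated with a minimal sketch. It assumes a parameter trace is already available as a list of floats and uses a common textbook estimator (chain length divided by the integrated autocorrelation time, with the sum truncated at the first non-positive autocorrelation). This is not claimed to be the calculation performed by BEAST or its companion tools. The 10% burn-in, the function names, and the toy AR(1) generator are assumptions made for illustration only.

```python
import random

ESS_THRESHOLD = 170  # threshold reported in the protocol text


def autocorrelation(trace, lag):
    """Sample autocorrelation of `trace` at `lag` (biased estimator)."""
    n = len(trace)
    mean = sum(trace) / n
    var = sum((x - mean) ** 2 for x in trace)
    if var == 0:
        raise ValueError("trace has zero variance")
    cov = sum((trace[i] - mean) * (trace[i + lag] - mean)
              for i in range(n - lag))
    return cov / var


def effective_sample_size(trace, burn_in_fraction=0.1):
    """Estimate ESS as n / (1 + 2 * sum of positive-lag autocorrelations).

    Assumption: the first `burn_in_fraction` of samples is discarded, and
    the autocorrelation sum stops at the first non-positive value.
    """
    kept = trace[int(len(trace) * burn_in_fraction):]
    n = len(kept)
    rho_sum = 0.0
    for lag in range(1, n):
        rho = autocorrelation(kept, lag)
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


def simulate_ar1(n, phi, seed=0):
    """Hypothetical toy trace: an AR(1) series standing in for MCMC output."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


# Usage: a strongly autocorrelated toy chain versus the reported threshold.
trace = simulate_ar1(2000, phi=0.9)
ess = effective_sample_size(trace)
print(f"ESS = {ess:.1f}; meets threshold: {ess > ESS_THRESHOLD}")
```

Strong autocorrelation between successive samples lowers the ESS well below the number of retained samples, which is why a long chain (here 30,000,000 states) may be needed to pass a fixed ESS threshold.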
The number and location of putative N-linked glycosylation sites (PNGs) were estimated using N-GlycoSite (http://www.hiv.lanl.gov/content/sequence/GLYCOSITE/glycosite.html) from the Los Alamos National Laboratory database. Sequence diversity was calculated using the Maximum Composite Likelihood option in MEGA 4.0. Characteristic differences between progressors and slow progressors, including at the corresponding study entry and exit time points, were identified using VESPA (Viral Epidemiology Signature Pattern Analysis). Nucleotide substitution rates were calculated using baseml from the PAML software package. Sites under positive selection were identified using the SLAC option in HyPhy and CODEML as implemented in the PAML software package. Positively selected sites and signature mutations were mapped onto the X-ray structure of a clade C HIV-1 gp120 (3LQA.pdb) using the BIOPREDICTA module in the VLifeMDS software package (VLife Science Technologies, 2007). Gp41 was modeled in SWISS-MODEL using 1ENV.pdb as a template. Structures were rendered and annotated in PyMOL. […]
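As an illustration of what counting PNGs involves, the sketch below scans a protein sequence for the standard N-linked glycosylation sequon, N-X-[S/T] with X being any residue except proline. It is a minimal stand-in and not the N-GlycoSite implementation; the proline exclusion, the gap handling, and the function names are assumptions made here for illustration.

```python
import re

# Sequon N-X-[S/T], X != P. The zero-width lookahead lets overlapping
# sites (e.g. "NNTS") each be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_pngs(protein, gap_char="-"):
    """Return 1-based positions of sequon asparagines.

    Assumption: alignment gaps are stripped first, so positions refer to
    the ungapped sequence rather than to alignment columns.
    """
    seq = protein.upper().replace(gap_char, "")
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


def count_pngs(protein):
    """Number of putative N-linked glycosylation sites in `protein`."""
    return len(find_pngs(protein))


# Usage on a made-up peptide (not a real envelope sequence).
print(find_pngs("MNGTANPSNNTS"))  # -> [2, 9, 10]
```

In the made-up peptide, the asparagine at position 6 is followed by proline and is therefore skipped, while the overlapping sites at positions 9 and 10 are both reported.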

Pipeline specifications

Software tools: Sequencher, Clustal W, ModelTest-NG, TempEst, FigTree, BEAST, MEGA, PAML, HyPhy, BioPredicta, VLifeMDS, PyMOL
Application: Phylogenetics
Organisms: Human immunodeficiency virus 1, Homo sapiens
Diseases: HIV Infections