Computational protocol: Cross-Border Sexual Transmission of the Newly Emerging HIV-1 Clade CRF51_01B

Protocol publication

[…] To investigate the geographical origin of HIV-1 CRF51_01B and its possible movement between Malaysia and Singapore, a retrospective molecular epidemiological analysis was conducted among 595 subjects recruited for an antiretroviral resistance surveillance study in Kuala Lumpur, Malaysia, between 2008 and 2012. HIV-1 RNA was extracted from plasma using the magnetic silica-based method implemented in the automated NucliSENS easyMAG platform (BioMerieux, France). Reverse transcription was carried out with random hexamers using SuperScript III RNase H− Reverse Transcriptase (Invitrogen, Carlsbad, California, USA) according to the manufacturer's instructions. Next, previously described primer sets were used to amplify the protease and reverse transcriptase genes by nested PCR with QIAGEN HotStarTaq Plus DNA Polymerase (Qiagen, Hilden, Germany). PCR products were purified, and sequencing was performed on an ABI PRISM 3730XL DNA Analyzer with BigDye terminators (Applied Biosystems, Foster City, California, USA).
To determine the subtypes of the amplified sequences, reference sequences relevant to the Southeast Asia region, including three near full-length and all partial genomic sequences of CRF51_01B, were downloaded from the Los Alamos National Laboratory (LANL) HIV sequence database for phylogenetic analysis. Reference and query sequences were aligned in accordance with the HIV Sequence Compendium 2012. Neighbour-joining analysis was carried out in MEGA version 5 to determine the phylogenetic relationships of all the isolates, using the Kimura two-parameter model with a transition-transversion ratio of 2.0. For isolates that formed a monophyletic cluster with known CRF51_01B, the complete genomic sequence was amplified using primers described elsewhere. The CRF51_01B sequences reported in this study, 11MYKL055 and 09MYKL050, have been deposited in GenBank with accession numbers KJ485697 and KJ485698, respectively. [...]
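The Kimura two-parameter distance that underlies the neighbour-joining analysis above can be illustrated with a short Python sketch. It is not the MEGA implementation: the function name `k2p_distance` and the choice to skip gaps and ambiguous bases are assumptions made for illustration. It uses the standard formula d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences between two aligned sequences.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences.

    Illustrative sketch only: sites containing gaps or ambiguous bases
    are skipped (pairwise deletion), which is an assumption, not a
    setting taken from the protocol.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    term = (1 - 2 * p - q) * math.sqrt(max(1 - 2 * q, 0.0))
    if term <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term)
```

A matrix of such pairwise distances is the input to neighbour-joining; for closely related sequences the corrected distance is only slightly larger than the raw proportion of differing sites.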
The Recombinant Identification Program (RIP) tool from the LANL HIV sequence database was used to depict the recombination structure of submitted query sequences through comparison with a background alignment. Subsequently, bootscanning and informative-site estimation were carried out using SimPlot version 3.5.1 to determine the mosaic structures of the recombinants and the locations of the breakpoints. Next, sub-region neighbour-joining trees were built for each recombinant segment to confirm the parental origin of the genomic fragments.
To improve the resolution of subsequent phylogenetic analysis, additional genomic sequences of subtype B and CRF01_AE from Singapore and Malaysia were downloaded from the LANL HIV sequence database and the GenBank database. These sequences were then aligned against reported CRF51_01B strains as well as other reference sequences, and neighbour-joining analysis was carried out. The criterion for confirming partial sequences as CRF51_01B isolates was that sequences must be available in all three genomic regions, protease (subtype B), gp120 (CRF01_AE) and gp41 (subtype B), and that these sequences cluster with known/reported CRF51_01B with strong bootstrap support. This also served as a preliminary analysis to identify subtype B and CRF01_AE strains that are closely related to CRF51_01B in the protease, gp120 and gp41 genomic regions.
Maximum likelihood analysis for the Malaysian CRF51_01B sequences was performed using PAUP version 4.0 on the protease, gp120 and gp41 regions, along with sequences of known Singaporean CRF51_01B isolates downloaded from the LANL HIV sequence database.
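The idea behind scanning a query genome for its mosaic structure can be sketched in Python as a sliding-window similarity comparison. This is a deliberately simpler technique than the bootscanning performed in SimPlot, which builds bootstrapped trees in each window; the function names, the window size and the step size below are hypothetical and are not taken from the protocol.

```python
def percent_identity(a, b):
    """Fraction of identical sites, ignoring positions with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def sliding_similarity(query, references, window=400, step=50):
    """Similarity of the query to each reference in successive windows.

    `references` maps a label (for example a subtype name) to a sequence
    aligned to the query. Returns a list of (window midpoint, scores).
    """
    for name, ref in references.items():
        if len(ref) != len(query):
            raise ValueError(f"reference {name!r} is not aligned to the query")

    results = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        scores = {
            name: percent_identity(query[start:end], ref[start:end])
            for name, ref in references.items()
        }
        results.append((start + window // 2, scores))
    return results


def closest_reference_per_window(results):
    """Label of the most similar reference in each window."""
    return [(mid, max(scores, key=scores.get)) for mid, scores in results]


# Toy example: the first half of the query resembles "B", the second half "AE".
toy_refs = {"B": "A" * 20, "AE": "C" * 20}
toy_query = "A" * 10 + "C" * 10
toy_scan = closest_reference_per_window(
    sliding_similarity(toy_query, toy_refs, window=10, step=10)
)
```

A change in the closest reference between adjacent windows marks a candidate breakpoint region, which would then need confirmation by the sub-region trees described above.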
To deduce the timeline for the emergence of CRF51_01B isolates in Singapore and Malaysia, Bayesian Markov chain Monte Carlo (MCMC) sampling was performed in BEAST version 1.7.5 to obtain the posterior distribution of phylogenies under the uncorrelated log-normal relaxed clock model with general time-reversible (GTR) nucleotide substitution and constant tree models. Three independent MCMC chains of 30 million steps, sampled every 30,000 states, were run on the specific genomic regions mentioned above. The MCMC output was checked for convergence and effective sample size (ESS) using Tracer version 1.4, with the first 10% of each chain discarded as burn-in. […]
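The sampling arithmetic and the ESS check can be made concrete with a small Python sketch. With 30 million steps logged every 30,000 states, each chain yields about 1,000 samples (BEAST may also log the initial state), of which about 900 remain after a 10% burn-in. The ESS estimator below is a simplified autocorrelation-based version written for illustration; it is not the algorithm used by Tracer, and all function names are hypothetical.

```python
def samples_per_chain(chain_length, log_every):
    """Number of logged samples, not counting the initial state."""
    return chain_length // log_every


def discard_burn_in(samples, fraction=0.10):
    """Drop the first `fraction` of the samples as burn-in."""
    n_burn = int(len(samples) * fraction)
    return samples[n_burn:]


def effective_sample_size(trace):
    """Simplified ESS: N / (1 + 2 * sum of positive-lag autocorrelations).

    The sum is truncated at the first non-positive autocorrelation.
    This is an illustrative estimator, not Tracer's implementation.
    """
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    variance = sum(c * c for c in centered) / n
    if variance == 0:
        return float(n)  # constant trace: no autocorrelation to estimate

    tau = 1.0
    for lag in range(1, n):
        rho = sum(
            centered[i] * centered[i + lag] for i in range(n - lag)
        ) / (n * variance)
        if rho <= 0:
            break
        tau += 2 * rho
    return n / tau


n_logged = samples_per_chain(30_000_000, 30_000)       # samples per chain
n_kept = len(discard_burn_in(list(range(n_logged))))   # samples after burn-in
```

A strongly autocorrelated trace gives an ESS well below the number of retained samples, which is why the ESS rather than the raw sample count is inspected when judging whether a chain has sampled the posterior adequately.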

Pipeline specifications

Software tools: MEGA, SimPlot, PAUP*, BEAST
Application: Phylogenetics
Organisms: Human immunodeficiency virus 1, Homo sapiens, Human immunodeficiency virus 2
Diseases: HIV Infections