Computational protocol: Adaptive Evolution of HIV at HLA Epitopes Is Associated with Ethnicity in Canada


Protocol publication

[…] For each sample, viral RNA was extracted and the HIV pol region was amplified by RT-PCR and sequenced as previously described. Pol sequence fragments obtained with different primer sets were assembled in BioEdit v7.0.9. Assembled sequences were translated and aligned in frame to the NCBI HIV-1 subtype B reference genome (accession number: NC_001802) using TranslatorX. Nucleotide sequences were trimmed to an identical length (1239 bp) and deposited in GenBank (HM468501-HM468505, HM468508, HM468510, HM468512-HM468514, HM468516, HM468517, HM468520, HM468523-HM468527, HM468530-HM468535, HM468538-HM468540, HM468542-HM468553, HM468557-HM468559, HM468566-HM468571, HM468573, HM468574, HM468576-HM468580, HM468582, HM468583, HM468585-HM468587, HM468589-HM468591, HM468593, HM468595-HM468598, HM468600-HM468602, HM468604, HM468606-HM468608, HM468611, HM468612, HM468614-HM468617, HM468619, HM468621, HM468624, HM468626-HM468629, HM468631, HM468632, HM468634-HM468636, HM468638-HM468640, HM468642-HM468650, HM468652-HM468660, HM468662, HM468665, HM468666, HM468668-HM468671, HM468674-HM468677, HM468679, HM468680, HM468682-HM468689, HM468691-HM468694, HM468696-HM468698, HM468700-HM468705, HM468707, HM468709-HM468711, HM468713-HM468716, HM468718-HM468720, HM468722, HM468725-HM468728, HM468730-HM468736, HM468741-HM468744, HM468747, HM468749, HM468750, HM468752, HM468756, HM468758, HM468760-HM468762, HM468764, HM468765, HM468768-HM468772, HM468774, HM468775, HM468777, HM468779, HM468781-HM468783, HM468785, HM468792, HM468793, HM468795-HM468797, HM468799, HM468801, HM468802, HM468805-HM468809, HM468813-HM468815, HM468817, HM468818, HM468820, HM468825-HM468828, HM468832, HM468834-HM468839, HM468841, HM468846-HM468848, HM468850, HM468852, HM468853, HM468855-HM468863, HM468866-HM468868, HM468872, HM468874, HM468875, HM468880, HM468883-HM468887, HM468889, HM468892, HM468894, HM468896, HM468897, HM468901, HM468902, HM468908, HM468910-HM468914, HM468917-HM468920, HM468922, HM468925, HM468927, 
HM468928, HM468931-HM468935, HM468939, HM468941, HM468943, HM468944, HM468948-HM468957, HM468959, HM468964, HM468966, HM468967, HM468970, HM468971, HM468974, HM468978, HM468981, HM468987, HM468994-HM468996, HM469001, HM469003, HM469006, HM469007, HM469009, HM469010, HM469014, HM469015, HM469017-HM469020, HM469024-HM469029, HM469031-HM469033, HM469036, HM469038, HM469040, HM469044, HM469045, HM469047-HM469051, HM469053, HM469055, HM469057, HM469058, HM469061, HM469062, HM469070, HM469073, HM469074, HM469076, HM469080, HM469086-HM469088, HM469090, HM469092, HM469096, HM469098, HM469099, HM469100, HM469102, HM469103, HM469105, HM469110, HM469113, HM469119, HM469121, HM469125, HM469126, HM469128, HM469129, HM469137-HM469139, HM469141, HM469143, HM469144, HM469148, HM469150, HM469151, HM469153, HM469154, HM469161, HM469162, HM469168, HM469170, HM469171, HM469173, HM469174, HM469179, HM469182, HM469183, HM469185, HM469187, HM469188, HM469190, HM469191, HM469201, HM469202, HM469204, HM469207, HM469210, HM469213, HM469215, HM469217, HM469220-HM469223, HM469225, HM469226, HM469228, HM469232, HM469234, HM469237, HM469240, HM469242, HM469246, HM469249, HM469256, HM469273, HM469276, HM469283, HM469286, HM469288, HM469289, HM469292, HM469294, HM469296-HM469299, HM469300, HM469305, HM469306, HM469309, HM469310, HM469312-HM469315, HM469319, HM469320, HM469327, HM469329, HM469331, HM469333, HM469337, HM469338, HM469340, HM469346-HM469348, HM469351, HM469355, HM469356, HM469359, HM469362, HM469364-HM469366, JQ674950-JQ674959, JQ674961-JQ674963, JQ674965-JQ674992, JQ674994-JQ675008, JQ675011-JQ675067, JQ675069-JQ675080, JQ675082-JQ675088, JQ675090-JQ675096, JQ675098, JQ675099, JQ675101, JQ675102, JQ675104-JQ675109, JQ675111-JQ675115, JQ675117-JQ675143, JQ675145-JQ675158, JQ675160-JQ675169, JQ675171, JQ675173-JQ675176, JQ675179-JQ675183, JQ675185-JQ675212, JQ675214-JQ675232, JQ675234-JQ675240, JQ675242-JQ675248, JQ675250, JQ675253-JQ675258, JQ675260, JQ675262, JQ675263, JQ675266, 
JQ675268, JQ675269, JQ675272, JQ675275, JQ675277-JQ675281, JQ675283-JQ675285, JQ675287, JQ675288, JX014632-JX015114). All sequence manipulations, such as nucleotide-to-amino-acid translation, were carried out in BioEdit. The subtype of each sequence was determined by submitting its pol region to the REGA HIV-1 Subtyping Tool v2.0. In addition, the stage of infection of each sample was classified as recent (<155 days) or established (>155 days) using the Calypte BED-CEIA™ (capture enzyme immunoassay). The BED-CEIA (BED for short) measures the proportion of HIV-specific IgG among total IgG in a sample. Only subtype B samples (subtype B accounts for the majority of infections circulating in Canada) from established infections, as determined by BED, were included. For analysis, sequences were divided into five datasets by ethnicity: Caucasian, Aboriginal, African-Caribbean, Latin-American and Asian. HLA allele frequencies have previously been shown to differ between these groups in North America. [...] Phylogenetic relationships among 1272 sequences were reconstructed in FastTree 2.1 under a GTR+Γ model, as selected by jModelTest. Two clusters of sequences showing an association between ethnicity and phylogeny were removed from the dataset, and the maximum-likelihood (ML) reconstruction was repeated on the remaining 1248 sequences. Using the APE package in R, subtrees were randomly selected for Bayesian phylogenetic analysis in BEAST. Each sequence subset was run in duplicate under a Bayesian Skyline Plot model with 10 breakpoints and linear splines, and convergence was assessed in Tracer after 100 million generations. After discarding the burn-in (10–20% of each run), a posterior distribution of trees was generated for analysis in BaTS. Terminal nodes (tips) were annotated with the character of interest, ethnicity, and non-random distribution of ethnicity was tested in each subtree. [...] 
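BaTS summarizes phylogeny-trait association with statistics such as the parsimony score (PS), the minimum number of trait changes a tree requires, and compares it against a null distribution obtained by randomizing trait labels across tips. The sketch below is a simplified, single-tree illustration of that idea, not BaTS itself: the nested-tuple tree format, the function names `fitch_score` and `parsimony_test`, and the 1000 permutations are assumptions chosen for illustration, whereas BaTS integrates over the posterior sample of trees.

```python
import random
from typing import Dict, List, Set, Tuple, Union

# Hypothetical tree format: a rooted binary tree as nested tuples,
# where each leaf is a tip name (string).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_score(tree: Tree, state: Dict[str, str]) -> int:
    """Fitch parsimony score: minimum number of trait changes on the tree."""

    def walk(node: Tree) -> Tuple[Set[str], int]:
        if isinstance(node, str):
            return {state[node]}, 0
        left_set, left_cost = walk(node[0])
        right_set, right_cost = walk(node[1])
        common = left_set & right_set
        if common:
            # Children can agree: no extra change needed at this node.
            return common, left_cost + right_cost
        # Children disagree: one change is needed somewhere below this node.
        return left_set | right_set, left_cost + right_cost + 1

    return walk(tree)[1]


def tips(tree: Tree) -> List[str]:
    """Tip names in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return tips(tree[0]) + tips(tree[1])


def parsimony_test(tree: Tree, state: Dict[str, str],
                   n_perm: int = 1000, seed: int = 1) -> Tuple[int, float]:
    """Observed parsimony score and a one-sided permutation p-value.

    The null distribution shuffles trait labels among tips; a small p means
    the trait is more clustered on the tree than expected by chance.
    """
    observed = fitch_score(tree, state)
    names = tips(tree)
    labels = [state[name] for name in names]
    rng = random.Random(seed)
    null_scores = []
    for _ in range(n_perm):
        rng.shuffle(labels)
        null_scores.append(fitch_score(tree, dict(zip(names, labels))))
    # Count shuffles at least as clustered as observed (score <= observed).
    as_extreme = sum(score <= observed for score in null_scores)
    return observed, (1 + as_extreme) / (n_perm + 1)
```

For example, on an eight-tip balanced tree where the four tips of one clade are all labelled "X" and the other four "Y", the score is 1 (a single change at the root) and the permutation p-value is small; alternating labels give the maximum score and a p-value of 1.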
Sites previously identified as under HLA-mediated selection were mapped onto the alignment. Sites under positive selection (p<0.01) were inferred separately for each ethnic-group dataset using HyPhy. In datasets of more than 50 sequences, the SLAC algorithm was used. SLAC counts the observed numbers of non-synonymous (N) and synonymous (S) substitutions at each codon in an alignment and compares them with the expected numbers (E[N] and E[S]) to estimate selection pressure (dN = N/E[N], dS = S/E[S]). A dN/dS ratio below 1 indicates purifying selection, whereas a ratio above 1 suggests diversifying (positive) selection. In datasets of fewer than 50 sequences, both the FEL and REL algorithms were used, and only sites identified by both were considered to be under positive selection (as recommended by the HyPhy user manual).
Site-specific entropy values for each amino acid position in the alignment were calculated using the Los Alamos Entropy Tool. Entropy measures the variability at each position of an alignment; a high-entropy site is highly variable.
Next, population divergence between ethnic groups was measured at the amino acid level by calculating the fixation index (FST) through an analysis of molecular variance (AMOVA), as implemented in Arlequin v3.5.1.2. To calculate FST between amino acid sequences, positional amino acid frequency data were generated in BioEdit and converted into an Arlequin-readable format using an in-house Perl script (available upon request).
Fisher’s exact test was used to determine whether sites that were divergent between ethnic groups were associated with sites under positive selection and with HLA-associated sites. All statistical analyses were carried out in SPSS.
Finally, we estimated whether the strength of selective pressure at each site differed between pairs of populations using two tests implemented in HyPhy.
The first test compares selection pressure between two trees, each reconstructed separately for one population. However, recent non-synonymous changes (on external branches, leading to tips) reflect adaptation within individual hosts only and may be maladaptive at the population level. The second test therefore distinguishes substitutions on internal branches from those on external branches of the tree. […]
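The internal/external contrast can be made concrete by pooling SLAC-style counts (N, S, E[N], E[S], with dN = N/E[N] and dS = S/E[S]) separately over the two branch classes. This is a hypothetical sketch, not either of the HyPhy tests: the per-branch counts are assumed to come from an ancestral reconstruction performed elsewhere, and the nested-tuple tree format, `label_branches` and `dnds_by_branch_class` are illustrative names only.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical tree format: rooted tree as nested tuples; tips are names.
Tree = Union[str, Tuple["Tree", ...]]

# Hypothetical per-branch counts at one codon site: (N, S, E[N], E[S]).
Counts = Tuple[float, float, float, float]


def label_branches(tree: Tree) -> List[Tuple[Tree, str]]:
    """List every branch below the root, identified by its child node, as
    'external' (leads to a tip) or 'internal' (leads to an ancestral node)."""
    branches: List[Tuple[Tree, str]] = []

    def walk(node: Tree) -> None:
        if isinstance(node, str):
            return
        for child in node:
            kind = "external" if isinstance(child, str) else "internal"
            branches.append((child, kind))
            walk(child)

    walk(tree)
    return branches


def dnds_by_branch_class(tree: Tree, counts: Dict[Tree, Counts]) -> Dict[str, float]:
    """Pool N, S, E[N], E[S] over internal and external branches separately
    and return dN/dS = (N/E[N]) / (S/E[S]) for each class."""
    totals = {"internal": [0.0] * 4, "external": [0.0] * 4}
    for child, kind in label_branches(tree):
        # Branches missing from `counts` contribute nothing.
        for i, value in enumerate(counts.get(child, (0.0, 0.0, 0.0, 0.0))):
            totals[kind][i] += value
    ratios: Dict[str, float] = {}
    for kind, (n, s, e_n, e_s) in totals.items():
        # dN/dS is undefined when expected counts or observed S are zero.
        if e_n > 0 and e_s > 0 and s > 0:
            ratios[kind] = (n / e_n) / (s / e_s)
        else:
            ratios[kind] = float("nan")
    return ratios
```

With made-up counts on a four-tip tree where each external branch carries relatively more non-synonymous change, the sketch returns dN/dS of 1.5 for external branches and 0.25 for internal ones, the kind of pattern that would point to within-host adaptation not carried along transmitted lineages. A formal comparison would also need a null model, which this sketch omits.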
