Computational protocol: Genetic signatures for Helicobacter pylori strains of West African origin


Protocol publication

[…] Multilocus sequence typing (MLST) was performed on the strains of West African and European origin as described previously [, , ]. Nucleotide sequences of seven conserved housekeeping genes (atpA, efp, mutY, ppa, trpC, yphC, and ureI) from each strain were extracted from GenBank or the H. pylori MLST database (http://pubmlst.org/helicobacter), concatenated, and aligned to the corresponding loci from 178 reference strains (previously assigned to H. pylori populations or subpopulations) using the MUSCLE algorithm within MEGA7. Phylogenetic relationships were analyzed in MEGA7 [] using the Kimura 2-parameter model of nucleotide substitution and 1,000 bootstrap replicates. [...]

Seven representative strains from diverse geographic origins () were compared at the whole-genome level using nWayComp, which compares deduced protein sequences and searches for sequence homologies among multiple strains [, ]. For each protein encoded by all seven strains, a 7 × 7 table of amino acid sequence identities was generated, and the mean percent amino acid identity was calculated from all possible pairwise comparisons among the seven strains. The mean ± SD amino acid sequence identity for the full set of 1187 orthologous proteins was 94.2 ± 0.06%. Proteins with a mean amino acid identity of <90% were designated highly divergent. The alignments of these divergent genes were inspected by eye to exclude possible misalignments or mismatches to known paralogs, and proteins with a mean amino acid identity of <50% were excluded. The gene numbers of orthologs in reference strains 26695 and J99 were determined using the PyloriGene web server (http://genolist.pasteur.fr/PyloriGene/).

Eight strains classified as hpEurope and eight classified as hspWAfrica by MLST () were analyzed in the same way at the genome-wide level using nWayComp [].
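The MLST step combines locus concatenation with a Kimura 2-parameter (K2P) distance. The Python below is a minimal sketch of those two pieces only, assuming each strain's alleles are already aligned per locus and that columns with gaps or ambiguous bases are skipped (pairwise deletion). It does not reproduce MEGA7's alignment, tree building, or bootstrapping, and the names `concatenate_loci` and `k2p_distance` are hypothetical.

```python
import math

# Locus order taken from the protocol; a fixed order keeps the concatenated
# columns comparable across strains.
LOCI = ("atpA", "efp", "mutY", "ppa", "trpC", "yphC", "ureI")
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def concatenate_loci(alleles: dict[str, str]) -> str:
    """Join one strain's aligned allele sequences in LOCI order (hypothetical helper)."""
    missing = [locus for locus in LOCI if locus not in alleles]
    if missing:
        raise KeyError(f"missing loci: {missing}")
    return "".join(alleles[locus] for locus in LOCI)


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned nucleotide sequences.

    Columns with a gap or non-ACGT character in either sequence are skipped
    (pairwise deletion, an assumption of this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites    # P: proportion of transitional differences
    q = transversions / sites  # Q: proportion of transversional differences
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)
```

A pairwise matrix of such distances is what a distance-based tree method would consume; in the protocol itself, that step and the 1,000 bootstrap replicates were run in MEGA7.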
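The seven-strain screen described above reduces to averaging an identity table and applying the two stated cutoffs (<90% highly divergent, <50% excluded). This sketch assumes the nWayComp output has already been parsed into a square list of percent identities, and it averages every off-diagonal cell, i.e. every comparison between two different strains. The manual inspection of alignments is not modeled, and `mean_pairwise_identity` and `classify_protein` are hypothetical names, not part of nWayComp.

```python
from statistics import mean

DIVERGENT_CUTOFF = 90.0  # mean % identity below this: highly divergent
EXCLUSION_CUTOFF = 50.0  # mean % identity below this: excluded


def mean_pairwise_identity(identity: list[list[float]]) -> float:
    """Mean of all off-diagonal cells of a square percent-identity table.

    Diagonal self-comparisons are ignored. For a symmetric table this equals
    the mean over the n*(n-1)/2 unique strain pairs (21 pairs for 7 strains).
    """
    n = len(identity)
    if n < 2 or any(len(row) != n for row in identity):
        raise ValueError("identity table must be square with at least 2 strains")
    return mean(identity[i][j] for i in range(n) for j in range(n) if i != j)


def classify_protein(mean_identity: float) -> str:
    """Apply the protocol's cutoffs to one protein's mean % identity."""
    if mean_identity < EXCLUSION_CUTOFF:
        return "excluded"
    if mean_identity < DIVERGENT_CUTOFF:
        return "highly divergent"
    return "not divergent"
```

Running `classify_protein(mean_pairwise_identity(table))` once per orthologous protein would yield the candidate list that the protocol then checked by eye.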
For each protein encoded by all 16 strains, a 16 × 16 table of amino acid sequence identities was generated, and three mean percent amino acid identities were calculated from subsets of the pairwise comparisons: (1) among the eight hspWAfrica strains only, (2) among the eight hpEurope strains only, and (3) between hspWAfrica and hpEurope strains. The African-European value was subtracted from the intra-African value to obtain a first difference value, and from the intra-European value to obtain a second difference value. If both difference values exceeded 5 percentage points of amino acid sequence identity, the protein was considered highly divergent between hspWAfrica and hpEurope strains. [...]

Nucleotide sequences from the 16 H. pylori strains (eight hpEurope and eight hspWAfrica) encoding proteins with divergent sequences were analyzed using the McDonald-Kreitman test []. Nucleotide sequences were aligned using MUSCLE []. The McDonald-Kreitman test was performed using an online resource that ignores codons with gaps and applies a Jukes-Cantor divergence correction []. P denotes polymorphisms within populations and D denotes fixed divergence between populations; the subscripts n and s denote nonsynonymous and synonymous changes, respectively. The Neutrality Index was calculated as NI = (Pn/Ps)/(Dn/Ds), and α, the estimated proportion of adaptive substitutions, as α = 1 − NI. […]
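The two-difference rule for the 16-strain comparison can be written out directly. This is a minimal sketch assuming a symmetric 16 × 16 percent-identity table and lists of row indices for the hspWAfrica and hpEurope strains (hypothetical inputs); within-group means use each unique pair once, and the between-group mean uses all 64 African-European pairs. `population_divergence` and `DivergenceResult` are illustrative names only.

```python
from itertools import combinations
from statistics import mean
from typing import NamedTuple, Sequence


class DivergenceResult(NamedTuple):
    intra_africa: float
    intra_europe: float
    africa_europe: float
    first_difference: float   # intra-African minus African-European
    second_difference: float  # intra-European minus African-European
    divergent: bool


def within_mean(identity: list[list[float]], group: Sequence[int]) -> float:
    # Each unique pair once; assumes identity[i][j] == identity[j][i].
    return mean(identity[i][j] for i, j in combinations(group, 2))


def between_mean(identity: list[list[float]], group_a: Sequence[int],
                 group_b: Sequence[int]) -> float:
    return mean(identity[i][j] for i in group_a for j in group_b)


def population_divergence(identity: list[list[float]], africa: Sequence[int],
                          europe: Sequence[int], threshold: float = 5.0) -> DivergenceResult:
    """Flag a protein when both difference values exceed `threshold` percentage points."""
    intra_africa = within_mean(identity, africa)
    intra_europe = within_mean(identity, europe)
    africa_europe = between_mean(identity, africa, europe)
    first = intra_africa - africa_europe
    second = intra_europe - africa_europe
    return DivergenceResult(
        intra_africa, intra_europe, africa_europe, first, second,
        divergent=first > threshold and second > threshold,
    )
```

For example, within-group identities of 98% and 97% against a between-group identity of 90% give differences of 8 and 7 points, so the protein is flagged; raising the between-group value to 94% drops the first difference to 4 points and the protein is not flagged.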
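The NI and α formulas above are simple ratios of the four McDonald-Kreitman counts. This sketch takes Pn, Ps, Dn, and Ds as already computed and does not reproduce the online resource's gap handling, Jukes-Cantor correction, or significance testing; the class name `MKCounts` is hypothetical. The counts are typed as floats because divergence-corrected values need not be integers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MKCounts:
    pn: float  # nonsynonymous polymorphisms within populations
    ps: float  # synonymous polymorphisms within populations
    dn: float  # nonsynonymous fixed differences between populations
    ds: float  # synonymous fixed differences between populations

    def neutrality_index(self) -> float:
        """NI = (Pn/Ps) / (Dn/Ds), evaluated as (Pn*Ds) / (Ps*Dn)."""
        if self.ps == 0 or self.dn == 0 or self.ds == 0:
            raise ZeroDivisionError("NI is undefined when Ps, Dn or Ds is zero")
        return (self.pn * self.ds) / (self.ps * self.dn)

    def alpha(self) -> float:
        """Estimated proportion of adaptive substitutions, 1 - NI."""
        return 1.0 - self.neutrality_index()
```

Under neutrality NI is expected to be close to 1. NI < 1 (positive α) indicates an excess of nonsynonymous fixed differences relative to nonsynonymous polymorphism, consistent with adaptive evolution, whereas NI > 1 (negative α) points to an excess of nonsynonymous polymorphism, such as segregating slightly deleterious variants.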

Pipeline specifications

Software tools: MUSCLE, MEGA, nWayComp, MKT
Databases: PubMLST, Genolist
Applications: Genome annotation, Phylogenetics, Population genetic analysis, Nucleotide sequence alignment
Organisms: Helicobacter pylori, Bacteria, Homo sapiens
Diseases: Stomach Neoplasms