Computational protocol: Whole Genome Sequence and Phylogenetic Analysis Show Helicobacter pylori Strains from Latin America Have Followed a Unique Evolution Pathway

Protocol publication

[…] For this analysis, bacterial genome sequences provided in FASTA format were used to determine Virtual Genomic Fingerprints (VGFs) with the VAMPhyRE software (Mendez-Tenorio et al., manuscript in preparation). Draft sequences were provided as a single string of concatenated contig or scaffold sequences. The analysis had two main stages. In the first stage, VGFs were calculated using a collection of 15,264 highly diverse 13-mer probe sequences (VAMPhyRE Probe Set, VPS). The probe collection was tested against the entire sequence of both complete and draft genomes (including all concatenated contigs) by virtual hybridization, allowing up to one mismatch and searching both (+ and −) genome strands, in order to find all sites complementary to the VPS. The detailed list of sites identified by virtual hybridization in a given genome is known as its Virtual Genomic Fingerprint (VGF) and is characteristic of each bacterial genome. In the second stage, genomic distances were estimated by comparing the VGFs of the bacterial genomes to determine the number of sites shared by each pair of genomic fingerprints. Because some sites shared between genomes may correspond to non-homologous positions, each site was extended by three positions on both the left and right sides, for a total length of 19 nt. A site was considered homologous between two genomes if the number of matches between the two extended sequences was ≥16 out of 19; a previous statistical analysis with unrelated sequences showed that no shared signals were observed with these values. From the number of shared homologous signals between a pair of sequences, a similarity coefficient and a distance value for each pair of genomic fingerprints were estimated using a previously described approach (Nei and Li, ). This method was used to build a distance matrix for all pairs of genomic fingerprints. Virtual Genomic Fingerprints can be calculated from both draft and complete genomes.
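The two stages described above can be illustrated with a minimal Python sketch. This is not the VAMPhyRE implementation, whose internals are not given in this excerpt. The function names (`find_sites`, `fingerprint`, `shared_sites`, `nei_li_similarity`), the brute-force scan, the greedy one-to-one pairing of sites found by the same probe, and the decision to skip hits that lie too close to a sequence end to be extended are all assumptions made for illustration. For simplicity the probe is searched as written on each strand, which is equivalent to finding its complementary sites on the opposite strand. The similarity is the Nei and Li coefficient S = 2·n_AB / (n_A + n_B); the transformation from similarity to distance is not stated in the text and is therefore left out.

```python
# Illustrative sketch only; names and pairing strategy are assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_sites(genome, probe, max_mismatches=1, flank=3):
    """Scan both strands for probe hits with at most `max_mismatches`.

    Each hit is returned as (strand, position, extended_site), where the
    extended site adds `flank` bases on each side (13 + 3 + 3 = 19 nt for
    a 13-mer). Hits that cannot be fully extended are skipped (assumption).
    """
    hits = []
    k = len(probe)
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for i in range(len(seq) - k + 1):
            if hamming(seq[i:i + k], probe) <= max_mismatches:
                lo, hi = i - flank, i + k + flank
                if lo >= 0 and hi <= len(seq):
                    hits.append((strand, i, seq[lo:hi]))
    return hits


def fingerprint(genome, probes, max_mismatches=1, flank=3):
    """Map each probe to the list of extended sites it finds in the genome."""
    return {
        p: [site for _, _, site in find_sites(genome, p, max_mismatches, flank)]
        for p in probes
    }


def shared_sites(fp_a, fp_b, min_matches=16):
    """Count sites considered homologous between two fingerprints.

    Two extended sites from the same probe count as shared when at least
    `min_matches` positions agree (16 of 19 in the text). Each site in B
    is paired at most once (greedy pairing, an assumption).
    """
    shared = 0
    for probe, sites_a in fp_a.items():
        remaining = list(fp_b.get(probe, []))
        for site_a in sites_a:
            for j, site_b in enumerate(remaining):
                same_length = len(site_a) == len(site_b)
                if same_length and len(site_a) - hamming(site_a, site_b) >= min_matches:
                    shared += 1
                    del remaining[j]
                    break
    return shared


def nei_li_similarity(fp_a, fp_b, min_matches=16):
    """Nei and Li similarity: S = 2 * n_AB / (n_A + n_B)."""
    n_a = sum(len(sites) for sites in fp_a.values())
    n_b = sum(len(sites) for sites in fp_b.values())
    if n_a + n_b == 0:
        return 0.0
    return 2.0 * shared_sites(fp_a, fp_b, min_matches) / (n_a + n_b)
```

A real probe set of 15,264 13-mers scanned against megabase genomes would need an indexed search rather than this quadratic scan; the sketch only makes the matching and counting rules concrete.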
Additionally, VGFs for H. pylori without the cagPAI island were calculated by subtracting the VGF of the island from the VGF of the whole genome. The distance matrix calculated with VAMPhyRE was used to build phylogenomic trees using MEGA 5.2.2 (Tamura et al., ). Minimal Spanning and Split Decomposition phylogenomic networks (Huson and Bryant, ) were also calculated from the distance matrix using SplitsTree4.
For the MLST analyses we selected the 7 H. pylori housekeeping genes previously described (Achtman et al., ) in the 110 Latin American genomes as well as in the 61 genomes available from NCBI that were selected for this study (Table ). [...]
We first selected the cagPAI and MLST genes from 10 NCBI reference strains and aligned them by “reverse translation” (Wernersson and Pedersen, ) with MEGA 5.02. Next, the individual gene alignments were used to calculate Nucleotide Hidden Markov Models (NHMMs) with hmmbuild from the HMMER 3.1 software (Wheeler and Eddy, ). We then searched for these NHMMs in the complete and draft genomes used in this study with the nhmmer software. The most significant gene alignments were selected and extracted, the reading frame of each gene was verified, and the genes were aligned by reverse translation with MEGA 5.02 and concatenated. The concatenated alignments were used for phylogenetic analysis of the cagPAI and MLST genes with MEGA 5.02. Distance-based phylogenetic trees for the cagPAI and MLST genes were calculated using the T92+G+I model (Tamura model with Gamma function and Invariable sites). Bootstrap analysis was performed with 1,000 replications, and phylogenetic/phylogenomic trees were edited and annotated with iTOL (Interactive Tree Of Life) v3. [...]
To better document traces of previous interactions with ancestors, we built a NeighborNet network.
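The hmmbuild and nhmmer step described above could be scripted roughly as follows. This is a hedged sketch, not the authors' pipeline: the helper names, the file paths passed in, and the rule of taking the lowest E-value hit per gene as the "most significant" alignment are assumptions. The column positions used in `parse_tblout` follow the `--tblout` layout as described in the HMMER 3.1 User's Guide to the best of my understanding and should be checked against the installed version.

```python
import subprocess


def hmmbuild_cmd(alignment_path, hmm_path):
    """Command line to build a nucleotide profile HMM from one gene alignment."""
    # hmmbuild takes the output HMM file first, then the alignment file.
    return ["hmmbuild", "--dna", hmm_path, alignment_path]


def nhmmer_cmd(hmm_path, genome_fasta, tblout_path):
    """Command line to search a genome (complete or draft) with a profile HMM."""
    return ["nhmmer", "--tblout", tblout_path, hmm_path, genome_fasta]


def run(cmd):
    """Run an external command and raise if it exits with an error."""
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)


def parse_tblout(text):
    """Parse nhmmer --tblout text into a list of hit dictionaries.

    Assumed whitespace-separated columns: target, accession, query,
    accession, hmmfrom, hmmto, alifrom, alito, envfrom, envto, sqlen,
    strand, E-value, score, bias, description.
    """
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        hits.append({
            "target": fields[0],
            "query": fields[2],
            "ali_from": int(fields[6]),
            "ali_to": int(fields[7]),
            "strand": fields[11],
            "evalue": float(fields[12]),
            "score": float(fields[13]),
        })
    return hits


def best_hit_per_gene(hits):
    """Keep the lowest E-value hit for each query gene (assumed criterion)."""
    best = {}
    for hit in hits:
        current = best.get(hit["query"])
        if current is None or hit["evalue"] < current["evalue"]:
            best[hit["query"]] = hit
    return best
```

In use, one would call `run(hmmbuild_cmd(...))` once per gene alignment, `run(nhmmer_cmd(...))` once per genome, and then read each table file and pass its text to `parse_tblout`. The coordinates of the best hit give the region to extract before checking the reading frame.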
For this, the distance matrices obtained from the VGF analyses were converted to Nexus format and used as input files to generate phylogenetic networks with SplitsTree v4.14.2 (Huson and Bryant, ). The network was computed with the Ordinary Least Squares Variance and the “Equal Angles” Split Transformation parameters. […]
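The conversion of a distance matrix to Nexus format mentioned above can be sketched in Python. The function name `distances_to_nexus` is hypothetical, and the exact `FORMAT` line reflects my understanding of how SplitsTree4 writes its distances block, so it should be verified against the SplitsTree manual. The sketch also assumes taxon labels contain no whitespace or punctuation that would need quoting.

```python
def distances_to_nexus(labels, matrix):
    """Return a NEXUS string with a taxa block and a full distance matrix.

    `labels` is a list of taxon names and `matrix` a square, symmetric
    list of lists of distances in the same order.
    """
    n = len(labels)
    if len(matrix) != n or any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square and match the number of labels")
    for i in range(n):
        for j in range(i + 1, n):
            if abs(matrix[i][j] - matrix[j][i]) > 1e-9:
                raise ValueError("distance matrix must be symmetric")

    lines = ["#NEXUS", "", "BEGIN taxa;", f"DIMENSIONS ntax={n};", "TAXLABELS"]
    lines += list(labels)
    lines += [";", "END;", ""]
    lines += [
        "BEGIN distances;",
        f"DIMENSIONS ntax={n};",
        "FORMAT labels=left diagonal triangle=both;",  # assumed SplitsTree4 style
        "MATRIX",
    ]
    for name, row in zip(labels, matrix):
        values = " ".join(f"{d:.6f}" for d in row)
        lines.append(f"{name} {values}")
    lines += [";", "END;"]
    return "\n".join(lines) + "\n"
```

The returned string can be written to a `.nex` file and opened in SplitsTree, where the network method and its parameters are then chosen in the program itself.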

Pipeline specifications

Software tools: MEGA, SplitsTree, HMMER, iTOL, NeighborNet
Application: Phylogenetics
Organisms: Bacteria, Helicobacter pylori
Diseases: Stomach Neoplasms, Helicobacter Infections