Computational protocol: Genomic structure and insertion sites of Helicobacter pylori prophages from various geographical origins

Protocol publication

[…] The assembled prophages were analyzed using PHAST to provide a first annotation. The prophage genomes were annotated further using Phages v. 1.0 and RAST. The annotations of coding sequences (CDS) produced by the three methods were compared.

The annotations of the H. pylori India7 (accession number CP002331) and Cuz20 (CP002076) prophages, as well as those of the Helicobacter phages 1961P (NC_019512.1), KHP30 (NC_019928.1), KHP40 (NC_019931.1) and phiHP33 (NC_016568.1), were used for comparison.

The annotated prophages were aligned using the progressiveMauve algorithm (Mauve version 2.3.1) to check the order of the CDS in the prophage genomes and the existence of a consensus sequence. To infer phylogenetic relationships among prophages, the intact genomes of the 23 prophages identified in the present study were aligned using MAFFT version 7, together with six other Helicobacter phage genomes available in public databases (1961P, KHP30, KHP40, phiHP33, H. pylori India7, and H. pylori Cuz20) and with the H. acinonychis prophage (accession number NC_008229.1), which was used as an outgroup. A nucleotide neighbour-joining phylogenomic tree was constructed using MEGA (Molecular Evolutionary Genetics Analysis) 6.0, with distances estimated using the Kimura two-parameter model. Given the large genomic diversity observed among the prophage genomes and their different lengths, both the complete-deletion and pairwise-deletion options were used. The former removes all sites containing missing data or alignment gaps before distance estimation begins, whereas the pairwise-deletion option removes sites only during the analysis, as the need arises in each pairwise comparison.
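The two gap-handling options can be illustrated with a minimal Python sketch of Kimura two-parameter distances. This is not MEGA's implementation: the function names (`k2p_distance`, `complete_deletion`, `distance_matrix`) and the toy alignment are hypothetical, and treating every character other than A, C, G or T as missing data is an assumption made for illustration.

```python
import math

PURINES = {"A", "G"}
VALID = set("ACGT")  # assumption: anything else counts as gap/missing data


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences.

    Sites where either sequence has a gap or ambiguous base are skipped,
    which is what pairwise deletion amounts to for a single pair.
    """
    n = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in VALID or b not in VALID:
            continue
        n += 1
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    if n == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / n, transversions / n
    term1, term2 = 1 - 2 * p - q, 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        return math.inf           # saturated: distance is undefined
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)


def complete_deletion(alignment):
    """Drop every column in which any sequence has a gap or missing base."""
    seqs = [s.upper() for s in alignment]
    keep = [i for i in range(len(seqs[0])) if all(s[i] in VALID for s in seqs)]
    return ["".join(s[i] for i in keep) for s in seqs]


def distance_matrix(alignment, deletion="pairwise"):
    """Pairwise K2P distances under 'pairwise' or 'complete' deletion."""
    if deletion not in ("pairwise", "complete"):
        raise ValueError("deletion must be 'pairwise' or 'complete'")
    seqs = [s.upper() for s in alignment]
    if deletion == "complete":
        seqs = complete_deletion(seqs)
    n = len(seqs)
    return [[k2p_distance(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    toy = ["ACGTACGTAC", "GCGTACGTAC", "-CGTACGTAC"]  # hypothetical alignment
    print(distance_matrix(toy, deletion="pairwise"))
    print(distance_matrix(toy, deletion="complete"))
```

In the toy alignment, the gap in the third sequence removes the first column for every pair under complete deletion, so the only difference between the first two sequences is discarded; under pairwise deletion that site is still compared. With long, divergent genomes of unequal length this effect can be substantial, which is presumably why both options were run.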
Branching significance was estimated using bootstrap confidence levels by randomly resampling the data 1,000 times with the same evolutionary distance model.

To determine the population structure of prophages, we used prophage sequence typing (PST), as previously described. Briefly, the multi-FASTA file with the alignment of integrase and holin gene sequences was converted to a STRUCTURE 2.3.4 input file using xmfa2structure by X. Didelot and D. Falush. STRUCTURE was used to study the number of populations (K) using the admixture model, with runs performed in duplicate. Each run used a Markov chain Monte Carlo (MCMC) of 10,000 iterations and a burn-in period of 10,000 iterations. The mean ln likelihood was compared across the runs for 2 ≤ K ≤ 6 to identify the highest value.

Putative recombination within prophage genomes was first evaluated using the Recombination Detection Program version 4 (RDP4) with default settings. RDP4 simultaneously applies different methods for detecting and characterizing individual recombination events that are evident within a sequence alignment, without any need for predefined sets of non-recombinant reference sequences. SimPlot software was also used to characterize in greater detail the genomic mosaicism of the identified recombinant prophages, as previously described for bacterial pathogens. Similarity was estimated using the Kimura two-parameter model, with sliding-window and step sizes that varied according to each recombinant genome. […]
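The sliding-window similarity scan described above can be sketched in Python as follows. This is an illustration of the general idea, not SimPlot's code: the function names, the default window and step sizes, and the definition of similarity as 1 minus the Kimura two-parameter distance are all assumptions.

```python
import math

PURINES = {"A", "G"}
VALID = set("ACGT")  # assumption: other characters are gaps/missing data


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance, or None when it cannot be computed."""
    n = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in VALID or b not in VALID:
            continue
        n += 1
        if a != b:
            if (a in PURINES) == (b in PURINES):
                transitions += 1
            else:
                transversions += 1
    if n == 0:
        return None               # window contains no comparable sites
    p, q = transitions / n, transversions / n
    term1, term2 = 1 - 2 * p - q, 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        return None               # too divergent for the model
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)


def sliding_similarity(query, reference, window=200, step=20):
    """Return (window midpoint, similarity) pairs along an alignment.

    Similarity is taken here as 1 - K2P distance (an assumption);
    it is None for windows where the distance is undefined.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    points = []
    for start in range(0, len(query) - window + 1, step):
        d = k2p_distance(query[start:start + window],
                         reference[start:start + window])
        similarity = None if d is None else 1.0 - d
        points.append((start + window // 2, similarity))
    return points


if __name__ == "__main__":
    # Hypothetical 40-bp aligned pair with one transition in the second half
    query = "A" * 20 + "G" + "A" * 19
    reference = "A" * 40
    for midpoint, sim in sliding_similarity(query, reference, window=20, step=20):
        print(midpoint, sim)
```

Scanning a putative recombinant against each candidate parent in turn and looking for positions where the highest-similarity parent changes is one way such a profile can point to mosaic regions. Window and step sizes trade resolution against noise, which is consistent with their being tuned per genome in the protocol.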

Pipeline specifications

Software tools: PHAST, RAST, Mauve, MAFFT, MEGA, ClonalFrame, RDP4, SimPlot
Applications: Phylogenetics, Nucleotide sequence alignment
Organisms: Helicobacter pylori, Homo sapiens, Bacteria
Diseases: Gastritis, Stomach Neoplasms