[…] databases, and the genomic data were acquired from the NCBI FTP site in this study. A total of 99 genomes were selected, including 75 complete H. pylori genomes and 24 NPHS genomes, which belong to 19 species (those released at the time of analysis). To ensure the accuracy and consistency of the initial data, the chromosome, plasmids, and scaffolds of each candidate strain were concatenated with the linker sequence “NNNNNCATTCCATTCATTAATTAATTAATGAATGAATGNNNNN” to establish a pseudochromosome for further analysis [].

To obtain a consistent dataset and avoid discrepancies caused by the different gene prediction methods used in different projects, a single gene-finding program, Glimmer version 3.02 [], was used to predict open reading frames (ORFs). ORFs whose start or end position fell inside the linker sequence were removed. The predicted results and the original database annotations were checked against each other. The program RNAmmer-1.2 [] was used to predict full-length rRNA gene sequences. The size, GC content, number of genes, source, and other characteristics of all selected genomes are listed in .

To better understand the phylogenetic relationships among Helicobacter species, a phylogenetic tree was constructed from the 16S rRNA genes obtained from the 99 genomes, with Campylobacter jejuni and Campylobacter fetus as the outgroup. Multiple sequence alignment of the 101 16S rRNA genes was performed using MAFFT version 7.123b []. The phylogenetic tree was inferred by the Neighbor-Joining method [] using MEGA7 [], and a consensus tree was estimated from 1000 bootstrap replicates.

Orthologous group analyses were performed with OrthoMCL version 2.0.9 [], which generates a similarity matrix normalized by species and then clusters it using the Markov Clustering Algorithm (MCL) []. All-against-all BLASTP comparisons of the protein dataset were first run to obtain the sequence pairs used by OrthoMCL.
An E-value cutoff of 1e-5 and an alignment covering more than 50% of the query sequence were chosen as the thresholds for OrthoMCL.

A family matrix, generated from pairwise comparison of the gene content of every two genomes, was visualized. The gene families obtained from OrthoMCL were used to derive the core and pan genome datasets. The number of unique genes and gene families for each individual species relative to the other 98 genomes was calculated and visualized as a bar graph.

The dataset was combined into three groups: 75 H. pylori genomes alone, 24 NPHS genomes alon […]
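
The derivation of core and pan genome datasets from the OrthoMCL gene families can be illustrated with a minimal sketch. It is not the authors' script. It assumes that each family is given as a text line of the form `family_id: genome|gene genome|gene ...`, and the function names `parse_groups` and `summarize` as well as the toy genome labels are hypothetical. A family is counted as core when every genome in the chosen set contributes a member, as pan when at least one genome does, and as unique when exactly one genome does.

```python
def parse_groups(lines):
    """Map each family id to the set of genomes contributing a member.

    Assumed input format (hypothetical here): 'family_id: genome|gene ...'
    """
    groups = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        family_id, members = line.split(":", 1)
        groups[family_id.strip()] = {m.split("|", 1)[0] for m in members.split()}
    return groups


def summarize(groups, genomes):
    """Return (core, pan, unique) family sets for the chosen genomes."""
    genomes = set(genomes)
    # Core: families present in every selected genome.
    core = {f for f, present in groups.items() if genomes <= present}
    # Pan: families present in at least one selected genome.
    pan = {f for f, present in groups.items() if present & genomes}
    # Unique: families found in exactly one of the selected genomes.
    unique = {name: set() for name in genomes}
    for f, present in groups.items():
        hit = present & genomes
        if len(hit) == 1:
            unique[next(iter(hit))].add(f)
    return core, pan, unique


# Toy data with hypothetical genome labels, not taken from the study.
example = [
    "fam1: HP1|a HP2|b NPHS1|c",
    "fam2: HP1|d HP2|e",
    "fam3: NPHS1|f NPHS1|g",
]
groups = parse_groups(example)
core, pan, unique = summarize(groups, ["HP1", "HP2", "NPHS1"])
print(sorted(core), len(pan), sorted(unique["NPHS1"]))
```

Calling `summarize` with a subset of genome labels gives the core and pan sets for that subset alone. Genes that the clustering step leaves ungrouped would not appear in such a file and would have to be counted separately when tallying unique genes.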

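The pseudochromosome construction and the removal of ORFs that start or end inside the linker, as described in the methods above, can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study. The helper names `build_pseudochromosome` and `keep_orf` are hypothetical, and ORF coordinates are assumed to be 0-based, half-open intervals with start < end, which differs from the coordinate convention of gene finders such as Glimmer.

```python
LINKER = "NNNNNCATTCCATTCATTAATTAATTAATGAATGAATGNNNNN"


def build_pseudochromosome(replicons):
    """Join sequences with the linker.

    Returns the concatenated sequence and the linker intervals
    as 0-based, half-open (start, end) tuples.
    """
    parts, linkers, pos = [], [], 0
    for i, seq in enumerate(replicons):
        if i > 0:
            # Record where this linker sits before appending it.
            linkers.append((pos, pos + len(LINKER)))
            parts.append(LINKER)
            pos += len(LINKER)
        parts.append(seq)
        pos += len(seq)
    return "".join(parts), linkers


def keep_orf(start, end, linkers):
    """True if neither the first nor the last base of the ORF lies in a linker.

    Assumption: ORF is the 0-based, half-open interval [start, end).
    """
    last = end - 1
    return not any(a <= start < b or a <= last < b for a, b in linkers)


# Toy replicons (hypothetical): a "chromosome" and a "plasmid".
pseudo, linkers = build_pseudochromosome(["ATGAAATAA", "ATGCCCTGA"])
orfs = [(0, 9), (3, 20), (9 + len(LINKER), len(pseudo))]
kept = [orf for orf in orfs if keep_orf(orf[0], orf[1], linkers)]
print(len(pseudo), linkers, kept)
```

Only ORFs with an endpoint inside a linker are discarded here, following the wording of the methods. An ORF spanning a whole linker is not treated separately in this sketch.
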
Software tools: Glimmer, RNAmmer, MAFFT, MEGA, BLASTP
Organisms: Helicobacter pylori, Homo sapiens
Diseases: Stomach Diseases, Gastroenteritis, Gastrointestinal Neoplasms, Digestive System Neoplasms