Computational protocol: Rubeoparvulum massiliense gen. nov., sp. nov., a new bacterial genus isolated from the human gut of a Senegalese infant with severe acute malnutrition

[…] Open reading frames (ORFs) were predicted using Prodigal with default parameters; predicted ORFs that spanned a sequencing gap region were excluded. The predicted bacterial protein sequences were searched against the GenBank and Clusters of Orthologous Groups (COGs) databases using BLASTP (E value 1e-03, coverage 0.7 and identity percentage 30%). If no hit was found, the sequence was searched against the NR database using BLASTP with the same thresholds (E value 1e-03, coverage 0.7 and identity percentage 30%); for sequences shorter than 80 aa, an E value of 1e-05 was used. The tRNAScanSE tool was used to find tRNA genes, and ribosomal RNAs were found using RNAmmer. Lipoprotein signal peptides and the number of transmembrane helices were predicted using Phobius. Mobile genetic elements were predicted using PHAST and RAST. ORFs were classified as ORFans if none of the BLASTP searches gave a positive result (E value smaller than 1e-03 for ORFs longer than 80 aa, or E value smaller than 1e-05 for ORFs shorter than 80 aa). Such parameter thresholds have already been used in previous studies to define ORFans. Artemis and DNA Plotter were used for data management and the visualization of genomic features, respectively. The Mauve alignment tool (version 2.3.1) was used for multiple genomic sequence alignment.
Comparator species for genomic comparison were identified in the 16S rRNA tree using Phylopattern software. The genome of strain mt6T was compared to those of Alkaliphilus metalliredigens strain QYMF, Clostridium aceticum strain DSM 1496, Alkaliphilus transvaalensis strain SAGM1 and Alkaliphilus oremlandii strain OhILAs. For each selected genome, the complete genome sequence, proteome sequence and ORFeome sequence were retrieved from the NCBI FTP site.
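The BLASTP thresholds used for annotation and ORFan detection can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Hit` record and all function names are hypothetical, and the handling of ORFs of exactly 80 aa, as well as whether the coverage and identity cut-offs are inclusive, are assumptions, since the text does not specify them. Following the parenthetical definition in the text, the ORFan test below relies on the E value alone.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Hit:
    """One BLASTP hit (hypothetical record, not a BLAST output format)."""
    evalue: float
    coverage: float   # fraction of the query covered, 0 to 1
    identity: float   # percent identity, 0 to 100


def evalue_cutoff(orf_length_aa: int) -> float:
    # 1e-05 for ORFs shorter than 80 aa, 1e-03 otherwise.
    # Assumption: an ORF of exactly 80 aa uses the 1e-03 cut-off.
    return 1e-05 if orf_length_aa < 80 else 1e-03


def passes_annotation_filter(hit: Hit, orf_length_aa: int,
                             min_coverage: float = 0.7,
                             min_identity: float = 30.0) -> bool:
    # Assumption: coverage and identity thresholds are inclusive.
    return (hit.evalue < evalue_cutoff(orf_length_aa)
            and hit.coverage >= min_coverage
            and hit.identity >= min_identity)


def is_orfan(hits: Iterable[Hit], orf_length_aa: int) -> bool:
    # An ORF is an ORFan when no hit has an E value below the
    # length-dependent cut-off.
    cutoff = evalue_cutoff(orf_length_aa)
    return not any(hit.evalue < cutoff for hit in hits)
```

For example, a 60 aa ORF whose best hit has an E value of 1e-04 would be treated as an ORFan here, because 1e-04 is not below the 1e-05 cut-off applied to short sequences.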
An annotation of the entire proteome was performed to define the distribution of functional classes of predicted genes according to the clusters of orthologous groups of proteins (using the same method as for the genome annotation). Annotation and comparison processes were performed in the multiagent software system DAGOBAH, which includes Figenix libraries that provide pipeline analysis. To evaluate the genomic similarity between the studied genomes, we determined two parameters: digital DNA-DNA hybridization (dDDH), which exhibits a high correlation with DDH, and average genomic identity of orthologous gene sequences (AGIOS), which was designed to be independent from DDH. The AGIOS score is the mean value of nucleotide similarity between all pairs of orthologous proteins between the two studied genomes. […]
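The AGIOS definition above, a mean of per-pair nucleotide similarities, can be sketched as follows. This is an illustration only, under stated assumptions: the orthologous pairs are taken as already identified and already aligned (equal-length strings with `-` for gaps), and identity is computed over columns where neither sequence has a gap. The text does not specify these details, and the function names are hypothetical.

```python
from statistics import mean
from typing import Iterable, Tuple


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned nucleotide sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Assumption: columns with a gap in either sequence are ignored.
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def agios(ortholog_pairs: Iterable[Tuple[str, str]]) -> float:
    """Mean percent identity over all orthologous pairs of two genomes."""
    scores = [percent_identity(a, b) for a, b in ortholog_pairs]
    if not scores:
        raise ValueError("no orthologous pairs were provided")
    return mean(scores)
```

With two toy pairs, one identical and one differing at one of four positions, `agios([("ACGT", "ACGT"), ("ACGT", "ACGA")])` averages 100% and 75%.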

Pipeline specifications

Software tools: Prodigal, BLASTP, RNAmmer, Phobius, PHAST, RAST, Mauve, FIGENIX
Databases: COGs
Application: Nucleotide sequence alignment
Organisms: Homo sapiens
Diseases: Malnutrition