[…] s of gene calls were combined, using Critica as the preferred start call for genes sharing the same stop codon. Genes shorter than 80 amino acids that were predicted by only one of the gene callers and had no BLAST hit in the KEGG database at an E-value cutoff of 1e-05 were deleted. This was followed by a round of manual curation to eliminate obvious overlaps. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. These data sources were combined to assign a product description to each predicted protein. Non-coding genes and miscellaneous features were predicted using tRNAscan-SE [], TMHMM [], and SignalP []. Additional gene prediction analysis and manual functional annotation were performed within the Integrated Microbial Genomes (IMG) platform developed by the Joint Genome Institute, Walnut Creek, CA, USA [].

The genome consists of a 4,352,825 bp chromosome with a G+C content of 65% and a 53,732 bp plasmid with a G+C content of 60% ( and ). Of the 3,933 genes predicted, 3,850 were protein-coding genes and 83 were RNA genes; nine pseudogenes were also identified. The majority of the protein-coding genes (72.7%) were assigned a putative function, while the remainder were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in .

We would l […]
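The merging and filtering rules described in the methods above can be illustrated with a small sketch. It assumes a hypothetical `GeneCall` record per prediction, treats calls that share contig, strand and stop coordinate as the same gene, keeps Critica's start when Critica made a call, and deletes a gene only when all three stated conditions hold: shorter than 80 amino acids, supported by a single caller, and no KEGG hit at E ≤ 1e-05. The record layout, the `kegg_evalues` lookup and the caller label "Glimmer" are illustrative assumptions, not the actual pipeline used for this genome.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical record for one gene call produced by one gene caller.
@dataclass(frozen=True)
class GeneCall:
    caller: str   # name of the gene caller (illustrative labels only)
    contig: str
    strand: str   # "+" or "-"
    start: int    # 1-based coordinate of the start codon's first base
    stop: int     # 1-based coordinate of the stop codon's last base

StopKey = Tuple[str, str, int]

def stop_key(call: GeneCall) -> StopKey:
    # Calls with the same contig, strand and stop coordinate describe one gene.
    return (call.contig, call.strand, call.stop)

def merge_calls(calls: List[GeneCall], preferred: str = "Critica"
                ) -> Dict[StopKey, Tuple[GeneCall, FrozenSet[str]]]:
    groups: Dict[StopKey, List[GeneCall]] = {}
    for call in calls:
        groups.setdefault(stop_key(call), []).append(call)
    merged = {}
    for key, group in groups.items():
        # Prefer the preferred caller's start; otherwise take the first call.
        chosen = next((c for c in group if c.caller == preferred), group[0])
        merged[key] = (chosen, frozenset(c.caller for c in group))
    return merged

def protein_length(call: GeneCall) -> int:
    # Nucleotide span includes the stop codon, which encodes no amino acid.
    return (abs(call.stop - call.start) + 1) // 3 - 1

def keep_gene(call: GeneCall, callers: FrozenSet[str],
              kegg_evalue: Optional[float],
              min_aa: int = 80, evalue_cutoff: float = 1e-5) -> bool:
    short = protein_length(call) < min_aa
    single_caller = len(callers) == 1
    no_kegg_hit = kegg_evalue is None or kegg_evalue > evalue_cutoff
    # Delete only when all three conditions hold.
    return not (short and single_caller and no_kegg_hit)

def filter_calls(merged, kegg_evalues: Dict[StopKey, float]) -> List[GeneCall]:
    return [call for key, (call, callers) in merged.items()
            if keep_gene(call, callers, kegg_evalues.get(key))]

# Toy example with made-up coordinates and e-values.
calls = [
    GeneCall("Critica", "contig1", "+", 1, 300),     # 99 aa, two callers
    GeneCall("Glimmer", "contig1", "+", 31, 300),    # same stop codon
    GeneCall("Glimmer", "contig1", "+", 401, 550),   # 49 aa, one caller, no hit
    GeneCall("Glimmer", "contig1", "-", 900, 751),   # 49 aa, but KEGG hit
    GeneCall("Glimmer", "contig1", "+", 1001, 1600), # 199 aa, one caller
]
kept = filter_calls(merge_calls(calls), {("contig1", "-", 751): 1e-10})
for gene in kept:
    print(gene.caller, gene.strand, gene.start, gene.stop, protein_length(gene))
```

In this toy input, the short single-caller gene ending at position 550 is the only one removed. The minus-strand gene of the same length survives because it has a KEGG hit below the cutoff.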

