Computational protocol: Proteome scale comparative modeling for conserved drug and vaccine targets identification in Corynebacterium pseudotuberculosis

Protocol publication

[…] To select conserved targets that were essential to the bacteria, a subtractive genomics approach was followed []. Briefly, the set of core-modelome proteins from C. pseudotuberculosis was searched against the Database of Essential Genes (DEG) for homology analysis. DEG contains experimentally validated essential genes from 20 bacteria []. The BLASTp cutoff values used were: E-value = 0.0001, bit score ≥ 100, identity ≥ 35% []. Furthermore, the pool of essential genes was subjected to NCBI-BLASTp (E-value = 0.0001, bit score ≥ 100, identity ≥ 35%) against the host proteomes (human, equine, bovine and ovine) to identify essential non-host homologous targets []. The set of essential non-host homologous proteins was further cross-checked against the PDB database with NCBI-BLASTp, using default parameters, to find any structural similarity with the available host homolog protein structures, keeping the cutoff for query coverage at ≤ 15%. These proteins were checked for their biochemical pathway using KEGG (Kyoto Encyclopedia of Genes and Genomes) [], virulence using PAIDB (Pathogenicity island database) [], functionality using UniProt (Universal Protein Resource) [], and cellular localization using CELLO (subCELlular LOcalization predictor) []. The final list of targets was based on 12 criteria as described previously []. [...] We extended our analyses to also consider protein targets that were predicted as essential to bacterial survival but showed homology to host proteins. This was based on the possibility of finding differences between bacterial and host proteins that could be used to rationally design inhibitors. Essential protein targets whose NCBI-BLASTp hits met or exceeded the cutoff values applied for essential non-host homologs were treated as host homologous proteins. These were also analyzed for pathway involvement, virulence, functional annotation and cellular localization, like the essential non-host homologous proteins.
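
The subtractive filtering described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes the BLASTp results are available as BLAST+ tabular output (`-outfmt 6`, default 12 columns), and it treats the quoted E-value as an upper bound. The function names `queries_with_hits` and `subtract` are hypothetical.

```python
from typing import Iterable, Set, Tuple

# Cutoffs quoted in the protocol. Reading "E-value = 0.0001" as an
# upper bound is an assumption made for this sketch.
MAX_EVALUE = 1e-4
MIN_BITSCORE = 100.0
MIN_IDENTITY = 35.0


def queries_with_hits(blast_lines: Iterable[str]) -> Set[str]:
    """Return query IDs with at least one hit passing all three cutoffs.

    Expects BLAST+ tabular output (-outfmt 6) with the default columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore.
    """
    passed = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        pident = float(fields[2])
        evalue = float(fields[10])
        bitscore = float(fields[11])
        if (evalue <= MAX_EVALUE and bitscore >= MIN_BITSCORE
                and pident >= MIN_IDENTITY):
            passed.add(fields[0])
    return passed


def subtract(core_ids: Iterable[str],
             deg_lines: Iterable[str],
             host_lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Split essential proteins into non-host homologs and host homologs."""
    # Essential: core proteins with a qualifying hit in DEG.
    essential = set(core_ids) & queries_with_hits(deg_lines)
    # Host homologs: essential proteins that also hit a host proteome.
    host_homologs = essential & queries_with_hits(host_lines)
    return essential - host_homologs, host_homologs
```

In this sketch, `deg_lines` would hold the hits of the core-modelome proteins against DEG and `host_lines` the hits of the same proteins against the combined host proteomes; the two returned sets correspond to the essential non-host homologs and the host homologous proteins.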
To verify the presence of significant residue differences in druggable protein cavities, a structural comparison was performed between each pathogen protein and its corresponding host protein using the molecular visualization program PyMOL (v1.5, Schrödinger, LLC). The published data related to each template structure of each host homolog were also cross-checked for information about these residues, using the PDB code of each template structure as input to the PDBelite server []. The Catalytic Site Atlas (CSA) was also consulted to obtain reliable information on the active site residues of the druggable enzyme targets []. CSA is a database documenting enzyme active sites and catalytic residues in enzymes of 3D structure and has two types of entry: original hand-annotated entries with literature references, and homologous entries found by PSI-BLAST alignment to an individual original entry, using an E-value cutoff of 0.00005. CSA can be queried with a 4-letter PDB code. For homologous entries, the residue in the query sequence that aligns with the catalytic residue of the original entry is documented. Although DoGSiteScorer predicts the druggable protein cavities, the host homologous proteins were further subjected to CASTp (Computed Atlas of Surface Topography of Proteins) [], Pocket-Finder and Q-SiteFinder [] to obtain more reliable results about the druggable cavities of the target proteins. [...] The ligand library was obtained from the ZINC database and contained 11,193 drug-like molecules, with a Tanimoto cutoff level of 60% []. Proteins were inspected in MVD (Molegro Virtual Docker) for structural errors such as missing atoms or erroneous bonds and protonation states []. The cavities predicted with DoGSiteScorer (druggability ≥ 0.80) for all protein targets were compared with the cavities detected by MVD. The most druggable cavity, according to DoGSiteScorer, was subjected to virtual screening.
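
To make the Tanimoto cutoff concrete, the sketch below computes the Tanimoto coefficient between two fingerprints, T(A, B) = |A ∩ B| / |A ∪ B|, and applies a greedy diversity filter. The source does not describe how the ZINC subset was built, so reading the 60% cutoff as "no two retained molecules are at least 60% similar" is an assumption, as are the function names and the representation of a fingerprint as a set of on-bit indices.

```python
from typing import Dict, Iterable, List


def tanimoto(a: Iterable[int], b: Iterable[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as on-bit indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def diverse_subset(fingerprints: Dict[str, Iterable[int]],
                   cutoff: float = 0.60) -> List[str]:
    """Greedily keep molecules whose similarity to every molecule
    already kept is below the cutoff (illustrative interpretation only)."""
    kept: List[str] = []
    for name, bits in fingerprints.items():
        if all(tanimoto(bits, fingerprints[k]) < cutoff for k in kept):
            kept.append(name)
    return kept
```

Real fingerprints would come from a cheminformatics toolkit; the sets of integers here only stand in for them.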
MVD includes three search algorithms for molecular docking, namely MolDock Optimizer [], MolDock Simplex Evolution (SE), and Iterated Simplex (IS). In this work, the MolDock Optimizer search algorithm, which is based on a differential evolution algorithm, was employed. The default parameters used for the guided differential evolution algorithm are: a) population size = 50, b) crossover rate = 0.9, and c) scaling factor = 0.5. The 200 top-ranked compounds for each protein were analyzed in Chimera for shape complementarity and hydrogen bond interactions, leading to the selection of a final set of 10 compounds for each target protein. […]
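
The role of the three parameters can be illustrated with a generic differential evolution loop (the classic rand/1/bin scheme). This is not the MolDock Optimizer itself, whose guided variant and scoring function are part of MVD; the objective here is a placeholder for a docking score, and the function name, the `generations` and `seed` arguments, and the bound clamping are assumptions made for the sketch.

```python
import random
from typing import Callable, List, Sequence, Tuple


def differential_evolution(
    objective: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    pop_size: int = 50,           # population size quoted in the protocol
    crossover_rate: float = 0.9,  # crossover rate quoted in the protocol
    scaling_factor: float = 0.5,  # scaling factor quoted in the protocol
    generations: int = 200,       # assumed; not stated in the source
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize `objective` with plain rand/1/bin differential evolution."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than the target.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one mutated coordinate
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < crossover_rate:
                    # Mutation: base vector plus scaled difference vector.
                    value = pop[a][d] + scaling_factor * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clamp to the bounds
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_score = objective(trial)
            # Greedy selection: the trial replaces the target if not worse.
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score

    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]
```

In a docking setting the vector being optimized would encode the ligand pose (position, orientation and torsion angles) and the objective would be the docking score; a simple test function such as the sum of squares is enough to check that the loop converges.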

Pipeline specifications

Software tools: BLASTP, CELLO, PyMOL, ProteinsPlus, CASTp, Pocket-Finder, Q-SiteFinder, MVD
Databases: UniProt, PAIDB, DEG, KEGG
Organisms: Corynebacterium pseudotuberculosis, Homo sapiens, Equus caballus, Bos taurus, Ovis aries
Diseases: Edema, Lymphadenitis, Mastitis, Ulcer, Yersinia pseudotuberculosis Infections