Computational protocol: SpaK/SpaR Two-component System Characterized by a Structure-driven Domain-fusion Method and in Vitro Phosphorylation Studies

Protocol publication

[…] SpaK (gi: 6226707, UniProt P33113) and SpaR (gi: 417799, UniProt P33112) protein sequences were input to the AS2TS protein structure modeling system, which generated initial homology models based on structures taken from the Protein Data Bank (PDB) (version released December 11, 2007). Structural templates having global sequence homology to each of SpaK and SpaR were further studied by examining domain-level homology. As no suitable template for the N-terminal domain (218 residues) of SpaK was identified, this domain was not modeled. Based on match length (227 residues), e-value (4e-57), and sequence identity (28%), PDB entry 2c2a_A, a sensor histidine kinase from Thermotoga maritima, was identified as the primary template for modeling SpaK. Additional templates identified by AS2TS are shown in Supplemental Results. Two domains of SpaK (SpaK_d1: residues 219–300 and SpaK_d2: residues 301–459) were modeled separately, pending determination of their relative conformation by structure-driven domain-fusion analysis. Although identification of a structure template with acceptable global sequence homology enables initial model construction, there often remain sub-sequences in the protein of interest that do not correspond to any portion of the template because of insertions or deletions relative to that template. For this reason, and in order to construct as complete a model as possible to confirm the fitness of the modeled complex, the Local-Global Alignment (LGA) modeler gap-filling procedure (in-house software) was used to construct necessary loops, gaps or insertions by “grafting” in suitable regions from related structures in the PDB. Similarly, SpaR was modeled as two separate domains: SpaR_d1 (residues 1–117) and SpaR_d2 (residues 118–220).
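The domain boundaries quoted above can be written down as a small lookup table, which makes the residue arithmetic easy to check. The following is a minimal sketch only: the names `DOMAINS`, `domain_length`, and `extract_domain` are hypothetical helpers introduced for illustration and are not part of AS2TS or LGA, and the placeholder sequence is not the real SpaK sequence.

```python
from typing import Dict, Tuple

# Domain boundaries as stated in the protocol (1-based, inclusive residue numbers).
DOMAINS: Dict[str, Tuple[int, int]] = {
    "SpaK_d1": (219, 300),
    "SpaK_d2": (301, 459),
    "SpaR_d1": (1, 117),
    "SpaR_d2": (118, 220),
}


def domain_length(name: str) -> int:
    """Number of residues covered by a named domain."""
    start, end = DOMAINS[name]
    return end - start + 1


def extract_domain(sequence: str, name: str) -> str:
    """Return the sub-sequence of a full-length protein for a named domain."""
    start, end = DOMAINS[name]
    if end > len(sequence):
        raise ValueError(f"{name} ends at residue {end}, but sequence has {len(sequence)}")
    # Convert 1-based inclusive residue numbers to a 0-based Python slice.
    return sequence[start - 1:end]


if __name__ == "__main__":
    for name in DOMAINS:
        print(name, domain_length(name))
    # SpaK_d1 82
    # SpaK_d2 159
    # SpaR_d1 117
    # SpaR_d2 103

    # Placeholder sequence (NOT the real SpaK sequence), used only to show slicing.
    placeholder = "A" * 459
    print(len(extract_domain(placeholder, "SpaK_d2")))  # 159
```

The two modeled SpaK domains together span residues 219–459, which is consistent with the unmodeled 218-residue N-terminal domain occupying residues 1–218.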
The N-terminal domain of SpaR was initially modeled based on the structural template 1mvo_A (crystal structure of the PhoP receiver domain from Bacillus subtilis), which showed the highest level of sequence identity (46%) to that domain (see Supplemental Results). In order to complete the model, the LGA gap-filling procedure was used to construct regions of missing coordinates. PDB entry 2gwr_A, a response regulator protein from Mycobacterium tuberculosis, was identified as the primary template for homology modeling of the C-terminal domain of SpaR (match length 216, e-value 9e-58, sequence identity 30%). This template was also used for the construction of the domain orientation. Further refinement of the constructed SpaK and SpaR models was based on structure comparison of the modeled domains with other PDB templates that were identified structurally by a PDB-search procedure using LGA and the PDB release of July 8, 2008. In all models, side-chain positions for residues that were identical in the template were copied to the model, and the coordinates of missing side-chain atoms were predicted using SCWRL. [...]
The LGA software was used to perform structure homology searches against the PDB to identify all entries with detected structural similarity (LGA_S ≥ 35%) to any of the four modeled domains (see above) within the homology models of SpaK and SpaR. We selected an LGA_S cutoff value of 35% based on our observation that the number and quality of hits increased rapidly at LGA_S ≤ 33% (data not shown) and based on previous work that determined the minimal structure homology needed to assure quality of structure alignment. Those entries with homology to both respective domains of SpaK and SpaR were selected as putative domain-fusion templates for modeling a SpaK/SpaR complex.
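The selection of putative domain-fusion templates described above amounts to a threshold filter followed by a set intersection. The sketch below illustrates that logic under stated assumptions: it takes "homology to both" to mean that an entry scores LGA_S ≥ 35% against at least one SpaK domain and at least one SpaR domain, it matches hits at the level of whatever identifier is used as the dictionary key, and the function names and the example identifiers (`tmplA`, `tmplB`, `tmplC`) and scores are invented for illustration. It does not call or reproduce the LGA software.

```python
from typing import Dict, Set

LGA_S_CUTOFF = 35.0  # percent; the cutoff value stated in the protocol


def entries_above_cutoff(hits: Dict[str, float], cutoff: float = LGA_S_CUTOFF) -> Set[str]:
    """Keep the identifiers whose LGA_S score meets or exceeds the cutoff."""
    return {entry for entry, score in hits.items() if score >= cutoff}


def putative_fusion_templates(
    hits_by_domain: Dict[str, Dict[str, float]],
    cutoff: float = LGA_S_CUTOFF,
) -> Set[str]:
    """Entries similar to at least one SpaK domain AND at least one SpaR domain.

    Assumption: domain names start with 'SpaK' or 'SpaR', as in the protocol's
    SpaK_d1 / SpaK_d2 / SpaR_d1 / SpaR_d2 naming.
    """
    spak_hits: Set[str] = set()
    spar_hits: Set[str] = set()
    for domain, hits in hits_by_domain.items():
        passing = entries_above_cutoff(hits, cutoff)
        if domain.startswith("SpaK"):
            spak_hits |= passing
        elif domain.startswith("SpaR"):
            spar_hits |= passing
    return spak_hits & spar_hits


if __name__ == "__main__":
    # Invented identifiers and scores, for illustration only.
    example_hits = {
        "SpaK_d1": {"tmplA": 41.0, "tmplB": 30.2},
        "SpaK_d2": {"tmplA": 52.5, "tmplC": 47.0},
        "SpaR_d1": {"tmplA": 60.1, "tmplC": 34.9},
        "SpaR_d2": {"tmplB": 44.0},
    }
    print(sorted(putative_fusion_templates(example_hits)))  # ['tmplA']
```

In this toy input, `tmplC` passes for a SpaK domain but falls just under the cutoff for SpaR, and `tmplB` passes only for SpaR, so only `tmplA` survives the intersection.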
The reported sequence identities between SpaK or SpaR and each corresponding domain-fusion template were calculated from residue-residue correspondences extracted from the structure alignments between the models and the domain-fusion templates. We do not report the PSI-BLAST-calculated sequence identities, as these are highly inaccurate and meaningless when calculated from sequence alignments at low levels of sequence identity (i.e., below 10%). […]
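Computing sequence identity from a structure alignment, rather than from a sequence alignment, can be illustrated as follows. This is a rough sketch under assumptions: the residue-residue correspondences are supplied as pairs of one-letter amino acid codes, and identity is normalized by the number of structurally aligned pairs. The protocol does not state the normalization used, so the denominator here is an assumption, and the function name and example residues are invented.

```python
from typing import Iterable, Tuple


def identity_from_structure_alignment(pairs: Iterable[Tuple[str, str]]) -> float:
    """Percent identity over structurally aligned residue pairs.

    Each pair is (model_residue, template_residue) as one-letter codes.
    Assumption: the denominator is the number of aligned pairs.
    """
    pair_list = list(pairs)
    if not pair_list:
        return 0.0
    matches = sum(1 for model_res, template_res in pair_list if model_res == template_res)
    return 100.0 * matches / len(pair_list)


if __name__ == "__main__":
    # Invented correspondences, for illustration only.
    aligned = list(zip("MKLVA", "MRLIA"))
    print(identity_from_structure_alignment(aligned))  # 60.0
```

Because the pairs come from superimposed structures, residues that a sequence-only aligner would misplace at very low identity do not contribute spurious matches or mismatches.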

Pipeline specifications

Software tools: AS2TS, LGA, SCWRL, BLASTP
Application: Protein structure analysis
Organisms: Bacillus subtilis
Chemicals: Adenosine Triphosphate