Computational protocol: Identification and classification of small molecule kinases: insights into substrate recognition and specificity

Protocol publication

[…] Known EPK and ELK sequences were obtained from Pfam v23.0 []. Seed sequences of the ELK and EPK groups given in Fig.  were obtained from UniProt [] using the Pfam identifier of the family as a query, and were supplemented with sequences from the annotated genomes of model organisms. A representative PDB structure from each ELK and EPK group was used for structural alignment (PDB ids are given in Fig. ). Pairwise structural alignments of each ELK and EPK representative PDB structure with Rio kinase (PDB id: 1zp9) were generated using MASS [], Matt [] and DeepAlign []. Secondary structure elements and Hanks and Hunter subdomain motifs were aligned manually. These structural and motif landmarks ensured correct placement of intervening regions despite the absence of significant sequence similarity. The proteins within each group were aligned against that group’s representative PDB sequence.

MAPGAPS [], a program that aligns sequences to a hierarchical set of profiles, was used to generate the final core domain alignment. The input to MAPGAPS is a set of alignment profiles, a consensus sequence for each profile, and a manually curated template alignment of the consensus sequences. The template alignment defines the hierarchical relationships between profiles, the alignment of each profile to its parent profile within the hierarchy and, consequently, the alignment of each profile to the root profile, which, in our case, corresponds to the ELK structural core. Based on this input, MAPGAPS identifies those database sequences with a significant match to at least one of the profiles, optimally aligns each matching sequence to its highest-scoring profile and, based on the template alignment, aligns all of the sequences to the ELK structural core.
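The role of the template alignment can be illustrated with a minimal sketch. It assumes a single level of hierarchy (one child profile aligned directly to the root) and a sequence that has already been aligned column-by-column to the child consensus. The function names `column_map` and `project_to_root` and the toy sequences are hypothetical and are not part of MAPGAPS; profile scoring and the profile-to-sequence alignment itself are not shown.

```python
def column_map(child_row: str, root_row: str) -> dict[int, int]:
    """Map child-consensus positions to root-consensus positions.

    Both arguments are gapped rows ('-' = gap) taken from the same
    template alignment, so they must have equal length.
    """
    if len(child_row) != len(root_row):
        raise ValueError("template rows must have the same length")
    mapping = {}
    child_pos = root_pos = 0
    for child_char, root_char in zip(child_row, root_row):
        # Only template columns occupied in both rows link the two profiles.
        if child_char != "-" and root_char != "-":
            mapping[child_pos] = root_pos
        if child_char != "-":
            child_pos += 1
        if root_char != "-":
            root_pos += 1
    return mapping


def project_to_root(seq_on_child: str, child_row: str, root_row: str) -> str:
    """Re-express a sequence aligned to the child profile in root coordinates.

    `seq_on_child` holds one character (residue or '-') per ungapped
    position of the child consensus. Child positions with no root
    counterpart are dropped; root positions with no child counterpart
    become gaps.
    """
    child_len = sum(ch != "-" for ch in child_row)
    if len(seq_on_child) != child_len:
        raise ValueError("sequence must have one character per child column")
    root_len = sum(ch != "-" for ch in root_row)
    projected = ["-"] * root_len
    for child_pos, root_pos in column_map(child_row, root_row).items():
        projected[root_pos] = seq_on_child[child_pos]
    return "".join(projected)


# Toy template alignment (hypothetical sequences, not from the study).
ROOT_ROW = "HRDLK-PEN"   # root consensus, 8 residues
CHILD_ROW = "HGD--VANN"  # child consensus, 7 residues

print(project_to_root("HGDIAQN", CHILD_ROW, ROOT_ROW))
```

In a deeper hierarchy the same mapping would be composed from child to parent to root, which is how a single curated template alignment can place every group on a common structural core.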
MAPGAPS thus yields an accurate core alignment by first aligning each database sequence to its most closely related profile and then aligning each profile alignment to the structural core based on the (manually curated) template alignment.

More specifically, we iteratively applied the following seven-step procedure:

1. Use each representative PDB sequence both as a master sequence to generate a subgroup profile alignment and as the “consensus” sequence for that subgroup. At this step, phylogenetically weighted consensus generation was done.
2. Use pairwise structure-based alignments to generate a template alignment of all PDB sequences. A Rio kinase-anchored template alignment was used as a starting point in the first iteration.
3. Generate a consensus sequence from each profile alignment. At this step, an unweighted consensus master alignment was generated.
4. Generate MAPGAPS profiles from both the template alignment and group alignments.
5. Re-align sequences within each group and generate a consensus sequence; note that this consensus is different from the PDB representative, with which it nevertheless shares high sequence similarity.
6. Align the new consensus sequences using MAPGAPS and the MAPGAPS profiles; this generates a new master alignment that is not Rio anchored.
7. Re-generate MAPGAPS profiles using the new master alignment of consensus sequences as a template and the re-aligned group alignments.

[...] The representative sequences from each ELK group with known structures were taken from the PDB database. For families with no structural information (e.g. Kdo, MalK and RevK) a UniProt or NCBI sequence was used. The sequences of representative structures and the UniProt sequences were aligned using the MAPGAPS profiles. A maximum likelihood (ML) tree with bootstrap support was constructed with RAxML v7.0 []. Bootstrap values were estimated from 500 alternate trees generated from the alignment. The ML tree generation used a BLOSUM62 matrix, and the consensus tree shown in Fig.  was generated using the extended majority rule of RAxML. The tree was colored and visualized using iTOL []. […]
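The idea behind generating alternate trees from the alignment can be sketched as nonparametric bootstrap resampling of alignment columns. This is a minimal illustration under that assumption, not RAxML's implementation; the function name `bootstrap_replicates`, the seed and the toy alignment are hypothetical. Each replicate would then be passed to a tree-building program, and the support for a branch is the fraction of replicate trees that contain it.

```python
import random
from typing import Dict, Iterator


def bootstrap_replicates(
    alignment: Dict[str, str], n_replicates: int, seed: int = 0
) -> Iterator[Dict[str, str]]:
    """Yield alignments built by sampling columns with replacement.

    `alignment` maps sequence names to gapped sequences of equal length.
    Every replicate keeps the same names and length as the input.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    for _ in range(n_replicates):
        # Draw `length` column indices with replacement.
        columns = [rng.randrange(length) for _ in range(length)]
        yield {
            name: "".join(alignment[name][c] for c in columns)
            for name in names
        }


# Toy alignment (hypothetical sequences, not from the study).
toy = {
    "groupA": "HRDLKPEN",
    "groupB": "HGDLK-SN",
    "groupC": "HRD-KPEN",
}
replicates = list(bootstrap_replicates(toy, n_replicates=500))
print(len(replicates), len(replicates[0]["groupA"]))
```

Sampling whole columns, rather than individual residues, keeps the residues of every sequence at a given position together, so each replicate remains a valid alignment.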

Pipeline specifications

Software tools: Matt, DeepAlign, MAPGAPS, RAxML, iTOL
Databases: Pfam
Applications: Phylogenetics, Protein structure analysis, Amino acid sequence alignment
Organisms: Cervus elaphus
Chemicals: Aminoglycosides, Carbohydrates, Hydroxyl Radical