Computational protocol: General overview on structure prediction of twilight-zone proteins

Protocol publication

[…] Threading, also known as fold recognition, is used to identify templates in the Protein Data Bank (PDB) that share a fold or structural motif with the target protein. The concept is similar to comparative modelling, but comparative modelling considers only the sequence similarity between target and template, whereas threading also considers the structural information in the template. The critical step in threading is to identify template proteins with folds similar to that of the target and to align them correctly. Protein threading compares a target sequence against one or more protein structures to find the most compatible sequence-structure pair. Threading programs rank how well the target sequence fits each fold template based on the generated alignments, and each program scores the templates with its own scoring function. Commonly used alignment scores include sequence profile-profile alignments (PPA), sequence-structural profile alignments, secondary structure match, hidden Markov models (HMM) and residue-residue contacts. These alignment algorithms can detect remotely homologous sequences in the databases. Therefore, even when sequence similarity is low (<30 %), threading can still identify similar folds or structural motifs for the target sequence. Traditionally, pairwise comparison was used to match the single sequences of target and template in the database. PPA, which can detect weak similarities between protein families, is the most widely used threading approach (it was used successfully by I-TASSER in CASP7). The newer threading algorithm MUSTER (Multi-Source ThreadER) showed that the accuracy of PPA can be improved further by incorporating additional sequence and structure information (e.g. sequence profiles, secondary structure prediction, torsion angles, solvent accessibility and a hydrophobic scoring matrix).
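The idea of combining a profile-profile term with structural terms can be sketched as a small dynamic-programming example. This is only an illustration under stated assumptions, not the scoring function of PPA or MUSTER: the two-letter toy profiles, the weight `w_ss`, the linear gap penalty and the function names `position_score`, `threading_score` and `rank_templates` are all hypothetical.

```python
def position_score(t_prof, s_prof, t_ss, s_ss, w_ss=0.5):
    """Score one target/template position pair (toy composite score)."""
    # Profile-profile term: dot product of the two frequency vectors.
    profile_term = sum(a * b for a, b in zip(t_prof, s_prof))
    # Secondary-structure term: 1 if predicted and observed states agree.
    ss_term = 1.0 if t_ss == s_ss else 0.0
    return profile_term + w_ss * ss_term


def threading_score(target, template, gap=-1.0):
    """Global (Needleman-Wunsch) alignment score with a linear gap penalty.

    target, template: lists of (profile_vector, ss_label) tuples.
    """
    n, m = len(target), len(template)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            (t_prof, t_ss), (s_prof, s_ss) = target[i - 1], template[j - 1]
            match = dp[i - 1][j - 1] + position_score(t_prof, s_prof, t_ss, s_ss)
            dp[i][j] = max(match, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[n][m]


def rank_templates(target, library, gap=-1.0):
    """Return (template_name, score) pairs, best-scoring template first."""
    scored = [(name, threading_score(target, tmpl, gap)) for name, tmpl in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data: two-residue target and two hypothetical templates.
target = [([1.0, 0.0], "H"), ([0.0, 1.0], "E")]
library = {
    "tmplA": [([1.0, 0.0], "H"), ([0.0, 1.0], "E")],
    "tmplB": [([0.0, 1.0], "C"), ([1.0, 0.0], "C")],
}
ranking = rank_templates(target, library)
```

Real threading programs use 20-letter profiles, affine gap penalties and further terms such as solvent accessibility and torsion angles; the sketch only shows how several information sources can enter a single alignment score.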
MUSTER achieved TM-scores 5–6 % higher than PPA on the test proteins.

The overall procedure for I-TASSER is illustrated in Fig. 1. In general, I-TASSER divides protein structure prediction into four steps: i) template identification, ii) structural reassembly, iii) model construction and iv) final model selection. In the first step, the query sequence is threaded through the PDB library with the LOMETS algorithm to identify suitable fragments. Continuous fragments from the well-aligned regions of the threading alignments are then used to assemble full-length models, with the unaligned regions (loops/tails) built by ab initio modelling. The structure assembly simulations are guided by a knowledge-based force field that includes: i) general knowledge-based statistical terms from the PDB, ii) spatial restraints from threading templates and iii) sequence-based contact predictions from SVMSEQ (a support vector machine based residue-residue contact predictor). After that, the fragment assembly simulation is performed again and the resulting structures are clustered by SPICKER. After superposition, the structures in each cluster are averaged to obtain the cluster centroids. The final full-atomic models are built by REMO from the selected I-TASSER decoys through optimization of the hydrogen-bonding networks. The forces in the REMO protocol include H-bonding, clash/break amendment, I-TASSER restraints and the CHARMM22 potential. For final model selection, I-TASSER uses SPICKER to cluster the structures and reports up to five models corresponding to the five largest structure clusters. The essential advantage of TASSER is its ability to bring the models ~2–3 Å closer to the native structure than the input templates. The confidence level of the predicted model is estimated by the C-score (Eq. 1).

$$
\text{C-score} = \ln\left( \frac{M}{M_{tot}} \times \frac{1}{\mathrm{RMSD}} \times \frac{1}{7} \sum_{i=1}^{7} \frac{Z_i}{Z_{0i}} \right) \qquad \text{(Eq. 1)}
$$

[Fig. 1: Overall procedure for I-TASSER.]
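Eq. 1 can be evaluated directly once its inputs are known. The sketch below assumes the usual reading of the symbols, which the excerpt itself does not define: M is the number of decoys in the selected SPICKER cluster, M_tot the total number of decoys, RMSD the average RMSD of the cluster decoys to the cluster centroid, Z_i the best threading Z-score from the i-th threading program and Z_0i that program's cutoff. The function name `c_score` and the example numbers are hypothetical.

```python
import math


def c_score(cluster_size, total_decoys, avg_rmsd, z_scores, z_cutoffs):
    """Evaluate Eq. 1 under the assumed meaning of its symbols."""
    if not z_scores or len(z_scores) != len(z_cutoffs):
        raise ValueError("z_scores and z_cutoffs must be non-empty and of equal length")
    if cluster_size <= 0 or total_decoys <= 0 or avg_rmsd <= 0:
        raise ValueError("cluster_size, total_decoys and avg_rmsd must be positive")
    # Mean of the Z-scores normalised by each program's cutoff (1/7 * sum in Eq. 1).
    norm_z = sum(z / z0 for z, z0 in zip(z_scores, z_cutoffs)) / len(z_scores)
    return math.log((cluster_size / total_decoys) * (1.0 / avg_rmsd) * norm_z)


# Made-up inputs: half of all decoys fall in the cluster, average RMSD of 5 Å,
# and every threading Z-score sits exactly at its cutoff.
example = c_score(500, 1000, 5.0, [6.0] * 7, [6.0] * 7)
```

With these inputs the argument of the logarithm is 0.5 × 0.2 × 1 = 0.1, so larger clusters, tighter clusters and stronger threading hits all raise the score.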
TASSER was tested in the CASP6 experiment and emerged as one of the most successful structure prediction methods. However, TASSER failed to correctly predict the relative orientation of the domains in multi-domain proteins. Its performance on free modelling targets is also not yet satisfactory, as the success rate for non-homologous single-domain proteins is around two thirds.

Since no single threading program has been reported to outperform all the others, the consensus structure prediction method (meta-server approach) was developed. In this approach, models are generated by multiple threading programs. The underlying idea is that a model supported by several different programs is more likely to be close to the native structure, because different programs are less likely to make the same inaccurate prediction. Available meta-threading servers include 3D-Jury and LOMETS. 3D-Jury is a meta-server that collects and compares models from various remote protein structure prediction servers. Its final performance is therefore highly dependent on the inputs from those servers. LOMETS, on the other hand, has all of its threading alignment programs installed locally, including PPA, HMM, structural profile and contact-based alignment. This allows users to obtain the predictions of all programs more quickly than with 3D-Jury. Meta-server approaches dominated the server predictions in the CASP6 experiment. However, in the CASP7 experiment, Zhang-Server (I-TASSER) performed better than all available meta-servers (discussed in the section 'Current trend in protein structure prediction'). […]
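The consensus idea behind meta-servers can be illustrated with a toy ranking: each model is scored by how similar it is, on average, to the models from the other programs. This is a simplified sketch, not the actual 3D-Jury or LOMETS scoring. It assumes that all models have the same length and are already superimposed in a common frame; the 3.5 Å cutoff, the function names and the coordinates are hypothetical.

```python
import math


def similarity(model_a, model_b, cutoff=3.5):
    """Count equivalent C-alpha positions lying within `cutoff` angstroms."""
    return sum(1 for a, b in zip(model_a, model_b) if math.dist(a, b) <= cutoff)


def consensus_rank(models, cutoff=3.5):
    """Rank models by their mean similarity to all other models."""
    if len(models) < 2:
        raise ValueError("at least two models are needed for a consensus")
    scores = {}
    for name, coords in models.items():
        others = [
            similarity(coords, other, cutoff)
            for other_name, other in models.items()
            if other_name != name
        ]
        scores[name] = sum(others) / len(others)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy C-alpha traces from three hypothetical servers (pre-superimposed).
models = {
    "server_a": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)],
    "server_b": [(0.5, 0.0, 0.0), (4.0, 0.5, 0.0), (7.6, 1.0, 0.0)],
    "server_c": [(0.0, 0.0, 0.0), (0.0, 3.8, 0.0), (0.0, 7.6, 0.0)],
}
consensus = consensus_rank(models)
```

Here `server_a` and `server_b` agree with each other and therefore outrank the outlier `server_c`, which is the behaviour the consensus argument relies on.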

Pipeline specifications

Software tools: I-TASSER, MUSTER, LOMETS, REMO, TASSER
Databases: CASP
Application: Protein structure analysis
Chemicals: Amino Acids