A guide to protein structure prediction methods and software

To exert their biological functions, proteins fold into one or more specific conformations, dictated by complex and reversible non-covalent interactions. Determining the structure of a protein can be achieved with time-consuming and relatively expensive techniques such as crystallography, nuclear magnetic resonance spectroscopy, and dual polarization interferometry. Bioinformatics software has been developed to compute and predict protein structures from their amino acid sequences.

 

A recap on protein structure

 

As an alternative to experimental techniques, structure analysis and prediction tools predict the structure of a protein from its amino acid sequence. Solving the structure of a given protein is highly important in medicine (for example, in drug design) and biotechnology (for example, in the design of novel enzymes). The field of computational protein structure prediction is thus evolving constantly, following the increase in computing power and the development of intelligent algorithms.

 

There are four levels of protein structure (figure 1). In protein structure prediction, the primary structure is used to predict secondary and tertiary structures.

 

Secondary structures are local folding patterns within the polypeptide chain that are stabilized by hydrogen bonds. The most common secondary structures are alpha helices and beta sheets.

 

Tertiary structure is the final form of the protein once the different secondary structures have all folded into a 3D structure. This final shape forms and is held together through ionic interactions, disulphide bridges and van der Waals forces.

 

Figure 1. Four levels of protein structure: primary, secondary, tertiary and quaternary. Image from Khanacademy.org.

Protein structure prediction methods and software

 

A great number of structure prediction programs have been developed for specific protein features, such as disorder prediction, dynamics prediction, structure conservation prediction, etc. Approaches include homology modeling, protein threading, ab initio methods, secondary structure prediction, and transmembrane helix and signal peptide prediction.

 

Choosing the right method always begins with the primary sequence of the unknown protein, which is used to search the protein databases for homologues (figure 2).

 

Figure 2. Decision-making chart for choosing a protein structure prediction method.

 

Here are some detailed methods for protein structure prediction:

 

  • Secondary structure prediction tools

 

These tools predict local secondary structures based only on the amino acid sequence of the protein. Predictions are then compared with the DSSP assignment, which is computed from the experimentally determined (for example, crystallographic) structure of the protein (more on DSSP here).
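As an illustration of that comparison, here is a minimal Python sketch that scores a prediction against a DSSP-derived reference using per-residue three-state accuracy (often called Q3). The function names `reduce_dssp` and `q3_accuracy` are hypothetical, and the eight-to-three state mapping is only one common convention, not the one any particular tool is guaranteed to use.

```python
# Assumed reduction of DSSP's eight states to three (H = helix, E = strand,
# C = coil). This is one widely used convention; others exist.
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(dssp: str) -> str:
    """Map a DSSP string to three states; anything unlisted becomes coil."""
    return "".join(DSSP_TO_3.get(state, "C") for state in dssp)


def q3_accuracy(predicted: str, reference: str) -> float:
    """Fraction of residues whose predicted state equals the reference state."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("sequences must not be empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


if __name__ == "__main__":
    reference = reduce_dssp("HHHHTTEEEE")  # toy DSSP string, not real data
    predicted = "HHHCCCEEEC"               # toy prediction, not real data
    print(f"Q3 = {q3_accuracy(predicted, reference):.2f}")
```

The toy strings are invented for the example. In practice the reference would come from running DSSP on a solved structure.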

 

Prediction methods for secondary structure mainly rely on databases of known protein structures and modern machine learning methods such as neural networks and support vector machines.

Here are some great tools for secondary structure prediction.

 

  • Tertiary structure

 

Tertiary (or 3D) structure prediction tools fall into two main categories: ab initio modeling and comparative protein modeling.

 

Ab initio (or de novo) protein structure prediction methods attempt to predict tertiary structures from sequences based on general principles that govern protein folding energetics and/or statistical tendencies of conformational features that native structures acquire, without the use of explicit templates.

 

All the information about a protein’s tertiary structure is encoded in its primary structure (that is, its amino acid sequence). However, an enormous number of conformations are possible for any given sequence, and only one of them has the minimal free energy and stability of the properly folded protein. Ab initio protein structure prediction thus requires vast amounts of computational power and time to find the native conformation of a protein, and remains one of the top challenges for modern science.
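To give a feel for the size of that search space, here is a back-of-the-envelope sketch in Python, in the spirit of Levinthal's paradox. Both numbers in it are assumptions chosen for illustration: three possible states per residue, and a hypothetical machine that evaluates 10^12 conformations per second. The function names are hypothetical as well.

```python
SECONDS_PER_YEAR = 3600 * 24 * 365


def conformation_count(n_residues: int, states_per_residue: int = 3) -> int:
    """Number of conformations if every residue independently takes one of
    `states_per_residue` states (a deliberately crude assumption)."""
    if n_residues < 1:
        raise ValueError("n_residues must be at least 1")
    return states_per_residue ** n_residues


def exhaustive_search_years(
    n_residues: int,
    states_per_residue: int = 3,
    samples_per_second: float = 1e12,  # assumed evaluation rate
) -> float:
    """Years needed to enumerate every conformation at the assumed rate."""
    total = conformation_count(n_residues, states_per_residue)
    return total / samples_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    for length in (10, 50, 100):
        print(
            f"{length:>3} residues: "
            f"{conformation_count(length):.2e} conformations, "
            f"{exhaustive_search_years(length):.2e} years"
        )
```

Under these assumptions, even a 100-residue chain cannot be searched exhaustively, which is why ab initio methods rely on energy functions and sampling heuristics rather than enumeration.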

 

Popular servers include Robetta (which uses the Rosetta software package), PEPstr, and QUARK. Browse an exhaustive list here.

 

If a protein of known tertiary structure shares at least 30% sequence identity with a potential homolog of undetermined structure, comparative methods that overlay the putative unknown structure onto the known one can be used to predict the likely structure of the unknown. Homology modeling and protein threading are the two main strategies that use prior information on similar proteins to predict the structure of an unknown protein from its sequence.
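The 30% rule of thumb can be sketched in a few lines of Python. The example below assumes the two sequences have already been aligned (gaps written as `-`) and computes identity over the columns where neither sequence has a gap. Definitions of percent identity vary, so this denominator is an assumption, as are the function names and the simple two-way decision.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over gap-free columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only the columns where both sequences have a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


def suggest_strategy(identity_percent: float, threshold: float = 30.0) -> str:
    """Simplified decision: comparative modeling at or above the threshold,
    ab initio below it. Real decision charts have more branches."""
    if identity_percent >= threshold:
        return "comparative modeling"
    return "ab initio"


if __name__ == "__main__":
    # Toy aligned sequences, invented for the example.
    query = "MKTAYIAK-R"
    template = "MKSAYLAKQR"
    identity = percent_identity(query, template)
    print(f"{identity:.1f}% identity -> {suggest_strategy(identity)}")
```

In a real workflow the alignment and the identity value would come from a database search tool rather than from hand-aligned strings.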

 

Homology modeling and protein threading software includes SWISS-MODEL, RaptorX, FoldX, HHpred, I-TASSER, and more.

 

 

References

 

De novo protein structure prediction. Wikipedia.

 

Protein structure prediction. Wikipedia.