Contact map detection software tools | Protein structure data analysis
Protein residue–residue contact prediction is the problem of predicting whether any two residues in a protein sequence are spatially close to each other in the folded 3D structure. Contacts occurring between sequentially distant residues, i.e. long-range contacts, impose strong constraints on the 3D structure of a protein and are particularly important for structural analyses, understanding the folding process and predicting the 3D structure.
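As a rough illustration of what "spatially close" and "long-range" mean in practice, the following Python sketch derives a contact map from residue coordinates. The 8 Å distance cutoff, the sequence separation of 24 and the function names `contact_map` and `long_range_contacts` are assumptions chosen for illustration; they are not taken from any of the tools listed here.

```python
import numpy as np

def contact_map(coords, threshold=8.0):
    """Return a boolean L x L matrix, True where two residues lie closer
    than `threshold` angstroms (assumed cutoff, one point per residue)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return dist < threshold

def long_range_contacts(cmap, min_separation=24):
    """Return (i, j) pairs with i < j that are in contact and at least
    `min_separation` positions apart in sequence (assumed definition)."""
    length = cmap.shape[0]
    i, j = np.triu_indices(length, k=min_separation)
    mask = cmap[i, j]
    return list(zip(i[mask].tolist(), j[mask].tolist()))
```

Given one representative atom position per residue (for example C-beta), `long_range_contacts(contact_map(coords))` lists the sequentially distant pairs that constrain the fold most strongly.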
Applies sparse inverse covariance estimation to the problem of protein contact prediction. PSICOV achieves a mean precision substantially better than that of the best-performing normalized mutual information approach and of Bayesian network approaches.
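The general idea of scoring residue pairs from an inverse covariance matrix can be sketched as follows. This is not PSICOV's algorithm: a simple ridge-regularised matrix inverse stands in for sparse (graphical lasso) estimation, and the block norm, the average-product correction, the `ridge` parameter and the function names are all assumptions made for illustration.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus gap

def one_hot(msa):
    """Encode an alignment (list of equal-length strings) as N x (L*21)."""
    index = {aa: k for k, aa in enumerate(ALPHABET)}
    n, length = len(msa), len(msa[0])
    x = np.zeros((n, length, len(ALPHABET)))
    for s, seq in enumerate(msa):
        for pos, aa in enumerate(seq):
            # Unknown symbols are treated as gaps (assumption).
            x[s, pos, index.get(aa, len(ALPHABET) - 1)] = 1.0
    return x.reshape(n, -1)

def coupling_scores(msa, ridge=0.1):
    """L x L pair scores from a ridge-regularised inverse covariance matrix."""
    x = one_hot(msa)
    length, q = len(msa[0]), len(ALPHABET)
    cov = np.cov(x, rowvar=False, bias=True)
    # Ridge term makes the matrix invertible; a stand-in for sparse estimation.
    precision = np.linalg.inv(cov + ridge * np.eye(cov.shape[0]))
    blocks = precision.reshape(length, q, length, q)
    # Sum of absolute entries of each 20 x 20 block (gap state excluded).
    raw = np.abs(blocks[:, :q - 1, :, :q - 1]).sum(axis=(1, 3))
    np.fill_diagonal(raw, 0.0)
    # Average-product correction to damp per-column background signal.
    row_mean = raw.sum(axis=1) / (length - 1)
    total_mean = raw.sum() / (length * (length - 1))
    corrected = raw - np.outer(row_mean, row_mean) / total_mean
    np.fill_diagonal(corrected, 0.0)
    return corrected
```

Higher scores suggest stronger direct coupling between two alignment columns; real alignments would also need sequence reweighting and far more sequences than a toy example provides.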
A method that learns an undirected probabilistic graphical model of the amino acid composition within a multiple sequence alignment. GREMLIN employs regularization to penalize complex models and thus reduce the tendency to over-fit the data. The strength of measured co-evolution is strongly predictive of residue-residue contacts in the 3D structure of the protein. GREMLIN has also been referred to as a maximum-entropy model or a global statistical model.
A computationally efficient implementation of direct-coupling analysis (DCA), which makes it possible to evaluate the accuracy of contact prediction by DCA for a large number of protein domains. DCA is shown to yield a large number of correctly predicted contacts, recapitulating the global structure of the contact map for the majority of the protein domains examined.
Combines three distinct approaches for inferring covariation signals from multiple sequence alignments, considers a broad range of other sequence-derived features and, uniquely, a range of metrics that describe both the local and global quality of the input multiple sequence alignment. Using the original PSICOV benchmark set of 150 protein families, MetaPSICOV achieves a mean precision of 0.54 for top-L predicted long-range contacts, around 60% higher than PSICOV and around 40% higher than CCMpred.
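The top-L precision figure quoted above can be computed along the following lines, where L is the sequence length. This is a minimal sketch: the function name `top_l_precision`, the default sequence separation of 24 for "long-range" and the `fraction` parameter are assumptions for illustration, not the benchmark's actual evaluation code.

```python
import numpy as np

def top_l_precision(scores, true_contacts, min_separation=24, fraction=1.0):
    """Fraction of true contacts among the top L*fraction scored pairs
    that are at least `min_separation` apart in sequence."""
    scores = np.asarray(scores, dtype=float)
    truth = np.asarray(true_contacts, dtype=bool)
    length = scores.shape[0]
    # Candidate pairs (i < j) that satisfy the separation criterion.
    i, j = np.triu_indices(length, k=min_separation)
    if i.size == 0:
        return float("nan")
    n_top = max(1, int(length * fraction))
    # Rank candidates by descending score and keep the best n_top.
    order = np.argsort(-scores[i, j], kind="stable")[:n_top]
    return float(truth[i[order], j[order]].mean())
```

Here `scores` is an L x L matrix of predicted contact scores and `true_contacts` is the boolean contact map of the known structure; setting `fraction` to 0.5 or 0.2 would give the top-L/2 or top-L/5 variants.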
Separates direct from indirect interactions in the context of protein sequences. plmDCA applies pseudolikelihood maximization to 21-state Potts models describing the statistical properties of families of evolutionarily related proteins. It outperforms existing approaches to direct-coupling analysis (DCA) that are based on standard mean-field techniques. plmDCA should provide a natural choice for analysts interested in applying state-of-the-art protein structure prediction (PSP) to their protein of interest, as well as for researchers looking to further extend the theory and practical applicability of DCA.
Identifies protein-like contact patterns to improve contact predictions. PconsC is a random forest (RF) approach that uses predictions from two methods of inferring direct information: PSICOV (inverse covariance matrix estimation) and plmDCA (pseudolikelihood with Potts models). It incorporates four different thresholds to capture evolutionary couplings in variably conserved areas of proteins.
A fast GPU and CPU implementation of a top-performing pseudo-likelihood maximization (PLM)-based contact prediction approach that runs in a fraction of the time of comparably accurate methods. The speed increase is particularly important for long proteins and large-scale applications.