Protein function prediction software tools | Sequence data analysis
Protein functions can be predicted or detected from sequence by searching for homology with other known proteins in databases. Some prediction tools can determine protein functions based on structural information, such as ligand-binding sites, gene-ontology terms, or enzyme classification.
Gives access to many free software tools for sequence analysis. EMBOSS aims to serve the molecular biology community. It permits the creation and release of software in an open source spirit and integrates sequence analysis tools into a seamless whole. It is free of charge and open source.
Provides functional analysis of protein sequences. InterPro classifies sequences into protein families and predicts the presence of important domains and sites. It combines predictive models, known as signatures, from a range of protein family databases that have different biological focuses and use different methodological approaches to classify protein families and domains.
Provides a suite of methods for the prediction of protein structural and functional features. PredictProtein is a web server that incorporates over 30 tools. It searches up-to-date public sequence databases, creates alignments, and predicts aspects of protein structure and function. It can help when little is known about the protein in question. For medium-to-high throughput analyses, downloadable software packages and the PredictProtein Machine Image (PPMI) are available.
A high-throughput tool for more reliable functional annotation. PANNZER predicts Gene Ontology (GO) classes and free-text descriptions of protein function. It uses weighted k-nearest neighbour methods with statistical testing to maximize the reliability of a functional annotation.
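The weighted k-nearest-neighbour idea mentioned in the PANNZER entry can be illustrated with a minimal sketch. This is not PANNZER's actual scoring scheme or its statistical test, neither of which is described here. It assumes each database hit comes with a similarity weight and a list of GO terms, and it scores a term by the share of the total weight held by the top-k hits that carry it. The function name `weighted_knn_go_scores` and the toy hits are hypothetical.

```python
from collections import defaultdict


def weighted_knn_go_scores(neighbours, k=3):
    """Score GO terms by a similarity-weighted vote over the k best hits.

    neighbours: iterable of (weight, go_terms) pairs, where weight is a
    non-negative similarity score and go_terms is a list of GO identifiers.
    Returns a dict mapping each GO term to a score in [0, 1].
    """
    # Keep only the k hits with the highest similarity weight.
    top = sorted(neighbours, key=lambda hit: hit[0], reverse=True)[:k]
    total_weight = sum(weight for weight, _ in top)
    if total_weight <= 0:
        return {}

    scores = defaultdict(float)
    for weight, terms in top:
        # set() ensures a hit votes once per term even if it is listed twice.
        for term in set(terms):
            scores[term] += weight

    # Normalise so that a term carried by every top hit scores 1.0.
    return {term: score / total_weight for term, score in scores.items()}


if __name__ == "__main__":
    # Toy hits (illustrative weights, not real search results).
    hits = [
        (0.5, ["GO:0003824"]),
        (0.25, ["GO:0003824", "GO:0005515"]),
        (0.25, ["GO:0005515"]),
        (0.1, ["GO:0008150"]),
    ]
    for term, score in sorted(weighted_knn_go_scores(hits).items()):
        print(term, round(score, 3))
```

A real pipeline would add a significance test on top of such scores, which is the part that the entry credits for the reliability of the annotation.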
Detects homology. FFAS incorporates optimized structural features (experimental or predicted), a ‘symmetrical’ Z-score calculation, and neural-network re-ranking of templates. It has a high success rate at the Structural Classification of Proteins (SCOP) family, superfamily and fold levels. The tool was tested on the Lindahl benchmark set for fold recognition and showed a superior success rate at the family and superfamily levels.
Provides a solution for protein tagging in mammalian tissue culture cells. Mouse BAC finder supports an efficient, generic and scalable approach for bacterial artificial chromosome (BAC)-based transgenesis in mammalian tissue culture cells, termed ‘BAC TransgeneOmics’. Using BACs for transgenesis enables the expression of the transgene from its native genomic environment. The method is applicable to very large genes, which are difficult to obtain as cDNAs.
Allows protein sequence analysis. ANTHEPROT can interactively couple multiple alignments with secondary structure predictions. It can submit tasks to a remote server and retrieve data from a remote Web server. This tool is a complete solution for Intranet protein sequence analysis for universities, biological research institutes or biomedical companies. It lets users integrate secondary structure predictions into multiple alignments and supports full interactive editing of alignments.
A structure-based method for biological function annotation of protein molecules. To use COFACTOR, the user needs to provide a 3D structural model of the protein of interest. COFACTOR threads the structure through the BioLiP protein function database using local and global structure matches to identify functional sites and homologies. Functional insights, including ligand-binding sites, gene-ontology terms, and enzyme classification, are derived from the best functional homology template.
Allows users to study the proteome-scale inference of enzyme function. EFICAz identifies functionally discriminating residues (FDRs), that is, residues that discriminate the members of a homo-functional family from those of a hetero-functional family. It combines the predictions from four independent methods: (1) CHIEFc family-based FDR identification, (2) multiple PFAM-based FDR recognition, (3) CHIEFc SIT evaluation and (4) high-specificity multiple PROSITE patterns.
Allows Bayesian phylogenetic inference on protein data sets. GPU MrBayes, a modified version of MrBayes, uses a task-mapping strategy that exploits GPU cores and GPU memory and reduces redundant operations. The software uses Kahan summation to improve accuracy, convergence rates, and consequently runtime. It was tested on protein data sets from a range of animals studied in phylogenetics research.
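Kahan summation, named in the GPU MrBayes entry, is a general compensated-summation technique: a second variable tracks the low-order bits lost at each floating-point addition and feeds them back into the next step. The sketch below is a plain Python illustration of the technique only. It does not reproduce the GPU code, and the function name `kahan_sum` is chosen here for illustration.

```python
import math


def kahan_sum(values):
    """Sum floats using Kahan compensated summation."""
    total = 0.0
    compensation = 0.0  # running estimate of the rounding error so far
    for value in values:
        # Re-inject the error lost in earlier additions.
        corrected = value - compensation
        new_total = total + corrected
        # (new_total - total) is what was actually added; subtracting
        # `corrected` leaves the part that rounding discarded (negated).
        compensation = (new_total - total) - corrected
        total = new_total
    return total


if __name__ == "__main__":
    # Many tiny terms added to 1.0: each one is below the rounding
    # threshold, so a naive running sum never moves away from 1.0.
    values = [1.0] + [1e-16] * 100_000
    print("naive :", sum(values))
    print("kahan :", kahan_sum(values))
    print("fsum  :", math.fsum(values))  # correctly rounded reference
```

Likelihood calculations add up many small terms, so this kind of lost precision can accumulate, which is presumably why the entry links the technique to accuracy and convergence.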
A user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based algorithm for predicting the molecular function of genes. It uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree.
Provides a web server that predicts Gene Ontology (GO) terms from a list of query sequences. PFP achieves large prediction coverage by retrieving annotations widely, including from weakly similar sequences. This allows it to extract and infer functional information even for sequences that lack annotated homologs in the database. The tool can be combined with Extended Similarity Group (ESG) in a single interface.
Improves specificity by accumulating the contributions of consistently predicted Gene Ontology (GO) terms in an iterative search. ESG assigns a GO term a high score if it appears consistently across the multiple searches, including the initial search and the second-level searches. The method can be applied to multi-domain proteins and aims to improve predictions of functions derived from different domains. The tool can be combined with PFP (Protein Functional Prediction) in a single interface.
Includes cross-references to other biological resources such as Pfam, SCOP, CATH, GO, InterPro and the NCBI taxonomy database. The Structure Integration with Function, Taxonomy and Sequences resource (SIFTS) focuses on standardizing taxonomy information in the PDB based on the NCBI taxonomy database, and on adding cross-references to UniProtKB for all protein sequences in the PDB that are present in the UniProt database. It has two main components: a semi-automated process that identifies the correct and up-to-date UniProtKB cross-reference for protein chains in the PDB, and an automated pipeline that generates residue-level correspondences between proteins in the PDB and the corresponding UniProtKB sequence.
Allows prediction of protein-to-protein and phenotype-to-protein functional associations based on phylogenetic profiling. ProtPhylo achieves flexibility and state-of-the-art taxonomic and functional coverage by generating phylogenetic profiles. It covers more than 9 million non-redundant protein sequences across over 2000 organisms and implements four independent orthology detection algorithms. The tool also allows prediction of subcellular localization, protein domains and membrane-spanning regions, and provides complementary evidence of protein-protein interactions (PPIs).
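Phylogenetic profiling, as used in the ProtPhylo entry, represents each protein by its presence or absence across a set of organisms and treats proteins with similar profiles as candidates for a functional association. The sketch below is a minimal illustration under stated assumptions: profiles are binary lists, similarity is measured with the Jaccard index (chosen here for simplicity, not taken from ProtPhylo), and the protein names and profiles are made up.

```python
def jaccard_similarity(profile_a, profile_b):
    """Jaccard index of two binary presence/absence profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same organisms")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    # Two all-absent profiles share no evidence, so score them 0.0.
    return both / either if either else 0.0


def rank_by_profile(query, profiles):
    """Rank all other proteins by profile similarity to the query."""
    query_profile = profiles[query]
    scored = [
        (name, jaccard_similarity(query_profile, profile))
        for name, profile in profiles.items()
        if name != query
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical profiles: one column per organism, 1 = ortholog present.
    profiles = {
        "protA": [1, 1, 0, 1, 0],
        "protB": [1, 1, 0, 1, 0],
        "protC": [0, 0, 1, 0, 1],
        "protD": [1, 0, 0, 1, 0],
    }
    for name, score in rank_by_profile("protA", profiles):
        print(name, round(score, 3))
```

In practice the presence calls would come from an orthology detection step, and the choice of distance measure affects which associations are recovered.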
A support vector machine (SVM)-based method for predicting families and subfamilies of GPCRs from the dipeptide composition of proteins. The method classified GPCRs and non-GPCRs with an accuracy of 99.5% when evaluated using 5-fold cross-validation. It can also predict five major classes or families of GPCRs with an overall Matthews correlation coefficient (MCC) of 0.81 and an accuracy of 97.5%.
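The two quantities named in the entry above can be sketched briefly. Dipeptide composition is the fraction of each of the 400 ordered amino-acid pairs among all adjacent pairs in a sequence, and the Matthews correlation coefficient summarizes a binary confusion matrix in a single value between -1 and 1. The code is an illustration only: the handling of non-standard residues (pairs containing them are skipped) is an assumption, the SVM itself is not reproduced, and the function names are hypothetical.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs, in a fixed order so vectors are comparable.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-element vector of dipeptide fractions."""
    sequence = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        # Assumption: pairs with non-standard residues (e.g. X) are skipped.
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[pair] / total for pair in DIPEPTIDES]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal total is zero.
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    features = dipeptide_composition("MKTAYIAKQR")  # toy sequence
    print(len(features), round(sum(features), 6))
    print(mcc(tp=45, tn=40, fp=10, fn=5))  # made-up counts
```

A vector like `features` is the kind of fixed-length input an SVM can consume regardless of sequence length, which is the practical appeal of composition-based features.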
Allows users to determine various properties of each protein in an entire proteome. PA permits researchers to perform several tasks: (1) prediction of the GeneQuiz general function and Gene Ontology (GO) molecular function of a protein; (2) prediction of the subcellular localization; or (3) creation of a custom classifier to predict a new property. Moreover, this tool can be used for any user-specified ontology.
Serves for the extraction of sequence-driven features from the primary protein sequence followed by the application of a classification system trained on known animal toxins. ClanTox is a systematic scheme for proteome-wide prediction of toxin-like proteins. ClanTox can be used to rank the statistically significant sequences matching toxin-like criteria. This allows the user to focus on a relatively small fraction of high-confidence candidates.
An automated method for the prediction of protein function. CombFunc incorporates ConFunc, a function prediction method, with other approaches for function prediction that use protein sequence, gene expression and protein-protein interaction data.
A web server developed to predict protein function from a combination of three orthogonal approaches. Sequence similarity and domain architecture searches are combined with protein-protein interaction network data to derive consensus predictions for GO terms using functional enrichment. The INGA server can be queried both programmatically through RESTful services and through a web interface designed for usability. The latter provides output supporting the GO term predictions with the annotating sequences. The method was ranked among the best predictors by the CAFA assessors (2014).