Protein domain prediction software tools | Sequence analysis
Protein domains are conserved, distinct regions of protein sequence and structure that can function independently of the rest of the protein. Domains often carry a specific function or mediate a specific interaction, and contribute to the overall activity of the protein. Protein domain prediction tools combine the protein sequence and biochemical properties such as hydrophobicity with an algorithm to predict and identify domains.
Gives access to many free software tools for sequence analysis. EMBOSS aims to serve the molecular biology community. It supports the creation and release of software in an open-source spirit and integrates sequence analysis tools into a seamless whole. It is free of charge and open source.
Finds evolutionarily related proteins and/or domains, including close and remote homologs. HMMER is based on profile hidden Markov models (HMMs) and provides four search programs: phmmer, hmmscan, hmmsearch, and jackhmmer. It assists users in the detection of protein sequence conservation, function, and evolution. This tool can be useful for functional annotation. It offers a way to search protein sequence databases.
A fast and simple method to identify globular domains in protein sequence, based on the observed lengths and hydrophobicities of domains from proteins with known tertiary structure. The prediction method successfully identifies sequence regions that will form a globular structure and those that are likely to be unstructured. The method does not rely on homology searches and, therefore, can identify previously unknown domains for structural elucidation.
Combines transmembrane topology and signal peptide predictions. Phobius provides an easy and accurate means to predict signal peptides and transmembrane topology from an amino acid sequence. Phobius makes an optimal choice between transmembrane segments and signal peptides, and also allows constrained and homology-enriched predictions.
Offers search options for the identification of peptides and proteins in proteomics experiments. Comet implements the fast cross-correlation algorithm to score peptide sequences against experimental tandem mass spectra. The spectral pre-processing implemented in the fast cross-correlation algorithm eliminates the need for creating and storing theoretical spectra. Comet facilitates analyses that combine search results from multiple search engines.
Detects amino acids or regions under positive selection using a sliding-window KA/KS analysis. SWAKK is a web application with three features: (i) sliding-window KA/KS analysis on a 3D structure, starting from a sequence alignment; (ii) analysis on the primary sequence, either for comparison or when no structure is available; (iii) detection of natural selection on an ancestral branch of a phylogenetic tree from two inferred sequences.
Discovers complete domains within protein sequences. Global makes alignment of individual blocks with a query protein sequence containing gapless local alignment. It offers a bridge between hidden Markov models (HMMs) and the wealth of statistical and computational techniques available for classical alignment. This tool can compute its E-values by dynamic programming.
A method to predict the domain boundaries of a multidomain protein from its amino acid sequence using a fuzzy mean operator. Using the nr-sequence database together with a reference protein set (RPS) containing known domain boundaries, the operator is used to assign a likelihood value for each residue of the query sequence as belonging to a domain boundary. This procedure robustly identifies contiguous boundary regions. For a dataset with a maximum sequence identity of 30%, the average domain prediction accuracy of our method is 97% for single-domain proteins and 58% for multidomain proteins.
Predicts continuous domains (CDs) and discontinuous domains (DCDs). ThreaDomEx identifies multiple structure templates and then derives a domain conservation score (DCscore) profile for domain-segment assignment. It was tested on a set of 1111 proteins and achieved a better normalized domain overlap score than other state-of-the-art methods.
Improves domain predictions for the genome of the poorly annotated malaria parasite Plasmodium falciparum. dPUC incorporates pairwise context scores between domains, along with traditional domain scores and thresholds, and improves domain prediction across a variety of organisms from bacteria to protozoa and metazoa. Among the genomes tested, dPUC is most successful at improving predictions for the poorly annotated malaria parasite.
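The idea of combining per-domain scores with pairwise context scores can be illustrated with a small sketch. This is not dPUC's actual algorithm or scoring scheme: the function names, the dictionary layout, the brute-force subset search, and all numbers are assumptions made for illustration only.

```python
from itertools import combinations

def overlaps(a, b):
    """True if two candidate domains share residues (half-open ranges)."""
    return a["start"] < b["end"] and b["start"] < a["end"]

def best_architecture(candidates, context, threshold=0.0):
    """Pick the non-overlapping subset with the highest total score.

    total = sum(domain score - threshold) + sum(pairwise context scores)
    `context` maps frozenset({family_a, family_b}) to a bonus or penalty.
    Brute force over subsets, so only suitable for a handful of candidates.
    """
    best, best_score = (), 0.0
    for r in range(1, len(candidates) + 1):
        for subset in combinations(candidates, r):
            pairs = list(combinations(subset, 2))
            if any(overlaps(a, b) for a, b in pairs):
                continue
            total = sum(d["score"] - threshold for d in subset)
            total += sum(
                context.get(frozenset((a["family"], b["family"])), 0.0)
                for a, b in pairs
            )
            if total > best_score:
                best, best_score = subset, total
    return list(best), best_score

# Hypothetical candidates: F2 is weak on its own but is often seen with F1.
candidates = [
    {"family": "F1", "score": 10.0, "start": 0, "end": 50},
    {"family": "F2", "score": -2.0, "start": 60, "end": 100},
    {"family": "F3", "score": 1.0, "start": 55, "end": 105},
]
context = {frozenset(("F1", "F2")): 5.0}
chosen, score = best_architecture(candidates, context)
print([d["family"] for d in chosen], score)
```

In this toy input the weak F2 candidate is kept because its context bonus with F1 outweighs its low individual score, which is the kind of rescue that context-aware scoring is meant to enable.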
Allows prediction of helical linkers. Fast H-DROP is an accelerated version of H-DROP, a support vector machine (SVM)-based tool aiming at specifically predicting helical linkers. The software was tested using an independent dataset consisting of 76 visually inspected helical linkers containing multidomain proteins and a set of sequences classified as single domain proteins according to SCOP 1.73. It can assist users in analyzing novel domains connected by helical linkers.
A comprehensive domain visualization tool which combines the best available search algorithms and databases into a user-friendly framework. First, a given protein sequence is matched to domain models using high-specificity tools and only then unmatched segments are subjected to more sensitive algorithms resulting in a best possible comprehensive coverage. Bulk querying and rich visualization and download options provide improved functionality to domain architecture analysis.
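The two-stage strategy described above (match with high-specificity tools first, then pass only the unmatched segments to more sensitive algorithms) depends on finding the uncovered stretches of the sequence. A minimal sketch of that step follows; the function name, the half-open interval convention, and the `min_len` cutoff are assumptions, not part of the tool.

```python
def unmatched_segments(seq_len, matches, min_len=1):
    """Return half-open (start, end) ranges not covered by any match.

    `matches` is a list of (start, end) half-open ranges reported by the
    first, high-specificity pass. Gaps shorter than `min_len` are dropped.
    """
    gaps = []
    cursor = 0  # first residue not yet covered by a match
    for start, end in sorted(matches):
        if start - cursor >= min_len:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if seq_len - cursor >= min_len:
        gaps.append((cursor, seq_len))
    return gaps

# Hypothetical first-pass hits on a 100-residue protein.
first_pass = [(10, 30), (25, 50), (80, 100)]
print(unmatched_segments(100, first_pass, min_len=5))
```

Each returned range could then be submitted to a second, more sensitive search, and its hits merged with the first-pass matches.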
Discovers position specific scoring matrices (PSSMs). PoSSuMsearch is a non-heuristic algorithm that can generate PSSMs from aligned sequences. It employs enhanced suffix arrays, a data structure which is as powerful as suffix trees. This tool is able to compute the E-value for a known background distribution and length of the database by exhaustive enumeration of all substrings. It can prevent rounding errors for integer based PSSMs.
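For readers unfamiliar with PSSMs, the sketch below builds a log-odds matrix from gapless aligned sequences and scans a target with a plain sliding window. It does not use enhanced suffix arrays and does not compute E-values, so it is not the PoSSuMsearch algorithm; the uniform background, the pseudocount of 1, and the function names are assumptions.

```python
import math

def build_pssm(aligned, alphabet, background=None, pseudocount=1.0):
    """Build a log2-odds PSSM (one dict per column) from gapless sequences."""
    if background is None:
        # Assumption: uniform background frequencies.
        background = {a: 1.0 / len(alphabet) for a in alphabet}
    pssm = []
    for col in range(len(aligned[0])):
        counts = {a: pseudocount for a in alphabet}
        for seq in aligned:
            counts[seq[col]] += 1
        total = sum(counts.values())
        pssm.append(
            {a: math.log2((counts[a] / total) / background[a]) for a in alphabet}
        )
    return pssm

def scan(pssm, seq, threshold):
    """Score every window of len(pssm) in seq; return (position, score) hits."""
    m = len(pssm)
    hits = []
    for i in range(len(seq) - m + 1):
        score = sum(pssm[j][seq[i + j]] for j in range(m))
        if score >= threshold:
            hits.append((i, score))
    return hits

# Toy nucleotide example; real protein PSSMs use the 20-letter alphabet.
pssm = build_pssm(["ACG", "ACG", "ACG"], "ACGT")
print(scan(pssm, "TTACGTT", threshold=0.0))
```

The naive scan costs time proportional to the sequence length times the matrix length; index structures such as enhanced suffix arrays exist to avoid rescoring shared substrings.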
A template-based algorithm for protein domain boundary prediction. Given a protein sequence, ThreaDom first threads the target through the PDB library to identify protein templates that have a similar structural fold. A domain conservation score (DCS) is then calculated for each residue, combining information from template domain structure, terminal and internal gaps, and insertions. Finally, the domain boundary information is derived from the DCS profile distributions. ThreaDom is designed to predict both continuous and discontinuous domains.
An application that unifies protein domain annotation, domain arrangement analysis and visualization in a single tool. DoMosaics simplifies the analysis of protein families by consolidating disjunct procedures based on often inconvenient command-line applications and complex analysis tools. It provides a simple user interface with access to domain annotation services such as InterProScan or a local HMMER installation, and can be used to compare, analyze and visualize the evolution of domain architectures.
Allows users to conduct global tests in proteomics experiments. RepeatedHighDim is based on a mixed linear model combined with a permutation procedure and missing-value imputation. It is able to detect differences between experimental groups that may be missed by standard protein-wise tests in proteomics experiments. The tool aims to facilitate the biological interpretation of a proteomics experiment. It permits the ranking of Gene Ontology (GO) terms related to certain protein sets.
A program to predict inter-domain linker regions solely from amino acid sequence information. The prediction is made using a linker index deduced from a data set of domain/linker segments. The linker preference profile, which is the averaged linker index along a sequence, can be visualized in the graphical interface.
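A linker preference profile of the kind described above can be sketched as a moving average of a per-residue index. The index values below are invented placeholders, not the published linker index, and the window size and the convention that higher values mean more linker-like are assumptions.

```python
def linker_preference_profile(seq, linker_index, window=5):
    """Centered moving average of a per-residue linker index.

    Windows are truncated at the sequence ends, so the profile has one
    value per residue.
    """
    values = [linker_index[aa] for aa in seq]
    half = window // 2
    profile = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        profile.append(sum(values[lo:hi]) / (hi - lo))
    return profile

# Placeholder index for a four-letter toy alphabet (hypothetical values).
TOY_INDEX = {"P": 1.0, "G": 0.5, "A": 0.0, "L": -1.0}
print(linker_preference_profile("LLPPLL", TOY_INDEX, window=3))
```

Residues where the smoothed profile peaks would be the candidate linker positions under this toy convention.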
A support vector machine (SVM)-based domain linker predictor which was trained with 25 optimal features. The optimal combination of features was identified from a set of 3000 features using a random forest algorithm complemented with a stepwise feature selection. DROP demonstrated a prediction sensitivity and precision of 41.3 and 49.4%, respectively. These values were over 19.9% higher than those of control SVM predictors trained with non-optimized features, strongly suggesting the efficiency of our feature selection method.
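Sensitivity and precision, the two figures quoted above, can be computed as follows. This sketch scores at the level of individual residue positions, which is an assumption; the original evaluation may count whole linkers instead.

```python
def sensitivity_precision(predicted, actual):
    """Return (sensitivity, precision) for predicted vs. true positions.

    sensitivity = TP / (TP + FN): fraction of true positions recovered
    precision   = TP / (TP + FP): fraction of predictions that are true
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    sensitivity = true_positives / len(actual) if actual else 0.0
    precision = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, precision

# Hypothetical example: predicted linker at residues 10-19, true at 15-24.
print(sensitivity_precision(range(10, 20), range(15, 25)))
```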
Treats protein domain architecture prediction as a multi-objective optimization problem. By taking into account known architectural solutions, DAMA identifies them within the protein sequence and integrates new domains into them whenever possible. DAMA has been evaluated over a benchmark containing protein sequences extracted from the Protein DataBank (PDB), over the genome of the poorly annotated malaria parasite Plasmodium falciparum and over two datasets collecting known sequences characterized by large domain architectures and repeated blocks of domains. Our results show that, for all datasets, DAMA outperforms existing computational methods and detects domain architectures presenting co-occurrences.
Assists in the prioritization and dereplication of nonribosomal peptide synthetases (NRPSs) within large datasets. By prioritizing novel scaffolds and analogs within superfamilies of interest, SANDPUMA greatly increases the power of genomic natural product discovery efforts. It was designed for automatic retraining, so that its training data remain comprehensive as more NRPS biosynthetic gene clusters (BGCs) are experimentally characterized in the Minimum Information about a Biosynthetic Gene Cluster (MIBiG) database.