Protein similarity network generation software tools | Protein interaction data analysis
Protein similarity networks are graphical representations of sequence, structural and other similarities among proteins for which pairwise all-by-all similarity connections have been calculated. Mapping of biological and other information to network nodes or edges enables hypothesis creation about sequence-structure-function relationships across sets of related proteins.
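As a minimal sketch of the all-by-all construction described above, the following Python example compares every pair of sequences and keeps an edge when the score reaches a cutoff. The `identity` measure, the `build_network` helper, the toy sequences and the 0.75 cutoff are illustrative assumptions; the tools listed below typically rely on alignment-based scores such as BLAST E-values instead.

```python
from itertools import combinations


def identity(a, b):
    """Toy similarity: fraction of matching positions over the longer length.

    Assumption for illustration only; real tools use alignment-based scores.
    """
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def build_network(seqs, threshold):
    """Compare all pairs and keep an edge when similarity >= threshold."""
    edges = {}
    for u, v in combinations(sorted(seqs), 2):
        score = identity(seqs[u], seqs[v])
        if score >= threshold:
            edges[(u, v)] = score
    return edges


# Hypothetical toy sequences: three close variants and one unrelated protein.
seqs = {
    "p1": "MKTAYIAK",
    "p2": "MKTAYIAR",
    "p3": "MKTGYIAK",
    "p4": "GGSLLPWQ",
}
network = build_network(seqs, threshold=0.75)
```

Here `p1`, `p2` and `p3` end up mutually connected, while `p4` stays isolated. Node attributes such as function or organism could then be attached to each key for visual inspection.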
Creates and manages protein similarity networks. Pythoscape provides several options to calculate pairwise similarities for input sequences or structures, applies filters to network edges, and defines sets of similar nodes and their associated data as single nodes (termed representative nodes) to compress network information. It outputs data or formatted files for visualization.
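The idea of representative nodes can be sketched as a greedy grouping: each protein joins the first representative it resembles closely enough, and otherwise becomes a new representative. This is a generic illustration under assumed names (`pick_representatives`, `scores`, a 0.9 cutoff), not Pythoscape's own interface or algorithm.

```python
def pick_representatives(names, similarity, cutoff):
    """Greedily collapse similar nodes into representative nodes.

    Returns a dict mapping each representative to the members it stands for.
    """
    reps = {}
    for name in names:
        for rep in reps:
            if similarity(name, rep) >= cutoff:
                reps[rep].append(name)
                break
        else:
            # No existing representative is similar enough: start a new group.
            reps[name] = [name]
    return reps


# Hypothetical pairwise scores; missing pairs are treated as 0.0.
scores = {
    frozenset(("a", "b")): 0.95,
    frozenset(("a", "c")): 0.50,
    frozenset(("b", "c")): 0.55,
    frozenset(("c", "d")): 0.92,
}


def lookup(u, v):
    return scores.get(frozenset((u, v)), 0.0)


groups = pick_representatives(["a", "b", "c", "d"], lookup, cutoff=0.9)
```

With these toy scores, four nodes compress to two representative nodes, `a` and `c`. The result depends on input order, which is a known property of greedy schemes of this kind.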
Gathers composite and component gene families while limiting the number of false positives. MosaicFinder is based on clique separator decomposition, a graph-theoretic technique. It allows users to identify these gene families directly in the similarity network. It is reliable for studying fusion events in phylogenetic research as well as in functional biology.
Predicts specific protein function via a model that utilizes a protein network and considers the Gene Ontology (GO). Effusion exploits a network of partially characterized sequences to provide function predictions. In addition to the GO, it uses sequence similarity networks (SSNs) and applies probabilistic graphical models (PGMs) in an extensible way.
Classifies proteins. MOCASSIN-prot uses quantitative sequence similarity information from all domains of the proteins to build a network that houses clusters of similar protein sequences. It is scalable to the complete proteome level. Incorporating information from all domains leads to clusters that are more consistent with the UniProt family assignments.
Detects composite gene families in large data sets. CompositeSearch is a program allowing identification of composite gene families in the range of several million sequences. This tool assists users in the investigation of the process of gene remodeling in large data sets, for example metagenomes and thousands of complete genomes. It provides descriptions regarding the distribution and primary sequence conservation of gene families, permitting critical biological analyses of data.
Generates protein similarity networks to be used with Cytoscape. PANADA allows the user to either automatically search similar sequences or to generate a network with a set of selected proteins. The similarity networks can be used for the visual analysis of similarity relationships among sequences or to assess functional annotation inferred from homology. PANADA complements other more traditional tools such as phylogenetic trees and multiple sequence alignments, making use of the user's visual skills to identify patterns that allow the inference of novel properties. The main advantages consist in the automatic search and annotation of proteins with gene ontology (GO) terms from the database and the ability to choose two different approaches to prune the network topology. This produces networks that only contain edges for those pairwise comparisons that represent the highest similarities above a given threshold.
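One way to picture this kind of pruning is shown below: edges under a similarity cutoff are dropped, and each node then keeps only its strongest remaining connections. The `prune_edges` function, its `top_k` parameter and the toy scores are assumptions made for illustration; they are not taken from PANADA, whose two pruning approaches may work differently.

```python
def prune_edges(edges, threshold, top_k):
    """Keep edges scoring >= threshold that rank in the top_k of an endpoint."""
    strong = {pair: s for pair, s in edges.items() if s >= threshold}

    # Collect, for every node, the surviving edges that touch it.
    by_node = {}
    for (u, v), s in strong.items():
        by_node.setdefault(u, []).append((s, (u, v)))
        by_node.setdefault(v, []).append((s, (u, v)))

    # An edge is kept if it is among the best top_k for at least one endpoint.
    kept = set()
    for scored in by_node.values():
        scored.sort(reverse=True)
        kept.update(pair for _, pair in scored[:top_k])
    return {pair: strong[pair] for pair in sorted(kept)}


# Hypothetical similarity scores between four proteins.
edges = {
    ("a", "b"): 0.9,
    ("a", "c"): 0.8,
    ("a", "d"): 0.6,
    ("b", "c"): 0.4,
    ("c", "d"): 0.7,
}
pruned = prune_edges(edges, threshold=0.5, top_k=1)
```

In this toy case the edge between `b` and `c` falls below the cutoff, and the edge between `a` and `d` is discarded because both endpoints have a stronger partner.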
Allows users to detect many family/superfamily relationships. SCPS provides an implementation of the spectral clustering algorithm that requires no background knowledge of programming or of the details of spectral clustering algorithms. It can calculate different cluster quality scores and produce publication-quality graphical representations of the clusters obtained.
Furnishes a quantitative measure of support values for the branching processes. SCANNET is able to discover communities in generally weighted complex networks. For protein similarity networks, it leads to a phylogenetic classification of the organisms associated with the protein sequences. This tool constructs similarity matrices, protein similarity networks, adjacency matrices and neighborhood matrices, and can characterize the properties of the critical network.
Facilitates analysis of sequence-function space in enzyme families using sequence similarity networks (SSNs). EFI-EST is a web server that generates SSNs in a predominantly automated manner. The software allows users to explore the local sequence-function space defined by a user-specified sequence and to generate the SSN for any Pfam or InterPro entry. It can be used to analyze sequence-function space in a functionally diverse enzyme superfamily.
Retrieves unit cells sharing characteristics with data stored in the Protein Data Bank. CRYST offers a web application that permits researchers to submit their own files or enter parameters manually. Users can also provide the number of amino acids instead of the molecular weight.
Generates accurate protein families using the Markov Cluster (MCL) formalism for graph clustering by flow simulation. TRIBE-MCL is an algorithm that allows the efficient and rapid clustering of any arbitrary set of protein sequences, given a list of all pairwise similarities obtained by another method, such as BLAST. TRIBE-MCL does not require any explicit knowledge of protein domains to detect protein families.
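The flow simulation behind Markov clustering alternates two steps on a column-stochastic matrix: expansion (squaring the matrix) and inflation (raising entries to a power and renormalizing columns). The sketch below is a small pure-Python illustration of that general scheme, not the TRIBE-MCL code. The function names, the inflation value of 2.0, the fixed iteration count and the toy weights are assumptions, and cluster membership is read off with a simplified rule that assigns each node to the row holding most of its column's mass.

```python
def normalize_columns(m):
    """Rescale every column so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain dense matrix product for small square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(weights, inflation=2.0, iterations=30):
    """Cluster a symmetric weight matrix by simulated flow."""
    n = len(weights)
    # Self-loops are added before normalizing, as is customary for MCL.
    m = [[weights[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = matmul(m, m)  # expansion: flow spreads along longer walks
        m = normalize_columns(
            [[x ** inflation for x in row] for row in m]
        )  # inflation: strong flows are boosted, weak flows fade
    # Simplified read-out: group nodes by the row attracting most of their flow.
    groups = {}
    for j in range(n):
        column = [m[i][j] for i in range(n)]
        groups.setdefault(column.index(max(column)), []).append(j)
    return sorted(groups.values())


# Hypothetical similarities: two tight triangles joined by one weak edge (2-3).
w = [
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
]
families = mcl(w)
```

On this toy input the weak bridge carries too little flow to survive inflation, so the two triangles separate into distinct families. In practice the weights would be derived from pairwise scores such as BLAST results, and a higher inflation value generally yields finer clusters.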
Provides a degree-normalized network connectedness metric inspired by network communicability and suitable for the analysis of large complex networks. NetComm is an implementation of this method for the human protein-protein interaction (PPI) network. This package provides a straightforward approach to computation even on large networks. It can also be useful in the analysis of a variety of biological networks.