Protein sequence clustering software tools

Clustering can help to organize sequences into homologous and functionally similar groups and can improve the speed, sensitivity, and readability of homology searches. Source text: (Hauser et al., 2013) kClust: fast and sensitive clustering of large…
UCLUST
Desktop

A clustering method that uses USEARCH to assign sequences to clusters. UCLUST is reported to be superior to CD-HIT: it is usually significantly faster, uses significantly less memory, can cluster at lower…

KEGG OC
Dataset

KEGG OC (KEGG Ortholog Cluster)

Provides ortholog clusters (OCs) based on whole-genome comparison. KEGG Ortholog Cluster employs a clustering method that was applied to all possible protein-coding genes in all complete genomes,…

KAPPA
Desktop

KAPPA (Key Aminoacid Pattern-based Protein analyzer)

Automates sequence searches for the discovery and clustering of ‘X-rich proteins’. KAPPA extracts and compares cysteine patterns by means of a quantitative similarity index called…
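
The pattern-extraction step described above can be illustrated with a minimal Python sketch. It assumes that a cysteine pattern is simply the list of residue gaps between consecutive cysteines, written in the familiar `C-x3-C` notation. Because the name of KAPPA's own similarity index is cut off above, `spacing_similarity` is a hypothetical placeholder score, not KAPPA's index.

```python
def residue_gaps(seq: str, residue: str = "C") -> list[int]:
    """Gaps (in residues) between consecutive occurrences of `residue`."""
    positions = [i for i, aa in enumerate(seq.upper()) if aa == residue]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


def pattern_string(gaps: list[int], residue: str = "C") -> str:
    """Render gaps in the common C-x3-C-x5-C notation."""
    return residue + "".join(f"-x{g}-{residue}" for g in gaps)


def spacing_similarity(g1: list[int], g2: list[int]) -> float:
    """Hypothetical score: fraction of positionally identical gaps,
    normalized by the longer pattern (NOT KAPPA's similarity index)."""
    if not g1 and not g2:
        return 1.0
    same = sum(1 for a, b in zip(g1, g2) if a == b)
    return same / max(len(g1), len(g2))


gaps = residue_gaps("MCAAACGGGGGCK")
print(pattern_string(gaps))              # C-x3-C-x5-C
print(spacing_similarity(gaps, [3, 4]))  # 0.5
```

Passing another residue, e.g. `residue="H"`, applies the same idea to other X-rich proteins.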

DACE
Desktop

DACE (DP-means Algorithm for Clustering Extremely large sequencing data)

Efficiently clusters extremely large sequencing datasets for de novo picking of operational taxonomic units (OTUs). DACE is a scalable parallel DP-means algorithm with a distance-preserving random…
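
DP-means, the algorithm that DACE parallelizes, behaves like k-means except that a point whose squared distance to every centroid exceeds a penalty λ (`lam`) opens a new cluster, so the number of clusters is not fixed in advance. The serial Python sketch below assumes that sequences have already been embedded as fixed-length numeric vectors (for example k-mer frequency vectors; the embedding is an assumption) and omits DACE's parallelization and random projection.

```python
Point = tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: list[Point]) -> Point:
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def dp_means(points: list[Point], lam: float, max_iter: int = 100):
    """Serial DP-means; returns (labels, centroids)."""
    centroids = [mean(points)]   # start with one cluster at the global mean
    labels = [0] * len(points)
    for _ in range(max_iter):
        changed = False
        # Assignment step: nearest centroid, or open a new cluster if too far away
        for i, p in enumerate(points):
            dists = [sq_dist(p, c) for c in centroids]
            best = min(range(len(centroids)), key=dists.__getitem__)
            if dists[best] > lam:
                centroids.append(p)
                best = len(centroids) - 1
            if best != labels[i]:
                labels[i] = best
                changed = True
        # Update step: drop empty clusters, relabel, recompute means
        used = sorted(set(labels))
        remap = {old: new for new, old in enumerate(used)}
        labels = [remap[lab] for lab in labels]
        centroids = [mean([p for p, lab in zip(points, labels) if lab == c])
                     for c in range(len(used))]
        if not changed:
            break
    return labels, centroids


pts = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0)]
print(dp_means(pts, lam=1.0)[0])  # [0, 0, 1, 1]
```

A larger `lam` yields fewer, broader clusters; it plays roughly the role that the cluster count k plays in k-means.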

OGCleaner
Desktop

Filters putative homology clusters of amino acid sequences using machine learning algorithms based on annotated orthology clusters. OGCleaner is designed for homology cluster filtering by…

VSEARCH
Desktop

Processes and prepares metagenomics, genomics and population genomics nucleotide sequence data. VSEARCH is an alternative to the USEARCH tool. It includes most commands for analysing nucleotide…

GC
Desktop

GC (Granular Clustering)

A clustering algorithm, based on the granular clustering paradigm, for obtaining partial protein models. The general principles of GC are as follows: primitive information granules are created…

USEARCH
Desktop

Offers search and clustering algorithms that are often orders of magnitude faster. USEARCH is a sequence analysis package that combines several such algorithms.

FastaHerder
Web

Clusters protein databases by aggregating near-full-length homologs. FastaHerder is an application for gathering sets of protein sequences and mining the resulting clusters. This web app adds two clustering…

DASP3
Desktop

Identifies database sequences that share motifs similar to a query active site profile. DASP3 is a modified version of previously published software, the Deacon Active Site Profiler (DASP). DASP3 is…

CABRA
Web

CABRA (Cluster and Annotate Blast Results Algorithm)

Provides a shortcut for evaluating BLAST results: clustering the hits allows them to be classified quickly. CABRA integrates the advantages of a BLAST search and FastaHerder clustering…

BASID2CS
Dataset

BASID2CS (the basidiomycetes Two-Component Systems repository)

A pipeline web server that extends the analysis to the complete genome sequences of basidiomycetes. BASID2CS has been specifically designed for the identification, classification and functional…

CD-HIT
Desktop
Web

A widely used program for clustering biological sequences to reduce sequence redundancy and improve the performance of other sequence analyses. In response to the rapid increase in the amount of…
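
The greedy incremental strategy usually associated with CD-HIT can be sketched as follows: sequences are processed from longest to shortest, each one is compared only with existing cluster representatives, and a cheap short-word (k-mer) count filter skips comparisons that cannot reach the identity threshold. This is a minimal illustration under stated assumptions, not CD-HIT's implementation: identity is estimated with Python's `difflib.SequenceMatcher` as a stand-in for a real alignment, and the word bound assumes substitutions only (no gaps).

```python
from difflib import SequenceMatcher


def word_counts(seq: str, k: int) -> dict[str, int]:
    """Count overlapping words (k-mers) of length k."""
    counts: dict[str, int] = {}
    for i in range(len(seq) - k + 1):
        w = seq[i:i + k]
        counts[w] = counts.get(w, 0) + 1
    return counts


def shared_words(a: dict[str, int], b: dict[str, int]) -> int:
    return sum(min(n, b.get(w, 0)) for w, n in a.items())


def approx_identity(query: str, rep: str) -> float:
    # Stand-in for a real alignment: matched characters / query length.
    sm = SequenceMatcher(None, query, rep, autojunk=False)
    return sum(block.size for block in sm.get_matching_blocks()) / len(query)


def greedy_cluster(seqs: list[str], threshold: float = 0.9, k: int = 2) -> list[list[int]]:
    order = sorted(range(len(seqs)), key=lambda i: len(seqs[i]), reverse=True)
    words = [word_counts(s, k) for s in seqs]
    reps: list[int] = []
    clusters: dict[int, list[int]] = {}
    for i in order:
        L = len(seqs[i])
        # Each mismatch destroys at most k words, so a pair at >= threshold
        # identity must share at least this many words.
        min_shared = (L - k + 1) - k * L * (1 - threshold)
        for r in reps:
            if shared_words(words[i], words[r]) < min_shared:
                continue  # word filter: skip the expensive comparison
            if approx_identity(seqs[i], seqs[r]) >= threshold:
                clusters[r].append(i)
                break
        else:  # no representative was close enough: start a new cluster
            reps.append(i)
            clusters[i] = [i]
    return list(clusters.values())


print(greedy_cluster(["MKTAYIAKQR", "MKTAYIAKQK", "GGGGSSSSPP"]))  # [[0, 1], [2]]
```

Processing longer sequences first means each representative is at least as long as the sequences assigned to it.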

Arion 4 Omics
Desktop

A high-performance, ‘end-to-end’ analysis pipeline for the classification of omics profiles. It incorporates a highly parallel architecture and sophisticated database technologies to overcome…

CLAP
Web

A web server for the automatic classification of protein sequences. It uses an alignment-free approach to compute local similarities among sequences. This method is particularly useful for comparing…

CLUSS
Desktop

An alignment-free algorithm for clustering protein families. It is effective on both alignable and non-alignable protein families.
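
Alignment-free methods replace alignment with statistics on short words, which is why they also work on families that cannot be aligned reliably. The Python sketch below uses cosine similarity between k-mer count profiles as a generic example of the idea; it is not CLUSS's own similarity measure, and k = 3 is an arbitrary choice.

```python
import math
from collections import Counter


def kmer_profile(seq: str, k: int = 3) -> Counter:
    """Count overlapping k-mers; no alignment is needed."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine(p: Counter, q: Counter) -> float:
    dot = sum(n * q[w] for w, n in p.items())  # Counter returns 0 for absent words
    norm = math.sqrt(sum(n * n for n in p.values())) * math.sqrt(sum(n * n for n in q.values()))
    return dot / norm if norm else 0.0


def similarity_matrix(seqs: list[str], k: int = 3) -> list[list[float]]:
    profiles = [kmer_profile(s, k) for s in seqs]
    return [[cosine(a, b) for b in profiles] for a in profiles]


m = similarity_matrix(["MKTAYIAKQR", "MKTAYIAKQK", "GGGGSSSSPP"])
print([[round(x, 2) for x in row] for row in m])
```

A matrix like this could then be passed to any standard clustering step.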

kClust
Desktop

A method that clusters large protein sequence databases, such as UniProt, within days, down to 20-30% maximum pairwise sequence identity.

PASS
Desktop

PASS (Protein Assembler with Short Sequence peptides)

A proteomics application for de novo assembly of millions of very short (6 aa) to longer (100 aa) peptide sequences and beyond. PASS is derived from the popular genome assembler SSAKE, an…
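
SSAKE, from which PASS is derived, assembles contigs by progressive 3' extension: a contig grows with reads whose prefixes overlap its current 3' end. The Python sketch below shows a simplified greedy version of that extension for peptides. It always takes the single longest overlap and ignores SSAKE's consensus building and k-mer lookup structure; `min_overlap` is a hypothetical parameter.

```python
def greedy_extend(seed: str, peptides: list[str], min_overlap: int = 3) -> str:
    """Extend `seed` to the right using the peptide with the longest
    suffix/prefix overlap, until no peptide overlaps by >= min_overlap."""
    contig = seed
    unused = [p for p in peptides if p != seed]  # don't re-use the seed itself
    while True:
        best = None  # (overlap length, index into unused)
        for idx, pep in enumerate(unused):
            # the overlap must leave at least one new residue to append
            longest = min(len(contig), len(pep) - 1)
            for ov in range(longest, min_overlap - 1, -1):
                if contig.endswith(pep[:ov]):
                    if best is None or ov > best[0]:
                        best = (ov, idx)
                    break  # longest overlap for this peptide found
        if best is None:
            return contig
        ov, idx = best
        contig += unused.pop(idx)[ov:]  # append the non-overlapping tail


print(greedy_extend("MKTAYI", ["TAYIAK", "IAKQRQ", "GGGGGG"]))  # MKTAYIAKQRQ
```

Each round consumes one peptide, so the loop always terminates.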

fast_protein_cluster
Desktop

fast protein cluster

A toolkit used to cluster 60,000 sets of protein models generated by the Nutritious Rice for the World project. fast_protein_cluster implements k-means and hierarchical clustering methods using root mean…
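
Both clustering methods named above operate on RMSD between protein models. As a minimal Python illustration (not fast_protein_cluster's code), the sketch below computes RMSD between models whose coordinates are assumed to be already superposed (no optimal fitting is performed). It then cuts a single-linkage hierarchy at a hypothetical RMSD cutoff, which is equivalent to taking connected components of the graph whose edges join models within the cutoff.

```python
import math
from itertools import combinations

Coords = list[tuple[float, float, float]]


def rmsd(a: Coords, b: Coords) -> float:
    """RMSD of two equal-length, already superposed coordinate sets."""
    if len(a) != len(b):
        raise ValueError("models must have the same number of atoms")
    sq = sum((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
             for (x1, y1, z1), (x2, y2, z2) in zip(a, b))
    return math.sqrt(sq / len(a))


def single_linkage(models: list[Coords], cutoff: float) -> list[list[int]]:
    """Cut a single-linkage hierarchy at `cutoff` using union-find."""
    parent = list(range(len(models)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(models)), 2):
        if rmsd(models[i], models[j]) <= cutoff:
            parent[find(i)] = find(j)  # merge the two clusters
    groups: dict[int, list[int]] = {}
    for i in range(len(models)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


m1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
m2 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.1)]
m3 = [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0)]
print(single_linkage([m1, m2, m3], cutoff=1.0))  # [[0, 1], [2]]
```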

MagicMatch
Desktop

Maps sequence identifiers across databases. MagicMatch uses the MD5 checksum algorithm (originally designed for message integrity) to generate sequence fingerprints and uses these fingerprints as hash strings to map…
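
The fingerprint-based mapping described above can be sketched in Python with the standard `hashlib` module. The normalization (removing whitespace and upper-casing before hashing) and the dictionary-based database format are assumptions for illustration, not MagicMatch's documented behavior.

```python
import hashlib
from collections import defaultdict


def fingerprint(seq: str) -> str:
    """MD5 hex digest of a normalized sequence (normalization is an assumption)."""
    clean = "".join(seq.split()).upper()
    return hashlib.md5(clean.encode("ascii")).hexdigest()


def map_identifiers(db_a: dict[str, str], db_b: dict[str, str]) -> dict[str, list[str]]:
    """For each identifier in db_a, list the db_b identifiers with an identical sequence."""
    index: dict[str, list[str]] = defaultdict(list)
    for ident, seq in db_b.items():
        index[fingerprint(seq)].append(ident)  # fingerprint used as hash key
    return {ident: list(index.get(fingerprint(seq), [])) for ident, seq in db_a.items()}


print(map_identifiers({"P1": "MKTAYIAKQR"}, {"X9": "mktayiakqr\n", "X10": "GGGG"}))
# {'P1': ['X9']}
```

Because identical sequences always produce the same digest, each lookup is a dictionary access rather than a pairwise sequence comparison.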
