BLAST / Basic Local Alignment Search Tool

Aligns query sequences against those present in a selected target database. BLAST is a suite of programs, provided by NCBI, that can be used to quickly search a sequence database for matches to a query sequence. The software provides a web access point to these tools for sequence alignment. The BLAST command-line applications are organized so that similar types of searches are grouped together in one application.

SMS / STING Millennium Suite

Provides a variety of algorithms and validated data, wrapped in a user-friendly web interface. STING Millennium Suite (SMS) is a web-based suite of programs and databases providing visualization and complex analysis of molecular sequence and structure for the data deposited at the Protein Data Bank (PDB). It brings together a number of protein analysis tools on a single web server. SMS enables a quick estimate of the level of engagement of each amino acid within its own protein chain and, more importantly from a functional standpoint, in the mechanism of binding to substrate and/or inhibitor.


Acts as a simple and intuitive interface between PyMOL and several bioinformatics tools (i.e., PSI-BLAST, Clustal Omega, MUSCLE, CAMPO, PSIPRED, and MODELLER). PyMod builds homology models through the popular MODELLER package. Starting from the amino acid sequence of a target protein, users can use PyMod to carry out the three steps of the homology modeling process (template searching, target-template sequence alignment and model building) in order to build a 3D atomic model of a target protein or protein complex. PyMod can also be used outside the homology modeling context to extend PyMOL with additional functionality. Sequence similarity searches, multiple sequence-structure alignments and evolutionary conservation analyses can all be performed in the PyMod 2.0/PyMOL environment.

C-HMM / C-Hidden Markov Models

Identifies remote homologues from any protein sequence database. The C-HMM approach, based on hidden Markov models, has been shown to be powerful in identifying connections across protein domains that share the same fold but are structurally and functionally very diverse. It achieved 94%, 83% and 40% coverage at the family, superfamily and fold levels respectively when applied to diverse protein folds. Its authors recommend it for genome annotation pipelines because of its speed, reliability, efficiency and database independence.

SRD / Sequence Relation Drawing program

Supports the graphical visualization of results generated from sequence relationship analysis based on undirected graphs, for a wide range of nucleic acid or peptide sequences. SRD consists of two components: a Window-based application and a computerized database. It helps to investigate patterns in the spread of AIDS, which may help biologists or clinicians control AIDS transmission in molecular epidemiology studies.


Explores protein sequence space and protein structure space simultaneously by cross-modal learning. CMsearch has several advantages over existing methods: (i) instead of exploring a single space built from a mixture of sequence and structure similarities, CMsearch builds two separate spaces and explores them simultaneously; (ii) CMsearch is completely different from threading methods because it uses not only sequence and structure information, but also sequence and structure space information; (iii) CMsearch is a generic framework in which any sequence similarity metric and any structure similarity metric can be adopted.


A Python programming interface for the RCSB Protein Data Bank (PDB) that allows search and data retrieval for a wide range of result types, including BLAST and sequence motif queries. The API relies on the existing XML-based API and operates by creating custom XML requests from native Python types, allowing extensibility and straightforward modification. The package has the ability to perform many types of advanced search of the Protein Data Bank that are otherwise only available through the PDB website.

SparkBLAST / Spark Basic Local Alignment Search Tool

Parallelizes and manages the execution of BLAST on either dedicated clusters or cloud environments. SparkBLAST relies on cloud computing for the provisioning of computational resources and uses Apache Spark as the coordination framework. It was evaluated on both the Google and Microsoft Azure clouds, for several configurations and dataset sizes. The tool achieves, on average, a maximum speedup of 41.78, reducing the execution time from 28,983 s on a single node to 693 s on 64 nodes.
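The timing figures above can be checked with a little arithmetic. The sketch below is only an illustration: the function names are hypothetical, and it uses the standard definitions of speedup (serial time divided by parallel time) and parallel efficiency (speedup divided by node count), which the listing itself does not spell out.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Classic speedup: how many times faster the parallel run is."""
    return t_serial / t_parallel


def parallel_efficiency(s: float, n_nodes: int) -> float:
    """Fraction of ideal linear scaling achieved on n_nodes."""
    return s / n_nodes


# Figures quoted in the SparkBLAST entry.
single_node_s = 28983.0
sixty_four_nodes_s = 693.0

s = speedup(single_node_s, sixty_four_nodes_s)
e = parallel_efficiency(s, 64)
print(f"speedup ~ {s:.2f}, efficiency ~ {e:.1%}")
```

The ratio of the two quoted times comes out near 41.8, close to the reported average of 41.78; the listing does not explain the small difference. On 64 nodes this corresponds to roughly 65% of ideal linear scaling.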

PFASUM / PFAm SUbstitution Matrix

Allows accurate detection of homologous protein sequences, as well as scoring and construction of high-quality protein multiple sequence alignments (MSAs). PFASUM is based on the manually curated Pfam seed alignments using a novel algorithm. It relies on state-of-the-art expert ground truth data that covers a large and diverse sequence space. The tool is able to handle unfiltered MSAs and ambiguous amino acid symbols and thus prevents the loss of potentially important information.


A hardware-aware parallel framework for accelerating computational hotspots within the hmmsearch pipeline as well as other sequence alignment applications. CUDAMPF achieves significant speedup by exploiting hierarchical parallelism on a single GPU and takes full advantage of limited resources based on their own performance features. In addition to exceeding the performance of other acceleration attempts, comprehensive evaluations against high-end CPUs (Intel i5, i7 and Xeon) show that CUDAMPF yields up to 440 GCUPS for SSV, 277 GCUPS for MSV and 14.3 GCUPS for P7Viterbi, all with 100% accuracy, which translates to a maximum speedup of 37.5, 23.1 and 11.6-fold for MSV, SSV and P7Viterbi respectively.

CABRA / Cluster and Annotate Blast Results Algorithm

Provides a shortcut for evaluating a BLAST result by clustering the hits, which allows quick classification. CABRA integrates the advantages of a BLAST search and the FastaHerder clustering algorithm into a single pipeline by annotating clusters of BLAST results. Simplification and annotation of the results of a typical similarity search, as well as its ease of use and speed, provide an appropriate method of one-dimensional similarity search. The ability to set the identity threshold to group together query-like sequences is also an improvement over current clustering approaches.


A fast protein similarity search tool that utilizes a filter step for candidate selection based on shared k-mers and a comparison measure using a binary matrix of co-occurrence of amino acid residues. RAFTS3 performed searches many times faster than those with BLASTp against large protein databases, such as NR, Pfam or UniRef, with a small loss of sensitivity depending on the similarity degree of the sequences. RAFTS3 is an alternative for fast comparison of protein sequences, genome annotation and biological data mining.
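The k-mer filter step described above can be sketched as follows. This is a generic shared k-mer prefilter written for illustration, not RAFTS3's actual code: the function names, the toy sequences, the value of k and the `min_shared` threshold are all assumptions, and the second stage (the comparison based on a binary co-occurrence matrix of residues) is not shown.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(database, k):
    """Map each k-mer to the names of the database sequences containing it."""
    index = defaultdict(set)
    for name, seq in database.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def candidates(query, index, k, min_shared):
    """Names of sequences sharing at least min_shared distinct k-mers with the query."""
    counts = defaultdict(int)
    for kmer in kmers(query, k):
        for name in index.get(kmer, ()):
            counts[name] += 1
    hits = [name for name, c in counts.items() if c >= min_shared]
    # Most shared k-mers first; ties broken by name for a stable order.
    return sorted(hits, key=lambda name: (-counts[name], name))


# Toy database (hypothetical sequences).
db = {
    "protA": "MKTAYIAKQRQISFVK",
    "protB": "MKTAYIAKQLLLSFVK",
    "protC": "GGGGSGGGGSGGGGSG",
}
index = build_index(db, k=3)
strict = candidates("MKTAYIAKQRQ", index, k=3, min_shared=8)
loose = candidates("MKTAYIAKQRQ", index, k=3, min_shared=5)
print(strict, loose)
```

Only the sequences that pass the filter would go on to the slower, more sensitive comparison, which is where the speed gain of this kind of design comes from.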

ASD / Amplitude Spectrum Distance

Compares protein fragments based on the discrete Fourier transform of their Cα distance matrix. ASD can be computed efficiently and provides a parameter-free measure of the global shape dissimilarity of two fragments. It has useful theoretical properties: it is tolerant to shifts, insertions, deletions, circular permutations and sequence reversals while satisfying the triangle inequality. The practical interest of ASD with respect to RMSD, RMSDd, BC and TM scores is illustrated through zinc finger retrieval experiments and concrete structure examples.
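The idea behind such a measure can be illustrated with a small pure-Python sketch. It is not the published ASD definition: it assumes equal-length fragments, takes the amplitude of a naive 2D discrete Fourier transform of the Cα distance matrix, and compares two spectra with a plain Euclidean norm. The function names and coordinates are hypothetical.

```python
import cmath
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between C-alpha coordinates."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def amplitude_spectrum(matrix):
    """Amplitude of the naive 2D DFT of a square matrix (O(n^4), toy sizes only)."""
    n = len(matrix)
    spectrum = []
    for u in range(n):
        row = []
        for v in range(n):
            total = 0j
            for i in range(n):
                for j in range(n):
                    angle = -2.0 * math.pi * (u * i + v * j) / n
                    total += matrix[i][j] * cmath.exp(1j * angle)
            row.append(abs(total))
        spectrum.append(row)
    return spectrum


def spectrum_distance(coords_a, coords_b):
    """Euclidean distance between amplitude spectra of two equal-length fragments."""
    if len(coords_a) != len(coords_b):
        raise ValueError("this sketch only handles fragments of equal length")
    sa = amplitude_spectrum(distance_matrix(coords_a))
    sb = amplitude_spectrum(distance_matrix(coords_b))
    return math.sqrt(sum((x - y) ** 2
                         for ra, rb in zip(sa, sb)
                         for x, y in zip(ra, rb)))


# Hypothetical six-residue fragment and a straight-line fragment for contrast.
frag = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0),
        (8.0, 4.0, 2.0), (9.0, 7.0, 3.0), (12.0, 7.5, 4.5)]
line = [(3.8 * i, 0.0, 0.0) for i in range(6)]

d_reversed = spectrum_distance(frag, frag[::-1])         # sequence reversal
d_rotated = spectrum_distance(frag, frag[2:] + frag[:2])  # circular permutation
d_other = spectrum_distance(frag, line)                   # different shape
print(d_reversed, d_rotated, d_other)
```

Discarding the phase of the transform is what makes the spectrum insensitive to circular shifts and reversals of the residue order, so the first two distances are zero up to rounding while the third is not. Tolerance to insertions and deletions needs the full method and is not captured here.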


A method for sequence-based protein homology detection that compares two protein sequences or families through alignment of two Markov random fields (MRFs), which model the multiple sequence alignment (MSA) of a protein family probabilistically using an undirected general graph. The MRF representation is better than the widely used PSSM and HMM representations in that it can capture long-range residue interaction patterns, which reflect the overall 3D structure of a protein family.

VOCS / Viral Orthologous Clusters

Searches and extracts genome, gene and protein data from the Viral Bioinformatics Resource Center (VBRC) database. VOCS includes Asfarviridae, Baculoviridae, Iridoviridae, and Poxviridae. It compares genomes and sets of genomes, and finds gene families represented in all poxvirus genomes (core poxvirus genes). It also finds gene families present in variola viruses but not in cowpox or vaccinia viruses (potential virulence genes). The VBRC uses an Administrator version of VOCS to add genomes, annotate genes and classify gene families.

SCI-PHY / Subfamily Classification In PHYlogenomics

A pipeline for automatic subfamily identification, followed by subfamily hidden Markov model (HMM) construction. A simple and computationally efficient scoring scheme using family and subfamily HMMs enables classification of novel sequences to protein families and subfamilies. Sequences representing entirely novel subfamilies are differentiated from those that can be classified to subfamilies in the input training set using logistic regression.

DELTA-BLAST / Domain Enhanced Lookup Time Accelerated BLAST

Searches a database of pre-constructed PSSMs before searching a protein-sequence database, to yield better homology detection. DELTA-BLAST is a useful program for the detection of remote protein homologs. This tool employs a subset of NCBI’s Conserved Domain Database (CDD). It first uses RPS-BLAST to align a query sequence to conserved domains in CDD, and then performs a sequence database search using a PSSM derived from the aligned domains.
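To make the role of a PSSM concrete, here is a minimal sketch of scoring a sequence against a position-specific scoring matrix by sliding an ungapped window. It is a toy illustration only, not how RPS-BLAST or DELTA-BLAST is implemented: the function name, the three-column matrix, its scores and the default penalty are all made up for the example.

```python
def best_pssm_hit(pssm, sequence, default=-4):
    """Return (start, score) of the best ungapped window scored against a PSSM.

    pssm is a list of columns; each column maps a residue to its score at
    that position. Residues missing from a column get the default score.
    """
    width = len(pssm)
    best_start, best_score = None, None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col.get(res, default) for col, res in zip(pssm, window))
        if best_score is None or score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


# Hypothetical three-position profile favouring G, then K/R, then S/T.
toy_pssm = [
    {"G": 6, "A": 1},
    {"K": 5, "R": 3},
    {"S": 4, "T": 4},
]
hit = best_pssm_hit(toy_pssm, "MAGRTLLGKSA")
print(hit)
```

Because each column carries its own scores, the same residue is rewarded differently depending on where it falls in the domain, which is what lets a profile detect more distant homologs than a single substitution matrix.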

BAM / BioAssemblyModeler

Constructs structures of protein homo- and hetero-oligomers. BAM models protein complexes comprising several subunits of each sequence in accordance with their arrangement in the template biological assembly. It assigns Pfam domains to the queries and identifies templates and the content of their biological assemblies. Lastly, the software uses profile-profile alignment of the targets and the templates, together with the SCWRL4 software, to build the coordinates of the protein complexes.

MACSIMS / Multiple Alignment of Complete Sequences Information Management System

Allows management of all the information related to a protein family. MACSIMS is a multiple alignment-based information management system that combines knowledge-based methods with complementary ab initio sequence-based predictions for protein family analysis. The software can be used to integrate information from different domains, such as genetics, structural biology, proteomics or interactomics experiments. It incorporates the JalView alignment editor for graphically displaying the results.