
GRASP / Guided Reference-based Assembly of Short Peptides

Identifies the homologs of a given reference protein sequence in a database of short peptide metagenomic sequences. GRASPx is a fast and accurate homology-search program implementing a simultaneous alignment and assembly framework. The program achieves a more than 30-fold speedup over its predecessor, GRASP. Applied to a human saliva metagenome dataset, GRASPx showed superior recall and precision. It can assemble and search homologous reads for all protein sequences encoded in a bacterial genome against a moderate-sized metagenomic dataset (e.g. ~40 million reads of ~100 bp each) within approximately 12 h using 16 threads.


PhyloBot

Makes ancestral sequence reconstruction easy. PhyloBot provides interactive tools to explore evolutionary trajectories between ancestors, enabling the rapid generation of hypotheses that can be tested using genetic or biochemical approaches. Users can create and submit jobs on the free server, or use the open-source code to launch their own server. The tool has been used to discover genetic mechanisms underlying biochemical diversity in several protein families, including protein kinases, DNA-binding transcription regulators, and transmembrane ion pumps.

OPAR / Optimistic Protein Assembly from Reads

Salvages reads from metagenomic datasets that may be only distantly related to a characterized species. OPAR is an online resource that contributes to the discovery and characterization of new viral species. The method rests on the assumption that the nucleotide sequences of genes encoding homologous proteins diverge faster than the amino acid sequences they encode. OPAR is designed to be simple and straightforward for all potential users.
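The assumption behind OPAR can be illustrated with a toy example: two nucleotide sequences that differ only by synonymous substitutions still translate to the same peptide. The sketch below is not OPAR's code. The sequences are made up for illustration, the helper names `translate` and `identity` are hypothetical, and the codon table is deliberately partial, covering only the codons used here.

```python
# Partial standard genetic code: only the codons used in this example.
CODON_TABLE = {
    "ATG": "M",
    "GCT": "A", "GCA": "A",
    "CTT": "L", "CTC": "L",
    "GGA": "G", "GGG": "G",
    "AAA": "K", "AAG": "K",
}

def translate(nucleotides):
    """Translate an in-frame nucleotide string codon by codon."""
    codons = [nucleotides[i:i + 3] for i in range(0, len(nucleotides), 3)]
    return "".join(CODON_TABLE[codon] for codon in codons)

def identity(a, b):
    """Fraction of matching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Two hypothetical gene fragments differing at four third-codon positions.
seq_a = "ATGGCTCTTGGAAAA"
seq_b = "ATGGCACTCGGGAAG"

nucleotide_identity = identity(seq_a, seq_b)
protein_identity = identity(translate(seq_a), translate(seq_b))
print(f"nucleotide identity: {nucleotide_identity:.2f}")
print(f"protein identity:    {protein_identity:.2f}")
```

Here the nucleotide sequences match at only 11 of 15 positions, while the translated peptides are identical, which is why a search at the protein level can recover reads that a nucleotide-level search would miss.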


SFA-SPA

Introduces significant computational improvements to the short peptide assembly algorithm, making it practical to reconstruct proteins from large metagenomic datasets containing several hundred million reads while maintaining accuracy. SFA-SPA has four stages: (1) construction of a de Bruijn (or k-mer) graph from the set of short peptide sequences (henceforth called reads) and traversal of that graph to identify a set of initial paths, (2) extension and merging of these paths, (3) clustering of highly similar paths in the resulting path set, and (4) recruitment of unassigned reads to these paths.
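Stage (1) can be sketched in a few lines, assuming a very simplified model: nodes are (k-1)-mers, each k-mer in a read contributes one edge, and an initial path is followed only while the next node is unambiguous. This is an illustration of the general k-mer graph idea, not SFA-SPA's implementation. The function names, the toy reads, and the choice of k = 4 are all assumptions.

```python
from collections import defaultdict

def build_kmer_graph(reads, k):
    """Link each (k-1)-mer to the (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def initial_paths(graph):
    """Spell out unambiguous paths from nodes that have no incoming edge."""
    has_incoming = {dst for dsts in graph.values() for dst in dsts}
    starts = sorted(node for node in graph if node not in has_incoming)
    paths = []
    for start in starts:
        path, node, seen = start, start, {start}
        # Stop at a branch (several successors), a dead end, or a cycle.
        while len(graph.get(node, ())) == 1:
            (nxt,) = graph[node]
            if nxt in seen:
                break
            path += nxt[-1]
            seen.add(nxt)
            node = nxt
        paths.append(path)
    return paths

# Hypothetical overlapping peptide reads.
reads = ["MKVLAT", "VLATGH", "ATGHWE"]
graph = build_kmer_graph(reads, k=4)
paths = initial_paths(graph)
print(paths)
```

With these three overlapping reads the traversal yields a single path spelling `MKVLATGHWE`. Stages (2) to (4), which extend and merge paths, cluster similar paths, and recruit unassigned reads, are not covered by this sketch.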