Provides access to a variety of public and in-house bioinformatics tools. The MPI Bioinformatics Toolkit integrates a selected set of the most useful methods for the analysis of protein sequences and structures. It offers more than 50 interconnected tools, so that the results of one tool can be forwarded to other tools. It also serves as a platform for teaching bioinformatic enquiry to students in the life sciences.
Introduces significant computational improvements to the short peptide assembly (SPA) algorithm that make it practical to reconstruct proteins from large metagenomic datasets containing several hundred million reads, while maintaining accuracy. SFA-SPA has four stages: (1) construction of a de Bruijn (or k-mer) graph from the set of short peptide sequences (henceforth called reads) and its subsequent traversal to identify a set of initial paths, (2) extension and merging of these paths, (3) clustering of highly similar paths in the resulting path set, and (4) recruitment of unassigned reads to these paths.
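Stage (1) above can be illustrated with a minimal Python sketch. This is not SFA-SPA's actual implementation: the function names `build_de_bruijn` and `initial_paths`, the toy reads, and the choice of k are all assumptions made for illustration, and the traversal shown simply follows unbranched chains rather than the informed traversal the tool performs.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph from peptide reads.

    Nodes are (k-1)-mers; every k-mer in a read adds an edge
    from its (k-1)-prefix to its (k-1)-suffix.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def initial_paths(graph):
    """Spell out unbranched paths (a simplified stand-in for stage 1 traversal)."""
    indegree = defaultdict(int)
    for successors in graph.values():
        for succ in successors:
            indegree[succ] += 1
    nodes = set(graph) | set(indegree)

    def is_interior(node):
        # Exactly one way in and one way out: safe to walk through.
        return indegree[node] == 1 and len(graph.get(node, ())) == 1

    paths = []
    for node in sorted(nodes):
        if is_interior(node):
            continue  # paths start only at branch points, sources, or sinks
        for succ in sorted(graph.get(node, ())):
            sequence = node + succ[-1]
            current = succ
            while is_interior(current):
                current = next(iter(graph[current]))
                sequence += current[-1]
            paths.append(sequence)
    return paths


# Hypothetical toy reads that overlap by five residues.
reads = ["MKVLAAG", "VLAAGIT"]
paths = initial_paths(build_de_bruijn(reads, k=4))
```

With these two toy reads the graph is a single unbranched chain, so the sketch recovers one path spanning both reads. Real metagenomic data produces heavily branched graphs, which is why the later extension, merging, and clustering stages are needed.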
Makes ancestral sequence reconstruction easy. PhyloBot provides interactive tools to explore evolutionary trajectories between ancestors, enabling the rapid generation of hypotheses that can be tested using genetic or biochemical approaches. Users can create and submit jobs on the free server, or use the open-source code to launch their own server. The tool has been used to discover genetic mechanisms underlying biochemical diversity in several protein families, including protein kinases, DNA-binding transcription regulators, and transmembrane ion pumps.
Identifies the homologs of a given reference protein sequence in a database of short peptide metagenomic sequences. GRASPx is a fast and accurate homology-search program implementing a simultaneous alignment and assembly framework. The program achieves a more than 30-fold speedup compared to its predecessor, GRASP. Applied to a human saliva metagenome dataset, GRASPx showed superior performance in both recall and precision. It allows assembly and search of homologous reads with respect to all protein sequences encoded in a bacterial genome against a moderate-sized metagenomic dataset (e.g. ~40 million reads of ~100 bp each) within approximately 12 h using 16 threads.
Infers ancestral protein sequences while accounting for selection on protein stability. ProtASR generates site-specific substitution matrices through the structurally constrained mean-field (MF) substitution model, which considers stability against both unfolding and misfolding. It can be applied to estimate the history of protein stability in protein families. Its running time is similar to that of empirical models that do not consider structural constraints.
Assembles protein sequences from their constituent peptide fragments identified on short reads. The SPA algorithm is based on informed traversals of a de Bruijn graph, defined on an amino acid alphabet, to identify probable paths that correspond to proteins.
Assembles short-read (6-frame-translated) sequencing data at the protein level. PLASS relies on a graph-free, greedy iterative assembly strategy that enables overlap-based assembly on a single server. It is suited to large-scale metagenomic studies, including soil metagenomics. The tool also improves homology detection, protein function annotation and protein structure prediction by enriching multiple sequence alignments with diverse homologs.
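The 6-frame translation mentioned above can be sketched in Python as follows. This is a generic illustration using the standard genetic code, not PLASS's own code; the function names and the example read are assumptions, and unknown codons (for instance those containing N) are rendered as X by choice.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; '*' marks a stop, 'X' an unknown codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translate(read):
    """Return three forward-strand frames followed by three reverse-strand frames."""
    read = read.upper()
    frames = []
    for strand in (read, reverse_complement(read)):
        for offset in range(3):
            frames.append(translate(strand[offset:]))
    return frames


# Hypothetical 9-nucleotide read used only for illustration.
frames = six_frame_translate("ATGGCCTAA")
# frames == ["MA*", "WP", "GL", "LGH", "*A", "RP"]
```

Translating before assembly lets overlaps be detected at the amino acid level, where synonymous nucleotide differences between closely related organisms no longer break the match.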