Proteogenomics is an area of research at the interface of proteomics and genomics. In this approach, customized protein sequence databases generated using genomic and transcriptomic information are used to help identify novel peptides (not present in reference protein sequence databases) from mass spectrometry-based proteomic data; in turn, the proteomic data can be used to provide protein-level evidence of gene expression and to help refine gene models.
Finds virtually all type IV pilins in a manually curated test set. PilFind can be used for genome-wide analysis of user-supplied bacterial sequences. It can find genes that encode pilin candidates and prepilin peptidase cleavage sites in any given bacterial genome. This tool can be used to recognize candidate virulence factors among the thousands of proteins encoded by any given pathogen genome.
Uses transcript translations and reference gene annotations to identify the genomic loci of peptides and post-translational modifications. PoGo is a tool for mapping peptides identified through mass spectrometry to a reference genome. In benchmarks on large-scale human tissue and cancer phosphoproteome datasets, it outperformed other tools. Additionally, extended functionality enables representation of single nucleotide variants, post-translational modifications and quantitative features.
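The idea of placing a peptide on the genome via a transcript translation can be illustrated with a minimal sketch. This is not PoGo's algorithm or interface. It assumes a plus-strand transcript whose coding sequence begins at the first exon base, 0-based half-open exon coordinates, and a hypothetical function name `peptide_to_genome`; only the first occurrence of the peptide is mapped.

```python
def peptide_to_genome(peptide, protein, exons):
    """Map a peptide to genomic blocks through its position in a translated transcript.

    Assumptions (illustrative only): plus strand, CDS starts at the first exon base,
    exons are (start, end) pairs in 0-based half-open genomic coordinates.
    """
    idx = protein.find(peptide)
    if idx < 0:
        return []  # peptide not present in this translation

    # Peptide position expressed in coding-sequence nucleotides.
    cds_start = 3 * idx
    cds_end = 3 * (idx + len(peptide))

    blocks = []
    offset = 0  # number of coding bases contributed by the exons seen so far
    for ex_start, ex_end in exons:
        ex_len = ex_end - ex_start
        lo = max(cds_start, offset)
        hi = min(cds_end, offset + ex_len)
        if lo < hi:  # the peptide overlaps this exon
            blocks.append((ex_start + lo - offset, ex_start + hi - offset))
        offset += ex_len
    return blocks


# Toy transcript: two exons (6 + 9 coding bases) encoding a 5-residue protein.
exons = [(100, 106), (200, 209)]
print(peptide_to_genome("AKL", "MAKLV", exons))
```

A peptide that spans the exon junction is returned as two blocks, which is how a spliced peptide would typically be drawn in a genome browser track.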
Predicts a peptide’s proteotypic propensity based on its physico-chemical properties. PeptideSieve (i) performs an in silico digest of the protein, (ii) converts each of the peptides into chemical property strings, and (iii) computes a likelihood function that scores how likely each peptide is to be proteotypic. The resulting predictors have the ability to accurately identify proteotypic peptides from any protein sequence and offer starting points for generating a physical model describing the factors.
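Steps (i) and (ii) above can be sketched as follows. This is only an illustration, not PeptideSieve's implementation: it assumes a tryptic digest with no missed cleavages, and the three properties in the hypothetical `property_vector` (length, mean Kyte-Doolittle hydropathy, charged fraction) are stand-ins for whatever property set the tool actually uses. Step (iii) is omitted because the description does not specify the likelihood model.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def tryptic_digest(protein, min_len=6):
    """In silico tryptic digest: cleave after K or R unless followed by P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide
    # Very short peptides are rarely informative, so drop them (min_len is an assumption).
    return [p for p in peptides if len(p) >= min_len]


def property_vector(peptide):
    """Hypothetical property summary for one peptide (illustrative features only)."""
    return {
        "length": len(peptide),
        "mean_hydropathy": sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide),
        "charged_fraction": sum(aa in "DEKR" for aa in peptide) / len(peptide),
    }


for pep in tryptic_digest("MKWVTFISLLRPAAKGG"):
    print(pep, property_vector(pep))
```

A scoring model trained on peptides that are or are not repeatedly observed would then take such property vectors as input.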
Allows users to build personalized protein databases (PPDs) using variant calls from next-generation genomic sequencing (NGS) data. GenPro was applied to the study of allelic-specific protein (ASP) abundance in human brain. This application demonstrates that PPDs are suitable for use with standard search algorithms. The tool assumes that every called base has the same probability of being correct by imposing a probability threshold on every base call.
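The general idea of a personalized protein entry can be sketched as applying thresholded variant calls to a coding sequence and translating the result. This is a minimal sketch, not GenPro's code: the function names, the tuple layout for variant calls, and the `min_prob` default are assumptions made for illustration, and only single nucleotide substitutions on an in-frame coding sequence are handled.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def apply_snvs(cds, snvs, min_prob=0.9):
    """Apply substitutions whose call probability meets one global threshold.

    snvs: iterable of (0-based CDS position, ref base, alt base, call probability).
    The same min_prob is used for every call (hypothetical default).
    """
    seq = list(cds)
    for pos, ref, alt, prob in snvs:
        if prob < min_prob:
            continue  # low-confidence call: keep the reference base
        if seq[pos] != ref:
            raise ValueError(f"reference mismatch at CDS position {pos}")
        seq[pos] = alt
    return "".join(seq)


reference_cds = "ATGGCCAAGTAA"
calls = [(4, "C", "T", 0.99), (7, "A", "G", 0.50)]
personal_cds = apply_snvs(reference_cds, calls)
print(translate(reference_cds), translate(personal_cds))
```

Writing the translated sequences of all transcripts to a FASTA file would then give a database that a standard search engine can read.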
A proteogenomic pipeline that delineates true in vivo proteoforms and generates a protein sequence search space for peptide to MS/MS matching. PROTEOFORMER can be combined with canonical protein databases or used independently for identification of novel translation products. The pipeline makes use of the recently developed next-generation sequencing strategy termed ribosome profiling (RIBO-seq), which provides genome-wide information on protein synthesis in vivo.
A complete, open source pipeline for proteogenomics which incorporates a graphical user interface and implements the Proteomics Standards Initiative mzIdentML standard for each analysis stage. All steps are included as standalone modules with the mzIdentML library, allowing other groups to re-use the whole pipeline or constituent parts within other tools.
A proteogenomic approach, GenoMS, that draws on the strengths of database and de novo peptide identification methods. Protein sequence templates (i.e. proteins or genomic sequences that are similar to the target protein) are identified using the database search tool InsPecT. The templates are then used to recruit, align, and de novo sequence regions of the target protein that have diverged from the database or are missing.