Protein inference software tools | Mass spectrometry-based untargeted proteomics
Assembling peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is a critical step in proteomics research. Due to the existence of degenerate peptides and 'one-hit wonders', it is very difficult to determine which proteins are present in the sample.
Gives access to many free software tools for sequence analysis. EMBOSS aims to serve the molecular biology community. It permits the creation and release of software in an open source spirit and integrates a range of sequence analysis tools into a seamless whole. It is free of charge and open source.
Provides machine learning and visualization methods for interrogating and analyzing quantitative mass spectrometry (MS) data to infer protein sub-cellular localization. PRoloc is an R package for spatial proteomics data analysis that predicts sub-cellular localization from experimental, condition-specific MS-based quantitative proteomics data. The software can classify proteins into tens of sub-cellular compartments.
Assists users in manual curation of tandem mass spectra. SILVER is a web application that utilizes an intensity-based machine-learning approach for scoring peptide-spectrum matches (PSMs). The software provides a visual representation of match quality between each candidate fragment ion and the observed spectrum. Moreover, this tool includes several examples illustrating its use.
An open source Java program for computational analysis of data independent acquisition (DIA) mass spectrometry-based proteomics data. DIA-Umpire enables untargeted peptide and protein identification and quantitation using DIA data, and also incorporates targeted extraction to reduce the number of cases of missing quantitation.
An open-source protein assembly tool that derives a minimum protein list from peptide identifications filtered to a specified false discovery rate. By segregating peptide identifications for thresholding using both the precursor charge state and the number of tryptic termini, IDPicker retrieves more peptides for protein assembly. The new version is more robust against false positive proteins, especially in searches using multispecies databases, by requiring additional novel peptides in the parsimony process.
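The idea of deriving a minimum protein list can be illustrated with a greedy set-cover heuristic. This is only a minimal sketch of parsimony in general, not IDPicker's actual algorithm: the function name `minimal_protein_list`, the `min_novel` parameter (a stand-in for "requiring additional novel peptides") and the toy data are all assumptions made for illustration.

```python
def minimal_protein_list(protein_to_peptides, min_novel=1):
    """Greedy parsimony sketch (hypothetical helper, not IDPicker code).

    protein_to_peptides maps a protein ID to the set of confidently
    identified peptides that match it. Proteins are added one at a time,
    always choosing the one that explains the most still-unexplained
    peptides, until every peptide is covered or no remaining protein
    contributes at least `min_novel` novel peptides.
    """
    uncovered = set().union(*protein_to_peptides.values())
    selected = []
    while uncovered:
        # Sorting first makes tie-breaking deterministic (alphabetical).
        best = max(
            sorted(protein_to_peptides),
            key=lambda prot: len(protein_to_peptides[prot] & uncovered),
        )
        gained = protein_to_peptides[best] & uncovered
        if len(gained) < min_novel:
            break  # remaining proteins add too little new evidence
        selected.append(best)
        uncovered -= gained
    return selected


# Toy evidence: peptide "c" is degenerate (shared by P1, P2 and P3).
evidence = {
    "P1": {"a", "b", "c"},
    "P2": {"b", "c"},
    "P3": {"c", "d"},
    "P4": {"e"},
}
default_list = minimal_protein_list(evidence)
strict_list = minimal_protein_list(evidence, min_novel=2)
```

In this toy case P2 is dropped because P1 already explains all of its peptides, and raising `min_novel` to 2 also drops proteins supported by a single novel peptide.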
A protein interaction network-assisted approach to improve protein identification in shotgun proteomics. The hypothesis of this method is that proteins eliminated as a result of insufficient experimental evidence are more likely to be present in the original sample if they exist in a clique enriched with confidently identified proteins.
Enables the analysis of shotgun proteomics data. CPFP is a data analysis pipeline that aims to provide a simple interface for core facility staff and clients, and to fully automate the analysis of tandem mass spectrometry (MS/MS) data with multiple search engines. The software consists of a web application, a relational database and a collection of pipeline scripts. It can be installed locally or started in the Amazon Web Services (AWS) cloud.
Offers users a solution to transform protein sequence databases into peptide sequence databases. DBToolkit can generate specialized databases for use as search bases in protein identification. It offers an option to automatically recognize different database formats. This tool has been employed to construct a non-redundant, enzymatically digested peptide database with a bifunctional enzyme.
A Bayesian score that estimates the posterior probability of a protein's presence in the sample given its identification in an MS/MS experiment and its mRNA concentration measured under similar experimental conditions. MSpresso substantially increases the number of proteins identified in an MS/MS experiment at the same error rate; in yeast, for example, it increases the number of identified proteins by approximately 40%.
Combines peptide database P-values (DPVs) directly using a rigorous formula. RAId transforms the P-values of a protein's associated peptides into Z-scores, then converts the combined Z-score back to a final P-value with multiple hypothesis testing correction. It approximates DPVs by E-values, which is valid for small E-values, and retains all peptides whose E-values are less than one. This tool can be used to perform accurate type-I error control.
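Turning a protein database into a non-redundant peptide database can be sketched as an in silico digestion. The example below is not DBToolkit code: it assumes trypsin with the common rule "cleave after K or R unless followed by P", and the names `tryptic_peptides` and `peptide_database` are hypothetical. It relies on `re.split` accepting zero-width matches, which requires Python 3.7 or later.

```python
import re


def tryptic_peptides(sequence, missed_cleavages=0, min_length=1):
    """Digest one protein sequence (assumed trypsin rule, see lead-in)."""
    # Zero-width split: after K or R, but not when the next residue is P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = []
    for start in range(len(fragments)):
        last = min(start + missed_cleavages, len(fragments) - 1)
        for end in range(start, last + 1):
            peptide = "".join(fragments[start:end + 1])
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


def peptide_database(proteins, missed_cleavages=0, min_length=1):
    """Map each distinct peptide to the set of proteins that contain it."""
    index = {}
    for protein_id, sequence in proteins.items():
        for peptide in tryptic_peptides(sequence, missed_cleavages, min_length):
            index.setdefault(peptide, set()).add(protein_id)
    return index


# Hypothetical toy sequences; "CCR" is shared by both proteins.
db = peptide_database({"P1": "AAKCCR", "P2": "GGKCCR"})
```

Because each peptide is stored once with the set of parent proteins, the index is non-redundant, and peptides that map to more than one protein (degenerate peptides) are easy to spot.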
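The P-value to Z-score round trip described above resembles Stouffer's method, which can be sketched as follows. This is an unweighted illustration under stated assumptions, not RAId's own formula: the function names are hypothetical, E-values below one are simply treated as P-values, and Bonferroni is used as a stand-in because the description does not say which multiple-testing correction is applied.

```python
from math import sqrt
from statistics import NormalDist


def combine_peptide_evidence(e_values):
    """Combine peptide E-values into one protein P-value (Stouffer sketch)."""
    # Keep peptides with E < 1 and treat E as an approximate P-value.
    # Values must be strictly greater than 0 for the inverse CDF.
    p_values = [e for e in e_values if e < 1.0]
    if not p_values:
        return 1.0
    normal = NormalDist()
    # A small P-value maps to a large positive Z-score.
    z_scores = [-normal.inv_cdf(p) for p in p_values]
    z_combined = sum(z_scores) / sqrt(len(z_scores))
    return normal.cdf(-z_combined)


def bonferroni(p_value, n_tests):
    """Assumed correction for testing n_tests candidate proteins."""
    return min(1.0, p_value * n_tests)


single = combine_peptide_evidence([0.01])
double = combine_peptide_evidence([0.01, 0.01, 3.0])  # E = 3.0 is discarded
corrected = bonferroni(double, n_tests=100)
```

Two independent peptides at P = 0.01 yield a combined P-value well below 0.01, which is the intended effect of pooling evidence at the protein level.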
A linear programming model for protein inference. The main advantage of ProteinLP over other methods is the use of joint probability as the variable, which avoids modeling the protein probability and the conditional probability simultaneously so that the optimization formulation is greatly simplified.
A decoy-free protein-level false discovery rate (FDR) estimation method. The method consists of three key steps: (i) generating random bipartite graphs with the same structure; (ii) calculating the protein scores on these random graphs; and (iii) calculating the permutation P value and final FDR.
Offers a web app and a corresponding application programming interface (API) service to determine which peptides can be used to validate the existence of human proteins. NeXtProt peptide uniqueness checker is part of neXtProt, a comprehensive human-centric discovery platform that offers its users seamless integration of and navigation through protein-related data. It validates the existence of human proteins based on several criteria, including peptide identification data from mass spectrometry (MS)-based proteomics experiments. This resource is designed to help researchers make sense of what all human proteins do.
Assists in processing precursor acquisition. PAcIFIC can process the entire predicted soluble bacterial proteome without the need for any sample fractionation other than C18-based liquid chromatography. It can reduce sample preparation to a minimum prior to fully automated liquid chromatography-mass spectrometry (LC-MS) and MS operation. This method is compatible with standard instrumentation and software.
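The three steps can be sketched in a few lines, with several loudly stated assumptions, since the description does not give the details: random graphs are produced here by shuffling peptide scores across peptide nodes (which keeps the edges, and hence the graph structure, unchanged), a protein score is the sum of its peptide scores, and the final FDR is computed with the Benjamini-Hochberg procedure. All function names are hypothetical and this is not the published implementation.

```python
import random


def permutation_p_values(protein_to_peptides, peptide_scores, n_perm=1000, seed=0):
    """Steps (i)-(iii): permutation P value per protein (assumed scoring)."""
    rng = random.Random(seed)
    peptides = sorted(peptide_scores)

    def protein_scores(scores):
        # Assumed protein score: sum of the scores of its peptides.
        return {prot: sum(scores[p] for p in peps)
                for prot, peps in protein_to_peptides.items()}

    observed = protein_scores(peptide_scores)
    exceed = {prot: 0 for prot in observed}
    values = [peptide_scores[p] for p in peptides]
    for _ in range(n_perm):
        # (i) random graph: same edges, peptide scores reassigned at random.
        rng.shuffle(values)
        # (ii) protein scores on the random graph.
        randomized = protein_scores(dict(zip(peptides, values)))
        for prot in observed:
            if randomized[prot] >= observed[prot]:
                exceed[prot] += 1
    # (iii) permutation P value with the usual +1 correction.
    return {prot: (exceed[prot] + 1) / (n_perm + 1) for prot in observed}


def benjamini_hochberg(p_values):
    """Assumed FDR step: convert P values to q-values."""
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    total = len(ranked)
    q_values, running_min = {}, 1.0
    for rank in range(total, 0, -1):
        prot, p = ranked[rank - 1]
        running_min = min(running_min, p * total / rank)
        q_values[prot] = running_min
    return q_values


# Toy data: HIGH owns the two best-scoring peptides, LOW the two worst.
scores = {"a": 10, "b": 9, "c": 8, "d": 1, "e": 1, "f": 1}
graph = {"HIGH": {"a", "b"}, "LOW": {"e", "f"}}
p_vals = permutation_p_values(graph, scores)
q_vals = benjamini_hochberg(p_vals)
```

No decoy database is needed because the null distribution of each protein score comes from the randomized graphs themselves.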
Matches de novo sequences to homologous proteins and overcomes many of the limitations of other sequence homology search algorithms. OpenSea was designed to align de novo sequences from all MS/MS spectra for a given experiment to database protein sequences, even in situations when de novo sequencing algorithms cannot account for an entire peptide sequence. The implementation of this algorithm can rapidly identify proteins from complex mixtures of peptides using de novo sequences derived via high mass accuracy tandem mass spectrometry.
Enables protein quantitation at the mass spectrometry (MS) level. iTop-Q is an automated graphical tool that constructs extracted ion chromatograms (XICs) across multiple MS spectra for proteoform quantitation. The software also aligns the detected putative proteoforms across different replicates/samples for direct abundance comparison. It includes a quantitation wizard that guides users through processing the imported data step by step.
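Constructing an XIC across multiple MS spectra can be sketched as summing, scan by scan, the intensity of peaks that fall within a mass tolerance of a target m/z. This is a generic illustration, not iTop-Q's implementation: the scan representation, the 10 ppm default tolerance, and the names `extract_xic` and `trapezoid_area` are assumptions.

```python
def extract_xic(scans, target_mz, ppm=10.0):
    """Build an XIC as a list of (retention_time, summed_intensity).

    scans is a list of (retention_time, peaks) tuples, where peaks is a
    list of (mz, intensity) pairs from one MS spectrum.
    """
    tolerance = target_mz * ppm * 1e-6
    xic = []
    for retention_time, peaks in scans:
        intensity = sum(
            peak_intensity
            for mz, peak_intensity in peaks
            if abs(mz - target_mz) <= tolerance
        )
        xic.append((retention_time, intensity))
    return xic


def trapezoid_area(xic):
    """Integrate the XIC over retention time as a simple abundance estimate."""
    return sum(
        (t2 - t1) * (i1 + i2) / 2.0
        for (t1, i1), (t2, i2) in zip(xic, xic[1:])
    )


# Hypothetical toy scans: the 500.1 peak lies outside the 10 ppm window.
scans = [
    (0.0, [(500.000, 10.0), (600.000, 5.0)]),
    (1.0, [(500.002, 30.0)]),
    (2.0, [(500.100, 99.0)]),
]
xic = extract_xic(scans, target_mz=500.0)
area = trapezoid_area(xic)
```

Areas computed this way for the same proteoform in different replicates or samples can then be compared directly once the runs are aligned.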
Assembles SEQUEST identifications and highlights the most significant matches. The accompanying Contrast tool compares DTASelect results from multiple experiments. The two programs improve the speed and precision of proteomic data analysis.