How to run proteomics analysis with mass spectrometry

Proteins are an essential component of living organisms. In the systems biology era, large-scale analysis of proteins and proteomes has many applications, including the identification and quantification of proteins and the study of post-translational modifications, protein structure, and protein-protein or protein-nucleic acid interactions.


Here, we will focus on mass spectrometry, the leading technology for high-throughput proteomics, and how it is used for protein characterization.

High-throughput proteomics

Resolving the complexity of gene expression at the whole-organism level with genomics and transcriptomics was once thought to be the key to understanding the processes of life. However, protein expression does not always correlate with gene expression: mRNA is not always translated into protein, a single transcript can code for several proteins, and post-translational modifications can drastically alter the function of a protein.


Proteomics is the next step in the study of biological systems. However, analyzing the proteome is much more difficult than analyzing the genome or transcriptome, because each cell expresses its own set of proteins.


Mass spectrometry (MS) has emerged as the most important and popular tool to identify, characterize, and quantify proteins and their post-translational modifications with high throughput and on a large scale (Zhang et al.).

Mass spectrometry for proteomics

Several techniques can be used to separate proteins and peptides. Gel-based techniques were traditionally the standard procedure for proteomics: proteins from different samples are separated in parallel, and differentially expressed proteins are then selected by staining and identified by tandem mass spectrometry. However, these techniques are time-consuming and labor-intensive. Two-dimensional liquid chromatography (LC) coupled with mass spectrometry (LC-MS) is now the leading technology for high-throughput proteomics.


There are two main approaches to proteomics analysis, distinguished by their upstream sample preparation methods (Figure 1):


  • Bottom-up proteomics (the peptide-level approach), where samples are digested into peptides prior to LC-MS analysis. This is the most common approach.


  • Top-down proteomics, where intact proteins are directly analyzed by LC-MS. This approach is used to study post-translational modifications at the whole-protein level.


Both approaches can be made quantitative: different protein mixture samples are labeled (usually with stable isotope tags), pooled, and analyzed together by LC-MS.
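
To make the digestion step of the bottom-up approach concrete, here is a minimal in-silico sketch in Python. It assumes trypsin as the protease (the text above does not name one) and uses the common simplified rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P). The function name, the missed_cleavages parameter, and the example sequence are all hypothetical and chosen only for illustration.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Split a protein sequence into peptides with a simplified trypsin rule.

    Assumption: cleavage happens after K or R, except when followed by P.
    """
    # First pass: cut the sequence at every allowed cleavage site.
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # C-terminal remainder

    # Second pass: optionally join neighbouring fragments to model
    # sites that the enzyme failed to cut (missed cleavages).
    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


if __name__ == "__main__":
    # Made-up sequence, not a real protein.
    print(tryptic_digest("MKWVTFISLLRPAGKSTR"))
    # ['MK', 'WVTFISLLRPAGK', 'STR']
```

Note that the R followed by P is not cleaved, so it stays inside the second peptide. Real digestion is less clean than this rule suggests, which is why search tools usually allow one or two missed cleavages.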

Figure 1: General workflow for MS-based high-throughput bottom-up and top-down proteomics. From Zhang et al.

LC-MS-based proteomics workflow and analysis steps

In LC-MS-based bottom-up, untargeted proteomics, a typical workflow consists of digesting proteins into peptides, followed by chromatographic separation and MS-based analysis, to yield a list of detected peaks characterized by their retention times, mass-to-charge ratios (m/z), and intensities (Tsai et al.).
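
As an illustration of what such a peak list might look like in code, the sketch below stores each peak as a small record and shows how an m/z value relates to a peptide's neutral mass. It assumes positive-mode ionization by protonation, so an ion of charge z carries z extra protons. The class and function names are hypothetical and do not come from any of the tools discussed here.

```python
from dataclasses import dataclass

PROTON_MASS = 1.007276  # mass of a proton in Da (rounded)


@dataclass
class Peak:
    retention_time: float  # seconds (assumed unit)
    mz: float              # mass-to-charge ratio
    intensity: float       # arbitrary detector units


def mz_from_neutral_mass(neutral_mass, charge):
    """m/z of a peptide ion carrying `charge` protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def neutral_mass_from_mz(mz, charge):
    """Invert mz_from_neutral_mass for a known charge state."""
    return mz * charge - charge * PROTON_MASS


def most_intense(peaks, n):
    """Return the n peaks with the highest intensity."""
    return sorted(peaks, key=lambda p: p.intensity, reverse=True)[:n]


if __name__ == "__main__":
    # Made-up values: a 1000 Da peptide observed as a doubly charged ion.
    peak = Peak(retention_time=1250.0,
                mz=mz_from_neutral_mass(1000.0, 2),
                intensity=3.2e6)
    print(round(peak.mz, 6))  # 501.007276
```

The same peptide therefore appears at different m/z values depending on its charge state, which is one reason raw peak lists need the preprocessing described in the rest of this section.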


LC-MS data preprocessing steps and the top software tools for each include:

  • Noise filtering: In this first step, electronic and chemical noise is filtered out to facilitate peak detection. Top software: MZmine 2.


  • Deisotoping: In LC-MS, each peptide appears as an envelope of ion peaks, because the chemical elements in its constituent amino acids have naturally occurring isotopes. Deisotoping integrates the sibling ion peaks that arise from these isotopes. Top software: DeconTools.


  • Peak detection: LC-MS generates peaks that are each defined over a specific range of retention times and m/z values. Peak detection is a crucial step for subsequent protein identification. Top software: OpenMS, MZmine.


  • Normalization: A normalization step is performed to reduce analytical and technical variability in the data. Various quality control methods are also used to correct variability in LC-MS. Top software: MaxQuant.


  • Retention time alignment and peak matching: The peak matching step groups consensus peaks across multiple LC-MS runs prior to analysis, to ensure a valid comparison of the runs. Retention times are also aligned so that different runs can be compared. Top software: OpenMS, msInspect, MZmine, MaxQuant.
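
The single-run steps in the list above (noise filtering, peak detection, deisotoping) can be sketched in a few lines of plain Python. This is a toy illustration only and is not how MZmine, OpenMS, or DeconTools implement these steps. It assumes a moving average as the noise filter, the median intensity as a crude noise estimate, and a known charge state for deisotoping (real tools infer the charge from the peak spacing). All function names and parameters are hypothetical.

```python
from statistics import median

# Approximate mass difference between 13C and 12C, in Da.
ISOTOPE_SPACING = 1.00336


def smooth(intensities, window=3):
    """Noise filtering: moving average that damps high-frequency noise."""
    half = window // 2
    out = []
    for i in range(len(intensities)):
        lo, hi = max(0, i - half), min(len(intensities), i + half + 1)
        out.append(sum(intensities[lo:hi]) / (hi - lo))
    return out


def detect_peaks(mz_values, intensities, snr=3.0):
    """Peak detection: local maxima above snr times the median intensity."""
    noise = median(intensities)  # crude noise-level estimate (assumption)
    peaks = []
    for i in range(1, len(intensities) - 1):
        y = intensities[i]
        if y > intensities[i - 1] and y >= intensities[i + 1] and y > snr * noise:
            peaks.append((mz_values[i], y))
    return peaks


def deisotope(peaks, charge, tol=0.01):
    """Deisotoping: merge peaks spaced by ISOTOPE_SPACING / charge.

    Returns (m/z of the lowest peak in the envelope, summed intensity).
    """
    spacing = ISOTOPE_SPACING / charge
    peaks = sorted(peaks)
    used, envelopes = set(), []
    for i, (mz, intensity) in enumerate(peaks):
        if i in used:
            continue
        total, last = intensity, mz
        for j in range(i + 1, len(peaks)):
            if j not in used and abs(peaks[j][0] - (last + spacing)) <= tol:
                total += peaks[j][1]
                last = peaks[j][0]
                used.add(j)
        envelopes.append((mz, total))
    return envelopes


if __name__ == "__main__":
    # Made-up centroided peaks: a doubly charged envelope plus one lone peak.
    centroids = [(500.0, 100.0), (500.50168, 60.0),
                 (501.00336, 20.0), (600.0, 50.0)]
    print(deisotope(centroids, charge=2))
    # [(500.0, 180.0), (600.0, 50.0)]
```

In this example the three peaks spaced about 0.5 m/z apart collapse into one signal, while the peak at 600.0 is left alone because nothing sits at the expected isotope spacing from it.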


Given the number of preprocessing steps involved after LC-MS, several pipelines and all-in-one tools have been developed, including OpenMS, msInspect, MZmine, and MaxQuant.
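
To give a feel for what these pipelines do when they compare runs, the following sketch covers the cross-run steps (normalization, retention time alignment, peak matching) in plain Python. It makes strong simplifying assumptions: intensities are normalized by scaling each run to a common median, and retention time drift is modeled as a single constant shift, whereas real tools typically use more flexible, nonlinear corrections. Peaks are plain dictionaries with "rt", "mz", and "intensity" keys. These names, the tolerances, and the functions themselves are hypothetical and do not reflect the API of any tool named above.

```python
from statistics import median


def median_normalize(runs):
    """Scale each run so that all runs share the same median intensity."""
    medians = [median(p["intensity"] for p in run) for run in runs]
    target = median(medians)
    return [
        [{**p, "intensity": p["intensity"] * target / m} for p in run]
        for run, m in zip(runs, medians)
    ]


def estimate_rt_shift(reference, run, mz_tol=0.01):
    """Median retention time offset between peaks with matching m/z."""
    shifts = []
    for p in run:
        candidates = [q for q in reference if abs(q["mz"] - p["mz"]) <= mz_tol]
        if candidates:
            nearest = min(candidates, key=lambda q: abs(q["rt"] - p["rt"]))
            shifts.append(p["rt"] - nearest["rt"])
    return median(shifts) if shifts else 0.0


def align_and_match(reference, run, mz_tol=0.01, rt_tol=5.0):
    """Correct a constant RT shift, then pair peaks within both tolerances."""
    shift = estimate_rt_shift(reference, run, mz_tol)
    matches = []
    for p in run:
        corrected_rt = p["rt"] - shift
        for q in reference:
            if abs(q["mz"] - p["mz"]) <= mz_tol and abs(q["rt"] - corrected_rt) <= rt_tol:
                matches.append((q, p))
                break  # keep only the first reference peak that fits
    return shift, matches


if __name__ == "__main__":
    # Made-up runs: run_b elutes 12 s later and is twice as intense.
    run_a = [{"rt": 100.0, "mz": 500.0, "intensity": 10.0},
             {"rt": 200.0, "mz": 600.0, "intensity": 20.0},
             {"rt": 300.0, "mz": 700.0, "intensity": 30.0}]
    run_b = [{"rt": 112.0, "mz": 500.002, "intensity": 20.0},
             {"rt": 212.0, "mz": 600.001, "intensity": 40.0},
             {"rt": 312.0, "mz": 699.999, "intensity": 60.0}]
    shift, matches = align_and_match(run_a, run_b)
    print(shift, len(matches))  # 12.0 3
```

Without the shift correction, none of the peaks in run_b would fall within the 5-second tolerance of their counterparts in run_a, which illustrates why alignment has to happen before runs are compared.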

Concluding remarks

Proteomics is a constantly improving field and is now approaching transcriptomics in terms of gene coverage. Advances in upstream separation methods and MS instrumentation are expected to greatly improve the depth of proteome coverage. Future challenges include the integration of different levels of omics data, i.e., transcriptomics, proteomics, and metabolomics, at the system level.


Zhang et al. (2014). High-Throughput Proteomics. Annu. Rev. Anal. Chem.

Tsai et al. (2016). Preprocessing and Analysis of LC-MS-Based Proteomic Data. Methods Mol Biol.