Computational protocol: MRCQuant- an accurate LC-MS relative isotopic quantification algorithm on TOF instruments

Protocol publication

[…] Here we describe the MRCQuant algorithm for relative isotopic quantification of peptides in LC-MS data. The input to the algorithm is an LC-MS dataset and a list of peptides to be quantified, annotated by their monoisotopic masses and/or amino acid sequences. The mass annotations can be obtained with an LC-MS peptide identification algorithm such as msInspect. The output of the algorithm is a matrix of abundance measurements with at most P columns, where P is the total number of peptides to be quantified, and whose rows are indexed by cs * maxcs + iso, where cs ∈ [1, maxcs] is the charge state, maxcs is the maximum number of charge states considered, and iso is the isotope position. For a given peptide, we first need to detect its LC peaks. A peptide of a given mass forms a series of 2D peaks at different isotope and charge state positions. Viewed along the elution time and m/z dimensions, these 2D peaks form LC and MS peaks, respectively. To associate a group of 2D peaks with a specific peptide mass, we need to verify that: 1. their LC peaks at the different isotope and charge state positions are the same; and 2. their MS peaks match a reference template translated to their expected m/z locations. After the LC peaks are identified, we need to detect their boundaries accurately and perform quantification. To accomplish these goals, the proposed algorithm: 1. extracts or theoretically predicts reference templates; 2. for each peptide of interest, performs LC peak detection at its highest isotope and charge state position using a reference template; and 3. extracts peptide templates based on the MRC principle, which are used for accurate LC peak boundary detection and for interference/noise removal at lower SNRs. Finally, quantification is performed based on the peptide templates.
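The first verification criterion (the LC peaks at different isotope and charge state positions must be the same) and the final quantification step can be illustrated with a small sketch. It assumes, for illustration only, that each (charge, isotope) position has an extracted ion chromatogram (XIC) sampled on a common retention-time grid, that "the same LC peak" is tested with a Pearson correlation against the XIC at the position where the peak was first detected, and that quantification is a plain sum of ion counts inside the detected boundaries. The names `pearson`, `consistent_positions` and `quantify` and the 0.9 threshold are hypothetical; this is not the MRCQuant implementation, which additionally uses MRC-based peptide templates for boundary detection and noise removal.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length intensity vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no shape information
    return sxy / math.sqrt(sxx * syy)

def consistent_positions(xics, ref_key, min_r=0.9):
    """Keep (charge, isotope) positions whose elution profile matches the reference.

    xics maps (cs, iso) -> intensities on a shared retention-time grid;
    ref_key is the position where the LC peak was first detected.
    min_r is a hypothetical threshold chosen for illustration.
    """
    ref = xics[ref_key]
    return {key for key, xic in xics.items() if pearson(ref, xic) >= min_r}

def quantify(xics, keys, start, stop):
    """Sum ion counts inside LC peak boundaries [start, stop) for each kept position."""
    return {key: sum(xics[key][start:stop]) for key in sorted(keys)}

# Toy data: (charge, isotope) -> XIC on a 7-scan retention-time grid.
xics = {
    (2, 0): [0, 2, 10, 20, 10, 2, 0],
    (2, 1): [0, 1, 5, 10, 5, 1, 0],   # same elution shape, half the height
    (3, 0): [0, 0, 0, 1, 8, 20, 9],   # shifted in time: a different LC peak
}
keep = consistent_positions(xics, ref_key=(2, 0))
print(sorted(keep))                  # [(2, 0), (2, 1)]
print(quantify(xics, keep, 1, 6))    # {(2, 0): 44, (2, 1): 22}
```

Correlation is used here because it is insensitive to the different absolute heights of isotope and charge state peaks and responds only to the shape and timing of elution, which is what "the same LC peak" refers to.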
The goal of the algorithm is to record accurate relative ion counts at all charge states and isotope positions. A flow diagram of the entire process is shown in Figure […] and is explained in detail below. […]
We developed our algorithm using an LC-MS dataset collected from 600 fmol of a tryptic digest of horse myoglobin (unless noted otherwise, all illustrations in this paper are generated from this dataset). For reference, we also obtained an LC-MS/MS dataset at 100 fmol to provide peptide sequence information. LC-MS/MS was performed with a splitless nanoLC-2D pump (Eksigent), a 50 μm-i.d. column packed with 10 cm of 5 μm-o.d. C18 particles, and a linear ion trap tandem mass spectrometer (LTQ-XLS; ThermoFisher). The seven most abundant eluting ions were fragmented by data-dependent collision-induced dissociation (CID). The LC gradient was 2 to 98% 0.1% formic acid/acetonitrile over 60 min (60-120 min) at 400 nL/min. Tandem mass spectra were extracted with Mascot Distiller version 2.3.1; charge-state deconvolution and deisotoping were not performed. All MS/MS samples were analyzed with Mascot (Matrix Science, London, UK; version 2.3.2), which was set up to search the Swiss-Prot database assuming trypsin as the digestion enzyme, with a fragment ion mass tolerance of 0.80 Da and a parent ion tolerance of 2.0 Da. Oxidation of methionine and the iodoacetamide derivative of cysteine were specified in Mascot as variable modifications.
LC-MS was performed with a splitless nanoLC-2D pump (Eksigent), a 50 μm-i.d. column packed with 10 cm of 5 μm-o.d. C18 particles, and a time-of-flight mass spectrometer (MicrOTOF; Bruker Daltonics). The LC gradient was 2 to 98% 0.1% formic acid/acetonitrile over 60 min (60-120 min) at 400 nL/min.
The Mascot search correctly matched 13 peptides observed in the sample to horse myoglobin, giving 80% sequence coverage. For algorithm verification, we downloaded a QTOF dataset from the Seattle Proteome Center repository at http://regis-web.systemsbiology.net/PublicDatasets/. The repository was created for testing various algorithms and contains LC-MS/MS datasets of an 18-protein digest collected on various instruments; for details of data collection please refer to []. We downloaded the datasets for protein mixture 4 of the 18-protein mix. From a total of 21 runs on LTQ-FT, QStar and QTOF instruments, we compiled a list of 784 LC-MS/MS-identified peptides for this protein mixture, all identified with a PeptideProphet™ [] score greater than 0.9. We also performed LC-MS peak detection using msInspect on one of the QTOF datasets, QT 20060925_mix4_23.mzxml (mix4_23), in which msInspect identified 1952 peptides, and subsequently quantified these peptides with MRCQuant. msInspect was selected because it is the most representative LC-MS peptide identification and quantification algorithm and has been shown to outperform other peak detection algorithms []. It initially applies a conservative noise threshold. MS scans are then centroided, extracted ion chromatograms (XICs) are smoothed, and an LC peak length filter is applied; LC peaks that appear and disappear together are pooled and treated as signals registered by the same peptide at different isotope positions and charge states. Finally, peptides are identified by comparing their theoretically predicted and measured isotope patterns using the Kullback-Leibler (KL) distance. Other popular software packages such as ASAPRatio [] differ in the details, but their main procedure, MS peak detection in each MS scan followed by quantification based on XICs, is similar to that of msInspect.
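The isotope-pattern comparison described above can be sketched as follows. This is an illustration under stated assumptions, not msInspect's code: the theoretical pattern comes from a simple Poisson model whose mean grows linearly with mass (the 1800 Da-per-unit slope in the hypothetical helper `poisson_isotope_pattern` is an illustrative assumption, not a value taken from msInspect), and `kl_distance` computes the KL divergence of the observed pattern from the theoretical one after normalizing both to sum to 1. msInspect's own scaling, direction and sign conventions for its reported KL score may differ.

```python
import math

def poisson_isotope_pattern(mass, n_peaks=4, da_per_unit=1800.0):
    """Rough theoretical isotope pattern from a Poisson model (illustrative assumption).

    lam = mass / da_per_unit is a hypothetical approximation of the expected
    number of heavy-isotope substitutions; it is not an msInspect parameter.
    """
    lam = mass / da_per_unit
    probs = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(n_peaks)]
    total = sum(probs)
    return [p / total for p in probs]

def kl_distance(observed, theoretical, eps=1e-12):
    """KL divergence D(observed || theoretical) between two isotope patterns.

    Both inputs are normalized to sum to 1; eps guards against log(0).
    """
    so, st = sum(observed), sum(theoretical)
    p = [o / so for o in observed]
    q = [t / st for t in theoretical]
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

theory = poisson_isotope_pattern(1800.0)     # lam = 1 under the assumed model
good = [t * 1000 for t in theory]            # observed counts proportional to theory
bad = [100, 900, 50, 10]                     # distorted pattern
print(kl_distance(good, theory) < 1e-9)      # True
print(kl_distance(bad, theory) > kl_distance(good, theory))  # True
```

Normalizing before comparing makes the score depend only on the relative isotope intensities, so a pattern with the right shape scores near zero regardless of peptide abundance.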
Among these software packages, msInspect provides a measure of relative quantification accuracy in the form of a KL distance, which enables us to compare performance. Other software packages do not provide this measure, so their relative quantification accuracy cannot be assessed.
When using the msInspect software package (Build 599) to process the mix4_23 dataset, we tried to maximize the number of peptides reported. We selected the "walksmooth" option when running the "findPeptides" command, and set the msInspect parameters "minpeaks" to 2 and "maxkl" to 10. The "walksmooth" option greatly increases the number of features reported and improves their KL scores: a total of 1952 features were reported, whereas with the default msInspect settings 933 features were reported, with worse KL scores.
The peptides reported by msInspect were further processed by MRCQuant, using templates extracted at high SNRs as reference MS templates. We rejected an msInspect-reported feature if either its reported msInspect KL score was negative, or our algorithm determined that the LC peak reported by msInspect could not be found. The latter case can be caused by inaccurate mass reporting by msInspect: when the mass is reported inaccurately, the reference template and the local MS signal deviate from each other significantly, and our algorithm rejects an LC peak when the R statistic between the reference template and the local MS signal is less than 0.4. Correcting incorrectly reported masses is a peptide identification problem beyond the scope of this paper. The result is a list of 964 peptides with accurately reported mass values. […]
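The feature-rejection rule above can be sketched as a filter. The sketch assumes that the R statistic is a Pearson correlation between the reference template intensities and the local MS signal sampled at the template's expected m/z locations, and that those locations are computed from the reported monoisotopic mass using the proton mass (1.007276 Da) and the 13C-12C mass difference (1.003355 Da). The helper names, the nearest-peak sampling and the ±0.02 Th tolerance are hypothetical; only the negative-KL rule and the 0.4 threshold come from the text.

```python
import math

PROTON = 1.007276      # proton mass, Da
C13_SHIFT = 1.003355   # 13C - 12C mass difference, Da

def expected_mz(mono_mass, charge, n_iso):
    """m/z of isotope peaks 0..n_iso-1 for a peptide of neutral monoisotopic mass."""
    return [(mono_mass + k * C13_SHIFT + charge * PROTON) / charge for k in range(n_iso)]

def sample_signal(mz_grid, intensities, targets, tol=0.02):
    """Largest intensity within +/- tol of each target m/z (0.0 if none).

    tol is a hypothetical tolerance chosen for illustration.
    """
    out = []
    for t in targets:
        window = [i for m, i in zip(mz_grid, intensities) if abs(m - t) <= tol]
        out.append(max(window) if window else 0.0)
    return out

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is flat."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def keep_feature(mono_mass, charge, template, mz_grid, intensities, kl_score, min_r=0.4):
    """Reject features with a negative KL score or a poor template match (R < min_r)."""
    if kl_score < 0:
        return False
    targets = expected_mz(mono_mass, charge, len(template))
    observed = sample_signal(mz_grid, intensities, targets)
    return pearson(template, observed) >= min_r

template = [1.0, 0.6, 0.25, 0.08]            # reference MS template (relative intensities)
mz_grid = expected_mz(1000.0, 2, 4) + [503.3]  # isotope peaks plus one unrelated peak
intensities = [100.0, 60.0, 25.0, 8.0, 40.0]
print(keep_feature(1000.0, 2, template, mz_grid, intensities, kl_score=0.1))   # True
print(keep_feature(1000.5, 2, template, mz_grid, intensities, kl_score=0.1))   # False: mass off
print(keep_feature(1000.0, 2, template, mz_grid, intensities, kl_score=-0.2))  # False: negative KL
```

In the second call the mis-reported mass moves every expected m/z by 0.25 Th at charge 2, so no isotope peak falls inside the tolerance window, the sampled signal is flat and the template match fails, mirroring how an inaccurate msInspect mass leads to rejection.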

Pipeline specifications

Software tools: msInspect, Mascot Distiller, PeptideProphet, ASAPRatio
Databases: UniProt
Application: MS-based untargeted proteomics
Diseases: Multiple Sclerosis