Computational protocol: Comparative Analysis of Mass Spectral Similarity Measures on Peak Alignment for Comprehensive Two-Dimensional Gas Chromatography Mass Spectrometry

Protocol publication

[…] The peak alignment R package mSPA [] provides five peak alignment algorithms: PAD, PAS, SW-PAD, DW-PAS, and PAM. PAD aligns peaks using only the peak distance, without a window, and PAS aligns peaks using only the spectral similarity, without a window. SW-PAD and DW-PAS are window-based: SW-PAD is PAD with a similarity-based window, and DW-PAS is PAS with a distance-based window. Kim et al. [] further developed a mixture similarity measure (M_d). The mixture similarity score between a target peak t_j and a reference peak r_i is defined by

$$M_d(t_j, r_i) = \frac{w}{1 + D_d(t_j, r_i)} + (1 - w)\, S(t_j, r_i), \tag{1}$$

where w (0 ≤ w ≤ 1) is a mixture weight factor, and S(t, r) and D_d(t, r) are a spectral similarity score and a distance measure between two peaks t and r, respectively. PAM is the peak alignment method that uses this mixture similarity without any window. Its main difference from the other approaches is that it uses the retention time distance and the mass spectral similarity at the same time, without a window. In addition, an optimization-based peak alignment, OP-PAM, is incorporated in mSPA. OP-PAM is the optimal version of PAM: it optimizes the mixture weight w and the choice of distance measure. For further details, refer to Kim et al. [].

mSPA uses the cosine correlation as its main mass spectral similarity measure, although a user can choose Pearson's correlation coefficient instead. mSPA also includes four distance measures: Euclidean (D_1), Maximum (D_2), Manhattan (D_3), and Canberra (D_4). Kim et al. [] showed that the Canberra distance performs best among them. However, it remains unknown which mass spectral similarity measure performs better for peak alignment. [...]
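
As an illustration of the mixture similarity score, the following minimal Python sketch reads Eq. (1) as M_d = w / (1 + D_d) + (1 − w) · S, with cosine correlation as S and the Canberra distance as D_d. It is not the mSPA implementation (mSPA is an R package). The function names, the representation of a peak as a pair of retention times plus an intensity vector, and the default w = 0.5 are assumptions made only for this example.

```python
import math


def cosine_similarity(spec_a, spec_b):
    """Cosine correlation between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(spec_a, spec_b))
    norm_a = math.sqrt(sum(x * x for x in spec_a))
    norm_b = math.sqrt(sum(y * y for y in spec_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for an all-zero spectrum
    return dot / (norm_a * norm_b)


def canberra_distance(p, q):
    """Canberra distance; terms with a zero denominator are skipped."""
    total = 0.0
    for x, y in zip(p, q):
        denom = abs(x) + abs(y)
        if denom > 0.0:
            total += abs(x - y) / denom
    return total


def mixture_similarity(rt_t, rt_r, spec_t, spec_r, w=0.5,
                       distance=canberra_distance):
    """Mixture score w / (1 + D) + (1 - w) * S for one target/reference pair.

    rt_t, rt_r: (first-dimension, second-dimension) retention times
                (hypothetical peak representation for this sketch).
    spec_t, spec_r: mass spectra as equal-length intensity vectors.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    d = distance(rt_t, rt_r)
    s = cosine_similarity(spec_t, spec_r)
    return w / (1.0 + d) + (1.0 - w) * s
```

With w = 1 the score depends only on the retention time distance, and with w = 0 it reduces to the spectral similarity alone. Two peaks with identical retention times and identical spectra score 1 for any w.
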
For a fair comparison with mSPA, we used the same data as the mSPA study: a mixture of 76 compound standards and a set of real biological samples extracted from rat plasma. The mixture of 76 compound standards comprises 10 GC×GC-MS data sets (S1–S10), and the rat plasma sample comprises five GC×GC-MS data sets (P1–P5). For a more detailed description of the data, please refer to Wang et al. []. We call the mixture of 76 compound standards Data I and the rat plasma data Data II.

Theoretically, peak picking should generate one peak for each compound. In practice, however, spectral deconvolution software such as ChromaTOF usually detects multiple peaks for one compound, so its peak lists contain several entries with the same compound name. We therefore merged the multiple peaks by peak area: among the peaks having the same compound name, we selected the peak with the largest peak area. The number of peaks before and after peak merging is summarized in . The chromatograms and the densities of the first and the second dimension retention times of Data I and Data II are depicted in . Note that the data and source code are available at […]
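
The peak merging step can be sketched in Python as follows. This is a minimal illustration and not the authors' code: the peak records are assumed to be dictionaries, and the field names "name" and "area" are hypothetical stand-ins for the compound name and peak area columns of a peak list.

```python
def merge_peaks_by_area(peaks):
    """Keep, for each compound name, the single peak with the largest area.

    peaks: iterable of dicts with (hypothetical) keys "name" and "area".
    Returns one peak per compound name, in order of first appearance.
    """
    best = {}
    for peak in peaks:
        name = peak["name"]
        # Replace the stored peak only if this one has a strictly larger area.
        if name not in best or peak["area"] > best[name]["area"]:
            best[name] = peak
    return list(best.values())


# Made-up peak list for illustration only.
example_peaks = [
    {"name": "alanine", "area": 120.0},
    {"name": "glycine", "area": 80.0},
    {"name": "alanine", "area": 450.0},
]
merged = merge_peaks_by_area(example_peaks)
```

Here `merged` holds two peaks: the alanine entry with area 450.0 and the single glycine entry. Ties in area keep the first peak encountered, which is a choice made for this sketch rather than something the source specifies.
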

Pipeline specifications

Software tools mSPA, MrR, ChromaTOF
Application MS-based untargeted proteomics