Computational protocol: Integrated Proteomic Analysis of Human Cancer Cells and Plasma from Tumor Bearing Mice for Ovarian Cancer Biomarker Discovery

Protocol publication

[…] Data analysis was performed using the Computational Proteomics Analysis System (CPAS). Searches specified cysteine alkylation with the light form of acrylamide as a fixed modification and with the heavy form of acrylamide (+3.01884) as a variable modification. Spectra were searched using X!Tandem, configured with the comet score module plug-in, against the mouse IPI database version 3.29. A search for tryptic peptides was performed with the semitryptic refinement option, in which a second round of searching looks for semitryptic peptides from proteins identified in the first round.

Quantitative ratios were obtained for peptides containing cysteine residues labeled with heavy and light acrylamide isotopes. Quantitative information was extracted from acrylamide-labeled peptides using an in-house script (Q3), which provided relative quantification from MS1 spectra for each pair of cysteine-containing peptides identified by MS/MS. Ratios between cancer and normal were calculated per fraction (that is, per LC-MS/MS run). Measured acrylamide ratios for all identified peptides were processed so that multiple measurements of a given peptide within one fraction were averaged in log2 space, yielding a dataset with one ratio per peptide per fraction. A global normalization factor was then computed as the mode of the peptide ratio histogram. All peptide ratios for a specific protein present in a particular fraction were then normalized and log-averaged to obtain the local relative protein ratio. Statistical significance of protein quantitation was assigned by two methods as described below. [...]

Data were interrogated using Ingenuity Pathways Analysis (Ingenuity Systems®, www.ingenuity.com) and MetaCore from GeneGo Inc (www.genego.com).
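The peptide and protein ratio processing described above (log2 averaging per peptide per fraction, mode-based global normalization, then log averaging per protein per fraction) can be illustrated with a minimal sketch. This is not the Q3 script or CPAS code. The function names, the input tuple layout, the histogram bin width, and the choice to compute the mode over all fractions together are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def peptide_log_ratios(measurements):
    """Average repeated measurements of a peptide within one fraction in log2 space.

    `measurements` is an iterable of (fraction, protein, peptide, ratio) tuples,
    where `ratio` is assumed to be already oriented as cancer over normal.
    """
    grouped = defaultdict(list)
    for fraction, protein, peptide, ratio in measurements:
        grouped[(fraction, protein, peptide)].append(math.log2(ratio))
    return {key: sum(vals) / len(vals) for key, vals in grouped.items()}


def histogram_mode(values, bin_width=0.1):
    """Return the centre of the most populated fixed-width histogram bin.

    The bin width is an assumption; the source does not state how the
    histogram was built.
    """
    counts = Counter(math.floor(v / bin_width) for v in values)
    best_bin = counts.most_common(1)[0][0]
    return (best_bin + 0.5) * bin_width


def protein_log_ratios(measurements, bin_width=0.1):
    """Normalize peptide log2 ratios by the histogram mode, then average per protein.

    Returns ({(fraction, protein): normalized mean log2 ratio}, normalization shift).
    """
    peptides = peptide_log_ratios(measurements)
    shift = histogram_mode(list(peptides.values()), bin_width)
    per_protein = defaultdict(list)
    for (fraction, protein, _peptide), log_ratio in peptides.items():
        per_protein[(fraction, protein)].append(log_ratio - shift)
    ratios = {key: sum(vals) / len(vals) for key, vals in per_protein.items()}
    return ratios, shift


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    demo = [
        ("F1", "P1", "PEPA", 2.0),
        ("F1", "P1", "PEPA", 8.0),
        ("F1", "P1", "PEPB", 4.0),
        ("F2", "P2", "PEPC", 1.0),
    ]
    ratios, shift = protein_log_ratios(demo, bin_width=0.5)
    print(shift, ratios)
```

Subtracting the mode in log space is equivalent to dividing linear ratios by a single global factor, so the most common peptide ratio is centred near zero on the log2 scale.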
A dataset containing IPI accession numbers and the corresponding cancer-to-control ratios was uploaded into each application, with all 1031 proteins identified in the IPAS experiment used as a reference set. Each accession number was mapped to its corresponding gene object in the Ingenuity knowledge base or MetaCore's manually curated database. A fold-change cutoff of 1.5 with a p-value < 0.05 was set to identify genes whose expression was significantly differentially regulated. For analysis with Ingenuity, these genes were designated as focus genes and were overlaid onto a global molecular network developed from information contained in the Ingenuity knowledge base. Networks of these focus genes were then algorithmically generated based on their connectivity. A score is generated for each network based on the fit between the focus genes and that network. The score is the -log(p-value), calculated from a hypergeometric distribution with the right-tailed Fisher's Exact Test. For analysis with MetaCore, the gene list of proteins found to be up-regulated in the mouse plasma and secreted/shed in mouse and human cancer cell lines (58 genes in total) was submitted to an enrichment and network workflow. Enrichment analysis was conducted across three GeneGo curated ontologies along with Gene Ontology to provide a quantitative analysis of the most relevant biological functions represented by the data. Networks, and the statistics for each, were generated using the analyze network algorithm, one of the nine network-building algorithms in MetaCore.

PeptideProphet, an empirical statistical modeling program, was used to estimate the accuracy of peptide identifications. Factors determined by the search algorithm were weighted to assign a single number to each peptide identification that can then be compared to other peptide identifications.
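The network score described above for the Ingenuity analysis, a -log(p-value) from a right-tailed Fisher's Exact Test on a hypergeometric distribution, can be illustrated with a generic sketch. This is not Ingenuity's implementation. The function names and argument layout are hypothetical, and the base-10 logarithm is an assumption, since the text does not state the base.

```python
import math


def right_tailed_fisher_p(k, n_network, n_focus, n_reference):
    """P(X >= k) for a hypergeometric X.

    n_reference: size of the reference set
    n_focus:     focus genes within the reference set
    n_network:   genes in the network being scored
    k:           focus genes observed in that network
    """
    upper = min(n_network, n_focus)
    total = math.comb(n_reference, n_network)
    tail = sum(
        math.comb(n_focus, i) * math.comb(n_reference - n_focus, n_network - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def network_score(k, n_network, n_focus, n_reference):
    """Negative base-10 logarithm of the right-tailed p-value (base is assumed)."""
    return -math.log10(right_tailed_fisher_p(k, n_network, n_focus, n_reference))


if __name__ == "__main__":
    # 1031 is the reference-set size from the text; the other counts are hypothetical.
    print(network_score(k=12, n_network=35, n_focus=58, n_reference=1031))
```

A larger score means the overlap between the focus genes and the network is less likely under random sampling from the reference set.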
ProteinProphet, a program that applies a statistical model to infer protein groups from peptide identifications and validates these groups with a probability assignment, was also used. A protein group may contain one or more protein sequences, each indistinguishable from the others on the basis of the identified peptides. Proteins with a ProteinProphet score corresponding to a 5% error rate (∼3.5% false discovery rate as determined by ProteinProphet) were retained. In this study, a representative gene symbol was chosen for each protein group, henceforth referred to as “protein”. […]
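As a rough illustration of how a probability cutoff relates to an error rate, the sketch below estimates the expected fraction of incorrect entries among those kept at a given minimum probability, using the mean of (1 - p). This is a generic model-based estimate, not ProteinProphet's own code, and it does not reproduce the specific 5% and 3.5% figures quoted above. The function names and the example probabilities are hypothetical.

```python
def estimated_error_rate(probabilities, threshold):
    """Expected fraction of incorrect entries among those with p >= threshold."""
    kept = [p for p in probabilities if p >= threshold]
    if not kept:
        return 0.0
    return sum(1.0 - p for p in kept) / len(kept)


def lowest_threshold_for_error(probabilities, max_error=0.05):
    """Smallest observed probability whose estimated error rate is <= max_error.

    Raising the threshold removes the least confident entries first, so the
    estimate never increases; the first threshold that passes is the lowest.
    Returns None if no threshold satisfies the limit.
    """
    for threshold in sorted(set(probabilities)):
        if estimated_error_rate(probabilities, threshold) <= max_error:
            return threshold
    return None


if __name__ == "__main__":
    # Hypothetical protein-group probabilities, for illustration only.
    probs = [1.0, 0.99, 0.97, 0.9, 0.6, 0.3]
    cutoff = lowest_threshold_for_error(probs, max_error=0.05)
    print(cutoff, estimated_error_rate(probs, cutoff))
```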

Pipeline specifications

Software tools: CPAS, MetaCore, PeptideProphet, ProteinProphet
Application: MS-based untargeted proteomics
Organisms: Mus musculus, Homo sapiens
Diseases: Ectromelia, Infectious, Neoplasms, Ovarian Neoplasms