Computational protocol: The sea urchin (Strongylocentrotus purpuratus) test and spine proteomes

Protocol publication

[…] Raw data were converted to msm-files using the in-house software RAW2MSM, v.1.10 []. The msm-files were used for database searches with the MASCOT search engine (Matrix Science, London, UK; version 2.1) against the Strongylocentrotus purpuratus annotated gene models (Glean3) protein sequence database [] (see also for further information about Glean []), the corresponding reversed database, and the sequences of common contaminants, including human keratins from IPIhuman. Carbamidomethylation was set as a fixed modification. Variable modifications were oxidation (M), N-acetyl (protein), pyro (N-term QC), amide (C-terminal) and hydroxylation (P). The peptide tolerance was set to 5 ppm and the MS/MS tolerance to 0.5 Da. One missed cleavage was allowed. MS3 scoring, counting of unique and total peptides, and calculation of protein scores were done with MSQuant, v.1.4.2a13. Each raw-file was analyzed separately. Msm-files containing data of accepted peptides were then merged into a single msm-file for another Mascot search to obtain summed sequence coverage, scores and peptide numbers. This file was also used to search the IPIhuman database v.3.13 to find shared peptides. The score threshold for peptide acceptance in each Mascot results file was chosen so as to eliminate any reversed hits at p < 0.05 and was between 38 and 54 for different searches. Peptide hits with one unique peptide were accepted only if confirmed by MS3 [] with a score at least twice the threshold value for MS/MS, and after manual validation. Quality criteria were the assignment of major peaks, occurrence of uninterrupted y- or b-ion series of at least 3 consecutive amino acids, preferred cleavages N-terminal to proline bonds and C-terminal to Asp or Glu bonds, and the presence of a2/b2 ion pairs. 
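The score-threshold and single-peptide rules described above can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the PeptideHit record, the function names and the reading of "eliminate any reversed hits" as "accept only scores strictly above the best-scoring reversed hit" are all assumptions made for the example. Manual validation of spectra is not modelled.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class PeptideHit:
    """Hypothetical record for one peptide identification."""
    sequence: str
    score: float                       # MS/MS ion score
    is_reversed: bool                  # True if matched in the reversed database
    ms3_score: Optional[float] = None  # None if no MS3 spectrum was scored


def decoy_free_threshold(hits: Iterable[PeptideHit]) -> float:
    """Return the highest score among reversed hits (0.0 if there are none).

    Assumption: accepting only scores strictly above this value removes
    every reversed hit from the accepted list.
    """
    return max((h.score for h in hits if h.is_reversed), default=0.0)


def accepted_peptides(hits: Iterable[PeptideHit], threshold: float) -> List[PeptideHit]:
    """Keep forward-database hits scoring strictly above the threshold."""
    return [h for h in hits if not h.is_reversed and h.score > threshold]


def single_peptide_confirmed(hit: PeptideHit, threshold: float) -> bool:
    """Single-peptide rule: MS3 score must be at least twice the MS/MS threshold."""
    return hit.ms3_score is not None and hit.ms3_score >= 2 * threshold
```

In use, the threshold would be computed separately for each results file, since the protocol reports different values (38 to 54) for different searches.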
The minimal length required for a peptide was seven amino acids. The abundance of proteins was estimated by calculating the exponentially modified Protein Abundance Index (emPAI) [] without using retention time data. BLAST analysis was performed with the program provided by NCBI, searching against the non-redundant database for all organisms. FASTA and MPsrch search programs were used as provided by the European Bioinformatics Institute (EBI) to search the UniProt Knowledgebase and UniProtKB/Swiss-Prot protein sequence databases. Signal peptides were predicted with SignalP 3.0, and domains were predicted with NCBI Conserved Domain Search [] and PROSITE []. […]
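For orientation, the emPAI mentioned above is conventionally defined as 10^(N_observed / N_observable) - 1, where N_observed is the number of peptides identified for a protein and N_observable is the number of peptides that could in principle be observed. The sketch below applies that standard definition; the function names and the molar-percentage helper are illustrative assumptions, and how the observable peptides were counted in this study (beyond not using retention time data) is not specified here.

```python
from typing import Dict


def empai(n_observed: int, n_observable: int) -> float:
    """Standard emPAI definition: 10 ** (observed / observable) - 1."""
    if n_observable <= 0:
        raise ValueError("n_observable must be positive")
    if n_observed < 0:
        raise ValueError("n_observed must not be negative")
    return 10 ** (n_observed / n_observable) - 1


def molar_percent(empai_by_protein: Dict[str, float]) -> Dict[str, float]:
    """Express each protein's emPAI as a percentage of the summed emPAI.

    Assumption: this normalisation is the usual way to compare relative
    molar abundance within one sample; it is not stated in the protocol.
    """
    total = sum(empai_by_protein.values())
    if total <= 0:
        raise ValueError("sum of emPAI values must be positive")
    return {name: 100 * value / total for name, value in empai_by_protein.items()}
```

Because emPAI grows exponentially with the observed fraction, a protein with all of its observable peptides detected scores 9, while one with none detected scores 0.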

Pipeline specifications

Software tools: Mascot Server, MSQuant, SignalP
Databases: PROSITE, UniProt, UniProtKB
Application: MS-based untargeted proteomics
Organisms: Strongylocentrotus purpuratus
Chemicals: Cysteine, Guanosine Triphosphate