Computational protocol: An integrative analysis of post-translational histone modifications in the marine diatom Phaeodactylum tricornutum

[…] Samples were analyzed by nano-HPLC/MS/MS using an Ultimate3000 system (Dionex S.A.) coupled to an LTQ-Orbitrap mass spectrometer (Thermo Fisher Scientific, Bremen, Germany). Samples were loaded on a C18 pre-column (300 μm inner diameter × 5 mm; Dionex) at 20 μl/minute in 2 % acetonitrile, 0.1 % trifluoroacetic acid. After 3 minutes of desalting, the pre-column was switched in line with the analytical C18 column (75 μm inner diameter × 50 cm; C18 PepMap™, Dionex) equilibrated in 100 % solvent A. Bound peptides were eluted using a 0 to 30 % gradient of solvent B (80 % acetonitrile, 0.085 % formic acid) over 157 minutes, then a 30 to 50 % gradient of solvent B over 20 minutes, at a flow rate of 150 nl/minute (40 °C). Data-dependent acquisition was performed on the LTQ-Orbitrap mass spectrometer in positive ion mode. Survey MS scans were acquired in the Orbitrap over the 400–1200 m/z range at a resolution of 100,000. Each scan was recalibrated in real time by co-injecting an internal standard from ambient air into the C-trap (‘lock mass option’). The five most intense ions per survey scan were selected for collision-induced dissociation fragmentation, and the resulting fragments were analyzed in the linear trap (LTQ). Target ions already selected for MS/MS were dynamically excluded for 20 s.
Data were acquired using the Xcalibur software (version 2.0.7), and the resulting spectra were analyzed with the Mascot™ software through Proteome Discoverer (version 1.4, Thermo Scientific), searching either an in-house database containing the sequences of histone proteins from P. tricornutum (PtH3_50695, PtH3_21239, PtH4_26896, PtH2A_34798, PtH2A_28445, PtH2B_11823, PtH1_54381) or the UniProtKB Phaeodactylum tricornutum database (15,832 proteins), with a Mascot score threshold corresponding to 1 % FDR (or <5 %; shown in bold in Additional file ).
Carbamidomethylation of cysteine, oxidation of methionine, acetylation of lysine and protein N-termini, methylation and dimethylation of lysine and arginine, trimethylation of lysine, methylation of aspartic and glutamic acid, di-glycine of lysine, propionylation of lysine and peptide N-termini, and phosphorylation of histidine, serine, threonine and tyrosine were set as variable modifications for Mascot searches. The mass tolerances in MS and MS/MS were set to 5 ppm and 0.5 Da, respectively. The resulting Mascot files were further processed using myProMS []. The mass spectrometry proteomics data have been deposited with the ProteomeXchange Consortium [] via the PRIDE partner repository [] with the dataset identifier PXD002148. [...]
For mapping and analysis we used P. tricornutum genome v.2.0, available at the Joint Genome Institute []. Reads were quality-checked with a standardized procedure using FastQC []. Trimmomatic [] was used for quality trimming. GO-based functional analyses of ChIP-marked genes and methylated genes were performed using BLAST2GO [] with an FDR significance cutoff at the 0.05 % probability level. R [] and Biopython [] were used extensively for data analysis. For pattern-based analysis of genes and flanking regions, genes were normalized to equal size, and flanking regions of 2 kb were selected because the average intergenic size is ~1500 bp. Data processing, analysis, and plotting were performed using Python, R/Bioconductor and Hyperbrowser []. Results of the analysis have been made available on the Gbrowse-based genome browser at []. [...]
Bisulfite-Seq reads, generated on an Illumina GAII from DNA extracted under both nitrate-replete and nitrate-depleted conditions, were filtered with FastQC and then mapped to the available Pt1.86 reference genome using Bismark []. Five million reads for the replete nitrogen condition and 3.3 million reads for the low nitrogen condition were uniquely mapped and de-duplicated. Average fold coverage was 17.
We extracted the methylation calls for each base. To call a CpG/CHH/CHG site as methylated, we required coverage by at least three reads, with a minimum of 20 % of those reads methylated. [...] TopHat v.1.1.3 [] and Cufflinks [] were used to map the RNA-Seq reads and to estimate transcript abundances. Relative abundances of transcripts were measured as fragments per kilobase of exon per million fragments mapped (FPKM). […]
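The site-level cutoff described above (at least three reads, at least 20 % of them methylated) can be illustrated with a minimal Python sketch. It is not the authors' script: the `SiteCounts` record and the `is_methylated` function are hypothetical names, and the sketch assumes that "three reads" refers to total coverage at the site (methylated plus unmethylated calls) and that the 20 % bound is inclusive.

```python
from dataclasses import dataclass

# Thresholds taken from the protocol text.
MIN_READS = 3      # minimum read coverage at the site
MIN_PERCENT = 20   # minimum percentage of reads calling the site methylated


@dataclass(frozen=True)
class SiteCounts:
    """Per-cytosine call counts (hypothetical record layout)."""
    context: str        # "CpG", "CHG" or "CHH"
    methylated: int     # reads supporting a methylated call
    unmethylated: int   # reads supporting an unmethylated call


def is_methylated(site, min_reads=MIN_READS, min_percent=MIN_PERCENT):
    """Return True if the site passes both the coverage and the percentage cutoff.

    Assumption: coverage is methylated + unmethylated, and the percentage
    bound is inclusive. Integer arithmetic avoids floating-point edge cases.
    """
    coverage = site.methylated + site.unmethylated
    if coverage < min_reads:
        return False
    return 100 * site.methylated >= min_percent * coverage


if __name__ == "__main__":
    sites = [
        SiteCounts("CpG", 1, 4),  # 5 reads, 20 % methylated
        SiteCounts("CHH", 1, 1),  # only 2 reads
        SiteCounts("CHG", 1, 9),  # 10 reads, 10 % methylated
    ]
    for s in sites:
        print(s.context, is_methylated(s))
```

Per-site counts of this kind can be derived from the methylation calls reported by the aligner; the exact input format is left open here because the text does not specify it.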
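As a reminder of what the FPKM unit expresses, the sketch below computes it from raw counts. It only illustrates the definition of the unit; Cufflinks estimates abundances with a statistical model, so its reported values will generally not equal this naive calculation. The function name `fpkm` and its parameters are hypothetical.

```python
def fpkm(fragments, exon_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon per million fragments mapped.

    fragments: fragments assigned to the transcript
    exon_length_bp: summed exon length of the transcript, in base pairs
    total_mapped_fragments: all mapped fragments in the library
    """
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # 500 fragments on a 2 kb transcript in a library of 10 million fragments
    print(fpkm(500, 2000, 10_000_000))  # 25.0
```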

Pipeline specifications

Software tools: FastQC, Bismark
Application: BS-seq analysis
Organisms: Phaeodactylum tricornutum