Computational protocol: Data quality of whole genome bisulfite sequencing on Illumina platforms

[…] Paired-end sequencing (2 x 150 bp) was performed on a HiSeq X system at the SNP&SEQ Technology Platform. The amount of library loaded on the instrument varied between 100 and 200 pM. A PhiX library was spiked in at 20–40% for HCS v3.3.39/RTA:2.7.1 or v3.3.75/RTA:2.7.5 and at 2% for HD.3.4.0/RTA:2.7.7. For comparison, we also analyzed previously generated sequencing data from the HiSeq 2500 system (HCS 2.2.38 / RTA 1.18.61) using the TruSeq v.4 chemistry PE125 (10% PhiX) [] and data generated on an installation run of a NovaSeq 6000 instrument with 50% PhiX spike-in (RTA:3.1.5). Per-nucleotide quality scores were extracted using FASTX-Toolkit v0.0.14 or reported by Sisyphus, an in-house pipeline used at the SNP&SEQ Technology Platform for processing and QC of Illumina sequence data. Sequence reads were quality filtered and adaptors were trimmed using Trim Galore!. For Accel-NGS Methyl-Seq libraries, 18 bp were trimmed from the 5’ end of R2 and the 3’ end of R1 to remove bases derived from the sequence tag introduced during library preparation. Alignment to the human reference assembly GRCh37 and methylation calling were performed with the Bismark software [] and the pipeline tool ClusterFlow []. For TSDM libraries, the first 6 bp of each read were ignored during methylation calling to avoid random-priming bias. Global methylation rates (∑C/∑(C+T) in CpG context) and methylation at individual CpG sites were obtained from the Bismark methylation extractor output files. Average methylation in non-overlapping 100 kb windows was determined using BEDTools. Methylation correlation at individual CpG sites and in non-overlapping 100 kb windows, and pairwise root-mean-square error (RMSE) values, were computed using custom R scripts. […]
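The summary statistics described above can be illustrated with a minimal Python sketch. This is not the authors' code (they used BEDTools and custom R scripts that are not shown here). All function names are hypothetical, and the sketch assumes per-CpG methylated/unmethylated read counts, 1-based positions, and an unweighted mean of per-site methylation levels within each window.

```python
import math
from collections import defaultdict


def global_methylation_rate(counts):
    """Pooled rate sum(C) / sum(C + T) over (methylated, unmethylated) pairs."""
    counts = list(counts)
    methylated = sum(m for m, _ in counts)
    total = sum(m + u for m, u in counts)
    if total == 0:
        raise ValueError("no CpG coverage")
    return methylated / total


def window_means(site_levels, window=100_000):
    """Unweighted mean methylation per non-overlapping window.

    site_levels: iterable of (chrom, pos, level); pos is assumed 1-based.
    Returns {(chrom, 0-based window start): mean level}.
    """
    sums = defaultdict(float)
    n_sites = defaultdict(int)
    for chrom, pos, level in site_levels:
        key = (chrom, ((pos - 1) // window) * window)
        sums[key] += level
        n_sites[key] += 1
    return {key: sums[key] / n_sites[key] for key in sums}


def rmse(x, y):
    """Root-mean-square error between two equal-length sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)
```

To compare two libraries, one would restrict both to the CpG sites (or windows) covered in both samples and pass the paired methylation levels to `pearson` and `rmse`.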

Pipeline specifications

Software tools: Trim Galore!, Bismark, BEDTools
Application: BS-seq analysis
Diseases: Huntington Disease