Computational protocol: Single-cell genome-wide bisulfite sequencing uncovers extensive heterogeneity in the mouse liver methylome

Protocol publication

[…] Raw sequence data were subjected to quality control by FastQC v0.10.1 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and trimmed using Trim Galore v0.3.3 (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) with default parameters, except that the first four and last two base pairs of each read were additionally trimmed because of abnormal GC content. Trimmed sequences were mapped to the mouse reference genome (mm9) using Bismark 0.10.0 with the alignment tool Bowtie2 2.1.0. Duplicate sequences were then removed and single-CpG methylation was called using Bismark []. A summary of data processing is shown in Additional file : Table S1.

To estimate CpG methylation variations, a 3 kb sliding window with a 600 bp step was used to subdivide the genome, similar to Smallwood et al. []. Windows covering at least 5 CpGs were used in the analysis (Fig. ; Additional file : Figure S1). The methylation frequency of a window in one sample was estimated based on a binomial distribution.

Heterogeneity levels were estimated in two ways: (1) global difference between a cell and its bulk; and (2) local difference between cell–bulk pairs in each window. In both, the heterogeneity level is quantified using a weighted variance value, for which the mean methylation frequency is approximated using the corresponding bulk. Multiple downsamplings were performed to assess potential noise due to technical artifacts (Fig. ). Annotations of genomic features were obtained from multiple resources (Additional file : Table S3).

Of note, our definition of variance in genomic features is slightly different from that of Smallwood et al. []. They plotted the lower bounds of the 95% confidence interval and we plotted the estimated mean. Additionally, the raw variance value is biased by sequencing depth. For example, if the methylation levels of two 5mCs are both 0.5 but the sequencing depths are 3× and 20×, respectively, there will be a systematic bias when comparing the site sequenced at 3× (observed level most likely 0.67 or 0.33) with the site sequenced at 20× (observed level close to 0.5). We therefore downsampled the data to reach the same sequencing depth in all genomic regions. The downsampling provides a less biased comparison at the cost of more noise. Thus, our reported variances are less quantitatively different than those in Smallwood et al. [] but are more directly comparable.

Finally, epivariation was defined as a methylation difference between a single cell and its bulk at a single CpG site. To call an epivariation at a 5mC, we required (1) a sequencing depth at the 5mC site larger than 5 in both single cell and bulk; (2) more than 90% of the reads in bulk showing the same methylation pattern (either methylated or unmethylated); and (3) more than three reads in the cell indicating a different methylation pattern than the bulk. Epivariation frequency is stable even with slight changes to the above criteria (Additional file : Figure S3). Further details of data analysis are described in Additional file : Supplemental Experimental Procedures. […]
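
The three epivariation criteria above can be sketched as a small predicate. This is a minimal illustration, not the authors' code: the names `SiteCounts` and `call_epivariation` and the parameter names are hypothetical, and the thresholds are read literally as strict inequalities ("larger than 5", "more than 90%", "more than three reads").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteCounts:
    """Read counts at one CpG site (hypothetical container, not from the source)."""
    meth: int    # reads calling the site methylated
    unmeth: int  # reads calling the site unmethylated

    @property
    def depth(self) -> int:
        return self.meth + self.unmeth


def call_epivariation(cell: SiteCounts, bulk: SiteCounts,
                      min_depth: int = 5,
                      bulk_consensus: float = 0.90,
                      min_discordant: int = 3) -> bool:
    """Return True if the cell is called as differing from its bulk at this site."""
    # (1) Depth must be larger than min_depth in both the single cell and the bulk.
    if cell.depth <= min_depth or bulk.depth <= min_depth:
        return False

    # (2) More than bulk_consensus of the bulk reads must agree on one state.
    if bulk.meth > bulk_consensus * bulk.depth:
        bulk_is_methylated = True
    elif bulk.unmeth > bulk_consensus * bulk.depth:
        bulk_is_methylated = False
    else:
        return False  # bulk is ambiguous at this site

    # (3) More than min_discordant cell reads must show the opposite state.
    discordant = cell.unmeth if bulk_is_methylated else cell.meth
    return discordant > min_discordant


if __name__ == "__main__":
    bulk = SiteCounts(meth=19, unmeth=1)   # 95% methylated in bulk
    cell = SiteCounts(meth=0, unmeth=6)    # six unmethylated reads in the cell
    print(call_epivariation(cell, bulk))
```

Exposing the three thresholds as parameters makes it straightforward to repeat the calls under slightly different criteria, which is the kind of robustness check the text refers to.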
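
The depth bias described earlier (a site with true methylation level 0.5 observed at 3× versus 20×) can be made concrete with a binomial model. The sketch below is illustrative only: the function names are hypothetical, and the source does not say how the downsampling was performed, so drawing reads at random without replacement is an assumption.

```python
import random
from math import comb


def observed_level_distribution(depth: int, true_level: float) -> dict:
    """Map each possible observed level k/depth to its binomial probability."""
    return {
        k / depth: comb(depth, k) * true_level**k * (1 - true_level)**(depth - k)
        for k in range(depth + 1)
    }


def expected_abs_error(depth: int, true_level: float) -> float:
    """Expected |observed level - true level| at the given sequencing depth."""
    dist = observed_level_distribution(depth, true_level)
    return sum(prob * abs(level - true_level) for level, prob in dist.items())


def downsample(meth: int, unmeth: int, target_depth: int, rng: random.Random):
    """Assumed scheme: keep target_depth reads drawn at random without replacement."""
    reads = [1] * meth + [0] * unmeth
    if len(reads) <= target_depth:
        return meth, unmeth  # already at or below the target depth
    kept = rng.sample(reads, target_depth)
    kept_meth = sum(kept)
    return kept_meth, target_depth - kept_meth


if __name__ == "__main__":
    for depth in (3, 20):
        print(depth, round(expected_abs_error(depth, 0.5), 3))
    # Bring a 20x site down to 3x so it is compared at the same depth.
    print(downsample(10, 10, 3, random.Random(0)))
```

At 3× the only possible observed levels are 0, 1/3, 2/3 and 1, and 1/3 and 2/3 each occur with probability 0.375, which matches the "most likely 0.67 or 0.33" remark. Equalizing depth removes this asymmetry between sites but makes each estimate as noisy as the shallowest one.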

Pipeline specifications

Software tools: FastQC, Trim Galore!, Bismark, Bowtie2
Application: BS-seq analysis
Organisms: Mus musculus
Chemicals: 5-Methylcytosine