Dataset features


Application: ChIP-seq analysis
Number of samples: 2
Release date: Apr 20 2012
Last update date: Aug 26 2014
Access: Public
Dataset link: Nucleosome Position by MNase-seq from ENCODE/Stanford/BYU

Experimental Protocol

To isolate mononucleosome core DNA fragments from the GM12878 and K562 ENCODE cell lines, we followed the micrococcal nuclease (MNase) digestion and isolation protocol described in Johnson et al. (2006), Valouev et al. (2008), and Valouev et al. (2011), with the following modifications. The precise concentrations of the two flash-frozen cell samples received from the Snyder Lab were not known, so, per our standard procedure, we performed a series of digestions titrating the amount of MNase to determine the MNase concentration for optimal digestion of each sample. Final concentrations of 25 U/µL and 50 U/µL of MNase were used to digest the GM12878 and K562 cells, respectively, at 20°C for 12 min. All other steps in the digestion and isolation protocol were as described.
Cells were grown according to the approved ENCODE cell culture protocols. K562 and GM12878 were each grown to ~2.5×10^8 cells. The cells were harvested and frozen, and nucleosome core isolation followed (Valouev et al. 2008).
The SOLiD reads were mapped in color-space with the probabilistic mapper DNAnexus. The DNAnexus mapper measures and propagates mapping uncertainty by including both quality values and mismatches in the alignment score calculation. The scores are then scaled across all possible mappings of the read to estimate the posterior probability of alignment to each genomic location. Only reads with a posterior probability of correct mapping > 0.9 were reported.
Nucleosome density signal maps (bedGraph and bigWig files) were generated by first shifting reads by 74 bp in the 5′ to 3′ direction and counting the total number of reads starting at each genomic coordinate on both strands. These counts are then smoothed using un-normalized kernel density smoothing with a triweight kernel. A bandwidth of 30 bp is used, which is equivalent to a smoothing window of 60 bp.
The smoothed counts at each position are then divided by the expected number of reads from an equivalent uniform distribution of reads in a ± 30 bp window around that position. If fewer than 25% of the positions in a ± 30 bp window around a genomic location are uniquely mappable, or if the location is part of an assembly gap, the signal value at that position is considered unreliable and is not recorded in the signal files. Hence, genomic coordinates that do not have any associated signal value should be considered missing or unreliable data. Genomic coordinates associated with a signal value of 0 are reliably mappable but do not have any signal in the dataset.
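
The signal computation described above can be illustrated with a minimal Python sketch. It is not the pipeline actually used to produce the files. The function names, the (position, strand) read representation with the position taken as the 5′ end of the read, the boolean mappability mask (with assembly gaps marked as not mappable), and the exact form of the uniform expectation (total reads spread evenly over mappable positions, weighted by the same kernel) are all assumptions made for illustration.

```python
from collections import Counter

SHIFT = 74       # bp shift in the 5' to 3' direction (value from the protocol)
BANDWIDTH = 30   # bp kernel bandwidth (value from the protocol)


def triweight(u):
    """Un-normalized triweight kernel: (1 - u^2)^3 for |u| <= 1, else 0."""
    return (1.0 - u * u) ** 3 if abs(u) <= 1.0 else 0.0


def shifted_start_counts(reads):
    """Count shifted read starts per coordinate.

    reads: iterable of (five_prime_pos, strand), strand being "+" or "-".
    Plus-strand reads move to higher coordinates, minus-strand reads to lower.
    """
    counts = Counter()
    for pos, strand in reads:
        counts[pos + SHIFT if strand == "+" else pos - SHIFT] += 1
    return counts


def nucleosome_signal(reads, length, mappable, min_mappable_frac=0.25):
    """Return one value per coordinate; None marks unreliable positions.

    mappable: list of booleans of size `length` (hypothetical input);
    assembly gaps are assumed to be marked False.
    """
    counts = shifted_start_counts(reads)
    weights = {d: triweight(d / BANDWIDTH)
               for d in range(-BANDWIDTH, BANDWIDTH + 1)}
    window_size = 2 * BANDWIDTH + 1
    total_mappable = sum(mappable)
    # Assumed uniform model: reads spread evenly over mappable positions.
    rate = sum(counts.values()) / total_mappable if total_mappable else 0.0

    signal = []
    for x in range(length):
        window = [(x + d, w) for d, w in weights.items()
                  if 0 <= x + d < length]
        n_mappable = sum(1 for p, _ in window if mappable[p])
        if n_mappable < min_mappable_frac * window_size:
            signal.append(None)  # unreliable: omitted from the signal file
            continue
        smoothed = sum(w * counts.get(p, 0) for p, w in window)
        expected = rate * sum(w for p, w in window if mappable[p])
        signal.append(smoothed / expected if expected > 0 else 0.0)
    return signal
```

In this sketch a returned value of None corresponds to a coordinate with no entry in the signal files (missing or unreliable), while 0.0 corresponds to a reliably mappable coordinate with no reads nearby. Values near 1 indicate read density close to the uniform expectation.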