Computational protocol: Structural and spatial chromatin features at developmental gene loci in human pluripotent stem cells

[…] ms4C-seq data sets were analyzed as follows. Low-quality bases and adapter sequences were trimmed from all sequenced reads with Cutadapt-1.4.2. Only read pairs containing the HindIII site “AAGCTT” in the read 1 (R1) sequence were used for subsequent analysis. Each R1 sequence was split into a bait sequence and a target sequence. To assign reads to baits, the part of R1 preceding the HindIII site was excised as the bait sequence and mapped to the 47 bait sequences with BWA (bwa-0.6.2) using the default settings. The NANOG pseudogene loci were distinguished from the NANOG bait loci on the basis of an SNP in the NANOG region. The parts of the R1 and R2 read pairs following the HindIII site were used for the subsequent mapping. These read pairs were mapped to the human reference genome (hg19) with BWA (bwa-0.6.2). Read pairs derived from self-ligation and no-digestion events were removed. PCR duplicates were removed with Picard (Picard-tools-1.97). The reads within 1000 bp of each HindIII site were counted, and the per-site counts were written to wiggle (.wig) format files. The total read numbers and the ratio of cis:trans reads were used for quality control (Supplementary Table ). To exclude baits with too few sequenced reads for further analysis, the percentage of observed cis interactions relative to the expected total number of cis interactions was calculated for each bait. The total number of distinct cis interactions was estimated with Preseq. […]
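Some of the steps above can be illustrated with a minimal Python sketch. It is not the authors' code. All function names are hypothetical, and several details are assumptions that the text does not specify: the read is split at the first occurrence of the HindIII site, the site itself is kept at the start of the target part, the 1000 bp window is inclusive on both sides, and read positions are given for a single chromosome. The estimated total number of distinct cis interactions is taken as an input, as would be obtained from a tool such as Preseq.

```python
from bisect import bisect_left, bisect_right

HINDIII_SITE = "AAGCTT"  # recognition sequence named in the protocol


def split_r1_at_hindiii(r1_seq):
    """Split an R1 read at the first HindIII site.

    Returns (bait_part, target_part), or None if the read has no site.
    Assumption: the site is kept at the start of the target part.
    """
    idx = r1_seq.find(HINDIII_SITE)
    if idx == -1:
        return None  # reads without the site are not used
    return r1_seq[:idx], r1_seq[idx:]


def cis_trans_counts(bait_chrom, target_chroms):
    """Count targets on the bait chromosome (cis) and elsewhere (trans)."""
    cis = sum(1 for chrom in target_chroms if chrom == bait_chrom)
    return cis, len(target_chroms) - cis


def count_reads_near_sites(site_positions, read_positions, window=1000):
    """Count reads within `window` bp (inclusive) of each HindIII site.

    Assumption: all positions lie on the same chromosome.
    """
    reads = sorted(read_positions)
    counts = {}
    for site in site_positions:
        lo = bisect_left(reads, site - window)
        hi = bisect_right(reads, site + window)
        counts[site] = hi - lo
    return counts


def percent_cis_observed(observed_distinct, estimated_total):
    """Percentage of observed cis interactions in the estimated total."""
    return 100.0 * observed_distinct / estimated_total
```

A bait could then be flagged for exclusion by comparing the value of `percent_cis_observed` with a threshold. The threshold itself is not stated in the text above, so none is assumed here.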

Pipeline specifications

Software tools: cutadapt, BWA, Picard, Preseq
Application: WES analysis
Organisms: Homo sapiens