Computational protocol: Chromatin structure analysis enables detection of DNA insertions into the mammalian nuclear genome

Protocol publication

[…] The data workflow followed the guidelines outlined by (b). ChIP-seq data were demultiplexed using CASAVA 1.8.2 (Illumina, CA), and high-quality sequencing data were retained after per-lane and per-sample quality checks. High-quality data were defined as those with a mean quality score of at least 35. For mouse endogenous DNA insertions, the standalone ELAND2 aligner (Illumina, CA) was used to map the high-quality demultiplexed reads to the mouse genome (mm9). Alignment quality was assessed from the number of uniquely mapped reads (i.e. reads that align to a single genomic location). For the three antibodies used (H3K4me3, H3K36me3, and H3K4me1), wild-type mouse samples averaged ~12.2±1.5 M, ~12.6±1.3 M, and ~12.4±2.2 M uniquely mapped reads, respectively. For the same three antibodies, respectively, GMO sample 1 generated ~17.7 M, ~17.0 M, and ~9.1 M uniquely mapped reads; GMO sample 2 generated ~8.2 M, ~6.9 M, and ~9.0 M; and GMO sample 3 generated ~15.0 M, ~18.4 M, and ~17.1 M.

ChIP-seq peak finding was performed with Model-based Analysis of ChIP-Seq (MACS 1.4.2) using the built-in mouse genome-size setting. Across the mouse genome, the number of called peaks for the four wild-type samples averaged 21,561±2,007 for H3K4me3, 58,741±3,310 for H3K36me3, and 74,988±9,157 for H3K4me1. For the H3K4me3, H3K36me3, and H3K4me1 antibodies, respectively, GMO sample 1 yielded 24,647, 56,205, and 72,727 peaks; GMO sample 2 yielded 11,420, 66,991, and 76,019 peaks; and GMO sample 3 yielded 24,648, 50,402, and 88,566 peaks. Peaks were examined and plotted in MATLAB (R2013b): for each chromosomal region of interest, the peaks for all three histone antibodies were plotted on a single graph for each wild-type and each GMO sample. These peak plots were examined qualitatively to identify similarities and differences between wild-type and GMO samples.
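The quality criterion above (mean quality score of at least 35, checked per lane and per sample) can be sketched as follows. This is an illustration, not the authors' code; it assumes Phred+33 quality encoding (the convention used from Illumina 1.8 onward) and plain four-line FASTQ records, and computes the mean over every base in the file rather than per read.

```python
from typing import Iterable, Iterator

PHRED_OFFSET = 33          # assumption: Phred+33 (Illumina 1.8+) encoding
MIN_MEAN_QUALITY = 35.0    # threshold stated in the protocol


def quality_strings(lines: Iterable[str]) -> Iterator[str]:
    """Yield the quality line (4th line) of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 3:
            yield line.rstrip("\n")


def mean_quality(lines: Iterable[str]) -> float:
    """Mean Phred score over all bases in one lane's or one sample's FASTQ."""
    total = 0
    n_bases = 0
    for qual in quality_strings(lines):
        total += sum(ord(c) - PHRED_OFFSET for c in qual)
        n_bases += len(qual)
    return total / n_bases if n_bases else 0.0


def passes_qc(lines: Iterable[str], threshold: float = MIN_MEAN_QUALITY) -> bool:
    """True if the lane/sample meets the mean-quality threshold."""
    return mean_quality(lines) >= threshold
```

Applied to each lane's and each sample's FASTQ in turn, `passes_qc` decides whether that unit is kept; a file with one all-`I` read (Q40) and one all-`#` read (Q2) averages Q21 and would be dropped.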
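Counting uniquely mapped reads, the alignment-quality metric reported above, might look like the sketch below. It rests on two assumptions not stated in the protocol: that the alignments are available as SAM text (ELAND2 has its own native output format), and that "uniquely mapped" can be approximated as a primary, mapped alignment with a MAPQ at or above a hypothetical cut-off `MIN_MAPQ`. ELAND2 applies its own uniqueness criterion, so counts from this sketch need not match the reported values.

```python
from typing import Iterable, Set

FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800
MIN_MAPQ = 10  # hypothetical cut-off; the meaning of MAPQ is aligner-dependent


def count_unique_reads(sam_lines: Iterable[str], min_mapq: int = MIN_MAPQ) -> int:
    """Count distinct read names that have a confident primary alignment."""
    unique: Set[str] = set()
    for line in sam_lines:
        if line.startswith("@"):        # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:            # skip empty or malformed records
            continue
        qname, flag, mapq = fields[0], int(fields[1]), int(fields[4])
        # drop unmapped, secondary and supplementary records
        if flag & (FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue
        if mapq >= min_mapq:
            unique.add(qname)
    return len(unique)
```

Counting distinct read names, rather than records, keeps a read from being counted twice if it appears in more than one record.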
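The per-region plots combined the peaks of all three histone marks for one sample. The protocol did this in MATLAB; the Python sketch below only prepares the data. It assumes each antibody's peaks sit in a BED-like text file whose first three columns are chromosome, 0-based start and end, as in the `*_peaks.bed` files MACS 1.4 writes. The mark-to-file mapping and the region coordinates are hypothetical inputs.

```python
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open


def parse_bed(lines: Iterable[str]) -> List[Interval]:
    """Read chrom/start/end from BED-like lines, skipping headers and blanks."""
    peaks: List[Interval] = []
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        peaks.append((chrom, int(start), int(end)))
    return peaks


def peaks_in_region(peaks: List[Interval], chrom: str, start: int, end: int) -> List[Interval]:
    """Keep peaks whose [s, e) interval overlaps [start, end) on chrom."""
    return [p for p in peaks if p[0] == chrom and p[1] < end and p[2] > start]


def region_tracks(bed_by_mark: Dict[str, Iterable[str]],
                  chrom: str, start: int, end: int) -> Dict[str, List[Interval]]:
    """One list of overlapping peaks per histone mark, for a single sample."""
    return {mark: peaks_in_region(parse_bed(lines), chrom, start, end)
            for mark, lines in bed_by_mark.items()}
```

Calling `region_tracks` once per wild-type and GMO sample, with the same region, yields comparable track sets that can then be drawn on one graph per sample.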
[...] Each GMO sample used in this research contained an inserted transgene carrying exogenous, non-mouse genomic sequences: specifically, the human ACTA1 gene region, the human TPM3 cDNA sequence, the four SV40 intronic regions, and the SB10 cassette (, ). Accordingly, high-quality ChIP-seq reads that failed to align to the mouse reference genome (i.e. unmapped reads) were aligned to a custom reference library containing these exogenous sequences (b). Sequence Alignment/Map tools (SAMtools 0.1.18; ) were used to generate FASTQ files of the unaligned reads, which were then aligned to this reference library with the standalone ELAND2 aligner. Peak finding was run on the resulting BAM files with MACS 1.4.2 to generate WIG files, with the genome-size setting changed to the size of the reference used for alignment. The resulting peak data were then examined and plotted in MATLAB (R2013b). […]
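The step that recovers reads which failed to align to mm9 can be illustrated in pure Python. This is a sketch, not the SAMtools procedure the authors used (with SAMtools itself, unmapped records can be selected with `samtools view -f 4`). It assumes SAM text input with SEQ and QUAL present, and skips records whose QUAL is `*` rather than inventing qualities.

```python
from typing import Iterable, Iterator

FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def unmapped_to_fastq(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield one 4-line FASTQ record (as a single string) per unmapped read."""
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if not flag & FLAG_UNMAPPED or seq == "*" or qual == "*":
            continue
        if flag & FLAG_REVERSE:
            # restore the original read orientation before re-alignment
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        yield f"@{name}\n{seq}\n+\n{qual}\n"
```

The records can be written straight to a `.fastq` file and passed to whichever aligner is used against the custom exogenous-sequence reference.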
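Because the exogenous reference is far smaller than a mouse genome, the built-in mouse genome-size setting no longer applies, and MACS 1.4 accepts a plain number for its `-g` option instead. The sketch below derives that number as the summed length of all sequences in the custom reference FASTA. Treating every base of the library as mappable is an assumption, and the sequence names in the usage check are hypothetical.

```python
from typing import Dict, Iterable


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Length of each sequence in FASTA text, keyed by its first header word."""
    lengths: Dict[str, int] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def macs_gsize(lines: Iterable[str]) -> str:
    """Total reference length as an integer string, usable as the -g value."""
    return str(sum(fasta_lengths(lines).values()))
```

Using the raw total is the simplest choice; an effective (mappable) size would be smaller if parts of the library were repetitive.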

Pipeline specifications

Software tools: BaseSpace, ELAND, MACS, SAMtools
Application: ChIP-seq analysis
Organisms: Homo sapiens
Chemicals: Amino Acids