Computational protocol: Genome-wide methylation analysis reveals differentially methylated loci that are associated with an age-dependent increase in bovine fibroblast response to LPS

Similar protocols

Protocol publication

[…] Following sequencing, bioinformatics analysis was performed at Zymo Research using a proprietary analysis pipeline written in Python. Prior to alignment, reads were assessed for quality using FastQC (v0.11.1, Babraham Bioinformatics, UK) and Trim Galore (v0.3.7, Babraham Bioinformatics, UK). Bases with a Phred score greater than 20 were kept for downstream analysis (−−paired --phred33 -q 20). Illumina adaptor trimming was done using Trim Galore with the default settings which are automatically set to standard Illumina adaptors unless otherwise specified. Trim Galore also contains a setting specifically for trimming RRBS data (−−rrbs) which was used to further modify our reads. RRBS introduces artificial CpG sites which require trimming in order to avoid them being used in methylation calling. To do so, Trim Galore trims the first 2 bases from the 3′ end of the sequence so the C closest to the second enzyme cut site is not included in methylation calling. Reads were then mapped to an in-silico bisulfite converted reference genome (BosTau8/BTau_4.6.1) with the Babraham Bismark software (v0.13.1, Babraham Bioinformatics, UK) using the bismark_genome_preparation command, the entire reference genome and with the --non_directional parameter applied. Babraham Bismark is designed for aligning bisulfite sequencing data while simultaneously making methylation calls. In the process of bisulfite sequencing, un-methylated cytosines are converted to uracil, while methylated cytosines are unaffected. The result of this is four sets of potential sequences at any given locus. To determine which of these four potential sequences is correct, Bismark creates two bisulfite converted reference genomes, one that is a C ➔ T conversion and one that is a G ➔ A conversion to account for conversion on the reverse strand. Each sequence read is also bisulfite converted in silico and is aligned to the pre-converted version of the reference genome. The best alignment is then identified and used to make a methylation call. Alignments that are uniquely mapped were kept for analysis. Alignments that mapped to multiple regions were discarded. Due to enrichment-constraints of RRBS libraries, duplicates measured as a result of shorter, overlapping paired end inserts were not removed in the sequence data and were counted as two reads.On average, mapping efficiencies were 25% and ranged between 23 and 27% and the average number of unique CpGs identified were 6.0 million and ranged between 4.8 and 6.6 million. Differentially methylated CpGs falling within −2.5 kb, +1.0 kb of an annotated transcription start site (TSS) were defined as being within the promoter region of a gene. Methylation ratios were used as the comparison parameter to test statistical differences in DNA methylation and these were defined as the number of reads overlapping a particular CpG site which contained either a cytosine or thymine nucleotide. Ratios were then calculated as Methylation Ratio (Mr) = (C)/(C + T). [...] KEGG (Kyoto Encyclopedia for Genes and Genomes) pathway, GO (Gene Ontology) and UCSC (University of California Santa Cruz) Transcription Factor Binding Site (TFBS) analyses were performed using the DAVID (the Database for Annotation, Visualization, and Integrated Discovery) 6.8 platform [, ]. Four lists of official gene IDs (all IDs or promoter only with ≥5× or ≥10× coverage) were generated for sites located within annotated genes that were hyper-methylated in either young or old cultures (Additional files and ). A suggestive analysis was performed on all differentially methylated genes with ≥5× coverage and a conserved analysis was performed with only genes that had ≥10× coverage. Gene IDs from all three gene regions (exon, intron and promoter) were inputted into DAVID, converted to Entrez gene IDs and default parameters were used for KEGG and GO analyses. Only those genes that were covered by RRBS, either at the 5× or 10× level, were set as the background. Promoter gene IDs were used in transcription factor enrichment analysis and TFBS analysis was similarly run with default parameters and Homo sapiens as the background due to a lack of support for Bos taurus within the UCSC_TFBS tool. […]

Pipeline specifications

Software tools FastQC, Trim Galore!, Bismark, DAVID
Databases Gene KEGG
Application BS-seq analysis