Computational protocol: Deposition of Histone Variant H2A.Z within Gene Bodies Regulates Responsive Genes


Protocol publication

[…] Approximately 30 µg of total RNA was isolated from mature rosette leaves 4 weeks post germination using the RNeasy Plant Extraction Kit (Qiagen) with the optional on-column DNase treatment. mRNA was purified from total RNA by two rounds of poly(A) enrichment using the Oligotex kit (Qiagen #72022), followed by an rRNA removal step using the RiboMinus Plant Kit for RNA sequencing (Invitrogen #A1083702). Illumina library construction and RNA sequencing were performed as described in . We used single-end (SE) Illumina sequencing on the GAII platform, and sequence alignments were performed using Bowtie with the TAIR8 genome and cDNA annotations (http://www.arabidopsis.org/) as in .

[...] For differential expression analysis of the RNA sequencing datasets, we used a strategy that accounts for expression differences between the WS and Col ecotypes. In brief, we used the recently published list of 144,879 SNPs between WS and Col to obtain reads per kilobase of exon model per million mapped reads (RPKM) scores for each gene in h2a.z and WT from either the WS or the Col background. First, using Bowtie with no mismatches allowed, reads from each of the three h2a.z and three WT RNA sequencing datasets were mapped to short 75 bp scaffolds, each containing either the WS or the Col allele, built around every SNP that fell within an exon of a gene longer than 200 bp and with at least 10 mapped reads. We removed all SNPs lying less than one read length (36 bp) from the end of the exon, which left approximately 5,000 SNPs across the genome. At each SNP locus, the numbers of reads mapping to the WS and Col scaffolds were compared to determine whether the region was homozygous for WS, homozygous for Col, or heterozygous for the two ecotypes in each dataset.
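The SNP filtering, scaffold construction and genotype call described above can be sketched in Python. This is a minimal illustration, not the authors' code. It assumes 0-based coordinates and a single chromosome sequence held in a string. It also uses a hypothetical `minor_frac` threshold (10%) for calling a locus homozygous, because the protocol does not say how the WS/Col read comparison was thresholded. The zero-mismatch Bowtie alignment is assumed to happen elsewhere; only the resulting per-scaffold read counts enter `call_genotype`. The gene-length (>200 bp) and read-depth (≥10 reads) filters are omitted for brevity.

```python
from dataclasses import dataclass

READ_LENGTH = 36          # read length given in the protocol (bp)
SCAFFOLD_LEN = 75         # scaffold length given in the protocol (bp)
HALF = SCAFFOLD_LEN // 2  # 37 bp of flank on each side of the SNP


@dataclass
class Snp:
    pos: int         # 0-based SNP position in the chromosome string (assumption)
    col_base: str    # Col allele
    ws_base: str     # WS allele
    exon_start: int  # 0-based, inclusive (assumption)
    exon_end: int    # 0-based, exclusive (assumption)


def keep_snp(snp: Snp) -> bool:
    """Keep a SNP only if it lies at least one read length from both exon ends."""
    return (snp.pos - snp.exon_start >= READ_LENGTH
            and (snp.exon_end - 1) - snp.pos >= READ_LENGTH)


def make_scaffolds(chrom_seq: str, snp: Snp) -> tuple[str, str]:
    """Return (Col, WS) scaffolds centred on the SNP, identical except at the SNP."""
    left = chrom_seq[max(0, snp.pos - HALF):snp.pos]
    right = chrom_seq[snp.pos + 1:snp.pos + 1 + HALF]
    return left + snp.col_base + right, left + snp.ws_base + right


def call_genotype(ws_reads: int, col_reads: int, minor_frac: float = 0.1) -> str:
    """Classify a SNP locus as 'WS', 'Col', 'het' or 'no_data' from scaffold counts.

    minor_frac is a hypothetical threshold; the protocol does not state one.
    """
    total = ws_reads + col_reads
    if total == 0:
        return "no_data"
    ws_frac = ws_reads / total
    if ws_frac >= 1 - minor_frac:
        return "WS"
    if ws_frac <= minor_frac:
        return "Col"
    return "het"
```

For example, a SNP at position 100 of an exon spanning 50 to 160 passes `keep_snp`, and `make_scaffolds` returns two 75 bp strings that differ only at index 37. Counts of 10 WS and 10 Col reads at that locus would be called `"het"`.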
For SNPs at heterozygous loci, a read count contribution from each genome (WS or Col) was calculated by dividing the number of reads mapping to that genome's scaffold by the total number of reads mapping to the two scaffolds at that SNP. Because SNPs within a given heterozygous region generally exhibited similar ratios of WS to Col reads, a rolling 20-SNP window smoothing function (with windows taken over the ~5,000 retained SNPs) was applied to these read count contribution values. Next, the six RNA sequencing datasets were mapped to the TAIR cDNA scaffold, and each cDNA model was assigned a score equal to its RPKM. For both the h2a.z and WT datasets, the normalized read counts of the three replicates were partitioned into reads contributed by WS and by Col using the smoothed read count contribution value of the nearest SNP. In this way, approximate WS and Col read count scores were determined for each gene in both h2a.z and WT.

To test the statistical significance of differences between h2a.z and WT, we repeated the partitioning process using read counts normalized to the size of the smallest library rather than per million reads. This alternative normalization underestimates the number of reads per locus less severely, which is important because statistical significance depends on the number of reads. For each ecotype, we calculated a P-value for the deviation of each gene's expression from expectation using a two-tailed Fisher's exact test of h2a.z versus WT scores. Genes were called differentially expressed if, in either ecotype, they showed a two-fold change in expression between h2a.z and WT with P < 0.001, or if, in both ecotypes, they showed a two-fold change with P < 0.005. Gene Ontology (GO) analysis of the up- and downregulated gene lists was performed using the GO FAT ontology on the DAVID web server (http://david.abcc.ncifcrf.gov), and categories with P < 1 × 10⁻⁵ were considered enriched.
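The read count contribution, the rolling 20-SNP smoothing and the nearest-SNP partitioning described above can be sketched as follows, under assumptions the protocol does not spell out. The window is taken to be centred and to shrink at the ends of the SNP list. SNPs are assumed to be sorted by position within one chromosome. Each gene is represented by a single hypothetical coordinate `gene_pos`, such as its midpoint. All function names are illustrative.

```python
import bisect

WINDOW = 20  # rolling window size in SNPs, as stated in the protocol


def ws_contribution(ws_reads: int, col_reads: int) -> float:
    """Fraction of a SNP's scaffold reads that map to WS (the Col share is 1 - this)."""
    return ws_reads / (ws_reads + col_reads)


def rolling_mean(values: list[float], window: int = WINDOW) -> list[float]:
    """Centred rolling mean over `window` consecutive SNPs; windows shrink at the ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half):i - half + window]  # always contains values[i]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def partition_gene(norm_count: float, gene_pos: int,
                   snp_pos: list[int], smoothed_ws: list[float]) -> tuple[float, float]:
    """Split one gene's normalized count into (WS, Col) parts using the nearest SNP.

    snp_pos must be sorted and on the same chromosome as the gene.
    """
    i = bisect.bisect_left(snp_pos, gene_pos)
    # The nearest SNP is either just before or just after the insertion point.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(snp_pos)]
    nearest = min(candidates, key=lambda j: abs(snp_pos[j] - gene_pos))
    ws_part = norm_count * smoothed_ws[nearest]
    return ws_part, norm_count - ws_part
```

For instance, with SNPs at 100, 500 and 900 carrying smoothed WS contributions of 0.5, 0.25 and 1.0, a gene at position 450 with a normalized count of 80 is split into 20 WS and 60 Col reads. Running the same partitioning on the smallest-library-normalized counts gives the inputs for the significance test.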
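The significance test and the differential-expression calling rule can be sketched as below. The 2x2 table layout (gene reads versus all other reads in the library, h2a.z versus WT, built by the hypothetical `gene_table` helper) is an assumption. The protocol only states that a two-tailed Fisher's exact test was applied to h2a.z and WT scores, so partitioned counts would need rounding to integers first. The test is written with the standard library to keep the example self-contained; `scipy.stats.fisher_exact` should give essentially the same two-sided P-value. "Two-fold change" is read here as at least two-fold in either direction.

```python
import math


def _log_comb(n: int, k: int) -> float:
    """log C(n, k) via lgamma, avoiding huge integers for library-sized n."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_two_tailed(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher's exact test P-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same margins
    that is no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def log_p(x: int) -> float:
        return _log_comb(col1, x) + _log_comb(n - col1, row1 - x) - _log_comb(n, row1)

    observed = log_p(a)
    total = 0.0
    for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        lp = log_p(x)
        if lp <= observed + 1e-7:  # tolerance so float round-off does not break ties
            total += math.exp(lp)
    return min(1.0, total)


def gene_table(mut_gene: int, wt_gene: int, mut_lib: int, wt_lib: int) -> tuple[int, int, int, int]:
    """Hypothetical layout: [[gene reads h2a.z, gene reads WT], [other reads h2a.z, other reads WT]]."""
    return mut_gene, wt_gene, mut_lib - mut_gene, wt_lib - wt_gene


def two_fold(mut: float, wt: float) -> bool:
    """True if the larger score is at least twice the smaller one (either direction)."""
    low, high = sorted((mut, wt))
    return high > 0 and high >= 2 * low


def is_differential(ws: tuple[float, float, float], col: tuple[float, float, float]) -> bool:
    """Protocol's calling rule; each tuple is (h2a.z score, WT score, P-value)."""
    ws_fc, col_fc = two_fold(ws[0], ws[1]), two_fold(col[0], col[1])
    either_strict = (ws_fc and ws[2] < 0.001) or (col_fc and col[2] < 0.001)
    both_relaxed = ws_fc and col_fc and ws[2] < 0.005 and col[2] < 0.005
    return either_strict or both_relaxed
```

The two branches of `is_differential` mirror the two conditions in the text. A strong, significant change in one ecotype is enough on its own. A change that is only moderately significant is accepted when it is seen in both ecotypes.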
[…]

Pipeline specifications

Software tools: Bowtie, DAVID
Databases: TAIR
Application: RNA-seq analysis
Organisms: Arabidopsis thaliana