Computational protocol: Structural basis for LIN54 recognition of CHR elements in cell cycle-regulated promoters

[…] LIN54 DBD was prepared for crystallization by elution from a Superdex-75 (GE Healthcare) column in buffer containing 20 mM Hepes pH 7.0 and 200 mM NaCl. CHR13 duplex DNA (5′-GAGTTTGAAACTG-3′) was added in twofold molar excess to 11 mg ml−1 of LIN54 DBD. The protein–DNA complex was crystallized by either sitting-drop or hanging-drop vapour diffusion at 22 °C. Crystals formed over several days in 313 mM magnesium chloride hexahydrate and 21% PEG 3,350. Crystals were frozen in a solution of mother liquor and 25% ethylene glycol.
Data were collected at the Advanced Photon Source, Argonne National Laboratory, at Beamline 23-IDB and at the Advanced Light Source, Lawrence Berkeley National Laboratory, at Beamline 8.3.1. Diffraction spots were integrated using MOSFLM, and data were merged and scaled using Scala. Phases were solved by molecular replacement using Phaser. A homology model for half of the LIN54 DBD was constructed from MSL2 (PDB: 4RKH) by deleting a small seven-amino-acid loop region. The search proceeded in three steps. First, generic B-form DNA was used as a search model. Second, the DNA solution was set as a fixed partial solution and one copy of the MSL2-derived model was used as a search model. Third, that solution (DNA plus protein) was fixed as a partial solution and the MSL2-derived model was used as a search model for the remaining half of the LIN54 DBD. The final model was built with Coot and refined with PHENIX. Coordinates and the corresponding structure factors have been deposited in the PDB under the accession code 5FD3.
[...] Raw reads from E2F4 ChIP-seq (GSE31477) and MNase-seq (GSE35586) experiments performed on the GM12878 lymphoblastoid cell line (ENCODE tier 1) were downloaded from the NCBI Gene Expression Omnibus. These data were generated by the ENCODE Consortium. Raw reads were mapped to the human genome (hg38) using Bowtie. Combining replicate data resulted in a total of 41,253,604 mapped reads for E2F4 and 1,788,468,710 mapped reads for nucleosomes.
Average E2F4 binding and nucleosome positioning profiles at specified promoter regions were generated using ngs.plot. The 1,408 late cell cycle, 745 DREAM-bound and 3,804 housekeeping gene promoter regions used in this analysis were defined previously. GM12878 gene expression data (GSE26386) were used to define the 1,732 low-expression gene promoter regions. Motif locations were generated by searching for TTYRAA (CHR IUPAC code) and SGCGCS (CDE IUPAC code) within the specified promoter regions using the HOMER motif analysis tool. This analysis identified two gene sets used in the promoter region profiles: 155 late cell cycle gene promoters that are bound by DREAM and contain TTYRAA, and 900 late cell cycle gene promoters that are not bound by DREAM and do not contain TTYRAA. […]
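To illustrate what searching for an IUPAC motif such as TTYRAA or SGCGCS involves, the following is a minimal Python sketch that expands the degenerate codes into a regular expression and reports match positions. It is not HOMER's implementation and was not part of the protocol; the function names (iupac_to_regex, reverse_complement, find_motif) are hypothetical and chosen for illustration only. Both motifs are their own reverse complements, so scanning one strand is sufficient here.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement table that also covers the degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence or IUPAC motif."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def iupac_to_regex(motif):
    """Translate an IUPAC motif into a regular expression string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(seq, motif):
    """Return 0-based start positions of all matches (overlaps included).

    Only the given strand is scanned; for a motif that is not its own
    reverse complement, the reverse-complemented motif would need a
    second pass (illustrative simplification).
    """
    # A lookahead lets finditer report overlapping occurrences.
    pattern = re.compile("(?=" + iupac_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(seq.upper())]


CHR = "TTYRAA"   # CHR IUPAC code from the text
CDE = "SGCGCS"   # CDE IUPAC code from the text
CHR13 = "GAGTTTGAAACTG"  # crystallized duplex sequence from the text

if __name__ == "__main__":
    print(find_motif(CHR13, CHR))
    print(reverse_complement(CHR) == CHR, reverse_complement(CDE) == CDE)
```

Applied to the CHR13 duplex sequence, the sketch finds the TTTGAA core as an instance of TTYRAA. For genome-scale promoter sets, a dedicated tool such as HOMER, as used in the protocol, remains the appropriate choice.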

