Computational protocol: Discovery of Intermediary Genes between Pathways Using Sparse Regression

Protocol publication

[…] In the adaptive immune response, dendritic cells act as intermediaries between antigens and mammalian immune mechanisms by processing antigens and presenting them to lymphocytes. One of the most important pathways involved in activating the innate immune response is the Toll-like receptor 4 (TLR4) signaling pathway. The TLR4 signaling pathway is activated when lipopolysaccharide (LPS), found on the surface of Gram-negative bacteria, binds to the extracellular domain of TLR4, which eventually leads to the activation of proinflammatory cytokines and type I interferons []. After LPS binding, TLR4 signaling branches into two pathways that independently use the adaptor proteins MyD88 and TRIF []. The MyD88-dependent pathway drives the rapid activation of IRAK1, IRAK4, and TAK1, which are important for the activation of MAPK and NF-κB genes, whereas the TRIF-dependent pathway is essential for the production of interferon-β and the late-phase activation of NF-κB []. Understanding how the two independent pathways interact with downstream activities, and finding genes involved in signal transduction between the upstream and downstream pathways, are important steps toward a better understanding of the mammalian adaptive immune response. In this work we use a dataset consisting of wild-type, MyD88 KO, and TRIF KO mouse dendritic cell samples. Each sample was extracted from bone-marrow cells in the presence of GM-CSF. All three cell types were then stimulated with LPS to elicit an immune response. Samples from the stimulated cells were collected at 0, 0.5, 1, 2, 3, 4, 6, 8, 16, and 24 h after stimulation, and RNA-Seq was performed on each sample. The time-series RNA-Seq data are available in the Sequence Read Archive under accession number DRA001131 [].

Prior to analysis by the proposed method, the mouse dendritic cell time-course RNA-Seq dataset was checked for read quality using FastQC [].
The resulting reads for each of the three cell types were mapped to the M. musculus mm10 genome with RefSeq gene annotations using Bowtie1 [] and TopHat2 []. Indices and annotations for Bowtie1 and TopHat2 were downloaded from the respective programs' websites. Per-base read quality scores and mapping rates for each sample are shown in . Reads that TopHat2 successfully mapped to the mouse transcriptome were then used to estimate gene expression at each time point using Cufflinks []. Within each cell type, gene expression across time points was normalized as FPKM (fragments per kilobase of exon per million fragments mapped) and as a time series using Cuffdiff with the -T option [].

Before analyzing the processed RNA-Seq data, we first filtered out genes that have no expression or show limited changes in expression throughout the time series in all three cell types. We kept for subsequent analysis only those genes that, in at least one of the cell types, have a greater than 2-fold change between the maximum and minimum expression and a maximum expression greater than 5 FPKM. The remaining 5,676 genes were then z-normalized to a mean of zero and a variance of one. […]
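
The filtering and z-normalization step can be illustrated with a minimal Python sketch. It is not the authors' code. The function name `filter_and_znormalize`, the dictionary layout (cell type mapped to a genes × time points FPKM matrix), and the toy values are all hypothetical. The sketch also makes three assumptions that the text does not spell out: both filter criteria must hold within the same cell type, z-normalization is applied per gene within each cell type across time points, and the population standard deviation is used.

```python
import numpy as np


def filter_and_znormalize(fpkm_by_cell_type, min_fold=2.0, min_max_fpkm=5.0):
    """Filter genes and z-normalize each remaining gene's time series.

    fpkm_by_cell_type: dict mapping a cell-type name to a 2-D array of
    FPKM values with shape (genes, time points); gene order must match
    across cell types. Returns (keep_mask, normalized_dict).
    """
    keep = None
    for expr in fpkm_by_cell_type.values():
        expr = np.asarray(expr, dtype=float)
        gene_max = expr.max(axis=1)
        gene_min = expr.min(axis=1)
        # "max > fold * min" expresses a >2-fold change without dividing
        # by zero when the minimum expression is 0.
        passes = (gene_max > min_fold * gene_min) & (gene_max > min_max_fpkm)
        # Assumption: a gene is kept if ANY cell type satisfies both criteria.
        keep = passes if keep is None else (keep | passes)

    normalized = {}
    for name, expr in fpkm_by_cell_type.items():
        sub = np.asarray(expr, dtype=float)[keep]
        mean = sub.mean(axis=1, keepdims=True)
        std = sub.std(axis=1, keepdims=True)  # population std (ddof=0)
        # A gene that is constant in this cell type has std 0; leave it
        # at zero instead of producing NaN.
        safe_std = np.where(std > 0, std, 1.0)
        normalized[name] = (sub - mean) / safe_std
    return keep, normalized


# Toy, made-up FPKM values: 3 genes x 3 time points per cell type.
toy = {
    "WT": [[1.0, 2.0, 10.0], [6.0, 6.0, 6.0], [0.1, 0.2, 0.5]],
    "MyD88_KO": [[1.0, 1.0, 1.0], [6.0, 7.0, 8.0], [0.0, 0.0, 0.0]],
}
keep_mask, z = filter_and_znormalize(toy)
```

In this toy input only the first gene passes: it exceeds both thresholds in the wild-type series, while the other two fail the fold-change or the maximum-expression criterion in every cell type.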

Pipeline specifications

Software tools: FastQC, Bowtie, TopHat, Cufflinks
Databases: SRA
Application: RNA-seq analysis
Organisms: Mus musculus