Computational protocol: Identification of Genes Critical for Resistance to Infection by West Nile Virus Using RNA-Seq Analysis

Protocol publication

[…] HiSeq sequence data from 10 donors (mock- or WNV-infected) were processed to estimate the expression levels of genes and transcripts. Starting from the FASTQ files, all sequencing reads were mapped to the human genome (hg19) using TopHat (v1.1.4) with default settings. Cufflinks (v0.9.3) was used to assemble the mapped reads against the Ensembl (release 57) gene structure annotation and to estimate the expression level of each transcript; it was not allowed to modify the gene annotations. To estimate isoform-level abundances, Cufflinks uses a probabilistic model of paired-end sequencing to derive a likelihood for the transcript abundances, which lets it handle reads that map ambiguously to more than one isoform.

To analyze gene expression in human macrophages, we first converted the estimated expression levels from FPKM units (number of reads × 10^9 / transcript length / library size) to pseudocounts (the number of reads originating from each transcript isoform). The effective transcript length reported by Cufflinks was used for the conversion. The pseudocounts were further normalized across samples using the trimmed mean of M‑values (TMM) method. Differentially expressed transcripts were identified using edgeR and a Bayesian paired differential expression (DE) analysis. Transcripts were excluded if their level was <10 in the paired conditions or if they were expressed in ≤5 subjects. The resulting set comprised 36,409 transcripts. A transcript could be considered differentially expressed if it was present in either the mock or the WNV-infected samples.

For the functional annotation clustering, the 1,514 differentially expressed transcripts were submitted for analysis to the DAVID web server. The default “Medium” setting was used for classification stringency. Functional annotation clustering is based on how often annotation terms, e.g., Gene Ontology (GO) terms, occur in the list of sample genes compared with how often they occur in the “background” or “population”. The known annotations for the entire human genome in the categories “GO—Biological Process”, “GO—Cellular Component”, “GO—Molecular Function”, “SP_PIR_KEYWORDS”, “PIR_SUPERFAMILY”, “INTERPRO”, “SMART” and “KEGG pathways” were selected as background. […]
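The FPKM-to-pseudocount conversion described above can be illustrated with a minimal Python sketch. It simply inverts the FPKM definition given in the text (reads × 10^9 / transcript length / library size). The function names and the example numbers are hypothetical and are not taken from the authors' pipeline, and "library size" is assumed here to mean the total number of mapped reads in the sample.

```python
def fpkm_to_pseudocount(fpkm, effective_length, library_size):
    """Invert FPKM = reads * 1e9 / (effective_length * library_size).

    fpkm             -- expression estimate in FPKM units
    effective_length -- effective transcript length in bases
    library_size     -- total reads in the library (assumed meaning)
    """
    if effective_length <= 0 or library_size <= 0:
        raise ValueError("effective_length and library_size must be positive")
    return fpkm * effective_length * library_size / 1e9


def pseudocount_to_fpkm(reads, effective_length, library_size):
    """Forward definition of FPKM, used here only as a round-trip check."""
    if effective_length <= 0 or library_size <= 0:
        raise ValueError("effective_length and library_size must be positive")
    return reads * 1e9 / (effective_length * library_size)


# Illustrative values only (not data from the study):
# a transcript at 10 FPKM with effective length 2,000 bases
# in a library of 20 million reads.
example_count = fpkm_to_pseudocount(10.0, 2000, 20_000_000)
print(example_count)
```

Because the pseudocount scales with both the effective length and the library size, two transcripts with the same FPKM can have very different pseudocounts, which is why a count-based tool such as edgeR is given the converted values rather than FPKM directly.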

Pipeline specifications

Software tools: TopHat, Cufflinks, edgeR, DAVID
Application: RNA-seq analysis
Organisms: West Nile virus, Homo sapiens
Diseases: Virus Diseases, HIV Infections