Computational protocol: Hypoxia-driven splicing into noncoding isoforms regulates the DNA damage response


Protocol publication

[…] All statistical analyses, including t-tests and Wilcoxon tests, were performed in R. Time-course data: 100-mer paired-end reads were aligned to the human genome (hg19) using MapSplice (v2.1.4; default parameters), which has previously been shown to perform well for de novo splice-junction identification. An average of 43.8 M (range 27.5–58.2 M) read pairs per sample mapped to the genome in the correct orientation and with appropriate spacing, corresponding to ~90% of the total reads sequenced. Alternative splicing analysis can be confounded by premature and pre-mRNA transcripts contaminating the poly(A)-selected pool, which manifests as an increased number of reads mapping to introns. We therefore calculated the distribution of reads across exons, introns, untranslated regions, promoters and intergenic regions, as defined by the Ensembl (hg19) annotation, and found that the majority of reads fell within exons and untranslated regions, with only a very small proportion mapping to introns. These values were consistent across samples, and there were no significant differences in these overall QC values between replicate groups (…). Transcript models were derived for each sample independently using Cufflinks (v2.2.0; default parameters, except for specifying strand specificity). The resulting models were then merged using Cuffmerge to provide a global model and to classify transcripts as known (when they mapped to Ensembl v74) or novel. To minimise false positives, deliberately stringent filtering was used to call novel transcripts: gene models were first filtered to keep transcripts only when an exon junction was supported by at least 2 reads in at least two samples.
This filtering step removed a large proportion of potentially unreliable exon junctions: afterwards, 68.2% of exon junctions completely matched Ensembl annotations, 30% differed at either the start or the end site (but not both), and only 1.8% were completely novel (…). Transcripts were also filtered on overall normalised expression, retaining only those with fragments per kilobase of transcript per million mapped reads (FPKM) >0.5 in at least three samples. These transcripts were subsequently classified by transcript type and provided in the […]. The 53,936 transcripts (15,334 genes) were used to obtain gene-level counts with the Rsubread package in R, and these counts were supplied to edgeR to call differential expression at each time point (absolute fold change >2 relative to 0 h; false discovery rate (FDR) <1%). Annotation was supplied by the Bioconductor package annmap. […]
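The read-distribution QC described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool used in the study: each read is reduced to a single genomic coordinate, the annotation is a hypothetical dictionary of half-open intervals on one chromosome, and overlapping region classes are resolved by an assumed priority (exon, then UTR, then intron, then promoter, otherwise intergenic).

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical annotation format: region class -> half-open (start, end) intervals
# on a single chromosome. Real intervals would be derived from Ensembl (hg19).
Interval = Tuple[int, int]

# Assumed priority used when a position overlaps several region classes.
PRIORITY = ["exon", "utr", "intron", "promoter"]


def classify(pos: int, annotation: Dict[str, List[Interval]]) -> str:
    """Return the highest-priority region class containing pos, else 'intergenic'."""
    for region in PRIORITY:
        for start, end in annotation.get(region, []):
            if start <= pos < end:
                return region
    return "intergenic"


def read_distribution(read_positions: List[int],
                      annotation: Dict[str, List[Interval]]) -> Dict[str, float]:
    """Fraction of reads assigned to each region class.

    Assumption: each read is represented by one coordinate (e.g. its 5' end).
    """
    counts = Counter(classify(p, annotation) for p in read_positions)
    total = len(read_positions)
    return {region: counts[region] / total for region in PRIORITY + ["intergenic"]}


if __name__ == "__main__":
    toy_annotation = {
        "promoter": [(0, 100)],
        "utr": [(100, 150), (450, 500)],
        "exon": [(150, 250), (350, 450)],
        "intron": [(250, 350)],
    }
    toy_reads = [120, 160, 200, 240, 300, 400, 460, 50, 900, 410]
    for region, frac in read_distribution(toy_reads, toy_annotation).items():
        print(f"{region}: {frac:.2f}")
```

With a genome-scale annotation the linear scan would be replaced by an interval tree or a sorted-interval search; the scan simply keeps the sketch short. A high intronic fraction in such a table is the signal of pre-mRNA contamination that the QC step is looking for.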
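The junction-support criterion (at least 2 reads in at least two samples) can be sketched as a simple count filter. The Python below assumes junction read counts have already been tabulated per sample under a hypothetical (chromosome, donor, acceptor) key, and it keeps a transcript only if every one of its junctions passes. That transcript-level rule is an assumption, since the text does not say how transcripts with several junctions were handled.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical junction key: (chromosome, donor end, acceptor start).
Junction = Tuple[str, int, int]


def supported_junctions(counts: Dict[Junction, List[int]],
                        min_reads: int = 2,
                        min_samples: int = 2) -> Set[Junction]:
    """Junctions with >= min_reads supporting reads in >= min_samples samples."""
    return {
        junction
        for junction, per_sample in counts.items()
        if sum(c >= min_reads for c in per_sample) >= min_samples
    }


def filter_transcripts(transcripts: Dict[str, List[Junction]],
                       supported: Set[Junction]) -> List[str]:
    """Keep transcripts whose junctions are all supported (assumed interpretation).

    Single-exon transcripts (no junctions) pass here; how the study treated
    them is not stated.
    """
    return [tid for tid, junctions in transcripts.items()
            if all(j in supported for j in junctions)]
```

For example, a junction with per-sample counts `[5, 1, 1, 0]` fails (only one sample reaches 2 reads), whereas `[2, 0, 0, 2]` passes, so any transcript using the first junction is discarded under this rule.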
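The three-way breakdown of exon junctions reported above (complete match to Ensembl, start or end site differing but not both, completely novel) can be reproduced by comparing each junction with the annotated junction set. In this hypothetical Python sketch a junction counts as "partial" when it shares its donor or its acceptor with some annotated junction, so relative to that junction only one end differs; the category names and matching rules are assumptions, as the study's exact rules are not given.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Hypothetical junction key: (chromosome, donor end, acceptor start).
Junction = Tuple[str, int, int]


def classify_junctions(observed: Iterable[Junction],
                       annotated: Set[Junction]) -> Counter:
    """Count observed junctions as 'known', 'partial' or 'novel'."""
    donors = {(chrom, donor) for chrom, donor, _ in annotated}
    acceptors = {(chrom, acc) for chrom, _, acc in annotated}
    tally: Counter = Counter()
    for chrom, donor, acc in observed:
        if (chrom, donor, acc) in annotated:
            tally["known"] += 1
        elif (chrom, donor) in donors or (chrom, acc) in acceptors:
            # Shares one site with an annotated junction; only the other end differs.
            tally["partial"] += 1
        else:
            tally["novel"] += 1
    return tally


def as_percentages(tally: Counter) -> Dict[str, float]:
    """Convert category counts to percentages of all observed junctions."""
    total = sum(tally.values())
    return {k: 100.0 * v / total for k, v in tally.items()}
```

Because every junction falls into exactly one of the three categories, the percentages sum to 100, matching the 68.2% / 30% / 1.8% split reported in the text.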
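For the expression filter, the Python sketch below uses the textbook FPKM definition, fragments × 10^9 / (transcript length in bp × total mapped fragments), and then keeps transcripts with FPKM strictly above 0.5 in at least three samples. Cufflinks estimates FPKM with a likelihood model that apportions fragments among isoforms and uses effective transcript lengths, so the hypothetical `fpkm` helper only illustrates the unit; it is not a reimplementation of Cufflinks.

```python
from typing import Dict, List


def fpkm(fragments: float, length_bp: int, total_mapped: float) -> float:
    """Textbook FPKM: fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped)


def expressed(fpkm_by_transcript: Dict[str, List[float]],
              threshold: float = 0.5,
              min_samples: int = 3) -> List[str]:
    """Transcripts with FPKM > threshold in at least min_samples samples."""
    return [tid for tid, values in fpkm_by_transcript.items()
            if sum(v > threshold for v in values) >= min_samples]
```

For instance, 500 fragments on a 2,000 bp transcript in a library of 50 million mapped fragments gives an FPKM of 5.0. Note that the comparison is strict, so a value of exactly 0.5 does not count towards the three-sample requirement.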
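The differential-expression call combines two thresholds. Assuming per-gene log2 fold changes and p-values for one time point versus 0 h have already been obtained from edgeR, the Python sketch below applies the Benjamini-Hochberg procedure (the default adjustment in edgeR's `topTags`) and flags genes with absolute fold change > 2, that is |log2 fold change| > 1, and FDR < 1%. The helper names are hypothetical, and the BH step is written out only to make the FDR calculation explicit.

```python
import math
from typing import List


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def call_de(log2fc: List[float], pvalues: List[float],
            min_abs_fc: float = 2.0, max_fdr: float = 0.01) -> List[bool]:
    """True where |fold change| > min_abs_fc and BH-adjusted FDR < max_fdr."""
    fdr = bh_adjust(pvalues)
    lfc_cutoff = math.log2(min_abs_fc)  # fold change 2 -> log2 cutoff 1
    return [abs(lfc) > lfc_cutoff and q < max_fdr for lfc, q in zip(log2fc, fdr)]
```

Applied separately to each time point's comparison against 0 h, this yields one list of differentially expressed genes per time point. A gene with a tiny p-value but a log2 fold change of 0.5 is still excluded, because both thresholds must hold.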

Pipeline specifications

Software tools: MapSplice, Cufflinks, Subread, edgeR, Annmap
Application: Gene expression microarray analysis
Organisms: Homo sapiens
Diseases: Anemia, Neoplasms
Chemicals: Nucleotides