Computational protocol: Comparison of total and cytoplasmic mRNA reveals global regulation by nuclear retention and miRNAs

[…] All sequences were aligned to the human genome reference hg19 with TopHat version 1.1.4 and SAMtools version 0.1.8, using TopHat standard parameters except for: --solexa1.3-quals -p 8 --GTF Homo_sapiens.GRCh37.59.gtf. Annotations from Ensembl and RefSeq, downloaded from the UCSC Genome Browser, were used to assign features to genomic positions. Sequences aligned to the human genome were assigned to features and counted by HTSeq version 0.4.6 with parameters: -m intersection-strict -s no -t exon (Additional file: Table S3). The R/Bioconductor package DESeq was used to call differential gene expression on counts generated by HTSeq. All biological replicates had an R² (Spearman) correlation of gene expression (read counts) greater than 0.94.

Reads per kilobase of exon per million mapped sequence reads (RPKM) values for features were calculated by rpkmforgenes using the parameters: -sam -gffann -readcount. Estimations of intergenic expression levels for each replicate were calculated by […] and the R script cut_off.1.0.R (Additional file: Table S4).

Reads were trimmed with a custom Perl script […], which is available on GitHub […], to determine the effect of sequencing length on the number of called differentially expressed genes.

Enrichment analysis of gene categories and pathways was performed by WebGestalt2 with parameters: Id Type: ensembl_gene_stable_id, Ref Set: entrezgene, Significance Level: Top10, Statistics Test: Hypergeometric, MTC: BH, Minimum: 2.

5′ and 3′ UTR lengths and coding sequences were downloaded from UCSC. Lengths and fold energies were calculated with the Vienna RNA Package. […]
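The RPKM normalization mentioned above can be illustrated with a minimal Python sketch. It applies only the standard definition (reads, divided by exon length in kilobases, divided by millions of mapped reads). It is not the code used in the protocol, and the function name and example numbers are hypothetical.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon per million mapped reads (standard definition)."""
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    # reads / (length / 1e3) / (total / 1e6) == reads * 1e9 / (length * total)
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


# Hypothetical feature: 500 reads on 2,000 bp of exon, 20 million mapped reads.
example_value = rpkm(500, 2000, 20_000_000)
print(example_value)
```

Because the value scales inversely with both feature length and library size, it allows rough comparison of features within and across samples, which raw read counts do not.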
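The read-trimming step can be sketched as follows. The original Perl script is not reproduced in this excerpt, so this is a Python stand-in under stated assumptions: input is four-line FASTQ with unwrapped sequences, and reads are truncated from the 3′ end to a fixed length. The function name and the example record are hypothetical.

```python
def trim_fastq(lines, length):
    """Yield FASTQ lines with sequence and quality strings cut to `length` bases.

    Assumes strict four-line records: header, sequence, '+', quality.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    for index, line in enumerate(lines):
        line = line.rstrip("\n")
        # Lines 2 and 4 of each record hold the sequence and its qualities;
        # both must be cut to the same length to keep the record valid.
        if index % 4 in (1, 3):
            line = line[:length]
        yield line


# Hypothetical single record, trimmed from 10 to 6 bases.
record = ["@read1", "ACGTACGTAC", "+", "IIIIIIIIII"]
trimmed = list(trim_fastq(record, 6))
print(trimmed)
```

Running the downstream alignment and counting steps on reads trimmed to several lengths would then show how the number of called differentially expressed genes depends on read length.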
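The WebGestalt2 settings listed above combine a hypergeometric test with Benjamini-Hochberg (BH) multiple-testing correction. The sketch below shows the textbook form of both calculations in plain Python. It is not WebGestalt's implementation, and the function names and example numbers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, population, category_size, sample_size):
    """P(X >= k): chance of seeing at least k category genes in the sample."""
    upper = min(category_size, sample_size)
    total = comb(population, sample_size)
    hits = sum(
        comb(category_size, i) * comb(population - category_size, sample_size - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical example: 10 reference genes, 4 in the category, 3 selected, 2 hits.
p_value = hypergeom_upper_tail(2, 10, 4, 3)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(p_value, adjusted)
```

In an enrichment analysis, one such p-value is computed per category and the BH step is applied across all tested categories.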

Pipeline specifications

Software tools: TopHat, SAMtools, HTSeq, DESeq, rpkmforgenes, WebGestalt
Databases: UCSC Genome Browser
Application: Genome data visualization
Organisms: Homo sapiens