The measured expression level of each transcript, i.e. its read count, is limited by the sequencing depth (the total number of reads), which is fixed in advance by the experimental design and budget. Because the total read count is fixed, the number of reads assigned to any one transcript depends on the other transcripts present in the sample (Rapaport et al., 2013): more highly expressed transcripts take up a greater proportion of the total reads (Robinson et al., 2010; Mortazavi et al., 2008). Furthermore, longer transcripts have more reads mapping to them than shorter transcripts of a similar expression level (Oshlack et al., 2009). A number of normalization methods for RNA-seq data have therefore been proposed to correct for library size bias as well as for transcript length and GC-content bias.
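To make the two biases described above concrete, the following is a minimal sketch of library size scaling (counts per million, CPM) and of the length-adjusted RPKM measure introduced by Mortazavi et al. (2008). The function names, transcript lengths, and read counts are hypothetical toy values chosen for illustration; they are not taken from any of the cited studies, and GC-content correction is not covered.

```python
def cpm(counts):
    """Counts per million: scale one sample's read counts by its library size."""
    total = sum(counts)  # library size = total reads in this sample
    return [c / total * 1e6 for c in counts]


def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads (one sample)."""
    # Divide the library-size-scaled counts by transcript length in kilobases.
    return [x / (length / 1e3) for x, length in zip(cpm(counts), lengths_bp)]


# Hypothetical toy data: three transcripts with different lengths (in bp).
lengths_bp = [1000, 2000, 500]

# Two samples with identical composition; sample_b is sequenced 10x deeper.
sample_a = [100, 200, 50]
sample_b = [1000, 2000, 500]

cpm_a, cpm_b = cpm(sample_a), cpm(sample_b)    # depth difference removed
rpkm_a = rpkm(sample_a, lengths_bp)            # length difference removed

print(cpm_a)
print(cpm_b)
print(rpkm_a)
```

In this toy case the raw counts of the two samples differ tenfold, yet their CPM values coincide, and the three transcripts end up with the same RPKM because their read counts are proportional to their lengths. Simple scaling of this kind does not address the composition effect described above, where a few highly expressed transcripts absorb a large share of the reads; methods such as the one proposed by Robinson et al. (2010) target that case.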
(Rapaport et al., 2013) Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data. Genome Biology.
(Robinson et al., 2010) A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology.
(Mortazavi et al., 2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods.
(Oshlack et al., 2009) Transcript length bias in RNA-seq data confounds systems biology. Biology Direct.
(Xiaohong et al., 2017) A comparison of per sample global scaling and per gene normalization methods for differential expression analysis of RNA-seq data. PLoS One.