Computational protocol: Pathway Based Analysis of Genes and Interactions Influencing Porcine Testis Samples from Boars with Divergent Androstenone Content in Back Fat

Protocol publication

[…] The first step in the expression data analysis was quality control and filtering. In this step, PCR primers and low-quality sequences (Phred score <20) identified in the raw reads with the FastQC quality control application were trimmed off. The choice of a Phred score of 20 as the cut-off was arbitrary, but because a Phred score of 20 corresponds to a base-calling error probability of 10^(-20/10) = 0.01, it ensured that only sequence with a base-call accuracy of 99% or more was retained for further analysis. The filtered reads were mapped to the latest Sus scrofa genome build, Sscrofa10.2 from NCBI, using the splice-aware mapping algorithm TopHat to generate an individual genome mapping file for each sample. The expression set (expression matrix) was created by calculating read counts (expression values) for each gene from these genome mapping files using BEDTools. It has been shown that the read count data generated by an RNA-seq experiment follow a negative binomial distribution, whereas the classical linear modeling procedures developed for microarray data assume that the data are normally distributed. Although various non-parametric (distribution-free) procedures can be used in this context, we found that the results of such analyses were not statistically significant, owing to the small sample size of our data set and the limited power of non-parametric methods on small samples. Recently, Law et al. proposed applying normal-distribution-based, microarray-like statistical analysis methods to RNA-seq read count data. To overcome, to an extent, the limitations of small sample size and non-parametric methods, and following that proposal, we normalized and log2 transformed our expression data set using the “voom” function implemented in the limma R package.
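The log2 transformation step described above can be sketched as follows. This is a minimal illustration in Python, not the R code used in the study: the function name `log2_cpm` and the toy count matrix are hypothetical, and the offsets of 0.5 and 1 follow the log-counts-per-million definition given for voom by Law et al. Library sizes are taken here as plain column sums, without the additional normalization factors that limma can apply.

```python
import numpy as np

def log2_cpm(counts):
    """Return log2 counts per million for a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0)  # total mapped reads per sample
    # The 0.5 and 1 offsets keep the logarithm finite for zero counts.
    return np.log2((counts + 0.5) / (lib_sizes + 1.0) * 1e6)

# Hypothetical toy data: 3 genes (rows) x 2 samples (columns).
toy_counts = np.array([[10, 20],
                       [0, 5],
                       [990, 1975]])
log_cpm = log2_cpm(toy_counts)
```

Because each sample is scaled by its own library size, genes with the same relative abundance get similar values even when sequencing depth differs between samples.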
Comparisons of various normalization and differential expression analysis methods for RNA-seq data have shown that voom normalization combined with the limma package is relatively unaffected by outliers and performs well under many conditions. An additional study concluded that modeling RNA-seq gene count data as log-normally distributed with appropriate pseudo counts (limma voom modeling) is a reasonable approximation of the data. Mean-variance modeling at the observational level (voom) estimates the mean-variance relationship in the read count data and computes a weight for each observation based on this relationship. Our expression data set was generated and normalized using the procedure described above. […]
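The idea of observation-level weights derived from a mean-variance trend can be illustrated with a simplified sketch. It is not limma's implementation: it assumes an intercept-only model (each gene's fitted value is its mean), it substitutes a quadratic polynomial for the LOWESS curve that voom fits, and the function name `voom_like_weights`, the `degree` parameter, and the simulated Poisson counts are all hypothetical.

```python
import numpy as np

def voom_like_weights(log_cpm, lib_sizes, degree=2):
    """Simplified voom-style precision weights (genes x samples)."""
    log_cpm = np.asarray(log_cpm, dtype=float)
    lib_sizes = np.asarray(lib_sizes, dtype=float)
    # Assumption: intercept-only design, so the fitted value is the gene mean.
    gene_mean = log_cpm.mean(axis=1)
    resid_sd = log_cpm.std(axis=1, ddof=1)
    # Average log2 count per gene (x-axis of the mean-variance trend).
    log_lib = np.log2(lib_sizes + 1.0)
    mean_log_count = gene_mean + log_lib.mean() - np.log2(1e6)
    # Trend of sqrt(residual SD); a polynomial stands in for LOWESS here.
    coeffs = np.polyfit(mean_log_count, np.sqrt(resid_sd), degree)
    # Fitted log2 count for every individual observation.
    fitted_log_count = gene_mean[:, None] + log_lib[None, :] - np.log2(1e6)
    pred_sqrt_sd = np.polyval(coeffs, fitted_log_count)
    # Guard against non-positive predictions from the polynomial.
    pred_sqrt_sd = np.clip(pred_sqrt_sd, 1e-3, None)
    # Weight = inverse predicted variance = 1 / (sqrt(sd))^4.
    return 1.0 / pred_sqrt_sd ** 4

# Simulated example: 50 genes x 4 samples of Poisson counts.
rng = np.random.default_rng(0)
gene_means = rng.uniform(5, 500, size=50)
counts = rng.poisson(gene_means[:, None], size=(50, 4))
lib_sizes = counts.sum(axis=0)
log_cpm = np.log2((counts + 0.5) / (lib_sizes + 1.0) * 1e6)
weights = voom_like_weights(log_cpm, lib_sizes)
```

In a full analysis these weights would be passed to a weighted linear model fit, so that observations with higher predicted variance contribute less to each gene's estimate.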

Pipeline specifications

Software tools: FastQC, TopHat, BEDTools, limma
Application: RNA-seq analysis
Chemicals: Glutathione