Computational protocol: Transcriptional profiling of the epigenetic regulator Smchd1

Protocol publication

[…] The FastQC software was used to assess the quality of the raw sequence data. The corresponding figure displays the distribution of sequencing quality (Phred) scores at each base position across reads from a representative RNA-seq sample from each data set. Although base quality varies across the read, with slightly lower quality at the beginning and end, the median quality is above 34 (corresponding to a probability of an incorrect base call below 0.0004) at every position. Similar boxplots of base quality scores were observed for the other samples (data not shown).

Sequences were then mapped to the mouse reference genome (mm10) using the Rsubread program, and gene-level counts were obtained with the featureCounts procedure.

Further analysis was carried out using the edgeR and limma R/Bioconductor packages. Counts per million (CPM) were calculated for each gene to standardize for differences in library size, and filtering was carried out to retain genes with a baseline expression level of at least 0.5 CPM in 3 or more samples. For each data set, TMM normalization was applied and a multidimensional scaling (MDS) plot based on the log2(CPM) values was generated to show relationships between samples. In both data sets, some samples do not cluster well with the other replicates of the same genotype: sample 6 in the NSC data (panel A) and samples 1 and 7 in the Lymphoma data (panel B) are more variable than the other replicates of the same type. For NSC sample 6 and Lymphoma sample 7, no experimental factor could be identified to explain this. Lymphoma sample 1, on the other hand, was the only single-end sample in this experiment that was processed on a different day to the other samples, leading us to conclude that batch processing differences were the likely cause of the additional variation. […]
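
The link between a Phred score of 34 and an error probability below 0.0004 follows from the standard Phred definition, Q = -10 log10(P). The short Python sketch below only checks that arithmetic. It is not part of the original pipeline, and the function name is hypothetical.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q into the probability of an incorrect base call.

    Uses the standard definition Q = -10 * log10(P), so P = 10 ** (-Q / 10).
    """
    return 10 ** (-q / 10)


# Median quality reported in the text: above 34 at every base position.
p_q34 = phred_to_error_probability(34)
print(f"Q34 corresponds to an error probability of about {p_q34:.6f}")
```

Because the error probability falls as Q rises, any median quality above 34 implies an error probability below the value computed for Q = 34.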
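
The expression filter described above (at least 0.5 CPM in 3 or more samples) can be illustrated with a minimal sketch. The original analysis was carried out with edgeR in R; the plain-Python version below is only an illustration of the rule on made-up counts. The function names, gene labels, and count values are all hypothetical, and CPM is computed from raw library sizes without TMM scaling.

```python
def counts_per_million(counts):
    """Convert raw counts (gene -> list of per-sample counts) to CPM.

    Library size for each sample is the sum of counts over all genes.
    """
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [1e6 * c / lib_sizes[j] for j, c in enumerate(row)]
        for gene, row in counts.items()
    }


def filter_expressed(counts, min_cpm=0.5, min_samples=3):
    """Keep genes with CPM >= min_cpm in at least min_samples samples."""
    cpm = counts_per_million(counts)
    return {
        gene: row
        for gene, row in counts.items()
        if sum(value >= min_cpm for value in cpm[gene]) >= min_samples
    }


# Made-up counts for four samples; every library sums to 2,000,000 reads.
toy_counts = {
    "GeneA": [1_999_988, 1_999_995, 1_999_999, 2_000_000],
    "GeneB": [10, 5, 1, 0],   # CPM 5.0, 2.5, 0.5, 0.0: passes in 3 samples
    "GeneC": [0, 0, 0, 0],    # never expressed
    "GeneD": [2, 0, 0, 0],    # CPM 1.0 in a single sample only
}

toy_cpm = counts_per_million(toy_counts)
kept = filter_expressed(toy_counts)
print(sorted(kept))
```

In this toy example only GeneA and GeneB pass the filter. GeneD exceeds 0.5 CPM in one sample, which is fewer than the three required.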

Pipeline specifications

Software tools: FastQC, Subread, edgeR, limma
Application: RNA-seq analysis