Computational protocol: Probing the changes in gene expression due to α-crystallin mutations in mouse models of hereditary human cataract

Protocol publication

[…] RNA-seq reads were aligned to the Ensembl GRCm38.76 assembly using Spliced Transcripts Alignment to a Reference (STAR) version 2.0.4b. Gene counts were derived from the number of uniquely aligned, unambiguous reads using Subread featureCounts version 1.4.5, and transcript counts were produced using Sailfish version 0.6.3. Sequencing performance was assessed using RSeQC version 2.3, based on the total number of aligned reads, the total number of uniquely aligned reads, the genes and transcripts detected, the ribosomal fraction, known junction saturation, and the read distribution over known gene models. All gene-level and transcript-level counts were then imported into the R/Bioconductor package edgeR. Trimmed mean of M-values (TMM) normalization factors were calculated to adjust for differences in library size, and genes or transcripts not expressed in any sample were excluded from further analysis. The raw counts and fastq.gz files are available at NCBI GEO under accession number GSE98027.

The performance of the samples was assessed using a Spearman's rank-order correlation matrix and multi-dimensional scaling plots. Generalized linear models with robust dispersion estimates were created to test for differential expression at both the gene and transcript levels. The fit of the trended and tagwise dispersion estimates was then plotted to confirm the proper fit of the observed mean-variance relationship, in which the tagwise dispersions were equivalent to the biological coefficients of variation in each gene. Differentially expressed genes and transcripts were then filtered for false discovery rate (FDR)-adjusted p-values ≤ 0.05.

To enhance the biological interpretation of the large set of transcripts, genes and transcripts were grouped based on known Kyoto Encyclopedia of Genes and Genomes (KEGG) biological interactions and pathways.
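The FDR filter applied to the differential expression results can be illustrated with a small, dependency-free sketch. It assumes the Benjamini-Hochberg procedure, which the text does not name explicitly; the function names, gene identifiers, and p-values are hypothetical and serve only to show the arithmetic, not the edgeR implementation.

```python
from typing import Dict, List


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def filter_by_fdr(pvalues: Dict[str, float], threshold: float = 0.05) -> Dict[str, float]:
    """Keep only features whose adjusted p-value is <= threshold."""
    names = list(pvalues)
    adjusted = bh_adjust([pvalues[name] for name in names])
    return {name: q for name, q in zip(names, adjusted) if q <= threshold}


# Hypothetical raw p-values for five made-up gene identifiers.
example = {"geneA": 0.001, "geneB": 0.008, "geneC": 0.039, "geneD": 0.041, "geneE": 0.6}
significant = filter_by_fdr(example)
print(sorted(significant))  # ['geneA', 'geneB']
```

In this toy input, geneC and geneD have raw p-values below 0.05 but adjusted values of 0.05125, so they are dropped by the ≤ 0.05 cutoff.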
We used the R/Bioconductor package GAGE to test whether the log2 fold-changes reported by limma for the genes expressed in each contrast, regardless of statistical significance, were up- or down-regulated relative to the global background log2 fold-changes, and to test for perturbations of log2 fold-changes within pathways. For any pathway that was significantly up-regulated, down-regulated, or perturbed (p-value ≤ 0.05), annotated KEGG graphs were downloaded and rendered with the R/Bioconductor package Pathview, with genes and protein complexes color-coded by mean log2 fold-change, rescaled to a minimum of -2 and a maximum of 2 for ease of interpretation. Heat maps of genes with altered expression in knock-in mutant lenses were generated with the R package heatmap3. […]
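The color-coding rule for pathway nodes can be sketched as follows. This is a minimal illustration, not Pathview's API: it assumes that "rescaled" means mean values outside the range are clipped to -2 or 2, and the node names, gene names, and fold-changes are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def clamp(value: float, low: float = -2.0, high: float = 2.0) -> float:
    """Clip a value to the closed interval [low, high]."""
    return max(low, min(high, value))


def node_values(
    nodes: Dict[str, List[str]],
    log2fc: Dict[str, float],
    limit: float = 2.0,
) -> Dict[str, float]:
    """Mean log2 fold-change per pathway node, clipped to [-limit, limit]."""
    values = {}
    for node, genes in nodes.items():
        measured = [log2fc[g] for g in genes if g in log2fc]
        if not measured:
            continue  # no measured gene maps to this node; leave it uncolored
        values[node] = clamp(mean(measured), -limit, limit)
    return values


# Hypothetical nodes (a two-gene complex and two single-gene nodes).
nodes = {"complex1": ["g1", "g2"], "node2": ["g3"], "node3": ["g4"]}
log2fc = {"g1": 3.0, "g2": 2.0, "g3": -0.5}
print(node_values(nodes, log2fc))  # {'complex1': 2.0, 'node2': -0.5}
```

Clipping keeps a few extreme fold-changes from stretching the color scale, so moderate changes remain distinguishable on the rendered graph.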

Pipeline specifications

Software tools: STAR, Subread, RSeQC, edgeR, GAGE, limma, Pathview
Databases: KEGG
Application: RNA-seq analysis
Organisms: Homo sapiens, Mus musculus
Diseases: Blood Protein Disorders; Cataract; Genetic Diseases, Inborn