Computational protocol: Gene expression changes in damaged osteoarthritic cartilage identify a signature of non-chondrogenic and mechanical responses

Protocol publication

[…] Strand-specific RNA-seq libraries were generated from 0.5 to 1 μg of RNA using the TruSeq® Stranded mRNA Sample Preparation Kit (Illumina, Inc.), and 101 bp paired-end reads were generated, yielding at least 39 million reads per sample. The FASTQ files generated on the Illumina HiSeq 2000 platform were analysed with FastQC and screened against other genomes for possible contamination. Low-quality reads and contaminating barcodes and primers were then trimmed with Trimmomatic.

All libraries were aligned to the hg19 assembly of the human genome using TopHat2, reporting the best-scoring match for each read. Mapped reads were counted per gene with HTSeq against GENCODE v16 to reflect gene abundance. Inter-gene expression comparisons were based on fragments per kilobase of transcript per million mapped reads (FPKM). Across the 16 datasets, reads from 33,960 (60%) of the 56,562 human genes in GENCODE v16 were detected. To optimise detection of differentially expressed genes (DEGs), genes with the lowest read counts were removed, leaving 17,160 genes for analysis.

A standard method for estimating fold change and dispersion in RNA-seq data (DESeq2) was used to identify an initial set of DEGs. A false discovery rate (FDR) of 10% was chosen to maximise the number of DEGs (1575) while retaining a reasonable level of confidence to inform the subsequent analysis; for comparison, a 5% FDR gave 1375 DEGs (identified in red in […], Sheet 2). The 5000 genes with the most significant changes by P-value were then removed, and the remaining genes were used as in silico negative controls for calculating batch-effect factors with RUVg. DESeq2 was then applied with this batch correction to identify DEGs. The resulting P-values were adjusted for multiple testing with the Benjamini–Hochberg (BH) correction.

R code to reproduce the bioinformatics analysis is available at https://github.com/soulj/Dunnetal2015. The RNA-seq data are available from ArrayExpress (E-MTAB-4304). […]
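As a sketch of the FPKM measure used for inter-gene comparisons, the standard formula divides a gene's fragment count by its length in kilobases and by the library's total mapped fragments in millions. The Python below is a minimal illustration, not the authors' R code; the gene names, lengths and counts are hypothetical toy values, and how gene length is defined (for example, the union of a gene's exons) is an assumption the source does not specify.

```python
def fpkm(counts: dict[str, int], lengths_bp: dict[str, int], total_mapped: int) -> dict[str, float]:
    """Fragments per kilobase of transcript per million mapped fragments.

    counts: fragments (read pairs) assigned to each gene
    lengths_bp: assumed gene/transcript length in base pairs
    total_mapped: total mapped fragments in the library
    """
    scale = 1e9  # 1e3 (per kilobase) * 1e6 (per million mapped fragments)
    return {
        gene: counts[gene] * scale / (lengths_bp[gene] * total_mapped)
        for gene in counts
    }


# Hypothetical toy library with 40 million mapped fragments.
counts = {"GENE_A": 2000, "GENE_B": 500}
lengths = {"GENE_A": 2000, "GENE_B": 500}
print(fpkm(counts, lengths, 40_000_000))  # → {'GENE_A': 25.0, 'GENE_B': 25.0}
```

Both toy genes get an FPKM of 25.0: GENE_B has a quarter of the fragments but is also a quarter of the length, which is exactly the length bias FPKM is meant to remove when comparing genes.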
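The Benjamini–Hochberg correction scales each ranked P-value by m/rank (m tests in total) and then enforces monotonicity, so counting genes below an FDR threshold, such as 10% versus 5%, yields DEG sets of different sizes. A minimal Python sketch follows; the four P-values are hypothetical. DESeq2 reports BH-adjusted values itself, after its own independent filtering, so this illustrates the correction only and is not a reimplementation of DESeq2's output.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return BH-adjusted P-values in the original order."""
    m = len(pvalues)
    # Indices sorted by ascending P-value (rank 1 = most significant).
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_significant(adjusted: list[float], fdr: float) -> int:
    """Number of tests whose adjusted P-value falls below the FDR threshold."""
    return sum(q < fdr for q in adjusted)


adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])  # hypothetical P-values
print(count_significant(adj, 0.10), count_significant(adj, 0.05))  # → 3 1
```

With these inputs three tests pass at a 10% FDR but only one at 5%, mirroring how the looser threshold gives a larger DEG set.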
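The RUVg step can be sketched in two parts: choosing empirical negative-control genes by excluding the most significant genes, then estimating unwanted-variation factors from those controls. The Python below is a simplified illustration of the general RUVg idea (an SVD of centred log counts restricted to control genes), not the RUVSeq implementation. The function names, the log(count + 1) offset and the number of factors `k` are assumptions; the source does not state `k`.

```python
import numpy as np


def empirical_controls(pvalues: np.ndarray, n_exclude: int = 5000) -> np.ndarray:
    """Indices of genes outside the n_exclude smallest P-values.

    Note: NaN P-values sort last in np.argsort and would therefore be
    kept as controls here; how the original analysis treated them is
    not stated.
    """
    order = np.argsort(pvalues)  # most significant genes first
    return np.sort(order[n_exclude:])


def ruvg_factors(counts: np.ndarray, control_idx: np.ndarray, k: int = 1) -> np.ndarray:
    """Simplified RUVg-style estimate of k unwanted-variation factors.

    counts: genes x samples matrix of raw counts.
    Returns a samples x k matrix W.
    """
    y = np.log(counts + 1.0).T   # samples x genes, log(count + 1)
    y = y - y.mean(axis=0)       # centre each gene across samples
    u, _, _ = np.linalg.svd(y[:, control_idx], full_matrices=False)
    return u[:, :k]              # left singular vectors = sample-level factors


if __name__ == "__main__":
    # Toy data: 20 genes x 6 samples with a uniform 3-fold shift in the last 3 samples.
    levels = np.arange(100, 300, 10, dtype=float)
    counts = np.outer(levels, [1, 1, 1, 3, 3, 3])
    pvals = np.linspace(0.001, 1.0, 20)  # hypothetical P-values
    ctrl = empirical_controls(pvals, n_exclude=5)
    print(ruvg_factors(counts, ctrl, k=1).ravel())
```

In the usual RUVSeq workflow, the columns of `W` are added as covariates to the DESeq2 design formula; the source says only that DESeq2 was used with batch correction. In the toy data, the single factor separates the first three samples from the last three.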

Pipeline specifications

Software tools: FastQC, Trimmomatic, TopHat, HTSeq, DESeq2
Databases: ArrayExpress, GENCODE
Application: RNA-seq analysis
Organisms: Homo sapiens
Diseases: Osteoarthritis