Computational protocol: Transcriptomic profiling of cartilage ageing

Protocol publication

[…] The 100 bp paired-end raw reads in FASTQ format obtained by RNA-Seq were compiled using the manufacturer-provided pipeline software CASAVA 1.8.2. Quality control of the reads in each lane was undertaken in FastQC. Between 116 and 235 million reads were obtained per sample. Low-quality reads were eliminated, resulting in 7–58 million mapped reads (equal to 6.5–35% of the total reads). In total, 3–49 million uniquely mapped read pairs were obtained per sample after aligning to the reference sequence of the equine genome (Equus caballus; EquCab2.56.pep, downloaded from ftp://ftp.ensembl.org/pub/current_fasta/equus_caballus/pep/) with TopHat 1.3.2 using default settings, except for the option “-g 1”. This option instructs TopHat to report at most one alignment to the reference for a given read: if a read has several candidate alignments, the one with the best alignment score is chosen, and the read is discarded if two or more alignments are equally good. Read counts per gene were calculated using HTSeq-count (http://www-huber.embl.de/users/anders/HTSeq/doc/count.html) with default settings. Differential gene expression analysis was applied to the read count data for reads mapped to annotated genes using the R (version 2.15.1) Bioconductor package edgeR (version 2.13.0). edgeR models count data with a negative binomial distribution, a generalisation of the Poisson model, to account for both biological and technical variation. Prior to assessing differential expression, data were normalised across libraries using the trimmed mean of M-values (TMM) normalisation method. Genes were deemed differentially expressed at a Benjamini–Hochberg false discovery rate (FDR)-corrected p-value < 0.05 and a fold change ≥ 1.4 log2, using a generalised linear model (GLM) likelihood ratio test. This represents a 50% linear fold change, i.e. log2(1.4) ≈ 0.5 or 50%.
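
The selection rule described above (Benjamini–Hochberg correction followed by a fold-change cutoff) can be sketched in plain Python. This is only an illustration and not the edgeR implementation used in the study: the function names `benjamini_hochberg` and `select_de_genes` are hypothetical, the input is assumed to be a list of (gene, log2 fold change, raw p-value) tuples, and the cutoff is assumed to mean a linear fold change of 1.4 applied in either direction on the log2 scale.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_de_genes(results, fdr_cutoff=0.05, min_abs_log2_fc=math.log2(1.4)):
    """Keep genes passing both the FDR and the fold-change cutoff.

    results: list of (gene_id, log2_fold_change, raw_p_value) tuples
    (a hypothetical input layout, not an edgeR output format).
    """
    fdr = benjamini_hochberg([p for _, _, p in results])
    return [
        gene
        for (gene, log2_fc, _), q in zip(results, fdr)
        if q < fdr_cutoff and abs(log2_fc) >= min_abs_log2_fc
    ]


if __name__ == "__main__":
    # Made-up values purely for illustration.
    example = [("g1", 1.0, 0.01), ("g2", 0.1, 0.04),
               ("g3", -0.8, 0.03), ("g4", 0.2, 0.005)]
    print(select_de_genes(example))
```

If the threshold is instead read as a log2 fold change of 1.4, the same sketch applies with `min_abs_log2_fc=1.4`.
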
All sequence data produced in this study have been submitted to NCBI GEO under Array Express accession number E-MTAB-1386. The number of reads per gene was normalised to ‘reads per kilobase of exon model per million mapped reads’ (RPKM); the resulting values were taken as the final expression level for each gene. In total, 16,635 genes were expressed in cartilage, which represented 66% of the equine genome. The expression of 396 transcribed elements, including mRNAs, small non-coding RNAs, pseudogenes, and a single microRNA, was significantly different in old compared to young cartilage. Of these, 93 were at higher levels and 303 were at lower levels in the older cartilage. […]
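
As a worked illustration of the RPKM normalisation mentioned above, the standard formula divides a gene's read count by its exon length in kilobases and by the library size in millions of mapped reads. The sketch below assumes that standard definition; the function name `rpkm` and the example numbers are hypothetical and are not taken from the study.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads.

    read_count: reads assigned to the gene
    exon_length_bp: summed exon length of the gene, in base pairs
    total_mapped_reads: number of mapped reads in the library
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # A 2 kb gene with 500 reads in a library of 10 million mapped reads.
    print(rpkm(500, 2000, 10_000_000))
```

Because the value is scaled by both gene length and library size, it allows rough comparison of expression levels between genes and between samples of different sequencing depth.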

Pipeline specifications

Software tools: BaseSpace, FastQC, TopHat, HTSeq, edgeR
Application: RNA-seq analysis
Organisms: Equus caballus
Diseases: Arthritis, Osteoarthritis