Dataset features


Application: Gene expression microarray analysis
Number of samples: 45
Release date: Jan 1 2007
Last update date: Feb 18 2018
Access: Public
Dataset link: Comparison of Hybridization-based and Sequencing-based Gene Expression Technologies on Biological Replicates

Experimental Protocol

METHODS

Biological samples

RNA samples were isolated from three sources: two pools of C57/B6 adult mouse retina (MRP1 and MRP2, n=700) and Swiss-Webster postnatal day one (P1) mouse cortex (MC) (n=19). Retinas were dissected, collected, and stored in Trizol (one pair of retinas per Eppendorf tube) at -80°C prior to pooling. During RNA extraction, the two pools of adult mouse retina (MRP1, MRP2) were created (700 retinas per pool) and aliquoted. All samples were stored at -80°C prior to the experiments. The animal experiments were approved by the Institutional Animal Care Facility at Harvard University.

Microarray platforms, data processing and consistency assessment

Whole-genome mouse gene expression arrays (one-dye oligonucleotide microarrays) were investigated in this study: Affymetrix GeneChip®, Amersham (now GE Healthcare) CodeLink®, Mergen ExpressChip®, Applied Biosystems (ABI) microarrays, and Illumina BeadArray®. A microarray experiment comprises sample preparation, hybridization, scanning, and image quantitation, an integrated series of procedures conducted at a laboratory, generally according to the manufacturer’s recommended protocols. To obtain sufficient statistical confidence in the data analysis, five technical replicates were obtained on each platform for each biological replicate (MRP1 and MRP2), with the exception of Illumina, for which a single experiment was performed per sample (included to make the data available to the public). For details of the experimental protocols and laboratories, see Kuo[6]; the Illumina protocol can be found in Supplementary Material A. The raw data sets of 63 chips were collected after image scanning and quantification on each platform. For Illumina data, we set the filtering threshold at “Detections” ≥ 0.9. Filtering for the other microarray platforms is described in Kuo[6].
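As a rough illustration of the Illumina filtering step, the sketch below keeps only probes whose detection score reaches the 0.9 threshold stated in the protocol. The record layout, the probe names, and the `filter_by_detection` helper are hypothetical and are not taken from the original analysis.

```python
# Threshold stated in the protocol ("Detections" >= 0.9).
DETECTION_THRESHOLD = 0.9


def filter_by_detection(probes, threshold=DETECTION_THRESHOLD):
    """Keep probes whose detection score is at least `threshold`.

    `probes` maps a probe ID to a dict with "signal" and "detection" keys.
    This layout is an assumption made for the example.
    """
    return {
        probe_id: record
        for probe_id, record in probes.items()
        if record["detection"] >= threshold
    }


# Hypothetical measurements for three probes on one array.
probes = {
    "probe_a": {"signal": 812.4, "detection": 0.99},
    "probe_b": {"signal": 35.1, "detection": 0.42},
    "probe_c": {"signal": 120.0, "detection": 0.90},
}
kept = filter_by_detection(probes)
```

A probe sitting exactly at the threshold is retained, matching the "≥" in the protocol.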
We also performed percentile transformation of intensities, quantile normalization, and log2ratio calculation, as described[6].

Data repeatability and reproducibility[7] are two important aspects of microarray data consistency assessment. The former evaluates the degree of variation among technical replicates on a platform; the latter refers to agreement across different microarray platforms when the same biological samples are used. Two widely used metrics were adopted to measure microarray performance: the coefficient of variation (CV) among replicated measurements per gene, and the correlation coefficient (Pearson and Spearman) between any pair of replicated experiments. For intra-platform data consistency, the mean and standard deviation of the CVs or correlation coefficients were used as summaries of each platform’s performance. For inter-platform data agreement, either the mean (for normalized log2ratios) or the median after percentile transformation (for intensities) of the repeated measurements on each platform was used in calculating correlation coefficients.

MPSS experiment and data processing

Total RNA of MRP1 and MRP2, identical to that used in the microarray experiments, was sent to Lynx Therapeutics, Inc. (now Solexa, Hayward, CA) for MPSS experiments. Following an RNA quality test on an Agilent 2100 BioAnalyzer (Agilent Technologies, Palo Alto, CA), cDNA libraries were generated according to the Megaclone protocol[5, 21]. Signatures adjacent to poly(A)-proximal DpnII restriction sites (“GATC”) were cloned into a Megaclone vector. The resulting library was amplified and yielded about 1.6 million loaded microbeads, which were loaded onto a flow cell. Thereafter, an iterative series of enzymatic reactions decoded the signatures as 17-bp or 20-bp sequences (including the DpnII recognition site “GATC”)[22]. The abundance of each signature was converted to transcripts per million (tpm), as supplied by Lynx Therapeutics.
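To make the signature definition concrete, the sketch below takes a cDNA sequence, locates the 3'-most DpnII site that still leaves room for a full signature, and returns the 17-bp signature beginning at "GATC". It also shows the usual arithmetic behind a transcripts-per-million value. Both helpers and the example sequence are illustrative assumptions; the tpm values in the dataset were supplied by Lynx Therapeutics, and their exact procedure is not described here.

```python
DPNII_SITE = "GATC"


def extract_signature(cdna, length=17):
    """Return the signature at the 3'-most DpnII site, or None.

    The signature includes the "GATC" recognition site itself.
    Assumption: a site too close to the 3' end to yield `length` bases
    is skipped in favour of the next site upstream.
    """
    seq = cdna.upper()
    if len(seq) < length:
        return None
    # Restrict the search so that any match leaves `length` bases available.
    last_start = len(seq) - length
    pos = seq.rfind(DPNII_SITE, 0, last_start + len(DPNII_SITE))
    if pos == -1:
        return None
    return seq[pos:pos + length]


def to_tpm(counts):
    """Scale raw signature counts to transcripts per million."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {sig: n * 1_000_000 / total for sig, n in counts.items()}


# Hypothetical cDNA fragment with two DpnII sites; the second is 3'-most.
cdna = "CCGATCAAAA" + "GATC" + "ACGTACGTACGTA" + "TTTT"
signature = extract_signature(cdna)

# Hypothetical raw counts for two signatures.
tpm = to_tpm({"sig_1": 3, "sig_2": 1})
```

Passing `length=20` gives the longer signature variant mentioned in the protocol.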
The mapping of signatures to genes was based on the mouse genome sequence (Release #3, Feb 2003, UCSC Golden Path genome browser) and the mouse UniGene sequences (Build #122). The mapping procedure includes: extraction of ‘virtual’ signatures from genomic sequence, classification of ‘virtual’ signatures from genomic sequence, and matching of MPSS expressed signatures to genomic signatures[22]. In this study, we included only the reliable signatures, namely those located close to a polyadenylation signal (“A[A/T]TAAA” within the 155 nt at the 3’ end of the cDNA sequence) or a poly(A) tail (a 15-base sequence containing 12 or more “A”s that occurs within the 114 3’-most bases of the sequence) on an mRNA sequence with known orientation.

Gene mapping among microarray platforms and between microarray and MPSS

Two approaches were used to match probes across different microarray chips: annotation-based and sequence-based probe matching[6]. Briefly, the annotation-based approach yielded UniGene (UG)- and LocusLink (LL)-based matches, whereas the sequence-based approach used actual probe sequence information to produce matches at the RefSeq (RS) and RefSeq-exon (RSEXON) levels. MPSS signatures were mapped to UniGene clusters using an in silico constructed “virtual tags” library, as described above. Thus, the gene expression data measured by microarrays and by MPSS were paired for comparison via UniGene clusters.
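Once measurements are paired by UniGene cluster, the consistency metrics named in the protocol can be computed directly. The sketch below is a minimal standard-library illustration, not the authors' code: it computes a per-gene CV across replicates, averages replicated log2ratios per platform, keeps the clusters shared by two platforms, and computes Pearson and Spearman correlations. The UniGene IDs, the values, and all function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def cv(values):
    """Coefficient of variation (sample SD / mean) of replicate intensities."""
    return stdev(values) / mean(values)


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def pair_by_unigene(platform_a, platform_b):
    """Average replicates per cluster and keep clusters present on both."""
    shared = sorted(set(platform_a) & set(platform_b))
    a = [mean(platform_a[ug]) for ug in shared]
    b = [mean(platform_b[ug]) for ug in shared]
    return shared, a, b


# Hypothetical log2ratios keyed by UniGene cluster ID (replicates per cluster).
array_data = {
    "Mm.1": [1.0, 1.2, 0.8],
    "Mm.2": [-0.5, -0.4, -0.6],
    "Mm.3": [2.0, 2.1, 1.9],
    "Mm.9": [0.1, 0.1, 0.1],
}
mpss_data = {"Mm.1": [0.9], "Mm.2": [-0.2], "Mm.3": [1.5], "Mm.7": [0.3]}

shared, a, b = pair_by_unigene(array_data, mpss_data)
r_pearson = pearson(a, b)
r_spearman = spearman(a, b)
```

The CV is meaningful for intensities rather than log2ratios, whose mean can be close to zero. For intensities, the protocol uses the median after percentile transformation in place of the mean, which would amount to swapping `mean` for `statistics.median` in `pair_by_unigene`.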








Winston Kuo
