Computational protocol: Integration of Data from Omic Studies with the Literature-Based Discovery towards Identification of Novel Treatments for Neovascularization in Diabetic Retinopathy

Protocol publication

[…] Initially, amenable studies were searched for and selected from data deposited in the Gene Expression Omnibus (GEO) and ArrayExpress (AE) databases. All studies found were performed on retinal tissue from animal models of DR in which diabetes was artificially induced by streptozocin. Studies were then selected for meta-analysis on the basis of a sufficient number of samples and a compatible study design. Based on these criteria, two studies (GSE19122 and GSE12610) reporting transcriptome alterations in mouse models of DR were incorporated in the meta-analysis in order to determine the set of genes most consistently differentially expressed in retinal samples of animal models of DR. All the following steps in this section were performed in the R statistical environment version 2.7.1 (http://cran.r-project.org/), using Bioconductor (available at http://bioconductor.org/). Raw data from the microarray experiments were obtained from the GEO repository (http://www.ncbi.nlm.nih.gov/geo/) and were examined using the arrayQualityMetrics package, followed by normalization and nonspecific filtering with the affyPLM and genefilter packages, where necessary. Ultimately, 12,177 genes with expression values measured for 19 samples (10 mice with DR and 9 controls) met our filtering criteria and were included in the meta-analysis step. Differential expression of genes across the included studies was calculated using the meta-analysis algorithms implemented in the RankProd package. RankProd uses a nonparametric statistical measure to detect genes that are consistently highly ranked across different microarray datasets. It is therefore well suited to meta-analysis, enabling fusion of omic data from different studies, including data produced in different laboratories and on different platforms. Significance values and false discovery rate (FDR) values were calculated by performing 1000 permutations of the source dataset.
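The rank-based idea described above can be illustrated with a minimal sketch. The protocol itself used the RankProd package in R; the Python below is not that package's implementation. It assumes a simplified setting in which each study has already been reduced to one log fold change per gene, and it estimates a plain permutation p-value rather than RankProd's own false-positive estimate. All function names and the toy values are hypothetical.

```python
import bisect
import math
import random


def rank_descending(values):
    """Return 1-based ranks; rank 1 goes to the largest value."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def rank_product(fold_changes_by_study):
    """Geometric mean of each gene's per-study rank."""
    per_study_ranks = [rank_descending(fc) for fc in fold_changes_by_study]
    n_studies = len(per_study_ranks)
    n_genes = len(fold_changes_by_study[0])
    return [
        math.exp(sum(math.log(ranks[g]) for ranks in per_study_ranks) / n_studies)
        for g in range(n_genes)
    ]


def permutation_p_values(fold_changes_by_study, n_permutations=1000, seed=0):
    """Fraction of permuted rank products that are <= the observed one."""
    rng = random.Random(seed)
    observed = rank_product(fold_changes_by_study)
    counts = [0] * len(observed)
    for _ in range(n_permutations):
        # Shuffle gene labels independently within each study.
        shuffled = [rng.sample(fc, len(fc)) for fc in fold_changes_by_study]
        null_sorted = sorted(rank_product(shuffled))
        for g, value in enumerate(observed):
            counts[g] += bisect.bisect_right(null_sorted, value)
    total = n_permutations * len(observed)
    return [count / total for count in counts]


# Toy input (hypothetical): log fold changes for four genes in two studies.
study_a = [2.0, 0.5, -1.0, 0.1]
study_b = [1.5, 0.2, -0.5, 0.9]
rp = rank_product([study_a, study_b])
p = permutation_p_values([study_a, study_b], n_permutations=1000)
```

A gene that ranks near the top in every study gets a small rank product, so small values of `rp` mark the most consistently up-regulated genes. Repeating the calculation with the sign of the fold changes flipped would rank down-regulated genes instead.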
Mouse gene Entrez identifiers were then converted to their human counterparts using the homology information for mouse genes collected in the hom.Mm.inp Bioconductor annotation library. Human orthologs of the mouse genes with the highest differential expression in mouse models of DR were thus included as targets for novel therapeutic discovery by the SemBT algorithm. We also performed gene set enrichment analysis of the genes scoring highest in the meta-analysis of the two datasets against a background of all human genes. For this purpose, Gene Ontology functional gene annotations were used, and the DAVID tool (http://david.abcc.ncifcrf.gov/) was used to estimate the enriched functional categories; overrepresentation was called when significance scores were below 0.05 after adjustment for multiple testing with the Benjamini-Hochberg correction. […]
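The multiple-testing step can be sketched as follows. This is a generic Python illustration of the Benjamini-Hochberg adjustment, not DAVID's own code; the function name, the term labels, and the p-values are hypothetical, and only the 0.05 cutoff comes from the text.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical enrichment p-values for four functional categories.
terms = ["term_A", "term_B", "term_C", "term_D"]
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)

# Categories called overrepresented at the 0.05 threshold used in the text.
enriched = [term for term, value in zip(terms, adj_p) if value < 0.05]
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the raw p-values increase.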

Pipeline specifications

Software tools: arrayQualityMetrics, affyPLM, genefilter, RankProd, DAVID
Databases: ArrayExpress, GEO
Application: Gene expression microarray analysis
Diseases: Diabetes Mellitus, Retinitis