Computational protocol: Tissue-Specific Functional Networks for Prioritizing Phenotype and Disease Genes

Protocol publication

[…] We collected diverse functional genomics data to use as input for the integration. All data used in the phenotype analysis were acquired as of January 2011. All data were processed into pairwise similarity scores S(i,j), which reflect the similarity between proteins i and j.

Protein physical interactions: We acquired protein-protein physical interaction data from MiMI (Michigan Molecular Interactions), BIND, BioGRID, DIP, IntAct, IPI, MINT and Reactome, for a total of 76749 interactions. These interactions are grouped by interaction and experiment type, such as affinity capture, two-hybrid, indirect complex, or co-purification. Each pair may be supported by several different interaction or experiment types. Experiment types with fewer than 1000 protein pairs are pooled, so that there are enough examples to learn probabilities in our Bayesian framework. The resulting groups form six binary datasets, each recording the presence or absence of evidence for a physical interaction between a pair of proteins (the original publication gives a complete description of the grouped datasets).

Expression data: To use the signals in diverse microarray data, we acquired mouse microarray datasets from GEO (977 datasets, 960 of which have three or more samples, totaling 13632 arrays). For each dataset, we calculated the Pearson correlation coefficient, ρ, to assess the level of co-expression between each pair of genes. The correlation coefficients were Fisher z-transformed and normalized to ∼N(0,1), so that scores are approximately normally distributed and comparable across datasets and platforms, as previously described.

Homologous functional relationship predictions: Previous analysis indicates that homologous functional relationships in simpler model organisms are a good indicator of functional relationships in higher model organisms. We acquired a previously published yeast functional network and mapped its proteins and relationships to the corresponding laboratory mouse orthologs using InParanoid. Where multiple mappings existed, a single average score was taken.

Phenotype and disease: We acquired annotations from MGI and from the Online Mendelian Inheritance in Man (OMIM) database (the latter mapped to orthologous mouse genes using InParanoid). The similarity score S(i,j) of the phenotype and disease data for the protein pair i, j is given by equation (5), where a_ik = 1 if protein i has phenotype k and a_ik = 0 otherwise, N_k is the number of proteins involved in phenotype or disease k, and n is the total number of phenotypes and diseases. In this way, co-occurrence of phenotypes or diseases with fewer annotated genes is given more weight than co-occurrence of well-studied, broadly defined phenotypes.

Phenotype and disease data are included in the networks displayed on our web interface, but were excluded from the networks used to predict phenotype-related genes to prevent circularity.

The above data are integrated using the Bayesian framework (formulas 1–4) to generate both global and tissue-specific networks. The evaluation of each input dataset against each tissue-specific gold standard is included in the original publication. […]

To allow dynamic visualization and cross-network comparison of our integration results, we developed the mouseMAP software (http://mouseMAP.princeton.edu), based on Graphle, an open-source viewing framework that we developed previously. MouseMAP is built on the Prefuse Java visualization library, the Args4j command-line parsing tool, and the SQLiteJDBC SQLite database driver. Its basic functionality allows querying one or multiple genes and retrieving the local network surrounding the query, with user-adjustable cutoffs for node number and confidence level.

Our public, web-based system supports cross-comparison of networks by highlighting connections in the newly queried network versus the previously queried network, which makes it possible to compare the connections of the same query gene(s) across different tissues. Gene information, including annotation and phenotype and disease associations, is retrievable through the interface. To facilitate general public use, mouseMAP also dynamically generates figure descriptions based on the current query and network structure. […]
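
The expression-data step described earlier (Pearson correlation per gene pair, Fisher z-transform, then normalization to roughly N(0,1) within each dataset) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the clipping constant, the toy gene names and the expression values are all assumptions made for illustration, and standardization here simply subtracts the mean and divides by the standard deviation of all pairwise z-values in one dataset.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def fisher_z(rho, clip=0.999999):
    """Fisher z-transform, z = atanh(rho).

    rho is clipped (an assumption of this sketch) so that |rho| = 1
    does not produce an infinite value.
    """
    rho = max(-clip, min(clip, rho))
    return math.atanh(rho)


def coexpression_scores(expression):
    """Return {(gene_i, gene_j): standardized z-score} for one dataset.

    `expression` maps a gene name to its expression values across samples.
    """
    genes = sorted(expression)
    z = {
        (g1, g2): fisher_z(pearson(expression[g1], expression[g2]))
        for g1, g2 in combinations(genes, 2)
    }
    values = list(z.values())
    mu, sigma = mean(values), pstdev(values)
    # Standardize within the dataset so scores are comparable across datasets.
    return {pair: (value - mu) / sigma for pair, value in z.items()}


# Toy dataset with hypothetical gene names and made-up values.
expression = {
    "GeneA": [1.0, 2.0, 3.0, 4.0],
    "GeneB": [2.0, 4.1, 5.9, 8.2],
    "GeneC": [4.0, 3.1, 2.2, 0.9],
    "GeneD": [1.0, 3.0, 2.0, 5.0],
}
scores = coexpression_scores(expression)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

Standardizing each dataset separately is what makes a score of, say, 2 mean "two standard deviations above that dataset's typical co-expression", regardless of platform or sample count.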
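
The phenotype and disease similarity described earlier can also be sketched in code. The exact form of equation (5) does not appear in this text, so the sketch below assumes a simple inverse-frequency weight, 1/N_k, for each shared term k. That choice is only one weighting consistent with the stated behavior (terms with fewer annotated genes count more) and should not be read as the published formula. The function name, gene names and term names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def phenotype_similarity(annotations):
    """Score gene pairs by shared phenotype/disease terms.

    `annotations` maps a gene to the set of terms it is annotated with,
    i.e. a_ik = 1 when term k is in annotations[i].

    ASSUMPTION: each shared term k contributes 1 / N_k, where N_k is the
    number of genes annotated with k. The published equation (5) may use
    a different weighting.
    """
    n_k = Counter(term for terms in annotations.values() for term in terms)
    scores = {}
    for g1, g2 in combinations(sorted(annotations), 2):
        shared = annotations[g1] & annotations[g2]
        if shared:
            scores[(g1, g2)] = sum(1.0 / n_k[term] for term in shared)
    return scores


# Hypothetical annotations: one rare term (2 genes), one broad term (4 genes).
annotations = {
    "GeneA": {"rare_term", "broad_term"},
    "GeneB": {"rare_term", "broad_term"},
    "GeneC": {"broad_term"},
    "GeneD": {"broad_term"},
}
pair_scores = phenotype_similarity(annotations)
for pair, score in sorted(pair_scores.items()):
    print(pair, score)
```

In this toy input, the pair sharing the rare term scores higher than pairs that share only the broad term, which is the qualitative behavior the text describes.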

Pipeline specifications

Software tools: MiMI, FNTM, Graphle, Prefuse
Databases: BioGRID, Reactome
Applications: Miscellaneous, Protein interaction analysis
Organisms: Mus musculus
Diseases: Genetic Diseases, Inborn