Permits exploration and integration of high-dimensional datasets. mixOmics provides multivariate statistical approaches to identify similarities between two heterogeneous datasets. It summarizes information in a smaller dataset and aims to highlight the biological entities of potential relevance, with a strong focus on graphical representation. This tool assists in finding signatures of vaccine effect and allows a better understanding of the immunological mechanisms activated by the intervention.
A user-friendly web server for inferring a sub-network based on probabilistic logical querying. PheNetic extracts from an interactome the sub-network that best explains genes prioritized through a molecular profiling experiment. Depending on its run mode, PheNetic searches either for a regulatory mechanism that gave rise to the observed molecular phenotype or for the pathways (in)activated in the molecular phenotype. The web server provides access to a large number of interactomes, making sub-network inference readily applicable to a wide variety of organisms. The inferred sub-networks can be interactively visualized in the browser.
An integrative platform with a web-based interface, which integrates four different modules for processing, annotation, analysis and visualization of the interaction profiles of proteins/genes, metabolites and/or drugs of interest. IIS organizes the analysis in a project context and the user can create several projects protected by password. The project is a structure inside the system where researchers can develop and organize their thematic studies, choosing between two types: (i) chromatogram project or (ii) genes/metabolites/drugs project.
A web-based tool that enables functional enrichment analysis and visualization of high-throughput datasets. SAMNetWeb can analyse two distinct data types (e.g. mRNA expression and global proteomics) simultaneously across multiple experimental systems to identify pathways activated in these experiments and then visualize the pathways in a single interaction network. Through the use of a multi-commodity flow based algorithm that requires each experiment 'share' underlying protein interactions, SAMNetWeb can identify distinct and common pathways across experiments.
Identifies methylation quantitative trait loci at high sensitivity. tICA detects biological sources of data variation and gene modules whose expression variation across tumours is driven by copy-number or DNA methylation changes in a cancer context. This software can be applied to any multi-way data tensor to pinpoint complex patterns of variation correlating with phenotypes of interest and the underlying features driving these variation patterns.
A straightforward approach for the integrative analysis of data from different high-throughput technologies based on pathway and interaction models from public databases. pwOmics performs pathway-based level-specific data comparison of coupled human proteomic and genomic/transcriptomic datasets based on their log fold changes. Downstream and upstream analyses are performed separately on the functional levels of pathways, transcription factors and genes/transcripts, and their results enter the cross-platform consensus analysis. These provide a basis for the combined interpretation of regulatory effects over time. As high-throughput data are increasingly used to follow time-dependent biological regulation after perturbation, the main benefit of pwOmics is the cross-platform time series analysis functionality, but consensus analysis can also be performed on single time point measurements.
Identifies correlative modules in multi-dimensional genomics data. jNMF aims to detect subsets of messenger RNAs (mRNAs), microRNAs (miRNAs) and methylation markers. It represents features across multiple datasets and reduces the complexity of the data. Moreover, this tool selects associations among sets of different types of variables. It highlights vertical associations between multiple regulatory levels and can reveal significantly disrupted pathways.
Provides a convenient tool for using a powerful constrained optimization method to reconstruct signaling and response pathways by integrating multiple ‘omic’ data. SteinerNet seeks a network composed of high-confidence interactions that ultimately link a subset of the omic hits either directly or through intermediate proteins. This is achieved by solving the prize-collecting Steiner tree (PCST) problem. SteinerNet serves a diverse range of researchers who would like to integrate multiple ‘omic’ data sources to reconstruct biologically meaningful pathways.
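As a rough illustration of the prize-collecting Steiner tree idea, the sketch below scores a few candidate sub-networks on a toy graph: each omic hit carries a prize, each interaction carries a cost (a low cost standing in for a high-confidence interaction), and a candidate pays the cost of its edges plus the prizes of the hits it leaves out. The function name, the `beta` trade-off weight, and all toy values are assumptions made for illustration; this is not SteinerNet's interface, and it compares hand-picked candidates rather than solving the optimization.

```python
def pcst_objective(prizes, edge_costs, tree_nodes, tree_edges, beta=1.0):
    """Score a candidate tree: edge costs paid plus prizes of omitted nodes."""
    missed_prizes = sum(p for node, p in prizes.items() if node not in tree_nodes)
    paid_costs = sum(edge_costs[edge] for edge in tree_edges)
    return beta * missed_prizes + paid_costs

# Toy interactome (hypothetical values). B has no prize: it plays the role
# of an intermediate protein that can still be used to connect hits.
prizes = {"A": 5.0, "B": 0.0, "C": 4.0, "D": 1.0}
edge_costs = {("A", "B"): 1.0, ("B", "C"): 1.0, ("C", "D"): 3.0}

# Hand-picked candidate sub-networks: (nodes, edges).
candidates = {
    "empty": (set(), []),
    "A-B-C": ({"A", "B", "C"}, [("A", "B"), ("B", "C")]),
    "A-B-C-D": ({"A", "B", "C", "D"}, [("A", "B"), ("B", "C"), ("C", "D")]),
}
scores = {
    name: pcst_objective(prizes, edge_costs, nodes, edges)
    for name, (nodes, edges) in candidates.items()
}
best = min(scores, key=scores.get)  # candidate with the lowest objective
```

In this toy case the tree that links A and C through the prize-free node B scores best, because reaching D would cost more than D's prize is worth.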
Statistically ranks predicted feedforward loops (FFLs) by their explanatory power to account for differential gene and miRNA expression between two biological conditions. dChip-GemiNi combines gene and microRNA (miRNA) expression profiles available for a disease process and also incorporates regulatory network structure in the form of computationally identified transcription factor (TF)-miRNA FFLs.
Contributes to a framework for easy integration of new analysis algorithms and simple interface for biologists to run and compare algorithms. miXGENE is a tool that permits users to learn from heterogeneous genomic measurements that make use of prior knowledge (PK). It can also give specific learning methods and suggests sample workflows relevant to the given task.
Supports classification-based investigations with increased biological interpretability. LogMiNeR was applied to transcriptional profiling data to better understand differential influenza vaccination responses. It can be applied to the classification of many immune as well as non-immune-mediated diseases. This tool presents distinct aspects of the underlying biology while maintaining predictive accuracy.
Simultaneously models the dependence and the heterogeneity of various data sources. BCC is a flexible clustering approach that models both an overall clustering and a clustering specific to each data source. In addition to multisource data, it may be used to compare clustering from different statistical models for a single homogeneous dataset. The software was applied to subtype identification of breast cancer tumor samples using publicly available data from The Cancer Genome Atlas (TCGA).
A variety of learning strategies to boost prediction performance based on the use of all available data. We consider data integration via the use of multiple kernel learning supervised learning methods. We propose a scheme in which feature selection by statistical score is performed separately per data type and by pathway membership. We further consider the introduction of a confidence measure for the class assignment, both to remove some ambiguously labeled datapoints from the training data and to implement a cautious classifier that only makes predictions when the associated confidence is high.
Allows integration of different data types. rMKL-LPP is an extension of the multiple kernel learning with dimensionality reduction (MKL-DR) method. The Locality Preserving Projections (LPP) step conserves the sum of distances between each sample and its k nearest neighbors.
Classifies genes as differential or not differential based on a generalized correlation measure for multiple sequencing-based genomic variables. MARIO is a hierarchical Bayesian model approach for the parallel, integrative analysis of data from several genomic types. It also facilitates the incorporation of information from functional genomic networks and allows inference at the gene level even when the sample size is very small.
Automates the extraction of gene or single nucleotide polymorphism (SNP) information. GENEASE permits users to query multiple databases and retrieve SNP(s), gene(s), CpG site or disease information. It allows in-silico discovery of non-coding regions. This tool is useful for multi-omic and phenotype data exploration, statistical enrichment, overlap analysis and visualization.
Facilitates interactive exploration and interpretation of multi-layered high-dimensional data. Grimon is a data visualization method that helps users understand the complex interplay across multi-omics networks by visualizing data on a graphical interface. It can be applied to different kinds of data, such as the output of principal component analysis (PCA) or t-Distributed Stochastic Neighbor Embedding (tSNE).
Allows integration of multiple datasets of various types into a single exploratory analysis. mixKernel is a framework that can be used for multi-omics datasets. It performs an analysis in two steps: (1) it improves the interpretability of kernel principal component analysis (PCA) through a generic procedure; and (2) multi-omics breast cancer datasets are analyzed with kernel Self-Organizing Maps.
Permits analysis for high dimensional genetic and genomic data. sRDA is an algorithm that can identify multiple latent variables associated with the explanatory and response data set, with high sensitivity and specificity measures. This software implementation is compliant with parallel computing and therefore computational time can be further reduced. sRDA was developed to become an optimal statistical method for multiple omics sets analysis.
Improves the understanding of complex molecular interactions and disease mechanisms for integrative analysis, differential network analysis, and community detection. xMWAS recognizes and displays associations between genes, cytokines, and metabolites. It is based on existing algorithms and provides an automated framework for integrative and differential network analysis of up to four datasets from unpaired or paired study designs.
Integrates and analyzes large-scale gene expression and metabolite profile data sets along with diverse biological information such as biochemical pathways and Gene Ontology (GO) terms. Plant MetGenMAP is an analysis and visualization package for plant systems biology. The software consists of three functional components: data management, pathway browser, and data set analyzer. It can assist researchers to generate novel biological hypotheses and derive new conclusions from high-throughput omics data sets.
Allows general decomposition of variation for the integrated analysis of datasets. JIVE decomposes a dataset into a low-rank approximation capturing joint structure between data types, low-rank approximations capturing structure individual to each data type, and residual noise. It is applicable to datasets with more than two data types and has a simple algebraic interpretation. A JIVE analysis of gene expression and micro-RNA (miRNA) data on Glioblastoma Multiforme tumor samples reveals gene–miRNA associations and provides characterization of tumor types.
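The decomposition described above can be written compactly. The notation below is assumed for illustration (it does not come from this entry): $X_i$ is the data matrix for data type $i$ with samples in columns, $J_i$ is its share of the joint structure, $A_i$ its individual structure, and $\varepsilon_i$ the residual noise.

```latex
X_i = J_i + A_i + \varepsilon_i, \qquad i = 1, \dots, k,
\qquad
\operatorname{rank}\!\begin{pmatrix} J_1 \\ \vdots \\ J_k \end{pmatrix} = r,
\qquad
\operatorname{rank}(A_i) = r_i .
```

Here $r$ and $r_i$ are the (low) ranks chosen for the joint and individual approximations; stacking the $J_i$ expresses that the joint structure is shared across all $k$ data types.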
Supports integrative genomic studies. sMB2I incorporates multiple genomic data and intra/inter-block interactions by using a multi-block bipartite graph. It also assists the prediction of quantitative traits (survival time, gene expression) from the multi-block genomic data. This method can be applied to human brain data from psychiatric disorders.
Supports integrative genomic studies. MB2I incorporates multiple genomic data and intra/inter-block interactions by using a multi-block bipartite graph. It also assists the prediction of quantitative traits (survival time, gene expression) from the multi-block genomic data. This method can be applied to human brain data from psychiatric disorders.
A Mathematica package written in the Wolfram Language that provides bioinformatics utilities for analyzing dynamic omics datasets. MathIOmica addresses the necessity to integrate multiple omics information arising from dynamic profiling in a personalized medicine approach. It provides multiple tools to facilitate bioinformatics analysis, including importing data, annotating datasets, tracking missing values, normalizing data, clustering and visualizing the classification of data, carrying out annotation and enumeration of ontology memberships and pathway analysis. MathIOmica not only helps in the creation of new bioinformatics tools, but also in promoting interdisciplinary investigations, particularly from researchers in mathematical, physical science and engineering fields transitioning into genomics, bioinformatics and omics data integration.
Enables discovery of prognostic cancer subtypes. PSDF groups patient samples on the basis of both gene expression and copy number alteration data. It also simultaneously distinguishes, on a sample-by-sample basis, between samples that share a concordant signal across the data types (fused) and those for which there is contradiction (unfused). The model was exemplified in prostate cancer and breast cancer.
Detects putative underlying molecular pathways. Omics Integrator allows integration of transcriptomic data together with other omic data using interactome data. It permits expression analysis and network analysis, which simplifies work on multiple types of omic data. This package consists of two tools: Garnet (takes chromatin accessibility data) and Forest (identifies a parsimonious interaction network connecting a subset of user-defined omic data hits).
Implements state-of-the-art ensemble methods for module network inference. Lemon-Tree is a “one-stop shop” software suite for module network inference based on previously validated algorithms. The software is able to associate co-expression modules to multiple “regulator” types (expression regulators, structural DNA variants, phenotypic states, etc.) by assigning each of those independently as regulators of a module. It was benchmarked using large-scale datasets of somatic copy-number alterations and gene expression levels measured in glioblastoma samples from The Cancer Genome Atlas (TCGA).
A computational pipeline to retrieve biological pathways, gene networks, and central regulators critical for disease development. The Mergeomics web server pre-populates a wide range of publicly available data sources. It provides curated genomic resources including tissue-specific expression quantitative trait loci, ENCODE functional annotations, biological pathways, and molecular networks, and offers interactive visualization of analytical results. Multiple computational tools including Marker Dependency Filtering (MDF), Marker Set Enrichment Analysis (MSEA), Meta-MSEA, and Weighted Key Driver Analysis (wKDA) can be used separately or in flexible combinations. Users can apply the pipeline to their own data in conjunction with any preloaded data to identify disease-associated pathways, gene networks, and key regulators.
Enables integrated multi-omics analysis and gives users a way to study their own multi-omics data. It works on the integrated analysis of gene expression, DNA methylation, and genetic variations. BioVLAB-mCpG-SNP-EXPRESS allows users to explore the analysis results at multiple levels, such as the gene, gene set, pathway, and network, and from multiple perspectives, such as DNA methylation, gene expression, and sequence variation, in terms of phenotype differences.
Generates patient-specific predictions for the time to recurrence and survival, along with outcome risk predictions. CoxPath is a path-following algorithm for the L1-regularized Cox proportional hazards model. The algorithm handles copy number variation (CNV), gene expression, DNA methylation, and micro-RNA (miRNA) expression data. The prognostic signatures may be useful for interpreting therapeutic outcomes in clinical trials.
Discovers disease subtypes that have distinct survival profiles. iBFE integrates diverse genomic data types such as the DNA methylation, mRNA expression and miRNA expression datasets of lung and kidney cancers produced by TCGA. It is designed to improve the disease diagnosis and prognosis. This tool follows three steps: (1) extracting features from individual type of datasets; (2) concatenating the extracted features; (3) extracting new features from concatenated features.
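The three steps listed above can be sketched as follows. This is a minimal illustration only: principal component scores stand in for the feature extractor, which this entry does not specify, and the data are random placeholders. The function name `pca_features`, the matrix sizes, and the component counts are all assumptions.

```python
import numpy as np

def pca_features(x, n_components):
    """Return the top principal component scores of x (samples in rows)."""
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

rng = np.random.default_rng(0)
n_samples = 40
# Placeholder matrices for three data types measured on the same patients.
methylation = rng.normal(size=(n_samples, 200))
mrna = rng.normal(size=(n_samples, 300))
mirna = rng.normal(size=(n_samples, 50))

# (1) Extract features from each data type separately.
per_type = [pca_features(x, 10) for x in (methylation, mrna, mirna)]
# (2) Concatenate the extracted features sample by sample.
stacked = np.hstack(per_type)
# (3) Extract new features from the concatenated representation.
integrated = pca_features(stacked, 5)
```

The resulting `integrated` matrix has one row per patient and could then be passed to a clustering method to look for subtypes.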
Provides a suite for trans-omics approaches. ECOMICS is composed of four components: (i) E-class, for taxonomic classification of prokaryotic and eukaryotic DNA sequences, (ii) FT2B, to process nuclear magnetic resonance (NMR) spectra for downstream chemical analyses, (iii) Bm-Char, which performs an assignment of specific compounds in lignocellulose, and (iv) HetMap, to identify and visualize heterogeneous data and correlation matrices.
Allows processing and analysis of different types of omics data and combination of their results following the non-parametric combination (NPC) principles. omicsNPC is a program that can be used for the integration of different omics data. It is able to include covariates in the analysis and to process sets of datasets that share only part of the samples. This tool produces additional biological insights compared with analyzing each data modality in isolation.
Allows comparison of metagenomics and other meta-omics data. MetaComp is a graphical software that incorporates metagenomics, metatranscriptomics, metaproteomics and metabolomics data. The software provides a series of statistical analyses and visualizations for the comparison of functional, physiological and taxonomic signatures in two-, multi- and two-group sample tests. It can automatically select the proper statistical method in the two-group sample test. MetaComp can be used to reveal the relationship between environmental factors and meta-omic samples directly through a nonlinear regression analysis.
Aims to improve first-pass screening capabilities for large datasets. MIPHENO is a program that brings samples into the same distribution, allowing for dataset-wide comparisons. It assists in processing large datasets prior to meta-analyses that combine different datasets from high-throughput experiments. In summary, this tool is a valuable processing platform that can be applied to very diverse measurement types (e.g. gene expression, enzyme kinetics, metabolite amounts).
Integrates vertical multi-omics data for disease subtype discovery using overlapping group lasso and tight clustering in the iCluster latent variable model. GSTiCluster fits a latent variable model of integrative clustering to accommodate the information of inter-omics regulation flows, and to exclude outlier samples scattering away from the tight clusters.
Achieves the purpose of meta-analysis as jointly leveraging multiple omics data. Meta-SVM is a meta-analytic support vector machine (SVM) that can accommodate multiple omics data, making it possible to detect consensus genes associated with diseases across studies. The objective function of Meta-SVM applies the hinge loss and the sparse group lasso. It also facilitates identifying potential biomarkers and elucidating the disease process.
Assists users in detecting active pathways. MILPs is a mixed integer linear programming model that identifies cellular differentially expressed pathways responding to external conditions or diseases by using the molecular network topology and gene expression information. This method classifies differentially expressed pathways that can give deep insights into underlying molecular mechanisms. Finally, it can ensure connectivity of the identified pathway.
Permits curation, validation, and exploration of knowledge assemblies encoded in Biological Expression Language (BEL). BEL Commons implements several functions: (1) a network uploader and validator, (2) user rights and project management, (3) a query builder, (4) an interactive biological network explorer, and (5) an analytical service. It can be used to compare multiple unbiased candidate mechanism perturbation amplitude (UCMPA) experiments across different data sets.
Allows integrative study of heterogeneous omics data. OmicsNet is based on a multilayer network. Each connection is associated with a weight calculated from the correlation between the features it links. It computes path coefficients between genotype and phenotype through the weighted multilayer network. This tool can be useful for biological applications such as functional analysis, biomarker discovery or predictive analysis.
Represents a weighted power scoring framework. The WPBS method entails: (1) extraction of pairwise similarity of yeast Saccharomyces cerevisiae genes, (2) separately rescoring the similarities, obtained from different data sources, (3) power and weight coefficient estimation and then integration of the positive predictive values (PPVs), (4) predicting functions of classified as well as unclassified genes from clusters. The function of a gene is predicted by calculating the functional enrichment of the cluster using Munich Information for Protein Sequences (MIPS) annotation.
Creates interactomes for both transcriptomes and proteomes. inteGREAT studies their network structures to estimate the extent by which a gene product and its related partners are similar across different sources of data while different between cellular phenotypes. It can be used to discover differentiating tumor biomarkers. This tool can attribute confidence of integration to each gene representing its transcript and protein expressions.
Deduces topologies between datasets. iTOP is composed of two parts: (1) a measure of similarity between datasets and (2) an algorithm that employs the similarity measure to conduct structure learning. It relies on an RV coefficient extended for partial matrix correlations. This tool can serve to generate hypotheses and evaluate them by inspecting the matrix correlations and their confidence intervals returned by the method.
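For readers unfamiliar with the RV coefficient, the sketch below computes the standard (unextended) version between two datasets measured on the same samples. It does not implement the partial matrix correlation extension mentioned above, and the function name and random test data are assumptions for illustration.

```python
import numpy as np

def rv_coefficient(x, y):
    """Standard RV coefficient between two datasets with samples in rows."""
    x = x - x.mean(axis=0)  # column-center each dataset
    y = y - y.mean(axis=0)
    sx = x @ x.T            # sample-by-sample configuration matrices
    sy = y @ y.T
    numerator = np.trace(sx @ sy)
    denominator = np.sqrt(np.trace(sx @ sx) * np.trace(sy @ sy))
    return numerator / denominator

rng = np.random.default_rng(0)
a = rng.normal(size=(20, 5))   # e.g. 20 samples, 5 features
b = rng.normal(size=(20, 8))   # same 20 samples, 8 other features

rv_self = rv_coefficient(a, a)          # a dataset compared with itself
rv_scaled = rv_coefficient(a, 3.0 * a)  # invariant to overall scaling
rv_ab = rv_coefficient(a, b)            # two unrelated random datasets
```

The coefficient lies between 0 and 1 and works even when the two datasets have different numbers of features, which is what makes it usable as a similarity measure between heterogeneous data types.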
Fragments pathways into functional modules and uses these to predict phenotypes. SIMMS uses arbitrary molecular data types to identify dysregulated pathways and create functional biomarkers. It enables integration of molecular profiles with functional networks such as protein-protein interaction (PPI) networks. Moreover, this tool requires patient outcome data as dependent variable.
Allows efficient decomposition of any type of genomic data represented as a numerical matrix. Bratwurst provides functionalities to identify patterns in different types of omics data with non-negative matrix factorization (NMF). It selects characteristics very specific to the different patterns and integrates the patterns extracted from the different omics layers. This software can be applied to any omics data type and is useful for multi-omics integration.
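To make the NMF step concrete, here is a minimal sketch using the classic multiplicative update rules for the Frobenius-norm objective. It is a generic illustration, not Bratwurst's implementation; the function name `nmf`, its parameters, and the synthetic data are assumptions.

```python
import numpy as np

def nmf(v, rank, n_iter=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix v (features x samples) as w @ h."""
    rng = np.random.default_rng(seed)
    n_features, n_samples = v.shape
    w = rng.random((n_features, rank)) + eps  # feature loadings per pattern
    h = rng.random((rank, n_samples)) + eps   # pattern exposure per sample
    for _ in range(n_iter):
        # Multiplicative updates keep both factors non-negative.
        h *= (w.T @ v) / (w.T @ w @ h + eps)
        w *= (v @ h.T) / (w @ h @ h.T + eps)
    return w, h

# Synthetic non-negative data with three underlying patterns.
rng = np.random.default_rng(1)
v = rng.random((30, 3)) @ rng.random((3, 12))

w0, h0 = nmf(v, rank=3, n_iter=0)   # random starting point (same seed)
err_start = np.linalg.norm(v - w0 @ h0) / np.linalg.norm(v)

w, h = nmf(v, rank=3, n_iter=500)
err_end = np.linalg.norm(v - w @ h) / np.linalg.norm(v)
```

Each column of `w` can be read as one pattern over the features, and each row of `h` gives the exposure of every sample to that pattern; comparing `err_start` with `err_end` shows how much the updates improved the reconstruction.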
Decomposes the sources of heterogeneity in multi-omics data sets. MOFA simplifies a range of different downstream analyses, including automatic characterization of the inferred factors. It was applied to a study of chronic lymphocytic leukaemia (CLL). This tool has recognized the drivers of variation in CLL, a clinically and biologically heterogeneous disease. It allows for disentangling molecular variation into distinct components.
Analyzes genome dynamics. This method employs network-based approaches, including von Neumann graph entropy, network centrality, and multilayer network theory. It includes the dynamic genome architecture and transcription from experimental data to study the 4D nucleome (4DN). This tool is useful for recognizing important topological properties of genome structure. It can serve to build a multi-way dynamical system from genome-wide form-function data.
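As an illustration of one of the quantities named above, the sketch below computes a von Neumann graph entropy from an adjacency matrix, using the common definition based on the graph Laplacian scaled to unit trace. Whether the method described here uses exactly this variant is an assumption, as are the function name and the toy graphs.

```python
import numpy as np

def von_neumann_entropy(adjacency):
    """Entropy of the trace-normalized Laplacian of an undirected graph."""
    adjacency = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    density = laplacian / np.trace(laplacian)  # eigenvalues now sum to 1
    eigenvalues = np.linalg.eigvalsh(density)
    eigenvalues = eigenvalues[eigenvalues > 1e-12]  # treat 0 * log(0) as 0
    return float(-(eigenvalues * np.log(eigenvalues)).sum())

# Complete graph on four nodes: every pair of nodes is connected.
complete4 = np.ones((4, 4)) - np.eye(4)
# A single edge between two nodes.
single_edge = np.array([[0, 1], [1, 0]])

entropy_complete = von_neumann_entropy(complete4)
entropy_edge = von_neumann_entropy(single_edge)
```

For a contact map such as a chromatin interaction matrix, the same function could be applied to a thresholded or weighted adjacency matrix at each time point to track how the entropy changes.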
Dr. Yashwanth Subbannayya obtained his M.Sc. degree in Medical Biochemistry from Manipal University. He qualified in the competitive CSIR-UGC National Eligibility Test and joined the Institute of Bioinformatics, Bangalore as a UGC Junior Research Fellow. As part of his Ph.D. work, he studied the molecular mechanisms of gastric cancer in clinical specimens using quantitative proteomic technologies. This study, the results of which were published in Cancer Biology and Therapy, yielded a novel therapeutic target for gastric cancer, CAMKK2. Further, he also studied the serum proteome of gastric cancer patients and developed assays for potential markers using the multiple reaction monitoring approach. The results of this study were published in Journal of Proteomics. In addition to his research work, he trained extensively in sample preparation for mass spectrometry and fractionation techniques, and gained expertise in quantitative proteomic techniques and data analysis. He also trained extensively in various validation platforms including immunohistochemistry, multiple reaction monitoring and Western blot. He has also worked as a curator for several biological databases including NetPath, Human Protein Reference Database (HPRD) and Breast cancer database. His work in various research projects has yielded 23 publications, either as lead author or co-author, in peer-reviewed journals. He is a reviewer for the journal Proteomics.
Dr. Yashwanth Subbannayya joined the YU-IOB Center for Systems Biology and Molecular Medicine in June, 2015. During the initial period, his job consisted of assisting other personnel of the university in the establishment of YU-IOB Center for Systems Biology and Molecular Medicine. He was also involved in training of Ph.D. students in biological aspects. After the establishment of the center, he trained in cell culture techniques and metabolomics analysis. At YU-IOB CSBMM, he is studying the molecular mechanisms in various cancers including oral cancer. In addition, he is studying the molecular mechanisms as well as the metabolic constituents of traditional medicine formulations using mass spectrometry technologies. In June 2016, he convened the national symposium “Genomics in clinical practice: Future of precision medicine” held at Yenepoya University on June 1 and 2, 2016. The resource persons included 16 individuals from various academic organizations as well as industry. The symposium was attended by 218 participants from 24 institutions around India. He is a member of the Scientific Review Board of Yenepoya Research Centre where he facilitates timely scientific review of research projects.