Gives access to many free software tools for sequence analysis. EMBOSS aims to serve the molecular biology community. It permits the creation and release of software in an open source spirit. It integrates sequence analysis tools into a seamless whole. It is free of charge and open source.
Permits exploration and integration of high-dimensional datasets. mixOmics proposes multivariate statistical approaches to identify similarities between two heterogeneous datasets. It summarizes information in a smaller data set and aims to highlight the biological entities of potential relevance, with a strong focus on graphical representation. This tool assists in finding signatures of vaccine effect and allows a better understanding of the immunological mechanisms activated by the intervention.
A user-friendly web server for inferring a sub-network based on probabilistic logical querying. PheNetic extracts from an interactome the sub-network that best explains genes prioritized through a molecular profiling experiment. Depending on its run mode, PheNetic searches either for a regulatory mechanism that gave rise to the observed molecular phenotype or for the pathways (in)activated in the molecular phenotype. The web server provides access to a large number of interactomes, making sub-network inference readily applicable to a wide variety of organisms. The inferred sub-networks can be interactively visualized in the browser.
An integrative platform with a web-based interface, which integrates four different modules for processing, annotation, analysis and visualization of the interaction profiles of proteins/genes, metabolites and/or drugs of interest. IIS organizes the analysis in a project context and the user can create several projects protected by password. The project is a structure inside the system where researchers can develop and organize their thematic studies, choosing between two types: (i) chromatogram project or (ii) genes/metabolites/drugs project.
A web-based tool that enables functional enrichment analysis and visualization of high-throughput datasets. SAMNetWeb can analyse two distinct data types (e.g. mRNA expression and global proteomics) simultaneously across multiple experimental systems to identify pathways activated in these experiments and then visualize the pathways in a single interaction network. Through the use of a multi-commodity flow based algorithm that requires each experiment to 'share' underlying protein interactions, SAMNetWeb can identify distinct and common pathways across experiments.
Identifies methylation quantitative trait loci at high sensitivity. tICA detects biological sources of data variation and gene modules whose expression variation across tumours is driven by copy-number or DNA methylation changes in a cancer context. This software can be applied to any multi-way data tensor to pinpoint complex patterns of variation correlating with phenotypes of interest and the underlying features driving these variation patterns.
A straightforward approach for the integrative analysis of data from different high-throughput technologies based on pathway and interaction models from public databases. pwOmics performs pathway-based level-specific data comparison of coupled human proteomic and genomic/transcriptomic datasets based on their log fold changes. Separate downstream and upstream analyses on the functional levels of pathways, transcription factors and genes/transcripts feed into the cross-platform consensus analysis. These provide a basis for the combined interpretation of regulatory effects over time. As high-throughput data are increasingly used to follow time-dependent biological regulation after perturbation, the main benefit of pwOmics is the cross-platform time series analysis functionality, but consensus analysis can also be performed on single time point measurements.
Identifies correlative modules in multi-dimensional genomics data. jNMF aims to detect subsets of messenger RNAs (mRNAs), microRNAs (miRNAs) and methylation markers. It represents features across multiple datasets and reduces the complexity of the data. Moreover, this tool selects associations among sets of different types of variables. It highlights vertical associations between multiple regulatory levels and can reveal significantly disrupted pathways.
Statistically ranks predicted feedforward loops (FFLs) by their explanatory power to account for differential gene and miRNA expression between two biological conditions. dChip-GemiNi combines gene and microRNA (miRNA) expression profiles available for a disease process and also incorporates regulatory network structure in the form of computationally identified transcription factor (TF)-miRNA FFLs.
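As a rough illustration of what a TF-miRNA feedforward loop looks like as a network structure, the sketch below enumerates loops in which a TF regulates a miRNA and both regulate the same gene. The edge lists, the toy identifiers and the function name `find_ffls` are hypothetical, and the statistical ranking performed by dChip-GemiNi is not reproduced here.

```python
from collections import defaultdict


def find_ffls(tf_to_mirna, tf_to_gene, mirna_to_gene):
    """Enumerate (TF, miRNA, gene) feedforward loops from three edge lists.

    A loop is reported when the TF regulates the miRNA and both the TF
    and the miRNA regulate the same target gene.
    """
    tf_targets = defaultdict(set)
    for tf, gene in tf_to_gene:
        tf_targets[tf].add(gene)

    mirna_targets = defaultdict(set)
    for mirna, gene in mirna_to_gene:
        mirna_targets[mirna].add(gene)

    loops = []
    for tf, mirna in tf_to_mirna:
        # Genes targeted by both regulators close the loop.
        shared = tf_targets[tf] & mirna_targets[mirna]
        for gene in sorted(shared):
            loops.append((tf, mirna, gene))
    return sorted(loops)


# Toy, made-up regulatory network (illustrative assumption only).
tf_to_mirna = [("TF1", "miR-a"), ("TF2", "miR-b")]
tf_to_gene = [("TF1", "G1"), ("TF1", "G2"), ("TF2", "G3")]
mirna_to_gene = [("miR-a", "G1"), ("miR-b", "G4")]

ffls = find_ffls(tf_to_mirna, tf_to_gene, mirna_to_gene)
print(ffls)
```

In this toy network only TF1, miR-a and G1 form a closed loop; TF2 and miR-b share no target, so no loop is reported for them.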
Provides a convenient tool for using a powerful constrained optimization method to reconstruct signaling and response pathways by integrating multiple ‘omic’ data. SteinerNet seeks a network composed of high-confidence interactions that ultimately link a subset of the omic hits either directly or through intermediate proteins. This is achieved by solving the prize-collecting Steiner tree (PCST) problem. SteinerNet serves a diverse range of researchers who would like to integrate multiple ‘omic’ data sources to reconstruct biologically meaningful pathways.
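To make the prize-collecting idea concrete, the sketch below scores a candidate sub-network as the sum of the costs of the edges it uses plus the prizes of the nodes it leaves out. This is the textbook form of the objective only: the function name `pcst_objective`, the toy graph and the absence of any scaling parameter are assumptions, and no solver is included.

```python
def pcst_objective(selected_edges, edge_costs, node_prizes):
    """Score a candidate sub-network under a prize-collecting objective.

    selected_edges: iterable of (u, v) pairs making up the candidate.
    edge_costs: dict mapping frozenset({u, v}) to a non-negative cost.
    node_prizes: dict mapping each node to a non-negative prize.
    Lower scores are better: cheap edges that connect high-prize nodes.
    """
    selected_edges = list(selected_edges)
    included = set()
    for u, v in selected_edges:
        included.update((u, v))

    edge_term = sum(edge_costs[frozenset(edge)] for edge in selected_edges)
    # Nodes left out of the sub-network forfeit their prize.
    penalty = sum(p for node, p in node_prizes.items() if node not in included)
    return edge_term + penalty


# Toy interactome (hypothetical): prizes stand for omic hits,
# costs stand for low-confidence interactions.
prizes = {"A": 5, "B": 0, "C": 4, "D": 1}
costs = {
    frozenset(("A", "B")): 1,
    frozenset(("B", "C")): 1,
    frozenset(("C", "D")): 3,
}

small = pcst_objective([("A", "B"), ("B", "C")], costs, prizes)
full = pcst_objective([("A", "B"), ("B", "C"), ("C", "D")], costs, prizes)
print(small, full)
```

Here linking A and C through the prize-free intermediate B is cheaper than also paying for the costly edge to D, which mirrors how intermediate proteins can enter the solution.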
Provides a framework for easy integration of new analysis algorithms and a simple interface for biologists to run and compare algorithms. miXGENE is a tool that permits users to learn from heterogeneous genomic measurements while making use of prior knowledge (PK). It also provides specific learning methods and suggests sample workflows relevant to the given task.
Supports classification-based investigations that lead to increased biological interpretability. LogMiNeR was applied to transcriptional profiling data to better understand differential influenza vaccination responses. It can be applied to classification of many immune as well as non-immune-mediated diseases. This tool presents distinct aspects of the underlying biology while maintaining predictive accuracy.
Simultaneously models the dependence and the heterogeneity of various data sources. BCC is a flexible clustering approach that models both an overall clustering and a clustering specific to each data source. In addition to multisource data, it may be used to compare clustering from different statistical models for a single homogeneous dataset. The software was applied to subtype identification of breast cancer tumor samples using publicly available data from The Cancer Genome Atlas (TCGA).
Provides a variety of learning strategies to boost prediction performance based on the use of all available data. Data integration is performed via supervised multiple kernel learning methods. Feature selection by statistical score is performed separately per data type and by pathway membership. A confidence measure for the class assignment is also introduced, both to remove some ambiguously labeled datapoints from the training data and to implement a cautious classifier that only makes predictions when the associated confidence is high.
Allows the integration of different data types. rMKL-LPP is an extension of the multiple kernel learning with dimensionality reduction (MKL-DR) method. Locality Preserving Projections (LPP) conserve the sum of distances between each sample and its k nearest neighbors.
Improves the understanding of complex molecular interactions and disease mechanisms for integrative analysis, differential network analysis, and community detection. xMWAS recognizes and displays associations between genes, cytokines, and metabolites. It is based on existing algorithms and provides an automated framework for integrative and differential network analysis of up to four datasets from unpaired or paired study designs.
Classifies genes as differential or not differential based on a generalized correlation measure for multiple sequencing-based genomic variables. MARIO is a hierarchical Bayesian model approach for the parallel, integrative analysis of data from several genomic types. It also facilitates the incorporation of information from functional genomic networks and allows inference at the gene level even when the sample size is very small.
Performs penalized co-inertia analysis (CIA). pCIA is an R package that implements two sparse CIA methods, sparse CIA (sCIA) and structured sparse CIA (ssCIA), both of which impose penalties on the CIA loading vectors. The software also allows cross-validation for the selection of optimal tuning parameters in each model.
Performs multivariate modeling with minimally biased variable selection incorporated into a repeated double cross-validation framework. MUVR is compatible with partial least squares (PLS) and random forest (RF) core modeling techniques and enables regression, classification and multi-level modeling. This software selects both minimal-optimal variable sets and all-relevant variables for classification, regression and multilevel analyses.
Permits detection of variables linked to disease diagnosis and progression in various multi-modal datasets. CausalMGM allows users to learn undirected and directed graphs over mixed data types. It proceeds in two steps: it calculates the undirected graph, and then uses local search strategies to prune and orient this graph. This method can serve for recovering known interactions between gene expression and clinical variables.
Allows the imputation of trans-omics “block missing” datasets. TOBMIkNN is based on a k-nearest neighbor (kNN) weighted algorithm able to preserve correlation structure. It provides an imputation strategy which aims to ease the integrative analysis of trans-omics data. This application was tested with a set of simulations and data from The Cancer Genome Atlas (TCGA).
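The general idea of weighted kNN imputation for a missing omics block can be sketched as follows: donors are ranked by their distance to the target sample on the omics layer that both have, and the missing block is filled with a weighted average of the nearest donors' blocks. The Euclidean distance, the inverse-distance weights and the name `impute_block` are assumptions chosen for illustration, not the tool's documented choices.

```python
import math


def impute_block(target_obs, donors, k=2):
    """Impute a missing omics block for one sample.

    target_obs: profile of the sample on the omics layer it does have.
    donors: list of (observed_profile, missing_layer_profile) pairs taken
        from samples measured on both layers.
    Returns the inverse-distance-weighted mean of the k nearest donors'
    profiles on the missing layer.
    """
    scored = [(math.dist(target_obs, obs), block) for obs, block in donors]
    scored.sort(key=lambda pair: pair[0])
    nearest = scored[:k]

    # An exact match receives all the weight, avoiding division by zero.
    for dist, block in nearest:
        if dist == 0.0:
            return list(block)

    weights = [1.0 / dist for dist, _ in nearest]
    total = sum(weights)
    n_features = len(nearest[0][1])
    return [
        sum(w * block[j] for w, (_, block) in zip(weights, nearest)) / total
        for j in range(n_features)
    ]


# Toy data (hypothetical): two features on each layer.
donors = [
    ([1.0, 0.0], [10.0, 20.0]),
    ([0.0, 2.0], [40.0, 50.0]),
    ([5.0, 5.0], [100.0, 100.0]),
]
imputed = impute_block([0.0, 0.0], donors, k=2)
print(imputed)
```

The distant third donor is ignored with `k=2`, and the closer of the two remaining donors contributes twice the weight of the other.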
Allows integration of multiple datasets of various types into a single exploratory analysis. mixKernel is a framework that can be used for multi-omics datasets. It proceeds in two steps: (1) it improves the interpretability of kernel principal component analysis (KPCA) through a generic procedure; and (2) it analyzes multi-omics breast cancer datasets with kernel Self-Organizing Maps.
Automates the extraction of gene or single nucleotide polymorphism (SNP) information. GENEASE permits users to query multiple databases and retrieve SNP, gene, CpG site or disease information. It allows in-silico discovery of non-coding regions. This tool is useful for multi-omic and phenotype data exploration, statistical enrichment, overlap analysis and visualization.
Facilitates interactive exploration and interpretation of multi-layered high-dimensional data. Grimon is a data visualization method that helps users to understand complex interplay across multi-omics networks by visualizing data on a graphical interface. It can be applied to different data representations, such as principal component analysis (PCA) or t-distributed stochastic neighbor embedding (t-SNE) projections.
Permits analysis for high dimensional genetic and genomic data. sRDA is an algorithm that can identify multiple latent variables associated with the explanatory and response data set, with high sensitivity and specificity measures. This software implementation is compliant with parallel computing and therefore computational time can be further reduced. sRDA was developed to become an optimal statistical method for multiple omics sets analysis.
Permits multi-sample integration and processing of (1) known annotations from next generation sequencing (NGS) and (2) big heterogeneous genomic datasets to extract candidate targets for biomedical knowledge discovery. GMQL System supports queries (i.e., scripts) comparing billions of genomic regions, mainly on the basis of metric properties but also of arbitrary region attributes and of metadata content.
Performs feature selection, model prediction, and missing data incorporation. FBM provides a method inspired by the integrative Bayesian analysis of genomics data (iBAG) model that includes a third layer, a missingness model, to incorporate samples with missing data. The software applies spike-and-slab priors for feature selection and allows Bayesian false discovery rate (FDR) control. It was assessed using extensive simulations.
Integrates and analyzes large-scale gene expression and metabolite profile data sets along with diverse biological information such as biochemical pathways and Gene Ontology (GO) terms. Plant MetGenMAP is an analysis and visualization package for plant systems biology. The software consists of three functional components: data management, pathway browser, and data set analyzer. It can assist researchers to generate novel biological hypotheses and derive new conclusions from high-throughput omics data sets.
Allows general decomposition of variation for the integrated analysis of datasets. JIVE decomposes a dataset into a low-rank approximation capturing joint structure between data types, low-rank approximations capturing structure individual to each data type, and residual noise. It is applicable to datasets with more than two data types and has a simple algebraic interpretation. A JIVE analysis of gene expression and micro-RNA (miRNA) data on Glioblastoma Multiforme tumor samples reveals gene–miRNA associations and provides characterization of tumor types.
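The decomposition described above can be written compactly. The notation below is an assumption chosen for illustration: $X_i$ is the data matrix for data type $i$ (variables in rows, shared samples in columns), $J_i$ is its joint-structure component, $A_i$ its individual-structure component, and $\varepsilon_i$ the residual noise.

```latex
X_i = J_i + A_i + \varepsilon_i, \qquad i = 1, \dots, k,
\qquad
\operatorname{rank}(J) = r, \quad
\operatorname{rank}(A_i) = r_i, \quad
J A_i^{\top} = 0,
\qquad \text{where } J = \begin{bmatrix} J_1 \\ \vdots \\ J_k \end{bmatrix}.
```

The orthogonality condition $J A_i^{\top} = 0$ keeps the individual structure from duplicating patterns already captured by the joint structure.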
Allows users to conduct integrative genomic studies. sMB2I incorporates multiple genomic data and intra/inter-block interactions by using a multi-block bipartite graph. It also assists the prediction of quantitative traits (survival time, gene expression) from the multi-block genomic data. This method can be applied to human brain data from psychiatric disorders.
Allows users to conduct integrative genomic studies. MB2I incorporates multiple genomic data and intra/inter-block interactions by using a multi-block bipartite graph. It also assists the prediction of quantitative traits (survival time, gene expression) from the multi-block genomic data. This method can be applied to human brain data from psychiatric disorders.
A Mathematica package written in the Wolfram Language that provides bioinformatics utilities for analyzing dynamic omics datasets. MathIOmica addresses the necessity to integrate multiple omics information arising from dynamic profiling in a personalized medicine approach. It provides multiple tools to facilitate bioinformatics analysis, including importing data, annotating datasets, tracking missing values, normalizing data, clustering and visualizing the classification of data, carrying out annotation and enumeration of ontology memberships and pathway analysis. MathIOmica not only helps in the creation of new bioinformatics tools, but also in promoting interdisciplinary investigations, particularly from researchers in mathematical, physical science and engineering fields transitioning into genomics, bioinformatics and omics data integration.
Allows the discovery of prognostic cancer subtypes. PSDF groups patient samples on the basis of both gene expression and copy number alteration data. It also simultaneously distinguishes, on a sample-by-sample basis, between samples that share a concordant signal across the data types (fused) and those for which there is contradiction (unfused). The model was exemplified in prostate cancer and breast cancer.
Detects putative underlying molecular pathways. Omics Integrator allows integration of transcriptomic data together with other omic data using interactome data. It permits expression analysis and network analysis, which simplifies work on multiple types of omic data. This package consists of two tools: Garnet (takes chromatin accessibility data) and Forest (identifies a parsimonious interaction network connecting a subset of user-defined omic data hits).
Implements state-of-the-art ensemble methods for module network inference. Lemon-Tree is a “one-stop shop” software suite for module network inference based on previously validated algorithms. The software is able to associate co-expression modules to multiple “regulator” types (expression regulators, structural DNA variants, phenotypic states, etc.) by assigning each of those independently as regulators of a module. It was benchmarked using large-scale datasets of somatic copy-number alterations and gene expression levels measured in glioblastoma samples from The Cancer Genome Atlas (TCGA).
A computational pipeline to retrieve biological pathways, gene networks, and central regulators critical for disease development. The Mergeomics web server pre-populates a wide range of publicly available data sources. It provides curated genomic resources including tissue-specific expression quantitative trait loci, ENCODE functional annotations, biological pathways, and molecular networks, and offers interactive visualization of analytical results. Multiple computational tools including Marker Dependency Filtering (MDF), Marker Set Enrichment Analysis (MSEA), Meta-MSEA, and Weighted Key Driver Analysis (wKDA) can be used separately or in flexible combinations. Users can apply the pipeline to their own data in conjunction with any preloaded data to identify disease-associated pathways, gene networks, and key regulators.
Enables integrated multi-omics analysis and gives users a way to study their own multi-omics data. It works on the integrated analysis of gene expression, DNA methylation, and genetic variations. BioVLAB-mCpG-SNP-EXPRESS allows users to explore the analysis results at multiple levels, such as the gene, gene set, pathway, and network, and from multiple perspectives, such as DNA methylation, gene expression, and sequence variation in terms of phenotype differences.
Generates patient-specific predictions for the time to recurrence and survival, along with outcome risk predictions. The CoxPath model is a path-following algorithm for the L1-regularized Cox proportional hazards model. The algorithm handles copy number variation (CNV), gene expression, DNA methylation, and micro-RNA (miRNA) expression data. The prognostic signatures may be useful for interpreting therapeutic outcomes for clinical trials.
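For reference, an L1-regularized Cox proportional hazards model is usually stated as a penalized partial likelihood problem. The form below is the standard textbook one, assuming no tied event times. The symbols are chosen here for illustration: $x_i$ is the covariate vector of patient $i$, $\delta_i$ the event indicator, $R(t_i)$ the risk set at time $t_i$, and $\lambda$ the regularization parameter traced by a path-following algorithm. Some formulations scale the likelihood by the sample size.

```latex
\hat{\beta}(\lambda) = \arg\min_{\beta}
  \left\{ -\ell(\beta) + \lambda \lVert \beta \rVert_1 \right\},
\qquad
\ell(\beta) = \sum_{i:\,\delta_i = 1}
  \left[ x_i^{\top}\beta - \log \sum_{j \in R(t_i)} \exp\!\left( x_j^{\top}\beta \right) \right]
```

As $\lambda$ decreases along the path, more coefficients become nonzero, so each value of $\lambda$ corresponds to a candidate prognostic signature.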
Discovers disease subtypes that have distinct survival profiles. iBFE integrates diverse genomic data types such as the DNA methylation, mRNA expression and miRNA expression datasets of lung and kidney cancers produced by TCGA. It is designed to improve disease diagnosis and prognosis. This tool follows three steps: (1) extracting features from each individual type of dataset; (2) concatenating the extracted features; (3) extracting new features from the concatenated features.
Provides a suite for trans-omics approaches. ECOMICS is composed of four components: (i) E-class, for taxonomic classification of prokaryotic and eukaryotic DNA sequences, (ii) FT2B, to process nuclear magnetic resonance (NMR) spectra for downstream chemical analyses, (iii) Bm-Char, which assigns specific compounds in lignocellulose, and (iv) HetMap, to identify and visualize heterogeneous data and correlation matrices.
Creates dynamic genome-wide integrative omics models with two time points. SHIN+GO uses unsupervised machine learning to build models. It can be used to estimate the frequency of gene functional annotations present in nodes made of clustered co-regulated genes with corresponding co-secreted proteins. This tool is useful for comparative transcriptomics of different strains or species.
Allows processing and analysis of different types of omics data and combination of their results following the non-parametric combination (NPC) principles. omicsNPC is a program that can be used for the integration of different omics data. It is able to include covariates in the analysis and to process sets of datasets that share only part of the samples. This tool produces additional biological insights compared with analyzing data modalities in isolation.
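A minimal sketch of the non-parametric combination idea, assuming one test statistic per dataset has already been computed on the observed data and on the same set of permutations for every dataset. Partial p-values are obtained by ranking within each dataset and are then merged with Fisher's combining function, which is one of several standard NPC choices. The name `npc_fisher` and the toy numbers are hypothetical; covariates and partially overlapping samples are not handled.

```python
import math


def npc_fisher(stats):
    """Combine per-dataset permutation tests into one global p-value.

    stats[0] holds the observed statistics (one per dataset); the remaining
    rows hold the statistics obtained under the same permutations applied
    to every dataset. Larger statistics mean stronger evidence.
    Returns (observed partial p-values, global p-value).
    """
    n_rows = len(stats)
    n_sets = len(stats[0])

    # Partial p-values: rank each row's statistic within its dataset's column.
    pvals = [[0.0] * n_sets for _ in range(n_rows)]
    for k in range(n_sets):
        column = [row[k] for row in stats]
        for b in range(n_rows):
            pvals[b][k] = sum(1 for s in column if s >= column[b]) / n_rows

    # Fisher combining function, applied to the observed row and to each
    # permutation row so that dependence between datasets is preserved.
    combined = [-2.0 * sum(math.log(p) for p in row) for row in pvals]

    # Global p-value: how extreme is the observed combination?
    global_p = sum(1 for t in combined if t >= combined[0]) / n_rows
    return pvals[0], global_p


# Toy statistics (made up): 1 observed row + 3 permutations, 2 datasets.
stats = [
    [5.0, 4.0],  # observed
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 3.0],
]
partial, overall = npc_fisher(stats)
print(partial, overall)
```

With only three permutations the smallest attainable p-value is 0.25; a real analysis would use far more permutations.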
Allows comparison of metagenomics and other meta-omics data. MetaComp is a graphical software that incorporates metagenomics, metatranscriptomics, metaproteomics and metabolomics data. The software provides a series of statistical analyses and visualizations for the comparison of functional, physiological and taxonomic signatures in two-sample, multi-sample and two-group sample tests. It can automatically select the proper statistical method in the two-group sample test. MetaComp can be used to reveal the relationship between environmental factors and meta-omic samples directly through a nonlinear regression analysis.
Aims to improve first-pass screening capabilities for large datasets. MIPHENO is a program that brings samples into the same distribution, allowing for dataset-wide comparisons. It assists in the processing of large datasets prior to meta-analyses combining different data sets from high-throughput experiments. In summary, this tool is a valuable processing platform that can be applied to very diverse measurement types (e.g. gene expression, enzyme kinetics, metabolite amounts).
Assists users in implementing bootstrap imputation with variable selection. BISS is an R package that performs variable selection on data with missing values. Built-in imputation models are provided for continuous data (normal linear regression), binary data (logistic regression) and Poisson data (log-linear regression). It can also be applied to a wide range of settings.
Integrates vertical multi-omics data for disease subtype discovery using overlapping group lasso and tight clustering in the iCluster latent variable model. GSTiCluster fits a latent variable model of integrative clustering to accommodate the information of inter-omics regulation flows, and to exclude outlier samples scattering away from the tight clusters.
Performs meta-analysis by jointly leveraging multiple omics data. Meta-SVM is a meta-analytic support vector machine (SVM) that can accommodate multiple omics data, making it possible to detect consensus genes associated with diseases across studies. The objective function of Meta-SVM applies the hinge loss and the sparse group lasso. It also facilitates identifying potential biomarkers and elucidating the disease process.
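To illustrate how a hinge loss and a sparse group lasso penalty fit together in one objective, the sketch below evaluates such an objective for given coefficients. Grouping the coefficients of one feature across studies, the parameter names `lam1` and `lam2`, and the function name `hinge_sgl_objective` are assumptions made for this example; the exact weighting used by Meta-SVM is not reproduced, and no optimizer is included.

```python
import math


def hinge_sgl_objective(studies, betas, intercepts, lam1, lam2):
    """Evaluate hinge loss plus a sparse group lasso penalty.

    studies: list of (X, y), X a list of feature rows, y labels in {-1, +1}.
    betas[m][j]: coefficient of feature j in study m.
    intercepts[m]: intercept of study m.
    lam1 weights the L1 term (sparsity within groups); lam2 weights the
    group term (one group per feature, spanning all studies).
    """
    loss = 0.0
    for (X, y), beta, b0 in zip(studies, betas, intercepts):
        for row, label in zip(X, y):
            margin = label * (b0 + sum(w * x for w, x in zip(beta, row)))
            loss += max(0.0, 1.0 - margin)  # hinge loss

    n_features = len(betas[0])
    l1_term = sum(abs(w) for beta in betas for w in beta)
    group_term = sum(
        math.sqrt(sum(beta[j] ** 2 for beta in betas))
        for j in range(n_features)
    )
    return loss + lam1 * l1_term + lam2 * group_term


# Toy example (made up): two studies, two features each.
studies = [
    ([[1.0, 0.0], [0.0, 1.0]], [1, -1]),
    ([[2.0, 0.0]], [1]),
]
betas = [[3.0, 0.0], [4.0, 0.0]]
intercepts = [0.0, 0.0]
value = hinge_sgl_objective(studies, betas, intercepts, lam1=0.1, lam2=0.2)
print(value)
```

The group term shrinks a feature's coefficients in all studies together, which is what favors features that are selected consistently across studies.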
Consists of a Bayesian shrinkage approach and an associated expectation maximization (EM) algorithm for variable selection with structured covariates. EMSHS assigns independent Laplace priors to the regression coefficients, while incorporating the underlying graph information. This tool iteratively adjusts the weights based on the current coefficients and the underlying graph knowledge.
Characterizes associations between two data types. sSCCA is a method based on canonical correlation analysis (CCA) approaches. The application is based on a simultaneous combination of structural information integration for each data type and investigation of overall dependency structure between transcripts and metabolites. It can also be used with sparse covariance matrices.