CONEXIC / COpy Number and EXpression In Cancer

Integrates matched copy number (amplifications and deletions) and gene expression data from tumor samples to identify driving mutations and the processes they influence. CONEXIC is inspired by Module Networks (Segal et al., 2003), but has been augmented by a number of critical modifications that make it suitable for identifying drivers. CONEXIC uses a score-guided search to identify the combination of modulators that best explains the behavior of a gene expression module across tumor samples, searching for the highest-scoring modulators within the amplified or deleted regions.

SNF / Similarity Network Fusion

A computational method for data integration. Briefly, SNF combines many different types of measurements (such as mRNA expression, DNA methylation, miRNA expression, clinical data, questionnaires and image data) for a given set of samples (e.g. patients). SNF first constructs a sample similarity network for each data type and then iteratively integrates these networks using a novel network fusion method. Working in the sample network space allows SNF to avoid dealing with the differences in scale, collection bias and noise between data types. Integrating data in a non-linear fashion allows SNF to take advantage of both the common and the complementary information in the different data types.
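The two stages described above (one similarity network per data type, then iterative fusion) can be illustrated with a much-simplified sketch. This is not the SNF implementation: the Gaussian kernel width `sigma`, the neighbourhood size `k`, the iteration count, the function names and the toy data are all assumptions made for illustration.

```python
import math

def affinity(data, sigma=1.0):
    """Gaussian-kernel similarity between every pair of samples (rows)."""
    n = len(data)
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(data[i], data[j]))
                      / (2 * sigma ** 2))
             for j in range(n)] for i in range(n)]

def row_normalize(m):
    """Scale every row so that it sums to 1."""
    out = []
    for row in m:
        total = sum(row)
        out.append([v / total for v in row])
    return out

def knn_kernel(w, k):
    """Keep each sample's k strongest similarities (self included), then normalize."""
    out = []
    for row in w:
        keep = set(sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k])
        out.append([v if j in keep else 0.0 for j, v in enumerate(row)])
    return row_normalize(out)

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def fuse(views, k=2, iterations=10):
    """Fuse per-data-type networks by cross-diffusion (needs at least two views)."""
    full = [row_normalize(affinity(v)) for v in views]   # global similarity
    local = [knn_kernel(affinity(v), k) for v in views]  # sparse local similarity
    n = len(full[0])
    for _ in range(iterations):
        updated = []
        for i, s in enumerate(local):
            others = [p for j, p in enumerate(full) if j != i]
            mean_other = [[sum(p[r][c] for p in others) / len(others)
                           for c in range(n)] for r in range(n)]
            # Diffuse the other views' similarity through this view's local network.
            updated.append(row_normalize(matmul(matmul(s, mean_other), transpose(s))))
        full = updated
    fused = [[sum(p[r][c] for p in full) / len(full) for c in range(n)]
             for r in range(n)]
    # Average with the transpose so the fused network is symmetric.
    return [[(fused[r][c] + fused[c][r]) / 2 for c in range(n)] for r in range(n)]

# Toy example: four patients, two data types, two obvious patient groups.
expr = [[0.0], [0.1], [2.0], [2.1]]   # hypothetical expression values
meth = [[1.0], [1.1], [3.0], [3.2]]   # hypothetical methylation values
fused = fuse([expr, meth], k=2)
```

In the fused network, patients from the same group end up more similar to each other than to patients from the other group, and the result can be passed to any clustering method that accepts a similarity matrix.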


iGC

Identifies differentially expressed genes driven by Copy Number Alterations (CNA) from samples with both gene expression and CNA data. iGC supports multiple input formats, and users can define their own criteria for identifying differentially expressed genes driven by CNAs. In addition to microarray datasets, next-generation sequencing (NGS) data can be analyzed. By simultaneously considering both comparative genomic and transcriptomic data, iGC can provide a better understanding of biological and medical questions.
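The general idea of a CNA-driven expression change can be sketched for a single gene: split the samples by copy number status using a user-defined criterion, then compare expression between the groups. The sketch below is not iGC's interface or its statistic; the function names, the `gain_cutoff` value on log2 ratios, the use of Welch's t statistic and the toy values are assumptions.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two groups; None when both groups have zero variance."""
    spread = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if spread == 0:
        return None
    return (mean(a) - mean(b)) / spread

def cna_driven_score(expression, copy_number, gain_cutoff=0.3):
    """Compare expression in copy-number-gained samples against the rest.

    expression  : one expression value per sample for a single gene
    copy_number : log2 copy number ratio per sample for the same gene
    gain_cutoff : user-defined threshold above which a sample counts as gained
    """
    gained = [e for e, c in zip(expression, copy_number) if c >= gain_cutoff]
    neutral = [e for e, c in zip(expression, copy_number) if c < gain_cutoff]
    if len(gained) < 2 or len(neutral) < 2:
        return None  # too few samples in one group to estimate a variance
    return welch_t(gained, neutral)

# Hypothetical gene measured in six samples, three of which carry a gain.
expression = [5.1, 5.3, 4.9, 7.8, 8.1, 8.4]
copy_number = [0.0, 0.05, -0.1, 0.6, 0.8, 0.7]
score = cna_driven_score(expression, copy_number)
```

A large positive score suggests that expression rises with the gain. A real analysis would repeat this per gene, handle losses with a second cutoff, and correct for multiple testing.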

RGCCA / Regularized and Sparse Generalized Canonical Correlation Analysis

Permits the analysis of several sets of variables (blocks) observed on the same group of individuals. RGCCA is a multiblock data analysis method that extracts the information shared by the J blocks of variables, taking into account an a priori graph of connections between blocks. The main aims of this package are: (i) to study the relationships between blocks and (ii) to identify subsets of variables of each block which are active in their relationships with the other blocks.

PREDA / Position RElated Data Analysis

Detects regional variations in genomics data. PREDA implements a procedure to analyze the relationships between data and physical genomic coordinates along chromosomes, with the final aim of identifying chromosomal regions that are likely to have a relevant functional role. The software integrates high-throughput signals and structural information using a non-linear kernel regression with adaptive bandwidth. The integrative analysis is performed through a modular and flexible framework accommodating different types of functions and statistics.
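Kernel regression along genomic coordinates can be illustrated with a small Nadaraya-Watson smoother. This is a sketch under stated assumptions, not PREDA's procedure: the Gaussian kernel, the rule that sets the bandwidth to the distance of the k-th nearest gene, the function name and the toy values are all chosen for illustration.

```python
import math

def adaptive_kernel_smooth(positions, values, k=3):
    """Smooth a per-gene statistic along a chromosome.

    positions : genomic coordinate of each gene
    values    : statistic observed for each gene
    k         : the bandwidth at each gene is the distance to its k-th nearest
                gene, so it widens in gene-poor regions and narrows in dense ones
    """
    smoothed = []
    for x0 in positions:
        dists = sorted(abs(x - x0) for x in positions)  # dists[0] is the gene itself
        bandwidth = max(dists[min(k, len(dists) - 1)], 1e-9)
        weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in positions]
        smoothed.append(sum(w * v for w, v in zip(weights, values)) / sum(weights))
    return smoothed

# Hypothetical statistic for ten genes; the last five sit in an elevated region.
positions = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
values = [0.1, -0.2, 0.0, 0.2, -0.1, 1.1, 0.9, 1.2, 0.8, 1.0]
smoothed = adaptive_kernel_smooth(positions, values, k=3)
```

The smoothed profile damps gene-to-gene noise while keeping the regional shift visible. Deciding which regions are significant would need a further step, such as a permutation test.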

PINS / Perturbation clustering for data INtegration and disease Subtyping

Provides an alternative to Consensus Clustering, a technique in machine learning, with the additional ability to integrate multiple types of data. PINS can be used to integrate many high-throughput data types for disease characterization, understanding of disease mechanisms, or biomarker detection. It can also be used to integrate pharmacokinetic data and drug response data for drug development and repurposing.

RSNF / Robust Similarity Network Fusion

Improves clustering performance when integrating multiple measurements of the same samples. RSNF is a multi-view clustering algorithm based on Similarity Network Fusion (SNF) that uses robust affine graph construction. It may be a useful tool for analysing human microbiome data by integrating different measurements of microbiome samples. According to several evaluation metrics, the clustering of microbiome samples improved significantly on synthetic and several real datasets.

PLRS / Piecewise Linear Regression Splines

Provides flexible modelling of the association between DNA copy number and mRNA expression. PLRS is particularly useful for (i) a detailed understanding of the relationship between DNA copy number and mRNA expression and (ii) powerful detection of copy number-induced sample subgroup-specific effects, thereby acknowledging heterogeneity of many cancers. The package can also be used for studying the effect of DNA copy number on microRNA expression.
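A piecewise linear relationship between copy number and expression can be written as a regression on a hinge term, which lets the slope change at a knot. The sketch below fits such a model by ordinary least squares. It is not the PLRS package: the single fixed `knot`, the unconstrained fit, the function names and the noise-free toy data are assumptions.

```python
def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_piecewise_linear(x, y, knot):
    """Least-squares fit of y = b0 + b1*x + b2*max(x - knot, 0).

    b1 is the slope below the knot and b1 + b2 is the slope above it.
    """
    design = [[1.0, xi, max(xi - knot, 0.0)] for xi in x]
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(3)]
    return solve(xtx, xty)

# Hypothetical gene: expression barely changes until copy number passes 0.5,
# then rises steeply.
copy_number = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
expression = [1.0 + 0.2 * c + 1.5 * max(c - 0.5, 0.0) for c in copy_number]
b0, b1, b2 = fit_piecewise_linear(copy_number, expression, knot=0.5)
```

On this noise-free data the fit recovers the generating coefficients up to rounding. A fuller treatment would choose knots from the copy number states, constrain the slopes and test the model against simpler alternatives.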

SODEGIR / Significant Overlaps of Differentially Expressed and Genomic Imbalanced Regions

Integrates copy number (CN) data, obtained from SNP mapping arrays, with transcriptional data. SODEGIR identifies genome-wide, concurrent alterations of CN and regional gene expression (GE) in single tumor samples and extends the integrative analysis to entire cancer datasets. This is achieved in three steps: (i) the statistical estimation of CN and transcriptional scores at common gene positions from microarray probe data; (ii) the identification of sets of consecutive genes along the genome characterized by an unusually large number of concurrent CN and GE alterations within a single sample; and (iii) the aggregation of SODEGIRs from different samples to obtain global signatures of tumor types.

Consensus Clustering

Provides a method to represent the consensus across multiple runs of a clustering algorithm. Consensus Clustering is a methodology that determines the number of clusters in the data and assesses the stability of the discovered clusters. This method can be used to represent the consensus over multiple runs of a clustering algorithm with random restarts, to account for its sensitivity to the initial conditions. It also provides a visualization tool to inspect cluster number, membership, and boundaries.
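The consensus over random restarts can be summarised as a matrix whose entry (i, j) is the fraction of runs in which items i and j landed in the same cluster. The sketch below follows the random-restart description above, using a toy one-dimensional k-means as the base algorithm. The function names, the number of runs and the data are assumptions, and resampling-based variants would additionally subsample the items in each run.

```python
import random

def kmeans_1d(values, k, rng, iterations=20):
    """Plain k-means on scalars, started from k randomly chosen data points."""
    centers = rng.sample(values, k)
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:  # keep the old center if a cluster became empty
                centers[c] = sum(members) / len(members)
    return labels

def consensus_matrix(values, k, runs=50, seed=0):
    """Fraction of runs in which each pair of items shares a cluster."""
    rng = random.Random(seed)
    n = len(values)
    counts = [[0] * n for _ in range(n)]
    for _ in range(runs):
        labels = kmeans_1d(values, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[count / runs for count in row] for row in counts]

# Hypothetical measurements forming two well-separated groups.
values = [0.1, 0.2, 0.15, 5.0, 5.2, 5.1]
consensus = consensus_matrix(values, k=2)
```

Entries near 1 or 0 indicate stable assignments, while intermediate values flag pairs whose grouping depends on the starting conditions. Repeating the computation for several values of k and comparing the matrices is one way to judge the number of clusters.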

PFA / Pattern fusion analysis

Aligns local sample-patterns derived from each data type into a global sample-pattern to characterize phenotypes. PFA is a computational framework and a data-driven integration approach. First, it obtains the local sample-patterns from all data types by principal component analysis (PCA). Then, it aligns those local sample-patterns to a common feature space and synthesizes the global sample-pattern across most data types. The adaptive optimal PFA could be extended to uncover more sophisticated biological features by integrating multi-layer heterogeneous data over a time course.

ACE-it / Array CGH Expression integration tool

Helps to detect genes whose expression is affected by gene dosage within a series of samples. ACE-it is a statistical tool that allows a user-defined cut-off for contaminating samples within the groupings. This application assumes that expression increases with increased gene dosage. It was tested using array expression and array Comparative Genomic Hybridization (CGH) datasets from various institutes and platforms, including a breast tumor series.