
lumi

Provides class infrastructure and associated methods to construct an Illumina analysis workflow pipeline, from raw data through functional analysis. Besides supporting existing algorithms for microarray data, the lumi package includes several unique parts: (i) a variance-stabilizing transformation that uses the technical replicates available on the Illumina microarray; (ii) normalization algorithms designed for Illumina microarray data; and (iii) the nucleotide universal identifier (nuID) annotation packages.
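
The variance-stabilizing idea in (i) can be illustrated with a short Python sketch. It assumes, for illustration only, that the standard deviation of a probe's bead replicates grows with its mean intensity u as sd(u)^2 = s0^2 + (k*u)^2. Under that model the transform g(u) = asinh(k*u/s0)/k has derivative 1/sd(u), so transformed values have roughly unit variance at every intensity. The error model, the least-squares fit and the names `fit_variance_model` and `vst` are assumptions of this sketch, not lumi's API or its exact fitting procedure.

```python
import math


def fit_variance_model(means, sds):
    """Least-squares fit of sd^2 = s0^2 + k^2 * mean^2 (an ASSUMED error model).

    `means` and `sds` are per-probe summaries of technical (bead) replicates.
    Returns (s0, k).
    """
    xs = [m * m for m in means]
    ys = [s * s for s in sds]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx                  # estimates k^2
    intercept = y_bar - slope * x_bar  # estimates s0^2
    if slope <= 0 or intercept <= 0:
        raise ValueError("fitted variance model is not positive; asinh transform undefined")
    return math.sqrt(intercept), math.sqrt(slope)


def vst(u, s0, k):
    """Variance-stabilizing transform for the assumed model; its derivative is 1/sd(u)."""
    return math.asinh(k * u / s0) / k


# Synthetic replicate summaries generated exactly from the model with s0=20, k=0.1.
means = [50.0, 100.0, 500.0, 1000.0, 5000.0]
sds = [math.sqrt(20.0 ** 2 + (0.1 * m) ** 2) for m in means]
s0, k = fit_variance_model(means, sds)
transformed = [vst(m, s0, k) for m in means]
```

The asinh form behaves like a linear scale at low intensities and like a log scale at high intensities, which is why it is preferred to a plain log transform when background noise dominates dim probes.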

RPA / Robust Probabilistic Averaging

Analyzes the reliability of individual probes directly from gene expression data. A major advantage of the approach is that it detects unreliable probes independently of physical models or external, constantly updated information such as genomic sequence data. RPA can be useful in many applications, including evaluation of the end results of gene expression analysis and recognition of potentially unknown probe-level error sources. It can also be used to quantify measurement uncertainty and to inform probe design, and its probe-reliability estimates are used by the model to provide robust estimates of differential gene expression.
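
The core notion of probe reliability can be sketched without RPA's Bayesian machinery. The sketch below is a one-pass simplification and an assumption throughout: it removes each probe's affinity by centering the probe on its own mean, takes the across-probe median of the centered signals as the probe set's relative expression in each sample, and reports each probe's residual variance around that shared signal; a large variance marks an unreliable probe. RPA itself fits a probabilistic model with priors on the probe variances, and the name `probe_variances` is hypothetical.

```python
from statistics import median


def probe_variances(probe_matrix):
    """Return (relative expression per sample, residual variance per probe).

    probe_matrix[i][j] is the log-intensity of probe i in sample j for a single
    probe set. This is a simplified, non-Bayesian stand-in for RPA.
    """
    n_samples = len(probe_matrix[0])
    # Remove each probe's affinity by centering it on its own mean.
    centered = [[x - sum(row) / n_samples for x in row] for row in probe_matrix]
    # Robust per-sample signal: median across probes of the centered values.
    signal = [median([row[j] for row in centered]) for j in range(n_samples)]
    # Probe-specific noise: mean squared deviation from the shared signal.
    variances = [
        sum((row[j] - signal[j]) ** 2 for j in range(n_samples)) / n_samples
        for row in centered
    ]
    return signal, variances


# Three probes tracking the same true signal [0, 1, 2, 3] with different
# affinities; probe 2 carries large probe-specific noise.
probes = [
    [5.0, 6.0, 7.0, 8.0],
    [2.1, 2.9, 4.1, 4.9],
    [3.0, 0.0, 5.0, 2.0],
]
signal, variances = probe_variances(probes)
```

Inverse variances could then serve as probe weights in a weighted summary, so that unreliable probes contribute little to the final expression estimate.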

AltAnalyze

An easy-to-use application for microarray, RNA-Seq and metabolomics analysis. For splicing-sensitive platforms (RNA-Seq or Affymetrix Exon, Gene and Junction arrays), AltAnalyze assesses the expression of alternative exons (known and novel) along with their impact on protein isoforms, domain composition and microRNA targeting. Beyond splicing-sensitive platforms, AltAnalyze provides comprehensive methods for the analysis of other data (RMA summarization, batch-effect removal, QC, statistics, annotation, clustering, network creation, lineage characterization, alternative exon visualization, gene-set enrichment and more).

arrayQualityMetrics

A package that provides a report with diagnostic plots for one or two colour microarray data. After preparation of the data, a single command line is used to create the report. The quality metrics assess reproducibility, identify apparent outlier arrays and compute measures of signal-to-noise ratio. arrayQualityMetrics handles most current microarray technologies and is amenable to use in automated analysis pipelines or for automatic report generation, as well as for use by individuals. Its main benefits are its simplicity of use, the ability to have the same report for different types of platforms, and the opportunity for users or developers to extend it for their needs. arrayQualityMetrics can be used for individual data analyses or in routine data production pipelines, to provide fast uniform reporting. The diagnosis of quality remains, in principle, a context-dependent judgement, but this tool provides powerful, automated, objective and comprehensive instruments on which to base a decision.

GASSCO

A correction method whereby sequence-specific corrections are modulated by the overall bias of individual hybridizations. GASSCO outperforms earlier methods and works well on a variety of publicly available datasets covering a range of platforms, organisms and applications, including ChIP on chip. Given a reasonable number of dye-swapped pairs of hybridizations, or of same vs. same hybridizations, both the gene- and slide-biases can be estimated and corrected using the GASSCO method.

CoGAPS / Coordinated Gene Activity in Pattern Sets

An R/C++ package to identify patterns and biological process activity in transcriptomic data. CoGAPS provides an integrated package for isolating gene expression driven by a biological process, enhancing inference of biological processes from transcriptomic data. It improves on other enrichment measurement methods by combining a Markov chain Monte Carlo (MCMC) matrix factorization algorithm (GAPS) with a threshold-independent statistic inferring activity on gene sets. CoGAPS infers biological activity by identifying overlapping, coregulated sets of genes and applying Z-score based statistics. It can be used to isolate transcription factor (TF) or biological process (BP) activity in datasets of thousands of genes and tens to thousands of samples. The software is provided as open source C++ code built on top of JAGS software with an R interface.

ISVA / Independent Surrogate Variable Analysis

Identifies features correlating with a phenotype of interest in the presence of potential confounding factors. Using simulated data, we show that ISVA performs well in identifying confounders as well as outperforming methods which do not adjust for confounding. Using four large-scale Illumina Infinium DNA methylation datasets subject to low signal to noise ratios and substantial confounding by beadchip effects and variable bisulfite conversion efficiency, we show that ISVA improves the identifiability of confounders and that this enables a framework for feature selection that is more robust to model misspecification and heterogeneous phenotypes. Finally, we demonstrate similar improvements of ISVA across four mRNA expression datasets. Thus, ISVA should be useful as a feature selection tool in studies that are subject to confounding.

snm / Supervised Normalization of Microarrays

Provides a modeling strategy especially designed for normalizing high-throughput genomic data. The underlying premise of the approach is that the data are a function of what the authors refer to as study-specific variables. These variables are either biological variables that represent the target of the statistical analysis, or adjustment variables that represent factors arising from the experimental or biological setting the data are drawn from. The SNM approach aims to simultaneously model all study-specific variables in order to more accurately characterize the biological or clinical variables of interest.

arrayMagic

Facilitates the analysis of two colour cDNA microarray data. It aims to provide quality-assured and normalized data. arrayMagic bridges the gap between the image quantification software and subsequent statistical and exploratory analyses such as testing for differential expression or classification. It simplifies building reproducible processing pipelines: even for idiosyncratic experimental designs and non-trivial combinations and selections of the data, the whole procedure, from raw data to normalized, quality-controlled, annotated and summarized data, is documented in a concise script that can be re-run or extended at any time.

bapred

A package for the treatment and analysis of batch effects in high-dimensional molecular data via batch effect adjustment and addon quantile normalization. bapred implements a plot for visualizing batch effects using principal component analysis. The main functions of the package for batch effect adjustment are ba and baaddon, which perform batch effect removal and addon batch effect removal, respectively. Another important function is bametric, a wrapper function for all implemented methods for evaluating the success of batch effect removal.
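
Addon quantile normalization can be illustrated as follows: a reference distribution is estimated once from the training arrays (the mean of their sorted values), and each new array is then mapped onto that frozen reference by rank, so the training data never change when arrays are added. This is a minimal sketch under that reading of "addon"; the helpers `reference_quantiles` and `addon_quantile_normalize` are hypothetical and are not bapred's functions, and ties are broken by position rather than averaged.

```python
def reference_quantiles(training):
    """Mean of the sorted values across training arrays (all of equal length)."""
    n = len(training[0])
    if any(len(s) != n for s in training):
        raise ValueError("all training arrays must have the same number of features")
    sorted_arrays = [sorted(s) for s in training]
    return [sum(s[i] for s in sorted_arrays) / len(sorted_arrays) for i in range(n)]


def addon_quantile_normalize(sample, reference):
    """Replace each value of a new array by the reference value of the same rank."""
    if len(sample) != len(reference):
        raise ValueError("new array must have as many features as the reference")
    order = sorted(range(len(sample)), key=lambda i: sample[i])
    normalized = [0.0] * len(sample)
    for rank, idx in enumerate(order):
        normalized[idx] = reference[rank]
    return normalized


# Two training arrays fix the reference; a new array is normalized against it.
reference = reference_quantiles([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
new_array = addon_quantile_normalize([10.0, 30.0, 20.0], reference)
```

Because the reference is frozen, normalizing a later batch is a pure lookup by rank and does not require re-running the normalization on the training arrays.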

MDQC / Mahalanobis Distance Quality Control

A package that provides a multivariate approach to evaluating the quality of an array by examining the ‘Mahalanobis distance’ of its quality attributes from those of other arrays. MDQC flags problematic arrays based on the idea of outlier detection, i.e. it flags those arrays whose quality attributes jointly depart from those of the bulk of the data. Its advantages are that it has a clear statistical foundation, uses the correlation structure of the various QC measures, is easy to apply, and is computationally lightweight. These properties make MDQC a useful diagnostic technique suitable for large datasets. MDQC performs a robust multivariate analysis of the quality measures provided in the QC report while taking into account their correlation structure.
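
The distance MDQC relies on can be sketched in a few lines. Each array is a vector x of QC measures, and its squared Mahalanobis distance d^2 = (x - mu)^T S^-1 (x - mu) is computed against the mean mu and covariance S of the other arrays. MDQC itself uses robust estimators of location and scatter; this sketch swaps in classical leave-one-out estimates, which likewise keep a single bad array from masking itself. The cutoff -2*ln(0.01), about 9.21, is the 99% quantile of a chi-square distribution with two degrees of freedom, an approximate reference that applies only to two QC measures. Function names and data are illustrative.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def mahalanobis_sq(x, others):
    """Squared Mahalanobis distance of x from the mean/covariance of `others`."""
    p, n = len(x), len(others)
    mean = [sum(o[k] for o in others) / n for k in range(p)]
    cov = [[sum((o[a] - mean[a]) * (o[b] - mean[b]) for o in others) / (n - 1)
            for b in range(p)] for a in range(p)]
    diff = [x[k] - mean[k] for k in range(p)]
    return sum(d * s for d, s in zip(diff, solve(cov, diff)))


def flag_arrays(qc, threshold):
    """Leave-one-out distances for every array and the indices above threshold."""
    dists = [mahalanobis_sq(row, qc[:i] + qc[i + 1:]) for i, row in enumerate(qc)]
    return dists, [i for i, d in enumerate(dists) if d > threshold]


# Two QC measures per array; the last array departs from the bulk.
qc = [(1.0, 2.0), (1.2, 2.1), (0.9, 1.8), (1.1, 2.3), (1.05, 1.95), (3.0, 0.5)]
threshold = -2 * math.log(0.01)  # chi-square (2 df) 99% quantile, about 9.21
dists, flagged = flag_arrays(qc, threshold)
```

In this toy data the last array has by far the largest distance. With classical estimates computed on all arrays, an outlier inflates the covariance it is measured against, which is the masking effect that robust or leave-one-out estimates avoid.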

Harshlight

A package for the automatic detection and masking of blemishes in HDONA microarray chips. Harshlight’s algorithm combines image analysis techniques with statistical approaches to recognize three types of defects frequent in Affymetrix microarray chips: extended, compact, and diffuse defects. It provides a way to safely identify blemishes of different kinds and correct the intensity values of the batch of chips provided by the user. The corrections made by Harshlight improve the reliability of the expression values when the chips are further analyzed with other programs, such as GCRMA and MAS5.

fRMA / Frozen Robust Multiarray Analysis

A package based on a single-array preprocessing algorithm that retains the advantages of multiarray algorithms and removes certain batch effects by downweighting probes that have high between-batch residual variance. By using a large, biologically diverse database of microarrays from many different laboratories spanning several years, the fRMA algorithm is able to differentiate between outliers and probes that show a consistent susceptibility to batch effects. These batchy probes are downweighted during summarization to minimize their effect on expression estimates. The frmaTools package, which allows users to create their own frozen parameter vectors, has also been updated to work with oligo GeneFeatureSet and ExonFeatureSet objects. This allows users to create custom vectors for the HuEx and HuGene platforms and to implement fRMA on other Affymetrix Exon and Gene ST platforms.
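
Single-array summarization with frozen parameters can be sketched as a weighted average. The sketch assumes that, for one probe set, a probe effect and two residual variances per probe (within-batch and between-batch) have already been estimated from a reference database and stored; each probe's weight is the inverse of its total residual variance, so batchy probes contribute little. This is an illustrative inverse-variance weighting, not necessarily fRMA's exact summarization; the frozen-reference normalization that precedes it is omitted, and `frozen_summary` and its parameters are hypothetical.

```python
def frozen_summary(log_intensities, probe_effects, within_var, between_var):
    """Expression estimate for one array and one probe set using frozen parameters.

    Each probe contributes its log-intensity minus its frozen probe effect,
    weighted by 1 / (within-batch variance + between-batch variance).
    """
    weights = [1.0 / (w + b) for w, b in zip(within_var, between_var)]
    residuals = [y - e for y, e in zip(log_intensities, probe_effects)]
    return sum(w * r for w, r in zip(weights, residuals)) / sum(weights)


# Hypothetical frozen parameters: probe 3 is "batchy" (large between-batch variance).
estimate = frozen_summary(
    log_intensities=[6.0, 7.0, 8.0, 12.0],
    probe_effects=[1.0, 2.0, 3.0, 4.0],
    within_var=[0.1, 0.1, 0.1, 0.1],
    between_var=[0.0, 0.0, 0.0, 9.9],
)
```

Here the three well-behaved probes agree on 5.0 while the batchy probe suggests 8.0; an unweighted mean would give 5.75, whereas the weighted estimate stays close to 5.0. Because only stored parameters are needed, each array can be processed on its own.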

ExpressYourself

A web-based program for processing microarray data. In a completely automated fashion, ExpressYourself corrects the background array signal, normalizes the Cy5 and Cy3 signals, scores levels of differential hybridization, combines the results of replicate experiments, filters problematic regions of the array and assesses the quality of individual and replicate experiments. ExpressYourself has a highly modular architecture, so various types of microarray analysis algorithms can readily be incorporated as they are developed; for example, the system currently implements several normalization methods, including those that simultaneously consider signal intensity and slide location. The processed data are presented in a web-based graphical interface to facilitate comparison with the original images of the array slides. In particular, ExpressYourself can regenerate images of the original microarray after applying various processing steps, which greatly facilitates identification of position-specific artifacts.

gprege

A package for ranking differentially expressed gene expression time courses through Gaussian process (GP) regression. gprege fits two GPs with an RBF (+ noise diagonal) kernel on each profile. One GP kernel is initialised with a short lengthscale hyperparameter, the observed variance as the signal variance, and a zero noise variance; it is optimised via scaled conjugate gradients (netlab). A second GP has fixed hyperparameters: zero inverse-width, zero signal variance and the observed variance as the noise variance. The log-ratio of the marginal likelihoods of the two hypotheses acts as a score of differential expression for the profile. Comparison via ROC curves is performed against BATS.
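
The scoring step can be sketched as follows. For each hypothesis it computes the zero-mean GP log marginal likelihood log p(y) = -1/2 y^T K^-1 y - 1/2 log|K| - (n/2) log(2*pi) via a Cholesky factorization and returns the difference between the two. Unlike gprege, this sketch does not optimise the first GP's hyperparameters with scaled conjugate gradients: it fixes the lengthscale and uses a small nonzero noise variance (1% of the observed variance, an assumption made for numerical stability). Mean-centering the profile is also an assumption, and all function and parameter names are illustrative.

```python
import math


def rbf_noise_kernel(times, signal_var, lengthscale, noise_var):
    """Covariance matrix of an RBF kernel plus a diagonal noise term."""
    return [[signal_var * math.exp(-((a - b) ** 2) / (2.0 * lengthscale ** 2))
             + (noise_var if i == j else 0.0)
             for j, b in enumerate(times)]
            for i, a in enumerate(times)]


def cholesky(k):
    """Lower-triangular L with L L^T = k (k must be positive definite)."""
    n = len(k)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = k[i][j] - sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def log_marginal_likelihood(y, k):
    """log N(y | 0, k) for a zero-mean Gaussian process."""
    low = cholesky(k)
    n = len(y)
    z = [0.0] * n  # forward substitution: L z = y, so y^T K^-1 y = z^T z
    for i in range(n):
        z[i] = (y[i] - sum(low[i][m] * z[m] for m in range(i))) / low[i][i]
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * sum(v * v for v in z) - 0.5 * log_det - 0.5 * n * math.log(2.0 * math.pi)


def de_score(times, profile, lengthscale=3.0, noise_fraction=0.01):
    """Log-ratio of marginal likelihoods: smooth GP (H1) versus pure noise (H0)."""
    mean = sum(profile) / len(profile)
    y = [v - mean for v in profile]  # assumption: profiles are mean-centred
    var = sum(v * v for v in y) / len(y)
    h1 = rbf_noise_kernel(times, var, lengthscale, noise_fraction * var)
    h0 = rbf_noise_kernel(times, 0.0, 1.0, var)  # zero signal variance: noise only
    return log_marginal_likelihood(y, h1) - log_marginal_likelihood(y, h0)


times = [float(t) for t in range(10)]
smooth = [2.0 * math.sin(t / 3.0) for t in times]        # slowly varying trend
noisy = [0.5 if i % 2 == 0 else -0.5 for i in range(10)]  # alternating jitter
smooth_score = de_score(times, smooth)
noisy_score = de_score(times, noisy)
```

A smooth temporal trend is well explained by the RBF kernel and scores higher, while rapid point-to-point jitter is better explained by the noise-only hypothesis and receives a negative score. A constant profile has zero observed variance and would make both covariance matrices singular, so it should be filtered out beforehand.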

XPN / cross-Platform Normalization

Allows two gene expression studies to be merged. XPN is a block model-based method with three parameters: the number of row and column clusters (K and L) and the number of basic iterations B. In principle, the XPN procedure can be used with any clustering method that produces a pre-specified number of clusters from a given set of vectors, or with resampling-based improvements of such methods. The software was applied to three datasets without incurring substantial overfitting.

virtualArray

Combines raw data of different microarray platforms into one virtual array. virtualArray consists of several functions that are applied one after another in a semi-automatic way, performing as much of the data combination as possible and letting the user concentrate on analysing the resulting virtual array. Using this package, researchers can easily integrate their own microarray data with data from public repositories or other sources that are based on different microarray chip types.

RUVnormalize

Removes unwanted variation from gene expression data. RUVnormalize combines methods to estimate and remove unwanted variation from gene expression data when the factor of interest is also unobserved. The methods are as follows: (1) the first uses a negative control gene-based estimator of the unwanted factors and estimates the effect of these factors on gene expression under a random effect model; (2) the second relies on replicate samples and estimates the unwanted variation from the variation observed in differences between replicates.

CALIB

A package for normalization of two-color microarray data. CALIB is based on the measurements of external controls and estimates an absolute target level for each gene and condition pair, as opposed to working with log-ratios as a relative measure of expression. Moreover, it makes no assumptions regarding the distribution of gene expression divergence. The underlying method relies on the presence of external control spikes to estimate the parameters of a calibration model, which are then used to obtain absolute expression levels for all genes. It provides an alternative to standard ratio-based normalization that is particularly applicable either where the GNA is violated and no alternative solutions exist, or where absolute expression levels are more convenient than ratios. Besides the normalization procedure, CALIB provides some convenient visualization tools for quality control of the experimental protocol based on externally added control spikes.

AffyRNADegradation

A package that assesses RNA quality of Affymetrix expression data. The AffyRNADegradation package extends the Bioconductor package affy and integrates well into a typical microarray analysis workflow. All calculations are performed directly on the AffyBatch object and carried out separately for each microarray hybridization in a single-chip approach. The method corrects the 3′/5′-bias at the level of raw probe intensities, which can afterwards be processed with any method. The runtime is about 2 min and 3 min per sample for index- and distance-based corrections, respectively. Because each chip is processed independently, arbitrarily large data sets can be processed.

EXPANDER / EXpression Analyzer and DisplayER

An integrated software platform for the analysis of microarray gene expression data. EXPANDER is designed to support all the stages of microarray data analysis, from raw data normalization to inference of transcriptional regulatory networks. The analysis starts with importing the data into EXPANDER, followed by normalization and filtering. Then, clustering and network-based analyses are performed. The gene groups identified are tested for enrichment in function, co-regulation (using transcription factor and microRNA target predictions) or co-location.

DWD / Distance Weighted Discrimination

Provides the implementation of distance weighted discrimination (DWD) using an interior point method for the solution of second order cone programming problems. DWD is related to, and has been shown to be superior to, the support vector machine in situations that are fundamental to bioinformatics, such as very high dimensional data. DWD has proven to be very useful for several fundamental bioinformatics tasks, including classification, data visualization and removal of biases, such as batch effects.

svapls

Implements Partial Least Squares regression to extract the hidden signals of sample-specific heterogeneity in the data and uses them to find the genes that are actually correlated with the phenotype of interest. svapls can be used to identify several types of unknown sample-specific sources of heterogeneity in a gene expression study and to adjust for them, providing a more accurate inference on the original expression pattern of the genes across different varieties of samples.

VBMP / Variational Bayesian Multinomial Probit Regression

Features multinomial probit regression with Gaussian Process priors and estimates class posterior probabilities employing fast variational approximations to the full posterior. VBMP is an R package for Gaussian Process classification of data over multiple classes. It incorporates feature weighting by means of Automatic Relevance Determination. The vbmp package implements a VB approach to classification of multi-class datasets. This non-parametric approach is developed within a probabilistic framework for Bayesian inference, which yields efficient sparse approximations by optimizing a strict lower bound of the marginal likelihood of a multinomial probit regression model.

RefPlus

Contains functions for pre-processing Affymetrix data using the RMA+ (the Extrapolation Strategy) and the RMA++ (Extrapolation Averaging) methods. RefPlus is implemented in the R language. RMA+ is an extension of the RMA algorithm that calculates the probe set intensities of a microarray using a pre-stored RMA model fitted on previously obtained microarrays, e.g. reference microarrays. RMA++ is a further extension based on the RMA+ method. This package depends on the affyPLM package.

Oligo package

Implements a unified framework for preprocessing microarray data and interfaces with other Bioconductor tools for downstream analysis. The Oligo package provides array coordinates, feature types, sequences, feature names and other relevant information for preprocessing. Developers can use oligo solutions to facilitate the integration of their tools with Bioconductor. They also benefit from the unified model that the package makes available, as the consistency in data delivery and handling improves efficiency.

Ariadne

Offers reliable and automated analysis of large-scale SRM differential expression studies. To quantify monitored targets, Ariadne exploits metadata imported from the transition lists, and targets can be filtered according to mProphet output. Signal processing and statistical learning approaches are combined to compute peptide quantifications. To robustly estimate absolute abundances, the external calibration curve method is applied, ensuring linearity over the measured dynamic range.
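
External calibration can be illustrated with a straight-line fit. The sketch assumes a linear response between the measured signal of calibration standards and their known amounts, fits it by ordinary least squares, and inverts the line to estimate the amount in a sample; signals outside the calibrated range are rejected rather than extrapolated, in keeping with the linearity requirement above. The linear model, the names `fit_calibration` and `quantify`, and the numbers are assumptions, not Ariadne's implementation.

```python
def fit_calibration(amounts, signals):
    """Ordinary least-squares line: signal = slope * amount + intercept."""
    n = len(amounts)
    x_bar = sum(amounts) / n
    y_bar = sum(signals) / n
    sxx = sum((x - x_bar) ** 2 for x in amounts)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(amounts, signals))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def quantify(signal, slope, intercept, signal_range):
    """Invert the calibration line; refuse to extrapolate beyond the standards."""
    low, high = signal_range
    if not low <= signal <= high:
        raise ValueError("signal outside the calibrated range")
    return (signal - intercept) / slope


standards = [1.0, 2.0, 4.0, 8.0]    # known amounts (illustrative units)
responses = [3.5, 6.5, 12.5, 24.5]  # measured signals of the standards
slope, intercept = fit_calibration(standards, responses)
amount = quantify(15.5, slope, intercept, (min(responses), max(responses)))
```

Refusing to extrapolate is a deliberate choice: outside the range spanned by the standards the linear response has not been verified, so an estimate there could be badly biased.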

TurboNorm

A package based on an intensity-dependent normalization method for microarrays that is fast, simple and can include weighting of observations. TurboNorm is based on a P-spline scatterplot smoother that uses all data points for normalization and can exploit invariant features without requiring them to cover the complete intensity range. The method compensates for unequal coverage by using weights for all features, with invariant features given a higher weight. Besides a loess-like function for scatterplot smoothing, TurboNorm contains wrapper functions for both single- and two-colour microarray data normalization. It also contains a function for adding the fitted curves to plots produced by the lattice package. The method has properties comparable to lowess and loess, but its lower computational complexity makes it faster and more memory efficient.
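
Intensity-dependent normalization of two-colour data can be sketched as a weighted fit of M = log2(R/G) against A = (log2 R + log2 G)/2, followed by subtracting the fitted trend from M. To keep the example short it swaps TurboNorm's P-spline smoother for a weighted quadratic polynomial, which is an assumption; giving invariant features a larger weight (here via a hypothetical `invariant_weight` parameter) follows the description above. All function names are illustrative, and intensities must be positive.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def weighted_quadratic_fit(x, y, w):
    """Weighted least squares for y ~ b0 + b1*x + b2*x^2 via Cramer's rule."""
    s = [sum(wi * xi ** p for xi, wi in zip(x, w)) for p in range(5)]
    t = [sum(wi * yi * xi ** p for xi, yi, wi in zip(x, y, w)) for p in range(3)]
    a = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("not enough distinct intensities for a quadratic fit")
    coeffs = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = t[r]  # replace one column of the normal equations
        coeffs.append(det3(m) / d)
    return coeffs


def normalize_two_colour(red, green, invariant, invariant_weight=10.0):
    """Return trend-corrected log-ratios M for paired red/green intensities."""
    m_vals = [math.log2(r / g) for r, g in zip(red, green)]
    a_vals = [0.5 * (math.log2(r) + math.log2(g)) for r, g in zip(red, green)]
    a_mean = sum(a_vals) / len(a_vals)
    centred = [a - a_mean for a in a_vals]  # centring A improves conditioning
    weights = [invariant_weight if inv else 1.0 for inv in invariant]
    b0, b1, b2 = weighted_quadratic_fit(centred, m_vals, weights)
    return [m - (b0 + b1 * a + b2 * a * a) for m, a in zip(m_vals, centred)]


# A constant dye bias (red twice as bright as green) should be removed entirely.
green = [100.0, 200.0, 400.0, 800.0, 1600.0, 3200.0]
red = [2.0 * g for g in green]
corrected = normalize_two_colour(red, green, [True, False, True, False, True, False])
```

Weighting invariant features more heavily pulls the fitted curve toward features assumed not to be differentially expressed, while the remaining features still keep the curve defined where invariant features are sparse.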

AMEN / Annotation Mapping Expression and Network

Enables biological and medical researchers with basic bioinformatics training to manage and explore genome annotation, chromosomal mapping, protein-protein interaction (PPI), expression profiling and proteomics data. AMEN provides modules for (i) uploading and pre-processing data from microarray expression profiling experiments, (ii) detecting groups of significantly co-expressed genes, and (iii) searching for enrichment of functional annotations within those groups. AMEN facilitates the design and execution of optimized procedures for processing, analysis and interpretation of multifaceted high-throughput data.

maskBAD / masking BAD microarray probes

Detects and removes probes with different binding affinity in Affymetrix array expression data. The method implemented in maskBAD performs better than other methods in detecting binding affinity different (BAD) probes. Identification and removal of BAD probes removes spurious gene expression differences and helps to reveal real ones. In clustering analysis of gene expression, identification of BAD probes guides interpretation of discriminating probe sets.

RMAExpress

Offers a platform for producing Robust Multiarray Average (RMA) expression values from Affymetrix files. RMAExpress includes features for reading raw files, computing RMA expression values and performing quality checks. The application also includes two additional modules: (i) RMADataConv, which converts CDF and CEL files; and (ii) RMAExpressConsole, which provides a command-line interface for extracting RMA expression values from a set of CEL files. The software is part of the Affy package.
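
The summarization step behind RMA expression values is median polish, which fits log2 intensity = overall + probe effect + array effect + residual on each probe set by alternately sweeping out row and column medians. A minimal Tukey-style sketch follows, written here for illustration; the background correction and quantile normalization that precede it in RMA are omitted, and `median_polish` is not RMAExpress code. The expression value of array j is overall + array_effects[j].

```python
from statistics import median


def median_polish(matrix, max_iter=10, tol=1e-6):
    """Tukey median polish: matrix ~ overall + row_effect[i] + col_effect[j] + residual.

    Rows are probes and columns are arrays (log2 scale) for one probe set.
    """
    resid = [list(row) for row in matrix]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_iter):
        # Sweep row medians into the row (probe) effects.
        row_med = [median(row) for row in resid]
        resid = [[v - row_med[i] for v in row] for i, row in enumerate(resid)]
        row_eff = [r + m for r, m in zip(row_eff, row_med)]
        shift = median(col_eff)
        col_eff = [c - shift for c in col_eff]
        overall += shift
        # Sweep column medians into the column (array) effects.
        col_med = [median([resid[i][j] for i in range(n_rows)]) for j in range(n_cols)]
        resid = [[v - col_med[j] for j, v in enumerate(row)] for row in resid]
        col_eff = [c + m for c, m in zip(col_eff, col_med)]
        shift = median(row_eff)
        row_eff = [r - shift for r in row_eff]
        overall += shift
        if max(abs(m) for m in row_med + col_med) < tol:
            break
    return overall, row_eff, col_eff, resid


# Purely additive toy data: probe effects (0, 1, 3) plus array effects (5, 6, 8).
matrix = [[p + a for a in (5.0, 6.0, 8.0)] for p in (0.0, 1.0, 3.0)]
overall, probe_effects, array_effects, residuals = median_polish(matrix)
expression = [overall + c for c in array_effects]
```

Using medians rather than means makes the fit resistant to a few outlying probes, which is why median polish is the summarization step of RMA.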