Batch effect correction software tools | RNA sequencing data analysis
It is now known that unwanted noise and unmodeled artifacts such as batch effects can dramatically reduce the accuracy of statistical inference in genomic experiments. These sources of noise must be modeled and removed to accurately measure biological variability and to obtain correct statistical inference when performing high-throughput genomic analysis.
Allows differential expression analysis of digital gene expression data. edgeR implements a range of statistical methodology based on the negative binomial distribution, including empirical Bayes estimation, exact tests, generalized linear models and quasi-likelihood tests. The package and methods are general, and can work on other sources of count data, such as barcoding experiments and peptide counts.
Investigates data from gene expression experiments. limma contains features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. It can perform both differential expression and differential splicing analyses of RNA-seq data. This tool is useful for studying expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures.
Removes batch effects and other unwanted variation in high-throughput experiments. SVA is a package containing several functions for identifying and building surrogate variables for large data sets. Artifacts can be removed in three ways: (i) identification and estimation of surrogate variables, (ii) direct removal of known batch effects with ComBat and (iii) removal of batch effects with known control probes.
An easy-to-use application for microarray, RNA-Seq and metabolomics analysis. For splicing-sensitive platforms (RNA-Seq or Affymetrix Exon, Gene and Junction arrays), AltAnalyze will assess alternative exon (known and novel) expression along protein isoforms, domain composition and microRNA targeting. In addition to splicing-sensitive platforms, AltAnalyze provides comprehensive methods for the analysis of other data (RMA summarization, batch-effect removal, QC, statistics, annotation, clustering, network creation, lineage characterization, alternative exon visualization, gene-set enrichment and more).
Uses surrogate variable analysis (sva) for estimating unwanted noise and unmodeled artifacts by (i) identifying the part of the genomic data only affected by artifacts and (ii) estimating the artifacts with principal components or singular vectors of the subset of the data matrix. svaseq contains functions for removing batch effects and other unwanted variation in high-throughput experiments. It was specifically created for count data or Fragments Per Kilobase Of Exon Per Million Fragments Mapped (FPKM) values from sequencing experiments, and relies on an appropriate data transformation.
Identifies differentially expressed genes from count data or previously normalized count data. NOISeq empirically models the noise distribution of count changes by contrasting fold-change differences (M) and absolute expression differences (D) for all the features in samples within the same condition. This reference distribution is then used to assess whether the M-D values computed between two conditions for a given gene are likely to be part of the noise or represent a true differential expression.
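The M-D idea described for NOISeq can be illustrated with a minimal Python sketch. It is not the NOISeq implementation: the function names, the pseudocount of 0.5 and the toy counts are assumptions made for illustration, and the probability is simply the fraction of within-condition noise points whose |M| and D are both smaller than the values observed between conditions.

```python
import math
from itertools import combinations

def md_values(a, b, pseudo=0.5):
    """Per-feature (M, D): log2 fold change and absolute difference.
    The pseudocount is an illustrative assumption to avoid log(0)."""
    return [(math.log2((x + pseudo) / (y + pseudo)), abs(x - y))
            for x, y in zip(a, b)]

def noise_distribution(replicates):
    """Pool (|M|, D) over all pairs of samples within one condition."""
    noise = []
    for s1, s2 in combinations(replicates, 2):
        noise.extend((abs(m), d) for m, d in md_values(s1, s2))
    return noise

def mean_profile(replicates):
    """Average expression of each feature across replicates."""
    return [sum(vals) / len(vals) for vals in zip(*replicates)]

def de_probabilities(cond_a, cond_b):
    """Fraction of noise points below the observed (|M|, D) of each feature."""
    noise = noise_distribution(cond_a) + noise_distribution(cond_b)
    observed = md_values(mean_profile(cond_a), mean_profile(cond_b))
    return [sum(1 for nm, nd in noise if nm < abs(m) and nd < d) / len(noise)
            for m, d in observed]

# Toy counts (hypothetical): rows are replicates, columns are features.
cond_a = [[10, 100, 50], [12, 95, 55]]
cond_b = [[11, 400, 52], [9, 420, 49]]
probs = de_probabilities(cond_a, cond_b)
```

In this toy example only the second feature changes strongly between conditions, so its (M, D) pair lies outside the whole noise cloud while the other two features stay inside it.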
Corrects batch effects (from multiple confounding variables) and library depth. ImpulseDE2 is a differential expression algorithm for longitudinal count data sets which arise in sequencing experiments such as RNA-seq, ChIP-seq, ATAC-seq and DNaseI-seq. This method is based on a negative binomial noise model with dispersion trend smoothing by DESeq2 and uses the impulse model to constrain the mean expression trajectory of each gene.
Provides guided principal component analysis (gPCA) for the detection of batch effects in high-throughput data. gPCA provides a test statistic that is particularly useful for testing whether batch effects exist after applying a global normalization procedure such as quantile or loess normalization. It can also be applied to other problems and data types, including B-allele frequency data and expression data.
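A rough Python sketch of a guided-PCA style statistic follows. It assumes the statistic is the ratio between the variance of the data projected on the first batch-guided direction (first right singular vector of Y'X, where Y is the batch indicator matrix) and the variance along the first ordinary principal component, with a permutation test on the batch labels. The function names, the power-iteration solver and the toy data are illustrative assumptions, not the gPCA package API.

```python
import random

def center_columns(X):
    """Subtract the column mean from every feature."""
    means = [sum(col) / len(X) for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]

def first_right_singular_vector(A, iters=100):
    """Power iteration on A^T A, starting from a fixed non-uniform vector."""
    p = len(A[0])
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(iters):
        Av = [sum(a * b for a, b in zip(row, v)) for row in A]
        w = [sum(A[i][j] * Av[i] for i in range(len(A))) for j in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # degenerate matrix: keep the current direction
            break
        v = [x / norm for x in w]
    return v

def variance_along(X, v):
    """Sample variance of the rows of X projected on direction v."""
    proj = [sum(x * c for x, c in zip(row, v)) for row in X]
    mean = sum(proj) / len(proj)
    return sum((q - mean) ** 2 for q in proj) / (len(proj) - 1)

def guided_delta(X, batches):
    """Variance along the batch-guided direction over variance along PC1."""
    Xc = center_columns(X)
    # Y^T X: one row per batch, the sum of that batch's centered samples.
    YtX = [[sum(Xc[i][j] for i in range(len(Xc)) if batches[i] == b)
            for j in range(len(Xc[0]))] for b in sorted(set(batches))]
    guided = first_right_singular_vector(YtX)
    unguided = first_right_singular_vector(Xc)
    return variance_along(Xc, guided) / variance_along(Xc, unguided)

def permutation_test(X, batches, n_perm=100, seed=0):
    """Share of label permutations with a statistic at least as large."""
    rng = random.Random(seed)
    observed = guided_delta(X, batches)
    shuffled = list(batches)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if guided_delta(X, shuffled) >= observed:
            count += 1
    return observed, count / n_perm

# Toy data (hypothetical): the first feature is shifted in the second batch.
X = [[1.0, 2.0, 0.5], [1.2, 1.8, 0.4], [0.9, 2.1, 0.6],
     [11.0, 2.0, 0.5], [11.1, 1.9, 0.7], [10.8, 2.2, 0.3]]
batches = [0, 0, 0, 1, 1, 1]
delta, p_value = permutation_test(X, batches)
```

A value of the statistic near 1 means the batch-guided direction captures almost as much variance as the leading principal component. With so few samples the permutation p-value is coarse, so the sketch is meant to show the mechanics rather than a usable test.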
Offers a model to correct distance matrices and to improve the performance of clustering algorithms in real data analysis. QuantNorm provides a bridge between raw data and clustering and other pattern detection techniques. This method modifies the distance matrix obtained from data with batch effects. It can also be combined with other clustering approaches.
Detects hidden batch effects in large consortium datasets. DASC estimates the “batch-free” data using data-adaptive shrinkage and semi-Nonnegative Matrix Factorization (semi-NMF). Applying the data-adaptive shrinkage method together with a consensus matrix improves the stability of the output. This software can be applied to identify batch effects in other types of genomic data where quantitative values are measured.
Enables batch effect identification. BatchI assigns samples to batches without prior knowledge, under the assumption that the analyzed data are sorted on a time scale. The basic functionality of this package relies on the batchI function, which uses a dynamic programming algorithm to find the optimal partition of samples based on a quality index summarizing each sample. The software was tested on several microarray gene expression, RNA-seq deep sequencing and proteomics mass spectrometry experiments.
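The partitioning step can be sketched as a dynamic program over time-ordered samples. The sketch below is not the BatchI code: it assumes one quality index per sample, a fixed number of batches k, and a within-batch sum of squared deviations as the cost to minimize. The function names and the toy values are hypothetical.

```python
def segment_cost(prefix, prefix_sq, i, j):
    """Sum of squared deviations from the mean for values[i:j]."""
    s = prefix[j] - prefix[i]
    sq = prefix_sq[j] - prefix_sq[i]
    return sq - s * s / (j - i)

def partition_into_batches(values, k):
    """Split ordered values into k contiguous segments of minimal total cost.
    Returns a list of (start, end) index pairs, end exclusive."""
    n = len(values)
    prefix, prefix_sq = [0.0], [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    inf = float("inf")
    # best[m][j]: minimal cost of splitting the first j values into m segments
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                cost = best[m - 1][i] + segment_cost(prefix, prefix_sq, i, j)
                if cost < best[m][j]:
                    best[m][j] = cost
                    cut[m][j] = i

    # Walk the stored cut points back from the end to recover the segments.
    segments, j = [], n
    for m in range(k, 0, -1):
        i = cut[m][j]
        segments.append((i, j))
        j = i
    return segments[::-1]

# Toy quality indices (hypothetical) with two visible jumps over time.
quality = [1.0, 1.1, 0.9, 5.0, 5.2, 4.9, 9.0, 9.1]
segments = partition_into_batches(quality, 3)
```

Because samples are assumed to be ordered in time, only contiguous partitions are considered, which is what makes an exact dynamic-programming solution feasible.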
Evaluates or diagnoses batch effect(s) in genomic data at the level of individual principal components (PCs). exploBATCH is a batch evaluation and correction approach based on probabilistic principal component and covariates analysis (PPCCA). The software includes two methods: (i) findBATCH, which evaluates and detects the presence of significant batch effects, and (ii) correctBATCH, for batch correction. The software was evaluated using examples from breast and colorectal cancer and normal sample gene expression profiles.
Adjusts batch effects in microarray expression data using empirical Bayes methods. The modified ComBat (M-ComBat) is designed specifically for meta-analysis and batch effect adjustment with predictive models that are validated and fixed on historical data from a ‘gold-standard’ batch.
Assists users in extracting information from microarray data. BART offers a method for analyzing a variety of microarray experiments, and contains six modules. These modules enable researchers to process raw microarray data, from the Gene Expression Omnibus (GEO) or from local files, into a list of differentially expressed genes and associated pathways, permitting everyone to interpret microarray data in terms of underlying biological processes.
Allows users to perform alignment, quality control (QC), differential gene expression and pathway analyses. VIPER uses a computational workflow management system named Snakemake to combine many tools currently employed in RNA-seq analysis. Moreover, it includes a variety of optional steps for variant analysis, fusion gene detection, viral DNA detection and evaluation of potential immune cell infiltrates.
Investigates RNA-seq data with potential batch effects. cbcbSEQ provides quantile normalization and calculation of voom weights. It is also useful for log-transformation of counts and ComBat location batch correction. This tool evaluates and removes batch effects.
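Two of the operations named above, quantile normalization and a location-only batch adjustment, can be illustrated with a short Python sketch. It does not reproduce cbcbSEQ or ComBat: the function names are hypothetical, ties are broken by gene order rather than averaged, and the batch adjustment simply re-centers each batch on the gene's overall mean without any empirical Bayes shrinkage.

```python
def quantile_normalize(matrix):
    """Give every sample (column) the same value distribution.
    matrix[g][s] holds the expression of gene g in sample s."""
    n_genes, n_samples = len(matrix), len(matrix[0])
    columns = [[row[s] for row in matrix] for s in range(n_samples)]
    # For each sample, gene indices ordered from lowest to highest value.
    orders = [sorted(range(n_genes), key=lambda g, col=col: col[g])
              for col in columns]
    # Mean across samples of the r-th smallest value.
    rank_means = [sum(columns[s][orders[s][r]] for s in range(n_samples))
                  / n_samples for r in range(n_genes)]
    result = [[0.0] * n_samples for _ in range(n_genes)]
    for s in range(n_samples):
        for rank, gene in enumerate(orders[s]):
            result[gene][s] = rank_means[rank]
    return result

def center_batches(values, batches):
    """Location-only adjustment for one gene: shift each batch so that its
    mean equals the overall mean of the gene."""
    grand_mean = sum(values) / len(values)
    batch_means = {}
    for b in set(batches):
        members = [v for v, lab in zip(values, batches) if lab == b]
        batch_means[b] = sum(members) / len(members)
    return [v - batch_means[b] + grand_mean for v, b in zip(values, batches)]

# Toy values (hypothetical): three genes by two samples, then one gene
# measured in four samples from two batches.
normalized = quantile_normalize([[2.0, 4.0], [4.0, 1.0], [6.0, 7.0]])
adjusted = center_batches([1.0, 3.0, 5.0, 7.0], [0, 0, 1, 1])
```

In practice the log-transformation would normally come before either step, and a full ComBat-style correction would also shrink the batch estimates across genes.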
Allows users to perform empirical Bayesian linear modelling. Fitnoise is intended for differential expression testing of measurements associated with genes or genomic features. The software is available both as a standalone application and as part of the nesoni software, and provides four regular and two experimental noise models. The package can analyze PAT-Seq data to determine differential poly(A) tail length.