Enables the study of spatial patterning of gene expression at the single-cell level. Seurat is an R package that enables quality control (QC), analysis, and exploration of single-cell RNA-seq data. The software includes three computational methods: (1) unsupervised clustering and discovery of cell types and states, (2) spatial reconstruction of single-cell data, and (3) integrated analysis of single-cell RNA-seq data across conditions, technologies, and species. It can also localize rare subpopulations and map both spatially restricted and scattered groups.
Facilitates the analysis of cellular heterogeneity, the identification of cell types, and comparison of functional markers in response to perturbations, based on a versatile method. SPADE helps to organize high-dimensional cytometry data in an unsupervised manner, and to investigate natural and pathogenic cellular heterogeneity for biological insight. The SPADE algorithm consists of four components: (i) density-dependent downsampling, (ii) clustering, (iii) linking clusters with a minimum spanning tree, and (iv) upsampling to restore all cells in the final result. This modularized process allows more efficient sub-algorithms to replace the current components. In this sense, SPADE can be viewed as a framework for cytometric data analysis and visualization that has the capacity to be evolved and adapted.
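As a rough illustration of the first SPADE component, density-dependent downsampling, the sketch below keeps every cell in sparse regions and thins dense regions towards a target density. It is a simplified stand-in rather than SPADE's actual procedure: the function names, the fixed `radius` neighbor count and the `target_density` parameter are all assumptions made for this example.

```python
import random


def local_density(points, radius):
    """Count, for each point, how many points lie within `radius` (itself included)."""
    r2 = radius ** 2

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [sum(1 for q in points if dist2(p, q) <= r2) for p in points]


def density_dependent_downsample(points, radius, target_density, seed=0):
    """Keep each point with probability min(1, target_density / local density).

    Hypothetical simplification: rare (low-density) cells are always kept,
    while abundant populations are thinned to roughly `target_density`.
    """
    rng = random.Random(seed)
    densities = local_density(points, radius)
    kept = []
    for point, density in zip(points, densities):
        if rng.random() < min(1.0, target_density / density):
            kept.append(point)
    return kept


if __name__ == "__main__":
    cells = [(0.0, 0.0)] * 50 + [(10.0, 10.0)]  # one abundant and one rare population
    sample = density_dependent_downsample(cells, radius=1.0, target_density=1)
    print(len(cells), "cells reduced to", len(sample))
```

The point of weighting by density is that a rare population is not lost when the data are reduced before clustering.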
Enables analysis of single-cell gene expression experiments. Monocle performs differential expression analysis, clustering, visualization, and other useful tasks on single-cell expression data. The software orders individual cells according to progress through a biological process, without knowing ahead of time which genes define progress through that process. It is designed to work with RNA-Seq and qPCR data, but could be used with other data types as well. The tools Census and BEAM are implemented in Monocle.
An algorithm for the identification of rare and abundant cell types from single-cell transcriptome data. RaceID is based on transcript counts obtained with unique molecular identifiers. We demonstrate that this algorithm can resolve cell types represented by only a single cell in a population of randomly sampled organoid cells. We use this algorithm to identify Reg4 as a novel marker for enteroendocrine cells, a rare population of hormone-producing intestinal cells.
A divisive biclustering method based on sorting points into neighborhoods (SPIN). In contrast to the SPIN algorithm which does not identify clusters, here the aim was to identify groups of cells/genes in an unsupervised manner. SPIN is a powerful method to sort a distance/correlation matrix without reducing dimensionality, and it converges to a 1D order of the features.
Enables comparison, validation and substantiation of cell type transcriptional profiles across scRNA-seq datasets. MetaNeighbor can readily identify cells of the same type across datasets, without relying on specific knowledge of marker genes. For each gene set and task, the tool returns a performance score: the mean area under the receiver operating characteristic curve (AUROC) across all folds of cross-dataset validation.
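The performance score described for MetaNeighbor, a mean AUROC over validation folds, can be illustrated with a small sketch. This is not MetaNeighbor code: `auroc` and `mean_auroc` are hypothetical helper names, and the scores and labels in the usage lines are made-up toy values.

```python
def auroc(scores, labels):
    """Area under the ROC curve as the probability that a positive outranks a negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def mean_auroc(folds):
    """Average the AUROC over folds, each given as a (scores, labels) pair."""
    return sum(auroc(scores, labels) for scores, labels in folds) / len(folds)


if __name__ == "__main__":
    folds = [
        ([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]),  # positives ranked first
        ([0.7, 0.4, 0.6, 0.1], [1, 1, 0, 0]),  # one positive ranked below a negative
    ]
    print(mean_auroc(folds))
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the cell type from all other cells.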
Projects single-cell transcriptomes into a space defined by variability in a reference data set. RCA is an R package for robust clustering analysis of single-cell RNA sequencing (scRNA-seq) data. This method outperforms existing algorithms for clustering single-cell transcriptomes and generates tight cell clusters consisting almost entirely of cells of the same type. It also identifies multiple cell types in colorectal cancer (CRC) tumors and normal mucosa, despite the strong batch effects in clinical samples.
Clusters single-cell RNA-seq data. SC3 integrates many different clustering solutions through a consensus approach, thereby increasing its accuracy and robustness against noise. To improve accessibility for users with limited bioinformatics expertise, SC3 features an interactive graphical implementation, which aids biological interpretation by identifying marker genes, differentially expressed genes and outlier cells.
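One common way to combine several clustering solutions is a co-association (consensus) matrix, in which each entry is the fraction of solutions that place two cells in the same cluster. The sketch below shows that general idea only; it is not SC3's implementation, and `consensus_matrix` is a hypothetical name.

```python
def consensus_matrix(clusterings):
    """Return an n x n matrix of co-clustering frequencies.

    `clusterings` is a list of label lists, one per clustering solution,
    all covering the same n cells in the same order.
    """
    n = len(clusterings[0])
    if any(len(labels) != n for labels in clusterings):
        raise ValueError("all clusterings must label the same number of cells")
    counts = [[0] * n for _ in range(n)]
    for labels in clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    k = len(clusterings)
    return [[c / k for c in row] for row in counts]


if __name__ == "__main__":
    solutions = [
        [0, 0, 1, 1],
        [1, 1, 0, 0],  # same partition, different label names
        [0, 1, 1, 1],
    ]
    for row in consensus_matrix(solutions):
        print(row)
```

Because only co-membership is counted, the arbitrary label names used by each individual solution do not matter. The resulting matrix can then be clustered once more to obtain a final partition.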
Reconstructs gene regulatory networks (GRNs). SCENIC uses single-cell RNA-seq data to identify stable cell states. It analyzes all the co-expression modules using cis-regulatory motif analyses. The tool reduces data dimensionality by using transcription factor (TF) regulons rather than principal components. It accounts for noise, removes technical biases, and uncovers master regulators and gene regulatory networks for each cell type.
Quantifies fate bias, manifested by subtle lineage specific transcriptome modulations within a multipotent progenitor population. FateID is based on prior knowledge and a random forests-based classification method. It can differentiate committed stages of all lineages and tracks differentiation trajectories backward in time. This tool enables prediction of the likelihood of multipotent progenitors to give rise to each lineage.
An easy-to-use application for microarray, RNA-Seq and metabolomics analysis. For splicing sensitive platforms (RNA-Seq or Affymetrix Exon, Gene and Junction arrays), AltAnalyze will assess alternative exon (known and novel) expression along protein isoforms, domain composition and microRNA targeting. In addition to splicing-sensitive platforms, AltAnalyze provides comprehensive methods for the analysis of other data (RMA summarization, batch-effect removal, QC, statistics, annotation, clustering, network creation, lineage characterization, alternative exon visualization, gene-set enrichment and more).
Processes Chromium single-cell 3’ RNA-seq output to align reads, generate gene-cell matrices and perform clustering and gene expression analysis. Cell Ranger combines Chromium-specific algorithms with the widely used RNA-seq aligner STAR. It is delivered as a single, self-contained tar file that can be unpacked anywhere on the system. The tool includes four pipelines: cellranger mkfastq, cellranger count, cellranger aggr and cellranger reanalyze.
Serves for single-cell data analysis. Granatum is a program that provides biologists with access to single-cell bioinformatics methods, and software developers with the opportunity to promote and combine their tools with various others in customizable pipelines. Its architecture simplifies the incorporation of cutting-edge tools and enables handling of large datasets. Moreover, it can eliminate inter-module incompatibilities by isolating the dependencies of each module.
Permits analysis of single-cell RNA-seq data. Dpath decomposes the expression profiles while accounting for dropout events. It quantitatively evaluates the cellular state and prioritizes genes for both progenitor and committed cellular states. This tool simplifies and decodes the biological mechanisms that control stem cell and progenitor cell populations. It was tested on haematopoietic, endocardial and endothelial lineages.
Reconstructs cell cycle time-series using single-cell transcriptome data. reCAT is a computational method that consists of four steps: (i) data processing, including quality control, normalization, and clustering of single cells; (ii) recovery of the order of the clusters by finding a traveling salesman cycle; (iii) discrimination among cycle stages using two scoring methods, Bayes-scores and mean-scores; and (iv) estimation of the underlying gene expression levels of the single-cell time-series with a hidden Markov model (HMM) and a Kalman smoother.
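Step (ii), ordering clusters along a traveling salesman cycle, can be sketched for a handful of clusters by exhaustive search. This is an illustration under assumptions, not reCAT's solver: the cluster centers are invented 2-D points, Euclidean distance is assumed, and brute force is only practical for a small number of clusters.

```python
import math
from itertools import permutations


def cycle_length(order, centers):
    """Total length of the closed tour visiting `centers` in the given order."""
    n = len(order)
    return sum(
        math.dist(centers[order[i]], centers[order[(i + 1) % n]]) for i in range(n)
    )


def shortest_cycle(centers):
    """Exhaustively search for the shortest closed tour, fixing cluster 0 as the start."""
    best_order, best_length = None, None
    for rest in permutations(range(1, len(centers))):
        order = (0,) + rest
        length = cycle_length(order, centers)
        if best_length is None or length < best_length:
            best_order, best_length = order, length
    return best_order, best_length


if __name__ == "__main__":
    # Hypothetical cluster means lying on the corners of a unit square.
    centers = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
    order, length = shortest_cycle(centers)
    print(order, length)
```

A cycle, rather than an open path, matches the periodic nature of the cell cycle: the last cluster in the ordering connects back to the first.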
Simulates experiment-specific technical replicates. BEARscc improves the unsupervised classification of cells and facilitates the biological interpretation of single-cell RNA-seq experiments. It provides additional insights for the interpretation of single-cell sequencing experiments. The tool models technical variance based on spike-ins, simulates technical replicates and clusters simulated replicates.
Allows analysis of single-cell gene expression data. Scanpy integrates preprocessing, visualization, clustering, pseudotime and trajectory inference, differential expression testing and simulation of gene regulatory networks. It enables interfacing of advanced machine learning packages. This tool provides pseudotemporal-ordering and the reconstruction of branching trajectories. It allows simulating single cells governed by gene regulatory networks.
Assists in navigating through the expression profile. SAKE is an R package that uses the non-negative matrix factorization (NMF) method for unsupervised clustering. It offers (i) quality control modules to compare total sequenced reads to total gene transcripts detected, (ii) a sample correlation heatmap plot, (iii) a heatmap of sample assignment from the NMF run, with dark red indicating high confidence in cluster assignments, and (iv) a t-distributed stochastic neighbor embedding (t-SNE) plot to compare NMF-assigned groups with t-SNE projections.
Offers a method for rare cell type identification in single-cell RNA-seq data. GiniClust can perform its detection in both normal tissues and disease samples. This program is based on a modification of the Gini index, which was normalized and defined as bidirectional to allow the identification of genes specifically unexpressed in a rare cell type and the removal of a systematic bias toward lowly expressed genes.
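To make the Gini index concrete, the sketch below computes the plain (unmodified) index of one gene's expression across cells. It does not include the normalization or the bidirectional extension described above, and `gini` is a hypothetical helper name.

```python
def gini(values):
    """Plain Gini index of non-negative values: 0 for uniform expression,
    approaching 1 when expression is concentrated in very few cells."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = sum_i (2i - n - 1) * x_i / (n * sum_i x_i), with i = 1..n over sorted values
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


if __name__ == "__main__":
    housekeeping = [5, 5, 5, 5]      # expressed evenly in every cell
    rare_marker = [0, 0, 0, 10]      # expressed in a single cell only
    print(gini(housekeeping), gini(rare_marker))
```

A gene expressed in only a few cells receives a high index, which is why Gini-based feature selection favors markers of rare cell types.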
A toolkit designed for the analysis of short reads obtained from end-sequence RNA-seq. ESAT addresses mis-annotated or sample-specific transcript boundaries by providing a search step in which it identifies possible unannotated ends de novo. It provides robust handling of multi-mapped reads, which is critical in 3’ DGE analysis. ESAT provides a module specifically designed for alternative start or 3’ UTR (untranslated region) differential isoform expression. It also includes a set of features specifically designed for the analysis of single-cell RNA-seq data.
Offers a universal, efficient and accurate solution for extracting information from single-cell RNA-seq experiments. In the same way that single-cell analysis can be viewed as the ultimate resolution for transcriptomics, transcript-compatibility counts are the most direct way to “count” reads. Our method departs from standard analysis pipelines, comparing and clustering cells based not on their transcript or gene quantifications but on their transcript-compatibility read counts.
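The idea of transcript-compatibility counts can be sketched as follows: instead of assigning each read to one transcript or gene, count how many reads share the same set of compatible transcripts. The input format and the function name below are assumptions for illustration; in practice the compatibility sets would come from an upstream mapping step.

```python
from collections import Counter


def transcript_compatibility_counts(read_compatibilities):
    """Count reads per equivalence class, where a class is the set of
    transcripts a read is compatible with. Reads with no compatible
    transcript are skipped."""
    return Counter(
        frozenset(transcripts) for transcripts in read_compatibilities if transcripts
    )


if __name__ == "__main__":
    # Hypothetical per-read compatibility sets for one cell.
    reads = [
        {"t1", "t2"},
        {"t2", "t1"},   # same class as the first read
        {"t3"},
        set(),          # unmapped read, ignored
    ]
    for eq_class, count in transcript_compatibility_counts(reads).items():
        print(sorted(eq_class), count)
```

The per-cell count vectors over equivalence classes can then be compared and clustered directly, without first resolving ambiguous reads into transcript or gene abundances.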
Provides an analytical framework for the sensitive detection of population markers and differentially expressed genes. bigSCale aims to improve detection in large scRNA-seq datasets. The software uses large sample sizes to estimate a highly accurate and comprehensive numerical model of noise, and it determines the extent of the variation between cells without estimating actual gene expression values.
Rebuilds dynamic regulatory networks from single-cell time series data. SCDIFF models the cell differentiation process using time-series single-cell RNA-seq data. This tool is suited to predicting the transcription factors (TFs) that regulate the cell differentiation process. It uses static information about TF targets. This method enhances both the learning of a branching model and the identification of TFs that regulate various stages of the process.
Performs simultaneous detection of common and rare cell types from single-cell gene expression data. GiniClust2 is a cluster-aware, weighted ensemble clustering method that combines Gini index- and Fano factor-based clustering methods. The software first clusters the cells using Gini index-based features, then clusters them a second time using Fano factor-based features, and finally combines the two results via a cluster-aware, weighted ensemble approach.
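The Fano factor used for the second feature set is the variance of a gene's expression divided by its mean. The sketch below computes it with the population variance, which is an assumption made here; the exact estimator and any normalization used by GiniClust2 are not stated in this description, and `fano_factor` is a hypothetical name.

```python
def fano_factor(values):
    """Variance-to-mean ratio of one gene's expression across cells.

    Uses the population variance (divide by n); returns 0.0 for an
    all-zero gene to avoid dividing by a zero mean.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    if mean == 0:
        return 0.0
    variance = sum((v - mean) ** 2 for v in values) / n
    return variance / mean


if __name__ == "__main__":
    flat_gene = [2, 2, 2, 2]       # no cell-to-cell variability
    variable_gene = [0, 0, 0, 4]   # same total expression, highly variable
    print(fano_factor(flat_gene), fano_factor(variable_gene))
```

Genes with a high Fano factor vary strongly across many cells and tend to separate common cell types, which complements Gini-based features aimed at rare ones.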
A software tool developed to better support in silico pseudo-time reconstruction in single-cell RNA-seq analysis. TSCAN uses a cluster-based minimum spanning tree (MST) approach to order cells. Cells are first grouped into clusters and an MST is then constructed to connect cluster centers. Pseudo-time is obtained by projecting each cell onto the tree, and the ordered sequence of cells can be used to study dynamic changes of gene expression along the pseudo-time. Clustering cells before MST construction reduces the complexity of the tree space. This often leads to improved cell ordering. It also allows users to conveniently adjust the ordering based on prior knowledge. TSCAN has a graphical user interface (GUI) to support data visualization and user interaction. Furthermore, quantitative measures are developed to objectively evaluate and compare different pseudo-time reconstruction methods.
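The cluster-level minimum spanning tree at the core of this approach can be sketched with Prim's algorithm over cluster centers. This is a minimal illustration, not TSCAN's code: the centers are invented, Euclidean distance is assumed, and the projection of individual cells onto the tree is left out.

```python
import math


def minimum_spanning_tree(centers):
    """Prim's algorithm: return MST edges as (i, j) index pairs into `centers`."""
    n = len(centers)
    if n == 0:
        return []
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        best = None  # (distance, node inside the tree, node outside the tree)
        for i in in_tree:
            for j in range(n):
                if j in in_tree:
                    continue
                d = math.dist(centers[i], centers[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        edges.append((best[1], best[2]))
        in_tree.add(best[2])
    return edges


if __name__ == "__main__":
    # Hypothetical cluster centers in a reduced 2-D expression space.
    centers = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (2.0, 0.0)]
    print(minimum_spanning_tree(centers))
```

Building the tree on a few cluster centers instead of on every cell keeps the tree small, which is the complexity reduction the description refers to.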
A computational method for the statistical inference of cell lineage relationships from single-cell gene expression data. ECLAIR uses an ensemble approach to improve the robustness of lineage predictions, and provides a quantitative estimate of the uncertainty of lineage branchings. We show that the application of ECLAIR to published datasets successfully reconstructs known lineage relationships and significantly improves the robustness of predictions. In conclusion, ECLAIR is a powerful bioinformatics tool for single-cell data analysis. It can be used for robust lineage reconstruction with a quantitative estimate of prediction accuracy.
Preserves distinct structural properties of the data. dropClust uses Locality Sensitive Hashing (LSH), a logarithmic-time algorithm, to determine an approximate neighborhood for individual transcriptomes. It employs an exponential decay function to select a higher number of expression profiles from clusters of relatively smaller sizes. This tool is able to detect principal components (PCs) with a multi-modal distribution of the projected transcriptomes by using mixtures of Gaussians.
Provides a linear model and normality-based transformation method. Linnorm is an R package for the analysis of RNA-seq, scRNA-seq, ChIP-seq count data or any large-scale count data. It transforms such datasets for parametric tests. Several pipelines are implemented: (i) library size/batch effect normalization, (ii) cell sub-population analysis and visualization, (iii) differential expression analysis or differential peak detection, (iv) highly variable gene discovery and visualization, (v) gene correlation network analysis and visualization, (vi) stable gene selection for scRNA-seq data and (vii) data imputation.
Proposes a top-down hierarchical clustering method for scRNA-seq data to characterize cells. CellBIC clusters scRNA-seq data based on modality in the gene expression distribution. This software deploys a top-down approach by dividing the datasets and dissecting cells based on bimodal memberships recursively to reconstruct a hierarchical tree structure. It does not require multi-modal patterns but uses bimodal patterns to detect clusters.
Allows differential clustering (DC) analysis. SparseDC is suitable for data with thousands or tens of thousands of genes. It is based on a K-means clustering algorithm and can generate a sparse solution. This tool clusters cells in each condition into cell types in an unsupervised manner. It is able to recognize marker genes for each cell type.
Provides quality control (QC) and analysis components for parallel single-cell transcriptome and epigenome data. Dr.seq is a QC and analysis pipeline that provides both multifaceted QC reports and cell clustering results. Parallel single-cell transcriptome data generated by different technologies can be transformed to the standard input with built-in functions. Using relevant commands, the software can also be used to report quality measurements based on four aspects and can generate detailed analysis results for scATAC-seq and Drop-ChIP datasets.
A density-based clustering algorithm, which is both time- and space-efficient and proceeds as follows: densityCut first roughly estimates the densities of data points from a K-nearest neighbour graph and then refines the densities via a random walk. A cluster consists of points falling into the basin of attraction of an estimated mode of the underlying density function. A post-processing step merges clusters and generates a hierarchical cluster tree. The number of clusters is selected from the most stable clustering in the hierarchical cluster tree. densityCut effectively clustered irregularly shaped synthetic benchmark datasets. We have successfully used densityCut to cluster variant allele frequencies of somatic mutations, single-cell gene expression data, and single-cell CyTOF data. densityCut is based on density estimation on graphs. It could be considered a variation of the spectral clustering algorithms but is much more time- and space-efficient. Moreover, it automatically selects the number of clusters and works for datasets with a large number of clusters. In summary, densityCut does not make assumptions about the shape, size, and number of clusters, and can be broadly applicable for exploratory data analysis.
Implements a methodological toolbox allowing flexible workflows under such a framework. Furthermore, Sincell contributes new algorithms to provide cell-state hierarchies with statistical support while accounting for stochastic factors in single-cell RNA seq. Graphical representations and functional association tests are provided to interpret hierarchies.
An integrated software tool for quality filtering, normalization, feature selection, iterative dimensionality reduction, clustering and the estimation of gene-expression gradients from large ensembles of single-cell RNA-seq datasets. SCell is open source, and implemented with an intuitive graphical interface.
Aims at the complete analysis of scRNA-seq data post genome alignment: from the parsing, filtering, and normalization of the input count data files, to the visual representation of the data, identification of cell clusters, differentially expressed genes (including cluster-specific marker genes), and functional gene set enrichment. ASAP combines a wide range of commonly used algorithms with sophisticated visualization tools. It allows researchers to interact with the data in a straightforward fashion and in real time.
Enables unsupervised and semi-supervised learning using single-cell RNA-Seq data. To support both modes of learning, UNCURL provides a method for standardizing any prior biological information, including bulk RNA-seq data, microarray data or even information about individual marker gene expression, to a form compatible with scRNA-Seq data. Additionally, this package allows the integration of prior information, which leads to large improvements in accuracy.
Supplies a spectral clustering algorithm for single-cell measurements. MPSSC is standalone software based on the use of numerous doubly stochastic affinity matrices coupled with the imposition of a specific structure on the target matrix. The program aims to improve clustering efficiency and to be applicable to data with different densities, high levels of noise or missing values.
Provides a comprehensive analysis of single-cell RNA-sequencing (scRNA-seq) data. iS-CellR integrates the Seurat package and employs a fully integrated web browser interface to process, analyze and visually interpret scRNA-seq data. The software offers a strategy for the analysis and visualization of scRNA-seq data without the need for specific programming skills. Users can explore heterogeneous populations of cells. The program can be modified and extended according to user needs to perform more intricate and targeted analysis.
Investigates cell functions based on single-cell RNA (scRNA)-seq data. Corr is software that aims to assist users in identifying genes linked to biological processes or phenotypes. The application ascertains a cell-to-cell “differentiability correlation” by taking into account the environment surrounding the targeted cells as well as the factorial analysis of variance in cluster number determination. This method can be applied to data containing fluctuations or noise.
Permits iterative clustering investigation. iterClust serves, for instance, to retrieve biological heterogeneity, especially in single-cell studies of heterogeneous tissues, where cell lineages impose a relatively strong hierarchical structure, or to solve general clustering problems. It is based on clustering, partition, hierarchy, density and graph algorithms. This tool can resolve complex hierarchical substructures that contribute to tissue heterogeneity.
Enables analysis of cellular and molecular processes at single-cell resolution and assists in the understanding of many biological processes. Para-DPMM contains a split-merge sampling scheme that allows users to perform inter-cluster parallelization, in which the number of threads running in parallel is of the same order as the number of data points, resulting in a high level of parallelization. The program can be applied to real-world genomic systems.
Allows users to fit grade of membership (GoM) models for clustering of RNA-seq gene expression count data. CountClust also provides tools to identify which genes are most distinctively expressed in each cluster and to aid interpretation of results. The results can provide a richer summary of the structure in RNA-seq data than existing widely used visualization methods such as principal components analysis (PCA) and hierarchical clustering.
Gathers utilities dedicated to the management of single-cell RNA-seq data produced by droplet technologies. DropletUtils provides more than 10 utilities allowing users to compute barcode rank statistics or to call cells according to the number of unique molecular identifiers (UMIs) associated with each barcode. It also provides features for distinguishing cells from empty droplets and for generating a sparse or HDF5-backed count matrix. The software includes functions that focus on data from 10X Genomics technology.
A generally applicable analytic pipeline for processing single-cell RNA-seq data from a whole organ or sorted cells. SINCERA provides a panel of analytic tools for users to conduct data filtering, normalization, clustering, cell type identification, gene signature prediction, transcriptional regulatory network construction and important regulatory node identification. The pipeline enables RNA-seq analysis from heterogeneous single-cell preparations after the nucleotide sequence reads are aligned to the genome of interest.
Handles clustering and visualization of large datasets, with a focus on single-cell sequencing data. clusterExperiment assists users in choosing a suitable clustering method by comparing numerous clustering algorithms and their associated tuning parameters, such as the dimensionality reduction method or the number of clusters. This package also provides a set of functions dedicated to visualization, including the possibility to plot the hierarchy of the clustering as well as two-dimensional representations of the data color-coded by cluster.
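The two operations named above, ranking barcodes by UMI total and calling cells from those totals, can be sketched in a few lines. This is not the DropletUtils API (an R package); `barcode_ranks`, `call_cells` and the fixed `min_umis` cutoff are hypothetical, and a simple threshold is a much cruder rule than a statistical test against empty droplets.

```python
def barcode_ranks(umi_totals):
    """Rank barcodes by decreasing UMI total; ties are broken alphabetically.

    Returns a list of (rank, barcode, total) tuples, with rank starting at 1.
    """
    ordered = sorted(umi_totals.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, barcode, total) for rank, (barcode, total) in enumerate(ordered, start=1)]


def call_cells(umi_totals, min_umis):
    """Call as cells all barcodes with at least `min_umis` UMIs (assumed cutoff)."""
    return sorted(barcode for barcode, total in umi_totals.items() if total >= min_umis)


if __name__ == "__main__":
    # Hypothetical UMI totals per droplet barcode.
    totals = {"AAAC": 5200, "AAAG": 35, "AACT": 4100, "AAGT": 12}
    for row in barcode_ranks(totals):
        print(row)
    print(call_cells(totals, min_umis=1000))
```

Plotting the totals against rank on log-log axes typically shows a sharp drop between cell-containing and empty droplets, which is what a cutoff of this kind tries to capture.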
Assists in single-cell clustering. scVDMC is a multitask learning method with embedded feature selection to simultaneously capture the differentially expressed genes among cell clusters and across all cell populations. It utilizes expression patterns of different single-cell populations with shared cell-type markers and corresponding similar clusters. This method can be extended to perform soft cluster assignment.
An unsupervised hierarchical clustering approach for the identification of putative cell sub-populations from single-cell transcriptomics profiles. Clustering occurs in a linearly transformed subspace obtained from principal component directions and, at each level of our hierarchical clustering structure, the similarity between clusters is measured in subspaces of decreasing dimensionality by discarding principal directions as the number of clusters decreases. Using two real single cell datasets, we compared our approach to other commonly used statistical techniques, such as K-means and hierarchical clustering. We found that pcaReduce was able to give more consistent clustering structures when compared to broad and detailed cell type labels.
Analyzes single-cell RNA-Seq datasets while addressing both the clustering interpretability and clustering subjectivity issues. DendroSplit includes several features: (1) gene-based justification for all decisions made when generating clusters; (2) interpretable hyperparameters; (3) ability to produce multiple clusterings for the same dataset; (4) incorporation into existing single-cell RNA-Seq workflows.
Identifies the optimal number of clusters while partitioning data. Shrinkage Clustering is a non-negative matrix factorization (NMF)-based method that optimizes cluster memberships while simultaneously shrinking the number of clusters to an optimum. This algorithm generates clusters of sufficiently large sample sizes as required by the user and can handle applications with minimum cluster size constraints.
Detects the optimal set of signature genes to separate single cells into distinct groups. SAIC uses an iterative k-means clustering approach to perform an exhaustive search for the best signature genes within the search space, which is defined by the combination of a number of initial centers and p-values. The software is robust on both simulated and real datasets.