Sub-population identification | Single-cell RNA sequencing data analysis
Single-cell RNA sequencing is a powerful technology to study gene expression of individual cells. Several software tools enable identification of cellular sub-populations from heterogeneous samples by identifying and comparing common gene signatures.
Identifies subpopulations in high-dimensional single-cell data. PhenoGraph is a computational method that was developed to avoid the disadvantages of manual gating. This method is adaptive in terms of both dimensionality and sample size, making it suitable for a range of settings in which single-cell population structure is of interest, including other cancers or healthy tissues, and for use with other emerging single-cell technologies.
Helps users construct a single-cell mouse cell atlas. scMCA is a program that defines cell types based on single-cell digital expression. This application can also predict cell types using single-cell data generated from a wide range of technologies. Moreover, it includes a function named "scMCA_vis" that provides a brief way to visualize and download scMCA results.
Enables unsupervised projection of single cells from an scRNA-seq experiment. scmap is easy to combine with other computational scRNA-seq methods. It is very fast: with 1,000 features, mapping 40,000 cells takes only around twenty seconds. Its run-time can be further improved since the centroids and features for each cluster can be pre-computed and stored in memory, even for a very large atlas.
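The idea of projecting cells onto pre-computed cluster centroids can be illustrated with a minimal sketch. This is not the scmap implementation: the function names, the use of per-cluster means, cosine similarity as the only measure, and the 0.7 cut-off are all assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two expression vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def compute_centroids(reference, labels):
    """Pre-compute one centroid per cluster (assumption: the per-cluster mean)."""
    sums, counts = {}, {}
    for profile, label in zip(reference, labels):
        if label not in sums:
            sums[label] = [0.0] * len(profile)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], profile)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def project_cell(cell, centroids, threshold=0.7):
    """Assign a cell to its most similar centroid, or "unassigned" below the threshold."""
    best_label, best_sim = None, -1.0
    for label, centroid in centroids.items():
        sim = cosine_similarity(cell, centroid)
        if sim > best_sim:
            best_label, best_sim = label, sim
    if best_sim < threshold:
        return "unassigned", best_sim
    return best_label, best_sim

# Hypothetical toy reference: four cells, three selected features, two clusters.
reference = [[5, 0, 1], [4, 1, 0], [0, 6, 5], [1, 5, 6]]
labels = ["A", "A", "B", "B"]
centroids = compute_centroids(reference, labels)  # computed once, reused for every query
print(project_cell([6, 0, 1], centroids))
```

Because the centroids are computed once and each query cell is compared only against one vector per cluster, the cost of mapping grows with the number of clusters rather than with the number of reference cells.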
Addresses the dropout events prevalent in scRNA-seq data. scImpute is an imputation method that determines which values are affected by dropout events in the data and performs imputation only on dropout entries. This method learns each gene’s dropout probability in each cell by fitting a mixture model. Next, it imputes the dropout values in a cell by borrowing information about the same gene from other similar cells.
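The second step, borrowing information from similar cells, can be sketched as follows. This is a deliberately simplified stand-in, not the scImpute algorithm: it assumes the dropout probabilities have already been estimated (the mixture-model fit is omitted), finds similar cells by Euclidean distance on the genes judged reliable, and replaces each dropout entry with a plain neighbour average. The function name, `k` and `threshold` are hypothetical.

```python
import math

def impute_cell(expr, dropout_prob, cell, k=2, threshold=0.5):
    """Return a copy of expr[cell] with likely dropout entries replaced.

    expr         : list of cells, each a list of (log-scale) gene values
    dropout_prob : same shape as expr; assumed to be estimated beforehand
    """
    n_genes = len(expr[cell])
    dropout = [g for g in range(n_genes) if dropout_prob[cell][g] > threshold]
    reliable = [g for g in range(n_genes) if dropout_prob[cell][g] <= threshold]
    others = [c for c in range(len(expr)) if c != cell]
    if not dropout or not reliable or not others:
        return list(expr[cell])  # nothing to impute, or nothing to learn from

    def distance(c):
        # Similarity is judged only on genes not suspected of dropout in this cell.
        return math.sqrt(sum((expr[cell][g] - expr[c][g]) ** 2 for g in reliable))

    neighbours = sorted(others, key=distance)[:k]
    imputed = list(expr[cell])
    for g in dropout:
        # Only dropout entries are modified; all other values are left untouched.
        imputed[g] = sum(expr[c][g] for c in neighbours) / len(neighbours)
    return imputed

# Hypothetical toy data: cell 0 has a suspicious zero for gene 1.
expr = [[5.0, 0.0, 3.0], [5.1, 2.0, 3.1], [4.9, 2.2, 2.9], [0.0, 9.0, 0.0]]
prob = [[0.1, 0.9, 0.1], [0.1, 0.1, 0.1], [0.1, 0.1, 0.1], [0.1, 0.1, 0.1]]
print(impute_cell(expr, prob, cell=0))
```

Restricting the update to entries flagged as dropouts is the point of the design: expression values considered reliable are never altered.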
A computational method for extracting lineage relationships from single-cell gene expression data, and modeling the dynamic changes associated with cell differentiation. SCUBA draws techniques from nonlinear dynamics and stochastic differential equation theories, providing a systematic framework for modeling complex processes involving multi-lineage specifications.
Allows estimation of cell type proportions from bulk sequencing data. BSeq-sc is a bulk sequence single-cell deconvolution analysis pipeline that integrates bulk data with single-cell gene expression for: (1) identifying cell-type specific marker genes, (2) estimating the proportion of each cell type in measured bulk tissue RNA-seq samples, (3) adjusting individual gene expression samples for variation in cell type proportion so that differential expression because of proportion differences is removed, and (4) performing cell type-specific differential gene expression analysis among groups.
Provides stable and robust clustering for scRNA-seq data. SAFE-clustering is an unsupervised ensemble method that: (i) performs independent clustering using four state-of-the-art methods, SC3, CIDR, Seurat and t-SNE + k-means; and (ii) combines the four individual solutions into one consolidated solution using one of three hypergraph partitioning algorithms: hypergraph partitioning algorithm (HGPA), meta-clustering algorithm (MCLA) and cluster-based similarity partitioning algorithm (CSPA).
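The consolidation step can be illustrated with a co-association matrix, the pairwise similarity on which cluster-based similarity partitioning is built: two cells are similar in proportion to how many individual solutions place them in the same cluster. The sketch below is not the SAFE-clustering code. In particular, the final grouping by connected components above a threshold is an assumed, simplified substitute for the hypergraph and graph partitioning algorithms named above, and both function names are hypothetical.

```python
def co_association(solutions):
    """Fraction of clustering solutions in which each pair of cells shares a cluster."""
    n_cells = len(solutions[0])
    n_solutions = len(solutions)
    return [
        [sum(s[i] == s[j] for s in solutions) / n_solutions for j in range(n_cells)]
        for i in range(n_cells)
    ]

def consensus_labels(similarity, threshold=0.5):
    """Group cells linked by similarity > threshold (simplified stand-in for partitioning)."""
    n_cells = len(similarity)
    labels = [-1] * n_cells
    current = 0
    for start in range(n_cells):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first walk over the thresholded similarity graph
            i = stack.pop()
            for j in range(n_cells):
                if labels[j] == -1 and similarity[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Hypothetical labels for five cells from four individual methods.
# Label values need not match across methods; only co-membership matters.
solutions = [
    [0, 0, 1, 1, 2],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [2, 2, 3, 3, 4],
]
similarity = co_association(solutions)
print(consensus_labels(similarity))
```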
Models both the within-cluster and between-cluster variability of unique molecular identifier (UMI) count data. DIMM-SC is a novel statistical method for clustering droplet-based single cell transcriptomic data. It facilitates rigorous statistical inference of cell population heterogeneity. This tool can be useful for the fast-growing community of large-scale single cell transcriptome analysis.
Identifies distribution patterns in cell populations through single-cell transcriptome analysis. FVFC uses gene coexpression network analysis (GCNA) to detect modules of genes with similar expression profiles and summarize them into eigengenes, which allows users to explore the distribution of cells interactively, interpret the gene features and generate new hypotheses. It also provides an interactive visualization using a clustering index parameter, which helps to highlight interesting 2D patterns in the scatter plot matrix (SPLOM). The method was tested on two large single-cell studies.
Analyzes transcriptomic data from cellularly homogeneous samples to define functional heterogeneity. The MPH method quantifies the functional heterogeneity of a homogeneous cell population based on transcriptomic data. This tool combines molecular processes and proportions with an approximation of the differences in gene expression levels between cells. The algorithm employs a non-negative matrix factorization (NMF) method.
Detects the optimal set of signature genes to separate single cells into distinct groups. SAIC uses an iterative k-means clustering approach to perform an exhaustive search for the best signature genes within the search space, which is defined by the combination of a number of initial centers and p-values. The software is robust on both simulated and real datasets.
Allows users to identify minority cell types. LSPCA consists of a dimension reduction framework that can be used for multiple single-cell data sets including a large data set of peripheral blood mononuclear cell (PBMC) transcriptomes. This method can for instance assist users in measuring the outcome of grouping on the annotated data.
Permits downstream analysis. This tool contains four steps: (1) dimensionality reduction accounting for zero inflation and over-dispersion, and adjusting for gene and cell-level covariates; (2) robust and stable cell clustering using resampling-based sequential ensemble clustering; (3) inference of cell lineages and ordering of the cells by developmental progression along lineages; and (4) differential expression (DE) analysis along lineages.
Defines the metabolism of heterogeneous cancer cell (sub)populations. To do so, scFBA translates single-cell transcriptomes into single-cell fluxomes. By pre-processing the data, it lets users discover genes that might show inconsistencies between bulk RNA-seq and scRNA-seq profiles. This tool also allows users to employ bulk expression profiles to make a study more robust against possible data-specific errors in single-cell datasets.
Enables single-cell RNA sequencing (scRNA-seq) data imputation. PBLR is a cell sub-population based bounded low-rank method that can (1) recover transcriptomic level and dynamics masked by dropouts, (2) improve low-dimensional representation, and (3) restore the gene-gene co-expression relationship. The software also automatically detects cell subpopulations. It has few parameters, making it generally applicable to data from diverse labs or techniques.
Retrieves informative features, such as low-dimensional representations of gene expression profiles, per cell from massive single-cell data. scScope employs a deep learning method and a self-correcting layer. It can conduct imputations on zero-valued entries of input scRNA-seq data. This tool enables the exploitation of massive and noisy single-cell expression data. It is useful for unsupervised single-cell data modeling.
Identifies known cell subpopulations of varying potency, enabling reconstruction of cell-lineage trajectories. SCENT is an algorithm that can be used to identify and quantify biologically relevant expression heterogeneity in single-cell populations, as well as to reconstruct cell-lineage trajectories from time-course data. It differs substantially from other single-cell algorithms in that it uses single-cell entropy to independently order single cells in pseudo-time (i.e. differentiation potency), without the need for feature selection or clustering.
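Ordering cells by an entropy score, with no feature selection or clustering, can be illustrated with a much simpler quantity than the one SCENT actually uses. The sketch below computes the plain Shannon entropy of each cell's normalized expression profile and ranks cells from highest to lowest. This is a swapped-in, simplified measure chosen for illustration, and the function names and toy profiles are hypothetical.

```python
import math

def normalized_entropy(expression):
    """Shannon entropy of a cell's expression distribution, scaled to [0, 1]."""
    total = sum(expression)
    n_genes = len(expression)
    if total <= 0 or n_genes < 2:
        return 0.0
    entropy = 0.0
    for value in expression:
        if value > 0:
            p = value / total
            entropy -= p * math.log(p)
    return entropy / math.log(n_genes)  # maximum entropy is log(n_genes)

def order_by_entropy(cells):
    """Return cell names sorted from highest to lowest entropy."""
    return sorted(cells, key=lambda name: normalized_entropy(cells[name]), reverse=True)

# Hypothetical profiles: a uniform profile scores highest, a single-gene profile lowest.
cells = {
    "broad": [5, 5, 5, 5],
    "intermediate": [10, 5, 1, 0],
    "narrow": [20, 0, 0, 0],
}
print(order_by_entropy(cells))
```

Each cell is scored independently of all others, which is why an ordering of this kind needs neither a clustering step nor a selected gene set.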
Assists researchers in designing a single-cell experiment and an optimal analysis procedure. SCEED is an empirical methodology that can simulate single-cell RNA sequencing (scRNA-seq) data with user-provided statistical characteristics: total number of cells, number of genes, group proportions, marker genes and fold change (fC) of marker genes. The software is flexible, and any number of single-cell algorithms can be added for testing as per the user’s requirements.
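A simulation driven by these characteristics might look like the following minimal sketch. It is not the SCEED simulator: the Poisson count model, the `base_mean` parameter, the layout in which each group's marker genes occupy a consecutive block, and all function names are assumptions made for illustration.

```python
import math
import random

def poisson(rng, mean):
    """Draw a Poisson count with Knuth's method (adequate for small means)."""
    limit = math.exp(-mean)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return count
        count += 1

def simulate_counts(n_cells, n_genes, proportions, markers_per_group,
                    fold_change, base_mean=2.0, seed=0):
    """Simulate a cells x genes count matrix with group-specific marker genes.

    Assumed layout: group g's markers are genes
    [g * markers_per_group, (g + 1) * markers_per_group).
    """
    n_groups = len(proportions)
    if n_groups * markers_per_group > n_genes:
        raise ValueError("not enough genes for the requested marker genes")
    rng = random.Random(seed)
    # Assign each cell to a group according to the requested proportions.
    groups = rng.choices(range(n_groups), weights=proportions, k=n_cells)
    counts = []
    for group in groups:
        first = group * markers_per_group
        row = []
        for gene in range(n_genes):
            is_marker = first <= gene < first + markers_per_group
            # Marker genes are up-regulated by the fold change in their own group only.
            mean = base_mean * (fold_change if is_marker else 1.0)
            row.append(poisson(rng, mean))
        counts.append(row)
    return counts, groups

counts, groups = simulate_counts(
    n_cells=50, n_genes=20, proportions=[0.5, 0.5],
    markers_per_group=3, fold_change=4.0, seed=1,
)
print(len(counts), len(counts[0]), sorted(set(groups)))
```

Because the group labels are returned alongside the counts, any clustering method run on the simulated matrix can be scored against a known ground truth.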
Investigates multiple single-cell RNA-seq samples. MUDAN aims to conduct joint annotation of cell types across patients, time-points, and batches. It identifies clusters and artificially separates them to simplify their visualization. This tool can quickly identify subpopulations and characterize them. It provides a collection of differential gene expression and marker selection features.
Discovers subpopulations, and the relationships between them, in scRNA expression datasets. scGPS can recognize homogeneous subpopulations and check them using a set of built-in functions. It enables the detection of gene markers that allow users to distinguish a subpopulation from the remaining cells. This tool can choose optimal gene predictors and build prediction models.