Computational protocol: Studying hematopoiesis using single-cell technologies

Protocol publication

[…] Single-cell transcriptome analysis is a powerful tool for studying cellular heterogeneity and lineage hierarchy (Fig. 1). Several methods are available: single-cell qPCR, single-cell microarray analysis, and single-cell RNA-seq. After single-cell isolation from complex tissue, the first challenge is to amplify the small amount of RNA, which is about 10 pg per cell. Four mainstream strategies are used: multiplexed RT-PCR, polyA tailing followed by second-strand synthesis, template switching, and in vitro transcription (IVT). Multiplexed RT-PCR is used in single-cell qPCR experiments. Single-cell qPCR requires no sequencing and is convenient for detecting dozens of genes. The polyA tailing method was used in single-cell microarray and Tang-seq studies. Smart-seq and Smart-seq2 are widely used for full-length mRNA analysis of single cells. They use a template-switching protocol to append a primer binding site to the 3′ end of the cDNA, which is then amplified by PCR and sequenced on an Illumina platform. The mRNA coverage of Smart-seq is between 10 and 20%. IVT, used in CEL-seq and MARS-seq, achieves linear amplification of RNA using a T7 promoter and RNA polymerase. Unique molecular identifiers (UMIs) are designed to reduce amplification bias. They enable absolute counting of mRNA molecules in a single cell when mRNA capture efficiency and sequencing depth are sufficient. Low mRNA coverage is a common problem for all existing methods.
Recently, single-cell transcriptomic analysis has spread rapidly to many areas, such as early embryonic development, cellular reprogramming, human breast cancer, metastatic melanoma, circulating tumor cells, olfactory neurogenesis, neuronal cell heterogeneity, and immune cell pathogenicity.
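To illustrate the UMI-based counting described above, the following Python sketch collapses reads that share the same cell barcode, gene, and UMI, then counts the distinct UMIs per cell and gene. The input layout (tuples of cell barcode, gene, UMI), the function name `count_molecules`, and the toy reads are assumptions made for illustration; they do not come from CEL-seq, MARS-seq, or any specific pipeline. The sketch also ignores sequencing errors within UMIs, which real pipelines typically need to handle.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct UMIs per (cell, gene) pair.

    `reads` is an iterable of (cell_barcode, gene, umi) tuples.
    This layout is hypothetical and chosen only for illustration.
    """
    umis = defaultdict(set)
    for cell, gene, umi in reads:
        # Reads with an identical (cell, gene, UMI) triple are treated as
        # amplification duplicates of a single original molecule.
        umis[(cell, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}


# Made-up example reads, not real data.
reads = [
    ("cell1", "Gata1", "AACG"),
    ("cell1", "Gata1", "AACG"),  # duplicate of the read above
    ("cell1", "Gata1", "TTGC"),
    ("cell1", "Spi1", "AACG"),
    ("cell2", "Gata1", "GGTA"),
]

counts = count_molecules(reads)
for (cell, gene), n in sorted(counts.items()):
    print(cell, gene, n)
```

In this toy input, the two identical reads for cell1 and Gata1 count as one molecule, so that pair contributes two molecules rather than three reads.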
The applications listed above demonstrate the broad applicability of single-cell transcriptomic analysis. [...] Downstream analysis focuses on visualizing the high-dimensional single-cell gene expression data and clustering transcriptionally distinct subgroups. One visualization tool for high-dimensional data is principal component analysis (PCA), which has been widely applied in single-cell research. PCA maps high-dimensional data points into a low-dimensional space by linear projection onto the directions of greatest variance. Another tool for reducing high-dimensional data to two or three dimensions is t-distributed stochastic neighbor embedding (t-SNE). In SNE, data points that are nearby in high-dimensional space retain their similarity in low-dimensional space. However, SNE is hampered by the crowding problem, which means that clusters cannot be fully separated from each other. To alleviate the crowding problem, t-SNE uses the Student's t distribution to compute the similarity between two points in the low-dimensional map. As a powerful visualization tool, t-SNE has demonstrated great capacity in recent high-throughput single-cell studies. PCA and t-SNE are usually combined to visualize large-scale data. After cell subpopulations are identified, specific gene markers can be extracted for each subpopulation. To improve differential gene identification from noisy single-cell data, Kharchenko et al. reported a probabilistic model of expression-magnitude distortions typical of single-cell RNA-sequencing measurements. Other downstream clustering pipelines for high-throughput single-cell gene expression are mostly based on R packages or MATLAB. Seurat, an R package, integrates these visualization methods and tools for handling single-cell RNA-seq data. […]
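The effect of the Student's t distribution on the crowding problem can be seen by comparing similarity kernels directly. The Python sketch below evaluates an unnormalized Gaussian kernel (as in SNE) and an unnormalized Student's t kernel with one degree of freedom (as in the low-dimensional map of t-SNE) at a few distances. The function names, the fixed bandwidth `sigma = 1`, and the chosen distances are illustrative assumptions. A full t-SNE implementation also normalizes these values into probabilities and optimizes the embedding, which is omitted here.

```python
import math


def gaussian_kernel(d, sigma=1.0):
    """Unnormalized Gaussian similarity for two points at distance d."""
    return math.exp(-(d * d) / (2.0 * sigma * sigma))


def student_t_kernel(d):
    """Unnormalized Student's t similarity with one degree of freedom."""
    return 1.0 / (1.0 + d * d)


# Compare how quickly each similarity falls off with distance.
for d in (0.5, 1.0, 2.0, 4.0):
    g = gaussian_kernel(d)
    t = student_t_kernel(d)
    print(f"d={d:>4}: gaussian={g:.4f}  student_t={t:.4f}  ratio={t / g:.1f}")
```

At larger distances the Student's t kernel decays far more slowly than the Gaussian, so moderately dissimilar cells can sit farther apart in the map without a large penalty, which leaves room for clusters to separate.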

Pipeline specifications

Software tools: CEL-Seq, t-SNE, Seurat
Application: scRNA-seq analysis