Your top 3 RNA-seq read alignment tools
RNA-sequencing (RNA-seq) is currently the leading technology for transcriptome analysis. It has a wide range of applications, from the study of alternative splicing and post-transcriptional modifications to the comparison of relative gene expression between biological samples.
To help you run your RNA-seq experiments under the best conditions, we are continuing our series of surveys by asking you to choose your favorite analysis tools, step by step.
Mapping reads to reference genome
After a first step of quality control (previous blog post here), the next step in the analysis of your RNA-seq experiment is alignment of reads to a reference genome or a transcriptome database.
There are two types of aligners: splice-unaware and splice-aware. Splice-unaware aligners can align continuous reads to a reference genome, but they are not aware of exon/intron junctions. In RNA-seq, their use is therefore limited to analyzing the expression of known genes or to aligning reads against a transcriptome. Splice-aware aligners, on the other hand, map reads across exon/intron junctions, so they can be used to discover new splice forms as well as to analyze gene expression levels.
With that in mind, we asked you to vote for your favorite read alignment tools (among both splice-aware and splice-unaware aligners). Here are the results of the survey.
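To make the difference concrete, here is a minimal Python sketch of how a spliced alignment shows up in aligner output. In the SAM format, the CIGAR operation N marks a region skipped on the reference, which for RNA-seq reads is usually an intron. The function names (`parse_cigar`, `junctions`) and the example coordinates are ours, chosen for illustration only; they do not come from any particular aligner.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Reject strings that the pattern did not consume completely.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops

def junctions(ref_start, cigar):
    """Return the skipped reference intervals (0-based, half-open) implied by N operations."""
    pos = ref_start
    found = []
    for length, op in parse_cigar(cigar):
        if op == "N":
            found.append((pos, pos + length))
        if op in "MDN=X":  # operations that consume reference bases
            pos += length
    return found

# Hypothetical read spanning a 1000-base intron, starting at reference position 100.
print(junctions(100, "50M1000N50M"))  # [(150, 1150)]
# A contiguous alignment, which is all a splice-unaware aligner can report.
print(junctions(100, "100M"))         # []
```

A splice-unaware aligner would either fail to place the first read or soft-clip one of its halves, which is why junction discovery needs a splice-aware tool.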
Your number 1 reads aligner: STAR
Although it did not appear in the original survey, so many of you mentioned this tool that we thought it deserved the top spot!
Spliced Transcripts Alignment to a Reference (STAR) is standalone software that aligns RNA-seq reads using a sequential maximum mappable seed search, followed by seed clustering and stitching. It can detect canonical junctions, non-canonical splices, and chimeric transcripts.
The main advantages of STAR are its high speed, accuracy, and efficiency (Engström et al.). STAR is implemented as standalone C++ code and is freely available on GitHub.
Schematic representation of the Maximum Mappable Prefix search in the STAR algorithm for detecting (a) splice junctions, (b) mismatches and (c) tails.
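The seed search idea can be sketched in a few lines of Python. This is a toy illustration, not STAR's implementation: STAR searches an uncompressed suffix array and reports every matching locus, whereas this sketch uses plain substring search and keeps only the first hit. The sequences and function names are made up for the example.

```python
def maximal_mappable_prefix(genome, read, start):
    """Longest prefix of read[start:] that occurs exactly in genome.

    Returns (length, genome_position); length is 0 if even one base fails.
    Toy version: uses str.find, so only the first matching locus is kept.
    """
    best_len, best_pos = 0, -1
    length = 1
    while start + length <= len(read):
        pos = genome.find(read[start:start + length])
        if pos == -1:
            break
        best_len, best_pos = length, pos
        length += 1
    return best_len, best_pos

def seed_search(genome, read):
    """Cover the read with consecutive seeds as (read_offset, genome_pos, length)."""
    seeds = []
    i = 0
    while i < len(read):
        length, pos = maximal_mappable_prefix(genome, read, i)
        if length == 0:
            i += 1  # base absent from the genome: skip it
            continue
        seeds.append((i, pos, length))
        i += length  # restart the search from the first unmapped base
    return seeds

# Hypothetical mini-genome: exon 1, a GT...AG intron, exon 2.
exon1, intron, exon2 = "ACGTACGGTC", "GTAAGCCCCCAG", "TTGACCATGA"
genome = exon1 + intron + exon2
# A read covering the end of exon 1 and the start of exon 2.
read = exon1[-6:] + exon2[:6]

print(seed_search(genome, read))  # [(0, 4, 6), (6, 22, 6)]
```

The first seed ends at genome position 10 and the second starts at 22, so the gap between them matches the 12-base intron: this is how a splice junction falls out of the search without any prior annotation. Clustering and stitching the seeds into a final alignment is left out here.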
Your second favorite tool: TopHat
54% of you chose TopHat as your favorite RNA-seq aligner.
TopHat aligns RNA-seq reads to mammalian-sized genomes using the short-read aligner Bowtie, and then analyzes the mapping results to discover splice sites de novo.
The TopHat pipeline. RNA-Seq reads are mapped against the whole reference genome, and those reads that do not map are set aside.
TopHat has been widely used in RNA-seq protocols and is often paired with Cufflinks for a full analysis of sequencing data (Trapnell et al.). First released in 2009, TopHat was updated to TopHat2 in 2013 and has since been progressively replaced by HISAT.
Bronze medal for HISAT
HISAT completes the podium, chosen by 30% of voters.
HISAT (and its newer version, HISAT2) is the next-generation spliced aligner from the same group that developed TopHat.
HISAT uses an indexing scheme based on the Burrows-Wheeler transform and the Ferragina-Manzini (FM) index. It employs two types of index for alignment: a whole-genome FM index to anchor each alignment, and numerous local FM indexes for very rapid extension of these alignments.
HISAT's most interesting features include its high speed and its low memory requirement.
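For readers curious about what an FM index actually does, here is a minimal Python sketch of exact-match lookup by backward search over a Burrows-Wheeler transform. It builds a single small index in a naive way and is only meant to show the principle; it does not reproduce HISAT's hierarchy of global and local indexes or its memory-efficient data structures. All names and the example sequence are ours.

```python
from collections import Counter

def build_fm_index(text):
    """Build a toy FM index: suffix array, C table, and occurrence table."""
    text += "$"  # sentinel, sorts before A, C, G and T
    # Naive suffix array construction; fine for a demo, far too slow for a genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)
    # c_table[c]: number of characters in the text strictly smaller than c.
    counts = Counter(bwt)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, c_table, occ

def find(pattern, index):
    """Return the sorted start positions of exact matches of pattern."""
    sa, c_table, occ = index
    if not pattern:
        return []
    lo, hi = 0, len(sa)  # current suffix array interval, half-open
    for ch in reversed(pattern):  # backward search: last character first
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))

index = build_fm_index("ACGTACGGTC")
print(find("ACG", index))  # [0, 4]
print(find("GG", index))   # [6]
print(find("TT", index))   # []
```

Each character of the pattern costs one table lookup, regardless of how large the indexed text is, which is why this family of indexes suits fast read alignment.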
Alignment speed of spliced alignment software for 20 million simulated 100-bp reads.
HISAT is open-source software freely available at http://www.ccb.jhu.edu/software/hisat/.
Pär G Engström et al. (2013). Systematic evaluation of spliced alignment programs for RNA-seq data. Nature Methods.
Cole Trapnell et al. (2009). TopHat: discovering splice junctions with RNA-Seq. Bioinformatics.
Alexander Dobin et al. (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics.
Daehwan Kim et al. (2015). HISAT: a fast spliced aligner with low memory requirements. Nature Methods.