De novo transcriptome assembly software tools | RNA sequencing data analysis
De novo assembly of RNA-seq data enables researchers to study transcriptomes without the need for a genome sequence; this approach can be usefully applied, for instance, in research on 'non-model organisms' of ecological and evolutionary importance, cancer samples or the microbiome.
A single-cell assembler for capturing and sequencing “microbial dark matter” that forms small pools of randomly selected single cells (called a mini-metagenome) and further sequences all genomes from the mini-metagenome at once. SPAdes is intended for both standard isolates and single-cell MDA bacteria assemblies. It works with Illumina or IonTorrent reads and is capable of providing hybrid assemblies using PacBio, Oxford Nanopore and Sanger reads. Additional contigs can also be provided to be used as long reads. SPAdes supports paired-end reads, mate-pairs and unpaired reads and can take as input several paired-end and mate-pair libraries simultaneously.
Builds transcriptomes from RNA-seq data. Trinity is standalone software composed of three main components: (i) Inchworm, which first generates transcript contigs; (ii) Chrysalis, which clusters them and constructs a complete de Bruijn graph for each cluster; and (iii) Butterfly, which processes the individual graphs in parallel to reconstruct the transcript sequences.
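As a rough illustration of the de Bruijn graph structure mentioned above, the following Python sketch links each (k-1)-mer to the (k-1)-mers that follow it in the reads. It is a toy example and not Trinity's implementation; the function name build_de_bruijn and the sample reads are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read.

    Hypothetical helper for illustration only; real assemblers also track
    k-mer counts, reverse complements, and sequencing errors.
    """
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k over the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Two overlapping toy reads (made-up data).
reads = ["ACGTC", "CGTCA"]
graph = build_de_bruijn(reads, k=3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Walking such a graph from node to node spells out candidate sequences; branch points in the graph are where alternative transcripts of the same cluster diverge.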
Allows users to analyze ABySS-assembled contigs from shotgun transcriptome data. Trans-ABySS is standalone software composed of two modules: transabyss, which assembles RNA-seq data, and transabyss-merge, which merges the assemblies produced by transabyss. The application includes a gene-level expression metric based on reads aligned to contigs that can be used with or without an annotated reference genome.
Performs gene and isoform level quantification from RNA-Seq data. RSEM is a software package that quantifies gene and isoform abundances from single-end (SE) or paired-end (PE) RNA-Seq data. The software enables visualization of its output through probabilistically-weighted read alignments and read depth plots. It does not require a reference genome and thus can be useful for quantification with de novo transcriptome assemblies.
Assembles paired-end RNA-Seq data. EBARDenovo is based on a bi-directional expansion method using paired-end RNA-Seq data to guide the transcriptome assembly. The software is suited for detecting chimeric reads, created by natural gene fusion or sequence recombination, and assembly errors. Its outputs can be used for further analyses such as the identification of RNA editing sites and gene fusion candidates.
Detects variants such as single nucleotide polymorphisms (SNPs), indels and alternative splicing (AS) in transcriptomes. KisSplice is standalone software able to identify bubble patterns generated by AS events without the need for a reference genome. The application can be used to catalog AS events in various species; however, it is suited only to detecting splicing events, not to reconstructing entire transcripts.
A comprehensive and user-friendly system for computational analysis of bacterial RNA-seq data. As input, Rockhopper takes RNA sequencing reads output by high-throughput sequencing technology (FASTQ, QSEQ, FASTA, SAM, or BAM files). Rockhopper supports the following tasks: reference based transcript assembly; de novo transcript assembly; normalizing data from different experiments; quantifying transcript abundance; testing for differential gene expression; characterizing operon structures; visualizing results in a genome browser.
Consists of an iterative multiple-pass system that focuses on observed data. MIRA is a program that enables users to utilize basic algorithms for both branches of the assembly system. This tool searches for patterns on a symbolic level in an alignment to identify differences in repetitive sequences in a genome assembly. It subsequently tags the bases, allowing discrimination of repeats.
Allows de novo assembly of a transcriptome using a reference proteome. STM exploits the fact that, by translating contigs into amino acid sequences, it is possible to search for orthologous regions in a reference proteome, even when it belongs to a distantly related organism. The method can join multiple transcript fragments that are part of a single gene, providing new and valuable information on the order and orientation of these fragments along the original transcript. Multiple-k, a method that performs multiple assemblies with various k-mer lengths and retains the best part of each one to form the final assembly, is also available.
A de novo transcriptome assembler that takes advantage of techniques employed in Cufflinks to overcome limitations of existing de novo assemblers. When tested on dog, human and mouse RNA-seq data, Bridger assembled more full-length reference transcripts while reporting considerably fewer candidate transcripts, hence greatly reducing false positive transcripts in comparison with state-of-the-art assemblers. It runs substantially faster and requires much less memory than most assemblers. Bridger also reaches a level of sensitivity and accuracy comparable to that of Cufflinks.
Improves original assemblies built from stranded and/or unstranded RNA-seq data. CAFE is a high-performing transcriptome assembly pipeline that made it possible to predict the directions of about 220 billion unstranded reads, which led to the construction of more accurate transcriptome maps, comparable to the manually curated map. It should help not only to build comprehensive, precise transcriptome maps from complex genomes but also to expand the universe of non-coding genomes.
Combines statistical analysis modules into pipelines to deal with heterogenous big data. T-BioInfo is an application that can be used for: (1) next-generation sequencing (NGS) data (transcriptomics, genomics/epigenetics, and DNA/RNA); (2) mass-spectroscopy; (3) structural biology; and (4) data integration and modeling (virology, data association, and data mining).
Establishes a central, redistributable workbench for scientists and programmers working with RNA-related data. The RNA workbench builds a sustainable community around it. This platform is unique in combining available tools, workflows and training material, as well as providing easy access for experimentalists. It serves as a central hub for programmers, who can easily integrate and deploy their existing or novel tools and workflows.
Provides a de novo transcriptome assembler for short RNA-seq reads. Oases assembles unmapped RNA-seq reads into full-length transcripts. It enables reconstruction with different k-values via dynamic cutoffs. Its features include an array of hash lengths, dynamic filtering of noise, resolution of alternative splicing (AS) events and merging of multiple assemblies.
Provides a de novo transcriptome assembler specifically made for RNA-Seq. SOAPdenovo-Trans is derived from the SOAPdenovo2 genome assembler and adapted for transcriptome assembly. The software processes RNA-Seq data and handles alternative splicing (AS). It uses a multiple k-mer method either to merge the resulting assemblies into one final set or to iterate over several k-mer de Bruijn graph (DBG) assemblies during contig construction.
Assembles reads to reconstruct expressed transcripts. IDBA-Tran calculates the probability that a k-mer or short simple path contains an error by using both its multiplicity and a multi-normal distribution that models the multiplicities of all k-mers in the whole component. The software also aligns reads to the transcripts to supply an estimated expression level for each transcript. It aims to improve accuracy, especially for low-expressed transcripts.
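The notion of k-mer multiplicity used above can be made concrete with a short Python sketch that counts how often each k-mer occurs in a set of reads. Note that the fixed cutoff used here to flag rare k-mers is a deliberately simplified stand-in, not IDBA-Tran's probabilistic model; the function names, the min_count parameter and the reads are hypothetical.

```python
from collections import Counter


def kmer_multiplicities(reads, k):
    """Count how many times each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def low_multiplicity_kmers(counts, min_count):
    """Return k-mers seen fewer than min_count times.

    Assumption: a fixed cutoff replaces the model-based error
    probability described in the text.
    """
    return {kmer for kmer, n in counts.items() if n < min_count}


# Toy reads (made-up data); the third read differs in its last base.
reads = ["ACGTA", "ACGTA", "ACGTT"]
counts = kmer_multiplicities(reads, k=4)
suspects = low_multiplicity_kmers(counts, min_count=2)
print(dict(counts))
print(suspects)
```

A fixed cutoff like this tends to discard genuine k-mers from low-expressed transcripts, which is the weakness that a model fitted to the multiplicities within each component is meant to address.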
Provides a Galaxy interface to RNA-seq analysis tools. Oqtans is an online platform for quantitative RNA-seq data analysis. Its integration into the Galaxy framework ensures transparent and reproducible computational analyses. This application is available in five incarnations: (i) as a cloud machine image, (ii) as a public Galaxy instance, (iii) as a git repository, (iv) in the Galaxy Toolshed, and (v) as a preconfigured share string to launch Galaxy CloudMan using sharing instance functionality.
Identifies and groups identical and near-identical reads. Fulcrum is a read collapser that returns a single consensus sequence for each group. The software aims to simplify the problem of comparing each of the N reads in a dataset to every other read in the set. It was designed to speed up de novo sequencing and assembly efforts in which an N × N comparison of reads is necessary, and can also be used as a first step in read mapping for polymorphism detection.
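The simplest case of read collapsing, grouping exactly identical reads, can be sketched in a few lines of Python. This is an illustrative toy, not Fulcrum's algorithm: it ignores near-identical reads and quality scores, and the function name collapse_identical and the reads are hypothetical.

```python
from collections import Counter


def collapse_identical(reads):
    """Collapse exact duplicate reads into (sequence, count) pairs.

    Pairs are sorted by count, highest first; sequences with equal
    counts stay in the order they were first seen.
    """
    return Counter(reads).most_common()


# Toy dataset (made-up reads).
reads = ["ACGT", "ACGT", "TTGA", "ACGT", "TTGA", "GGCC"]
for sequence, count in collapse_identical(reads):
    print(sequence, count)
```

Hashing each read once avoids comparing every read against every other read, so later steps work on the distinct sequences and their counts instead of the full dataset.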
A software tool that extends de novo transfrags and identifies novel transfrags using DNA contigs or genes of closely related species. BRANCH discovers novel exons first and then extends and joins fragmented de novo transfrags, so that the resulting transfrags are more complete.