Genome-guided transcriptome assembly software tools | RNA sequencing data analysis
A reference-based or genome-guided transcriptome assembly algorithm uses alignments of reads to the genome that are produced by a specialized spliced-alignment tool, such as TopHat2 or GSNAP, to identify clusters of reads that represent potential transcripts. It then builds transcript assemblies from these alignments. If paired-end reads are available, they improve the ability of the assembler to link together exons belonging to the same transcript.
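As a rough illustration of the first step described above, the following Python sketch groups aligned reads into clusters of overlapping alignments. The tuple layout `(chromosome, start, end)`, the function name `cluster_alignments` and the `max_gap` parameter are assumptions made for this example; real assemblers work on spliced alignments read from SAM/BAM files and use more elaborate criteria.

```python
from typing import List, Tuple

# Hypothetical, simplified alignment record: (chromosome, start, end),
# with 0-based, half-open coordinates.
Alignment = Tuple[str, int, int]
Cluster = Tuple[str, int, int, int]  # (chromosome, start, end, read count)


def cluster_alignments(alignments: List[Alignment], max_gap: int = 0) -> List[Cluster]:
    """Merge overlapping or directly adjacent alignments into clusters."""
    clusters: List[Cluster] = []
    # Sorting by chromosome, then start, lets a single pass do the merging.
    for chrom, start, end in sorted(alignments):
        if clusters and clusters[-1][0] == chrom and start <= clusters[-1][2] + max_gap:
            _, c_start, c_end, count = clusters[-1]
            # Extend the current cluster and count the read.
            clusters[-1] = (chrom, c_start, max(c_end, end), count + 1)
        else:
            # Start a new cluster on a new chromosome or after a gap.
            clusters.append((chrom, start, end, 1))
    return clusters


example = [("chr1", 100, 150), ("chr1", 140, 400), ("chr1", 1000, 1050), ("chr2", 10, 60)]
clusters = cluster_alignments(example)
for cluster in clusters:
    print(cluster)
```

Each resulting cluster is a candidate locus that an assembler could then process independently.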
Assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. Cufflinks assembles individual transcripts from RNA-seq reads that have been aligned to the genome. Because a sample may contain reads from multiple splice variants of a given gene, the software infers the splicing structure of each gene. Transcript abundances can also be quantified against a reference annotation instead of assembling the reads.
Assembles transcripts with the guidance of a reference genome. Scallop improves the quality of the transcriptomes it produces, as measured by the area under the curve. It constructs a transcriptome from a set of reads that have been aligned to a reference genome. This tool can divide the genome into regions of non-overlapping reads. It offers a large collection of customizable parameters corresponding to diverse stages of the analysis.
Builds transcriptomes from RNA-seq data. Trinity is standalone software composed of three main components: (i) Inchworm, which first generates transcript contigs; (ii) Chrysalis, which clusters them and constructs a complete de Bruijn graph for each cluster; and (iii) Butterfly, which processes the individual graphs in parallel to reconstruct the transcript sequences.
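To make the de Bruijn graph idea concrete, here is a minimal Python sketch that builds such a graph from a few short reads and walks an unbranched path to recover a contig. It is a toy illustration, not Trinity's implementation: the function names, the tiny value of `k` and the example reads are all assumptions, and sequencing errors, reverse complements and incoming-edge checks are ignored.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_de_bruijn(reads: Iterable[str], k: int) -> Dict[str, List[str]]:
    """Nodes are (k-1)-mers; every distinct k-mer adds an edge prefix -> suffix."""
    edges = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[kmer[:-1]].add(kmer[1:])
    return {node: sorted(successors) for node, successors in edges.items()}


def walk_unbranched(graph: Dict[str, List[str]], start: str) -> str:
    """Extend a sequence from `start` while each node has exactly one successor."""
    path = start
    node = start
    seen = {start}
    while len(graph.get(node, [])) == 1:
        node = graph[node][0]
        if node in seen:  # stop on a cycle instead of looping forever
            break
        seen.add(node)
        path += node[-1]  # each step contributes one new base
    return path


# Hypothetical overlapping reads drawn from the sequence ATGGCGTACC.
reads = ["ATGGCG", "GGCGTA", "CGTACC"]
graph = build_de_bruijn(reads, k=4)
contig = walk_unbranched(graph, "ATG")
print(contig)
```

In a graph with branches, for example from alternative splicing, the walk would stop at the first node with more than one successor; resolving such branches into full transcripts is the harder part of the problem.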
Serves as a transcriptome reconstruction method. Scripture is able to reconstruct a mammalian transcriptome with no prior knowledge of gene annotations. It exploits longer reads that span splice junctions to link discontiguous (spliced) segments. This software identifies short but strongly expressed transcripts, more weakly expressed transcripts supported by aggregate evidence, and precise gene structures for most of the lincRNA loci found.
Enables reconstruction of a transcriptome from RNA-seq reads. StringTie uses a genome-guided transcriptome assembly approach along with concepts from de novo genome assembly to improve transcript assembly. The tool proceeds in three steps: it first groups the reads into clusters, then creates a splice graph for each cluster from which it identifies transcripts, and finally creates a separate flow network for each transcript to estimate its expression level.
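The splice-graph step can be sketched in Python as follows. Exons are nodes, junctions are weighted edges, and the sketch returns the path with the highest total weight as a candidate transcript. Choosing the single heaviest path by dynamic programming is an assumption made for illustration and should not be read as StringTie's exact procedure; the flow-network step is omitted, and all names and numbers below are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Exon = Tuple[int, int]  # (start, end) on one chromosome and strand


def heaviest_path(
    exons: List[Exon],
    coverage: Dict[Exon, float],
    junctions: Dict[Tuple[Exon, Exon], float],
) -> Tuple[List[Exon], float]:
    """Return the exon chain with the largest summed exon and junction weight.

    Assumes every junction goes from an upstream exon to a downstream one,
    so the graph is acyclic and genomic order is a topological order.
    """
    order = sorted(exons)
    best: Dict[Exon, float] = {e: coverage[e] for e in order}
    prev: Dict[Exon, Optional[Exon]] = {e: None for e in order}
    for exon in order:
        for (src, dst), weight in junctions.items():
            if src != exon:
                continue
            candidate = best[src] + weight + coverage[dst]
            if candidate > best[dst]:
                best[dst] = candidate
                prev[dst] = src
    # Trace back from the best-scoring end point.
    node: Optional[Exon] = max(order, key=lambda e: best[e])
    score = best[node]
    path: List[Exon] = []
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1], score


a, b, c, d = (100, 200), (300, 400), (500, 600), (700, 800)
coverage = {a: 10, b: 2, c: 8, d: 9}
junctions = {(a, b): 2, (a, c): 8, (b, d): 2, (c, d): 7}
path, score = heaviest_path([a, b, c, d], coverage, junctions)
print(path, score)
```

Here the chain that skips exon `b` wins because both its exons and its junctions carry more read support.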
Allows users to run an RNA-sequencing pipeline based on the TopHat, Cufflinks and CummeRbund suite of software. Tuxedo is a program that enables assessment of alternative splicing (AS) inferred from fragments per kilobase of transcript per million mapped fragments (FPKM) values. It can help researchers detect genes and splicing variants and compare gene expression and transcripts under different conditions.
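For readers unfamiliar with the unit, FPKM normalizes a fragment count by transcript length (in kilobases) and by sequencing depth (in millions of mapped fragments). The short Python sketch below applies that definition directly; the function name and the example numbers are illustrative assumptions and are not taken from any of the tools listed here.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total fragment count must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical transcript: 500 fragments, 2,000 bp long, 10 million mapped fragments in total.
value = fpkm(500, 2_000, 10_000_000)
print(value)
```

With these numbers the transcript has 250 fragments per kilobase, which at a depth of 10 million mapped fragments corresponds to an FPKM of 25.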
A comprehensive and user-friendly system for computational analysis of bacterial RNA-seq data. As input, Rockhopper takes RNA sequencing reads output by high-throughput sequencing technology (FASTQ, QSEQ, FASTA, SAM, or BAM files). Rockhopper supports the following tasks: reference-based transcript assembly; de novo transcript assembly; normalizing data from different experiments; quantifying transcript abundance; testing for differential gene expression; characterizing operon structures; and visualizing results in a genome browser.
A method aimed at using RNA-Seq short reads to build de novo gene models. First, candidate exons are built directly from the positions of the reads mapped on the genome (without any ab initio assembly of the reads), and all the possible splice junctions between those exons are tested against unmapped reads. The testing of junctions is directed by the information available in the RNA-Seq dataset rather than a priori knowledge about the genome. Exons can thus be chained into stranded gene models.
Predicts splice graphs from RNA-Seq and expressed sequence tag (EST) data in order to enhance existing gene annotations. SpliceGrapher includes modules for recognizing alternative splicing (AS) events and for viewing predicted splice graphs along with the evidence used to construct them. It can make predictions for genes that have low read coverage. This tool helps users resolve AS events that are otherwise hard to detect from short-read data.
A probabilistic method for transcriptome assembly built on a Bayesian model of the RNA sequencing process. Under this model, samples from the posterior distribution over transcripts and their abundance values are obtained using Gibbs sampling. By using the frequency at which transcripts are observed during sampling to select the final assembly, the method shows marked improvements in sensitivity and precision over state-of-the-art assemblers on both simulated and real data.
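The selection step described above can be pictured with a small Python sketch: given the sets of transcripts present in successive samples, keep those that appear often enough. The sampler itself is not shown, and the function name, the `min_frequency` threshold of 0.5 and the transcript identifiers are assumptions for illustration only.

```python
from collections import Counter
from typing import List, Set


def select_by_frequency(samples: List[Set[str]], min_frequency: float = 0.5) -> List[str]:
    """Keep transcripts observed in at least `min_frequency` of the samples."""
    if not samples:
        return []
    counts = Counter(transcript for sample in samples for transcript in sample)
    total = len(samples)
    return sorted(t for t, n in counts.items() if n / total >= min_frequency)


# Hypothetical output of four sampling iterations for one locus.
samples = [{"t1", "t2"}, {"t1"}, {"t1", "t3"}, {"t1", "t2"}]
selected = select_by_frequency(samples)
print(selected)
```

Raising the threshold trades sensitivity for precision, since rarely sampled transcripts are dropped first.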
Improves original assemblies built from stranded and/or unstranded RNA-seq data. CAFE is a high-performing transcriptome assembly pipeline that made it possible to predict the directions of about 220 billion unstranded reads, which led to the construction of more accurate transcriptome maps, comparable to the manually curated map. It should help not only to build comprehensive, precise transcriptome maps from complex genomes but also to expand the universe of non-coding genomes.
A genome-guided transcriptome assembler for RNA-seq data. TransComb can assemble all transcripts from short paired-end reads using a reference genome and analyze their abundances. It was developed based on a junction graph, weighted by a bin-packing strategy and paired-end information. A designed extension method based on weighted junction graphs can accurately extract paths representing expressed transcripts, whether they have low or high expression levels. Tested on both simulated and real datasets, TransComb demonstrates significant improvements in both recall and precision over leading assemblers, including StringTie, Cufflinks, Bayesembler, and Traph.
A framework for genome-based transcript reconstruction and quantification. CIDANE is engineered not only to assemble RNA-seq reads ab initio, but also to make use of the growing annotation of known splice sites, transcription start and end sites, or even full-length transcripts, available for most model organisms. To some extent, CIDANE is able to recover splice junctions that are invisible to existing bioinformatics tools.
Establishes a central, redistributable workbench for scientists and programmers working with RNA-related data. The RNA workbench builds a sustainable community around it. This platform is unique in combining available tools, workflows and training material, as well as providing easy access for experimentalists. It serves as a central hub for programmers, who can easily integrate and deploy their existing or novel tools and workflows.
Provides assembled and annotated Chinese hamster ovary (CHO) sequences with information derived from related organisms. UnoSeq is a bioinformatics pipeline developed to show that expression profiling in organisms lacking any genome or transcriptome sequence information is feasible using Illumina’s mRNA-seq technology. This pipeline was applied to the analysis of CHO cells, chosen as a model system owing to their relevance in the production of therapeutic proteins.
Analyzes raw data from RNA-seq experiments effectively and outputs results in a manner that is compatible with a wide variety of specialized downstream analyses on desktop computers. RNASEQR yields more accurate estimates for gene expression, complete gene structures and new transcript isoforms, as well as more accurate detection of single nucleotide variants (SNVs).
Allows creation and analysis of transcriptome maps. TRAM is a map-centred transcriptome analysis tool that integrates original methods for parsing, normalizing, mapping and statistically analyzing expression data. The software can identify chromosomal segments and gene clusters that are biologically relevant to cell differentiation toward the megakaryocyte phenotype. It can also summarize gene expression data for unmapped genes and make them available for analysis.
Allows isoform discovery and abundance estimation. SLIDE is a sparse linear model approach that uses RNA-Seq data to discover mRNA isoforms given an extant annotation of gene and exon boundaries, and to estimate the abundance of the discovered or other specified mRNA isoforms. The software can be used as a downstream isoform discovery tool of de novo gene and exon assembly algorithms. It can be extended to incorporate mRNA isoform information from EST (21), CAGE (19), and RACE (18) data.
Aims to reduce the efforts put into basic data processing for next-generation sequencing (NGS). QuickNGS enables data analysis for major applications of NGS in a batch-like operation mode. This pipeline relies on the organization of available metadata in a MySQL database which is used to control the overall workflow composed of specific software applications for different kinds of analysis.