Improves power, and models and accounts for the over-dispersion inherent in RNA-seq data. EAGLE provides a flexible framework for modeling the influence of both technical and biological factors while accounting for extra-binomial variation in sequencing data. This R package implements a method to test for gene-environment (GxE) interactions using allele-specific expression (ASE). It uses a binomial generalized linear mixed model (GLMM), predicting the relative number of RNA-seq reads from each allele at exonic, heterozygous loci under different environmental conditions.
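As a rough illustration of what "extra-binomial variation" means, the sketch below computes a Pearson dispersion statistic for allelic read counts. It is not EAGLE's GLMM; the function name `pearson_dispersion` and the assumption of a single shared allelic ratio across observations are hypothetical simplifications chosen for the example. Values near 1 are consistent with plain binomial sampling, while values well above 1 suggest over-dispersion.

```python
def pearson_dispersion(ref_counts, total_counts):
    """Pearson chi-square per degree of freedom under one shared allelic ratio.

    Hypothetical helper for illustration only; not part of EAGLE.
    """
    if len(ref_counts) != len(total_counts) or len(ref_counts) < 2:
        raise ValueError("need at least two observations with matching lengths")
    # Pooled estimate of the reference-allele proportion
    p = sum(ref_counts) / sum(total_counts)
    if p <= 0.0 or p >= 1.0:
        raise ValueError("pooled proportion must lie strictly between 0 and 1")
    # Squared residuals scaled by the binomial variance n * p * (1 - p)
    chi2 = sum(
        (k - n * p) ** 2 / (n * p * (1 - p))
        for k, n in zip(ref_counts, total_counts)
    )
    return chi2 / (len(ref_counts) - 1)


if __name__ == "__main__":
    # Counts that swing between extremes give a statistic far above 1
    print(pearson_dispersion([0, 10, 0, 10], [10, 10, 10, 10]))
```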
Reconstructs a consensus transcriptome from a collection of individual assemblies. TACO is an algorithm that employs change point detection to break apart complex loci and correctly delineate transcript start and end sites, and a dynamic programming approach to assemble transcripts from a network of splicing patterns. It also contains an easy-to-use companion tool for comparing meta-assemblies to reference transcriptomes, assessing overlap with the reference as well as protein-coding potential.
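The general idea of change point detection on a coverage profile can be sketched as follows. This is a minimal least-squares search for a single change point, assuming a list of per-position coverage values; it is not TACO's actual procedure, and `best_change_point` is a hypothetical name used only for this example.

```python
def sum_squared_error(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_change_point(coverage):
    """Return the index that best splits coverage into two flat segments.

    Illustrative only: a single change point chosen by minimizing the
    combined squared error of the left and right segments.
    """
    if len(coverage) < 2:
        raise ValueError("need at least two positions")
    best_index, best_cost = 1, float("inf")
    for i in range(1, len(coverage)):
        cost = sum_squared_error(coverage[:i]) + sum_squared_error(coverage[i:])
        if cost < best_cost:
            best_index, best_cost = i, cost
    return best_index


if __name__ == "__main__":
    # Coverage jumps between positions 2 and 3 (0-based)
    print(best_change_point([2, 2, 2, 9, 9, 9]))
```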
Provides internal controls that can assess almost all stages of the RNA-seq workflow. Sequins supports library preparation, sequencing, split-read alignment, transcript assembly, gene expression and alternative splicing. This software is appropriate for evaluating downstream bioinformatic steps and improving parameter choices, and the controls can be used as normalization factors to compare multiple samples.
Identifies both protein-coding and non-coding indicators. TROM performs a comprehensive transcriptome mapping for diverse tissues and cell types within and across four mammalian species. It also provides a useful resource of conserved cell-state associated transcription factors, RNA-binding proteins and lncRNAs, which characterize transcriptomes of various cell states and enable researchers to explore new hypotheses in developmental biology.
A Java program for the automated detection and classification of transcription start sites (TSS) from RNA-seq data. TSSpredator reads RNA-seq data in the form of simple wiggle files and performs a genome-wide comparative prediction of TSS, for example between different growth conditions.
A fast and accurate approach for phasing variants that are overlapped by sequencing reads, including those from RNA-sequencing (RNA-seq), which often span multiple exons due to splicing. phASER provides 1) dramatically more accurate phasing of rare and de novo variants compared to population-based phasing; 2) phasing of variants in the same gene up to hundreds of kilobases away which cannot be obtained from DNA-sequencing reads; 3) high confidence measures of haplotypic expression, greatly improving power for allelic expression studies.
Identifies transcription units (TUs) from RNA-seq data of any bacterium using a machine-learning approach. SeqTU can serve as the baseline information for studying transcriptional and post-transcriptional regulation in C. thermocellum and other bacteria.
A computational method to assess a quantitative measure of mRNA integrity. This is done by quantitatively modeling the 3' bias of read coverage profiles along each mRNA transcript. A per-sample summary, mRIN, is then derived as an indicator of mRNA degradation. This method has been used for systematic analysis of large-scale RNA-Seq data from postmortem tissues, in which RNA degradation during tissue collection is a particular issue.
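To make the notion of 3' bias concrete, the sketch below scores a coverage profile by how far its center of mass sits from the transcript midpoint. This is a deliberately simple stand-in, not the statistical model used by mRIN; the function name `three_prime_bias` and the center-of-mass score are assumptions made for illustration. Coverage is assumed to be listed from the 5' end to the 3' end.

```python
def three_prime_bias(coverage):
    """Center of mass of coverage minus 0.5, on a relative 0..1 transcript axis.

    Illustrative score only: 0 for uniform coverage, positive when reads
    pile up toward the 3' end, negative when they pile up toward the 5' end.
    """
    total = sum(coverage)
    if not coverage or total <= 0:
        raise ValueError("coverage must be non-empty with a positive total")
    n = len(coverage)
    # Use the midpoint of each position as its relative coordinate
    center = sum(c * (i + 0.5) / n for i, c in enumerate(coverage)) / total
    return center - 0.5


if __name__ == "__main__":
    print(three_prime_bias([1, 1, 1, 1]))   # uniform profile
    print(three_prime_bias([0, 0, 0, 4]))   # all reads at the 3' end
```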
Offers a statistical model for counts of RNA-Seq data. mseq is an R package that gathers an iterative GLM procedure for the Poisson linear model, a training procedure for the multiple additive regression trees (MART) model, and cross-validation for both methods. It can model non-uniformity in short-read rates with the aim of improving the estimation of gene and isoform expression for both Illumina and Applied Biosystems data.
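An iterative GLM fit for a Poisson model is commonly done by iteratively reweighted least squares (IRLS). The sketch below shows that generic technique for a single covariate with a log link, in pure Python. It is not mseq's code or its model specification; `fit_poisson_loglinear` and the one-covariate setup are hypothetical choices for the example.

```python
import math


def fit_poisson_loglinear(x, y, n_iter=25):
    """Fit log(mu_i) = b0 + b1 * x_i by iteratively reweighted least squares.

    Generic IRLS sketch for illustration; assumes x has at least two
    distinct values and y has a positive mean.
    """
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(n_iter):
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        # Working response for the log link; the working weights equal mu
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        sw = sum(mu)
        swx = sum(m * xi for m, xi in zip(mu, x))
        swxx = sum(m * xi * xi for m, xi in zip(mu, x))
        swz = sum(m * zi for m, zi in zip(mu, z))
        swxz = sum(m * xi * zi for m, xi, zi in zip(mu, x, z))
        # Solve the 2x2 weighted least-squares normal equations
        det = sw * swxx - swx * swx
        b0 = (swxx * swz - swx * swxz) / det
        b1 = (sw * swxz - swx * swz) / det
    return b0, b1


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [math.exp(1.0 + 0.5 * xi) for xi in xs]  # noise-free toy data
    print(fit_poisson_loglinear(xs, ys))
```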
Identifies branch points in complex genomes. LaSSO is an algorithm that provides an approach to detect lariat intermediates and to map branch points on a genomic scale. This tool can identify additional cryptic or alternative splice sites by analyzing an intronic sequence together with its corresponding upstream or downstream exon sequence. Moreover, it can be applied to spot novel splicing events by partitioning the genome with a sliding window while ignoring known annotations.
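Partitioning a sequence with a sliding window can be sketched in a few lines. The function below only generates window coordinates; the window size, step, and the name `sliding_windows` are hypothetical and are not taken from LaSSO.

```python
def sliding_windows(sequence_length, window_size, step):
    """Return half-open (start, end) windows covering 0..sequence_length.

    Illustrative helper: the last window is truncated at the sequence end.
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    windows = []
    start = 0
    while start < sequence_length:
        end = min(start + window_size, sequence_length)
        windows.append((start, end))
        if end == sequence_length:
            break  # the sequence end has been reached
        start += step
    return windows


if __name__ == "__main__":
    print(sliding_windows(10, 4, 3))
```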
Provides a data-driven solution to test the assumptions of global normalization methods. Group level information about each sample (such as tumor/normal status) must be provided because the test assesses if there are global differences in the distributions between the user-defined groups.
Captures all k-mer variation in an input set of RNA-seq libraries. DE-kupl is a k-mer-based computational protocol that has four main components: (1) indexing, (2) filtering and masking, (3) differential expression (DE) and (4) extending and annotating. The software directly analyzes the contents of the raw FASTQ files, displacing mapping to the final stage of the procedure. It is able to detect a wide range of differential transcription and RNA processing events.
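The indexing step of a k-mer-based approach amounts to counting every length-k substring of the raw reads. The sketch below does this naively in memory; it is not DE-kupl's implementation, and `count_kmers` plus the choice to skip k-mers containing `N` are assumptions made for the example.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count all k-mers across an iterable of read sequences.

    Illustrative in-memory counter; k-mers containing the ambiguous
    base N are skipped (an assumption of this sketch).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


if __name__ == "__main__":
    print(count_kmers(["ACGTAC", "CGTACG"], 3).most_common(3))
```

Per-sample count tables built this way could then be compared between conditions, which is where a differential expression test would come in.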
Allows identification of multiple gene sets that play a role in the characterization, clinical application, or functional relevance of a disease phenotype. GISPA is designed to characterize the molecular tumor profile of a single sample relative to other comparison samples, based on changes (increasing/decreasing) among several diverse, genome-wide data types. A user-friendly interface, shinyGISPA, was also developed to combine and compare multiple levels of data, from genomic to proteomic.
Allows users to analyze U-indel RNA editing in non-model species with no prior data available. T-Aligner is a read mapping and assembly tool that fits multiple potential edited open reading frames (ORFs) from shotgun reads mapped to each cryptogene. The application enables the read mapping and visualization of the totality of the editing states and their coverage as well as the assembly of canonical and alternative translatable mRNAs.
A tool designed to simultaneously uncover patterns of focal copy number alteration and coordinated expression change, thus combining both principles. FocalScan outputs a ranking of tentative cancer drivers or suppressors. FocalScan works with RNA-seq data, and unlike other tools it can scan the genome unaided by a gene annotation, enabling identification of novel putatively functional elements including lncRNAs. Application on a breast cancer data set suggests considerably better performance than other DNA/RNA integration tools.
Improves the predictive performance of ordinary logistic ridge regression and the group lasso. GRridge allows the use of multiple sources of co-data (e.g. external p-values, gene lists, annotation) to improve prediction of binary, continuous and survival responses using (logistic, linear or Cox) group-regularized ridge regression. It also facilitates post-hoc variable selection and prediction diagnostics by cross-validation using ROC curves and AUC.
Evaluates the trustworthiness of each gene's expression level individually after the read alignment step. GeneQC is a package that consists of two main components: feature extraction and modeling. The application estimates mapping uncertainty by using feature extraction, elastic-net regularization, and mixture model fitting. It can be added to pipelines to complement RNA-Seq data analysis and to assist users in planning further analyses.
Resolves conflicts due to repeated sequences in RNA. Barnacle is a pipeline for detecting and characterizing chimeric transcripts from long RNA sequences, such as those generated by de novo transcriptome assembly. It identifies sequences with a variety of anomalous alignment topologies, predicts partial tandem duplications (PTDs), internal tandem duplications (ITDs), and fusions from these sequences, and measures the coverage of the inferred chimeric transcripts relative to corresponding wild-type transcripts.
Calculates a filtering threshold for replicated RNA sequencing data. HTSFilter provides an intuitive data-driven way to filter RNA-seq data and to effectively remove genes with low constant expression levels. HTSFilter may be useful in a variety of applications for RNA-seq data, including differential expression analyses, clustering and co-expression analyses, and network inference.
Rapid and quantitative metrics for evaluating structure probing data quality. SPEQC uses metrics to rapidly and quantitatively evaluate data quality from structure probing experiments, demonstrating their efficacy on both small synthetic libraries and transcriptome-wide datasets. A signal-to-noise ratio concept evaluates replicate agreement, which has the capacity to identify high-quality data. The developed metrics and tools will be useful in summarizing large-scale datasets and will help standardize quality control in the field.