Sequins / Sequencing spike-ins

A set of spike-in RNA standards that represent full-length spliced mRNA isoforms. Sequins have an entirely artificial sequence with no homology to natural reference genomes, but they align to gene loci encoded on an artificial in silico chromosome. The combination of multiple sequins across a range of concentrations emulates alternative splicing and differential gene expression, and it provides scaling factors for normalization between samples. Sequins offer a simple and effective way to assess the NGS workflow, calculate diagnostic statistics, provide internal reference ladders, and normalize between multiple samples.
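As a rough illustration of the reference-ladder idea, the sketch below fits a straight line between the known input concentrations of spike-ins and their measured abundances on a log2 scale. It is not taken from the Sequins software: the function name `fit_ladder`, the input format, and the choice of ordinary least squares are all assumptions made for this example.

```python
import math


def fit_ladder(expected_conc, observed_abundance):
    """Least-squares fit of log2(observed) against log2(expected).

    Hypothetical helper for illustration only; returns (slope, intercept).
    A slope near 1 suggests measured abundance scales linearly with input.
    """
    if len(expected_conc) != len(observed_abundance):
        raise ValueError("inputs must have the same length")
    xs = [math.log2(c) for c in expected_conc]
    ys = [math.log2(o) for o in observed_abundance]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct concentrations")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up ladder: four spike-ins at doubling concentrations.
slope, intercept = fit_ladder([1, 2, 4, 8], [10, 20, 40, 80])
```

Under these assumptions, the difference between the intercepts fitted in two samples could serve as a log2 scaling factor between them, although real spike-in analyses typically involve more careful modeling.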

V'DJer

Software for inferring B-cell receptor (BCR) repertoires from short-read RNA sequencing data. V'DJer uses customized read extraction, assembly, and V(D)J rearrangement detection and filtering to produce contigs representing the most abundant portions of the BCR. V'DJer allows for full inference of repertoire characteristics, including variable and joining gene segment usage, population diversity, sequence sharing between populations, antigen binding region amino acid properties and motifs, clonal structure, and somatic hypermutation in BCR repertoires.

FocalScan

A tool designed to simultaneously uncover patterns of focal copy number alteration and coordinated expression change, thus combining both principles. FocalScan outputs a ranking of tentative cancer drivers or suppressors. FocalScan works with RNA-seq data and, unlike other tools, can scan the genome unaided by a gene annotation, enabling identification of novel putatively functional elements, including lncRNAs. Application to a breast cancer data set suggests considerably better performance than other DNA/RNA integration tools.

GISPA / Gene Integrated Set Profile Analysis

Combines and compares several genome-wide data types from three or more sample classes in order to find the drivers of each class. GISPA produces ranked gene sets within the context of an a priori specified molecular profile, such as genes that have some combination of increased CpG methylation, copy number (CN) loss, and decreased gene expression (GE) specific to a single sample or class. Sample Integrated Set Profile Analysis (SISPA), a variation of GISPA, finds samples within the context of a similar a priori multidimensional profile from a gene set of interest, either GISPA-defined or user-defined. GISPA and SISPA derive results from a combined analysis of all data types; both are non-parametric, so they do not rely upon imposed analytical distributions and, crucially, do not require a large sample size.

Seq2pathway

An R/Python wrapper for pathway (or functional gene-set) analysis of genomic loci, adapted for advances in genome research. Seq2pathway associates the biological significance of genomic loci with their target transcripts and then summarizes the quantified values on the gene-level into pathway scores. It is designed to isolate systematic disturbances and common biological underpinnings from next-generation sequencing (NGS) data. Seq2pathway offers Bioconductor users enhanced capability in discovering collective pathway effects caused by both coding genes and cis-regulation of non-coding elements.

ToNER / Transformation of Nucleotide Enrichment Ratios

Identifies enriched sites from differential RNA-seq experiments comprising enriched and unenriched libraries. ToNER uses a global distribution model to report enrichment statistics for all nucleotides. It calculates the position-wise normalized read-depth ratio between the two libraries for all mapped genome positions. The tool can identify transcription start sites (TSSs) from Cappable-seq data in prokaryotes. It can also locate enriched positions in complex eukaryotic data such as m6A-seq.
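The position-wise ratio described above can be sketched as follows. This is a minimal illustration rather than ToNER's implementation: the function name `normalized_depth_ratio`, normalization by total library depth, the pseudocount, and the log2 scale are all assumptions, and the global distribution model that ToNER applies to the ratios is not reproduced here.

```python
import math


def normalized_depth_ratio(enriched, unenriched, pseudocount=1.0):
    """Per-position log2 ratio of library-size-normalized read depth.

    Hypothetical helper: `enriched` and `unenriched` are read depths at the
    same genome positions, in the same order. The pseudocount (an assumption)
    avoids division by zero at uncovered positions.
    """
    if len(enriched) != len(unenriched):
        raise ValueError("depth vectors must cover the same positions")
    total_e = sum(enriched)
    total_u = sum(unenriched)
    if total_e <= 0 or total_u <= 0:
        raise ValueError("each library needs a positive total depth")
    ratios = []
    for e, u in zip(enriched, unenriched):
        norm_e = (e + pseudocount) / total_e  # depth relative to library size
        norm_u = (u + pseudocount) / total_u
        ratios.append(math.log2(norm_e / norm_u))
    return ratios


# Made-up depths: the third position is strongly enriched.
ratios = normalized_depth_ratio([10, 10, 80], [10, 10, 10])
```

Positions with the largest ratios would be candidate enriched sites; deciding which ratios are statistically significant is the part handled by the tool's distribution model.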

EVtransfer

Allows the study of extracellular vesicle (EV) mediated mRNA transfer between cells. EVtransfer also investigates the role of exosomes as a vehicle in mediating the exchange. The software enables quality control, alignment, mapping, and base call recalibration on raw SNP array and RNA sequencing read data, evaluation of the significance of genotypic variation of a cell line under in vitro co-culture, and estimation of the rate of falsely discovered loci involved in the transfer process.

RNASeq-er

Provides access to the results of a systematic, continually updated and growing analysis of public RNA-seq data in the European Nucleotide Archive (ENA). RNASeq-er enables ontology-powered search for and retrieval of CRAM, bigWig and bedGraph files, gene and exon expression quantification matrices, and sample attributes annotated with ontology terms. It provides access to baseline gene expression quantifications, aggregated across all runs in each of over 4000 normal tissue, cell type, developmental stage, sex, and strain conditions in 61 species.

ROP / Read Origin Protocol

A computational protocol that aims to discover the source of all reads, including those that originate from complex RNA molecules, recombinant antibodies and microbial communities. ROP accounts for 98.8% of all reads across poly(A) and ribo-depletion protocols, compared to 83.8% by conventional reference-based protocols. ROP profiles repeats, circRNAs, gene fusions, trans-splicing events, recombined B/T-cell receptor sequences and microbial communities. The "dumpster diving" profile of unmapped reads output by ROP is not limited to RNA-seq technology and may be applied to whole-exome and whole-genome sequencing.

phASER / phasing and Allele Specific Expression from RNA-seq

A fast and accurate approach for phasing variants that are overlapped by sequencing reads, including those from RNA-sequencing (RNA-seq), which often span multiple exons due to splicing. phASER provides 1) dramatically more accurate phasing of rare and de novo variants compared to population-based phasing; 2) phasing of variants in the same gene up to hundreds of kilobases away which cannot be obtained from DNA-sequencing reads; 3) high confidence measures of haplotypic expression, greatly improving power for allelic expression studies.

TcellClonality

An R script for processing MiTCR-derived CDR3 data from Peripheral T Cell Lymphoma (PTCL) RNA-seq. TcellClonality removes non-productive CDR3 sequences, calculates the relative abundance of each CDR3, resolves ambiguity in CDR3 chain assignment, and classifies CDR3 clonotypes as dominant or background (using control samples to determine the background level). It then calculates Shannon entropy and estimates tumor purity for each sample. Finally, it includes code for analyzing T Cell Receptor (TCR) C gene expression. For that analysis, RSEM v1.2.29 was used to calculate gene expression levels for all transcripts, and transcripts from TCR C genes were extracted.
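The relative-abundance and Shannon entropy steps can be illustrated with a short sketch. TcellClonality itself is an R script, so the Python below is not its code: the function names, the dictionary input format, and the use of log base 2 are assumptions made for this example.

```python
import math


def relative_abundance(cdr3_counts):
    """Convert raw per-clonotype read counts into fractions summing to 1."""
    total = sum(cdr3_counts.values())
    if total <= 0:
        raise ValueError("total CDR3 count must be positive")
    return {cdr3: n / total for cdr3, n in cdr3_counts.items()}


def shannon_entropy(freqs):
    """Shannon entropy in bits (log base 2 is an assumption here).

    Low entropy means a few clonotypes dominate the sample; high entropy
    means reads are spread evenly over many clonotypes.
    """
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


# Made-up CDR3 amino acid sequences and counts, for illustration only.
counts = {"CASSLGQAYEQYF": 25, "CASSPDRGNTEAFF": 25,
          "CASSQETQYF": 25, "CASRGGNQPQHF": 25}
entropy = shannon_entropy(relative_abundance(counts))
```

Four equally abundant clonotypes give an entropy of 2 bits, while a sample with a single clonotype gives 0, which is why entropy is a convenient summary of clonality.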

SPEQC / Structure Probing Experiment Quality Control

Provides rapid, quantitative metrics for evaluating the quality of structure probing data. The efficacy of the SPEQC metrics has been demonstrated on both small synthetic libraries and transcriptome-wide datasets. A signal-to-noise ratio concept evaluates replicate agreement and can identify high-quality data. The metrics and tools will be useful in summarizing large-scale datasets and will help standardize quality control in the field.

EpiDenovo

Investigates developmental epigenomes and transcriptomes that are related to de novo mutations (DNMs) in developmental disorders. EpiDenovo is a database for exploring the associations between embryonic epigenetic regulation and DNMs in developmental disorders, including neuropsychiatric disorders and congenital heart disease. This resource is based on publicly available chromatin immunoprecipitation sequencing (ChIP-seq) and chromatin accessibility data covering the embryonic development of mammals, including humans and mice.

EAGLE / Environment-ASE through Generalized LinEar modeling

Improves power while modeling and accounting for the over-dispersion inherent in RNA-seq data. EAGLE provides a flexible framework for modeling the influence of both technical and biological factors while accounting for extra-binomial variation in sequencing data. This R package implements a method to test for gene-environment (GxE) interactions using allele-specific expression (ASE). It uses a binomial generalized linear mixed model (GLMM), predicting the relative number of RNA-seq reads from each allele at exonic, heterozygous loci under different environmental conditions.

DeepBound

Identifies boundaries of expressed transcripts from RNA-seq read alignments. DeepBound employs deep convolutional neural fields to learn the hidden distributions and patterns of boundaries. It can be used to detect transcript boundaries in RNA-seq experiments, and the results can be further applied to correct transcript abundance estimation and to study alternative splicing. The framework is general: it can be extended with other features, and trained and applied to other species.

HapIso / Haplotype-specific Isoform Reconstruction

Reconstructs the haplotype-specific isoforms from long single-molecule reads. HapIso is a comprehensive method for the accurate reconstruction of the haplotype-specific isoforms of a diploid cell that uses the splice mapping of the long single-molecule reads and partitions the reads into parental haplotypes. To overcome gapped coverage and splicing structures of the gene, the haplotype reconstruction procedure is applied independently for regions of contiguous coverage defined as transcribed segments.

rnaSeqAssumptions

Contains simulation code for evaluating RNA-Seq normalization methods when assumptions are violated. rnaSeqAssumptions code is implemented in R and is freely available for download. Normalization methods were examined from the perspective of their assumptions, as an understanding of methodological assumptions is necessary for choosing methods appropriate for the data at hand. Normalization methods perform poorly when their assumptions are violated and this causes problems in subsequent analysis. To analyze a biological experiment, researchers must select a normalization method with assumptions that are met and that produces a meaningful measure of expression for the given experiment.

LaSSO / Lariat Sequence Site Origin

Identifies branch points in complex genomes. LaSSO is an algorithm that provides an approach to detect lariat intermediates and to map branch points on a genomic scale. The tool can identify additional cryptic or alternative splice sites by analyzing an intronic sequence together with its corresponding upstream or downstream exon sequence. Moreover, it can be applied to spot novel splicing events by partitioning the genome with a sliding window while ignoring known annotations.

Barnacle / Browsing Assembled RNA for Chimeras with Localized Evidence

Resolves conflicts due to repeated sequences in RNA. Barnacle is a pipeline for detecting and characterizing chimeric transcripts from long RNA sequences, such as those generated by de novo transcriptome assembly. It identifies sequences with a variety of anomalous alignment topologies, predicts partial tandem duplications (PTDs), internal tandem duplications (ITDs), and fusions from these sequences, and measures the coverage of the inferred chimeric transcripts relative to corresponding wild-type transcripts.