Analyzes two types of RNA-seq data: single-cell and bulk. URSM corrects for dropout events in single-cell data and simultaneously performs deconvolution of bulk data. The single-cell and bulk data do not need to be measured on the same subjects. It can (1) obtain reliable estimates of cell-type-specific gene expression profiles; (2) infer the dropout entries in single-cell data; and (3) infer the mixing proportions of different cell types in bulk samples.
Detects differential expression (DE) in RNA-Seq experiments with a small number of replicates or with non-replicated samples in each class. LPEseq evaluates the baseline error distribution for each of the compared experimental conditions. It can be used on datasets containing replicates and is also efficient for non-replicated datasets. This tool is able to remove outliers derived from the replicates assumption between classes.
Infers relative poly(A) site usage in terminal exons from RNA sequencing data, for use with KAPAC. PAQR is composed of three modules: (1) a script to deduce transcript integrity values, (2) a script to create the coverage profiles for all considered terminal exons, and (3) a script to obtain the relative usage together with the estimated expression of poly(A) sites with sufficient evidence of usage. The software enables evaluation of 3′ end processing in data sets such as those from The Cancer Genome Atlas (TCGA).
Identifies large-scale copy-number variants (CNVs) in scRNA-seq. CONICS provides a method to separate neoplastic cells for downstream analysis. It includes algorithms to triage cells from a scRNA-seq assay, based on the presence of CNVs detected in an orthogonal DNA sequencing experiment. It integrates tumor-normal fold-changes with the minor-allele frequencies of point mutations to estimate false-discovery rates (FDRs) in CNV classification. Additionally, it includes routines to perform downstream phylogeny assessment and gene co-expression analysis.
Characterizes circRNA candidates. FUCHS provides the user with directions for further steps to investigate a circRNA’s function and biogenesis. FUCHS is able to identify alternative exon usage within the same circle boundaries, summarize the different circles emerging from the same host gene, quantify double-breakpoint fragments as an indicator of circularity, and visualize a circRNA’s read coverage profile independently of any genome browser.
Estimates the 3’ untranslated region (UTR) landscape from RNA-seq data. GETUTR has three steps: (1) preprocessing to extract all reads from the RNA-seq data, (2) smoothing via algorithms, and (3) normalization applied to all genes. The software offers three smoothing algorithms, which were tested on the average length of the 3’ UTRs they estimate and on their prediction of polyadenylation cleavage sites (PCS).
Searches nucleotide databases using a nucleotide query. BLASTN's key features are searching with short sequences and cross-species comparison. Users can select an optimization for: (i) highly similar sequences, (ii) more dissimilar sequences, or (iii) somewhat similar sequences. This web application searches sequence sets in NCBI data sources.
Searches a protein database using a translated nucleotide query. BLASTX is a BLAST search application that compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database. This application can also work in Blast2Sequences mode and can send BLAST searches over the network to the public NCBI server if desired.
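The six-frame conceptual translation mentioned above can be illustrated with a minimal Python sketch. It assumes the standard genetic code and an unambiguous DNA query; the function names are hypothetical and this is not BLASTX's own code, only the translation step that precedes the protein search.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq):
    """Translate three reading frames on each strand (hypothetical helper)."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


frames = six_frame_translations("ATGGCCTAA")
```

Each of the six resulting peptide strings would then be compared against the protein database; frame "+1" of the toy query above reads ATG GCC TAA.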
Helps users analyze DNA and protein sequence data from different species and populations. MEGA is composed of several tools allowing researchers to work on phylogenomics and phylomedicine. This software includes features for determining gene duplication events in gene family trees. Moreover, it is available through a graphical user interface (GUI) and a command line interface.
Detects head-to-tail spliced (back-spliced) sequencing reads, indicative of circular RNA (circRNA) in RNA-seq data. find_circ is a pipeline that can find circRNAs in any genomic region. It takes advantage of long (~100 nucleotides) reads, and predicts the acceptor and donor splice sites used to link the ends of the RNAs. This method provides evidence that circRNAs form an important class of post-transcriptional regulators.
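The idea of a head-to-tail read can be sketched in a few lines of Python. The sketch assumes that short anchors taken from the two ends of a read are located independently on the reference, and that a read whose tail anchor lies upstream of its head anchor is a back-splice candidate. The function name, the exact-match lookup with `str.find`, and the toy sequences are illustrative assumptions, not the find_circ implementation, which uses a read aligner and then refines the splice sites.

```python
def classify_read(read, reference, anchor_len=8):
    """Classify a read by the relative order of its two end anchors (toy model)."""
    head = read[:anchor_len]
    tail = read[-anchor_len:]
    head_pos = reference.find(head)  # exact match stands in for a real aligner
    tail_pos = reference.find(tail)
    if head_pos < 0 or tail_pos < 0:
        return "unmapped"
    if tail_pos < head_pos:
        # The end of the read maps upstream of its start: head-to-tail order.
        return "back-spliced candidate"
    return "linear"


reference = "ACGTTGCAAGGCTTACCGATGGATCCTAGA"
linear_read = reference[5:25]                     # contiguous stretch of the reference
junction_read = reference[20:30] + reference[:10]  # 3' end joined back to the 5' end

linear_call = classify_read(linear_read, reference)
junction_call = classify_read(junction_read, reference)
```

In this toy case the junction read starts near the end of the reference and finishes near its beginning, which is the reversed order a back-splice produces.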
Enables the study of spatial patterning of gene expression at the single-cell level. Seurat is an R package that enables quality control (QC), analysis, and exploration of single-cell RNA-seq data. The software includes three computational methods: (1) unsupervised clustering and discovery of cell types and states, (2) spatial reconstruction of single-cell data, and (3) integrated analysis of single-cell RNA-seq across conditions, technologies, and species. It can also localize rare subpopulations, and map both spatially restricted and scattered groups.
Finds regions of sequence similarity. PSI-BLAST is a protein database search program. The software assesses the probable substitutions at each sequence position using the results of a previous Gapped-BLAST search, an algorithm that scores alignments with an amino acid substitution matrix. It combines search results with robust statistics to build and apply profiles, also known as position-specific scoring matrices. A modified version of PSI-BLAST, PSI-BLASTexB, which addresses limitations of the sequence weighting scheme, was also developed.
Assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. Cufflinks assembles individual transcripts from RNA-seq reads that have been aligned to the genome. This software is able to infer the splicing structure of each gene because reads from multiple splice variants for a given gene can be found in a sample. Transcript abundances can also be quantified against a reference annotation instead of assembling the reads.
Performs several low-level analyses of single-cell RNA-seq data. Scran is a package that provides functions to normalize cell-specific biases, assign cell cycle phase, and detect highly variable and significantly correlated genes.
A computational approach that measures changes in mature RNA and pre-mRNA reads across different experimental conditions to quantify transcriptional and post-transcriptional regulation of gene expression. EISA reveals both transcriptional and post-transcriptional contributions to expression changes, increasing the amount of information that can be gained from RNA-seq data sets.
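A small worked example may help to show how changes in the two read classes separate the two layers of regulation. The Python sketch below assumes that the change in intronic (pre-mRNA) reads approximates the transcriptional change, so that the exonic change minus the intronic change approximates the post-transcriptional contribution. The function names, the pseudocount of 8, and the toy counts are illustrative assumptions, and the counts are assumed to be already normalized for library size.

```python
import math


def log2_fold_change(count_a, count_b, pseudocount=8.0):
    """log2 ratio of condition B over condition A, stabilized by a pseudocount."""
    return math.log2((count_b + pseudocount) / (count_a + pseudocount))


def split_contributions(exon_a, exon_b, intron_a, intron_b, pseudocount=8.0):
    """Split an expression change into two components (hypothetical helper)."""
    delta_exon = log2_fold_change(exon_a, exon_b, pseudocount)
    delta_intron = log2_fold_change(intron_a, intron_b, pseudocount)
    return {
        "transcriptional": delta_intron,
        "post_transcriptional": delta_exon - delta_intron,
    }


# Toy gene: exonic reads rise fourfold, intronic reads only twofold.
result = split_contributions(exon_a=92, exon_b=392, intron_a=24, intron_b=56)
```

Here the exonic change is 2 log2 units and the intronic change is 1, so under the stated assumption one unit of the change would be attributed to each layer.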
Allows users to quantify abundances of transcripts from RNA-Seq data and target sequences using high-throughput sequencing (HTS) reads. kallisto is based on the concept of pseudoalignment, which determines the compatibility of reads with targets. In tests, this tool was able to process over 30 million human reads using only the read sequences and a transcriptome index.
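The notion of read-target compatibility can be sketched as follows. This Python toy assumes a plain dictionary from k-mers to the set of transcripts containing them and takes the intersection of those sets across a read's k-mers. kallisto itself works on an indexed transcriptome de Bruijn graph; the function names, the tiny k, and the choice to skip k-mers missing from the index are simplifying assumptions made for illustration.

```python
from collections import defaultdict


def build_kmer_index(transcripts, k):
    """Map every k-mer to the set of transcript names that contain it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def pseudoalign(read, index, k):
    """Return the transcripts compatible with all indexed k-mers of the read."""
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if not hits:
            continue  # assumption: k-mers absent from the index are ignored
        compatible = set(hits) if compatible is None else compatible & hits
    return compatible or set()


transcripts = {"t1": "ACGTACGGA", "t2": "ACGTACCTT"}
index = build_kmer_index(transcripts, k=4)

shared = pseudoalign("CGTAC", index, k=4)   # lies in the region common to both
unique = pseudoalign("GTACGG", index, k=4)  # extends into the t1-specific part
```

No base-level alignment is computed at any point; only the set of compatible targets is recorded for each read, and quantification would proceed from those sets.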
Performs factor analysis on suitable sets of control genes or samples. RUVSeq furnishes estimations of expression fold-changes. This package implements the remove unwanted variation (RUV) methods for the normalization of RNA-Seq read counts between samples.
Aligns short reads and is geared toward mammalian re-sequencing. Bowtie uses a Burrows-Wheeler index based on the full-text minute-space (FM) index. It follows two steps: an initial, ungapped seed-finding stage that takes advantage of the speed and memory efficiency of the FM index, and a gapped extension stage that employs dynamic programming and benefits from the efficiency of single-instruction multiple-data (SIMD) parallel processing available on modern processors.
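The exact-match lookup that an FM index supports can be illustrated with a compact Python sketch. It builds the Burrows-Wheeler transform by sorting rotations, which is only practical for toy inputs, and counts pattern occurrences by backward search. The function names are hypothetical, the text is assumed not to contain the "$" sentinel, and the sketch covers neither seed selection nor the gapped extension stage.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (toy-sized inputs only)."""
    text += "$"  # sentinel, assumed absent from the text and smaller than any base
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def build_fm_index(text):
    """Return (first, occ, n): the tables needed for backward search."""
    last = bwt(text)
    counts = {}
    for ch in last:
        counts[ch] = counts.get(ch, 0) + 1
    # first[c]: number of characters in the text that sort before c
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]
    # occ[c][i]: number of occurrences of c in last[:i]
    occ = {c: [0] * (len(last) + 1) for c in counts}
    for i, ch in enumerate(last):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return first, occ, len(last)


def count_occurrences(pattern, fm_index):
    """Count exact matches of pattern by narrowing a half-open suffix range."""
    first, occ, n = fm_index
    lo, hi = 0, n
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


fm = build_fm_index("GATTACAGATTACA")
hits = count_occurrences("ATTA", fm)
```

Each pattern character costs a constant number of table lookups, independent of the length of the indexed text, which is the source of the speed and memory efficiency referred to above.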
An R/Bioconductor package for modeling and correcting fragment sequence bias for RNA-seq transcript abundance estimation. This framework enables further research both into optimization of library preparation protocols to reduce or eliminate biases as well as computational approaches that mitigate bias.
Provides access to the genomic alignments of public ribo-seq reads in conjunction with mRNA-seq reads along with relevant annotation tracks. GWIPS-viz is a specialized ribo-seq browser that allows researchers to find ribo-seq evidence supporting alternative proteoforms inferred from phylogenetic analysis or detected with proteomics or other experimental techniques. It can be used as a support tool for predictions based on other approaches and for generating hypotheses that can be tested using methods other than ribo-seq.
Gathers human long poly-adenylated RNA transcripts derived from computational analysis of high-throughput RNA sequencing (RNA-Seq) data. MiTranscriptome provides a set of about 6,500 libraries including datasets from human tissues and samples from cell lines. The tissue libraries originate from primary tumor specimens, metastases, and normal or benign adjacent tissues.
Offers a reference sequence of chromosome 3B. Wheat3BMine is useful to delineate structural and functional features along a chromosome and to establish correlations between recombination intensity, gene density, gene expression, and evolution rate. It provides genomic annotation information of the wheat 3B survey such as gene, mRNA, polypeptide or repeat region. This database is searchable by names, identifiers or keywords related to genes, mRNA, repeat region or marker.
Provides resources to decode pan-cancer and interaction networks of lncRNAs, miRNAs, competing endogenous RNAs (ceRNAs), RNA-binding proteins (RBPs) and mRNAs from large-scale CLIP-Seq data and tumor samples. starBase deciphers protein-RNA and miRNA-target interactions, such as protein-lncRNA, protein-sncRNA, protein-mRNA, protein-pseudogene, miRNA-lncRNA, miRNA-mRNA, miRNA-circRNA, miRNA-pseudogene and miRNA-sncRNA interactions, as well as ceRNA networks, from 108 CLIP-Seq datasets.
A database that offers gene annotation of Anolis carolinensis, also known as the Carolina anole, an arboreal lizard. The anole lizard genome is composed of 13 chromosomes, assembled from 41.9861 contigs and 2.143 scaffolds. The total number of bases in the genome is 1.78 Gb. The gene set for the anole lizard was built using the Ensembl genebuild pipeline. In addition to the main set, gene models have been predicted for each tissue type using the RNA-Seq pipeline. Anolis carolinensis belongs to the Dactyloidae family.
Provides a comprehensive and tissue-specific plant circular RNA database. AtCircDB is an online resource hosting predicted and validated Arabidopsis circular RNA candidates identified from large-scale sequencing data. This database currently hosts four categories of information: (i) circular RNA information, (ii) potential miRNA–circular RNA interactions, (iii) super circular regions and (iv) tissue information.
A comprehensive portal for blood-brain barrier transcriptomics data, obtained by sequencing mRNA (mRNA-seq) and microRNA (miRNA-seq) of polarized hCMEC/D3 cell monolayers. These data encompass coding (gene expression, alternate splice forms, expressed single nucleotide variants or eSNVs) and non-coding (microRNA, lincRNA, circular RNA) counts that are easily accessible through the BBBomics hub database. We also superimposed the RNA-seq coding data on 285 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, which include canonical, non-canonical, and/or atypical pathways retrievable using BBBomics hub.
Provides RNA-RNA interactions (RRIs) identified through high-throughput sequencing technologies. RISE is a comprehensive database of RNA interactomes from sequencing experiments. It includes (i) comprehensive curation of RRIs, (ii) a large dataset of RRIs among mRNAs and lncRNAs, (iii) details of the interacting sites and (iv) extensive annotations for each RRI. It assists researchers who are looking for interaction and other functional information on individual RNAs or analyzing RRI networks of specific pathways or systems.
Gathers expression correlations between microRNAs and mRNAs by using total RNA sequencing (RNA-seq) experiments from NCBI’s Sequence Read Archive (SRA). mirCoX is an online repository integrating sequence-based miRNA target predictions from the miRanda and TargetScan databases together with RNA-seq derived expression correlations. Its online interface allows users to browse by gene name or microRNA name and to download information about miRNAs, genes and correlations.
A web-based repository of RNA-Seq gene expression profiles and query tools. The website offers open and easy access to RNA-Seq gene expression profiles and tools to both compare tissues and find genes with specific expression patterns. To enlarge the scope of the RNA-Seq Atlas, the data were linked to common functional and genetic databases, in particular offering information on the respective gene, signaling pathway analysis and evaluation of biological functions by means of gene ontologies. Additionally, data were linked to several microarray gene profiles, including BioGPS normal tissue profiles and NCI60 cancer cell line expression data. Our data search interface allows an integrative detailed comparison between our RNA-Seq data and the microarray information.
Provides a comprehensive, high-quality reference dataset of Arabidopsis transcripts. AtRTD contains more than 82 190 unique transcript models. It was generated by integration of transcript assemblies of ca. 8.5 billion pairs of reads from 285 RNA-seq data sets obtained from 129 RNA-seq libraries. The database contains 37 137 events and those which occurred at least 50 times made up 95.24% of all alternative splicing events.
Gathers long RNA species derived from RNA-seq data analyses of human blood exosomes. exoRBase is a manually curated database that allows integration and visualization of RNA expression profiles spanning normal individuals and patients with different diseases. In addition, users can extract RNAs of interest through customized browsing options. The database includes about 15000 lncRNAs, 18000 mRNAs and 58000 circRNAs.
Provides access to processed and curated NGS experiments, including ChIP-Seq (transcription factors and histones), RNA-Seq and DNase-Seq. The current focus of this database is to unify NGS data for the haematopoietic system and ES cells. It encompasses two specialized compendia: one focused on blood cells (HAEMCODE), and a second focused on data from embryonic stem (ES) cells (ESCODE).
Contains sperm-borne RNA expression profiling data for mouse, rat, rabbit, and human. SpermBase provides large and small RNA expression data for total sperm and sperm heads. It will be expanded to other species such as plants. The database was built from RNA-Seq analyses. The utility of SpermBase was shown by comparing the sperm RNA-Seq data and identifying highly conserved mammalian sperm-borne RNAs among the four mammalian species.
Aims to characterize the regulatory networks between RNA binding proteins (RBPs) and various RNA transcript classes by integrating large amounts of CLIP-seq (including HITS-CLIP, PAR-CLIP and iCLIP as variations) data sets. CLIPdb 1.0 consistently annotated the CLIP-seq data sets and RBPs, and provides a user-friendly interface for quick navigation of the CLIP-seq data.
Aims to generate comprehensive RNA-seq data from a wide variety of non-human primates (NHPs), from lemurs to hominids. This resource will continue to host additional RNA-Seq data, alignments and assemblies as they are generated over the coming years and provide a key resource for the annotation of NHP genomes as well as informing primate studies on evolution, reproduction, infection, immunity and pharmacology.
Offers access to over 2000 human samples. IRBase is an online database of intron retention (IR). This resource makes it possible to assess a specified intron retention event within a tissue/cell type, given a gene symbol and a tissue/cell type, or to assess genome-wide intron retention within a given RNA-seq dataset. IRBase is part of the toolbox developed by the CNRS to study the impact of intron retention on gene regulation.
Provides a manually curated database of mouse RNA-Seq datasets. RBPMetaDB is a resource that includes the metadata of perturbed RNA-binding proteins (RBPs). It allows users to access all the key information related to the curated RNA-Seq datasets, including the GEO/ArrayExpress accession numbers, dataset titles, numbers of samples, associated RNA-binding proteins (RBPs), perturbation types, and PubMed IDs.
Collects RNC-seq, Ribo-seq and the corresponding mRNA-seq data from the Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA) databases. TranslatomeDB is an online resource that offers differential gene expression (DGE) analysis to compare two datasets and calculates translation ratios (TR) and the elongation velocity index (EVI) to quantitatively assess translation initiation efficiency and elongation speed.
Permits efficient searching of its database, which contains comprehensive information for all public RNA-seq data sets on mice with genotype as a factor. RNASeqMetaDB contains metadata for a total of 306 experiments targeting 298 different genes. Users can search the database using multiple parameters such as genes, diseases, tissue types, keywords, and associated publications in order to find data sets that match their interests. Summary statistics of the metadata are also presented on the web server, showing interesting global patterns of RNA-Seq studies.
Naim Al Mahi (University of Cincinnati): I am a PhD candidate working in the area of computational and statistical genomics, with a focus on developing new methodologies and computational pipelines for analyzing large-scale genomics data.