Supplies a community platform for managing information and meta-information derived from gene expression analysis. OMiCC consists of a repository of gene expression datasets coupled to a platform for generating and analyzing data from multiple sources. Users can browse more than 26,000 pre-normalized and quality-checked human and mouse studies and compute significantly differentially expressed genes and differential expression profiles (DEPs). Results, including metadata and components of cross-study data, are compiled and made available to other users.
Facilitates understanding of how pathways are interconnected. Caleydo relates pathways to gene expression data in a non-distracting way, based on the simultaneous consideration of gene expression information and pathways. It allows users to explore relationships between multiple hand-crafted pathways, as well as to relate pathways directly to actual measurements of gene expression regulation. This can considerably improve understanding of the complex network of pathways and of the individual effects of gene expression regulation.
Allows users to identify microRNA target and small interfering RNA (siRNA) off-target signals from expression data. Sylamer provides functionality similar to Gene Set Enrichment Analysis (GSEA), applied to the annotation of nucleotide patterns in sequences. It assesses over- and under-representation of nucleotide words of a specific length in ranked gene lists, which makes it suited to genome-scale studies.
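The core idea, testing whether a nucleotide word is over-represented at the top of a ranked gene list, can be illustrated with a hypergeometric tail probability evaluated at a rank cut-off. This is a minimal sketch under stated assumptions, not Sylamer's implementation: it simply counts sequences that contain the word at least once (a simplification of Sylamer's statistics), and the function names and toy 3' UTR sequences are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(population, draws)


def word_enrichment(ranked_seqs: list[str], word: str, cutoff: int) -> float:
    """Hypergeometric p-value that `word` is over-represented in the top `cutoff` sequences.

    Simplification (assumption): a sequence counts once if it contains the word at all.
    """
    hits = [word in seq for seq in ranked_seqs]
    k = sum(hits[:cutoff])  # word-containing sequences above the cut-off
    successes = sum(hits)   # word-containing sequences in the whole list
    return hypergeom_sf(k, len(ranked_seqs), successes, cutoff)


# Hypothetical 3' UTRs, ranked from most down-regulated to most up-regulated.
utrs = ["ACACTTTGT", "GGCACTTTA", "CACTTTCC", "ATGCGCGC", "TTGGCCAA", "GCGCATAT"]
for cutoff in range(1, len(utrs)):
    print(cutoff, word_enrichment(utrs, "CACTTT", cutoff))
```

Sweeping the cut-off along the ranked list, as in the loop, is what produces an enrichment landscape; here all three word-containing UTRs sit at the top, so the p-value at cut-off 3 is 1/20.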
Allows users to integrate diverse gene expression data to aid in the interpretation of existing and new experiments. ADAGE is a neural network model. Using an unsupervised machine learning approach, the community-wide Pseudomonas aeruginosa gene expression data were integrated to create an ADAGE model that captures patterns corresponding to biological states or processes in gene expression data. The eADAGE algorithm combines multiple ADAGE models into one ensemble model to address model variability due to stochasticity and local minima.
Provides software and data about the genes expressed in every tissue and cell type in humans and mice. The Gene Expression Barcode combines publicly available microarray datasets with a suite of single-array preprocessing, quality control and analysis methods. The data have been used to complement epigenetic studies, improve ChIP-seq and ChIP-chip data analysis and investigate increased heterogeneity in cancer.
An automated pipeline for RNA homology search. RNAlien models are a starting point to construct models of comparable sensitivity and specificity to manually curated ones from the Rfam database. It is based on an iterative sequence search process. In each step new sequences from a different section of the phylogenetic tree are searched for, filtered and possibly included in the growing RNA family model.
Identifies non-coding RNA (ncRNA) ends with terminal stem-loops by using chimeric reads from RNA-seq data. Vicinal exploits the self-priming and ligation properties of ncRNA 3’ and 5’ terminal stem-loops during library preparation via the Gubler-Hoffman method and massively parallel sequencing. It maps the unmapped fragments of chimeric reads to the vicinity of their mapped fragments.
Implements a method for miRNA motif discovery from sequence and gene expression data. MixMir analyzes gene expression data together with mRNA sequences using a mixed linear model (MLM). It can also be applied to other regulatory element motif detection problems, such as transcription factor and RNA-binding protein motif prediction.
Allows users to explore the relationships between RNA sequences. Crosslink visualizes relationships determined by distinct tools within the same network; these relationships are computed with external tools (BLAST, Vmatch and RNAhybrid). It can be used in a microRNA context, since Vmatch and RNAhybrid are suitable for determining antisense and hybridization relationships.
Provides visualization and analysis of time-series gene expression data during cell state transitions. CSTEA focuses on characterizing important genes, and their corresponding functions, at intermediate time points of the dynamic transition process. Public datasets were collected by text mining to build a comprehensive roadmap of diverse cell state transitions. The tool is intended as a resource for both experimental and computational biologists.
Allows users to infer changes in the quantities of immune cell sub-populations. DCQ is a digital cell quantifier that deduces changes in cell quantities between two conditions using a model motivated by cell surface markers. It can predict over 200 immune cell types simultaneously and can discriminate between closely related immune subtypes and different levels of activity. Moreover, it can generate detailed, testable hypotheses about the role of specific immune cells under particular conditions.
Facilitates the retrieval of lung cell-specific gene expression information from extensive datasets derived from single-cell RNA sequencing. LungGENS is a web-based bioinformatics resource for querying single-cell gene expression databases: users enter a gene symbol or a list of genes, or select a cell type of interest. It also integrates the data with previous RNA expression studies of mouse lung at various developmental time points.
Allows users to detect significant genes for determining cell types and their stages of development. Keygenes is available both as a web platform and an application divided into three scripts: (i) the first one identifies the 500 most variably expressed genes across a next generation sequencing (NGS) dataset; (ii) the second uses an NGS training set to predict an NGS test set; and (iii) the third uses an NGS training set to predict a microarray test set.
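Step (i), selecting the most variably expressed genes across a dataset, can be sketched as ranking genes by the variance of their expression across samples. This is a minimal illustration, not KeyGenes' script: it assumes variance as the variability measure and a dictionary of gene name to expression values, and the expression values below are toy numbers.

```python
from statistics import pvariance


def most_variable_genes(expr: dict[str, list[float]], n: int = 500) -> list[str]:
    """Return up to `n` gene names with the highest expression variance across samples."""
    ranked = sorted(expr, key=lambda gene: pvariance(expr[gene]), reverse=True)
    return ranked[:n]


# Toy expression values (hypothetical) for three genes across three samples.
toy = {
    "GATA4": [0.1, 5.0, 9.8],
    "ACTB": [10.0, 10.1, 9.9],
    "SOX2": [8.0, 0.2, 0.1],
}
print(most_variable_genes(toy, n=2))  # ['GATA4', 'SOX2']
```

A nearly constant gene such as the housekeeping example is dropped, which is the point of the filter: genes that do not vary carry little information for telling cell types apart.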
Provides functions to perform ensemble minimum redundancy maximum relevance (mRMR) feature selection, taking full advantage of parallel computing. mRMRe can be beneficial from both a predictive (lower bias and lower variance) and a biological (more thorough exploration of the feature space) point of view, which makes it particularly attractive for high-throughput genomic data analysis. The package also contains a set of functions to compute mutual information matrices from continuous, categorical and survival variables.
Leverages the relationships between tissues and cell types. URSA identifies the specific tissue and cell-type signals present in a given gene expression profile. It can automatically annotate samples in public gene expression repositories, where most samples currently lack tissue- or cell-type-specific information. The tool can also be used to test for and identify possible sample contamination, or to help determine the origin of cancer samples of unknown primary origin.
Facilitates effective data analysis and allows the simultaneous visualization of groups of genes at cell and tissue resolution within an organ. Tomato Expression Atlas provides an atlas that can be adapted to different types of expression data from diverse multicellular species, and it can be used to produce graphics for publication. The tool supports hypothesis development and testing as well as candidate gene identification.
Stores and analyzes Live Cell Array (LCA) data. BasyLiCA is user-friendly software designed for wet-lab biologists who analyze large amounts of LCA data in microplates. Its interface allows (i) automatic insertion of LCA measurements; (ii) manual or semi-automatic insertion of the characteristics of wells, strains and injections; (iii) database administration and management, either as a standard user or as an administrator; and (iv) data processing.
A web-based tool to which users can upload their individual SNP data to obtain predicted expression levels for the set of predictable genes across 14 cell types. GenoExp thus allows users with biological knowledge to study the possible effects of their set of SNPs on these genes and to predict the genes' cell-specific expression levels relative to the population average.
A generative, probabilistic model of RNA polymerase that fully describes loading, initiation, elongation and termination. Tfit implements a finite mixture model to identify sites of bidirectional or divergent transcription in nascent transcription assays such as Global Run-On sequencing (GRO-seq) and Precision Run-On sequencing (PRO-seq). Tfit consists of two modules: (1) bidir and (2) model.
A resource-based, well-documented web system that provides publicly available information on genes, biological pathways, Gene Ontology terms, gene-gene interaction networks (importantly, with the directionality of interactions), and links to key related PubMed documents. The PathwaysWeb API simplifies the construction of applications that need to retrieve and interrelate information across multiple, pathway-related data types from a variety of original data sources. PathwaysBrowser is a companion website that enables users to explore the same integrated pathway data. The PathwaysWeb system facilitates reproducible analyses by providing access to all versions of the integrated data sets. Although its Gene Ontology subsystem includes data for mouse, PathwaysWeb currently focuses on human data. However, pathways for mouse and many other species can be inferred with a high success rate from human pathways.
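The minimum redundancy maximum relevance criterion can be illustrated by its classic greedy, single-solution form: at each step, pick the feature with the largest mutual information with the target minus its mean mutual information with the features already chosen. This is a plain-Python sketch on discrete variables, with mutual information estimated from empirical frequencies; it is not mRMRe's ensemble or parallel implementation, and the toy features are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(x: list, y: list) -> float:
    """Empirical mutual information (nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mrmr(features: dict[str, list], target: list, k: int) -> list[str]:
    """Greedy mRMR: maximize relevance to `target` minus mean redundancy with picks so far."""
    relevance = {name: mutual_information(v, target) for name, v in features.items()}
    selected = [max(relevance, key=relevance.get)]
    while len(selected) < min(k, len(features)):
        def score(name: str) -> float:
            redundancy = sum(mutual_information(features[name], features[s]) for s in selected)
            return relevance[name] - redundancy / len(selected)

        remaining = [name for name in features if name not in selected]
        selected.append(max(remaining, key=score))
    return selected


# Toy data: y = a + b, and a_copy duplicates a (maximally redundant with it).
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
y = [u + v for u, v in zip(a, b)]
print(mrmr({"a": a, "a_copy": list(a), "b": b}, y, k=2))  # ['a', 'b']
```

All three features are equally relevant here, but once `a` is chosen its duplicate is penalized for redundancy, so the complementary feature `b` is selected instead.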
Infers transcriptome conservation patterns. myTAI can be used to screen for stages of high or low transcriptome conservation within a biological process of interest. This tool can be used for investigating the developmental hourglass model of embryo development on the transcriptomic level. Moreover, it provides functionality for: taxonomic information retrieval, gene age enrichment analyses, differential gene expression analyses of age categories, and additional metrics for quantifying transcriptome conservation.
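The quantity underlying this kind of conservation screen is the transcriptome age index (TAI): for a developmental stage s, TAI_s = sum_i(ps_i * e_is) / sum_i(e_is), the expression-weighted mean phylostratum of all genes, where lower values indicate an older, more conserved transcriptome. The sketch below computes it in plain Python for hypothetical phylostrata and expression values; myTAI itself is an R package, and this is not its code.

```python
def transcriptome_age_index(phylostrata: list[int], expression: list[float]) -> float:
    """TAI = sum(ps_i * e_i) / sum(e_i) for one developmental stage."""
    total = sum(expression)
    if total <= 0:
        raise ValueError("total expression must be positive")
    return sum(ps * e for ps, e in zip(phylostrata, expression)) / total


ps = [1, 1, 3, 12]  # hypothetical phylostratum of each gene (1 = oldest)
stages = {
    "early": [50.0, 40.0, 5.0, 5.0],
    "mid": [60.0, 35.0, 4.0, 1.0],
    "late": [30.0, 30.0, 20.0, 20.0],
}
for stage, expr in stages.items():
    print(stage, round(transcriptome_age_index(ps, expr), 2))
```

In this toy profile the mid stage has the lowest TAI (1.19, against 1.65 and 3.6), which is the pattern the developmental hourglass model predicts.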
Generates candidate toehold switches to detect desired RNA transcripts. The Toehold switch web tool can (i) convert sequences to uppercase DNA form, (ii) screen for suitable trigger sequences, (iii) generate the corresponding switch sequences, (iv) calculate the ribosome binding site (RBS)-linker energy for each sequence pair, (v) add sticky ends for the corresponding sites and (vi) rank switches using machine learning.
Supplies a platform dedicated to signature visualization. L1000FWD includes profiles of more than 16,800 drugs and small molecules, with their corresponding metadata and downloadable datasets. The application allows users to project custom signatures to pinpoint their position in the global expression space. It can be used to propose indications and to reveal potential mechanisms of action (MOA) of candidate therapeutics.
Investigates RIP-chip datasets. REA identifies statistically meaningful cut-offs for enrichment values with a Gaussian mixture model (GMM) approach and can calculate false discovery rates (FDRs) for sets of biologically significant genes. It removes bias introduced by the immunoprecipitation (IP) step using a linear normalization technique based on principal component analysis (PCA).
Provides a spreadsheet application for analyzing plant–nematode interactions. NEMATIC gathers transcriptomic data linked to other external transcriptomes and gene groups to allow quick data management. The program gives users access to features such as (i) searching, selecting and filtering genes of interest; (ii) selecting and filtering genes by Genevestigator expression values; and (iii) obtaining compiled information about the filtered genes.
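Step (i), converting an input into uppercase DNA form, is string normalization; a switch's sensing region must also pair with its trigger, which in DNA terms means working with the trigger's reverse complement. A minimal sketch of both operations follows; the function names are hypothetical and this is not the web tool's code.

```python
VALID_BASES = set("ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_dna_upper(seq: str) -> str:
    """Strip whitespace, uppercase, and convert RNA (U) to DNA (T)."""
    cleaned = "".join(seq.split()).upper().replace("U", "T")
    invalid = set(cleaned) - VALID_BASES
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    return cleaned


def reverse_complement(dna: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return dna.translate(COMPLEMENT)[::-1]


trigger = to_dna_upper("augc uuag\ncca")
print(trigger)                      # ATGCTTAGCCA
print(reverse_complement(trigger))  # TGGCTAAGCAT
```

Validating the alphabet up front means later steps, such as energy calculations, never see ambiguous characters.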
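The Gaussian-mixture cut-off idea can be sketched with a two-component, one-dimensional mixture fitted by expectation-maximization (EM): the cut-off is the smallest value above which every value is more probably drawn from the high-enrichment component. This is an illustration under explicit assumptions (exactly two components, initialization at the data extremes, a fixed number of EM iterations, a floor on the standard deviation), not REA's implementation; the enrichment values are toy numbers.

```python
from math import exp, pi, sqrt
from typing import Optional


def normal_pdf(x: float, mu: float, sd: float) -> float:
    return exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * sqrt(2 * pi))


def fit_two_gaussians(xs: list[float], iterations: int = 200):
    """EM for a one-dimensional, two-component Gaussian mixture."""
    w, mu, sd = [0.5, 0.5], [min(xs), max(xs)], [1.0, 1.0]
    for _ in range(iterations):
        # E-step: posterior responsibility of each component for each value.
        resp = []
        for x in xs:
            p = [w[k] * normal_pdf(x, mu[k], sd[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-300
            w[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sd[k] = max(sqrt(var), 1e-3)  # floor keeps a component from collapsing
    return w, mu, sd


def enrichment_cutoff(xs: list[float]) -> Optional[float]:
    """Smallest value such that it and all larger values favor the high component."""
    w, mu, sd = fit_two_gaussians(xs)
    hi = 0 if mu[0] > mu[1] else 1
    lo = 1 - hi
    cutoff = None
    for x in sorted(xs, reverse=True):
        if w[hi] * normal_pdf(x, mu[hi], sd[hi]) <= w[lo] * normal_pdf(x, mu[lo], sd[lo]):
            break
        cutoff = x
    return cutoff


low = [0.1, 0.2, 0.3, 0.25, 0.15, 0.35, 0.05, 0.2]
high = [3.0, 3.2, 2.9, 3.1, 3.3]
print(enrichment_cutoff(low + high))  # 2.9
```

Scanning downward from the largest value, rather than upward, avoids picking a spurious point in the far left tail where a wider high component can momentarily dominate.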
Reveals the genetic basis of variation in immune cell traits based on gene expression data. VoCAL is a deconvolution-based method that uses expression profiles from a complex tissue across a population of individuals to calculate relative cell type abundance values for each individual, and then identifies the underlying genetic variants on the basis of the predicted cell type abundance. It avoids direct cell quantification by inferring immune traits indirectly. It is implemented in the freely available R package ComICS.
Provides computational biology resources related to the genome and transcriptome of the model plant Physcomitrella patens (Aphanoregma patens). Cosmoss is a web app that offers both a transcriptome representation (including a BLAST and retrieval service) and splice site prediction for Physcomitrella. The moss Physcomitrella patens is an emerging plant model system owing to its high rate of homologous recombination, haploidy, simple body plan and physiological properties, as well as its phylogenetic position.
Provides a scientific collaborative platform for laboratories interested in horse functional genomics, as well as a replicable model for other organisms. Horse_Trans is an RNA-seq analysis pipeline. It uses RNA-seq as transcriptional evidence to produce more accurate gene models, compare tissue-specific gene models and test different approaches for the effective integrative analysis of several RNA-seq experiments.
Elucidates traits of functionally connected gene family entities in two whole genomes. O2EM can discover functional orthologs in a pair of species, deduce meaningful sub-clusters of large gene families from gene expression data and reveal sets of similarly differentially expressed genes in light of sequence similarity. It displays each gene family in a two-dimensional matrix format.
Enables the conversion of lists of genes with different identifiers and from different WormBase (WB) versions to a single coherent format. To permit accurate cross-release comparisons, WormBase Converter takes into account all successive changes in gene annotation and modifies the list of gene identifiers accordingly. It handles such lists and also displays any gene name for which no associated WB gene identifier can be found. Furthermore, when a new WB version is released, the tool automatically retrieves the new data from the server, making it easy to keep lists up to date.
Helps users hypothesize mechanisms of cell response to knockout and heat shock, as well as a mechanism of gene expression regulation in the presence of RNA polymerase competition. Rivals can be applied to estimate the binding intensities of the holoenzyme and the phage-type RNA polymerase to their promoters using data on gene transcription levels. The model can also be used to make functional predictions, such as the heat shock response in isolated chloroplasts.
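Keeping a gene list current across releases amounts to resolving each name to an identifier, applying each release's identifier changes in order, and reporting names that cannot be resolved. The sketch below assumes a hypothetical alias table and a hypothetical per-release mapping of old to new identifiers (None marking a retired gene); the identifiers are placeholders, not real WormBase data, and this is not WormBase Converter's code.

```python
from typing import Optional

# Hypothetical data: alias -> gene ID, and per-release ID changes (None = retired gene).
ALIASES = {"geneX": "WBGene_A", "geneY": "WBGene_B"}
RELEASE_CHANGES: list[dict[str, Optional[str]]] = [
    {"WBGene_A": "WBGene_C"},  # hypothetical release n+1: A merged into C
    {"WBGene_B": None},        # hypothetical release n+2: B retired
]


def update_ids(names: list[str]) -> tuple[list[str], list[str]]:
    """Map names or IDs to current IDs; return (converted IDs, unresolved names)."""
    converted, unresolved = [], []
    for name in names:
        default = name if name.startswith("WBGene") else None
        gene_id: Optional[str] = ALIASES.get(name, default)
        # Apply every release's changes in chronological order.
        for changes in RELEASE_CHANGES:
            if gene_id is not None and gene_id in changes:
                gene_id = changes[gene_id]
        if gene_id is None:
            unresolved.append(name)
        else:
            converted.append(gene_id)
    return converted, unresolved


print(update_ids(["geneX", "geneY", "WBGene_D", "unknownGene"]))
```

Applying the changes in chronological order is what lets an identifier that was renamed and later merged still resolve to its current form.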
Rates digital audio recordings of psychotherapy sessions and other behavioral interactions. CACTI is a flexible and transcript-free program for facilitating the parsing and sequential coding of auditory behavioral interactions between two or more participants (such as those between clinician and client in a psychotherapy session). The software employs three modes: (1) parsing continuous behavioral data from a WAV audio file into codable utterances, (2) sequentially coding previously parsed utterances, and (3) assigning Likert-type global ratings.
Provides support for the composition of RESTful web services semantically annotated using SAWSDL. SemanticSCo supports the definition of constraints and conditions on the order in which service operations should be invoked, thus enabling the definition of complex service behaviours. It also supports the definition of analysis workflows at a high level of abstraction, so that users can focus on biological research issues rather than on the technical details of the composition process.
Provides automated and guided generation of circular visualizations of large-scale genomics and transcriptomics data. CGDV provides prepackaged karyotype files for various model organisms and generates configuration files based on the genomics and transcriptomics data provided by the user. It can be applied to the genomes of microorganisms such as bacteria and fungi, as well as to those of larger organisms such as human and mouse.
Assists users in inferring Cox proportional hazards models. DegreeCox uses a priori knowledge to leverage the correlation or functional information present in gene expression data. For instance, it can be used to detect gene expression signatures associated with the survival of ovarian carcinoma patients. It can serve to predict a patient’s risk and to identify genes associated with death events from large-scale ovarian cancer gene expression datasets.
Provides a platform for comparative transcriptome analysis of two RNA-Seq datasets. PARRoT is a web application that offers an analysis strategy including pooled assembly, clustering of contigs into virtual transcripts and several quantification methods. The software performs de novo assembly, functional assignment and differential gene expression analysis, and categorizes the datasets through Gene Ontology analyses.
Implements a new peak-calling algorithm based on an aggregate Gaussian mixture model (AGMM). The l1kdeconv package contains this peak-calling algorithm for LINCS L1000 data and provides stable and accurate deconvolution of these data. It has two components: outlier detection and the aggregate Gaussian mixture model.
Allows estimation of the proportion of immune and cancer cells from bulk tumor gene expression data. EPIC incorporates reference gene expression profiles from each major immune and other non-malignant cell type for modelling bulk RNA-Seq data as a superposition of these reference profiles. The software can be used with reference gene expression profiles pre-compiled from circulating or tumor-infiltrating cells, or provided by the user.
Offers a platform for comparing gene expression between embryogenesis and regeneration in Nematostella. NvERTx is a web application supplying transcript models for the identification of gene sequences. It gives access to a repository that can be browsed by gene name or JGI ID, or searched with a user's own sequence through a BLAST tool. In addition, users can test for conservation of regeneration gene batteries, investigate gene expression clusters, mine data and detect groups of co-expressed genes.
Constructs a cluster hierarchy. SilHAC is based on a hierarchical agglomerative clustering algorithm and employs a criterion based on the Silhouette Index to choose which pair of clusters to merge. It lets users retrieve the optimal number of clusters and the associated clustering solution without any input other than the dataset. It reduces the number of initial clusters to be handled through an iterative cluster-merging process.
Provides a variety of multivariate outlier detection and subspace techniques. OuRS implements a minimum covariance determinant (MCD) algorithm for non-quantitative data. The MCD is a widely used approach to identify multivariate structure and likely outliers. OuRS uses correspondence analysis to define Mahalanobis distances via the singular vectors, which allows it to handle virtually any data type.
Provides an approach for detecting three-way clustering patterns in multi-tissue, multi-individual gene expression data. MultiCluster is a multi-way clustering method based on a tensor decomposition that uses nonnegativity constraints and the sharing of information across modes to detect multi-modal specificities in this context. It was tested on simulated data and on the Genotype-Tissue Expression (GTEx) dataset.
Predicts gene expression levels based on transcription factor (TF) binding alterations inferred from cis-regulatory variants. TF2Exp can evaluate the impact of single nucleotide polymorphisms (SNPs) in linkage disequilibrium (LD) and of uncommon variants, and can predict expression alterations for over three thousand genes. It infers the regulatory regions and TF binding events of each gene based on a reference cell line.
Allows differential analysis of gene expression data. DVX uses two mixture distributions within a linear model framework to evaluate both differential dispersion and differential expression. The software offers graphical visualization options and several common data pre-processing options. It also supports the limma model for analyzing differential expression between two treatment groups.
Predicts the mean tissue-specific abundance of all genes. peaBrain can also provide abundance predictions on a subject-by-subject basis by incorporating the transcriptomic consequences of genotype variation. The software models the transcriptional machinery of a tissue via a two-stage process. In addition to predicting disease-associated variants, it can assess the ability of derived scores to discriminate allele-specific transcription factor binding sites.
Supplies a deterministic model of protein synthesis. The PURE system simulator consists of more than 240 components and about 900 reactions, and can be used to reproduce the synthesis of the formyl-Met-Gly-Gly (fMGG) tripeptide. The method relies on ordinary differential equations (ODEs) based on components of an Escherichia coli-based reconstituted in vitro translation system.
Aims to increase the convergence rate of nonnegative matrix factorization (NMF). NNLM is an imputation-based method that adapts sequential coordinate-wise descent to NMF, increasing the convergence rate. This allows it to handle missing values naturally and to integrate prior knowledge to guide NMF towards a more meaningful decomposition. The method is available as an R package.
Provides tools to incorporate transcription factors (TFs) into gene expression imputation models. TF-TWAS extends the cis models of transcriptome-wide association studies (TWAS) with imputation models that integrate transcription factor information. It was used to test three hypotheses about how polymorphisms within TFs may be associated with the transcription levels of their target genes: (1) a model focusing on TF expression, (2) a model focusing on TF-gene binding and (3) a model combining both.
Allows users to evaluate the cell identity of engineered cell types. CellScore is an R package that provides features for computing and visualizing cell scoring results for a given cell transition dataset. The method relies on a combination of the cosine similarity of expression profiles and the fractions of expressed cell type-specific genes. It allows users to assess the success of engineered cell transitions.
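A silhouette-guided agglomerative procedure can be sketched as follows: start from singleton clusters, at each step apply the merge that yields the highest mean silhouette width, and keep the best partition seen along the way. This brute-force sketch on one-dimensional points rests on assumptions (exhaustive pair search, absolute difference as the distance, singletons scored 0 by the usual convention); it is not SilHAC's algorithm, which also reduces the initial set of clusters, and the points are toy values.

```python
from itertools import combinations


def mean_silhouette(points: list[float], clusters: list[list[int]]) -> float:
    """Mean silhouette width of a partition of 1-D points (singletons score 0)."""
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for ci, members in enumerate(clusters):
        if len(members) == 1:
            continue  # by convention s(i) = 0 for a point alone in its cluster
        for i in members:
            a = sum(abs(points[i] - points[j]) for j in members if j != i) / (len(members) - 1)
            b = min(
                sum(abs(points[i] - points[j]) for j in other) / len(other)
                for cj, other in enumerate(clusters)
                if cj != ci
            )
            denom = max(a, b)
            total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


def silhouette_hac(points: list[float]) -> tuple[float, list[list[int]]]:
    """Greedy agglomeration: always apply the merge giving the best mean silhouette."""
    clusters = [[i] for i in range(len(points))]
    best = (mean_silhouette(points, clusters), [list(c) for c in clusters])
    while len(clusters) > 2:
        candidates = []
        for x, y in combinations(range(len(clusters)), 2):
            merged = [c for k, c in enumerate(clusters) if k not in (x, y)]
            merged.append(clusters[x] + clusters[y])
            candidates.append((mean_silhouette(points, merged), merged))
        score, clusters = max(candidates, key=lambda item: item[0])
        if score > best[0]:
            best = (score, [list(c) for c in clusters])
    return best


points = [1.0, 1.1, 1.2, 5.0, 5.1, 5.2]
score, partition = silhouette_hac(points)
print(sorted(sorted(c) for c in partition))  # [[0, 1, 2], [3, 4, 5]]
```

Because the best-scoring partition is remembered during agglomeration, the number of clusters falls out of the procedure itself rather than being supplied by the user.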
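Sequential coordinate-wise descent, the update that NNLM adapts to NMF, solves a nonnegative least-squares subproblem one coordinate at a time: each coordinate takes its exact minimizing step for the quadratic objective and is then clipped at zero. The sketch below solves a single problem, minimize ||Ax - b||^2 subject to x >= 0, in plain Python; in NMF the same routine would be applied alternately to the two factor matrices. It illustrates the technique under these assumptions and omits NNLM's missing-value masking and regularization; it is not NNLM's code.

```python
def nnls_scd(A: list[list[float]], b: list[float], sweeps: int = 500, tol: float = 1e-12) -> list[float]:
    """Nonnegative least squares by sequential coordinate-wise descent."""
    m, n = len(A), len(A[0])
    # Quadratic form: 0.5 * x^T H x - g^T x, with H = A^T A and g = A^T b.
    H = [[sum(A[r][i] * A[r][j] for r in range(m)) for j in range(n)] for i in range(n)]
    g = [sum(A[r][i] * b[r] for r in range(m)) for i in range(n)]
    x = [0.0] * n
    for _ in range(sweeps):
        max_change = 0.0
        for j in range(n):
            if H[j][j] == 0:
                continue  # all-zero column: coordinate does not affect the objective
            grad = sum(H[j][k] * x[k] for k in range(n)) - g[j]
            new = max(0.0, x[j] - grad / H[j][j])  # exact step, then project onto x >= 0
            max_change = max(max_change, abs(new - x[j]))
            x[j] = new
        if max_change < tol:
            break
    return x


print(nnls_scd([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [1.0, -2.0, 3.0]))
# [1.0, 0.0, 3.0]
```

Each coordinate update is cheap and needs no step-size tuning, which is why this scheme suits the many small nonnegative subproblems that make up an NMF iteration.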
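The two ingredients named above can be sketched directly: the cosine similarity between a test expression profile and a reference profile, and the fraction of a cell type's specific genes detected as expressed. How CellScore combines and normalizes these is not reproduced here; the expression threshold, the choice of marker genes and the expression values are hypothetical toy inputs.

```python
from math import sqrt


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two expression vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fraction_expressed(profile: dict[str, float], specific_genes: list[str],
                       threshold: float = 1.0) -> float:
    """Fraction of marker genes whose expression exceeds `threshold` (hypothetical cut-off)."""
    if not specific_genes:
        return 0.0
    hits = sum(profile.get(gene, 0.0) > threshold for gene in specific_genes)
    return hits / len(specific_genes)


genes = ["ALB", "APOA1", "POU5F1", "NANOG"]
hepatocyte = [9.0, 8.0, 0.1, 0.1]  # toy reference profile
engineered = [7.0, 6.0, 1.5, 0.2]  # toy engineered-cell profile
print(round(cosine_similarity(engineered, hepatocyte), 3))
print(fraction_expressed(dict(zip(genes, engineered)), ["ALB", "APOA1"]))  # 1.0
```

The two measures are complementary: cosine similarity captures the overall shape of the profile, while the marker fraction checks that the defining genes of the target cell type are actually switched on.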
Provides a collection of tools to design and investigate RASL-seq experiments. RASLseqTools furnishes empirical estimates of experimental, sequencing, and alignment error. It can be used for: (1) assessing the sequence similarity of barcodes and RRASL probes, (2) demultiplexing and aligning RRASL-seq FASTQ reads, (3) annotating demultiplexed and mapped reads with meta-data, (4) displaying batch effects, and (5) correcting systematic error using normalization methods.
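Items (1) and (2) can be illustrated with Hamming distances. The minimum pairwise distance d of a barcode set means up to (d - 1) // 2 mismatches can be corrected unambiguously, and a read is assigned to the single barcode within that tolerance. The barcodes, the tolerance rule and the function names below are hypothetical; this is not RASLseqTools' code.

```python
from itertools import combinations
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes: list[str]) -> int:
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def assign_barcode(read_prefix: str, barcodes: list[str], max_mismatch: int) -> Optional[str]:
    """Return the single barcode within `max_mismatch`, or None if none or ambiguous."""
    matches = [bc for bc in barcodes if hamming(read_prefix, bc) <= max_mismatch]
    return matches[0] if len(matches) == 1 else None


barcodes = ["ACGTAC", "TGCATG", "GATCGA"]  # hypothetical barcode set
d = min_pairwise_distance(barcodes)
tolerance = (d - 1) // 2  # errors correctable without ambiguity
print(d, tolerance)                                  # 6 2
print(assign_barcode("ACGTAA", barcodes, tolerance))  # ACGTAC
```

Returning None for ambiguous matches, instead of picking the closest barcode, keeps misassigned reads out of downstream counts.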