Identifies biases in aligned sequence data which potentially mislead phylogenetic reconstructions. BaCoCa allows a parallel determination of a suite of different statistical properties of alignments for complete concatenated amino acid and nucleotide data sets as well as for user-defined gene partitions and taxon subsets in one single process run prior to any tree reconstruction. Its results can be easily used for further analyses in programs like Excel or statistical packages like R.
Implements the codon deviation coefficient (CDC) measure, using it to characterize codon usage bias (CUB) and to ascertain its statistical significance. CAT is a software package that estimates CUB by accounting for background nucleotide compositions tailored to codon positions, and uses bootstrapping to assess the statistical significance of CUB for any given sequence.
Offers a method able to learn the unique cell-type-specific methylomes for each individual sample from its bulk data. TCA is a computational approach which can extract cell-type-specific signals from abundant cell types. This program can be used in epigenetic association studies and, more broadly, as a general statistical framework for obtaining underlying 3D information from 2D convolved signals.
Assists researchers in annotating the function of a protein from its composition, using the whole protein or part of it. COPid has three modules called search, composition and analysis. The search module allows searching of protein sequences in six different databases. Search results list database proteins in ascending order of Euclidean distance or descending order of compositional similarity with the query sequence. The composition module allows calculation of the composition of a sequence and the average composition of a group of sequences. It also allows computing the composition of various types of amino acids (e.g. charged, polar, hydrophobic residues). The analysis module provides the following options: (i) comparing the composition of two classes of proteins, (ii) creating a phylogenetic tree based on the composition and (iii) generating input patterns for machine learning techniques.
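The composition-based ranking described for the COPid search module can be illustrated with a minimal sketch. This is not COPid's code: the function names (`composition`, `euclidean_distance`, `rank_by_distance`), the percent scaling and the toy database are assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt

# The 20 standard amino acids, in a fixed order so vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the percent composition of the 20 standard amino acids."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [100.0 * counts[a] / total for a in AMINO_ACIDS]


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length composition vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_by_distance(query, database):
    """Rank database entries (name -> sequence) by ascending distance
    between their composition and the composition of the query."""
    q = composition(query)
    hits = [(name, euclidean_distance(q, composition(seq)))
            for name, seq in database.items()]
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    toy_db = {"same": "ACDA", "diff": "WWWW"}  # hypothetical sequences
    for name, dist in rank_by_distance("AACD", toy_db):
        print(name, round(dist, 2))
```

Because only composition is compared, two sequences with the same residues in a different order have a distance of zero; this is the trade-off of any composition-only search.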
Consists of an annotatable microarray data repository and analysis application. BASE is an application compliant with the Minimum Information About a Microarray Experiment (MIAME) guidelines. It features a web browser user interface, a laboratory information management system (LIMS) for biomaterials and array production, annotations, and a hierarchical overview of analyses. It also integrates tools like MultiExperiment Viewer (MEV) and GenePattern.
A de novo genome assembler using next generation sequencing (NGS) data. BASE enhances the classic seed-extension approach by indexing the reads efficiently to generate adaptive seeds that have a high probability of appearing uniquely in the genome. Such seeds form the basis for BASE to build extension trees and then to use reverse validation to remove branches based on read coverage and paired-end information, resulting in high-quality consensus sequences of reads sharing the seeds. BASE is a practically efficient tool for constructing contigs, with significant improvement in quality for long NGS reads.
A whole genome pairwise and multiple alignment editor. The program highlights differences between pairs of alignments and allows the user to easily navigate large alignments of similar sequences. Although Base-By-Base was intended as an editor and viewer for alignments of highly similar sequences, it also provides many of the functions of other generic alignment editors. In addition to visualizing genomes and protein sequences, Base-By-Base allows the user to estimate simple phylogenetic trees, calculate the numbers of conserved and non-conserved sequence positions, and test simple quantitative hypotheses using novel modifications.
Recognizes mycobacterial membrane proteins and their classes. Unb-DPC uses the Synthetic Minority Oversampling Technique (SMOTE) to remove bias among different types of membrane proteins. It uses dipeptide composition to extract features from the unbiased data. This tool avoids bias among different classes while preserving protein sequence structure information.
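As an illustration of the dipeptide composition features mentioned for Unb-DPC, the sketch below computes a 400-dimensional vector of adjacent-residue pair frequencies. It is not the tool's implementation: the function name `dipeptide_composition` and the normalization by the number of valid pairs are assumptions, and the SMOTE oversampling step is not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered pairs of standard amino acids.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq):
    """Return the frequency of each of the 400 dipeptides in seq.

    Overlapping adjacent pairs are counted; pairs containing a
    non-standard residue are skipped (an assumption of this sketch).
    """
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no valid dipeptides")
    return [counts[d] / total for d in DIPEPTIDES]


if __name__ == "__main__":
    vec = dipeptide_composition("ACAC")  # pairs: AC, CA, AC
    print(len(vec), vec[DIPEPTIDES.index("AC")])
```

Unlike single-residue composition, the dipeptide vector retains some local sequence-order information, which is presumably what the entry means by preserving sequence structure information.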
A composition and phylogeny-based algorithm to classify very short metagenomic reads (75-100 bp) into specific taxonomic and functional groups. MetaCV performs as well as BlastX-based methods (in both sensitivity and specificity) on simulated short reads, but runs 300 times faster, and thus provides fast and accurate analysis of huge amounts of NGS data. To our knowledge, MetaCV, which benefits from the strategy of composition comparison, is the first algorithm that can classify millions of very short reads within an affordable time.
Assesses composition matrices of linear copolymers from mass spectrometry (MS) spectra data. COCONUT is an open source software able to perform spectral preprocessing, including centroiding and baseline correction, as well as to compute isotope patterns, evaluate copolymer composition or resolve isobaric species. The software has been tested on simulated mass spectra from different monomers.
Automates recognition of statistically significant patterns of amino acid enrichment or depletion. Composition Profiler aids in the discovery of statistically significant composition anomalies by color-coding and sorting residues according to their physico-chemical or structural properties. It highlights bias in amino acid composition between two sets of protein sequences.
Evaluates candidate structures for a set of homologous RNAs on their ability to reproduce the patterns exhibited by biological structures. SPuNC is a structure prediction method that consists of the following steps: (1) given a multiple sequence alignment, generate an ensemble of candidate structures; (2) score all structures in the ensemble with a scoring function; and (3) return the top-scoring structure(s) or a consensus structure.
A program for unsupervised binning of metagenomic contigs. CONCOCT bins contigs by using nucleotide composition (k-mer frequencies), coverage data in multiple samples and linkage data from paired-end reads.
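The nucleotide composition signal used for binning can be illustrated with a short k-mer frequency sketch. This is a simplified illustration rather than CONCOCT's code: the function name `kmer_frequencies` and the default `k=4` are assumptions, reverse-complement k-mers are not merged, and the coverage and clustering steps are omitted.

```python
from itertools import product


def kmer_frequencies(contig, k=4):
    """Return a dict mapping every DNA k-mer to its frequency in contig.

    Windows containing characters other than A, C, G, T (for example N)
    are skipped, which is an assumption of this sketch.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    contig = contig.upper()
    total = 0
    for i in range(len(contig) - k + 1):
        kmer = contig[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    if total == 0:
        raise ValueError("contig is shorter than k or has no valid k-mers")
    return {kmer: counts[kmer] / total for kmer in kmers}


if __name__ == "__main__":
    freqs = kmer_frequencies("ACGTACGT")
    print(len(freqs), freqs["ACGT"])
```

Each contig becomes a fixed-length vector (256 values for k=4), so contigs of different lengths can be compared and clustered in the same space.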
Adopts the concepts of Harshlight, but implements them in a manner that utilizes the unique characteristics of the Illumina technology. BASH requires knowledge of the direct neighbours of a bead, and the identities of other ‘nearby’ beads. BASH differs from Harshlight in the compact defect step in three important ways: (i) the outliers are calculated within an array from the replicate beads, rather than from replicate arrays; (ii) the minimum size is specified rather than being estimated from simulated data; and (iii) the compact defect step is iterated rather than being performed once. BASH forms part of the beadarray Bioconductor package.
Helps users explore DNA and protein sequence data from different species and populations. MEGA is composed of several tools allowing researchers to work on phylogenomics and phylomedicine. It includes features for identifying gene duplication events in gene family trees. The tool is available through a graphical user interface (GUI) and a command line interface.
Represents protein sequence features. SSE-ACC uses amino acid composition (AAC) and k-mer composition methods to extract the relevant features. It generates 100-dimensional feature vectors. The first 60 dimensions describe the frequency of each amino acid in each of the three possible secondary structure elements, and the last 40 dimensions represent the frequency of each amino acid in each of the two possible solvent accessibility states.
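The layout of the 100-dimensional vector (20 amino acids x 3 secondary structure elements, then 20 amino acids x 2 accessibility states) can be sketched as follows. This is an illustration under stated assumptions, not SSE-ACC's code: the state alphabets (`H`/`E`/`C` and `B`/`E`), the ordering of dimensions, the normalization by sequence length and the function name `sse_acc_features` are all hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SS_STATES = "HEC"   # assumed labels: helix, strand, coil
ACC_STATES = "BE"   # assumed labels: buried, exposed


def sse_acc_features(seq, ss, acc):
    """Build a 100-dimensional vector from a sequence and two
    per-residue annotation strings of the same length.

    Dimensions 0-59: frequency of each amino acid in each secondary
    structure state. Dimensions 60-99: frequency of each amino acid
    in each solvent accessibility state.
    """
    if not seq or not (len(seq) == len(ss) == len(acc)):
        raise ValueError("seq, ss and acc must be non-empty and equal in length")
    n = len(seq)
    ss_counts = {(a, s): 0 for s in SS_STATES for a in AMINO_ACIDS}
    acc_counts = {(a, x): 0 for x in ACC_STATES for a in AMINO_ACIDS}
    for a, s, x in zip(seq.upper(), ss.upper(), acc.upper()):
        if (a, s) in ss_counts:
            ss_counts[(a, s)] += 1
        if (a, x) in acc_counts:
            acc_counts[(a, x)] += 1
    features = [ss_counts[(a, s)] / n for s in SS_STATES for a in AMINO_ACIDS]
    features += [acc_counts[(a, x)] / n for x in ACC_STATES for a in AMINO_ACIDS]
    return features


if __name__ == "__main__":
    vec = sse_acc_features("AAC", "HHE", "BEB")
    print(len(vec))
```

In practice the secondary structure and accessibility strings would come from a predictor or from solved structures; here they are supplied by hand.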
Gathers several functional enrichment analysis tools based on combinatorial optimization. CopTea is an R package that includes a network-based probabilistic generative model which identifies important gene ontology (GO) terms indirectly linked with the active gene list, as well as a statistical framework for combination-based functional enrichment analysis. It can serve for biological and medical research.
A method based on principal component analysis (PCA) and designed for the correction of cell type heterogeneity in epigenome-wide association studies (EWAS). The ReFACTor tool is based on a variant of PCA and can be applied to any tissue. It selects the sites that can be reconstructed with low error using a low-rank approximation of the original methylation matrix. Moreover, ReFACTor does not use the phenotype in the selection process, which makes it useful as part of a quality control step in EWAS.
A program to generate various modes of Chou's general PseAAC, such as the gene ontology mode, the functional domain mode, and the sequential evolution mode. PseAAC-General also allows users to define their own modes. In every mode, 544 physicochemical properties of the amino acids are available to choose from. Its computing efficiency is at least 100 times that of existing programs, which makes it suitable for extensive studies on proteins and peptides.
Catalogues experimentally verified pathogenicity, virulence and effector genes from fungal, oomycete and bacterial pathogens, which infect animal, plant, fungal and insect hosts. PHI-base contains expertly curated molecular and biological information on genes proven to affect the outcome of pathogen-host interactions reported in peer-reviewed research articles. It offers improved search, filtering and extended data display functions. A PHIB-BLAST search function is provided, along with a link to PHI-Canto, a tool for authors to directly curate their own published data into PHI-base. PHI-base contains information from 2219 manually curated references. The data provide information on 4460 genes from 264 pathogens tested on 176 hosts in 8046 interactions.
Consists of a collection of bio-imaging analysis software. BISE aims to summarize the applied problems that these applications can solve. It uses a crowdsourcing technique that fosters exchanges and collaboration to collect relevant information. This platform helps researchers find appropriate tools for their investigations, recognize and edit workflows, and discover new tools.
Compiles information about the red flour beetle Tribolium castaneum. iBeetle-Base gathers sequence information and links for all genes of Tribolium castaneum as well as annotations of RNAi phenotypes, described with controlled vocabularies and the Tribolium morphological ontology (TrOn). Searches can be made by specific phenotypes, gene names or IDs, and files of interest can be downloaded as an Excel, CSV or PDF file.
Provides a complete summary database from 1094 genome-wide association studies (GWAS) on diseases and other complex traits. MR-Base is a platform that uses these data to perform Mendelian randomization (MR) tests and sensitivity analyses. MR-Base exists conceptually as a two-part framework: (i) it is a repository of harmonized published GWAS summary data which have been aggregated from disparate and heterogeneous sources on traits from across the phenome; (ii) it hosts a range of causal estimation methods and automatically applied sensitivity analyses that can be used to improve the reliability of causal inferences.
Provides a repository dealing with silkworms. Silkworm Base intends to provide genetic resource stock information, especially about mutations. The database includes more than 470 strains and over 340 genes/alleles, from egg, larva, cocoon, pupa and adults, collected by the Institute of Genetic Resource of Kyushu University.
Exploits within-species K-mer arrays and allows cross-species comparisons for phylogeny building. KGCAK aims to capture features of genome sequences and turns digital K-mer arrays from genomes into easy-to-understand, visualized data from a comparative genomics perspective. The database offers three access modes: one to compare multiple species at the same time, one to view the data, and another to explore data for a single genome.
Provides a database for cryopreserved embryos. CARD R-BASE is an online resource that gives access to (i) different types of strains, such as inbred, spontaneous or induced mutant, transgenic, targeted mutation, gene trap, or insertional mutant strains, (ii) genes, (iii) published references that mention the mouse genome and (iv) mouse diseases. A strain file is also downloadable in English and in Japanese.
Hosts observed and modeled base triples, organized in two ways, by geometric triple family and by three-base combination. The RNA Base Triples Database provides separate web pages for each base triple family and for each three-base combination. To view a particular base triple family, the user clicks on the cell corresponding to that family. In this table, green colored cells indicate families with observed instances of fully annotated base triples. The number in each cell reports the number of distinct base combinations observed for that family. Yellow cells designate families having no fully annotated instances. The classification helps to identify recurrent triple motifs that can substitute for each other while conserving RNA 3D structure, with applications in RNA 3D structure prediction and analysis of RNA sequence evolution.
Allows the global investigation of more than 100 RNA modification types and reveals extensive and complex post-transcriptional modifications (PTMs) of RNA. RMBase provides a variety of interfaces and graphic visualizations to facilitate analyses of the massive modification sites in normal tissues and cancer cells. Moreover, this platform assists researchers to discover potential functional roles of RNA modifications hidden in different data.
Holds data on 42 parameters of 250 natural and 503 semisynthetic analogs of taxanes. TaxKB enables the user to search data on the structure, drug-likeness, and physicochemical properties of both natural and synthetic taxanes with a “General Search” option in addition to a “Parameter Specific Search”. It aims to provide information on Absorption, Distribution, Metabolism, and Excretion/Toxicity as well as data on bioavailability and target interaction properties of candidate anticancer taxanes, ahead of expensive clinical trials.
A comprehensive web resource developed for bridging soybean translational genomics and molecular breeding research. It provides information for six entities including genes/proteins, microRNAs/sRNAs, metabolites, single nucleotide polymorphisms, plant introduction lines and traits. It also incorporates many multi-omics datasets including transcriptomics, proteomics, metabolomics and molecular breeding data, such as quantitative trait loci, traits and germplasm information. Soybean Knowledge Base has a new suite of tools such as In Silico Breeding Program for soybean breeding, which includes a graphical chromosome visualizer for ease of navigation. It integrates quantitative trait loci, traits and germplasm information along with genomic variation data, such as single nucleotide polymorphisms, insertions, deletions and genome-wide association studies data, from multiple soybean cultivars and Glycine soja.
Aims to facilitate studies of the immune system. IKB is a resource that provides information about genes and proteins involved in immunological processes, their evolutionary history, orthologous genes and genetic variations at many levels including single nucleotide polymorphisms (SNPs), disease-causing mutations, alternatively spliced variants and copy number variations. The data provided can be used for large scale studies targeting immune systems.
Supplies a unified environment for predictive biology which gathers data, tools, and their associated interfaces. KBase aims to ease the creation, execution and sharing of reproducible analyses, and allows users to share them publicly or with individuals. KBase integrates a data model that will increasingly support user-driven and automated meta-analysis, and is useful for building models of dynamic cellular systems for microbes and plants.
Integrates the available experimental, functional, structural and sequential information about protein-protein interactions (PPIs). HINT-KB is a database that combines primitive information to produce new knowledge. It produces information by calculating an accurate confidence score for each protein pair. This database can be employed to produce reference datasets to compare different PPI computational prediction methods.
Provides structured and detailed information about the experimental conditions under which aptamers were selected and their binding affinity quantified. Aptamer Base is an open collaborative database that can be updated and curated in a decentralized manner, thereby accommodating the ever-evolving field of aptamer research. Aptamer Base describes the pH, temperature, salt concentrations and the buffering agent for all systematic evolution of ligands by exponential enrichment (SELEX) experiments.
Collects 16S rRNA sequences from a large number of datasets. MetaMetaDB is a comprehensive (“meta-”) and compact database that contains a collection of 16S rRNA sequences associated with diverse environments. Users can submit the 16S rRNA sequences of particular prokaryotes and thus investigate microbial habitability when analyzing the ecology and evolution of prokaryotes. The database provides a reverse perspective of the environments in which each prokaryotic group exists, opening the door to the investigation of “meta-metagenomics”.
Gathers information about human and animal cell lines available from repositories and laboratories throughout Italy and from several major collections in other European countries. CLDB includes description pages for over 4920 human and 1380 animal cell lines and about 1000 index pages based on terms from a controlled vocabulary. Searches can be made by cell line name or by free text.
Provides information about clinical cancer variants and interpretations in a structured way. PMKB is an interactive online application for collaborative editing, maintenance, and sharing of structured clinical-grade cancer mutation interpretations. The database contains 457 variant descriptions with 281 clinical-grade interpretations. The EGFR, BRAF, KRAS, and KIT genes are associated with the largest numbers of interpretable variants. The interpretations are accessed either directly via the Web interface or programmatically via the existing application programming interface (API).