Weka / Waikato Environment for Knowledge Analysis
Assists users in exploring data using inductive learning. Weka includes methods for inducing interpretable piecewise linear models of non-linear processes. It contains (i) classifiers for both classification and regression, (ii) meta-classifiers that can improve the performance of base classifiers, (iii) association rule learners, (iv) unsupervised learning methods (clustering), and (v) a number of data pre-processing methods called filters.
Acromine
An abbreviation dictionary automatically constructed from the whole of MEDLINE as of April 2009. Acromine identifies abbreviation definitions by treating a word sequence that frequently co-occurs with a parenthetical expression as a potential expanded form. Applied to the whole of MEDLINE (9,635,599 abstracts), the system extracted 68,007 abbreviation candidates and recognized 467,402 expanded forms. The current version of Acromine achieves 99% precision and 82-95% recall on an evaluation corpus that roughly emulates the whole of MEDLINE.
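The co-occurrence idea can be illustrated with a minimal sketch. This is not Acromine's actual algorithm: it simply counts, for each parenthetical short form, every run of words that immediately precedes it, and keeps runs that recur. The function name, the short-form pattern, the thresholds and the sample sentences are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# A parenthetical short form such as "(TNF)": a letter followed by 1-9 letters, digits or hyphens.
PAREN = re.compile(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)")


def candidate_long_forms(sentences, max_words=6, min_count=2):
    """Count word sequences that end just before each parenthetical short form.

    Every run of 1..max_words words immediately preceding "(SF)" is treated as a
    potential expanded form; only runs seen at least min_count times are kept.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        for match in PAREN.finditer(sentence):
            short_form = match.group(1)
            words = sentence[: match.start()].lower().split()
            for n in range(1, min(max_words, len(words)) + 1):
                counts[short_form][" ".join(words[-n:])] += 1
    return {
        sf: {lf: c for lf, c in forms.items() if c >= min_count}
        for sf, forms in counts.items()
    }


sentences = [
    "We measured tumor necrosis factor (TNF) levels.",
    "Serum tumor necrosis factor (TNF) was elevated.",
]
print(candidate_long_forms(sentences)["TNF"])
# {'factor': 2, 'necrosis factor': 2, 'tumor necrosis factor': 2}
```

Nested candidates all survive this counting step; a real system still has to choose among them (for example, preferring the longest frequent form), and that scoring step is deliberately omitted here.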
PLAN2L / Plant annotation to literature
A web tool for integrated text mining and literature-derived bio-entity relation extraction. PLAN2L enables more efficient retrieval of information on heterogeneous biological topics, ranging from biological relationships at the level of protein interactions and gene regulation to the sub-cellular locations of gene products and their associations with cellular and developmental processes, e.g. cell cycle, flowering, and root, leaf and seed development.
GEM / GAAIN Entity Mapper
Automates data mapping across different datasets or from a dataset on Alzheimer’s disease to a common data model. The GEM system automates data mapping by providing precise suggestions for data element mappings. It leverages the detailed metadata about elements in associated dataset documentation such as data dictionaries that are typically available with biomedical datasets. GEM allows researchers from around the world who have collected data on Alzheimer’s disease and aging to participate in a collaborative effort of data sharing.
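To make "suggestions for data element mappings from metadata" concrete, here is a minimal sketch that ranks target elements by word overlap (Jaccard similarity) between data-dictionary descriptions. This is a swapped-in, deliberately simple technique, not GEM's actual matching method; the function names and the example elements are hypothetical.

```python
import re


def tokens(text):
    """Lower-cased set of words in a data-dictionary description."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def suggest_mappings(source, target, top_k=1):
    """Rank target elements for each source element by description similarity.

    source, target: dicts mapping element name -> description text.
    Returns {source_name: [(target_name, score), ...]}, best match first.
    """
    target_tokens = {name: tokens(desc) for name, desc in target.items()}
    suggestions = {}
    for name, desc in source.items():
        src = tokens(desc)
        ranked = sorted(
            ((t, jaccard(src, toks)) for t, toks in target_tokens.items()),
            key=lambda pair: pair[1],
            reverse=True,
        )
        suggestions[name] = ranked[:top_k]
    return suggestions


source = {
    "PTAGE": "participant age at baseline visit in years",
    "MMSE_TOT": "mini mental state examination total score",
}
target = {
    "age": "age of participant in years at baseline",
    "mmse": "MMSE total score (mini mental state examination)",
    "sex": "biological sex of participant",
}
for name, best in suggest_mappings(source, target).items():
    print(name, "->", best)
```

A curator would then accept or reject each ranked suggestion; richer metadata (units, value ranges, permissible values) could be folded into the score in the same way.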
Literome
An automatic curation system that extracts genomic knowledge from PubMed articles and makes this knowledge available in the cloud, with a website to facilitate browsing, searching, and reasoning. Currently, Literome focuses on two types of knowledge most pertinent to genomic medicine: directed genic interactions, such as pathways, and genotype–phenotype associations. Users can search for interacting genes and the nature of their interactions, as well as for diseases and drugs associated with a single nucleotide polymorphism or gene. Users can also search for indirect connections between two entities, e.g. a gene and a disease might be linked because an interacting gene is associated with a related disease.
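The indirect-connection search can be sketched as a two-hop lookup over extracted facts. This is a hypothetical illustration with toy data, not Literome's implementation; for simplicity it treats interactions as undirected and expects the caller to supply the set of "related" diseases.

```python
from collections import defaultdict


def indirect_links(interactions, associations, gene, diseases):
    """Return intermediate genes that connect `gene` to any disease in `diseases`.

    interactions: (gene_a, gene_b) pairs, treated as undirected in this sketch.
    associations: (gene, disease) pairs.
    diseases: the query disease plus any diseases considered related to it.
    """
    partners = defaultdict(set)
    for a, b in interactions:
        partners[a].add(b)
        partners[b].add(a)
    associated = {g for g, d in associations if d in diseases}
    return sorted(partners[gene] & associated)


# Toy data for illustration only, not taken from Literome.
interactions = [("GENE_A", "GENE_B"), ("GENE_A", "GENE_C"), ("GENE_D", "GENE_E")]
associations = [("GENE_B", "disease X"), ("GENE_C", "disease Y"), ("GENE_E", "disease X")]
print(indirect_links(interactions, associations, "GENE_A", {"disease X", "disease Y"}))
# ['GENE_B', 'GENE_C']
```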
PESTO / Parameter EStimation Toolbox
Combines many state-of-the-art methods that are applicable to a wide array of parameter estimation problems. PESTO offers: (1) multi-start local optimization and interfaces to global and hybrid optimizers; (2) optimization-based, integration-based or hybrid profile calculation for uncertainty and identifiability analysis; (3) several sampling methods for uncertainty and identifiability analysis; (4) visualization of all analysis results; and (5) an efficient workflow with optional parallelization. Users can easily customize this tool.
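Feature (1), multi-start local optimization, can be illustrated with a short sketch. It does not use PESTO's interface: a simple derivative-free compass search stands in for the local optimizer, and the function names and the test objective are hypothetical.

```python
import random


def compass_search(f, x0, step=0.5, tol=1e-6, max_iter=10_000):
    """Derivative-free local minimizer: probe +/- step along each axis, halve step when stuck."""
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x.copy()
                trial[i] += delta
                ft = f(trial)
                if ft < fx:
                    x, fx, improved = trial, ft, True
        if not improved:
            step /= 2
    return x, fx


def multi_start(f, bounds, n_starts=20, seed=0):
    """Run compass_search from random points inside `bounds`; return results, best first."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_starts):
        x0 = [rng.uniform(lo, hi) for lo, hi in bounds]
        results.append(compass_search(f, x0))
    return sorted(results, key=lambda r: r[1])


def tilted_double_well(x):
    # Two local minima, near x = -2 and x = +2; the one near -2 is lower.
    return (x[0] ** 2 - 4) ** 2 + x[0]


best_x, best_f = multi_start(tilted_double_well, bounds=[(-4.0, 4.0)])[0]
print(best_x, best_f)
```

A single local run can get trapped in the minimum near +2; restarting from many random points and keeping the sorted list of end points is what makes the global minimum (and the spread of local optima) visible.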
A tool to identify and extract point mutations reported in the literature for a given protein and map them onto its sequence. It takes a protein sequence or a multiple sequence alignment in FASTA format as input. A UniProt ID or accession code, provided as the sequence descriptor, is used to retrieve protein information from UniProt, from which protein names, gene names and synonyms are extracted. These are used as keywords to download abstracts. The retrieved abstracts are scanned for mutations, and the reported mutations are mapped onto the amino acid sequence.
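The scan-and-map step can be sketched with a regular expression for one-letter point-mutation notation (e.g. "R175H") and a check of the wild-type residue against the input sequence. This is a hypothetical simplification, not the tool's own code; it ignores three-letter notation such as "Arg175His" and any numbering offsets between papers and the sequence.

```python
import re

# One-letter point-mutation notation: wild-type residue, 1-based position, mutant residue.
AA = "ACDEFGHIKLMNPQRSTVWY"
MUTATION = re.compile(rf"\b([{AA}])(\d+)([{AA}])\b")


def map_mutations(abstract, sequence):
    """Find point mutations in text and check each against the protein sequence.

    Returns (mutation, position, wild_type_matches) tuples in order of appearance.
    """
    hits = []
    for wt, pos, mut in MUTATION.findall(abstract):
        i = int(pos)
        ok = 1 <= i <= len(sequence) and sequence[i - 1] == wt
        hits.append((f"{wt}{pos}{mut}", i, ok))
    return hits


sequence = "MKTAYIAKQR"  # hypothetical toy sequence
abstract = "The K2E and Y5F substitutions, but not W9A, altered activity."
print(map_mutations(abstract, sequence))
# [('K2E', 2, True), ('Y5F', 5, True), ('W9A', 9, False)]
```

A mismatch such as W9A (the sequence has Q at position 9) flags a mention that does not fit this protein, which is the kind of evidence used to decide whether a reported mutation can be mapped.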
VDA / Validation Discriminant Analysis
Minimizes the number of validation experiments necessary for reliable performance estimation and fair comparison between algorithms through a cost-efficient method. VDA designs a minimal validation dataset that still allows reliable comparison of the performances of different algorithms. It achieves this reduction by selecting the predictions that maximize the minimum Hamming distance between algorithmic predictions in the validation set. VDA can be used to correctly rank algorithms according to their performance.
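The selection criterion can be illustrated with a greedy heuristic that, one item at a time, adds the candidate prediction that most increases the minimum pairwise Hamming distance between algorithms. The greedy search is a stand-in chosen for brevity, not necessarily VDA's optimization procedure; the function names and the prediction matrix are hypothetical.

```python
from itertools import combinations


def pairwise_distances(predictions, items):
    """Hamming distance between every pair of algorithms, restricted to `items`."""
    return [sum(p[i] != q[i] for i in items) for p, q in combinations(predictions, 2)]


def greedy_validation_set(predictions, budget):
    """Greedily choose `budget` items, each time maximizing the minimum pairwise distance.

    predictions: one equal-length label sequence per algorithm (at least two algorithms).
    Ties are broken by the summed pairwise distance, then by the lower item index.
    """
    chosen = []
    for _ in range(budget):
        remaining = [i for i in range(len(predictions[0])) if i not in chosen]

        def score(i):
            d = pairwise_distances(predictions, chosen + [i])
            return (min(d), sum(d), -i)

        chosen.append(max(remaining, key=score))
    return chosen


# Binary predictions of three hypothetical algorithms on five candidate items.
preds = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 1, 0, 1, 1],
]
selected = greedy_validation_set(preds, budget=3)
print(selected, min(pairwise_distances(preds, selected)))  # [0, 1, 3] 2
```

Items 2 and 4, on which all three algorithms agree, are never picked: validating them could not help tell the algorithms apart.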
BioDOM
Provides an easy way to build XML files following the Helmholtz Open BioInformatics Technology network (HOBIT) format descriptions from within the user's own programs. BioDOM is designed as a modular system that can easily be extended to accommodate new formats. It also provides functions to convert the native non-XML output of various bioinformatic tools, as well as many commonly used non-XML formats, to the HOBIT XML formats, allowing traditional tools and services a smooth transition from their data formats towards the XML formats.
Generalized ADP / Generalized Algebraic Dynamic Programming
Intends to ease the production of dynamic programming algorithms. Generalized ADP is a framework based on the separation of state-space traversal, scoring, and selection of the solutions the user wants. It aims to assist users in implementing programs by combining multiple reusable components with a standardized grammar that specifies the types of the attribute functions attached to each production rule.
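The separation that algebraic dynamic programming relies on can be sketched in Python: the grammar below is written once, and the same grammar is evaluated under two interchangeable scoring algebras, one counting candidate structures and one maximizing base pairs. This is a toy illustration under our own assumptions (a Nussinov-style RNA grammar and dict-based algebras); it does not use the Generalized ADP framework itself.

```python
from functools import lru_cache

# Allowed base pairs: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def evaluate(seq, algebra, min_loop=0):
    """Evaluate a fixed RNA-structure grammar under a pluggable scoring algebra.

    Grammar (toy, Nussinov-style) over the subword seq[i:j]:
      S(i, j) -> nil                                           if i == j
               | unpaired(seq[i], S(i+1, j))
               | paired(seq[i], S(i+1, k), seq[k], S(k+1, j))  if seq[i], seq[k] can pair
    The algebra supplies the evaluation functions and the choice function.
    """

    @lru_cache(maxsize=None)
    def S(i, j):
        if i == j:
            return algebra["choice"]([algebra["nil"]()])
        candidates = [algebra["unpaired"](seq[i], S(i + 1, j))]
        for k in range(i + 1 + min_loop, j):
            if (seq[i], seq[k]) in PAIRS:
                candidates.append(algebra["paired"](seq[i], S(i + 1, k), seq[k], S(k + 1, j)))
        return algebra["choice"](candidates)

    return S(0, len(seq))


# Two interchangeable algebras for the same grammar.
COUNT = {
    "nil": lambda: 1,
    "unpaired": lambda base, rest: rest,
    "paired": lambda left, inner, right, rest: inner * rest,
    "choice": sum,
}
MAX_PAIRS = {
    "nil": lambda: 0,
    "unpaired": lambda base, rest: rest,
    "paired": lambda left, inner, right, rest: inner + rest + 1,
    "choice": max,
}

print(evaluate("GGCC", COUNT), evaluate("GGCC", MAX_PAIRS))  # 6 2
```

Adding a new analysis (for example, a free-energy score) means writing another algebra; the grammar, and therefore the search space, stays untouched.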
OntoMate
An ontology-driven, concept-based literature search engine developed at the Rat Genome Database (RGD). OntoMate tags abstracts with gene names, gene mutations, organism names and most of the 16 ontologies/vocabularies used at RGD. All terms/entities tagged to an abstract are listed with the abstract in the search results, and all listed terms are linked both to data entry boxes and to a term browser in the curation tool. OntoMate also provides user-activated filters for species, date and other parameters relevant to the literature search. Using the system for literature search and import has streamlined the process compared with using PubMed.
TrigNER
Facilitates biomedical event trigger recognition. TrigNER is a machine learning-based solution that exploits conditional random fields (CRF). It uses a configurable optimization algorithm to tune the feature set and training parameters for every event type, which allows the software to adapt itself to corpora with various events and domains and to provide feature sets optimized for the linguistic and context characteristics of each event type.
Europe PMC
A unique, free information resource for biomedical and health researchers. Formerly known as UKPMC, the service was rebranded in November 2012 as Europe PMC to reflect the scope of the funding agencies that support it. Europe PMC offers RESTful web services to access both articles and grants, powerful search tools such as citation-count sort order and data citation features, a service to add publications to your ORCID record, a variety of export formats, and an External Links service that enables any related resource to be linked from Europe PMC content.
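A minimal sketch of calling the RESTful article search from Python, using only the standard library. The endpoint path, the `query`, `format`, `pageSize` and `resultType` parameters, and the `resultList`/`result`/`title` JSON fields reflect our understanding of the public Europe PMC REST API and should be checked against its official documentation; the helper function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed Europe PMC REST search endpoint; verify against the official docs.
BASE = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def search_url(query, page_size=25, result_type="lite"):
    """Build a Europe PMC search URL that asks for JSON output."""
    params = {"query": query, "format": "json", "pageSize": page_size, "resultType": result_type}
    return f"{BASE}?{urlencode(params)}"


def search_titles(query, page_size=25):
    """Fetch one page of results and return the article titles (needs network access)."""
    with urlopen(search_url(query, page_size)) as response:
        data = json.load(response)
    return [hit.get("title", "") for hit in data.get("resultList", {}).get("result", [])]


print(search_url("malaria AND vaccine", page_size=10))
```

Calling `search_titles("malaria AND vaccine")` performs a live request; keeping URL construction in its own function makes the query logic testable without network access.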
BioNumbers
Provides access to useful numbers in molecular biology. BioNumbers is a dynamic database of key numbers in molecular and cell biology that aims to connect researchers to numbers available in the literature. The database contains more than 4500 distinct properties from more than 200 organisms. Registered users can contribute entries by providing values of interest that have been published in a peer-reviewed journal, and can also comment on numbers already in the database.
NIF / Neuroscience Information Framework
Gives access to a searchable collection of neuroscience data, a catalog of biomedical resources, and an ontology for neuroscience. NIF is a dynamic inventory of web-based neuroscience resources designed to serve neuroscience investigators by facilitating directed and intelligent access to data and findings, aiding integration, synthesis, and connectivity across related data and findings, stimulating new and enhanced development of neuroinformatic resources, and enabling new and enhanced analyses of data.
Colil / Comments on Literature in Literature
Provides a citation-based database for the life sciences. The Colil database contains citations, citation contexts and co-citations extracted from full-text publications. It is built as Linked Open Data (LOD) using the Resource Description Framework (RDF). The tool offers three services: a simple search interface, an FTP site and an advanced query builder. It aims to make biological research more efficient for researchers.
BioLit / Biological Literature
Provides access to semantically enriched content for all research articles from PubMed Central (PMC). BioLit is a database that includes full text or excerpts of open access articles directly within existing biological databases, and adds newly generated metadata to the articles to increase their informative value. It applies a text-mining pipeline to identify ontology terms from a number of ontologies provided by the National Center for Biomedical Ontology (NCBO), as well as PDB IDs.
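The PDB-ID part of such a pipeline can be sketched with a single regular expression. BioLit's own rules are not described here, so this is a hypothetical approach: it assumes IDs appear in the classic four-character form (a digit 1-9 followed by three letters or digits) right after a "PDB" cue, which avoids matching ordinary numbers such as years.

```python
import re

# A "PDB" cue, optionally followed by ID/code/entry, then a classic 4-character ID
# (a digit 1-9 followed by three letters or digits).
PDB_ID = re.compile(r"\bPDB(?:\s+(?:ID|code|entry))?[:\s]+([1-9][A-Za-z0-9]{3})\b")


def find_pdb_ids(text):
    """Return PDB IDs mentioned after a 'PDB' cue, upper-cased, in order of appearance."""
    return [m.upper() for m in PDB_ID.findall(text)]


text = "The structure (PDB ID 1TUP) was compared with PDB: 2hhb in 2019."
print(find_pdb_ids(text))  # ['1TUP', '2HHB']
```

Requiring the cue trades recall for precision: bare mentions such as "1TUP" without the word "PDB" are missed, but four-digit numbers elsewhere in the text are not mistaken for structure IDs.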
