Unlock your biological data



1 - 50 of 61 results
Allows users to explore PubMed search results with the Gene Ontology (GO), a hierarchically structured vocabulary for molecular biology. GoPubMed provides the following benefits. First, it gives an overview of the literature by categorizing abstracts according to the GO, allowing users to navigate quickly through the abstracts by category. Second, GoPubMed automatically shows general ontology terms related to the original query, which often do not even appear directly in the abstract. Third, it enables users to verify its classification because GO terms are highlighted in the abstracts and each term is labelled with an accuracy percentage. Fourth, it shows definitions of GO terms without the need for further lookup.
A web-based NCBI-PubMed search application that can analyze articles for selected biomedical verbs and give users relational information, such as subject, object, location, manner and time. After receiving a keyword query, BWS retrieves matching PubMed abstracts and lists them along with snippets in order of relevance to protein-protein interaction. Users can then select articles for further analysis, and BWS will find and mark up biomedical relations in the text. The analysis results can be viewed in the abstract text or in table form.
A hybrid system for extracting chemical entities from natural language texts. ChemSpot is based on a conditional random field trained for identifying International Union of Pure and Applied Chemistry (IUPAC) entities and a dictionary built from ChemIDplus for extracting drugs, abbreviations, molecular formulas and trivial names. Evaluations showed a major performance advantage compared with a freely available named entity recognition tool for chemical entities, OSCAR4. Thus, we believe that ChemSpot sets a new state of the art in the recognition of chemical entities.
A web-based interactive knowledge exploration platform with significant advances to its predecessor (BioTextQuest), aiming to bridge processes such as bioentity recognition, functional annotation, document clustering and data integration towards literature mining and concept discovery. BioTextQuest(+) enables PubMed and OMIM querying, retrieval of abstracts related to a targeted request and optimal detection of genes, proteins, molecular functions, pathways and biological processes within the retrieved documents.
A text-mining software tool that integrates several state-of-the-art entity tagging systems (DNorm, GNormPlus, SR4GN, tmChem, and tmVar) and offers a batch-processing mode able to process arbitrary text input (e.g., scholarly publications, patents and medical records) in multiple formats (e.g., BioC). We support multiple standards to make our service interoperable and allow simpler integration with other text-processing pipelines. To maximize scalability, we have pre-processed all PubMed articles, and use a computer cluster for processing large requests of arbitrary text.
CIIPro / Chemical In vitro-In vivo Profiling
A package to link chemical features and in vitro biological data with targeted in vivo biological activity. The CIIPro portal can automatically extract in vitro biological data from public resources for user-supplied compounds, and identify the most similar compounds based on their optimized bioprofiles. Compared with existing hybrid approaches, the CIIPro portal provides a new read-across strategy to deal with missing and biased data when using public data sources.
Cell line recognition
Cell line recognition and normalization system, with supporting corpora and tagged documents. The aim is to create corpora that are suitable for training and evaluating machine learning systems to recognize and normalize established cell line names from text. We created two manually annotated corpora, Gellus and CLL. Gellus is suitable for training any machine learning system to recognize cell line name mentions, while CLL is for evaluating systems in recognizing Cellosaurus cell line names.
eFIP / extracting Functional Impact of Phosphorylation
A tool to support article selection and information extraction of functional impact of phosphorylated proteins. The current version focuses on protein-protein interactions (PPIs) as functional impact. In eFIP, PPIs refer to interactions between protein elements, including protein complexes and classes of proteins. Impact is defined as any direct relation between protein phosphorylation and PPI. The relation could be positive (phosphorylation of A increases binding to B), negative (when phosphorylated A dissociates from B) or neutral (phosphorylated A binds B).
An open-source software tool for identifying chemical names in biomedical literature, including chemical identifiers, drug brand and trade names, and systematic formats. tmChem uses conditional random fields with a rich feature set and rule-based post-processing modules for resolving local abbreviations and improving consistency. tmChem achieved the highest performance of any submission to the BioCreative IV CHEMDNER task (over 87% F-measure). The tmChem system combines two linear-chain conditional random field (CRF) models employing different tokenizations and feature sets. Model 1 is an adaptation of the BANNER named entity recognizer. It uses the MALLET toolkit and is implemented in Java. Model 2 is repurposed from part of the tmVar system for locating genetic variants. It uses the CRF++ toolkit and is implemented in Perl and C++. Both models employ multiple post-processing steps.
OSCAR / Open-Source Chemistry Analysis Routines
Software for the recognition of named entities and data in chemistry publications. OSCAR4 can be used to identify chemical names, reaction names, ontology terms, enzymes, chemical prefixes and adjectives, and chemical data such as state, yield, IR, NMR and mass spectra and elemental analyses. In addition, where possible, any chemical names detected will be annotated with structures derived either by lookup or by name-to-structure parsing using the Open Parser for Systematic IUPAC Nomenclature (OPSIN), or with identifiers from the Chemical Entities of Biological Interest (ChEBI) ontology.
Describes systematic chemical nomenclature. LeadMine is used for the identification and annotation of chemicals, protein targets, genes, diseases, species, named reactions, company names and cell lines. It uses a mixture of expertly curated grammars and dictionaries, as well as dictionaries automatically derived from public resources. We show that the heuristics developed to filter our dictionary of trivial chemical names (from PubChem) yield a better-performing dictionary than the previously published Jochem dictionary. LeadMine differs from conventional machine learning approaches by being able to attribute all entities to a specific dictionary or grammar.
Exploits unlabeled data for incorporating domain knowledge into a named entity recognition model. BANNER-CHEMDNER includes natural language processing (NLP) tasks for text preprocessing, learning word representation features from a large amount of text data for feature extraction, and conditional random fields for token classification. We call our branch of the BANNER system BANNER-CHEMDNER, which is scalable over millions of documents, processing about 530 documents per minute, is configurable via XML, and can be plugged into other systems by using the BANNER unstructured information management architecture interface.
Identifies non-elliptical entity mentions in a coordinated noun phrase (NP) with ellipses. medtextmining proposes both an intuitive graph-like and a formal algebraic representation of a coordinated NP with ellipses. It is based on a practical named entity recognition (NER) system that identifies non-elliptical entity mentions using linguistic rules and an entity mention dictionary. The system was optimized with the Apriori algorithm, which greatly reduces processing time for resolving ellipsis.
BioInfer / Bio Information Extraction Resource
Provides the key types of annotation for a single set of sentences, expressing complex relationships between both physical and abstract entities. BioInfer is a public resource providing an annotated corpus of biomedical English aimed at the development of information extraction (IE) systems and their components in the biomedical domain. This corpus is unique in the domain in combining annotation types for a single set of sentences, and in the level of detail of the relationship annotation.
OpenDMAP / Open Source Direct Memory Access Parser
Advances the performance standards for extracting protein-protein interaction predications from the full texts of biomedical research articles. OpenDMAP is an ontology-driven, integrated concept analysis system. It significantly advances the state of the art in information extraction by leveraging knowledge in ontological resources, integrating diverse text processing applications, and using an expanded pattern language that allows the mixing of syntactic and semantic elements and variable ordering.
biomsef / BIOMedical Search Engine Framework
An open-source framework for the fast and lightweight development of domain-specific search engines. biomsef integrates taggers for major biomedical concepts, such as diseases, drugs, genes, proteins, compounds and organisms, and enables the use of domain-specific controlled vocabulary. The rationale behind this framework is to incorporate core features typically available in search engine frameworks with flexible and extensible technologies to retrieve biomedical documents, annotate meaningful domain concepts, and develop highly customized Web search interfaces.
miRLiN / miRNA Literature Network
A semantic indexing method to extract relationships between terms and miRNAs directly from the biomedical literature. miRLiN provides access to a latent semantic indexing model, which contains the most recent and comprehensive collection of miRNA abstracts in MEDLINE. Users can query the model with any combination of terms or miRNAs. When querying with terms, miRLiN ranks all miRNAs in the collection with respect to semantic associations to the query. Selected miRNAs and terms can be visualized as a network graph, where the nodes represent the selected miRNAs and terms and the edges represent cosine values. LSI modeling of MEDLINE abstracts can be useful for knowledge discovery.
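The cosine-weighted ranking described above can be illustrated with a minimal sketch. It covers only the cosine step, not the latent semantic indexing model itself, and it is not miRLiN's code: the function names, the miRNA labels and the three-dimensional vectors are all hypothetical, invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_cosine(query_vec, item_vecs):
    """Return (name, cosine) pairs sorted from most to least similar to the query."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in item_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical latent-space vectors; a real model would derive these from abstracts.
mirna_vectors = {
    "miR-A": [0.9, 0.1, 0.0],
    "miR-B": [0.2, 0.8, 0.1],
    "miR-C": [0.0, 0.1, 0.9],
}
query = [1.0, 0.2, 0.0]  # hypothetical vector for a query term

ranking = rank_by_cosine(query, mirna_vectors)
for name, score in ranking:
    print(f"{name}\t{score:.3f}")
```

In a network view, each printed score would be the weight of the edge between the query node and the corresponding miRNA node.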
A text mining tool to find new associations between drugs. DrugQuest clusters DrugBank records based on their textual information in a multidimensional vector space. We mainly apply partitional clustering algorithms to group DrugBank records. Toxicity, targeted pathways, targeted proteins, diseases and other interactors are a few examples of such textual information. Uniquely assigning DrugBank records to clusters, based on tagged terms such as pathways, diseases, molecules and biological processes, can make DrugQuest a promising tool for new concept discovery and detection of new drug associations.
PWTEES / PathWay Turku Event Extraction System
Extracts pathway interactions from the literature utilizing an existing event extraction tool and pathway named entity recognition (PathNER). PWTEES can be used to enrich the molecular context of diseases by applying large-scale text mining of events involving genes and pathways. We extended a state-of-the-art text mining system by introducing pathway named entity recognition to identify interactions involving both genes/proteins and pathways.
Improves grounding and relationship resolution for molecular entities commonly encountered in mining and curation of biomedical text. Bioentities is a curated resource that contains a set of identifiers representing protein families and complexes along with multiple types of mappings: (i) links between text strings and Bioentities identifiers, (ii) between Bioentities identifiers and identifiers representing protein families and complexes in other resources, and (iii) between Bioentities families/complexes and their constituent members.
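The three mapping types listed above can be pictured as three lookup tables. The sketch below is only an illustration under stated assumptions: every identifier, text string, external database name and member in it is a hypothetical placeholder, not an entry from the Bioentities resource, and the helper names are invented.

```python
# (i) text strings -> identifiers (hypothetical entries)
TEXT_TO_ID = {
    "fam-x": "FAM_X",
    "family x": "FAM_X",
    "complex y": "COMPLEX_Y",
}

# (ii) identifiers -> equivalent identifiers in other resources (hypothetical)
ID_TO_EXTERNAL = {
    "FAM_X": [("OtherDB", "OD:0001")],
}

# (iii) families/complexes -> constituent members (hypothetical)
MEMBERS = {
    "COMPLEX_Y": ["FAM_X", "GENE_3"],
    "FAM_X": ["GENE_1", "GENE_2"],
}


def ground(text):
    """Map a text mention to an identifier, or None if it is not in the table."""
    return TEXT_TO_ID.get(text.strip().lower())


def expand_members(identifier, seen=None):
    """Resolve a family or complex down to its leaf members, guarding against cycles."""
    seen = set() if seen is None else seen
    if identifier in seen:
        return set()
    seen.add(identifier)
    children = MEMBERS.get(identifier)
    if children is None:
        return {identifier}  # no listed members, so treat it as a leaf
    leaves = set()
    for child in children:
        leaves |= expand_members(child, seen)
    return leaves


print(ground("Family X"))                   # FAM_X
print(sorted(expand_members("COMPLEX_Y")))  # ['GENE_1', 'GENE_2', 'GENE_3']
```

Keeping the member table separate from the text table lets a nested complex be expanded without re-grounding any text.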
Presents pre-processed input from the underlying parsing, protein recognition and DB identifier assignment systems. Eighteen thousand full-text articles are indexed by GNSuite, and more than eighteen million abstracts from PubMed by MEDIE. The system accepts several sources of input, such as MEDIE, GNSuite and LINNAEUS. This can easily be extended with other systems that provide stand-off annotations, since each system is presented in a separate tab in the user interface. All underlying results are integrated to improve recall.
NERsuite / Named Entity Recognition Suite
Simplifies research experiments. NERsuite uses various combinations of NLP components, such as a tokenizer, POS tagger, lemmatizer and chunker. It contains three sub-functions: (1) a tokenizer, (2) a modified version of the GENIA tagger and (3) a named entity recognizer. This tool was tested on two biomedical Named Entity Recognition (NER) tasks. It is able to compute the beginning and past-the-end positions of a given sentence.
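"Beginning" and "past-the-end" positions are half-open character offsets: the slice from the first to the second reproduces the sentence exactly. The sketch below illustrates that convention only. Its punctuation-based splitting rule and the function name `sentence_spans` are assumptions made for the example, not NERsuite's tokenizer or API.

```python
import re

# A sentence starts at a non-space character and runs to the first ., ! or ?
# that is followed by whitespace or the end of the text (naive rule, assumed here).
_SENTENCE = re.compile(r"\S.*?(?:[.!?](?=\s|$)|$)", flags=re.DOTALL)


def sentence_spans(text):
    """Return (begin, past_the_end) character offsets for each sentence in text."""
    return [(m.start(), m.end()) for m in _SENTENCE.finditer(text)]


text = "BRCA1 binds p53. It is phosphorylated."
spans = sentence_spans(text)
for begin, end in spans:
    # text[begin:end] is the sentence; end is one past its last character
    print(begin, end, repr(text[begin:end]))
```

This naive rule splits on abbreviations such as "e.g.", so a production system would need a more careful sentence splitter.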
1 - 3 of 3 results
Provides information about cell lines. Cellosaurus covers immortalized cell lines, naturally immortal cell lines, finite-life cell lines when those are distributed and used widely, vertebrate cell lines with an emphasis on human, mouse and rat cell lines, and invertebrate cell lines. It contains more than 100 000 cell lines, representing over 550 species. The database furnishes data such as synonyms, cross-references and references to publications, databases or ontologies.
