Adapts the popular X!Tandem peptide search engine to work with Hadoop MapReduce for reliable parallel execution of large searches. MR-Tandem is designed to drop in wherever X!Tandem is already in use and requires no modification to existing X!Tandem parameter files, and only minimal modification to X!Tandem-based workflows. It runs on any Hadoop cluster but offers special support for Amazon Web Services for creating inexpensive on-demand Hadoop clusters.
Facilitates access to the literature relevant to the problem of kinetic modelling of metabolism. KiPar is able to retrieve documents that are likely to contain a value of a given parameter applicable to a given reaction. It aims to reduce the time involved in the kinetic modelling of metabolic pathways. This tool allows users to put multiple reactions in a single search request. It can serve to identify patterns for extracting information regarding kinetic parameters.
Allows users to enter queries to find MeSH terms closely related to the queries. Meshable relies on co-occurrence of text words and MeSH terms to find keywords that are related to each MeSH term. A query is then matched with the keywords for MeSH terms, and candidate MeSH terms are ranked based on their relatedness to the query. The experimental results show that our method achieves the best performance among several term extraction approaches in terms of topic coherence. Moreover, the interface can be effectively used to find full names of abbreviations and to disambiguate user queries.
A web-based tool that gives a clear and handy overview of the bibliography available for the user input. Users provide as input a gene list (expressed by gene names or IDs from EntrezGene), a context of study (expressed by keywords), and optionally a date range. From this input, GeneValorization provides a matrix containing the number of publications with co-occurrences of gene names and keywords for the given date range. Gene synonyms and MeSH terms are leveraged when searching for publications. Graphics are automatically generated to assess the relative importance of genes within various contexts. Links to publications and other databases offering information on genes and keywords are also available.
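The gene-by-keyword publication matrix described for GeneValorization can be illustrated with a minimal sketch. This is not the tool's implementation: the function name `cooccurrence_matrix`, the toy abstracts, and the naive case-insensitive substring matching are all assumptions made for illustration, and synonym or MeSH expansion is left out.

```python
from collections import Counter


def cooccurrence_matrix(abstracts, genes, keywords):
    """Count abstracts that mention both a gene and a keyword.

    Hypothetical helper: matching is a naive case-insensitive substring
    test, so it ignores synonyms and word boundaries.
    """
    counts = Counter()
    for text in abstracts:
        lowered = text.lower()
        for gene in genes:
            if gene.lower() not in lowered:
                continue
            for keyword in keywords:
                if keyword.lower() in lowered:
                    counts[(gene, keyword)] += 1
    # Nested dict: one row per gene, one column per keyword.
    return {gene: {kw: counts[(gene, kw)] for kw in keywords} for gene in genes}


# Toy input, invented for this example only.
abstracts = [
    "BRCA1 mutations are linked to breast cancer risk.",
    "TP53 and BRCA1 regulate apoptosis in breast cancer cells.",
    "TP53 loss drives tumour progression.",
]
matrix = cooccurrence_matrix(abstracts, ["BRCA1", "TP53"], ["breast cancer", "apoptosis"])
for gene, row in matrix.items():
    print(gene, row)
```

Each cell holds a publication count, so rows can be compared to judge how strongly a gene is associated with each context in the sampled literature.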
A data transfer tool that abstracts scientific data repositories. BDSS consists of three parts: a metadata repository, the BDSS transfer client and an integration as a Galaxy data transfer tool. BDSS can take a data file manifest, look up an alternate host in a curated database and select an optimal data transfer method for the source and destination computers. BDSS allows a researcher, who may be unaware of available technologies, to perform faster transfers by asking for data in a familiar way. BDSS is unique in that it is tied to a curated inventory of file data transfer nodes, networks, and data transfer methods, and allows scientists to take advantage of advances in data transfer technologies even when they are unaware of them.
Serves for the linguistic and informational analysis of large collections of biological sequences. KCH is a suite of MapReduce algorithms that allows users to extract either canonical or non-canonical k-mer statistics. It requires a Hadoop cluster and MapReduce to perform these analyses. It can be used to compute both local k-mer statistics (LS) and cumulative statistics (CS).
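As a single-machine illustration of the k-mer statistics mentioned for KCH, the sketch below counts canonical or non-canonical k-mers. It is not KCH code and uses no MapReduce. It assumes that "local" means per-sequence counts, that "cumulative" means counts summed over the whole collection, and that a canonical k-mer is the lexicographically smaller of a k-mer and its reverse complement. All function names are hypothetical.

```python
from collections import Counter

# Translation table for complementing DNA bases (A<->T, C<->G).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return kmer.translate(_COMPLEMENT)[::-1]


def kmer_counts(sequence, k, canonical=False):
    """Per-sequence ("local", by assumption) k-mer counts."""
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if canonical:
            # Merge each k-mer with its reverse complement.
            kmer = min(kmer, reverse_complement(kmer))
        counts[kmer] += 1
    return counts


def cumulative_counts(sequences, k, canonical=False):
    """Collection-wide ("cumulative", by assumption) k-mer counts."""
    total = Counter()
    for seq in sequences:
        total.update(kmer_counts(seq, k, canonical))
    return total


print(kmer_counts("ACGT", 2))
print(kmer_counts("ACGT", 2, canonical=True))
print(cumulative_counts(["ACGT", "AC"], 2))
```

In a MapReduce setting the per-sequence counting would typically be the map step and the summation the reduce step, but that mapping is only a plausible reading of the description.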
Computes biomedical semantic sentence similarity. BioSSES is a web-based system that utilizes WordNet as the general domain ontology and the Unified Medical Language System (UMLS) as the biomedical domain-specific ontology. It was evaluated on a benchmark data set consisting of 100 sentence pairs from the biomedical literature that were manually annotated by five human experts.
Provides a software system for the automatic evaluation of database search methods. Phase4 offers a large variety of evaluation scenarios, methods, performance measures and visualisations and can be extended easily. It gives the common evaluation framework a logical structure by dividing it into four phases: construction, execution, evaluation and report.
A tool to support article selection and information extraction of functional impact of phosphorylated proteins. The current version focuses on protein-protein interactions (PPIs) as functional impact. In eFIP, PPIs refer to interactions between protein elements, including protein complexes and classes of proteins. Impact is defined as any direct relation between protein phosphorylation and PPI. The relation could be positive (phosphorylation of A increases binding to B), negative (when phosphorylated A dissociates from B) or neutral (phosphorylated A binds B).
Allows users to search biomedical literature. SEACOIN is an interactive tool whose features include hypothesis generation based on both open discovery and closed discovery models and extractive summarization of the top-ranked abstracts associated with the query. Other features include: (1) use of a controlled vocabulary to limit the k-ary trees to genes and more; (2) incorporation of text preprocessing steps; and (3) a “history” feature to store co-occurrence networks computed for previous queries.
Consists of a modular document processing pipeline with extensible components for natural language processing that achieve state-of-the-art performance for POS tagging and chemical named entity recognition. ChemDataExtractor is able to automatically extract chemical information from scientific documents, facilitating the creation of massive chemical databases with minimal time and effort. The system provides a table processor for extraction of tabulated experimental properties and document-level processing algorithms to resolve data interdependencies and produce unified chemical records that incorporate information from multiple document domains.
Provides a method to facilitate knowledge discovery. MeSHy is an implementation of an algorithm that extracts the MeSH terms from the retrieved documents. It filters them, excluding the trivial ones, and then probabilistically scores and ranks pairs of MeSH terms derived from each document. The filtering is performed in two stages, and its purpose is to keep the most informative and descriptive MeSH terms of the query.
Provides information about the roles of genes in stem cells of different types using evidence drawn from the sentences in biomedical texts. StemTextSearch can identify the roles of genes in stem cells using token sentences and queries that specify: (i) gene, (ii) category of stem cell, (iii) gene role, (iv) gene regulation, (v) cell process, (vi) stem-cell regulation, and (vii) species. Users can choose a gene and select the type of stem cell to query in StemTextSearch.
A sentence-based search tool for accessing published research articles related to a set of genes and concepts (keywords). Input may consist of several hundred genes and any number of concepts. Concepts can be simple keywords or more sophisticated ontology terms. Ferret may be used to efficiently perform a broad scan of the bioscience literature for information about genes of interest. Searches are conducted against PubMed, so Ferret is as current as PubMed. Ferret processes retrieved PubMed documents to find sentences for each gene-concept pair and also for each gene-gene pair. It is species-agnostic in its function.
Gives access to Medline and various other medical literature databases. HighWire Press offers search engine functions designed for medical literature. It can locate a particular article in a set of about 1000 journals and more than one million free, full-text articles online. This search engine can be queried by author name, keyword, journal name, year of publication, and Medical Subject Headings, as in PubMed.
Allows users to search for scientific articles that reference genes. PaperBLAST presents the results of the user's search as a list of links. Each link leads to a scientific publication. This tool uses EuropePMC to search the full text of scientific articles and contains more than 700,000 scientific articles that mention over 400,000 different proteins.
Corpus-Transcriptional-Regulation is a purpose-built corpus with graded textual similarity evaluated by curators. It was used as a training and evaluation source of truth for Natural Language Processing (NLP) tools.
Permits visual evaluation of topics within a document corpus. Adjutant employs the full document corpus for unsupervised topic clustering. It furnishes functions to filter datasets and to conduct random sampling, including stratified sampling, with or without sampling weights. It uses linear models to estimate the optimal hdbscan minimum cluster points (minPts) parameter.
Browses and extracts images from The Cancer Genome Atlas that share common characteristics with a user-submitted histopathological image. Luigi computes deep texture representations to propose several images ranked by degree of similarity. The platform provides information related to somatic mutations such as gene fusions or single nucleotide polymorphisms (SNPs), four signatures registered in the COSMIC database and two different gene expression levels including CD274.
Supports faster curation of case studies and reviews on Parkinson's and Alzheimer's disease. NapEasy is an automated PDF method that uses sentence-level linguistic features and spatial information across the entire publication to (i) automatically highlight sentences that match five criteria and (ii) assign each highlighted sentence one or more of the types goal, method and finding.
Leverages semantic search and ontology-based query answering over a wide range of life science Linked Data, obtained from Bio2RDF. BioSearch can be applied to other application domains. It leverages keyword search and semantic query for finding information more accurately and efficiently. This tool conducts ontology-based query answering to automatically retrieve information from distributed datasets.
Assigns a field to each token or sequence of tokens in a query. Field Sensor works by calculating a mapping between a query segment and a field, along with the likelihood of that mapping. This tool labels each segment of a query with a PubMed record field: text, title, author, journal, volume, issue, page or date.
Measures epithelial organization. EpiGraph is an image analysis method using segmented images from real epithelia or simulations to simplify quantification and comparison of packed tissues. It can compute the graphlet degree distribution agreement distance (GDD) between any epithelial tissue and another tessellation that serves as a reference. It can also analyze and compare diverse tissues or groups of cells.
Identifies, annotates, and indexes clinical documents. TIES is a natural language processing (NLP) pipeline and clinical document search engine. It supports tissue ordering and acquisition, building of Tissue Microarrays (TMAs), and integration with tissue banks and honest brokers. The tool provides a collaborative work space that enables research teams to work on queries and case sets together, even across institutions with separate TIES installations.
An application that manages many PDF files of articles whose information can be obtained from PubMed. iPapers automatically searches the PubMed database and imports the information for the imported articles, e.g. author names, title, journal name, volume, number, pages and abstract.
Gives access to the research performance of about 8500 research institutions. SciVal facilitates new discoveries, collaboration and access to knowledge in order to find funding. It is a support platform that helps scientists, physicians, doctors and nurses throughout their careers. The tool displays research performance, supports benchmarking and allows users to analyse research trends.
Allows users to download genome and assembly reports from NCBI. Genomes extracts the genus name and the species name from a scientific name. For the extraction of the species name, this tool can remove single quotes, brackets and candidate qualifiers.
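The genus and species extraction described for Genomes can be sketched as follows. This is an illustrative guess at the behaviour, not the package's code: the function name `split_scientific_name` and the `QUALIFIERS` set are hypothetical, and the reading of "candidate qualifiers" as tokens such as "Candidatus" is an assumption.

```python
import re

# Hypothetical qualifier list (an assumption, not taken from the tool).
QUALIFIERS = {"candidatus", "cf.", "aff.", "uncultured"}


def split_scientific_name(name):
    """Return (genus, species) from a scientific name string.

    Single quotes, square brackets and parentheses are stripped, qualifier
    tokens are dropped, and the first two remaining tokens are returned.
    """
    cleaned = re.sub(r"['\[\]()]", "", name)
    tokens = [t for t in cleaned.split() if t.lower() not in QUALIFIERS]
    if len(tokens) < 2:
        raise ValueError(f"cannot find genus and species in {name!r}")
    return tokens[0], tokens[1]


print(split_scientific_name("[Clostridium] scindens"))
print(split_scientific_name("'Candidatus Pelagibacter' ubique"))
print(split_scientific_name("Escherichia coli K-12"))
```

Strain designations after the species epithet are simply ignored here. A real implementation would need explicit rules for names such as "sp." placeholders.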
Enables cataloging, annotating, browsing and fast searching in reference data and PDF files. I, Librarian is a PDF manager or PDF organizer, which enables researchers, scholars, or students to create an annotated collection of PDF articles. The program is also an advanced tool to mine scientific literature from PubMed, PubMed Central, NASA ADS, arXiv, IEEE Xplore, HighWire Press, and Springer. Academic writing is also supported.
Automates the download of genomic sequences. Genomepy is a simple software package that contains both command-line tools and a Python application programming interface (API). Supported genome providers include UCSC, NCBI and Ensembl. Downloaded genome sequences can be soft- or hard-masked, and specific chromosomes or scaffolds can be either included or excluded based on regular expressions. Genomepy is free and open-source software and can be installed through standard package managers.
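Including or excluding chromosomes and scaffolds by regular expression, as described for Genomepy, can be illustrated in plain Python. The sketch deliberately does not call the Genomepy API. The helper `filter_sequences`, its `invert` parameter, and the example sequence names and patterns are assumptions chosen for illustration.

```python
import re


def filter_sequences(names, pattern, invert=False):
    """Keep names matching `pattern`, or the non-matching ones if invert=True.

    Hypothetical helper, not part of any genome-download package.
    """
    regex = re.compile(pattern)
    return [name for name in names if bool(regex.search(name)) != invert]


# Example sequence names in a UCSC-like style (illustrative only).
names = ["chr1", "chr2", "chrX", "chrM", "chrUn_gl000220", "chr1_gl000191_random"]

# Include only primary chromosomes.
main = filter_sequences(names, r"^chr[0-9XYM]+$")

# Exclude unplaced and random scaffolds instead.
no_scaffolds = filter_sequences(names, r"_random$|^chrUn", invert=True)

print(main)
print(no_scaffolds)
```

Both calls select the same four primary chromosomes here. Which form is more convenient depends on whether the wanted or the unwanted sequences are easier to describe with a pattern.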
An end-to-end solution that can meet your most demanding scientific literature management needs. QUOSA provides smart, flexible end-to-end solutions. Share the latest full-text scientific information faster, help drive information usage and control costs all within one convenient platform.
Evaluates the significance of cocitation for two types of queries: (i) query gene set with any predefined/manually-curated gene set and (ii) query gene set with any user-defined free term set. CoCiter evaluates the significance of co-citation for any gene set from the 8,077,952 genes in the National Center for Biotechnology Information (NCBI) Entrez gene database, by using a text mining approach against the up-to-date Medical Literature Analysis and Retrieval System Online (MEDLINE) literature database. It provides a flexible and more precise approach to analyzing gene set functions, compared with the traditional function enrichment analysis.
Encodes heterogeneous original data in a uniform Resource Description Framework (RDF) format. GORouter is an RDF model for (1) integrating heterogeneous original data in a uniform RDF format, (2) creating additional mappings between pairs of terms coming from different Gene Ontology (GO) subontologies, and (3) introducing a set of reasoning rule-bases across various RDF datasets. An application for searching and browsing GO and its associations is also available.
Analyzes scientific documents to find the interactions between genes/proteins in order to reconstruct molecular networks. Biblio-MetReS relies on a central database with the genomes and gene annotation of more than 1000 organisms. It is this repository of gene names and functions that is accessed by the application when you choose your organism of interest.
A retrieval method that takes advantage of principles in image understanding, text mining and optical character recognition (OCR) to retrieve figure types defined conceptually. A search engine was developed to retrieve tables and figure types to aid computational and experimental research.
A model for tagging gene and protein mentions from text using the probabilistic sequence tagging framework of conditional random fields (CRFs). FABLE can identify gene and protein mentions with fairly high accuracy even without features containing domain specific knowledge.
Screens the human genetic association literature in PubMed with high recall and specificity. GAPscreener is a support vector machine (SVM)-based application that offers a user-friendly graphical user interface (GUI). GAPscreener includes all components in the screening process: PubMed record retrieval from NCBI, text content processing for keyword extraction, SVM input data formatting, and SVM output display and record export. GAPscreener could become a routine screening tool for researchers and database curators for maintaining a local reference database. GAPscreener has been used in the screening and curation of the HuGE Navigator database.
A semantic search engine to answer questions in the biomedical domain. GoWeb combines classical keyword-based web search with text mining and ontologies to navigate large result sets and facilitate question answering. Users submit a query through the search form; the server preprocesses the query and sends a search request to the search service. The search service returns the first results, which are then annotated, highlighted, rendered and sent to the user. Compared to traditional search engines, GoWeb bridges the semantic gap with the limited amount of available semantic annotations by employing text mining to extract ontology concepts from text. In a nutshell, GoWeb exploits the observation that keywords and ontology terms co-occurring in snippets are often facts.
Provides free access to the full text of books and documents in life sciences and health care. Bookshelf is a full-text resource built and maintained by the National Center for Biotechnology Information (NCBI) within the National Library of Medicine (NLM). It includes textbooks, monographs, health reports, documentation, website content and databases. The database aims to (i) advance science and improve health care through the collection, exchange and dissemination of books and related documents and (ii) provide a permanent stable archive for the collection.
A database of semantic predications extracted from titles and abstracts of PubMed citations by SemRep, a rule-based semantic interpreter. Semantic predications are drawn from the Unified Medical Language System (UMLS) knowledge sources; the subject and object pair corresponds to Metathesaurus concepts, and the predicate to a relation type in an extended version of the Semantic Network. SemMedDB can be used as a knowledge resource to assist in hypothesis generation and literature-based discovery in biomedicine as well as in clinical decision-making support.
A searchable database of biomedical negated sentences. BioNOT can be used to extract such negated events in order to fill the gap created by the absence of text mining applications. This database incorporates 58 million negated sentences, extracted from three sources: abstracts of articles indexed by PubMed, the full text of articles in the PubMed Central Open Access subset, and the full text of articles published by Elsevier. After evaluating negated sentences for autism, Alzheimer's disease, and Parkinson's disease, we found that many genes thought by experts to be relevant have biomedical evidence suggesting the opposite.
Generates homology-based hypotheses as well as novel, indirect associations between genes and proteins such as coregulators. EVEX provides access to relevant information and related biomolecular entities of a gene or pair of genes of interest, from PubMed abstracts. These text mining results were generated by a state-of-the-art event extraction system and enriched with gene family associations and abstract generalizations. The EVEX resource locates relevant literature on phosphorylation, regulation targets, binding partners, and several other biomolecular events and assigns confidence values to these events.
Offers the capability to integrate, aggregate, analyze and visualize biomedical data from a wide variety of structured and unstructured information repositories. Bio-In combines its extensive capabilities as a life science R&D informatics services provider. The platform allows data normalization and linking, customizable workflows, integrated data browser, flexible and scalable approach, and domain and technology expertise.
A database for retrieving evidence sentences from PubMed abstracts and full-text articles available at PubMed Central. BELTracker uses a combination of multiple approaches based on traditional information retrieval, machine learning, and heuristics to accomplish this task. This database comprises three main components: (i) translation of a given BEL statement to an information retrieval (IR) query, (ii) retrieval of relevant PubMed citations and (iii) finding and ranking the evidence sentences in those citations.
Publicly presents cases of highly similar citations in Medline. Deja vu is a database of duplicate publications, identified using a number of different techniques, the principal one being text similarity comparison. This resource includes: (i) a streamlined process to update the database on a daily basis, (ii) a more collaborative approach for recruitment and qualification of topical experts as volunteer curators for specific publication areas, and (iii) methods to better address the question most often asked by authors.
Indexes text found inside biomedical images. YIF offers more comprehensive research results by searching over text that may not be present in the image caption, and offers the ability to find related images and associated papers by directly comparing image content. YIF’s analysis identifies image text elements, and subjects them to optical character recognition (OCR). We believe that searching over image text opens up new avenues for fruitful research in biomedical information retrieval.