Identifies Environment Ontology (ENVO) terms in the literature. ENVIRONMENTS detects terms that describe environments and was developed on the Taxon corpus from the Encyclopedia of Life (EOL), a biodiversity web resource. It handles orthographic variation such as plural forms, spacing and hyphenation. It can also be applied to large text sources other than EOL and, combined with the SPECIES tagger, used to extract species-environment pairs.
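ENVIRONMENTS' own matching code is not described here. The following is a minimal sketch that assumes a simple dictionary lookup. It handles orthographic variants by mapping every dictionary entry and every candidate text span to one canonical key, then keeps the longest match at each position. The normalization rules, the function names `normalize_term` and `tag_environments`, and the `PLACEHOLDER` identifiers are hypothetical. They are not ENVIRONMENTS' API or real ENVO IDs.

```python
import re

# Words, optionally joined by internal hyphens (e.g. "salt-marshes").
TOKEN = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")


def normalize_term(term: str) -> str:
    """Map a surface form to a canonical key (hypothetical, crude rules)."""
    key = term.lower()
    # Ignore spacing and hyphenation entirely: "salt marsh" == "salt-marsh" == "saltmarsh".
    key = re.sub(r"[\s\-]+", "", key)
    # Crude plural stripping on the end of the key.
    if key.endswith("ies") and len(key) > 4:
        key = key[:-3] + "y"
    elif key.endswith(("ches", "shes", "sses", "xes")):
        key = key[:-2]
    elif key.endswith("s") and not key.endswith("ss"):
        key = key[:-1]
    return key


def tag_environments(text: str, lexicon: dict, max_words: int = 3) -> list:
    """Return (matched text, identifier) pairs, preferring the longest match."""
    index = {normalize_term(name): ident for name, ident in lexicon.items()}
    tokens = list(TOKEN.finditer(text))
    hits = []
    i = 0
    while i < len(tokens):
        # Try windows of max_words tokens down to a single token.
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            span = " ".join(t.group() for t in tokens[i:i + n])
            key = normalize_term(span)
            if key in index:
                start, end = tokens[i].start(), tokens[i + n - 1].end()
                hits.append((text[start:end], index[key]))
                i += n
                break
        else:
            i += 1  # no match starting at this token
    return hits


if __name__ == "__main__":
    lexicon = {"salt marsh": "PLACEHOLDER:1", "freshwater lake": "PLACEHOLDER:2"}
    text = "Samples were taken from salt-marshes and freshwater lakes near the coast."
    print(tag_environments(text, lexicon))
    # → [('salt-marshes', 'PLACEHOLDER:1'), ('freshwater lakes', 'PLACEHOLDER:2')]
```

Dropping spaces and hyphens from the key makes all three spellings of "salt marsh" collide, which is the intent. The plural rules are deliberately naive, though. A word such as "gas" would be mangled, so a real lexicon would need exception lists.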
Simplifies text to improve the performance of natural language processing (NLP) systems and text mining (TM) applications. iSimp outputs simplified sentences for a corpus file, together with annotations of the simplification constructs found in each original sentence. It uses shallow parsing and recursive transition networks to detect these constructs, and recognizes six types: coordination, relative clauses, appositions, introductory phrases, subordinate clauses and parenthetical elements.
Existing terminological resources and scientific databases cannot keep up with the growth of neologisms, so a domain-independent method that automatically recognizes terms in documents is very useful. The TerMine demonstrator integrates C-value multiword term extraction and AcroMine acronym recognition.
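The C-value measure combines a candidate term's length in words, |a|, with its frequency, f(a). If a is not nested in a longer candidate, C-value(a) = log2|a| · f(a). Otherwise, C-value(a) = log2|a| · (f(a) − (1/P(T_a)) Σ f(b) over b in T_a), where T_a is the set of longer candidates that contain a and P(T_a) is the size of that set. Below is a minimal Python sketch of that score. It assumes candidates have already been extracted and counted; the C-value method selects them with part-of-speech filters first, and that step is omitted here. `c_value` and `contains` are hypothetical names, not TerMine's interface.

```python
import math


def contains(longer: tuple, shorter: tuple) -> bool:
    """True if `shorter` occurs as a contiguous word sequence inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(freqs: dict) -> dict:
    """Score candidate terms (tuples of words) with the C-value formula."""
    scores = {}
    for a, f_a in freqs.items():
        # T_a: longer candidates in which `a` is nested.
        nesting = [b for b in freqs if len(b) > len(a) and contains(b, a)]
        weight = math.log2(len(a))  # single-word candidates get 0
        if nesting:
            # Discount occurrences of `a` that are explained by longer terms.
            mean_nested = sum(freqs[b] for b in nesting) / len(nesting)
            scores[a] = weight * (f_a - mean_nested)
        else:
            scores[a] = weight * f_a
    return scores


if __name__ == "__main__":
    freqs = {
        ("basal", "cell", "carcinoma"): 10,
        ("adenoid", "cystic", "basal", "cell", "carcinoma"): 4,
        ("cystic", "basal", "cell", "carcinoma"): 5,
    }
    print({" ".join(t): round(s, 2) for t, s in c_value(freqs).items()})
    # → {'basal cell carcinoma': 8.72, 'adenoid cystic basal cell carcinoma': 9.29, 'cystic basal cell carcinoma': 2.0}
```

The frequencies in the usage example are invented for illustration. The discount means that "cystic basal cell carcinoma" scores low because most of its occurrences sit inside the longer five-word term.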
Extracts abbreviations of all types, together with their expansions, from a target paper on the fly. By building an abbreviation database or dictionary, ALICE not only helps readers recognize abbreviations that are left undefined in a paper but also makes biomedical literature retrieval more accurate.
Identifying biomedical terms in natural language text is essential for information extraction. Seventeen term rewrite and suppress rules were implemented to increase the number of Unified Medical Language System (UMLS) terms suitable for text mining. Rewriting some UMLS terms and suppressing others increases recall and precision, making the UMLS more suitable for biomedical text mining tasks such as information retrieval and knowledge discovery.
Retrieves conserved patterns from an input protein sequence. JUZBOX constructs valid biomedical terms from any protein sequence written in the 20 commonly used single-letter amino acid codes. It can be used to identify potential motifs in a protein sequence and has revealed some interesting motifs. The tool can also find previously undiscovered associations between a conserved pattern, its corresponding motif and functional sites.
Identifies abbreviations in text and marks them with XML tags. If the long form is given in the text or can be inferred from the document context, the tag surrounding the abbreviation contains the normalised form of the expansion.
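The description does not say how the long form is found. As a hedged sketch, the example below swaps in a simplified version of the Schwartz and Hearst definition-matching heuristic. For a pattern "long form (SF)", it matches the short-form characters right to left against the preceding words. It then wraps every occurrence of SF in an XML element carrying a normalised expansion. The `<abbr exp="...">` element, the lower-casing normalisation and all function names are hypothetical. Long forms guessed from wider document context are not covered.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# A parenthesised short form such as "(NLP)".
SF_PATTERN = re.compile(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)")


def best_long_form(short: str, candidate: str):
    """Match short-form characters right to left inside `candidate`."""
    s, l = len(short) - 1, len(candidate) - 1
    while s >= 0:
        c = short[s].lower()
        if not c.isalnum():
            s -= 1
            continue
        # The first short-form character must also start a word.
        while l >= 0 and (candidate[l].lower() != c or
                          (s == 0 and l > 0 and candidate[l - 1].isalnum())):
            l -= 1
        if l < 0:
            return None
        l -= 1
        s -= 1
    start = candidate.rfind(" ", 0, l + 1) + 1
    return candidate[start:]


def find_definitions(text: str) -> dict:
    """Map each short form to a normalised long form."""
    defs = {}
    for m in SF_PATTERN.finditer(text):
        short = m.group(1)
        max_words = min(len(short) + 5, 2 * len(short))
        words = text[:m.start()].split()[-max_words:]
        long_form = best_long_form(short, " ".join(words))
        if long_form and len(long_form) > len(short):
            # Hypothetical normalisation: lower case, single spaces.
            defs.setdefault(short, " ".join(long_form.lower().split()))
    return defs


def tag_abbreviations(text: str) -> str:
    """Wrap every defined short form in an XML element with its expansion."""
    defs = find_definitions(text)
    if not defs:
        return escape(text)
    alternatives = "|".join(map(re.escape, sorted(defs, key=len, reverse=True)))
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    out, pos = [], 0
    for m in pattern.finditer(text):
        out.append(escape(text[pos:m.start()]))
        sf = m.group(1)
        out.append(f"<abbr exp={quoteattr(defs[sf])}>{escape(sf)}</abbr>")
        pos = m.end()
    out.append(escape(text[pos:]))
    return "".join(out)
```

Applied to "We used natural language processing (NLP) tools. NLP output was checked.", the sketch tags both occurrences of "NLP" with the expansion "natural language processing". The surrounding text is XML-escaped so that the output stays well-formed.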