MALLET specifications


Unique identifier: OMICS_25518
Alternative name: MAchine Learning for LanguagE Toolkit
Software type: Application/Script
Interface: Command line interface
Restrictions to use: None
Operating system: Unix/Linux
Programming languages: Java
License: Apache License version 2.0
Computer skills: Advanced
Version: 2.0.8
Stability: Stable
Maintained: Yes

  • Andrew McCallum

MALLET citations


Usage of cell nomenclature in biomedical literature

BMC Bioinformatics
PMCID: 5763300
PMID: 29322912
DOI: 10.1186/s12859-017-1978-0

[…] eline to annotate cell type and cell line names in the Open Access full text articles by using our dictionaries on cell types and cell lines. Whatizit employs taggers based on finite automata and the MAchine Learning for LanguagE Toolkit (MALLET) []. The taggers of Whatizit annotate documents in a dictionary-based approach. […]


ChemTok: A New Rule Based Tokenizer for Chemical Named Entity Recognition

Biomed Res Int
PMCID: 4749772
PMID: 26942193
DOI: 10.1155/2016/4248026

[…] ata tokenized by each of the three tokenizers. Two state-of-the-art classification algorithms, SVM and CRFs, were used for this purpose. Yamcha [] toolkit has been used for realizing the SVM, whereas Mallet [] toolkit has been used for implementing the CRFs classifiers. Both toolkits are trained using default settings. In particular, the SVM employed by Yamcha is trained in the one-versus-all mode […]


Fine grained information extraction from German transthoracic echocardiography reports

BMC Med Inform Decis Mak
PMCID: 4643516
PMID: 26563260
DOI: 10.1186/s12911-015-0215-x

[…] lacked granularity and coverage with respect to clinical purposes, especially regarding degree, change, and temporal information. HITEx and Apache cTAKES both use open-source libraries like WEKA [] or MALLET [] to perform some tasks based on machine learning methods. Nevertheless, regular expressions and rule-based components still play a central role in both systems. The same applies to the approa […]


Using natural language processing techniques to inform research on nanotechnology

Beilstein J Nanotechnol
PMCID: 4505089
PMID: 26199848
DOI: 10.3762/bjnano.6.149

[…] (EXPO), target organs and/or organisms (TARGET), and types of toxicity/damage (TOXIC) [,]. ABNER contains the supervised machine learning algorithm linear-chain conditional random fields (CRFs) from Mallet (available at, an open source freely available Java-based statistical natural language processing toolkit []. To create training data for the CRF, the authors manua […]


A document processing pipeline for annotating chemical entities in scientific documents

J Cheminform
PMCID: 4331697
PMID: 25810778
DOI: 10.1186/1758-2946-7-S1-S7

[…] We applied a supervised machine-learning approach, through the application of Conditional Random Fields (CRFs) [] provided by MALLET []. Additionally, we compiled a dictionary of chemical entity name, and used the matches of these names in the texts as features for the CRF model. The method applied for this work was developed […]


Automating Data Abstraction in a Quality Improvement Platform for Surgical and Interventional Procedures

PMCID: 4371448
PMID: 25848598
DOI: 10.13063/2327-9214.1114

[…] ht NLP pipeline that enables the rapid prototyping of text classification tasks based on a simple set of XML-based templates and the integration of existing standalone NLP tools (openNLP, libSVM, and Mallet). We decided to build our own lightweight pipeline after a review of available generalized clinical systems, which for various reasons did not meet the specific requirements of our project. Som […]
