Acromine is an abbreviation dictionary constructed automatically from the whole of MEDLINE as of April 2009. It identifies abbreviation definitions by treating a word sequence that frequently co-occurs with a parenthetical expression as a potential expanded form. Applied to the whole of MEDLINE (9,635,599 abstracts), the implemented system extracted 68,007 abbreviation candidates and recognized 467,402 expanded forms. The current version of Acromine achieves 99% precision and 82-95% recall on our evaluation corpus, which roughly emulates the whole of MEDLINE.
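The co-occurrence assumption described above can be illustrated with a minimal sketch. This is not the Acromine implementation: the regular expression, the function names `candidate_counts` and `frequent_forms`, the `max_words` window, the `min_freq` threshold, and the three example sentences are all assumptions made for illustration. The sketch only counts how often each word sequence appears immediately before a parenthetical expression and keeps the frequent ones.

```python
import re
from collections import Counter, defaultdict

# Assumed pattern: a short parenthetical token that looks like an abbreviation.
PAREN = re.compile(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)")


def candidate_counts(sentences, max_words=5):
    """Count word sequences that immediately precede each parenthetical expression."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for match in PAREN.finditer(sentence):
            abbr = match.group(1)
            words = sentence[:match.start()].lower().split()
            # Every suffix of the preceding words is a potential expanded form.
            for n in range(1, min(max_words, len(words)) + 1):
                counts[abbr][" ".join(words[-n:])] += 1
    return counts


def frequent_forms(counts, min_freq=2):
    """Keep candidates seen at least min_freq times (an illustrative threshold)."""
    return {
        abbr: [(form, freq) for form, freq in forms.most_common() if freq >= min_freq]
        for abbr, forms in counts.items()
    }


# Made-up example sentences, not taken from MEDLINE.
sentences = [
    "We measured thyroid transcription factor 1 (TTF-1) expression.",
    "Expression of thyroid transcription factor 1 (TTF-1) was reduced.",
    "Staining for thyroid transcription factor 1 (TTF-1) was positive.",
]
result = frequent_forms(candidate_counts(sentences))
```

In this toy input, the sequence "thyroid transcription factor 1" and its shorter suffixes each occur three times before "(TTF-1)", while longer sequences such as "measured thyroid transcription factor 1" occur once and are filtered out. Raw frequency alone cannot choose among the nested candidates, so a practical system needs a further criterion for that step, which this sketch does not attempt.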
Graduate School of Information Science and Technology, University of Tokyo, Bunkyo-ku, Tokyo, Japan and School of Computer Science, University of Manchester, National Centre for Text Mining (NaCTeM), Manchester Interdisciplinary Biocentre, Manchester, UK