Lexical Normalization of User-Generated Medical Forum Data
Published in: | Proceedings of the Fourth Social Media Mining for Health Applications (SMM4H) Workshop & Shared Task, 2019-08, pp. 11-20 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | English |
Abstract: | In the medical domain, user-generated social media text is increasingly
used as a valuable complementary knowledge source to scientific medical
literature. The extraction of this knowledge is complicated by colloquial
language use and misspellings. Yet, lexical normalization of such data has not
been addressed properly. This paper presents an unsupervised, data-driven
spelling correction module for medical social media. Our method outperforms
state-of-the-art spelling correction and can detect mistakes with an F0.5 of
0.888. Additionally, we present a novel corpus for spelling mistake detection
and correction on a medical patient forum. |
---|---|
DOI: | 10.18653/v1/W19-3202 |
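The abstract reports mistake detection as an F0.5 score. F-beta is the weighted harmonic mean of precision P and recall R, F_β = (1 + β²)·P·R / (β²·P + R). With β = 0.5, precision is weighted more heavily than recall, so falsely flagging correctly spelled words costs more than missing some mistakes. The following is a minimal Python sketch of this standard formula, not code from the paper. The helper names `f_beta` and `detection_f_beta` are hypothetical, the counts in the usage lines are made up, and how the authors count true and false positives (for example, per token) is not stated in this record.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Weighted harmonic mean of precision and recall.

    beta < 1 weights precision more heavily; beta = 1 gives the ordinary F1.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def detection_f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta for spelling-mistake detection from raw counts (hypothetical helper).

    tp: mistakes correctly flagged; fp: correct words wrongly flagged;
    fn: mistakes that were missed. The counting unit is an assumption.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return f_beta(precision, recall, beta)


# Made-up counts (not from the paper): precision 0.8, recall 0.5.
print(round(detection_f_beta(tp=8, fp=2, fn=8), 3))            # F0.5 -> 0.714
print(round(detection_f_beta(tp=8, fp=2, fn=8, beta=1.0), 3))  # F1   -> 0.615
```

Because precision exceeds recall in this made-up example, F0.5 comes out higher than F1. A detector that flagged more words would need to keep its false alarms low to score well under F0.5.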