Real-Word Errors in Arabic Texts: A Better Algorithm for Detection and Correction
Published in: | IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019-08, Vol. 27 (8), p. 1308-1320 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Abstract: | Real-word (also known as semantic or context-sensitive) spelling errors are a class of errors that escape the typical spell checker, which relies on dictionary look-up. This kind of error occurs when a user mistakenly types a correctly spelled word when another is intended, e.g., "I want a peace (piece) of cake." Further, these errors commonly arise in text written by people with dyslexia. Real-word errors are harder to detect because the context must be considered. In this paper, we propose a spell checker that detects and corrects real-word errors for the Arabic language. Our method avoids predefined confusion sets, a simple approach used by many works tackling this problem that limits the list of words that can be detected and corrected. Thus, our system can detect and correct a larger set of real-word errors. For the detection phase, we employ word and stem n-gram (n = 1 to 3) language models along with machine learning, achieving a precision and recall of 83.5% and 99.2%, respectively. For the correction phase, we use n-grams, which results in an accuracy of 98%. Our scheme is robust, with excellent performance even when the percentage of real-word errors is high. This makes the system suitable for handling errors in Arabic text produced by OCR. |
---|---|
ISSN: | 2329-9290 2329-9304 |
DOI: | 10.1109/TASLP.2019.2918404 |
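
The idea in the abstract, detecting and correcting real-word errors from context without a predefined confusion set, can be illustrated with a minimal sketch. This is not the authors' system: it assumes a tiny English toy corpus built around the abstract's "peace (piece)" example instead of Arabic data, uses only word bigrams with add-one smoothing (no stem n-grams, no trigrams, no machine-learning classifier), and generates candidates as every vocabulary word within a small edit distance. The names `CORPUS`, `train`, `edit_distance`, `context_score`, and `check`, and the `max_dist` and `ratio` parameters, are hypothetical.

```python
from collections import Counter

# Hypothetical toy corpus; the paper itself works on Arabic text.
CORPUS = [
    "i want a piece of cake",
    "she ate a piece of bread",
    "we hope for peace in the world",
    "they signed a peace treaty",
]


def train(sentences):
    """Count unigrams and bigrams, padding each sentence with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def context_score(unigrams, bigrams, left, word, right):
    """Add-one smoothed P(word | left) * P(right | word)."""
    v = len(unigrams)
    p_left = (bigrams[(left, word)] + 1) / (unigrams[left] + v)
    p_right = (bigrams[(word, right)] + 1) / (unigrams[word] + v)
    return p_left * p_right


def check(sentence, unigrams, bigrams, max_dist=2, ratio=2.0):
    """Return (corrected words, indices of words flagged as real-word errors).

    A word is flagged when some vocabulary word within `max_dist` edits
    fits the surrounding context at least `ratio` times better.
    """
    words = sentence.split()
    padded = ["<s>"] + words + ["</s>"]
    vocab = [w for w in unigrams if w not in ("<s>", "</s>")]
    fixed, flagged = list(words), []
    for i, word in enumerate(words):
        left, right = padded[i], padded[i + 2]
        base = context_score(unigrams, bigrams, left, word, right)
        best_word, best = word, base
        # No confusion set: any close-enough vocabulary word is a candidate.
        for cand in vocab:
            if cand != word and edit_distance(word, cand) <= max_dist:
                score = context_score(unigrams, bigrams, left, cand, right)
                if score > best:
                    best_word, best = cand, score
        if best_word != word and best > ratio * base:
            fixed[i] = best_word
            flagged.append(i)
    return fixed, flagged


unigrams, bigrams = train(CORPUS)
fixed, flagged = check("i want a peace of cake", unigrams, bigrams)
print(" ".join(fixed), flagged)
```

On this toy corpus the sketch replaces "peace" with "piece" in the example sentence and leaves "they signed a peace treaty" unchanged. The fixed `ratio` threshold stands in for the detection step, which the abstract describes as n-gram language models combined with machine learning.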