Turkish spelling error detection and correction by using word n-grams

Bibliographic details
Main authors: Dalkilic, G., Cebi, Y.
Format: Conference proceeding
Language: English
Description
Summary: N-grams can be used for spell checking and correction. The first step is to extract language-specific n-grams from a corpus. However, no corpus can be large enough to contain all possible word n-grams. Back-off smoothing is one technique for estimating the frequency of n-grams that do not occur in a corpus. Using the back-off technique and the Minimum Edit Distance (MED) algorithm, a program was developed to check spelling errors and suggest corrections in a sentence typed in Turkish. The results were compared with those of the Microsoft Word 2003 proofing tools and were found to be much better.
DOI: 10.1109/ICSCCW.2009.5379481
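
Illustration: the summary names two building blocks, a minimum edit distance to generate correction candidates and a back-off estimate to rank them by context. The following is a minimal Python sketch of how the two could fit together, not the authors' program. Everything in it is an assumption made for illustration: the function names, the three-sentence toy corpus, the distance threshold of 2, and in particular the scoring, which uses a simplified fixed-weight back-off from bigrams to unigrams (weight `alpha`) rather than whatever back-off variant the paper actually uses.

```python
from collections import Counter


def min_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit cost for insert, delete and substitute."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def train(sentences):
    """Count word unigrams and bigrams in whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def backoff_score(prev_word, word, unigrams, bigrams, alpha=0.4):
    """Simplified back-off: use the bigram relative frequency if the bigram
    was seen, otherwise fall back to a down-weighted unigram frequency.
    (alpha is a hypothetical fixed weight; the scores are not normalized.)"""
    seen = bigrams[(prev_word, word)]
    if seen > 0:
        return seen / unigrams[prev_word]
    return alpha * unigrams[word] / sum(unigrams.values())


def suggest(prev_word, typed, unigrams, bigrams, max_distance=2):
    """Return corrections for `typed`, best first; empty if the word is known."""
    if typed in unigrams:
        return []
    candidates = [w for w in unigrams
                  if min_edit_distance(typed, w) <= max_distance]
    return sorted(candidates,
                  key=lambda w: (-backoff_score(prev_word, w, unigrams, bigrams),
                                 min_edit_distance(typed, w)))


# Toy corpus, made up for this example only.
corpus = ["bu bir kitap", "bu bir kalem", "o bir kitap"]
unigrams, bigrams = train(corpus)
print(suggest("bir", "kitep", unigrams, bigrams))
```

In this sketch a word counts as misspelled only if it is absent from the corpus vocabulary, so a realistic checker would need a far larger corpus or a separate lexicon.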