A computational morphological lexicon for Turkish: TrLex

Bibliographic details
Published in:	Lingua 2018-04, Vol. 206, pp. 21-34
Main authors:	Aslan, Ozkan; Gunal, Serkan; Dincer, B. Taner
Format:	Article
Language:	English
Keywords:	Agglutinative languages; Compounding; Computational linguistics; Derivation; Derivation (Morphology); Derivatives; Grammar lexicon relationship; Lexicon; Morphological lexicon; Morphology; Paradigms; Semantics; Suffixes; Turkish; Turkish language; Word formation

Description
Abstract:
• There are 1.43 meanings per single-word lemma in the lexicon.
• More than half of the single-word lemmas (56.7%) have a derived structure.
• Among loanwords from Western languages, French accounts for the largest share (75%).
• In morphological segmentation, the average suffix length is 2.35.
• Adjectives are mostly derived from nouns.

A morphological lexicon intended as a computational resource should be considered together with derivational morphology, especially for agglutinative languages. To the best of our knowledge, no study of Turkish has analyzed the derivational suffixes in the lexicon within a computational paradigm. This study provides a very rich lexical resource, filling a gap in the field, and will hopefully lead to new related studies. The morphological lexicon can be used in morphological analysis as well as in several other tasks, such as stemming and part-of-speech (POS) tagging. In this study, we introduce a morphological lexicon named TrLex and present its components, preparation processes and some statistics. We observed that more than half of the single-word lemmas (56.7%) have a derived structure. Because word formation in Turkish favors morphological processes, this share is much higher than that of compound-type words (2.7%). As a result of the work, we obtained a knowledge-intensive data table with several fields, such as form, structure and semantic information. We also extracted a Lexical Markup Framework (LMF) formatted file containing only morphological and POS information and made the file freely available.
ISSN:	0024-3841, 1872-6135
DOI:	10.1016/j.lingua.2018.01.003
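
The abstract mentions an LMF-formatted file holding morphological and POS information, and reports the share of derived versus compound lemmas. The following is a minimal sketch of how such shares could be computed from an LMF-style XML file. The `LexicalEntry`, `Lemma` and `feat att/val` layout follows the general LMF pattern, but the attribute name `structure`, its values, and the four sample entries are assumptions made for illustration; they are not taken from the TrLex file itself.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical LMF-style fragment. The "structure" attribute and its values
# are illustrative assumptions, not the actual TrLex schema.
SAMPLE = """<LexicalResource>
  <Lexicon>
    <LexicalEntry>
      <feat att="partOfSpeech" val="noun"/>
      <feat att="structure" val="simple"/>
      <Lemma><feat att="writtenForm" val="göz"/></Lemma>
    </LexicalEntry>
    <LexicalEntry>
      <feat att="partOfSpeech" val="noun"/>
      <feat att="structure" val="derived"/>
      <Lemma><feat att="writtenForm" val="gözlük"/></Lemma>
    </LexicalEntry>
    <LexicalEntry>
      <feat att="partOfSpeech" val="noun"/>
      <feat att="structure" val="derived"/>
      <Lemma><feat att="writtenForm" val="gözlükçü"/></Lemma>
    </LexicalEntry>
    <LexicalEntry>
      <feat att="partOfSpeech" val="noun"/>
      <feat att="structure" val="compound"/>
      <Lemma><feat att="writtenForm" val="hanımeli"/></Lemma>
    </LexicalEntry>
  </Lexicon>
</LexicalResource>"""


def feats(element):
    """Return the direct <feat att=... val=...> children as a dict."""
    return {f.get("att"): f.get("val") for f in element.findall("feat")}


def read_entries(xml_text):
    """Flatten each LexicalEntry (entry-level and Lemma-level feats) into a dict."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.iter("LexicalEntry"):
        info = feats(entry)
        lemma = entry.find("Lemma")
        if lemma is not None:
            info.update(feats(lemma))
        entries.append(info)
    return entries


def structure_shares(entries):
    """Fraction of entries per value of the (assumed) 'structure' feature."""
    counts = Counter(e.get("structure", "unknown") for e in entries)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


if __name__ == "__main__":
    for name, share in structure_shares(read_entries(SAMPLE)).items():
        print(f"{name}: {share:.1%}")
```

Applied to the real file, the same counting would have to use whatever element and attribute names the published LMF file actually defines. Figures such as 56.7% and 2.7% in the abstract come from the authors' full data table, not from this toy sample.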