Automatic Lexical Stress Assignment of Unknown Words for Highly Inflected Slovenian Language

Bibliographic details
Main authors: Šef, Tomaž; Škrjanc, Maja; Gams, Matjaž
Format: Conference proceedings
Language: English
Online access: Full text
Description
Summary: This paper presents a two-level lexical stress assignment model for out-of-vocabulary Slovenian words used in our text-to-speech system. First, each vowel (and the consonant 'r') is classified as stressed or unstressed, and a type of lexical stress is assigned to every stressed vowel (and consonant 'r'). We applied a machine-learning technique (decision trees or boosted decision trees). Then, some corrections are made at the word level, according to the number of stressed vowels and the length of the word. As data sets, we used the MULTEXT-East Slovene Lexicon, which was supplemented with lexical stress marks. The accuracy achieved by decision trees significantly outperforms all previous results. However, the sizes of the trees indicate that accentuation in the Slovenian language is a very complex problem and that a simple solution in the form of relatively simple rules is not possible.
ISSN: 0302-9743, 1611-3349
DOI: 10.1007/3-540-46154-X_23
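
The two-level scheme described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record gives neither the features nor the correction rules, so the function names, the `score` callback (standing in for a trained decision tree), the 0.5 threshold, the `short_word_len` cutoff, and both word-level rules are assumptions made purely for illustration. The sketch also covers only the stressed/unstressed decision, not the type of lexical stress.

```python
from typing import Callable, List

# Candidate stress carriers: vowels plus the consonant 'r', as in the summary.
# Treating every 'r' as a candidate is a simplification; the scorer decides.
NUCLEI = set("aeiour")


def candidate_positions(word: str) -> List[int]:
    """Return the indices of all letters that could carry lexical stress."""
    return [i for i, ch in enumerate(word.lower()) if ch in NUCLEI]


def assign_stress(
    word: str,
    score: Callable[[str, int], float],  # hypothetical stand-in for a trained classifier
    threshold: float = 0.5,              # assumed decision threshold
    short_word_len: int = 8,             # assumed cutoff for "short" words
) -> List[int]:
    """Two-level stress assignment: per-letter decision, then word-level correction."""
    positions = candidate_positions(word)
    if not positions:
        return []

    # Level 1: decide for each candidate letter whether it is stressed.
    scores = {i: score(word, i) for i in positions}
    stressed = [i for i in positions if scores[i] >= threshold]

    # Level 2 (hypothetical rules): correct using the number of stressed
    # letters and the word length.
    if not stressed:
        # No stress found: fall back to the best-scoring candidate.
        stressed = [max(positions, key=lambda i: scores[i])]
    elif len(stressed) > 1 and len(word) <= short_word_len:
        # Several stresses in a short word: keep only the best-scoring one.
        stressed = [max(stressed, key=lambda i: scores[i])]
    return stressed


def toy_score(word: str, i: int) -> float:
    """Toy scorer (assumption): favors the second-to-last candidate letter."""
    positions = candidate_positions(word)
    return 0.9 if len(positions) >= 2 and i == positions[-2] else 0.1


print(assign_stress("beseda", toy_score))  # -> [3]
```

In a real system the `score` callback would be replaced by the learned decision trees, and the word-level rules by those derived in the paper.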