A Hybrid Approach to Chinese Abbreviation Expansion

This paper presents a hybrid approach to Chinese abbreviation expansion. In this study, each short-form in Chinese text is assumed to be created by the method of reduction and the method of elimination or generalization, respectively. A mapping table between short words and long words and a dictiona...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Fu, Guohong, Luke, Kang-Kuong, Zhang, Min, Zhou, GuoDong
Format: Tagungsbericht
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:This paper presents a hybrid approach to Chinese abbreviation expansion. In this study, each short-form in Chinese text is assumed to be created by the method of reduction and the method of elimination or generalization, respectively. A mapping table between short words and long words and a dictionary of non-reduced short-form/full-form pairs are thus applied to generate the respective expansion candidates. Then, a hidden Markov model (HMM) based disambiguation is employed to rank these candidates and select a proper expansion for each ambiguous abbreviation. In order to improve expansion accuracy, some linguistic knowledge like discourse information and abbreviation patterns are further employed to double-check the expanded results and revise some error expansions if any. The proposed approach was evaluated on an abbreviation-expanded corpus built from the Peking University Corpus. The results showed that a recall of 83.8% and a precision of 86.3% can be achieved on average for different types of Chinese abbreviations.
ISSN:0302-9743
1611-3349
DOI:10.1007/11940098_29