Multi-granularity sequence labeling model for acronym expansion identification

Identifying expansion forms for acronyms is beneficial to many natural language processing and information retrieval tasks. In this work, we study the problem of finding expansions in texts for given acronym queries by modeling the problem as a sequence labeling task. However, it is challenging for...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Information sciences 2017-02, Vol.378, p.462-474
Hauptverfasser: Liu, Jie, Liu, Caihua, Huang, Yalou
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Identifying expansion forms for acronyms is beneficial to many natural language processing and information retrieval tasks. In this work, we study the problem of finding expansions in texts for given acronym queries by modeling the problem as a sequence labeling task. However, it is challenging for traditional sequence labeling models like Conditional Random Fields (CRF) due to the complexity of the input sentences and the substructure of the categories. In this paper, we propose a Latent-state Neural Conditional Random Fields model (LNCRF) to deal with the challenges. On one hand, we extend CRF by coupling it with nonlinear hidden layers to learn multi-granularity hierarchical representations of the input data under the framework of Conditional Random Fields. On the other hand, we introduce latent variables to capture the fine granular information from the intrinsic substructures within the structured output labels implicitly. The experimental results on real data show that our model achieves the best performance against the state-of-the-art baselines.
ISSN:0020-0255
1872-6291
DOI:10.1016/j.ins.2016.06.045