Using LocalMaxs Algorithm for the Extraction of Contiguous and Non-contiguous Multiword Lexical Units

The availability of contiguous and non-contiguous multiword lexical units (MWUs) in Natural Language Processing (NLP) lexica enhances parsing precision, helps attachment decisions, improves indexing in information retrieval (IR) systems, reinforces information extraction (IE) and text mining, among...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: da Silva, Joaquim Ferreira, Dias, Gaël, Guilloré, Sylvie, Pereira Lopes, José Gabriel
Format: Buchkapitel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:The availability of contiguous and non-contiguous multiword lexical units (MWUs) in Natural Language Processing (NLP) lexica enhances parsing precision, helps attachment decisions, improves indexing in information retrieval (IR) systems, reinforces information extraction (IE) and text mining, among other applications. Unfortunately, their acquisition has long been a significant problem in NLP, IR and IE. In this paper we propose two new association measures, the Symmetric Conditional Probability (SCP) and the Mutual Expectation (ME) for the extraction of contiguous and non-contiguous MWUs. Both measures are used by a new algorithm, the LocalMaxs, that requires neither empirically obtained thresholds nor complex linguistic filters. We assess the results obtained by both measures by comparing them with reference association measures (Specific Mutual Information, ø2, Dice and Log-Likelihood coefficients) over a multilingual parallel corpus. An additional experiment has been carried out over a part-of-speech tagged Portuguese corpus for extracting contiguous compound verbs.
ISSN:0302-9743
1611-3349
DOI:10.1007/3-540-48159-1_9