Multilingual Multiword Expression Identification Using Lateral Inhibition and Domain Adaptation

Bibliographic details
Published in:	Mathematics (Basel) 2023-06, Vol.11 (11), p.2548
Main authors:	Avram, Andrei-Marius; Mititelu, Verginica Barbu; Păiș, Vasile; Cercel, Dumitru-Clementin; Trăușan-Matu, Ștefan
Format:	Article
Language:	English
Subjects:	Analysis; Computational linguistics; Datasets; domain adaptation; Information retrieval; International relations; Language; Language processing; lateral inhibition; Machine translation; Mathematics; multilingual; Multilingualism; multiword expression identification; Natural language; Natural language interfaces; Natural language processing; PARSEME corpus; Performance evaluation; Text categorization; Training
Online access:	Full text

Description
Abstract:	Correctly identifying multiword expressions (MWEs) is an important task for most natural language processing systems since their misidentification can result in ambiguity and misunderstanding of the underlying text. In this work, we evaluate the performance of the mBERT model for MWE identification in a multilingual context by training it on all 14 languages available in version 1.2 of the PARSEME corpus. We also incorporate lateral inhibition and language adversarial training into our methodology to create language-independent embeddings and improve its capabilities in identifying multiword expressions. The evaluation of our models shows that the approach employed in this work achieves better results compared to the best system of the PARSEME 1.2 competition, MTLB-STRUCT, on 11 out of 14 languages for global MWE identification and on 12 out of 14 languages for unseen MWE identification. Additionally, averaged across all languages, our best approach outperforms the MTLB-STRUCT system by 1.23% on global MWE identification and by 4.73% on unseen global MWE identification.
ISSN:	2227-7390
DOI:	10.3390/math11112548