CoLaDa: A Collaborative Label Denoising Framework for Cross-lingual Named Entity Recognition
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Keywords: | |
Summary: | Cross-lingual named entity recognition (NER) aims to train an NER system that
generalizes well to a target language by leveraging labeled data in a given
source language. Previous work alleviates the data scarcity problem by
translating source-language labeled data or performing knowledge distillation
on target-language unlabeled data. However, these methods may suffer from label
noise due to the automatic labeling process. In this paper, we propose CoLaDa,
a Collaborative Label Denoising Framework, to address this problem.
Specifically, we first explore a model-collaboration-based denoising scheme
that enables models trained on different data sources to collaboratively
denoise pseudo labels used by each other. We then present an
instance-collaboration-based strategy that considers the label consistency of
each token's neighborhood in the representation space for denoising.
Experiments on different benchmark datasets show that the proposed CoLaDa
achieves superior results compared to previous methods, especially when
generalizing to distant languages. |
---|---|
DOI: | 10.48550/arxiv.2305.14913 |
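
The instance-collaboration idea in the summary (checking whether a token's pseudo label agrees with the labels of its neighbors in representation space) can be illustrated with a minimal sketch. This is not the paper's formulation: the cosine similarity measure, the values of `k` and `threshold`, the function names, and the toy 2-D "embeddings" are all assumptions chosen for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def neighborhood_consistency(embeddings, labels, k=3):
    """For each token, return the fraction of its k nearest neighbors
    (by cosine similarity) that carry the same pseudo label."""
    scores = []
    for i, (emb, lab) in enumerate(zip(embeddings, labels)):
        sims = [(cosine(emb, other), j)
                for j, other in enumerate(embeddings) if j != i]
        sims.sort(key=lambda pair: pair[0], reverse=True)
        neighbors = [labels[j] for _, j in sims[:k]]
        agree = sum(1 for n in neighbors if n == lab)
        scores.append(agree / len(neighbors) if neighbors else 0.0)
    return scores


def flag_noisy(embeddings, labels, k=3, threshold=0.5):
    """Indices of tokens whose label consistency falls below the threshold."""
    scores = neighborhood_consistency(embeddings, labels, k)
    return [i for i, s in enumerate(scores) if s < threshold]


# Toy data (hypothetical): two clusters of token representations.
# Token 3 lies in the first cluster but carries the second cluster's label.
embeddings = [
    [1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [1.0, 0.1],
    [0.0, 1.0], [0.1, 0.9], [0.05, 0.95],
]
labels = ["PER", "PER", "PER", "LOC", "LOC", "LOC", "LOC"]

print(flag_noisy(embeddings, labels, k=3, threshold=0.5))  # prints [3]
```

In this toy setting only token 3 is flagged, because all three of its nearest neighbors carry a different label. A denoising scheme might use such a consistency score to down-weight or relabel suspicious tokens; how CoLaDa actually uses it is not specified in the summary above.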