Mitigating the Linguistic Gap with Phonemic Representations for Robust Cross-lingual Transfer
Main Authors: | , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online Access: | Order full text |
Summary: | Approaches to improving multilingual language understanding often struggle
with significant performance gaps between high-resource and low-resource
languages. While there are efforts to align the languages in a single latent
space to mitigate such gaps, how different input-level representations
influence such gaps has not been investigated, particularly with phonemic
inputs. We hypothesize that the performance gaps are affected by representation
discrepancies between these languages, and revisit the use of phonemic
representations as a means to mitigate these discrepancies. To demonstrate the
effectiveness of phonemic representations, we present experiments on three
representative cross-lingual tasks on 12 languages in total. The results show
that phonemic representations exhibit higher similarity between languages
than orthographic representations, and that they consistently outperform the
grapheme-based baseline model on relatively low-resource languages.
We present quantitative evidence from three cross-lingual tasks that
demonstrates the effectiveness of phonemic representations, which is further
supported by a theoretical analysis of the cross-lingual performance gap. |
---|---|
DOI: | 10.48550/arxiv.2402.14279 |