Linguistics Theory Meets LLM: Code-Switched Text Generation via Equivalence Constrained Large Language Models
Code-switching, the phenomenon of alternating between two or more languages in a single conversation, presents unique challenges for Natural Language Processing (NLP). Most existing research focuses on either syntactic constraints or neural generation, with few efforts to integrate linguistic theory...
Gespeichert in:
Hauptverfasser: | , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Code-switching, the phenomenon of alternating between two or more languages
in a single conversation, presents unique challenges for Natural Language
Processing (NLP). Most existing research focuses on either syntactic
constraints or neural generation, with few efforts to integrate linguistic
theory with large language models (LLMs) for generating natural code-switched
text. In this paper, we introduce EZSwitch, a novel framework that combines
Equivalence Constraint Theory (ECT) with LLMs to produce linguistically valid
and fluent code-switched text. We evaluate our method using both human
judgments and automatic metrics, demonstrating a significant improvement in the
quality of generated code-switching sentences compared to baseline LLMs. To
address the lack of suitable evaluation metrics, we conduct a comprehensive
correlation study of various automatic metrics against human scores, revealing
that current metrics often fail to capture the nuanced fluency of code-switched
text. Additionally, we create CSPref, a human preference dataset based on human
ratings and analyze model performance across ``hard`` and ``easy`` examples.
Our findings indicate that incorporating linguistic constraints into LLMs leads
to more robust and human-aligned generation, paving the way for scalable
code-switching text generation across diverse language pairs. |
---|---|
DOI: | 10.48550/arxiv.2410.22660 |