RuleR: Improving LLM Controllability by Rule-based Data Recycling
Format: Article
Language: English
Abstract: Despite the remarkable advancement of large language models (LLMs), they still lack delicate controllability under sophisticated constraints, which is critical to enhancing their response quality and the user experience. While conditional supervised fine-tuning (SFT) can potentially improve LLM controllability, curating new SFT data to fulfill the constraints usually relies on human experts or proprietary LLMs, which is time-consuming and expensive. To bridge this gap, we propose Rule-based Data Recycling (RuleR), a human/LLM-free data augmentation method incorporating multiple constraints into the original SFT data. Instead of creating new responses from scratch, RuleR integrates linguistic or formatting rules into the original instructions and modifies the responses to fulfill the rule-defined constraints. Training on the "recycled" data consolidates LLMs' capability to generate constrained outputs. Extensive experiments demonstrate RuleR's effectiveness in improving LLM controllability while maintaining general instruction-following performance. RuleR's code is released at https://github.com/tianyi-lab/RuleR.
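The recycling step the abstract describes, appending a rule-defined constraint to an existing instruction and editing the response so it satisfies that constraint, can be sketched roughly as below. The two rules shown (uppercase output, end-with-keyword) are illustrative assumptions, not the paper's exact rule set, and the function names are hypothetical.

```python
# Minimal sketch of rule-based data recycling (RuleR-style).
# Each "recycler" takes an existing (instruction, response) SFT pair,
# appends a constraint to the instruction, and rewrites the response
# so it fulfills that constraint -- no human or LLM in the loop.

def recycle_uppercase(instruction: str, response: str) -> tuple[str, str]:
    """Illustrative rule: require the answer to be entirely uppercase."""
    new_instruction = instruction + " Your entire answer must be in uppercase letters."
    return new_instruction, response.upper()

def recycle_end_keyword(instruction: str, response: str, keyword: str = "DONE") -> tuple[str, str]:
    """Illustrative rule: require the answer to end with a fixed keyword."""
    new_instruction = instruction + f' End your answer with the word "{keyword}".'
    return new_instruction, response.rstrip() + f" {keyword}"

if __name__ == "__main__":
    inst, resp = "Name a primary color.", "Red is a primary color."
    print(recycle_uppercase(inst, resp))
    print(recycle_end_keyword(inst, resp))
```

Because the original response is only mechanically edited rather than regenerated, the recycled pair stays faithful to the source data while adding a verifiable constraint for conditional SFT.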
DOI: 10.48550/arxiv.2406.15938