Mixture-of-Skills: Learning to Optimize Data Usage for Fine-Tuning Large Language Models
Main authors: , , ,
Format: Article
Language: English
Abstract: Large language models (LLMs) are typically fine-tuned on diverse and
extensive datasets sourced from various origins to develop a comprehensive
range of skills, such as writing, reasoning, chatting, and coding. Each skill
has unique characteristics, and these datasets are often heterogeneous and
imbalanced, making the fine-tuning process highly challenging. Balancing the
development of each skill while ensuring the model maintains its overall
performance requires sophisticated techniques and careful dataset curation. In
this work, we propose a general, model-agnostic, reinforcement learning
framework, Mixture-of-Skills (MoS), that learns to optimize data usage
automatically during the fine-tuning process. This framework ensures optimal
comprehensive skill development of LLMs by dynamically adjusting the focus on
different datasets based on their current learning state. To validate the
effectiveness of MoS, we conduct extensive experiments using three diverse LLM
backbones on two widely used benchmarks and demonstrate that MoS substantially
enhances model performance. Building on the success of MoS, we propose MoSpec,
an adaptation for task-specific fine-tuning, which harnesses the utilities of
various datasets for a specific purpose. Our work underlines the significance
of dataset rebalancing and presents MoS as a powerful, general solution for
optimizing data usage in the fine-tuning of LLMs for various purposes.
DOI: 10.48550/arxiv.2406.08811
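The abstract's core mechanism, dynamically adjusting the focus on different
datasets based on their current learning state, can be pictured as a
bandit-style sampler over skill datasets. The sketch below is a hypothetical
illustration only, not the paper's actual MoS algorithm: the reward signal
(per-batch loss reduction), the EXP3-style update rule, the learning rate,
and the simulated rewards in the demo are all assumptions made for
illustration.

```python
import math
import random


class SkillSampler:
    """Bandit-style dataset reweighting during fine-tuning.

    Hypothetical sketch of the idea described in the abstract: the
    sampling distribution over datasets shifts toward skills whose
    recent batches yield higher reward. The actual MoS reward and
    update rule are defined in the paper, not reproduced here.
    """

    def __init__(self, datasets, lr=0.1):
        self.datasets = list(datasets)
        self.lr = lr  # assumed step size for the logit update
        self.logits = {d: 0.0 for d in self.datasets}

    def probs(self):
        # Softmax over logits gives the current sampling distribution.
        m = max(self.logits.values())
        exp = {d: math.exp(v - m) for d, v in self.logits.items()}
        z = sum(exp.values())
        return {d: e / z for d, e in exp.items()}

    def sample(self):
        # Draw the next dataset to fine-tune on.
        r, acc = random.random(), 0.0
        for d, p in self.probs().items():
            acc += p
            if r <= acc:
                return d
        return self.datasets[-1]

    def update(self, dataset, reward):
        # Importance-weighted step: datasets whose batches reduced the
        # loss more get sampled more often on subsequent steps.
        self.logits[dataset] += self.lr * reward / self.probs()[dataset]


if __name__ == "__main__":
    sampler = SkillSampler(["writing", "reasoning", "coding"])
    for step in range(200):
        d = sampler.sample()
        # Stand-in for the loss reduction from one fine-tuning batch
        # on dataset d; here "reasoning" is simulated as most useful.
        reward = random.gauss(0.5 if d == "reasoning" else 0.1, 0.05)
        sampler.update(d, reward)
    print(sampler.probs())  # mass should concentrate on "reasoning"
```

In this toy run, the sampling distribution concentrates on whichever dataset
currently yields the largest reward, mirroring the abstract's claim that data
usage is rebalanced automatically as the model's learning state evolves.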