Enhancing Few-Shot Transfer Learning with Optimized Multi-Task Prompt Tuning through Modular Prompt Composition
Main authors: | , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | In recent years, multi-task prompt tuning has garnered considerable attention
for its inherent modularity and potential to enhance parameter-efficient
transfer learning across diverse tasks. This paper aims to analyze and improve
the performance of multiple tasks by facilitating the transfer of knowledge
between their corresponding prompts in a multi-task setting. Our proposed
approach decomposes the prompt for each target task into a combination of
shared prompts (source prompts) and a task-specific prompt (private prompt).
During training, the source prompts are fine-tuned and combined with
the private prompt to derive the target prompt for each task. We present and
compare multiple methods for combining source prompts to construct the target
prompt, analyzing the roles of both source and private prompts within each
method. We investigate their contributions to task performance and, based on
these insights, offer flexible, adjustable configurations to optimize
performance. Our empirical findings show improvements in accuracy
and robustness over conventional prompt tuning and
related work. Notably, in few-shot settings our method substantially
outperforms other methods on a range of tasks, including those in the
GLUE benchmark. It achieves this
with significantly less training data, which makes it
promising for few-shot settings. |
---|---|
DOI: | 10.48550/arxiv.2408.13227 |
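
The decomposition described in the summary (a target prompt built from shared source prompts plus a task-specific private prompt) can be illustrated with a minimal sketch. The summary states that several combination methods are compared but does not specify them, so everything here is an assumption made for illustration: the softmax-weighted sum, the additive private prompt, and the names `softmax`, `compose_target_prompt`, and `scores` are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def compose_target_prompt(source_prompts, private_prompt, scores):
    """Assumed composition: weighted sum of source prompts plus the private prompt.

    source_prompts: list of K prompts, each a list of L token vectors of size D
    private_prompt: a single prompt with the same L x D shape
    scores: K raw per-task scores (hypothetical learnable parameters)
    """
    weights = softmax(scores)
    length, dim = len(private_prompt), len(private_prompt[0])
    # Start from a copy of the private prompt, then add the weighted sources.
    target = [[private_prompt[i][j] for j in range(dim)] for i in range(length)]
    for w, src in zip(weights, source_prompts):
        for i in range(length):
            for j in range(dim):
                target[i][j] += w * src[i][j]
    return target


# Toy example: two shared source prompts, each with 2 tokens of dimension 2.
source_prompts = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[3.0, 2.0], [2.0, 3.0]],
]
private_prompt = [[0.5, 0.5], [0.5, 0.5]]

# Equal scores give each source prompt a weight of 0.5.
target = compose_target_prompt(source_prompts, private_prompt, scores=[0.0, 0.0])
print(target)
```

In an actual training setup the source prompts, the private prompt, and the scores would all be trainable tensors updated by gradient descent; plain Python lists are used here only to keep the sketch self-contained.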