When Gradient Descent Meets Derivative-Free Optimization: A Match Made in Black-Box Scenario
Format: Article
Language: English
Online access: Order full text
Abstract: Large pre-trained language models (PLMs) have garnered significant attention for their versatility and potential for solving a wide spectrum of natural language processing (NLP) tasks. However, the cost of running these PLMs may be prohibitive. Furthermore, PLMs may not be open-sourced due to commercial considerations and potential risks of misuse, as with GPT-3. In this scenario, the parameters and gradients of the PLM are unavailable. To address this issue, black-box tuning has been proposed, which uses derivative-free optimization (DFO), instead of gradient descent, to train task-specific continuous prompts. However, these gradient-free methods still exhibit a significant gap compared to gradient-based methods. In this paper, we introduce gradient descent into the black-box tuning scenario through knowledge distillation. Furthermore, we propose a novel method, GDFO, which integrates gradient descent and derivative-free optimization to optimize task-specific continuous prompts in a harmonized manner. Experimental results show that GDFO achieves significant performance gains over previous state-of-the-art methods.
DOI: 10.48550/arxiv.2305.10013
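
The abstract describes black-box tuning, in which a task-specific continuous prompt is optimized with derivative-free optimization because the PLM's parameters and gradients are not observable. The following is a minimal sketch of that idea under stated assumptions: a toy scalar loss stands in for the black-box PLM API, and a simple (1+1) evolution strategy stands in for the stronger DFO algorithms (e.g., CMA-ES) used in practice. Names such as `toy_blackbox_loss` and `dfo_tune` are illustrative and not from the paper.

```python
# Minimal sketch of black-box prompt tuning with derivative-free optimization.
# Assumptions: the real setting queries a hosted PLM's inference API with the
# prompt prepended to the input and observes only a scalar score; here a toy
# quadratic loss stands in for that API.

import numpy as np

rng = np.random.default_rng(0)
PROMPT_DIM = 32  # dimensionality of the continuous prompt (assumed)


def toy_blackbox_loss(prompt: np.ndarray) -> float:
    """Stand-in for the task loss returned by a black-box PLM API.

    Only the scalar loss is observable; no gradients are available.
    """
    target = np.linspace(-1.0, 1.0, PROMPT_DIM)  # hypothetical optimum
    return float(np.sum((prompt - target) ** 2))


def dfo_tune(loss_fn, dim, iters=500, sigma=0.3):
    """(1+1) evolution strategy: a simple derivative-free optimizer.

    Black-box tuning methods typically use CMA-ES in a low-dimensional
    subspace; this simpler ES keeps the sketch self-contained.
    """
    best = np.zeros(dim)
    best_loss = loss_fn(best)
    for _ in range(iters):
        # Propose a random perturbation and keep it only if the loss improves.
        candidate = best + sigma * rng.standard_normal(dim)
        cand_loss = loss_fn(candidate)
        if cand_loss < best_loss:
            best, best_loss = candidate, cand_loss
    return best, best_loss


prompt, loss = dfo_tune(toy_blackbox_loss, PROMPT_DIM)
print(f"final black-box loss: {loss:.4f}")
```

Per the abstract, GDFO would additionally train a prompt generator by gradient descent and transfer its knowledge to the derivative-free search via knowledge distillation; only the DFO side is sketched here.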