UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers
Saved in:
Main Authors: | , , , , , , , , |
Format: | Article |
Language: | eng |
Online Access: | Order full text |
Abstract: | Many information retrieval tasks require large labeled datasets for fine-tuning. However, such datasets are often unavailable, and their utility for real-world applications can diminish quickly due to domain shifts. To address this challenge, we develop and motivate a method for using large language models (LLMs) to generate large numbers of synthetic queries cheaply. The method begins by generating a small number of synthetic queries using an expensive LLM. After that, a much less expensive LLM is used to create large numbers of synthetic queries, which are used to fine-tune a family of reranker models. These rerankers are then distilled into a single efficient retriever for use in the target domain. We show that this technique boosts zero-shot accuracy in long-tail domains and achieves substantially lower latency than standard reranking methods. |
DOI: | 10.48550/arxiv.2303.00807 |
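
The abstract describes a four-stage pipeline: seed query generation with an expensive LLM, bulk query generation with a cheaper LLM, reranker fine-tuning, and distillation into a single retriever. Below is a minimal, runnable sketch of that flow under stated assumptions; every name in it (`expensive_llm`, `cheap_llm`, `Reranker`, `Retriever`) is a hypothetical placeholder, not the paper's actual code, and the stub classes stand in for real cross-encoder rerankers and a dense retriever.

```python
# Hypothetical sketch of a UDAPDR-style pipeline; not the authors' implementation.
import random
from typing import Callable, List, Tuple

LLM = Callable[[str], str]          # any prompt -> text callable
QueryPassagePair = Tuple[str, str]  # (synthetic query, source passage)


def seed_queries(passages: List[str], expensive_llm: LLM, n: int = 5) -> List[QueryPassagePair]:
    """Step 1: a small set of high-quality synthetic queries from the costly LLM."""
    return [
        (expensive_llm(f"Write a search query that this passage answers:\n{p}"), p)
        for p in random.sample(passages, min(n, len(passages)))
    ]


def bulk_queries(passages: List[str], seeds: List[QueryPassagePair],
                 cheap_llm: LLM, per_passage: int = 2) -> List[QueryPassagePair]:
    """Step 2: use the seeds as few-shot demonstrations so a much cheaper LLM
    can generate queries for every passage in the target domain."""
    demos = "\n\n".join(f"Passage: {p}\nQuery: {q}" for q, p in seeds)
    return [
        (cheap_llm(f"{demos}\n\nPassage: {p}\nQuery:"), p)
        for p in passages
        for _ in range(per_passage)
    ]


class Reranker:
    """Stand-in for a cross-encoder reranker; fit() would fine-tune on pairs."""
    def fit(self, pairs: List[QueryPassagePair]) -> "Reranker":
        self.pairs = pairs
        return self

    def score(self, query: str, passage: str) -> float:
        # Toy relevance signal (word overlap) in place of a learned model.
        return float(len(set(query.split()) & set(passage.split())))


class Retriever:
    """Stand-in for the single efficient retriever distilled from the rerankers."""
    def distill(self, rerankers: List[Reranker],
                pairs: List[QueryPassagePair]) -> "Retriever":
        # Step 4: averaged reranker scores act as soft labels; a real
        # implementation would regress a dense bi-encoder onto these targets.
        self.targets = {
            (q, p): sum(r.score(q, p) for r in rerankers) / len(rerankers)
            for q, p in pairs
        }
        return self


def udapdr_style_pipeline(passages: List[str], expensive_llm: LLM,
                          cheap_llm: LLM, n_rerankers: int = 3) -> Retriever:
    seeds = seed_queries(passages, expensive_llm)
    pairs = bulk_queries(passages, seeds, cheap_llm)
    # Step 3: fine-tune a family of rerankers, each on a random slice of the data.
    rerankers = [Reranker().fit(random.sample(pairs, max(1, len(pairs) // 2)))
                 for _ in range(n_rerankers)]
    return Retriever().distill(rerankers, pairs)


if __name__ == "__main__":
    passages = ["ColBERT performs late interaction retrieval.",
                "Distillation transfers knowledge between models."]
    # Dummy LLMs so the sketch runs end to end without any API access.
    expensive = lambda prompt: "what is " + prompt.splitlines()[-1][:20].lower()
    cheap = lambda prompt: "query about " + prompt.splitlines()[-1][:20].lower()
    retriever = udapdr_style_pipeline(passages, expensive, cheap)
    print(f"distilled {len(retriever.targets)} soft-labeled pairs")
```

The two-tier LLM split is the cost lever: the expensive model is called only a handful of times to establish query style, while the cheap model scales generation across the whole corpus before the rerankers and the distilled retriever are trained.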