MAIR: A Massive Benchmark for Evaluating Instructed Retrieval
Saved in:

|  |  |
|---|---|
| Main authors: | , , , , , , , , |
| Format: | Article |
| Language: | English |
| Subjects: | |
| Online access: | Order full text |
| Abstract: | Recent information retrieval (IR) models are pre-trained and instruction-tuned on massive datasets and tasks, enabling them to perform well on a wide range of tasks and potentially generalize to unseen tasks with instructions. However, existing IR benchmarks focus on a limited scope of tasks, making them insufficient for evaluating the latest IR models. In this paper, we propose MAIR (Massive Instructed Retrieval Benchmark), a heterogeneous IR benchmark that includes 126 distinct IR tasks across 6 domains, collected from existing datasets. We benchmark state-of-the-art instruction-tuned text embedding models and re-ranking models. Our experiments reveal that instruction-tuned models generally achieve superior performance compared to non-instruction-tuned models on MAIR. Additionally, our results suggest that current instruction-tuned text embedding models and re-ranking models still lack effectiveness in specific long-tail tasks. MAIR is publicly available at https://github.com/sunnweiwei/Mair. |
| DOI: | 10.48550/arxiv.2410.10127 |
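The abstract above evaluates "instructed retrieval", i.e. retrieval where a task instruction accompanies each query. The sketch below is a hypothetical illustration of that setup, not code from the MAIR repository: it assumes the sentence-transformers package, uses a placeholder model name rather than any model benchmarked in the paper, and simply prepends the instruction to the query before ranking documents by cosine similarity.

```python
# Hypothetical illustration of instructed retrieval -- NOT official MAIR code.
# Assumes the sentence-transformers package; the model name is a placeholder.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

# The task instruction tells the retriever what counts as relevant for this query.
instruction = "Retrieve a code snippet that answers the programming question."
query = "How do I reverse a list in Python?"
docs = [
    "my_list[::-1] returns a reversed copy of a Python list.",
    "Paris has been the capital of France since the 10th century.",
]

# Instruction-tuned embedding models are typically queried by prepending the
# instruction to the query text before encoding.
q_emb = model.encode(f"{instruction} {query}", normalize_embeddings=True)
d_embs = model.encode(docs, normalize_embeddings=True)

# With normalized embeddings, the dot product equals cosine similarity.
scores = d_embs @ q_emb
for rank, idx in enumerate(np.argsort(-scores), start=1):
    print(rank, round(float(scores[idx]), 3), docs[idx])
```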