Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT

Retrieval pipelines, an integral component of many machine learning systems, perform poorly in domains where documents are long (e.g., 10K tokens or more) and where identifying the relevant document requires synthesizing information across the entire text. Developing long-context retrieval encoders su...

Bibliographic Details
Published in:	arXiv.org 2024-11
Main authors:	Saad-Falcon, Jon; Fu, Daniel Y; Arora, Simran; Guha, Neel; Ré, Christopher
Format:	Article
Language:	English
Subjects:	Coders; Context; Documents; Machine learning; Mathematical models; Parameters; Retrieval
Online access:	Full text