LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K

State-of-the-art large language models (LLMs) now claim to support remarkable context lengths of 256k tokens or even more. In contrast, the average context lengths of mainstream benchmarks are insufficient (5k-21k), and they suffer from potential knowledge leakage and inaccurate metrics, resulting in...


Bibliographic Details
Main Authors: Yuan, Tao; Ning, Xuefei; Zhou, Dong; Yang, Zhijie; Li, Shiyao; Zhuang, Minghui; Tan, Zheyue; Yao, Zhuyu; Lin, Dahua; Li, Boxun; Dai, Guohao; Yan, Shengen; Wang, Yu
Format: Article
Language: English
Subjects:
Online Access: Order full text