MedCalc-Bench: Evaluating Large Language Models for Medical Calculations

As opposed to evaluating computation and logic-based reasoning, current benchmarks for evaluating large language models (LLMs) in medicine are primarily focused on question-answering involving domain knowledge and descriptive reasoning. While such qualitative capabilities are vital to medical diagno...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:arXiv.org 2024-06
Hauptverfasser: Khandekar, Nikhil, Qiao, Jin, Xiong, Guangzhi, Dunn, Soren, Applebaum, Serina S, Zain Anwar, Sarfo-Gyamfi, Maame, Safranek, Conrad W, Anwar, Abid A, Zhang, Andrew, Gilson, Aidan, Singer, Maxwell B, Amisha Dave, Taylor, Andrew, Zhang, Aidong, Chen, Qingyu, Lu, Zhiyong
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!