Implementation and evaluation of an additional GPT-4-based reviewer in PRISMA-based medical systematic literature reviews

Bibliographic details
Published in:	International journal of medical informatics (Shannon, Ireland), 2024-09, Vol.189, p.105531, Article 105531
Main authors:	Landschaft, Assaf; Antweiler, Dario; Mackay, Sina; Kugler, Sabine; Rüping, Stefan; Wrobel, Stefan; Höres, Timm; Allende-Cid, Hector
Format:	Article
Language:	English
Subjects:	AI-based reviewer; Artificial Intelligence; GPT-4 API; Humans; Natural Language Processing; PRISMA; Reproducibility of Results; Systematic literature review; Systematic Reviews as Topic
Online access:	Full text

Description
Summary:
• GPT-4 API as a Complementary Reviewer: Novel integration method for systematic literature reviews, enhancing efficiency and rigor.
• Feasibility and Reliability Evaluation: Assessing GPT-4's potential as a primary screening tool in healthcare information retrieval.
• Comprehensive Inter-rater Agreement Analysis: Utilizing Cohen's kappa to assess agreement between human reviewers, GPT-4, and consensus, employing distinct methodologies for various parameter types.
• Full-text Extraction Advancements: Overcoming limitations to extend AI-based reviewer capabilities in evidence synthesis.

PRISMA-based literature reviews require meticulous scrutiny of extensive textual data by multiple reviewers, which demands considerable human effort. This study evaluates the feasibility and reliability of using the GPT-4 API as a complementary reviewer in systematic literature reviews based on the PRISMA framework. A systematic literature review on the role of natural language processing and Large Language Models (LLMs) in automatic patient-trial matching was conducted using human reviewers and an AI-based reviewer (GPT-4 API). A retrieval-augmented generation (RAG) methodology with LangChain integration was used to process full-text articles. Agreement levels were evaluated between two human reviewers and the GPT-4 API for abstract screening, and between a single reviewer and the GPT-4 API for full-text parameter extraction. Agreement between GPT-4 and the human reviewers was almost perfect in abstract screening (Cohen's kappa > 0.9) and lower in full-text parameter extraction. Because GPT-4 performed on a par with human reviewers in abstract screening, the authors conclude that it has the potential to serve as a main screening tool for systematic literature reviews, replacing at least one of the human reviewers.
ISSN:	1386-5056, 1872-8243
DOI:	10.1016/j.ijmedinf.2024.105531
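
The summary reports reviewer agreement as Cohen's kappa. As a rough illustration of how that statistic is computed for two raters making include/exclude decisions, here is a minimal sketch. The function name `cohens_kappa` and the sample decisions are hypothetical and are not taken from the study; the record does not describe the authors' own implementation.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, derived from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (p_o - p_e) / (1 - p_e)


# Hypothetical abstract-screening decisions (illustrative data only).
human = ["include", "exclude", "exclude", "include",
         "exclude", "exclude", "include", "exclude"]
gpt4 = ["include", "exclude", "exclude", "include",
        "exclude", "include", "include", "exclude"]

kappa = cohens_kappa(human, gpt4)
print(f"Cohen's kappa: {kappa:.2f}")
```

In this toy data the raters agree on 7 of 8 abstracts (observed agreement 0.875) while chance agreement is 0.5, so kappa corrects the raw agreement downward. Values above 0.8 are conventionally described as almost perfect agreement, which is the range the summary reports for abstract screening.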