AI Discernment in Foot and Ankle Surgery Research: A Survey Investigation
Published in: Foot & Ankle Orthopaedics, 2024-12, Vol. 9 (4)
Main Authors: , ,
Format: Article
Language: English
Online Access: Full text
Abstract:
Category: Other
Introduction/Purpose:
Artificial intelligence (AI) encompasses computer systems that emulate human intelligence. Large language models (LLMs), such as ChatGPT (OpenAI), exemplify this trend. Trained on vast datasets, LLMs use machine learning and natural language processing to generate coherent responses. However, their use in scientific and medical research raises concerns about plagiarism and accuracy, and the scientific community faces the challenge of distinguishing AI-generated content from human-authored text. This study aims to assess foot and ankle surgeons' ability to discern AI-generated abstracts from human-written ones in the field of foot and ankle surgery. It also examines how participant characteristics, such as experience and familiarity with AI, affect this differentiation.
Methods:
A survey was developed encompassing participant-characteristic questions and the presentation of 12 abstracts: 6 AI-generated and 6 drawn from the Journal of Foot and Ankle Surgery. Participants, blinded to how each abstract was created, judged whether it was AI- or human-generated and provided confidence scores on a 0-100 scale. The survey was administered twice to foot and ankle surgeons at the Orthopedic Foot and Ankle Center, with the second administration two weeks after the first.
Descriptive statistics characterized participant attributes, with means and standard deviations reported. Two-sample tests of proportions assessed differences in correct identifications between AI- and human-generated abstracts. Pearson's correlation examined associations between correct identifications, participant attributes, and confidence scores. Intraclass correlation coefficients (ICCs) evaluated inter- and intra-rater reliability. Statistical significance was set at p < 0.05, with analyses performed in Stata/SE (version 17.0).
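The study ran these analyses in Stata/SE; as a minimal illustrative sketch only, the Python snippet below reproduces the same three steps (two-sample test of proportions, Pearson's correlation, ICC) using statsmodels, SciPy, and pingouin. All counts, scores, and ratings in it are placeholders, not the study's data.

```python
# Illustrative Python analogue of the abstract's statistical analysis
# (the study itself used Stata/SE 17.0). All values are placeholders.
import numpy as np
import pandas as pd
from scipy.stats import pearsonr
from statsmodels.stats.proportion import proportions_ztest
import pingouin as pg

# Two-sample test of proportions: correct identifications of
# AI-generated vs. human-written abstracts (hypothetical counts).
correct = np.array([48, 61])   # correct calls for AI, human abstracts
totals = np.array([108, 108])  # responses per abstract type
z_stat, p_val = proportions_ztest(correct, totals)
print(f"two-proportion z = {z_stat:.2f}, p = {p_val:.3f}")

# Pearson correlation: per-reviewer correct identifications vs. a
# participant attribute such as self-reported AI familiarity.
correct_per_reviewer = [5, 7, 6, 8, 4, 6, 7, 5, 6]    # placeholder
ai_familiarity = [10, 75, 30, 60, 0, 25, 50, 15, 40]  # placeholder
r, p = pearsonr(correct_per_reviewer, ai_familiarity)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")

# ICC: inter-rater reliability of 9 reviewers rating 12 abstracts,
# in long format as pingouin expects (one row per rating).
rng = np.random.default_rng(0)
long = pd.DataFrame({
    "abstract": np.repeat(range(12), 9),
    "reviewer": np.tile(range(9), 12),
    "rating": rng.integers(0, 2, size=108),  # 1 = judged AI-generated
})
icc = pg.intraclass_corr(data=long, targets="abstract",
                         raters="reviewer", ratings="rating")
print(icc[["Type", "ICC", "CI95%"]])
```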
Results:
Nine reviewers participated in the survey study, with varying years of practice (0-29), publication counts (1-200), years reviewing articles (1-25), and self-reported AI familiarity scores (0-75). Among the 216 responses (9 reviewers × 12 abstracts × 2 assessments), 109 (50.5%) correctly identified the abstract's source. At the first assessment (T1), reviewers accurately identified 44% of AI-generated and 57% of human-generated abstracts, with similar results at the second assessment (T2). No significant difference existed between AI- and human-generated abstract identification rates at either time point. Correlation analysis showed mixed relationships between correct …
ISSN: 2473-0114
DOI: 10.1177/2473011424S00129