Evaluate Chat‐GPT's programming capability in Swift through real university exam questions

Bibliographic Details
Published in: Software: Practice and Experience, 2024-11, Vol. 54 (11), p. 2129-2143
Authors: Zhang, Zizhuo, Wen, Lian, Jiang, Yanfei, Liu, Yongli
Format: Article
Language: English
Online access: Full text
Description
Abstract: In this study, we evaluate the programming capabilities of OpenAI's GPT-3.5 and GPT-4 models using Swift-based exam questions from a third-year university course. The results indicate that both GPT models generally outperform the average student score, yet they do not consistently exceed the performance of the top students. This comparison highlights areas where the GPT models excel and where they fall short, providing a nuanced view of their current programming proficiency. The study also reveals surprising instances where GPT-3.5 outperforms GPT-4, suggesting complex variations in AI model capabilities. By providing a clear benchmark of GPT's programming skills in an academic context, our research contributes valuable insights for future advancements in AI programming education and underscores the need for continued development to fully realize AI's potential in educational settings.
ISSN: 0038-0644
eISSN: 1097-024X
DOI: 10.1002/spe.3330