Evaluating ChatGPT as a Question Answering System: A Comprehensive Analysis and Comparison with Existing Models
| Field | Value |
|---|---|
| Main authors | |
| Format | Article |
| Language | eng |
| Subjects | |
| Online access | Order full text |
Summary: In the current era, a multitude of language models has emerged to cater to
user inquiries. Notably, the GPT-3.5 Turbo language model has gained
substantial attention as the underlying technology for ChatGPT. Leveraging
extensive parameters, this model adeptly responds to a wide range of questions.
However, due to its reliance on internal knowledge, the accuracy of responses
may not be absolute. This article scrutinizes ChatGPT as a Question Answering
System (QAS), comparing its performance to other existing QASs. The primary
focus is on evaluating ChatGPT's proficiency in extracting responses from
provided paragraphs, a core QAS capability. Additionally, performance
comparisons are made in scenarios without a surrounding passage. Multiple
experiments, exploring response hallucination and considering question
complexity, were conducted on ChatGPT. Evaluation employed well-known Question
Answering (QA) datasets, including SQuAD, NewsQA, and PersianQuAD, across
English and Persian languages. Metrics such as F-score, exact match, and
accuracy were employed in the assessment. The study reveals that, while ChatGPT
demonstrates competence as a generative model, it is less effective in question
answering compared to task-specific models. Providing context improves its
performance, and prompt engineering enhances precision, particularly for
questions lacking explicit answers in provided paragraphs. ChatGPT excels at
simpler factual questions compared to "how" and "why" question types. The
evaluation highlights occurrences of hallucinations, where ChatGPT provides
responses to questions without available answers in the provided context.
DOI: 10.48550/arxiv.2312.07592
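The summary refers to standard extractive-QA metrics: exact match and token-level F-score. The following is a minimal illustrative sketch, not code from the paper, showing how such metrics are commonly computed for SQuAD-style answers; the normalization details (lowercasing, punctuation and article stripping) are assumptions based on the usual SQuAD evaluation convention.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace (SQuAD-style)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> int:
    """Return 1 if the normalized prediction equals the normalized reference, else 0."""
    return int(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted answer and a gold answer."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: comparing a model's answer against a gold answer.
print(exact_match("the Eiffel Tower", "Eiffel Tower"))  # 1
print(round(f1_score("Paris, France", "Paris"), 2))     # 0.67
```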