Automated Generation of Multiple-Choice Cloze Questions for Assessing English Vocabulary Using GPT-turbo 3.5
Mika H\"am\"al\"ainen, Emily \"Ohman, Flammie Pirinen, Khalid Alnajjar, So Miyagawa, Yuri Bizzoni, Niko Partanen, and Jack Rueter. 2023. Proc. of the Joint 3rd International Conference on NLP4DH and 8th IWCLUL. ACL, Tokyo, Japan, edition A common way of assessing language learner...
Gespeichert in:
Hauptverfasser: | , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Mika H\"am\"al\"ainen, Emily \"Ohman, Flammie Pirinen, Khalid
Alnajjar, So Miyagawa, Yuri Bizzoni, Niko Partanen, and Jack Rueter. 2023.
Proc. of the Joint 3rd International Conference on NLP4DH and 8th IWCLUL.
ACL, Tokyo, Japan, edition A common way of assessing language learners' mastery of vocabulary is via
multiple-choice cloze (i.e., fill-in-the-blank) questions. But the creation of
test items can be laborious for individual teachers or in large-scale language
programs. In this paper, we evaluate a new method for automatically generating
these types of questions using large language models (LLM). The VocaTT
(vocabulary teaching and training) engine is written in Python and comprises
three basic steps: pre-processing target word lists, generating sentences and
candidate word options using GPT, and finally selecting suitable word options.
To test the efficiency of this system, 60 questions were generated targeting
academic words. The generated items were reviewed by expert reviewers who
judged the well-formedness of the sentences and word options, adding comments
to items judged not well-formed. Results showed a 75% rate of well-formedness
for sentences and 66.85% rate for suitable word options. This is a marked
improvement over the generator used earlier in our research which did not take
advantage of GPT's capabilities. Post-hoc qualitative analysis reveals several
points for improvement in future work including cross-referencing
part-of-speech tagging, better sentence validation, and improving GPT prompts. |
---|---|
DOI: | 10.48550/arxiv.2403.02078 |