Crosslingual Content Scoring in Five Languages Using Machine-Translation and Multilingual Transformer Models

Bibliographic Details
Published in:	International Journal of Artificial Intelligence in Education, 2023-11, Vol. 34 (4), p. 1294-1320
Main authors:	Horbach, Andrea; Pehlke, Joey; Laarmann-Quante, Ronja; Ding, Yuning
Format:	Article
Language:	English
Subjects:	Artificial Intelligence; Automation; Classification; Computer Science; Computers and Education; Correspondence; Crowdsourcing; Datasets; Educational NLP for a Multilingual World; Educational Technology; English language; Experiments; High School Students; Information retrieval; Language; Language of Instruction; Languages; Learning; Machine learning; Machine translation; Multilingualism; Predominantly White Institutions; Scoring; Scoring models; Students; Test Items; Translation; User Interfaces and Human Computer Interaction; Visualization
Online access:	Full text

Description
Summary:	This paper investigates crosslingual content scoring, a scenario where scoring models trained on learner data in one language are applied to data in a different language. We analyze data in five different languages (Chinese, English, French, German and Spanish) collected for three prompts of the established English ASAP content scoring dataset. We cross the language barrier by means of both shallow and deep learning crosslingual classification models using both machine translation and multilingual transformer models. We find that a combination of machine translation and multilingual models outperforms each method individually. Our best results are reached when combining the available data in different languages, i.e., first training a model on the large English ASAP dataset before fine-tuning on smaller amounts of training data in the target language.
ISSN:	1560-4292 1560-4306
DOI:	10.1007/s40593-023-00370-1