Fine Grain Synthetic Educational Data: Challenges and Limitations of Collaborative Learning Analytics
Published in: | IEEE Access 2022, Vol. 10, pp. 26230-26241 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Summary: | While data privacy is a key aspect of Learning Analytics (LA), it often hinders research into underexplored contexts because it limits data sharing. To overcome this problem, the generation of synthetic data has been proposed and discussed within the LA community. However, little work has explored the use of synthetic data in real-world situations. This research examines the effectiveness of using synthetic data for training academic performance prediction models, and the challenges and limitations of the proposed data sharing method. To evaluate the effectiveness of the method, we generate synthetic data from a private dataset and distribute it to the participants of a data challenge to train prediction models. Participants submitted their models as Docker containers for evaluation and ranking on holdout synthetic data. A post-hoc analysis was conducted on the top 10 participants' models by comparing their performance on synthetic and private validation datasets. Several models trained on synthetic data were found to perform significantly worse when applied to the non-synthetic private dataset. The main contribution of this research is to understand the challenges and limitations of applying predictive models trained on synthetic data in real-world situations. In light of these challenges, the paper recommends model designs that can inform future successful adoption of synthetic data in real-world educational data systems. |
---|---|
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2022.3156073 |