A Novel GAN Approach to Augment Limited Tabular Data for Short-Term Substance Use Prediction
Main authors: | , , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Substance use is a global issue that negatively impacts millions of persons
who use drugs (PWUDs). In practice, identifying vulnerable PWUDs for efficient
allocation of appropriate resources is challenging due to their complex use
patterns (e.g., their tendency to change usage within months) and the high
acquisition costs for collecting PWUD-focused substance use data. Thus, there
has been a paucity of machine learning models for accurately predicting
short-term substance use behaviors of PWUDs. In this paper, using longitudinal
survey data of 258 PWUDs in the U.S. Great Plains collected by our team, we
design a novel GAN that deals with high-dimensional low-sample-size tabular
data and survey skip logic to augment existing data to improve classification
models' prediction on (A) whether the PWUDs would increase usage and (B) at
which ordinal frequency they would use a particular drug within the next 12
months. Our evaluation results show that, when trained on augmented data from
our proposed GAN, the classification models improve their predictive
performance (AUROC) by up to 13.4% in Problem (A) and 15.8% in Problem (B) for
usage of marijuana, meth, amphetamines, and cocaine, outperforming
state-of-the-art generative models. |
---|---|
DOI: | 10.48550/arxiv.2407.13047 |