A Novel GAN Approach to Augment Limited Tabular Data for Short-Term Substance Use Prediction

Substance use is a global issue that negatively impacts millions of persons who use drugs (PWUDs). In practice, identifying vulnerable PWUDs for efficient allocation of appropriate resources is challenging due to their complex use patterns (e.g., their tendency to change usage within months) and the...

Bibliographic Details
Main authors: Thach, Nguyen; Habecker, Patrick; Johnston, Bergen; Cervantes, Lillianna; Eisenbraun, Anika; Mason, Alex; Tyler, Kimberly; Khan, Bilal; Chan, Hau
Format: Article
Language: English
creator Thach, Nguyen
Habecker, Patrick
Johnston, Bergen
Cervantes, Lillianna
Eisenbraun, Anika
Mason, Alex
Tyler, Kimberly
Khan, Bilal
Chan, Hau
description Substance use is a global issue that negatively impacts millions of persons who use drugs (PWUDs). In practice, identifying vulnerable PWUDs for efficient allocation of appropriate resources is challenging due to their complex use patterns (e.g., their tendency to change usage within months) and the high acquisition costs for collecting PWUD-focused substance use data. Thus, there has been a paucity of machine learning models for accurately predicting short-term substance use behaviors of PWUDs. In this paper, using longitudinal survey data of 258 PWUDs in the U.S. Great Plains collected by our team, we design a novel GAN that deals with high-dimensional low-sample-size tabular data and survey skip logic to augment existing data to improve classification models' prediction on (A) whether the PWUDs would increase usage and (B) at which ordinal frequency they would use a particular drug within the next 12 months. Our evaluation results show that, when trained on augmented data from our proposed GAN, the classification models improve their predictive performance (AUROC) by up to 13.4% in Problem (A) and 15.8% in Problem (B) for usage of marijuana, meth, amphetamines, and cocaine, which outperform state-of-the-art generative models.
doi_str_mv 10.48550/arxiv.2407.13047
format Article
fullrecord <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2407_13047</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2407_13047</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2407_130473</originalsourceid><addsrcrecordid>eNqFzr0OgjAUQOEuDkZ9ACfvC4BFILg2_g6GmIibCblAkSaUkksh-vZG4u50ljN8jC097gbbMORrpJca3E3AI9fzeRBN2UNAbAZZw0nEINqWDOYVWAOif2rZWLgorawsIMGsr5FgjxahNAS3ypB1Ekkabn3WWWxyCfdOwpVkoXKrTDNnkxLrTi5-nbHV8ZDszs7oSFtSGumdfj3p6PH_Hx9aUj-Q</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>A Novel GAN Approach to Augment Limited Tabular Data for Short-Term Substance Use Prediction</title><source>arXiv.org</source><creator>Thach, Nguyen ; Habecker, Patrick ; Johnston, Bergen ; Cervantes, Lillianna ; Eisenbraun, Anika ; Mason, Alex ; Tyler, Kimberly ; Khan, Bilal ; Chan, Hau</creator><creatorcontrib>Thach, Nguyen ; Habecker, Patrick ; Johnston, Bergen ; Cervantes, Lillianna ; Eisenbraun, Anika ; Mason, Alex ; Tyler, Kimberly ; Khan, Bilal ; Chan, Hau</creatorcontrib><description>Substance use is a global issue that negatively impacts millions of persons who use drugs (PWUDs). In practice, identifying vulnerable PWUDs for efficient allocation of appropriate resources is challenging due to their complex use patterns (e.g., their tendency to change usage within months) and the high acquisition costs for collecting PWUD-focused substance use data. Thus, there has been a paucity of machine learning models for accurately predicting short-term substance use behaviors of PWUDs. In this paper, using longitudinal survey data of 258 PWUDs in the U.S. 
Great Plains collected by our team, we design a novel GAN that deals with high-dimensional low-sample-size tabular data and survey skip logic to augment existing data to improve classification models' prediction on (A) whether the PWUDs would increase usage and (B) at which ordinal frequency they would use a particular drug within the next 12 months. Our evaluation results show that, when trained on augmented data from our proposed GAN, the classification models improve their predictive performance (AUROC) by up to 13.4% in Problem (A) and 15.8% in Problem (B) for usage of marijuana, meth, amphetamines, and cocaine, which outperform state-of-the-art generative models.</description><identifier>DOI: 10.48550/arxiv.2407.13047</identifier><language>eng</language><subject>Computer Science - Learning</subject><creationdate>2024-07</creationdate><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2407.13047$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2407.13047$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Thach, Nguyen</creatorcontrib><creatorcontrib>Habecker, Patrick</creatorcontrib><creatorcontrib>Johnston, Bergen</creatorcontrib><creatorcontrib>Cervantes, Lillianna</creatorcontrib><creatorcontrib>Eisenbraun, Anika</creatorcontrib><creatorcontrib>Mason, Alex</creatorcontrib><creatorcontrib>Tyler, Kimberly</creatorcontrib><creatorcontrib>Khan, Bilal</creatorcontrib><creatorcontrib>Chan, Hau</creatorcontrib><title>A Novel GAN Approach to Augment Limited Tabular Data for Short-Term Substance Use 
Prediction</title><description>Substance use is a global issue that negatively impacts millions of persons who use drugs (PWUDs). In practice, identifying vulnerable PWUDs for efficient allocation of appropriate resources is challenging due to their complex use patterns (e.g., their tendency to change usage within months) and the high acquisition costs for collecting PWUD-focused substance use data. Thus, there has been a paucity of machine learning models for accurately predicting short-term substance use behaviors of PWUDs. In this paper, using longitudinal survey data of 258 PWUDs in the U.S. Great Plains collected by our team, we design a novel GAN that deals with high-dimensional low-sample-size tabular data and survey skip logic to augment existing data to improve classification models' prediction on (A) whether the PWUDs would increase usage and (B) at which ordinal frequency they would use a particular drug within the next 12 months. Our evaluation results show that, when trained on augmented data from our proposed GAN, the classification models improve their predictive performance (AUROC) by up to 13.4% in Problem (A) and 15.8% in Problem (B) for usage of marijuana, meth, amphetamines, and cocaine, which outperform state-of-the-art generative models.</description><subject>Computer Science - Learning</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNqFzr0OgjAUQOEuDkZ9ACfvC4BFILg2_g6GmIibCblAkSaUkksh-vZG4u50ljN8jC097gbbMORrpJca3E3AI9fzeRBN2UNAbAZZw0nEINqWDOYVWAOif2rZWLgorawsIMGsr5FgjxahNAS3ypB1Ekkabn3WWWxyCfdOwpVkoXKrTDNnkxLrTi5-nbHV8ZDszs7oSFtSGumdfj3p6PH_Hx9aUj-Q</recordid><startdate>20240717</startdate><enddate>20240717</enddate><creator>Thach, Nguyen</creator><creator>Habecker, Patrick</creator><creator>Johnston, Bergen</creator><creator>Cervantes, Lillianna</creator><creator>Eisenbraun, Anika</creator><creator>Mason, 
Alex</creator><creator>Tyler, Kimberly</creator><creator>Khan, Bilal</creator><creator>Chan, Hau</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20240717</creationdate><title>A Novel GAN Approach to Augment Limited Tabular Data for Short-Term Substance Use Prediction</title><author>Thach, Nguyen ; Habecker, Patrick ; Johnston, Bergen ; Cervantes, Lillianna ; Eisenbraun, Anika ; Mason, Alex ; Tyler, Kimberly ; Khan, Bilal ; Chan, Hau</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2407_130473</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Learning</topic><toplevel>online_resources</toplevel><creatorcontrib>Thach, Nguyen</creatorcontrib><creatorcontrib>Habecker, Patrick</creatorcontrib><creatorcontrib>Johnston, Bergen</creatorcontrib><creatorcontrib>Cervantes, Lillianna</creatorcontrib><creatorcontrib>Eisenbraun, Anika</creatorcontrib><creatorcontrib>Mason, Alex</creatorcontrib><creatorcontrib>Tyler, Kimberly</creatorcontrib><creatorcontrib>Khan, Bilal</creatorcontrib><creatorcontrib>Chan, Hau</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Thach, Nguyen</au><au>Habecker, Patrick</au><au>Johnston, Bergen</au><au>Cervantes, Lillianna</au><au>Eisenbraun, Anika</au><au>Mason, Alex</au><au>Tyler, Kimberly</au><au>Khan, Bilal</au><au>Chan, Hau</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Novel GAN Approach to Augment Limited Tabular Data for Short-Term Substance Use Prediction</atitle><date>2024-07-17</date><risdate>2024</risdate><abstract>Substance use is a global issue that negatively impacts millions of persons who use drugs (PWUDs). 
In practice, identifying vulnerable PWUDs for efficient allocation of appropriate resources is challenging due to their complex use patterns (e.g., their tendency to change usage within months) and the high acquisition costs for collecting PWUD-focused substance use data. Thus, there has been a paucity of machine learning models for accurately predicting short-term substance use behaviors of PWUDs. In this paper, using longitudinal survey data of 258 PWUDs in the U.S. Great Plains collected by our team, we design a novel GAN that deals with high-dimensional low-sample-size tabular data and survey skip logic to augment existing data to improve classification models' prediction on (A) whether the PWUDs would increase usage and (B) at which ordinal frequency they would use a particular drug within the next 12 months. Our evaluation results show that, when trained on augmented data from our proposed GAN, the classification models improve their predictive performance (AUROC) by up to 13.4% in Problem (A) and 15.8% in Problem (B) for usage of marijuana, meth, amphetamines, and cocaine, which outperform state-of-the-art generative models.</abstract><doi>10.48550/arxiv.2407.13047</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2407.13047
language eng
recordid cdi_arxiv_primary_2407_13047
source arXiv.org
subjects Computer Science - Learning
title A Novel GAN Approach to Augment Limited Tabular Data for Short-Term Substance Use Prediction
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-20T10%3A49%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Novel%20GAN%20Approach%20to%20Augment%20Limited%20Tabular%20Data%20for%20Short-Term%20Substance%20Use%20Prediction&rft.au=Thach,%20Nguyen&rft.date=2024-07-17&rft_id=info:doi/10.48550/arxiv.2407.13047&rft_dat=%3Carxiv_GOX%3E2407_13047%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true