Building risk prediction models for daily use of marijuana using machine learning techniques

Bibliographic details
Published in: Drug and alcohol dependence, 2021-08, Vol. 225, p. 108789-108789, Article 108789
Main authors: Parekh, Tarang; Fahim, Farhan
Format: Article
Language: English
Description
Summary:
• Machine learning models were superior at predicting daily marijuana use.
• The highest prediction weight was given to smoking, e-cigarette use, and male sex.
• Poor mental health, depression, and cognitive decline were important predictors.
• The presence of chronic health conditions emerged as an important feature for daily use.
• An abnormal sleep pattern was common in daily marijuana users.

Identifying the characteristics of adults with recent marijuana use is limited by standard statistical methods and requires a unique approach. The objective of this study is to evaluate the efficiency of machine learning models in predicting daily marijuana use and to identify factors associated with daily use among adults. In 2020, the study analyzed pooled data from the 2016–2019 Behavioral Risk Factor Surveillance System (BRFSS) Survey. Prediction models were developed using four machine learning algorithms: Logistic Regression, Decision Tree, Random Forest with the Gini function, and Naïve Bayes. Respondents were randomly divided into training and testing samples. The performance of all models was compared using accuracy, AUC, precision, and recall. The study included 253,569 respondents, of whom 10,182 (5.9 %) reported daily marijuana use in the last 30 days. Of the daily marijuana users, 53.4 % were young adults (age 18–34 years), 34.3 % were female, 56.1 % were non-Hispanic White, 15.2 % were college graduates, and 67.3 % were employed. Random Forest was the best-performing model with an AUC of 0.97, followed by Decision Tree (AUC 0.95). The most important factors for daily marijuana use were current e-cigarette and combustible cigarette use, male gender, being unmarried, poor mental health, depression, cognitive decline, abnormal sleep pattern, and high-risk behavior. Data mining methods were useful for discovering behavioral health-risk knowledge and for visualizing the significance of predictive modeling in a multidimensional behavioral health survey.
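The four comparison metrics named in the summary (accuracy, AUC, precision, and recall) can be illustrated with a minimal sketch. The record does not state which software the authors used, so this is not their implementation. The function names, the 0.5 decision threshold, and the toy labels and scores below are all assumptions made up for illustration and are not study data.

```python
# Illustrative sketch only: toy labels and scores, not BRFSS data.
# All function names here are hypothetical helpers, not the authors' code.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def accuracy(y_true, y_pred):
    """Share of all predictions that are correct."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)

def precision(y_true, y_pred):
    """Share of predicted positives that are true positives."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(y_true, y_pred):
    """Share of actual positives that the model found."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0

def auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a
    random positive is scored above a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 1 = daily use, 0 = no daily use (invented values).
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4, 0.3, 0.05]
# Assumed decision threshold of 0.5 to turn scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("accuracy :", accuracy(y_true, y_pred))
print("precision:", precision(y_true, y_pred))
print("recall   :", recall(y_true, y_pred))
print("AUC      :", auc(y_true, scores))
```

Accuracy, precision, and recall depend on the chosen threshold, while AUC is computed from the raw scores. With a rare outcome such as daily use, accuracy alone can look high even for a model that finds few true cases.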
ISSN: 0376-8716
1879-0046
DOI: 10.1016/j.drugalcdep.2021.108789