Analyzing the Effectiveness of Imbalanced Data Handling Techniques in Predicting Driver Phone Use

Distracted driving leads to a significant number of road crashes worldwide. Smartphone use is one of the most common causes of cognitive distraction among drivers. Available data on drivers’ phone use presents an invaluable opportunity to identify the main factors behind this behavior. Machine learn...

Bibliographic details
Published in: Sustainability 2023-07, Vol.15 (13), p.10668
Main authors: Taamneh, Madhar M; Taamneh, Salah; Alomari, Ahmad H; Abuaddous, Musab
Format: Article
Language: eng
Online access: Full text
container_end_page
container_issue 13
container_start_page 10668
container_title Sustainability
container_volume 15
creator Taamneh, Madhar M
Taamneh, Salah
Alomari, Ahmad H
Abuaddous, Musab
description Distracted driving leads to a significant number of road crashes worldwide. Smartphone use is one of the most common causes of cognitive distraction among drivers. Available data on drivers’ phone use presents an invaluable opportunity to identify the main factors behind this behavior. Machine learning (ML) techniques are among the most effective techniques for this purpose. However, the potential and usefulness of these techniques are limited, due to the imbalance of available data. The majority class of instances collected is for drivers who do not use their phones, while the minority class is for those who do use their phones. This paper evaluates two main approaches for handling imbalanced datasets on driver phone use. These methods include oversampling and undersampling. The effectiveness of each method was evaluated using six ML techniques: Multilayer Perceptron (MLP), Support Vector Machine (SVM), Naive Bayes (NB), Bayesian Network (BayesNet), J48, and ID3. The proposed methods were also evaluated on three Deep Learning (DL) models: Arch1 (5 hidden layers), Arch2 (10 hidden layers), and Arch3 (15 hidden layers). The data used in this document were collected through a direct observation study to explore a set of human, vehicle, and road surface characteristics. The results showed that all ML methods, as well as DL methods, achieved balanced accuracy values for both classes. ID3, J48, and MLP methods outperformed the rest of the ML methods in all scenarios, with ID3 achieving slightly better accuracy. The DL methods also provided good performances, especially for the undersampling data. The results also showed that the classification methods performed best on the undersampled data. It was concluded that road classification has the highest impact on cell phone use, followed by driver age group, driver gender, vehicle type, and, finally, driver seatbelt usage.
doi_str_mv 10.3390/su151310668
format Article
date 2023-07-01
eissn 2071-1050
genre article
oa free_for_read
orcid https://orcid.org/0000-0002-2920-0175
https://orcid.org/0000-0002-4827-548X
https://orcid.org/0000-0002-2414-0193
https://orcid.org/0000-0002-4046-8965
publisher Basel: MDPI AG
rights COPYRIGHT 2023 MDPI AG
2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
toplevel peer_reviewed
online_resources
fulltext fulltext
identifier ISSN: 2071-1050
ispartof Sustainability, 2023-07, Vol.15 (13), p.10668
issn 2071-1050
2071-1050
language eng
recordid cdi_proquest_journals_2836498620
source MDPI - Multidisciplinary Digital Publishing Institute; EZB Electronic Journals Library
subjects Accuracy
Algorithms
Automobile drivers
Bayesian analysis
Classification
Datasets
Decision trees
Deep learning
Machine learning
Mathematical models
Methods
Multilayer perceptrons
Neural networks
Performance evaluation
Roads
Sampling techniques
Seat belts
Smartphones
Support vector machines
Surface properties
Sustainability
Traffic accidents & safety
title Analyzing the Effectiveness of Imbalanced Data Handling Techniques in Predicting Driver Phone Use
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T15%3A32%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Analyzing%20the%20Effectiveness%20of%20Imbalanced%20Data%20Handling%20Techniques%20in%20Predicting%20Driver%20Phone%20Use&rft.jtitle=Sustainability&rft.au=Taamneh,%20Madhar%20M&rft.date=2023-07-01&rft.volume=15&rft.issue=13&rft.spage=10668&rft.pages=10668-&rft.issn=2071-1050&rft.eissn=2071-1050&rft_id=info:doi/10.3390/su151310668&rft_dat=%3Cgale_proqu%3EA758355867%3C/gale_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2836498620&rft_id=info:pmid/&rft_galeid=A758355867&rfr_iscdi=true
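
The description in this record contrasts oversampling and undersampling as ways to balance the two driver classes before training. The following is a minimal sketch of the general idea in plain Python, not the authors' procedure: the record does not say which resampling variants were used, so simple random resampling is assumed here, and the function names and toy data (90 non-users, 10 phone users) are hypothetical.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect rows into a dict keyed by class label."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows until every class has the minority-class count."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = min(len(group) for group in by_class.values())
    pairs = [(row, label)
             for label, group in by_class.items()
             for row in rng.sample(group, target)]
    rng.shuffle(pairs)
    return [row for row, _ in pairs], [label for _, label in pairs]


def random_oversample(rows, labels, seed=0):
    """Randomly duplicate rows until every class has the majority-class count."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = max(len(group) for group in by_class.values())
    pairs = []
    for label, group in by_class.items():
        # Draw the missing rows with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        pairs.extend((row, label) for row in group + extra)
    rng.shuffle(pairs)
    return [row for row, _ in pairs], [label for _, label in pairs]


# Hypothetical toy data: label 0 = no phone use, label 1 = phone use.
rows = [{"id": i} for i in range(100)]
labels = [0] * 90 + [1] * 10

under_rows, under_labels = random_undersample(rows, labels)
over_rows, over_labels = random_oversample(rows, labels)

print(Counter(labels))        # class counts before resampling
print(Counter(under_labels))  # class counts after undersampling
print(Counter(over_labels))   # class counts after oversampling
```

Undersampling discards majority-class observations, while oversampling repeats minority-class observations; in both cases resampling should be applied to the training split only, so that evaluation data keeps the original class distribution.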