Isotropic Self-Supervised Learning for Driver Drowsiness Detection With Attention-Based Multimodal Fusion

Driver drowsiness is an important cause of traffic accidents. Many studies using computer vision techniques to detect driver drowsiness states, such as slow blinking, yawning, and nodding, have demonstrated excellent potential. Although existing studies have made significant progress, the number of s...

Detailed description

Bibliographic details
Published in: IEEE transactions on multimedia 2023, Vol.25, p.529-542
Main authors: Mou, Luntian, Zhou, Chao, Xie, Pengtao, Zhao, Pengfei, Jain, Ramesh, Gao, Wen, Yin, Baocai
Format: Article
Language: eng
container_end_page 542
container_issue
container_start_page 529
container_title IEEE transactions on multimedia
container_volume 25
creator Mou, Luntian
Zhou, Chao
Xie, Pengtao
Zhao, Pengfei
Jain, Ramesh
Gao, Wen
Yin, Baocai
description Driver drowsiness is an important cause of traffic accidents. Many studies using computer vision techniques to detect driver drowsiness states, such as slow blinking, yawning, and nodding, have demonstrated excellent potential. Although existing studies have made significant progress, the number of samples in the training corpora is small, which makes it difficult for a model to learn effective drowsiness representations from images or videos. To address this issue, we develop an isotropic self-supervised learning (IsoSSL) approach to learn powerful representations of images without relying on human-provided annotations and propose an IsoSSL-MoCo model by combining IsoSSL with momentum contrast (MoCo). To exploit the complementarity of multimodal data, an attention-based multimodal fusion model is also proposed to fuse features from the eye, mouth, and optical flow of the head. Specifically, we first use the IsoSSL-MoCo model to pretrain the image encoders for the three modalities in other datasets. Then, these encoders are fine-tuned and integrated into the proposed fusion model. The feature vectors generated by the image encoders of the three modalities are fed into the recursive layer to extract temporal information. To capture the importance degrees of the effects of temporal features from the three modalities on drowsiness detection, an attention mechanism is introduced to automatically weigh the feature vectors from the recursive layer to improve detection accuracy. Finally, a vector representation is generated by the attention layer and is used to detect driver drowsiness states. Experimental results based on two challenging datasets show that our method outperforms the baseline methods and the latest existing methods.
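The attention step described in the abstract, in which feature vectors from the recursive layer are weighted per modality and combined into one representation, can be illustrated with a minimal sketch. This is not the authors' implementation: the scoring function (a dot product between a query vector and the tanh of each feature), the `query` vector, the function names, and the toy feature values are all assumptions made for illustration. In a real model the query would be learned and the features would come from the fine-tuned encoders and the recursive layer.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a 1-D sequence of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(
    features: Sequence[Sequence[float]],
    query: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Fuse one feature vector per modality into a single vector.

    The score for each modality is dot(query, tanh(feature)); this scoring
    rule is an assumption, not taken from the paper.
    """
    scores = [
        sum(q * math.tanh(x) for q, x in zip(query, feat))
        for feat in features
    ]
    weights = softmax(scores)  # attention weights, one per modality, sum to 1
    dim = len(features[0])
    # Weighted sum of the modality vectors, computed per dimension.
    fused = [
        sum(w * feat[d] for w, feat in zip(weights, features))
        for d in range(dim)
    ]
    return fused, weights


# Toy stand-ins for the temporal features of the three modalities.
eye = [0.9, 0.1, 0.0]
mouth = [0.2, 0.7, 0.1]
head_flow = [0.0, 0.3, 0.8]
query = [1.0, 0.5, -0.5]  # hypothetical; would be a learned parameter

fused, weights = attention_fuse([eye, mouth, head_flow], query)
print("attention weights:", weights)
print("fused representation:", fused)
```

The fused vector would then be passed to a classifier that predicts the drowsiness state. With an all-zero query the weights become uniform and the fusion reduces to a plain average of the modalities, which makes a convenient sanity check.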
doi_str_mv 10.1109/TMM.2021.3128738
format Article
fullrecord
sourceid proquest_RIE
sourcerecordid 2774332564
sourcesystem PC
ieee_id 9618813
pqid 2774332564
recordtype article
eissn 1941-0077
coden ITMUF8
publisher Piscataway: IEEE
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023
toplevel peer_reviewed
online_resources
creationdate 2023
tpages 14
genre article
ristype JOUR
orcid https://orcid.org/0000-0003-2373-4966
https://orcid.org/0000-0002-8070-802X
https://orcid.org/0000-0002-1551-4448
https://orcid.org/0000-0003-3121-1823
https://orcid.org/0000-0001-6419-6653
linktohtml https://ieeexplore.ieee.org/document/9618813
collection IEEE All-Society Periodicals Package (ASPP) 2005-present
IEEE All-Society Periodicals Package (ASPP) 1998-Present
IEEE Electronic Library (IEL)
CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
delcategory Remote Search Resource
fulltext fulltext_linktorsrc
identifier ISSN: 1520-9210
ispartof IEEE transactions on multimedia, 2023, Vol.25, p.529-542
issn 1520-9210
1941-0077
language eng
recordid cdi_proquest_journals_2774332564
source IEEE Electronic Library (IEL)
subjects Annotations
Attention
Blinking
Coders
Computational modeling
Computer vision
Convolutional neural networks
Datasets
Dictionaries
driver drowsiness detection
Driver fatigue
Feature extraction
Hidden Markov models
isotropic self-supervised learning (IsoSSL)
momentum contrast (MoCo)
multimodal fusion model
Optical flow (image analysis)
Representations
Self-supervised learning
Sleepiness
Traffic accidents
Vehicles
Videos
Yawning
title Isotropic Self-Supervised Learning for Driver Drowsiness Detection With Attention-Based Multimodal Fusion
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T07%3A44%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Isotropic%20Self-Supervised%20Learning%20for%20Driver%20Drowsiness%20Detection%20With%20Attention-Based%20Multimodal%20Fusion&rft.jtitle=IEEE%20transactions%20on%20multimedia&rft.au=Mou,%20Luntian&rft.date=2023&rft.volume=25&rft.spage=529&rft.epage=542&rft.pages=529-542&rft.issn=1520-9210&rft.eissn=1941-0077&rft.coden=ITMUF8&rft_id=info:doi/10.1109/TMM.2021.3128738&rft_dat=%3Cproquest_RIE%3E2774332564%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2774332564&rft_id=info:pmid/&rft_ieee_id=9618813&rfr_iscdi=true