Isotropic Self-Supervised Learning for Driver Drowsiness Detection With Attention-Based Multimodal Fusion

Driver drowsiness is an important cause of traffic accidents. Many studies using computer vision techniques to detect driver drowsiness states, such as slow blinking, yawning, and nodding, have demonstrated excellent potential. Although existing studies have made significant progress, the number of s...

Bibliographic details
Published in:	IEEE transactions on multimedia 2023, Vol.25, p.529-542
Main authors:	Mou, Luntian; Zhou, Chao; Xie, Pengtao; Zhao, Pengfei; Jain, Ramesh; Gao, Wen; Yin, Baocai
Format:	Article
Language:	eng
Subjects:	Annotations; Attention; Blinking; Coders; Computational modeling; Computer vision; Convolutional neural networks; Datasets; Dictionaries; driver drowsiness detection; Driver fatigue; Feature extraction; Hidden Markov models; isotropic self-supervised learning (IsoSSL); momentum contrast (MoCo); multimodal fusion model; Optical flow (image analysis); Representations; Self-supervised learning; Sleepiness; Traffic accidents; Vehicles; Videos; Yawning
Online access:	Order full text

container_end_page	542
container_issue
container_start_page	529
container_title	IEEE transactions on multimedia
container_volume	25
creator	Mou, Luntian; Zhou, Chao; Xie, Pengtao; Zhao, Pengfei; Jain, Ramesh; Gao, Wen; Yin, Baocai
description	Driver drowsiness is an important cause of traffic accidents. Many studies using computer vision techniques to detect driver drowsiness states, such as slow blinking, yawning, and nodding, have demonstrated excellent potential. Although existing studies have made significant progress, the number of samples in the training corpora is small, which makes it difficult for a model to learn effective drowsiness representations from images or videos. To address this issue, we develop an isotropic self-supervised learning (IsoSSL) approach to learn powerful representations of images without relying on human-provided annotations and propose an IsoSSL-MoCo model by combining IsoSSL with momentum contrast (MoCo). To exploit the complementarity of multimodal data, an attention-based multimodal fusion model is also proposed to fuse features from the eye, mouth, and optical flow of the head. Specifically, we first use the IsoSSL-MoCo model to pretrain the image encoders for the three modalities in other datasets. Then, these encoders are fine-tuned and integrated into the proposed fusion model. The feature vectors generated by the image encoders of the three modalities are fed into the recursive layer to extract temporal information. To capture the importance degrees of the effects of temporal features from the three modalities on drowsiness detection, an attention mechanism is introduced to automatically weigh the feature vectors from the recursive layer to improve detection accuracy. Finally, a vector representation is generated by the attention layer and is used to detect driver drowsiness states. Experimental results based on two challenging datasets show that our method outperforms the baseline methods and the latest existing methods.
doi_str_mv	10.1109/TMM.2021.3128738
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_2774332564</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9618813</ieee_id><sourcerecordid>2774332564</sourcerecordid><originalsourceid>FETCH-LOGICAL-c291t-b270c2d655a70a1ad62523c232265e42ec3c6054257800f650d2d80f4e7b71e23</originalsourceid><addsrcrecordid>eNo9kEFLw0AQhYMoWKt3wUvAc-rsbDabHGtrtdDioRWPId1MdEuajbubiv_ehIqnNzO8Nw--ILhlMGEMsoftej1BQDbhDFPJ07NgxLKYRQBSnvezQIgyZHAZXDm3B2CxADkK9NIZb02rVbihuoo2XUv2qB2V4YoK2-jmI6yMDedWH2kQ8-10Q86Fc_KkvDZN-K79Zzj1npphjR6LIb3uaq8PpizqcNG5_n4dXFRF7ejmT8fB2-JpO3uJVq_Py9l0FSnMmI92KEFhmQhRSChYUSYokCvkiImgGElxlYCIUcgUoEoElFimUMUkd5IR8nFwf_rbWvPVkfP53nS26StzlDLmHEUS9y44uZQ1zlmq8tbqQ2F_cgb5ADTvgeYD0PwPaB-5O0U0Ef3bs4SlKeP8F71UcXs</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2774332564</pqid></control><display><type>article</type><title>Isotropic Self-Supervised Learning for Driver Drowsiness Detection With Attention-Based Multimodal Fusion</title><source>IEEE Electronic Library (IEL)</source><creator>Mou, Luntian ; Zhou, Chao ; Xie, Pengtao ; Zhao, Pengfei ; Jain, Ramesh ; Gao, Wen ; Yin, Baocai</creator><creatorcontrib>Mou, Luntian ; Zhou, Chao ; Xie, Pengtao ; Zhao, Pengfei ; Jain, Ramesh ; Gao, Wen ; Yin, Baocai</creatorcontrib><description>Driverdrowsiness is an important cause of traffic accidents. Many studies using computer vision techniques to detect driver drowsiness states, such as slow blinking, yawning, and nodding, have demonstrated excellent potential. Although existing studies have made significant progress, the number of samples in the training corpora is small, which makes it difficult for a model to learn effective drowsiness representations from images or videos. 
To address this issue, we develop an isotropic self-supervised learning (IsoSSL) approach to learn powerful representations of images without relying on human-provided annotations and propose an IsoSSL-MoCo model by combining IsoSSL with momentum contrast (MoCo). To exploit the complementarity of multimodal data, an attention-based multimodal fusion model is also proposed to fuse features from the eye, mouth, and optical flow of the head. Specifically, we first use the IsoSSL-MoCo model to pretrain the image encoders for the three modalities in other datasets. Then, these encoders are fine-tuned and integrated into the proposed fusion model. The feature vectors generated by the image encoders of the three modalities are fed into the recursive layer to extract temporal information. To capture the importance degrees of the effects of temporal features from the three modalities on drowsiness detection, an attention mechanism is introduced to automatically weigh the feature vectors from the recursive layer to improve detection accuracy. Finally, a vector representation is generated by the attention layer and is used to detect driver drowsiness states. 
Experimental results based on two challenging datasets show that our method outperforms the baseline methods and the latest existing methods.</description><identifier>ISSN: 1520-9210</identifier><identifier>EISSN: 1941-0077</identifier><identifier>DOI: 10.1109/TMM.2021.3128738</identifier><identifier>CODEN: ITMUF8</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Annotations ; Attention ; Blinking ; Coders ; Computational modeling ; Computer vision ; Convolutional neural networks ; Datasets ; Dictionaries ; driver drowsiness detection ; Driver fatigue ; Feature extraction ; Hidden Markov models ; isotropic self-supervised learning (IsoSSL) ; momentum contrast (MoCo) ; multimodal fusion model ; Optical flow (image analysis) ; Representations ; Self-supervised learning ; Sleepiness ; Traffic accidents ; Vehicles ; Videos ; Yawning</subject><ispartof>IEEE transactions on multimedia, 2023, Vol.25, p.529-542</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2023</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c291t-b270c2d655a70a1ad62523c232265e42ec3c6054257800f650d2d80f4e7b71e23</citedby><cites>FETCH-LOGICAL-c291t-b270c2d655a70a1ad62523c232265e42ec3c6054257800f650d2d80f4e7b71e23</cites><orcidid>0000-0003-2373-4966 ; 0000-0002-8070-802X ; 0000-0002-1551-4448 ; 0000-0003-3121-1823 ; 0000-0001-6419-6653</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9618813$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,4023,27922,27923,27924,54757</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9618813$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Mou, Luntian</creatorcontrib><creatorcontrib>Zhou, Chao</creatorcontrib><creatorcontrib>Xie, Pengtao</creatorcontrib><creatorcontrib>Zhao, Pengfei</creatorcontrib><creatorcontrib>Jain, Ramesh</creatorcontrib><creatorcontrib>Gao, Wen</creatorcontrib><creatorcontrib>Yin, Baocai</creatorcontrib><title>Isotropic Self-Supervised Learning for Driver Drowsiness Detection With Attention-Based Multimodal Fusion</title><title>IEEE transactions on multimedia</title><addtitle>TMM</addtitle><description>Driverdrowsiness is an important cause of traffic accidents. Many studies using computer vision techniques to detect driver drowsiness states, such as slow blinking, yawning, and nodding, have demonstrated excellent potential. Although existing studies have made significant progress, the number of samples in the training corpora is small, which makes it difficult for a model to learn effective drowsiness representations from images or videos. 
To address this issue, we develop an isotropic self-supervised learning (IsoSSL) approach to learn powerful representations of images without relying on human-provided annotations and propose an IsoSSL-MoCo model by combining IsoSSL with momentum contrast (MoCo). To exploit the complementarity of multimodal data, an attention-based multimodal fusion model is also proposed to fuse features from the eye, mouth, and optical flow of the head. Specifically, we first use the IsoSSL-MoCo model to pretrain the image encoders for the three modalities in other datasets. Then, these encoders are fine-tuned and integrated into the proposed fusion model. The feature vectors generated by the image encoders of the three modalities are fed into the recursive layer to extract temporal information. To capture the importance degrees of the effects of temporal features from the three modalities on drowsiness detection, an attention mechanism is introduced to automatically weigh the feature vectors from the recursive layer to improve detection accuracy. Finally, a vector representation is generated by the attention layer and is used to detect driver drowsiness states. 
Experimental results based on two challenging datasets show that our method outperforms the baseline methods and the latest existing methods.</description><subject>Annotations</subject><subject>Attention</subject><subject>Blinking</subject><subject>Coders</subject><subject>Computational modeling</subject><subject>Computer vision</subject><subject>Convolutional neural networks</subject><subject>Datasets</subject><subject>Dictionaries</subject><subject>driver drowsiness detection</subject><subject>Driver fatigue</subject><subject>Feature extraction</subject><subject>Hidden Markov models</subject><subject>isotropic self-supervised learning (IsoSSL)</subject><subject>momentum contrast (MoCo)</subject><subject>multimodal fusion model</subject><subject>Optical flow (image analysis)</subject><subject>Representations</subject><subject>Self-supervised learning</subject><subject>Sleepiness</subject><subject>Traffic accidents</subject><subject>Vehicles</subject><subject>Videos</subject><subject>Yawning</subject><issn>1520-9210</issn><issn>1941-0077</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kEFLw0AQhYMoWKt3wUvAc-rsbDabHGtrtdDioRWPId1MdEuajbubiv_ehIqnNzO8Nw--ILhlMGEMsoftej1BQDbhDFPJ07NgxLKYRQBSnvezQIgyZHAZXDm3B2CxADkK9NIZb02rVbihuoo2XUv2qB2V4YoK2-jmI6yMDedWH2kQ8-10Q86Fc_KkvDZN-K79Zzj1npphjR6LIb3uaq8PpizqcNG5_n4dXFRF7ejmT8fB2-JpO3uJVq_Py9l0FSnMmI92KEFhmQhRSChYUSYokCvkiImgGElxlYCIUcgUoEoElFimUMUkd5IR8nFwf_rbWvPVkfP53nS26StzlDLmHEUS9y44uZQ1zlmq8tbqQ2F_cgb5ADTvgeYD0PwPaB-5O0U0Ef3bs4SlKeP8F71UcXs</recordid><startdate>2023</startdate><enddate>2023</enddate><creator>Mou, Luntian</creator><creator>Zhou, Chao</creator><creator>Xie, Pengtao</creator><creator>Zhao, Pengfei</creator><creator>Jain, Ramesh</creator><creator>Gao, Wen</creator><creator>Yin, Baocai</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0003-2373-4966</orcidid><orcidid>https://orcid.org/0000-0002-8070-802X</orcidid><orcidid>https://orcid.org/0000-0002-1551-4448</orcidid><orcidid>https://orcid.org/0000-0003-3121-1823</orcidid><orcidid>https://orcid.org/0000-0001-6419-6653</orcidid></search><sort><creationdate>2023</creationdate><title>Isotropic Self-Supervised Learning for Driver Drowsiness Detection With Attention-Based Multimodal Fusion</title><author>Mou, Luntian ; Zhou, Chao ; Xie, Pengtao ; Zhao, Pengfei ; Jain, Ramesh ; Gao, Wen ; Yin, Baocai</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c291t-b270c2d655a70a1ad62523c232265e42ec3c6054257800f650d2d80f4e7b71e23</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Annotations</topic><topic>Attention</topic><topic>Blinking</topic><topic>Coders</topic><topic>Computational modeling</topic><topic>Computer vision</topic><topic>Convolutional neural networks</topic><topic>Datasets</topic><topic>Dictionaries</topic><topic>driver drowsiness detection</topic><topic>Driver fatigue</topic><topic>Feature extraction</topic><topic>Hidden Markov models</topic><topic>isotropic self-supervised learning (IsoSSL)</topic><topic>momentum contrast (MoCo)</topic><topic>multimodal fusion model</topic><topic>Optical flow (image analysis)</topic><topic>Representations</topic><topic>Self-supervised learning</topic><topic>Sleepiness</topic><topic>Traffic accidents</topic><topic>Vehicles</topic><topic>Videos</topic><topic>Yawning</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Mou, Luntian</creatorcontrib><creatorcontrib>Zhou, 
Chao</creatorcontrib><creatorcontrib>Xie, Pengtao</creatorcontrib><creatorcontrib>Zhao, Pengfei</creatorcontrib><creatorcontrib>Jain, Ramesh</creatorcontrib><creatorcontrib>Gao, Wen</creatorcontrib><creatorcontrib>Yin, Baocai</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on multimedia</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Mou, Luntian</au><au>Zhou, Chao</au><au>Xie, Pengtao</au><au>Zhao, Pengfei</au><au>Jain, Ramesh</au><au>Gao, Wen</au><au>Yin, Baocai</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Isotropic Self-Supervised Learning for Driver Drowsiness Detection With Attention-Based Multimodal Fusion</atitle><jtitle>IEEE transactions on multimedia</jtitle><stitle>TMM</stitle><date>2023</date><risdate>2023</risdate><volume>25</volume><spage>529</spage><epage>542</epage><pages>529-542</pages><issn>1520-9210</issn><eissn>1941-0077</eissn><coden>ITMUF8</coden><abstract>Driverdrowsiness is an important cause of traffic accidents. Many studies using computer vision techniques to detect driver drowsiness states, such as slow blinking, yawning, and nodding, have demonstrated excellent potential. 
Although existing studies have made significant progress, the number of samples in the training corpora is small, which makes it difficult for a model to learn effective drowsiness representations from images or videos. To address this issue, we develop an isotropic self-supervised learning (IsoSSL) approach to learn powerful representations of images without relying on human-provided annotations and propose an IsoSSL-MoCo model by combining IsoSSL with momentum contrast (MoCo). To exploit the complementarity of multimodal data, an attention-based multimodal fusion model is also proposed to fuse features from the eye, mouth, and optical flow of the head. Specifically, we first use the IsoSSL-MoCo model to pretrain the image encoders for the three modalities in other datasets. Then, these encoders are fine-tuned and integrated into the proposed fusion model. The feature vectors generated by the image encoders of the three modalities are fed into the recursive layer to extract temporal information. To capture the importance degrees of the effects of temporal features from the three modalities on drowsiness detection, an attention mechanism is introduced to automatically weigh the feature vectors from the recursive layer to improve detection accuracy. Finally, a vector representation is generated by the attention layer and is used to detect driver drowsiness states. Experimental results based on two challenging datasets show that our method outperforms the baseline methods and the latest existing methods.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/TMM.2021.3128738</doi><tpages>14</tpages><orcidid>https://orcid.org/0000-0003-2373-4966</orcidid><orcidid>https://orcid.org/0000-0002-8070-802X</orcidid><orcidid>https://orcid.org/0000-0002-1551-4448</orcidid><orcidid>https://orcid.org/0000-0003-3121-1823</orcidid><orcidid>https://orcid.org/0000-0001-6419-6653</orcidid></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1520-9210
ispartof	IEEE transactions on multimedia, 2023, Vol.25, p.529-542
issn	1520-9210 1941-0077
language	eng
recordid	cdi_proquest_journals_2774332564
source	IEEE Electronic Library (IEL)
subjects	Annotations; Attention; Blinking; Coders; Computational modeling; Computer vision; Convolutional neural networks; Datasets; Dictionaries; driver drowsiness detection; Driver fatigue; Feature extraction; Hidden Markov models; isotropic self-supervised learning (IsoSSL); momentum contrast (MoCo); multimodal fusion model; Optical flow (image analysis); Representations; Self-supervised learning; Sleepiness; Traffic accidents; Vehicles; Videos; Yawning
title	Isotropic Self-Supervised Learning for Driver Drowsiness Detection With Attention-Based Multimodal Fusion
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T07%3A44%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Isotropic%20Self-Supervised%20Learning%20for%20Driver%20Drowsiness%20Detection%20With%20Attention-Based%20Multimodal%20Fusion&rft.jtitle=IEEE%20transactions%20on%20multimedia&rft.au=Mou,%20Luntian&rft.date=2023&rft.volume=25&rft.spage=529&rft.epage=542&rft.pages=529-542&rft.issn=1520-9210&rft.eissn=1941-0077&rft.coden=ITMUF8&rft_id=info:doi/10.1109/TMM.2021.3128738&rft_dat=%3Cproquest_RIE%3E2774332564%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2774332564&rft_id=info:pmid/&rft_ieee_id=9618813&rfr_iscdi=true