Isotropic Self-Supervised Learning for Driver Drowsiness Detection With Attention-Based Multimodal Fusion
Driver drowsiness is an important cause of traffic accidents. Many studies using computer vision techniques to detect driver drowsiness states, such as slow blinking, yawning, and nodding, have demonstrated excellent potential. Although existing studies have made significant progress, the number of s...
Published in: | IEEE transactions on multimedia 2023, Vol.25, p.529-542 |
---|---|
Main authors: | Mou, Luntian ; Zhou, Chao ; Xie, Pengtao ; Zhao, Pengfei ; Jain, Ramesh ; Gao, Wen ; Yin, Baocai |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 542 |
---|---|
container_issue | |
container_start_page | 529 |
container_title | IEEE transactions on multimedia |
container_volume | 25 |
creator | Mou, Luntian ; Zhou, Chao ; Xie, Pengtao ; Zhao, Pengfei ; Jain, Ramesh ; Gao, Wen ; Yin, Baocai |
description | Driver drowsiness is an important cause of traffic accidents. Many studies using computer vision techniques to detect driver drowsiness states, such as slow blinking, yawning, and nodding, have demonstrated excellent potential. Although existing studies have made significant progress, the number of samples in the training corpora is small, which makes it difficult for a model to learn effective drowsiness representations from images or videos. To address this issue, we develop an isotropic self-supervised learning (IsoSSL) approach to learn powerful representations of images without relying on human-provided annotations and propose an IsoSSL-MoCo model by combining IsoSSL with momentum contrast (MoCo). To exploit the complementarity of multimodal data, an attention-based multimodal fusion model is also proposed to fuse features from the eye, mouth, and optical flow of the head. Specifically, we first use the IsoSSL-MoCo model to pretrain the image encoders for the three modalities in other datasets. Then, these encoders are fine-tuned and integrated into the proposed fusion model. The feature vectors generated by the image encoders of the three modalities are fed into the recursive layer to extract temporal information. To capture the importance degrees of the effects of temporal features from the three modalities on drowsiness detection, an attention mechanism is introduced to automatically weigh the feature vectors from the recursive layer to improve detection accuracy. Finally, a vector representation is generated by the attention layer and is used to detect driver drowsiness states. Experimental results based on two challenging datasets show that our method outperforms the baseline methods and the latest existing methods. |
doi_str_mv | 10.1109/TMM.2021.3128738 |
format | Article |
fullrecord | <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_2774332564</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9618813</ieee_id><sourcerecordid>2774332564</sourcerecordid><originalsourceid>FETCH-LOGICAL-c291t-b270c2d655a70a1ad62523c232265e42ec3c6054257800f650d2d80f4e7b71e23</originalsourceid><addsrcrecordid>eNo9kEFLw0AQhYMoWKt3wUvAc-rsbDabHGtrtdDioRWPId1MdEuajbubiv_ehIqnNzO8Nw--ILhlMGEMsoftej1BQDbhDFPJ07NgxLKYRQBSnvezQIgyZHAZXDm3B2CxADkK9NIZb02rVbihuoo2XUv2qB2V4YoK2-jmI6yMDedWH2kQ8-10Q86Fc_KkvDZN-K79Zzj1npphjR6LIb3uaq8PpizqcNG5_n4dXFRF7ejmT8fB2-JpO3uJVq_Py9l0FSnMmI92KEFhmQhRSChYUSYokCvkiImgGElxlYCIUcgUoEoElFimUMUkd5IR8nFwf_rbWvPVkfP53nS26StzlDLmHEUS9y44uZQ1zlmq8tbqQ2F_cgb5ADTvgeYD0PwPaB-5O0U0Ef3bs4SlKeP8F71UcXs</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2774332564</pqid></control><display><type>article</type><title>Isotropic Self-Supervised Learning for Driver Drowsiness Detection With Attention-Based Multimodal Fusion</title><source>IEEE Electronic Library (IEL)</source><creator>Mou, Luntian ; Zhou, Chao ; Xie, Pengtao ; Zhao, Pengfei ; Jain, Ramesh ; Gao, Wen ; Yin, Baocai</creator><creatorcontrib>Mou, Luntian ; Zhou, Chao ; Xie, Pengtao ; Zhao, Pengfei ; Jain, Ramesh ; Gao, Wen ; Yin, Baocai</creatorcontrib><description>Driverdrowsiness is an important cause of traffic accidents. Many studies using computer vision techniques to detect driver drowsiness states, such as slow blinking, yawning, and nodding, have demonstrated excellent potential. Although existing studies have made significant progress, the number of samples in the training corpora is small, which makes it difficult for a model to learn effective drowsiness representations from images or videos. 
To address this issue, we develop an isotropic self-supervised learning (IsoSSL) approach to learn powerful representations of images without relying on human-provided annotations and propose an IsoSSL-MoCo model by combining IsoSSL with momentum contrast (MoCo). To exploit the complementarity of multimodal data, an attention-based multimodal fusion model is also proposed to fuse features from the eye, mouth, and optical flow of the head. Specifically, we first use the IsoSSL-MoCo model to pretrain the image encoders for the three modalities in other datasets. Then, these encoders are fine-tuned and integrated into the proposed fusion model. The feature vectors generated by the image encoders of the three modalities are fed into the recursive layer to extract temporal information. To capture the importance degrees of the effects of temporal features from the three modalities on drowsiness detection, an attention mechanism is introduced to automatically weigh the feature vectors from the recursive layer to improve detection accuracy. Finally, a vector representation is generated by the attention layer and is used to detect driver drowsiness states. 
Experimental results based on two challenging datasets show that our method outperforms the baseline methods and the latest existing methods.</description><identifier>ISSN: 1520-9210</identifier><identifier>EISSN: 1941-0077</identifier><identifier>DOI: 10.1109/TMM.2021.3128738</identifier><identifier>CODEN: ITMUF8</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Annotations ; Attention ; Blinking ; Coders ; Computational modeling ; Computer vision ; Convolutional neural networks ; Datasets ; Dictionaries ; driver drowsiness detection ; Driver fatigue ; Feature extraction ; Hidden Markov models ; isotropic self-supervised learning (IsoSSL) ; momentum contrast (MoCo) ; multimodal fusion model ; Optical flow (image analysis) ; Representations ; Self-supervised learning ; Sleepiness ; Traffic accidents ; Vehicles ; Videos ; Yawning</subject><ispartof>IEEE transactions on multimedia, 2023, Vol.25, p.529-542</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2023</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c291t-b270c2d655a70a1ad62523c232265e42ec3c6054257800f650d2d80f4e7b71e23</citedby><cites>FETCH-LOGICAL-c291t-b270c2d655a70a1ad62523c232265e42ec3c6054257800f650d2d80f4e7b71e23</cites><orcidid>0000-0003-2373-4966 ; 0000-0002-8070-802X ; 0000-0002-1551-4448 ; 0000-0003-3121-1823 ; 0000-0001-6419-6653</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9618813$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,4023,27922,27923,27924,54757</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9618813$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Mou, Luntian</creatorcontrib><creatorcontrib>Zhou, Chao</creatorcontrib><creatorcontrib>Xie, Pengtao</creatorcontrib><creatorcontrib>Zhao, Pengfei</creatorcontrib><creatorcontrib>Jain, Ramesh</creatorcontrib><creatorcontrib>Gao, Wen</creatorcontrib><creatorcontrib>Yin, Baocai</creatorcontrib><title>Isotropic Self-Supervised Learning for Driver Drowsiness Detection With Attention-Based Multimodal Fusion</title><title>IEEE transactions on multimedia</title><addtitle>TMM</addtitle><description>Driverdrowsiness is an important cause of traffic accidents. Many studies using computer vision techniques to detect driver drowsiness states, such as slow blinking, yawning, and nodding, have demonstrated excellent potential. Although existing studies have made significant progress, the number of samples in the training corpora is small, which makes it difficult for a model to learn effective drowsiness representations from images or videos. 
To address this issue, we develop an isotropic self-supervised learning (IsoSSL) approach to learn powerful representations of images without relying on human-provided annotations and propose an IsoSSL-MoCo model by combining IsoSSL with momentum contrast (MoCo). To exploit the complementarity of multimodal data, an attention-based multimodal fusion model is also proposed to fuse features from the eye, mouth, and optical flow of the head. Specifically, we first use the IsoSSL-MoCo model to pretrain the image encoders for the three modalities in other datasets. Then, these encoders are fine-tuned and integrated into the proposed fusion model. The feature vectors generated by the image encoders of the three modalities are fed into the recursive layer to extract temporal information. To capture the importance degrees of the effects of temporal features from the three modalities on drowsiness detection, an attention mechanism is introduced to automatically weigh the feature vectors from the recursive layer to improve detection accuracy. Finally, a vector representation is generated by the attention layer and is used to detect driver drowsiness states. 
Experimental results based on two challenging datasets show that our method outperforms the baseline methods and the latest existing methods.</description><subject>Annotations</subject><subject>Attention</subject><subject>Blinking</subject><subject>Coders</subject><subject>Computational modeling</subject><subject>Computer vision</subject><subject>Convolutional neural networks</subject><subject>Datasets</subject><subject>Dictionaries</subject><subject>driver drowsiness detection</subject><subject>Driver fatigue</subject><subject>Feature extraction</subject><subject>Hidden Markov models</subject><subject>isotropic self-supervised learning (IsoSSL)</subject><subject>momentum contrast (MoCo)</subject><subject>multimodal fusion model</subject><subject>Optical flow (image analysis)</subject><subject>Representations</subject><subject>Self-supervised learning</subject><subject>Sleepiness</subject><subject>Traffic accidents</subject><subject>Vehicles</subject><subject>Videos</subject><subject>Yawning</subject><issn>1520-9210</issn><issn>1941-0077</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kEFLw0AQhYMoWKt3wUvAc-rsbDabHGtrtdDioRWPId1MdEuajbubiv_ehIqnNzO8Nw--ILhlMGEMsoftej1BQDbhDFPJ07NgxLKYRQBSnvezQIgyZHAZXDm3B2CxADkK9NIZb02rVbihuoo2XUv2qB2V4YoK2-jmI6yMDedWH2kQ8-10Q86Fc_KkvDZN-K79Zzj1npphjR6LIb3uaq8PpizqcNG5_n4dXFRF7ejmT8fB2-JpO3uJVq_Py9l0FSnMmI92KEFhmQhRSChYUSYokCvkiImgGElxlYCIUcgUoEoElFimUMUkd5IR8nFwf_rbWvPVkfP53nS26StzlDLmHEUS9y44uZQ1zlmq8tbqQ2F_cgb5ADTvgeYD0PwPaB-5O0U0Ef3bs4SlKeP8F71UcXs</recordid><startdate>2023</startdate><enddate>2023</enddate><creator>Mou, Luntian</creator><creator>Zhou, Chao</creator><creator>Xie, Pengtao</creator><creator>Zhao, Pengfei</creator><creator>Jain, Ramesh</creator><creator>Gao, Wen</creator><creator>Yin, Baocai</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0003-2373-4966</orcidid><orcidid>https://orcid.org/0000-0002-8070-802X</orcidid><orcidid>https://orcid.org/0000-0002-1551-4448</orcidid><orcidid>https://orcid.org/0000-0003-3121-1823</orcidid><orcidid>https://orcid.org/0000-0001-6419-6653</orcidid></search><sort><creationdate>2023</creationdate><title>Isotropic Self-Supervised Learning for Driver Drowsiness Detection With Attention-Based Multimodal Fusion</title><author>Mou, Luntian ; Zhou, Chao ; Xie, Pengtao ; Zhao, Pengfei ; Jain, Ramesh ; Gao, Wen ; Yin, Baocai</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c291t-b270c2d655a70a1ad62523c232265e42ec3c6054257800f650d2d80f4e7b71e23</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Annotations</topic><topic>Attention</topic><topic>Blinking</topic><topic>Coders</topic><topic>Computational modeling</topic><topic>Computer vision</topic><topic>Convolutional neural networks</topic><topic>Datasets</topic><topic>Dictionaries</topic><topic>driver drowsiness detection</topic><topic>Driver fatigue</topic><topic>Feature extraction</topic><topic>Hidden Markov models</topic><topic>isotropic self-supervised learning (IsoSSL)</topic><topic>momentum contrast (MoCo)</topic><topic>multimodal fusion model</topic><topic>Optical flow (image analysis)</topic><topic>Representations</topic><topic>Self-supervised learning</topic><topic>Sleepiness</topic><topic>Traffic accidents</topic><topic>Vehicles</topic><topic>Videos</topic><topic>Yawning</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Mou, Luntian</creatorcontrib><creatorcontrib>Zhou, 
Chao</creatorcontrib><creatorcontrib>Xie, Pengtao</creatorcontrib><creatorcontrib>Zhao, Pengfei</creatorcontrib><creatorcontrib>Jain, Ramesh</creatorcontrib><creatorcontrib>Gao, Wen</creatorcontrib><creatorcontrib>Yin, Baocai</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on multimedia</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Mou, Luntian</au><au>Zhou, Chao</au><au>Xie, Pengtao</au><au>Zhao, Pengfei</au><au>Jain, Ramesh</au><au>Gao, Wen</au><au>Yin, Baocai</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Isotropic Self-Supervised Learning for Driver Drowsiness Detection With Attention-Based Multimodal Fusion</atitle><jtitle>IEEE transactions on multimedia</jtitle><stitle>TMM</stitle><date>2023</date><risdate>2023</risdate><volume>25</volume><spage>529</spage><epage>542</epage><pages>529-542</pages><issn>1520-9210</issn><eissn>1941-0077</eissn><coden>ITMUF8</coden><abstract>Driverdrowsiness is an important cause of traffic accidents. Many studies using computer vision techniques to detect driver drowsiness states, such as slow blinking, yawning, and nodding, have demonstrated excellent potential. 
Although existing studies have made significant progress, the number of samples in the training corpora is small, which makes it difficult for a model to learn effective drowsiness representations from images or videos. To address this issue, we develop an isotropic self-supervised learning (IsoSSL) approach to learn powerful representations of images without relying on human-provided annotations and propose an IsoSSL-MoCo model by combining IsoSSL with momentum contrast (MoCo). To exploit the complementarity of multimodal data, an attention-based multimodal fusion model is also proposed to fuse features from the eye, mouth, and optical flow of the head. Specifically, we first use the IsoSSL-MoCo model to pretrain the image encoders for the three modalities in other datasets. Then, these encoders are fine-tuned and integrated into the proposed fusion model. The feature vectors generated by the image encoders of the three modalities are fed into the recursive layer to extract temporal information. To capture the importance degrees of the effects of temporal features from the three modalities on drowsiness detection, an attention mechanism is introduced to automatically weigh the feature vectors from the recursive layer to improve detection accuracy. Finally, a vector representation is generated by the attention layer and is used to detect driver drowsiness states. Experimental results based on two challenging datasets show that our method outperforms the baseline methods and the latest existing methods.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/TMM.2021.3128738</doi><tpages>14</tpages><orcidid>https://orcid.org/0000-0003-2373-4966</orcidid><orcidid>https://orcid.org/0000-0002-8070-802X</orcidid><orcidid>https://orcid.org/0000-0002-1551-4448</orcidid><orcidid>https://orcid.org/0000-0003-3121-1823</orcidid><orcidid>https://orcid.org/0000-0001-6419-6653</orcidid></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1520-9210 |
ispartof | IEEE transactions on multimedia, 2023, Vol.25, p.529-542 |
issn | 1520-9210 ; 1941-0077 |
language | eng |
recordid | cdi_proquest_journals_2774332564 |
source | IEEE Electronic Library (IEL) |
subjects | Annotations ; Attention ; Blinking ; Coders ; Computational modeling ; Computer vision ; Convolutional neural networks ; Datasets ; Dictionaries ; driver drowsiness detection ; Driver fatigue ; Feature extraction ; Hidden Markov models ; isotropic self-supervised learning (IsoSSL) ; momentum contrast (MoCo) ; multimodal fusion model ; Optical flow (image analysis) ; Representations ; Self-supervised learning ; Sleepiness ; Traffic accidents ; Vehicles ; Videos ; Yawning |
title | Isotropic Self-Supervised Learning for Driver Drowsiness Detection With Attention-Based Multimodal Fusion |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T07%3A44%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Isotropic%20Self-Supervised%20Learning%20for%20Driver%20Drowsiness%20Detection%20With%20Attention-Based%20Multimodal%20Fusion&rft.jtitle=IEEE%20transactions%20on%20multimedia&rft.au=Mou,%20Luntian&rft.date=2023&rft.volume=25&rft.spage=529&rft.epage=542&rft.pages=529-542&rft.issn=1520-9210&rft.eissn=1941-0077&rft.coden=ITMUF8&rft_id=info:doi/10.1109/TMM.2021.3128738&rft_dat=%3Cproquest_RIE%3E2774332564%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2774332564&rft_id=info:pmid/&rft_ieee_id=9618813&rfr_iscdi=true |
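
The description field above mentions an attention step that weighs the temporal feature vectors of the eye, mouth, and head optical-flow modalities before classification. The following is a minimal sketch of that general idea in plain Python, not the authors' implementation: the dot-product scoring, the `query` vector, and the function names `softmax` and `attention_fuse` are assumptions introduced here for illustration, and the toy two-dimensional vectors merely stand in for the outputs of the recursive layer.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(features, query):
    """Fuse per-modality feature vectors into one vector.

    features: dict mapping a modality name to its feature vector
              (all vectors must have the same length as `query`).
    query:    hypothetical scoring vector; in a trained model this would
              be a learned parameter (an assumption of this sketch).
    Returns (weights, fused): the attention weight per modality and the
    weighted sum of the modality vectors.
    """
    names = list(features)
    # Score each modality by the dot product with the query vector.
    scores = [sum(q * x for q, x in zip(query, features[n])) for n in names]
    weights = softmax(scores)
    # The fused representation is the attention-weighted sum of the vectors.
    fused = [
        sum(w * features[n][i] for w, n in zip(weights, names))
        for i in range(len(query))
    ]
    return dict(zip(names, weights)), fused


# Toy inputs standing in for the temporal features of the three modalities.
toy_features = {
    "eye": [1.0, 0.0],
    "mouth": [0.0, 1.0],
    "head_flow": [0.0, 0.0],
}
toy_weights, toy_fused = attention_fuse(toy_features, query=[1.0, 0.0])
```

With these toy inputs the eye modality receives the largest weight because it aligns with the query vector, while the other two modalities share the remainder equally. A classifier would then consume the fused vector; that stage is omitted here.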