Visual–auditory learning network for construction equipment action detection

Action detection of construction equipment is critical for tracking project performance, facilitating construction automation, and fostering construction efficiency in terms of construction site monitoring. Particularly, the auditory signal can provide additional information on computer vision‐based...

Detailed description

Bibliographic details
Published in: Computer-aided civil and infrastructure engineering 2023-09, Vol.38 (14), p.1916-1934
Main authors: Jung, Seunghoon; Jeoung, Jaewon; Lee, Dong‐Eun; Jang, Hyounseung; Hong, Taehoon
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 1934
container_issue 14
container_start_page 1916
container_title Computer-aided civil and infrastructure engineering
container_volume 38
creator Jung, Seunghoon
Jeoung, Jaewon
Lee, Dong‐Eun
Jang, Hyounseung
Hong, Taehoon
description Action detection of construction equipment is critical for tracking project performance, facilitating construction automation, and fostering construction efficiency in terms of construction site monitoring. Particularly, the auditory signal can provide additional information on computer vision‐based action detection of various types of construction equipment. Therefore, this study aims to develop a visual–auditory learning network model for the action detection of construction equipment based on two modalities (i.e., vision and audition). To this end, both visual and auditory features are extracted from the multi‐modal feature extractor. In addition, the multi‐head attention and detection module is designed to conduct the localization and classification tasks in separate heads in which different attention mechanisms for each task are applied. Particularly, the content‐based attention mechanism and the dot‐product attention mechanism are, respectively, adopted for spatial attention in the localization head and channel attention in the classification head. The evaluation results show that the precision and recall of the proposed model can reach 86.92% and 84.00% with the adoption of the multi‐head attention and detection module, which has proven to improve overall detection performance by utilizing different correlations of visual and auditory features for localization and classification, respectively.
doi_str_mv 10.1111/mice.12983
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2859622393</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2859622393</sourcerecordid><originalsourceid>FETCH-LOGICAL-c3013-63682f4dd73e839dca438a80c1dc085650a395c911b91695cdce192e99448f1c3</originalsourceid><addsrcrecordid>eNp9kMFKAzEURYMoWKsbv2DAnTA1mWQyyVJK1ULVjboNMclI6jRpkwylO__BP_RLTDuuvZt3eZz3LlwALhGcoKyblVVmgirO8BEYIUKbklHaHGcPOS45Zc0pOItxCbMIwSPw9GZjL7ufr2_Za5t82BWdkcFZ91E4k7Y-fBatD4XyLqbQq2S9K8ymt-uVcamQw0KbZA7uHJy0sovm4m-Owevd7GX6UC6e7-fT20WpMES4pJiyqiVaN9gwzLWSBDPJoEJaQVbTGkrMa8UReueIZqeVQbwynBPCWqTwGFwNf9fBb3oTk1j6PrgcKSpWc1pVmONMXQ-UCj7GYFqxDnYlw04gKPZ9iX1f4tBXhtEAb21ndv-Q4nE-nQ03v5DcbvQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2859622393</pqid></control><display><type>article</type><title>Visual–auditory learning network for construction equipment action detection</title><source>Wiley Online Library Journals Frontfile Complete</source><creator>Jung, Seunghoon ; Jeoung, Jaewon ; Lee, Dong‐Eun ; Jang, Hyounseung ; Hong, Taehoon</creator><creatorcontrib>Jung, Seunghoon ; Jeoung, Jaewon ; Lee, Dong‐Eun ; Jang, Hyounseung ; Hong, Taehoon</creatorcontrib><description>Action detection of construction equipment is critical for tracking project performance, facilitating construction automation, and fostering construction efficiency in terms of construction site monitoring. Particularly, the auditory signal can provide additional information on computer vision‐based action detection of various types of construction equipment. Therefore, this study aims to develop a visual–auditory learning network model for the action detection of construction equipment based on two modalities (i.e., vision and audition). To this end, both visual and auditory features are extracted from the multi‐modal feature extractor. 
In addition, the multi‐head attention and detection module is designed to conduct the localization and classification tasks in separate heads in which different attention mechanisms for each task are applied. Particularly, the content‐based attention mechanism and the dot‐product attention mechanism are, respectively, adopted for spatial attention in the localization head and channel attention in the classification head. The evaluation results show that the precision and recall of the proposed model can reach 86.92% and 84.00% with the adoption of the multi‐head attention and detection module, which has proven to improve overall detection performance by utilizing different correlations of visual and auditory features for localization and classification, respectively.</description><identifier>ISSN: 1093-9687</identifier><identifier>EISSN: 1467-8667</identifier><identifier>DOI: 10.1111/mice.12983</identifier><language>eng</language><publisher>Hoboken: Wiley Subscription Services, Inc</publisher><subject>Classification ; Computer vision ; Construction equipment ; Construction sites ; Learning ; Localization ; Modules ; Signal monitoring ; Tracking devices</subject><ispartof>Computer-aided civil and infrastructure engineering, 2023-09, Vol.38 (14), p.1916-1934</ispartof><rights>2023  .</rights><rights>2023 Computer‐Aided Civil and Infrastructure 
Engineering.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c3013-63682f4dd73e839dca438a80c1dc085650a395c911b91695cdce192e99448f1c3</citedby><cites>FETCH-LOGICAL-c3013-63682f4dd73e839dca438a80c1dc085650a395c911b91695cdce192e99448f1c3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1111%2Fmice.12983$$EPDF$$P50$$Gwiley$$H</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1111%2Fmice.12983$$EHTML$$P50$$Gwiley$$H</linktohtml><link.rule.ids>314,776,780,1411,27901,27902,45550,45551</link.rule.ids></links><search><creatorcontrib>Jung, Seunghoon</creatorcontrib><creatorcontrib>Jeoung, Jaewon</creatorcontrib><creatorcontrib>Lee, Dong‐Eun</creatorcontrib><creatorcontrib>Jang, Hyounseung</creatorcontrib><creatorcontrib>Hong, Taehoon</creatorcontrib><title>Visual–auditory learning network for construction equipment action detection</title><title>Computer-aided civil and infrastructure engineering</title><description>Action detection of construction equipment is critical for tracking project performance, facilitating construction automation, and fostering construction efficiency in terms of construction site monitoring. Particularly, the auditory signal can provide additional information on computer vision‐based action detection of various types of construction equipment. Therefore, this study aims to develop a visual–auditory learning network model for the action detection of construction equipment based on two modalities (i.e., vision and audition). To this end, both visual and auditory features are extracted from the multi‐modal feature extractor. 
In addition, the multi‐head attention and detection module is designed to conduct the localization and classification tasks in separate heads in which different attention mechanisms for each task are applied. Particularly, the content‐based attention mechanism and the dot‐product attention mechanism are, respectively, adopted for spatial attention in the localization head and channel attention in the classification head. The evaluation results show that the precision and recall of the proposed model can reach 86.92% and 84.00% with the adoption of the multi‐head attention and detection module, which has proven to improve overall detection performance by utilizing different correlations of visual and auditory features for localization and classification, respectively.</description><subject>Classification</subject><subject>Computer vision</subject><subject>Construction equipment</subject><subject>Construction sites</subject><subject>Learning</subject><subject>Localization</subject><subject>Modules</subject><subject>Signal monitoring</subject><subject>Tracking devices</subject><issn>1093-9687</issn><issn>1467-8667</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><recordid>eNp9kMFKAzEURYMoWKsbv2DAnTA1mWQyyVJK1ULVjboNMclI6jRpkwylO__BP_RLTDuuvZt3eZz3LlwALhGcoKyblVVmgirO8BEYIUKbklHaHGcPOS45Zc0pOItxCbMIwSPw9GZjL7ufr2_Za5t82BWdkcFZ91E4k7Y-fBatD4XyLqbQq2S9K8ymt-uVcamQw0KbZA7uHJy0sovm4m-Owevd7GX6UC6e7-fT20WpMES4pJiyqiVaN9gwzLWSBDPJoEJaQVbTGkrMa8UReueIZqeVQbwynBPCWqTwGFwNf9fBb3oTk1j6PrgcKSpWc1pVmONMXQ-UCj7GYFqxDnYlw04gKPZ9iX1f4tBXhtEAb21ndv-Q4nE-nQ03v5DcbvQ</recordid><startdate>20230901</startdate><enddate>20230901</enddate><creator>Jung, Seunghoon</creator><creator>Jeoung, Jaewon</creator><creator>Lee, Dong‐Eun</creator><creator>Jang, Hyounseung</creator><creator>Hong, Taehoon</creator><general>Wiley Subscription Services, 
Inc</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>FR3</scope><scope>JQ2</scope><scope>KR7</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20230901</creationdate><title>Visual–auditory learning network for construction equipment action detection</title><author>Jung, Seunghoon ; Jeoung, Jaewon ; Lee, Dong‐Eun ; Jang, Hyounseung ; Hong, Taehoon</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c3013-63682f4dd73e839dca438a80c1dc085650a395c911b91695cdce192e99448f1c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Classification</topic><topic>Computer vision</topic><topic>Construction equipment</topic><topic>Construction sites</topic><topic>Learning</topic><topic>Localization</topic><topic>Modules</topic><topic>Signal monitoring</topic><topic>Tracking devices</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Jung, Seunghoon</creatorcontrib><creatorcontrib>Jeoung, Jaewon</creatorcontrib><creatorcontrib>Lee, Dong‐Eun</creatorcontrib><creatorcontrib>Jang, Hyounseung</creatorcontrib><creatorcontrib>Hong, Taehoon</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Civil Engineering Abstracts</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Computer-aided civil and infrastructure engineering</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Jung, Seunghoon</au><au>Jeoung, Jaewon</au><au>Lee, Dong‐Eun</au><au>Jang, Hyounseung</au><au>Hong, Taehoon</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Visual–auditory learning network for construction equipment action detection</atitle><jtitle>Computer-aided civil and infrastructure engineering</jtitle><date>2023-09-01</date><risdate>2023</risdate><volume>38</volume><issue>14</issue><spage>1916</spage><epage>1934</epage><pages>1916-1934</pages><issn>1093-9687</issn><eissn>1467-8667</eissn><abstract>Action detection of construction equipment is critical for tracking project performance, facilitating construction automation, and fostering construction efficiency in terms of construction site monitoring. Particularly, the auditory signal can provide additional information on computer vision‐based action detection of various types of construction equipment. Therefore, this study aims to develop a visual–auditory learning network model for the action detection of construction equipment based on two modalities (i.e., vision and audition). To this end, both visual and auditory features are extracted from the multi‐modal feature extractor. In addition, the multi‐head attention and detection module is designed to conduct the localization and classification tasks in separate heads in which different attention mechanisms for each task are applied. Particularly, the content‐based attention mechanism and the dot‐product attention mechanism are, respectively, adopted for spatial attention in the localization head and channel attention in the classification head. 
The evaluation results show that the precision and recall of the proposed model can reach 86.92% and 84.00% with the adoption of the multi‐head attention and detection module, which has proven to improve overall detection performance by utilizing different correlations of visual and auditory features for localization and classification, respectively.</abstract><cop>Hoboken</cop><pub>Wiley Subscription Services, Inc</pub><doi>10.1111/mice.12983</doi><tpages>19</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1093-9687
ispartof Computer-aided civil and infrastructure engineering, 2023-09, Vol.38 (14), p.1916-1934
issn 1093-9687
1467-8667
language eng
recordid cdi_proquest_journals_2859622393
source Wiley Online Library Journals Frontfile Complete
subjects Classification
Computer vision
Construction equipment
Construction sites
Learning
Localization
Modules
Signal monitoring
Tracking devices
title Visual–auditory learning network for construction equipment action detection
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T05%3A43%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Visual%E2%80%93auditory%20learning%20network%20for%20construction%20equipment%20action%20detection&rft.jtitle=Computer-aided%20civil%20and%20infrastructure%20engineering&rft.au=Jung,%20Seunghoon&rft.date=2023-09-01&rft.volume=38&rft.issue=14&rft.spage=1916&rft.epage=1934&rft.pages=1916-1934&rft.issn=1093-9687&rft.eissn=1467-8667&rft_id=info:doi/10.1111/mice.12983&rft_dat=%3Cproquest_cross%3E2859622393%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2859622393&rft_id=info:pmid/&rfr_iscdi=true