Visual–auditory learning network for construction equipment action detection
Action detection of construction equipment is critical for tracking project performance, facilitating construction automation, and fostering construction efficiency in terms of construction site monitoring. Particularly, the auditory signal can provide additional information on computer vision‐based...
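Published in: | Computer-aided civil and infrastructure engineering 2023-09, Vol.38 (14), p.1916-1934 |
---|---|
Main authors: | Jung, Seunghoon ; Jeoung, Jaewon ; Lee, Dong‐Eun ; Jang, Hyounseung ; Hong, Taehoon |
Format: | Article |
Language: | eng |
Subjects: | Classification ; Computer vision ; Construction equipment ; Construction sites ; Learning ; Localization ; Modules ; Signal monitoring ; Tracking devices |
Online access: | Full text |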
container_end_page | 1934 |
---|---|
container_issue | 14 |
container_start_page | 1916 |
container_title | Computer-aided civil and infrastructure engineering |
container_volume | 38 |
creator | Jung, Seunghoon ; Jeoung, Jaewon ; Lee, Dong‐Eun ; Jang, Hyounseung ; Hong, Taehoon |
description | Action detection of construction equipment is critical for tracking project performance, facilitating construction automation, and fostering construction efficiency in terms of construction site monitoring. Particularly, the auditory signal can provide additional information on computer vision‐based action detection of various types of construction equipment. Therefore, this study aims to develop a visual–auditory learning network model for the action detection of construction equipment based on two modalities (i.e., vision and audition). To this end, both visual and auditory features are extracted from the multi‐modal feature extractor. In addition, the multi‐head attention and detection module is designed to conduct the localization and classification tasks in separate heads in which different attention mechanisms for each task are applied. Particularly, the content‐based attention mechanism and the dot‐product attention mechanism are, respectively, adopted for spatial attention in the localization head and channel attention in the classification head. The evaluation results show that the precision and recall of the proposed model can reach 86.92% and 84.00% with the adoption of the multi‐head attention and detection module, which has proven to improve overall detection performance by utilizing different correlations of visual and auditory features for localization and classification, respectively. |
doi_str_mv | 10.1111/mice.12983 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2859622393</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2859622393</sourcerecordid><originalsourceid>FETCH-LOGICAL-c3013-63682f4dd73e839dca438a80c1dc085650a395c911b91695cdce192e99448f1c3</originalsourceid><addsrcrecordid>eNp9kMFKAzEURYMoWKsbv2DAnTA1mWQyyVJK1ULVjboNMclI6jRpkwylO__BP_RLTDuuvZt3eZz3LlwALhGcoKyblVVmgirO8BEYIUKbklHaHGcPOS45Zc0pOItxCbMIwSPw9GZjL7ufr2_Za5t82BWdkcFZ91E4k7Y-fBatD4XyLqbQq2S9K8ymt-uVcamQw0KbZA7uHJy0sovm4m-Owevd7GX6UC6e7-fT20WpMES4pJiyqiVaN9gwzLWSBDPJoEJaQVbTGkrMa8UReueIZqeVQbwynBPCWqTwGFwNf9fBb3oTk1j6PrgcKSpWc1pVmONMXQ-UCj7GYFqxDnYlw04gKPZ9iX1f4tBXhtEAb21ndv-Q4nE-nQ03v5DcbvQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2859622393</pqid></control><display><type>article</type><title>Visual–auditory learning network for construction equipment action detection</title><source>Wiley Online Library Journals Frontfile Complete</source><creator>Jung, Seunghoon ; Jeoung, Jaewon ; Lee, Dong‐Eun ; Jang, Hyounseung ; Hong, Taehoon</creator><creatorcontrib>Jung, Seunghoon ; Jeoung, Jaewon ; Lee, Dong‐Eun ; Jang, Hyounseung ; Hong, Taehoon</creatorcontrib><description>Action detection of construction equipment is critical for tracking project performance, facilitating construction automation, and fostering construction efficiency in terms of construction site monitoring. Particularly, the auditory signal can provide additional information on computer vision‐based action detection of various types of construction equipment. Therefore, this study aims to develop a visual–auditory learning network model for the action detection of construction equipment based on two modalities (i.e., vision and audition). To this end, both visual and auditory features are extracted from the multi‐modal feature extractor. 
In addition, the multi‐head attention and detection module is designed to conduct the localization and classification tasks in separate heads in which different attention mechanisms for each task are applied. Particularly, the content‐based attention mechanism and the dot‐product attention mechanism are, respectively, adopted for spatial attention in the localization head and channel attention in the classification head. The evaluation results show that the precision and recall of the proposed model can reach 86.92% and 84.00% with the adoption of the multi‐head attention and detection module, which has proven to improve overall detection performance by utilizing different correlations of visual and auditory features for localization and classification, respectively.</description><identifier>ISSN: 1093-9687</identifier><identifier>EISSN: 1467-8667</identifier><identifier>DOI: 10.1111/mice.12983</identifier><language>eng</language><publisher>Hoboken: Wiley Subscription Services, Inc</publisher><subject>Classification ; Computer vision ; Construction equipment ; Construction sites ; Learning ; Localization ; Modules ; Signal monitoring ; Tracking devices</subject><ispartof>Computer-aided civil and infrastructure engineering, 2023-09, Vol.38 (14), p.1916-1934</ispartof><rights>2023 .</rights><rights>2023 Computer‐Aided Civil and Infrastructure 
Engineering.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c3013-63682f4dd73e839dca438a80c1dc085650a395c911b91695cdce192e99448f1c3</citedby><cites>FETCH-LOGICAL-c3013-63682f4dd73e839dca438a80c1dc085650a395c911b91695cdce192e99448f1c3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1111%2Fmice.12983$$EPDF$$P50$$Gwiley$$H</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1111%2Fmice.12983$$EHTML$$P50$$Gwiley$$H</linktohtml><link.rule.ids>314,776,780,1411,27901,27902,45550,45551</link.rule.ids></links><search><creatorcontrib>Jung, Seunghoon</creatorcontrib><creatorcontrib>Jeoung, Jaewon</creatorcontrib><creatorcontrib>Lee, Dong‐Eun</creatorcontrib><creatorcontrib>Jang, Hyounseung</creatorcontrib><creatorcontrib>Hong, Taehoon</creatorcontrib><title>Visual–auditory learning network for construction equipment action detection</title><title>Computer-aided civil and infrastructure engineering</title><description>Action detection of construction equipment is critical for tracking project performance, facilitating construction automation, and fostering construction efficiency in terms of construction site monitoring. Particularly, the auditory signal can provide additional information on computer vision‐based action detection of various types of construction equipment. Therefore, this study aims to develop a visual–auditory learning network model for the action detection of construction equipment based on two modalities (i.e., vision and audition). To this end, both visual and auditory features are extracted from the multi‐modal feature extractor. 
In addition, the multi‐head attention and detection module is designed to conduct the localization and classification tasks in separate heads in which different attention mechanisms for each task are applied. Particularly, the content‐based attention mechanism and the dot‐product attention mechanism are, respectively, adopted for spatial attention in the localization head and channel attention in the classification head. The evaluation results show that the precision and recall of the proposed model can reach 86.92% and 84.00% with the adoption of the multi‐head attention and detection module, which has proven to improve overall detection performance by utilizing different correlations of visual and auditory features for localization and classification, respectively.</description><subject>Classification</subject><subject>Computer vision</subject><subject>Construction equipment</subject><subject>Construction sites</subject><subject>Learning</subject><subject>Localization</subject><subject>Modules</subject><subject>Signal monitoring</subject><subject>Tracking devices</subject><issn>1093-9687</issn><issn>1467-8667</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><recordid>eNp9kMFKAzEURYMoWKsbv2DAnTA1mWQyyVJK1ULVjboNMclI6jRpkwylO__BP_RLTDuuvZt3eZz3LlwALhGcoKyblVVmgirO8BEYIUKbklHaHGcPOS45Zc0pOItxCbMIwSPw9GZjL7ufr2_Za5t82BWdkcFZ91E4k7Y-fBatD4XyLqbQq2S9K8ymt-uVcamQw0KbZA7uHJy0sovm4m-Owevd7GX6UC6e7-fT20WpMES4pJiyqiVaN9gwzLWSBDPJoEJaQVbTGkrMa8UReueIZqeVQbwynBPCWqTwGFwNf9fBb3oTk1j6PrgcKSpWc1pVmONMXQ-UCj7GYFqxDnYlw04gKPZ9iX1f4tBXhtEAb21ndv-Q4nE-nQ03v5DcbvQ</recordid><startdate>20230901</startdate><enddate>20230901</enddate><creator>Jung, Seunghoon</creator><creator>Jeoung, Jaewon</creator><creator>Lee, Dong‐Eun</creator><creator>Jang, Hyounseung</creator><creator>Hong, Taehoon</creator><general>Wiley Subscription Services, 
Inc</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>FR3</scope><scope>JQ2</scope><scope>KR7</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20230901</creationdate><title>Visual–auditory learning network for construction equipment action detection</title><author>Jung, Seunghoon ; Jeoung, Jaewon ; Lee, Dong‐Eun ; Jang, Hyounseung ; Hong, Taehoon</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c3013-63682f4dd73e839dca438a80c1dc085650a395c911b91695cdce192e99448f1c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Classification</topic><topic>Computer vision</topic><topic>Construction equipment</topic><topic>Construction sites</topic><topic>Learning</topic><topic>Localization</topic><topic>Modules</topic><topic>Signal monitoring</topic><topic>Tracking devices</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Jung, Seunghoon</creatorcontrib><creatorcontrib>Jeoung, Jaewon</creatorcontrib><creatorcontrib>Lee, Dong‐Eun</creatorcontrib><creatorcontrib>Jang, Hyounseung</creatorcontrib><creatorcontrib>Hong, Taehoon</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Civil Engineering Abstracts</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Computer-aided civil and infrastructure engineering</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Jung, Seunghoon</au><au>Jeoung, Jaewon</au><au>Lee, Dong‐Eun</au><au>Jang, Hyounseung</au><au>Hong, Taehoon</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Visual–auditory learning network for construction equipment action detection</atitle><jtitle>Computer-aided civil and infrastructure engineering</jtitle><date>2023-09-01</date><risdate>2023</risdate><volume>38</volume><issue>14</issue><spage>1916</spage><epage>1934</epage><pages>1916-1934</pages><issn>1093-9687</issn><eissn>1467-8667</eissn><abstract>Action detection of construction equipment is critical for tracking project performance, facilitating construction automation, and fostering construction efficiency in terms of construction site monitoring. Particularly, the auditory signal can provide additional information on computer vision‐based action detection of various types of construction equipment. Therefore, this study aims to develop a visual–auditory learning network model for the action detection of construction equipment based on two modalities (i.e., vision and audition). To this end, both visual and auditory features are extracted from the multi‐modal feature extractor. In addition, the multi‐head attention and detection module is designed to conduct the localization and classification tasks in separate heads in which different attention mechanisms for each task are applied. Particularly, the content‐based attention mechanism and the dot‐product attention mechanism are, respectively, adopted for spatial attention in the localization head and channel attention in the classification head. 
The evaluation results show that the precision and recall of the proposed model can reach 86.92% and 84.00% with the adoption of the multi‐head attention and detection module, which has proven to improve overall detection performance by utilizing different correlations of visual and auditory features for localization and classification, respectively.</abstract><cop>Hoboken</cop><pub>Wiley Subscription Services, Inc</pub><doi>10.1111/mice.12983</doi><tpages>19</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1093-9687 |
ispartof | Computer-aided civil and infrastructure engineering, 2023-09, Vol.38 (14), p.1916-1934 |
issn | 1093-9687 ; 1467-8667 |
language | eng |
recordid | cdi_proquest_journals_2859622393 |
source | Wiley Online Library Journals Frontfile Complete |
subjects | Classification ; Computer vision ; Construction equipment ; Construction sites ; Learning ; Localization ; Modules ; Signal monitoring ; Tracking devices |
title | Visual–auditory learning network for construction equipment action detection |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T05%3A43%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Visual%E2%80%93auditory%20learning%20network%20for%20construction%20equipment%20action%20detection&rft.jtitle=Computer-aided%20civil%20and%20infrastructure%20engineering&rft.au=Jung,%20Seunghoon&rft.date=2023-09-01&rft.volume=38&rft.issue=14&rft.spage=1916&rft.epage=1934&rft.pages=1916-1934&rft.issn=1093-9687&rft.eissn=1467-8667&rft_id=info:doi/10.1111/mice.12983&rft_dat=%3Cproquest_cross%3E2859622393%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2859622393&rft_id=info:pmid/&rfr_iscdi=true |