Multi-window Transformer parallel fusion feature pyramid network for pedestrian orientation detection
In complex traffic scenes, the orientation and location of pedestrians are important criteria for judging their intentions. We note that pedestrians are characterized by variability in appearance and small differences among orientations (especially adjacent orientations), thus causing general object...
Saved in:
Published in: | Multimedia systems 2023-04, Vol.29 (2), p.587-603 |
---|---|
Main authors: | Li, Xiao ; Ma, Shexiang ; Shan, Liqing |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 603 |
---|---|
container_issue | 2 |
container_start_page | 587 |
container_title | Multimedia systems |
container_volume | 29 |
creator | Li, Xiao ; Ma, Shexiang ; Shan, Liqing |
description | In complex traffic scenes, the orientation and location of pedestrians are important criteria for judging their intentions. We note that pedestrians are characterized by variability in appearance and small differences among orientations (especially adjacent orientations), which causes general object detection algorithms to perform poorly at feature extraction. Thus, extracting more discriminative features is an effective way to solve this problem. To this end, we propose a novel framework to enhance feature extraction for pedestrian orientation detection (orientation classification and location regression). The framework consists of two modules: the multi-window Transformer parallel fusion feature pyramid (MTPF) and the gated graph (GG). The MTPF module performs multi-layer feature fusion, which improves the feature representation of the prediction map by extracting high-level semantic information from deep layers and recovering missing contextual information from shallow layers. Specifically, this is achieved by setting a sliding window on multiple prediction maps, which are then fused by the Transformer. In the GG module, the region proposal is abstracted into a graph with six nodes, where each node represents a body part. We use GG to learn the spatial dependencies among body parts and to learn features by aggregating information from neighbors. Finally, pedestrian orientation classification and location regression are performed on a graph containing rich relationships among nodes. According to our survey, there are currently no methods or datasets that can be directly used for pedestrian orientation detection, so we manually annotate pedestrian orientations on three public datasets containing a large number of pedestrian samples and compare the proposed method with current state-of-the-art object detection methods; the results demonstrate the effectiveness of the proposed method. |
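The gated-graph (GG) aggregation over six body-part nodes described in the abstract can be sketched as follows. This is a minimal, illustrative sketch, not the authors' implementation: the body-part names, the torso-centered adjacency, and the per-dimension sigmoid gate are all assumptions made for the example.

```python
import numpy as np

# Six body-part nodes, as in the GG module (names are assumptions).
PARTS = ["head", "torso", "left_arm", "right_arm", "left_leg", "right_leg"]

# Assumed adjacency: head and limbs attach to the torso (undirected edges).
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4), (1, 5)]

def gated_graph_step(h, w_self, w_neigh, w_gate):
    """One round of gated neighbor aggregation.

    h       : (6, d) node features, one row per body part
    w_self  : (d, d) transform of a node's own feature
    w_neigh : (d, d) transform of the aggregated neighbor message
    w_gate  : (d, d) transform producing a per-dimension sigmoid gate
    """
    n, _ = h.shape
    adj = np.zeros((n, n))
    for i, j in EDGES:
        adj[i, j] = adj[j, i] = 1.0
    deg = adj.sum(axis=1, keepdims=True)        # node degrees
    msg = (adj @ h) / np.maximum(deg, 1.0)      # mean message from neighbors
    gate = 1.0 / (1.0 + np.exp(-(h @ w_gate)))  # sigmoid gate in (0, 1)
    # Gate blends each node's own feature with its neighbor message.
    return gate * (h @ w_self) + (1.0 - gate) * (msg @ w_neigh)

rng = np.random.default_rng(0)
d = 8
h = rng.standard_normal((6, d))
h_new = gated_graph_step(h, np.eye(d), np.eye(d), np.eye(d))
print(h_new.shape)  # (6, 8)
```

Stacking a few such rounds lets every limb node incorporate information from the other limbs via the torso, which is one plausible reading of "learning spatial dependencies among body parts by aggregating information from neighbors."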
doi_str_mv | 10.1007/s00530-022-00993-9 |
format | Article |
fullrecord | (raw ProQuest/Primo XML record omitted; it duplicates the title, authors, abstract, identifiers, and subjects listed in this record) |
publisher | Berlin/Heidelberg: Springer Berlin Heidelberg |
fulltext | fulltext |
identifier | ISSN: 0942-4962 |
ispartof | Multimedia systems, 2023-04, Vol.29 (2), p.587-603 |
issn | 0942-4962 ; 1432-1882 |
language | eng |
recordid | cdi_proquest_journals_2780437834 |
source | Springer Nature - Complete Springer Journals |
subjects | Algorithms ; Body parts ; Classification ; Computer Communication Networks ; Computer Graphics ; Computer Science ; Cryptology ; Data Storage Representation ; Datasets ; Feature extraction ; Modules ; Multilayers ; Multimedia Information Systems ; Nodes ; Object recognition ; Operating Systems ; Orientation ; Pedestrians ; Regular Paper ; Spatial dependencies ; Transformers |
title | Multi-window Transformer parallel fusion feature pyramid network for pedestrian orientation detection |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-14T09%3A18%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Multi-window%20Transformer%20parallel%20fusion%20feature%20pyramid%20network%20for%20pedestrian%20orientation%20detection&rft.jtitle=Multimedia%20systems&rft.au=Li,%20Xiao&rft.date=2023-04-01&rft.volume=29&rft.issue=2&rft.spage=587&rft.epage=603&rft.pages=587-603&rft.issn=0942-4962&rft.eissn=1432-1882&rft_id=info:doi/10.1007/s00530-022-00993-9&rft_dat=%3Cproquest_cross%3E2780437834%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2780437834&rft_id=info:pmid/&rfr_iscdi=true |