Multi-window Transformer parallel fusion feature pyramid network for pedestrian orientation detection

In complex traffic scenes, the orientation and location of pedestrians are important criteria for judging their intentions. We note that pedestrians are characterized by variability in appearance and small differences among orientations (especially adjacent orientations), which causes general object detection algorithms to perform poorly in extracting features. Extracting more discriminative features is therefore an effective way to solve this problem. To this end, we propose a novel framework to enhance feature extraction for pedestrian orientation detection (orientation classification and location regression). The framework consists of two modules: the multi-window Transformer parallel fusion feature pyramid (MTPF) and the gated graph (GG). The MTPF module performs multi-layer feature fusion, improving the feature representation of the prediction map by extracting high-level semantic information from deep layers and recovering missing contextual information from shallow layers. Specifically, a sliding window is set on multiple prediction maps, which are then fused by the Transformer. In the GG module, the region proposal is abstracted into a graph with six nodes, where each node represents a body part. We utilize GG to learn the spatial dependencies among body parts and to learn features by aggregating information from neighbors. Finally, pedestrian orientation classification and location regression are performed on a graph containing rich relationships among nodes. According to our survey, there are currently no methods or datasets that can be directly used for pedestrian orientation detection, so we manually annotate pedestrian orientations on three public datasets containing a large number of pedestrian samples and compare the proposed method with current state-of-the-art object detection methods; the results demonstrate its effectiveness.
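The record gives only an abstract, so the exact MTPF architecture is not reproduced here. As a loose illustration of the idea named in the title — a window taken at the same location from several pyramid prediction maps and fused by Transformer-style attention — here is a minimal NumPy sketch; all function names, shapes, and the single-head attention form are our own assumptions, not the paper's design:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_windows(windows):
    """Fuse same-location windows taken from several pyramid levels.

    windows : list of (k, k, d) arrays, one per pyramid level, all
              already resampled to a common k x k window size.
    Every pixel of every level becomes a token; plain scaled
    dot-product self-attention mixes information across levels.
    Returns the fused window for the finest level (level 0).
    """
    k, _, d = windows[0].shape
    tokens = np.concatenate([w.reshape(-1, d) for w in windows], axis=0)
    attn = softmax(tokens @ tokens.T / np.sqrt(d))  # (n_tokens, n_tokens)
    fused = attn @ tokens                           # attention-weighted mix
    return fused[: k * k].reshape(k, k, d)          # tokens of level 0

rng = np.random.default_rng(1)
levels = [rng.standard_normal((4, 4, 16)) for _ in range(3)]  # 3 pyramid levels
out = fuse_windows(levels)
print(out.shape)  # (4, 4, 16)
```

In the paper the window slides over the maps; the sketch shows a single window position only, and omits the learned projections a real Transformer layer would have.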

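The paper also uses a gated graph (GG) module that abstracts a region proposal into six body-part nodes and aggregates information from spatial neighbors. A minimal sketch of one round of gated neighbor aggregation follows; the graph topology, the gating form, and all names below are illustrative assumptions, not the authors' actual design:

```python
import numpy as np

# Hypothetical 6-node body-part graph for one region proposal:
# 0 head, 1 torso, 2 left arm, 3 right arm, 4 left leg, 5 right leg.
# Edges encode plausible spatial dependencies (illustrative only).
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4), (1, 5), (4, 5)]

def adjacency(n=6):
    """Symmetric adjacency matrix with self-loops."""
    A = np.eye(n)
    for i, j in EDGES:
        A[i, j] = A[j, i] = 1.0
    return A

def gated_aggregate(X, W_msg, W_gate):
    """One round of gated neighbor aggregation.

    X      : (6, d) node features, one row per body part.
    W_msg  : (d, d) message transform.
    W_gate : (d, d) gate transform.
    Each node averages its neighbors' transformed features, and a
    sigmoid gate decides how much of that aggregate to blend into
    its own state.
    """
    A = adjacency(X.shape[0])
    A = A / A.sum(axis=1, keepdims=True)          # row-normalize
    msg = A @ (X @ W_msg)                         # mean of neighbor messages
    gate = 1.0 / (1.0 + np.exp(-(X @ W_gate)))    # sigmoid gate in (0, 1)
    return gate * msg + (1.0 - gate) * X          # gated update

rng = np.random.default_rng(0)
d = 8
X = rng.standard_normal((6, d))
H = gated_aggregate(X, rng.standard_normal((d, d)) * 0.1,
                    rng.standard_normal((d, d)) * 0.1)
print(H.shape)  # (6, 8)
```

After a few such rounds, each node's feature carries context from related body parts, which is the property the abstract says classification and regression are performed on.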
Bibliographic Details

Published in: Multimedia systems, 2023-04, Vol. 29 (2), p. 587-603
Authors: Li, Xiao; Ma, Shexiang; Shan, Liqing
Format: Article
Language: English
Online access: Full text
DOI: 10.1007/s00530-022-00993-9
ISSN: 0942-4962
EISSN: 1432-1882
Source: Springer Nature - Complete Springer Journals
Subjects: Algorithms
Body parts
Classification
Computer Communication Networks
Computer Graphics
Computer Science
Cryptology
Data Storage Representation
Datasets
Feature extraction
Modules
Multilayers
Multimedia Information Systems
Nodes
Object recognition
Operating Systems
Orientation
Pedestrians
Regular Paper
Spatial dependencies
Transformers