Multi-window Transformer parallel fusion feature pyramid network for pedestrian orientation detection

In complex traffic scenes, the orientation and location of pedestrians are important criteria for judging their intentions. We note that pedestrians are characterized by variability in appearance and small differences among orientations (especially adjacent orientations), which causes general object detection algorithms to perform poorly in extracting features. Extracting more discriminative features is therefore an effective way to solve this problem. To this end, we propose a novel framework to enhance feature extraction for pedestrian orientation detection (orientation classification and location regression). The framework consists of two modules: the multi-window Transformer parallel fusion feature pyramid (MTPF) and the gated graph (GG). The MTPF module performs multi-layer feature fusion, improving the feature representation of the prediction map by extracting high-level semantic information from deep layers and recovering missing contextual information from shallow layers. Specifically, a sliding window is set on multiple prediction maps, which are then fused by the Transformer. In the GG module, the region proposal is abstracted into a graph with six nodes, where each node represents a body part. We utilize GG to learn the spatial dependencies among body parts and to learn features by aggregating information from neighbors. Finally, pedestrian orientation classification and location regression are performed on a graph containing rich relationships among nodes. According to our survey, there are currently no methods or datasets that can be directly used for pedestrian orientation detection, so we manually annotate pedestrian orientations on three public datasets containing a large number of pedestrian samples and compare the proposed method with current state-of-the-art object detection methods; the results demonstrate its effectiveness.
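The record gives only an abstract, so the exact MTPF architecture is not reproduced here. As a loose illustration of the idea named in the title — a window taken at the same location from several pyramid prediction maps and fused by Transformer-style attention — here is a minimal NumPy sketch; all function names, shapes, and the single-head attention form are our own assumptions, not the paper's design:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_windows(windows):
    """Fuse same-location windows taken from several pyramid levels.

    windows : list of (k, k, d) arrays, one per pyramid level, all
              already resampled to a common k x k window size.
    Every pixel of every level becomes a token; plain scaled
    dot-product self-attention mixes information across levels.
    Returns the fused window for the finest level (level 0).
    """
    k, _, d = windows[0].shape
    tokens = np.concatenate([w.reshape(-1, d) for w in windows], axis=0)
    attn = softmax(tokens @ tokens.T / np.sqrt(d))  # (n_tokens, n_tokens)
    fused = attn @ tokens                           # attention-weighted mix
    return fused[: k * k].reshape(k, k, d)          # tokens of level 0

rng = np.random.default_rng(1)
levels = [rng.standard_normal((4, 4, 16)) for _ in range(3)]  # 3 pyramid levels
out = fuse_windows(levels)
print(out.shape)  # (4, 4, 16)
```

In the paper the window slides over the maps; the sketch shows a single window position only, and omits the learned projections a real Transformer layer would have.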

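The paper also uses a gated graph (GG) module that abstracts a region proposal into six body-part nodes and aggregates information from spatial neighbors. A minimal sketch of one round of gated neighbor aggregation follows; the graph topology, the gating form, and all names below are illustrative assumptions, not the authors' actual design:

```python
import numpy as np

# Hypothetical 6-node body-part graph for one region proposal:
# 0 head, 1 torso, 2 left arm, 3 right arm, 4 left leg, 5 right leg.
# Edges encode plausible spatial dependencies (illustrative only).
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4), (1, 5), (4, 5)]

def adjacency(n=6):
    """Symmetric adjacency matrix with self-loops."""
    A = np.eye(n)
    for i, j in EDGES:
        A[i, j] = A[j, i] = 1.0
    return A

def gated_aggregate(X, W_msg, W_gate):
    """One round of gated neighbor aggregation.

    X      : (6, d) node features, one row per body part.
    W_msg  : (d, d) message transform.
    W_gate : (d, d) gate transform.
    Each node averages its neighbors' transformed features, and a
    sigmoid gate decides how much of that aggregate to blend into
    its own state.
    """
    A = adjacency(X.shape[0])
    A = A / A.sum(axis=1, keepdims=True)          # row-normalize
    msg = A @ (X @ W_msg)                         # mean of neighbor messages
    gate = 1.0 / (1.0 + np.exp(-(X @ W_gate)))    # sigmoid gate in (0, 1)
    return gate * msg + (1.0 - gate) * X          # gated update

rng = np.random.default_rng(0)
d = 8
X = rng.standard_normal((6, d))
H = gated_aggregate(X, rng.standard_normal((d, d)) * 0.1,
                    rng.standard_normal((d, d)) * 0.1)
print(H.shape)  # (6, 8)
```

After a few such rounds, each node's feature carries context from related body parts, which is the property the abstract says classification and regression are performed on.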
Bibliographic Details

Published in: Multimedia systems, 2023-04, Vol. 29 (2), p. 587-603
Authors: Li, Xiao; Ma, Shexiang; Shan, Liqing
Format: Article
Language: English
Online access: Full text
DOI: 10.1007/s00530-022-00993-9
ISSN: 0942-4962
EISSN: 1432-1882
Source: Springer Nature - Complete Springer Journals
Subjects: Algorithms
Body parts
Classification
Computer Communication Networks
Computer Graphics
Computer Science
Cryptology
Data Storage Representation
Datasets
Feature extraction
Modules
Multilayers
Multimedia Information Systems
Nodes
Object recognition
Operating Systems
Orientation
Pedestrians
Regular Paper
Spatial dependencies
Transformers