Soft thresholding squeeze-and-excitation network for pose-invariant facial expression recognition
Pose-invariant facial expression recognition (FER) is a popular research direction in computer vision, but pose variation usually changes the facial appearance significantly, making recognition results unstable across viewpoints. In this paper, a novel deep learning method, the soft thresholding squeeze-and-excitation (ST-SE) block, was proposed to extract salient features of different channels for pose-invariant FER. To better adapt to facial images under different poses, a global average pooling (GAP) operation was adopted to compute the average value of each channel of the feature map. To enhance the representational power of the network, a Squeeze-and-Excitation (SE) block was embedded into the nonlinear transformation layer to filter out redundant feature information. To further shrink the significant features, the absolute values of the GAP and SE outputs were multiplied to calculate a threshold suited to the current view, and the developed ST-SE block was inserted into ResNet50 to evaluate recognition performance. Extensive experiments were carried out on four datasets with pose variations, i.e., BU-3DFE, Multi-PIE, Pose-RAF-DB and Pose-AffectNet, and the influence of different environments, poses and intensities on expression recognition was analyzed in detail. The experimental results demonstrate the feasibility and effectiveness of the method.
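The abstract describes the ST-SE computation concretely enough to outline in code: GAP over the absolute feature map yields one statistic per channel, an SE branch gates that statistic, their product becomes a channel-wise threshold τ, and the feature map is shrunk by soft thresholding, soft(x, τ) = sign(x) · max(|x| − τ, 0). Below is a minimal PyTorch sketch of that recipe; the SE reduction ratio of 16, the placement after ResNet50's last stage, and the 7-class output head are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50


class STSEBlock(nn.Module):
    """Sketch of a soft-thresholding squeeze-and-excitation (ST-SE) block.

    Per-channel thresholds come from the GAP of |x| gated by an SE branch;
    the feature map is then shrunk by soft thresholding.
    """

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # SE-style bottleneck; the reduction ratio is an assumed default.
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # per-channel scale in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c = x.shape[:2]
        gap = x.abs().mean(dim=(2, 3))        # GAP of |x|: one value per channel
        scale = self.fc(gap)                  # SE gating of the pooled statistic
        tau = (gap * scale).view(b, c, 1, 1)  # channel-wise thresholds |GAP| * |SE|
        # Soft thresholding: shrink magnitudes by tau, zeroing small activations
        return torch.sign(x) * torch.clamp(x.abs() - tau, min=0.0)


# Hypothetical evaluation setup: append the block after ResNet50's last stage
# (2048 channels in torchvision's resnet50); 7 expression classes is an assumption.
model = resnet50(num_classes=7)
model.layer4 = nn.Sequential(model.layer4, STSEBlock(channels=2048))
```

Because the sigmoid keeps the SE scale in (0, 1), each channel's threshold stays strictly below its mean absolute activation, so the strongest activations in every channel survive the shrinkage; this matches the abstract's stated goal of filtering redundant features while retaining salient ones.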
Saved in:
Published in: | The Visual computer 2023-07, Vol.39 (7), p.2637-2652 |
---|---|
Main authors: | Liu, Chaoji; Liu, Xingqiao; Chen, Chong; Wang, Qiankun |
Format: | Article |
Language: | eng |
Subjects: | Accuracy; Artificial Intelligence; Computer Graphics; Computer Science; Computer vision; Deep learning; Excitation; Face recognition; Feature maps; Image Processing and Computer Vision; Invariants; Neural networks; Original Article; Teaching methods |
Online access: | Full text |
container_end_page | 2652 |
---|---|
container_issue | 7 |
container_start_page | 2637 |
container_title | The Visual computer |
container_volume | 39 |
creator | Liu, Chaoji; Liu, Xingqiao; Chen, Chong; Wang, Qiankun |
description | Pose-invariant facial expression recognition (FER) is a popular research direction in computer vision, but pose variation usually changes the facial appearance significantly, making recognition results unstable across viewpoints. In this paper, a novel deep learning method, the soft thresholding squeeze-and-excitation (ST-SE) block, was proposed to extract salient features of different channels for pose-invariant FER. To better adapt to facial images under different poses, a global average pooling (GAP) operation was adopted to compute the average value of each channel of the feature map. To enhance the representational power of the network, a Squeeze-and-Excitation (SE) block was embedded into the nonlinear transformation layer to filter out redundant feature information. To further shrink the significant features, the absolute values of the GAP and SE outputs were multiplied to calculate a threshold suited to the current view, and the developed ST-SE block was inserted into ResNet50 to evaluate recognition performance. Extensive experiments were carried out on four datasets with pose variations, i.e., BU-3DFE, Multi-PIE, Pose-RAF-DB and Pose-AffectNet, and the influence of different environments, poses and intensities on expression recognition was analyzed in detail. The experimental results demonstrate the feasibility and effectiveness of the method. |
doi_str_mv | 10.1007/s00371-022-02483-5 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 0178-2789 |
ispartof | The Visual computer, 2023-07, Vol.39 (7), p.2637-2652 |
issn | 0178-2789; 1432-2315 |
language | eng |
recordid | cdi_proquest_journals_2918043634 |
source | Springer Nature - Complete Springer Journals; ProQuest Central UK/Ireland; ProQuest Central |
subjects | Accuracy; Artificial Intelligence; Computer Graphics; Computer Science; Computer vision; Deep learning; Excitation; Face recognition; Feature maps; Image Processing and Computer Vision; Invariants; Neural networks; Original Article; Teaching methods |
title | Soft thresholding squeeze-and-excitation network for pose-invariant facial expression recognition |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-25T04%3A41%3A55IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Soft%20thresholding%20squeeze-and-excitation%20network%20for%20pose-invariant%20facial%20expression%20recognition&rft.jtitle=The%20Visual%20computer&rft.au=Liu,%20Chaoji&rft.date=2023-07-01&rft.volume=39&rft.issue=7&rft.spage=2637&rft.epage=2652&rft.pages=2637-2652&rft.issn=0178-2789&rft.eissn=1432-2315&rft_id=info:doi/10.1007/s00371-022-02483-5&rft_dat=%3Cproquest_cross%3E2918043634%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2918043634&rft_id=info:pmid/&rfr_iscdi=true |