Leveraging Frame- and Feature-Level Progressive Augmentation for Semi-supervised Action Recognition

Semi-supervised action recognition is a challenging yet prospective task due to its low reliance on costly labeled videos. One high-profile solution is to explore frame-level weak/strong augmentations for learning abundant representations, inspired by the FixMatch framework dominating the semi-supervised image classification task. However, such a solution mainly brings perturbations in terms of texture and scale, leading to the limitation in learning action representations in videos with spatiotemporal redundancy and complexity. Therefore, we revisit the creative trick of weak/strong augmentations in FixMatch, and then propose a novel Frame- and Feature-level augmentation FixMatch (dubbed as F2-FixMatch) framework to learn more abundant action representations for being robust to complex and dynamic video scenarios. Specifically, we design a new Progressive Augmentation (P-Aug) mechanism that implements the weak/strong augmentations first at the frame level, and further implements the perturbation at the feature level, to obtain abundant four types of augmented features in broader perturbation spaces. Moreover, we present an evolved Multihead Pseudo-Labeling (MPL) scheme to promote the consistency of features across different augmented versions based on the pseudo labels. We conduct extensive experiments on several public datasets to demonstrate that our F2-FixMatch achieves the performance gain compared with current state-of-the-art methods. The source codes of F2-FixMatch are publicly available at https://github.com/zwtu/F2FixMatch.

Full description

Bibliographic Details
Published in: ACM transactions on multimedia computing communications and applications, 2024-04
Main authors: Tu, Zhewei, Shu, Xiangbo, Huang, Peng, Yan, Rui, Liu, Zhenxing, Zhang, Jiachao
Format: Article
Language: English
Subjects:
Online access: Full text
creator Tu, Zhewei; Shu, Xiangbo; Huang, Peng; Yan, Rui; Liu, Zhenxing; Zhang, Jiachao
description Semi-supervised action recognition is a challenging yet prospective task due to its low reliance on costly labeled videos. One high-profile solution is to explore frame-level weak/strong augmentations for learning abundant representations, inspired by the FixMatch framework dominating the semi-supervised image classification task. However, such a solution mainly brings perturbations in terms of texture and scale, leading to the limitation in learning action representations in videos with spatiotemporal redundancy and complexity. Therefore, we revisit the creative trick of weak/strong augmentations in FixMatch, and then propose a novel Frame- and Feature-level augmentation FixMatch (dubbed as F2-FixMatch) framework to learn more abundant action representations for being robust to complex and dynamic video scenarios. Specifically, we design a new Progressive Augmentation (P-Aug) mechanism that implements the weak/strong augmentations first at the frame level, and further implements the perturbation at the feature level, to obtain abundant four types of augmented features in broader perturbation spaces. Moreover, we present an evolved Multihead Pseudo-Labeling (MPL) scheme to promote the consistency of features across different augmented versions based on the pseudo labels. We conduct extensive experiments on several public datasets to demonstrate that our F2-FixMatch achieves the performance gain compared with current state-of-the-art methods. The source codes of F2-FixMatch are publicly available at https://github.com/zwtu/F2FixMatch.
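The two mechanisms named in the abstract can be sketched in code. The sketch below is a minimal NumPy illustration of the ideas only, not the authors' implementation (see their repository for that): P-Aug produces four augmented feature versions by crossing {weak, strong} frame-level augmentation with {clean, perturbed} feature-level variants, and an MPL-style step turns confident predictions on the weak view into pseudo labels that the other versions are trained to match. All function names, the stand-in encoder, and the specific augmentations are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def weak_aug(frames):
    # Frame-level weak augmentation: horizontal flip (illustrative stand-in).
    return frames[..., ::-1]

def strong_aug(frames):
    # Frame-level strong augmentation: flip plus additive noise (illustrative).
    return frames[..., ::-1] + rng.normal(0.0, 0.5, frames.shape)

def encode(frames):
    # Stand-in encoder: mean-pool a (T, C, H, W) clip to a C-dim feature.
    return frames.mean(axis=(0, 2, 3))

def feature_perturb(feat):
    # Feature-level perturbation: random channel dropout (illustrative).
    return feat * (rng.random(feat.shape) > 0.2)

def progressive_augment(frames):
    """P-Aug sketch: four augmented feature versions,
    {weak, strong} frame-level x {clean, perturbed} feature-level."""
    feats = {}
    for name, aug in (("weak", weak_aug), ("strong", strong_aug)):
        f = encode(aug(frames))
        feats[name] = f
        feats[name + "+featpert"] = feature_perturb(f)
    return feats

def multihead_pseudo_label(feats, heads, threshold=0.95):
    """MPL sketch: each head classifies the weak view; a confident weak
    prediction becomes the pseudo label, and the remaining augmented
    versions incur a cross-entropy loss toward that label."""
    losses = []
    for W in heads:  # each head is a (C, num_classes) weight matrix
        logits = feats["weak"] @ W
        probs = np.exp(logits) / np.exp(logits).sum()
        if probs.max() >= threshold:
            pseudo = probs.argmax()
            for key in ("strong", "weak+featpert", "strong+featpert"):
                p = np.exp(feats[key] @ W)
                p /= p.sum()
                losses.append(-np.log(p[pseudo] + 1e-12))
    return float(np.mean(losses)) if losses else 0.0
```

For example, an 8-frame clip `rng.random((8, 3, 16, 16))` yields exactly four feature versions from `progressive_augment`, and `multihead_pseudo_label` returns a non-negative scalar loss (zero when no head is confident enough).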
doi_str_mv 10.1145/3655025
format Article
publisher New York, NY: ACM
startdate 2024-04-11
rights Copyright held by the owner/author(s). Publication rights licensed to ACM.
lds50 peer_reviewed
oa free_for_read
fulltext fulltext
identifier ISSN: 1551-6857
ispartof ACM transactions on multimedia computing communications and applications, 2024-04
issn 1551-6857
1551-6865
language eng
recordid cdi_crossref_primary_10_1145_3655025
source ACM Digital Library Complete
subjects Do Not Use This Code, Generate the Correct Terms for Your Paper
title Leveraging Frame- and Feature-Level Progressive Augmentation for Semi-supervised Action Recognition