Leveraging Frame- and Feature-Level Progressive Augmentation for Semi-supervised Action Recognition

Semi-supervised action recognition is a challenging yet prospective task due to its low reliance on costly labeled videos. One high-profile solution is to explore frame-level weak/strong augmentations for learning abundant representations, inspired by the FixMatch framework dominating the semi-supervised image classification task. However, such a solution mainly brings perturbations in terms of texture and scale, leading to the limitation in learning action representations in videos with spatiotemporal redundancy and complexity. Therefore, we revisit the creative trick of weak/strong augmentations in FixMatch, and then propose a novel Frame- and Feature-level augmentation FixMatch (dubbed as F2-FixMatch) framework to learn more abundant action representations for being robust to complex and dynamic video scenarios. Specifically, we design a new Progressive Augmentation (P-Aug) mechanism that implements the weak/strong augmentations first at the frame level, and further implements the perturbation at the feature level, to obtain abundant four types of augmented features in broader perturbation spaces. Moreover, we present an evolved Multihead Pseudo-Labeling (MPL) scheme to promote the consistency of features across different augmented versions based on the pseudo labels. We conduct extensive experiments on several public datasets to demonstrate that our F2-FixMatch achieves the performance gain compared with current state-of-the-art methods. The source codes of F2-FixMatch are publicly available at https://github.com/zwtu/F2FixMatch.

Full description

Bibliographic Details
Published in: ACM transactions on multimedia computing communications and applications, 2024-04
Main authors: Tu, Zhewei, Shu, Xiangbo, Huang, Peng, Yan, Rui, Liu, Zhenxing, Zhang, Jiachao
Format: Article
Language: English
Subjects:
Online access: Full text
creator Tu, Zhewei; Shu, Xiangbo; Huang, Peng; Yan, Rui; Liu, Zhenxing; Zhang, Jiachao
description Semi-supervised action recognition is a challenging yet prospective task due to its low reliance on costly labeled videos. One high-profile solution is to explore frame-level weak/strong augmentations for learning abundant representations, inspired by the FixMatch framework dominating the semi-supervised image classification task. However, such a solution mainly brings perturbations in terms of texture and scale, leading to the limitation in learning action representations in videos with spatiotemporal redundancy and complexity. Therefore, we revisit the creative trick of weak/strong augmentations in FixMatch, and then propose a novel Frame- and Feature-level augmentation FixMatch (dubbed as F2-FixMatch) framework to learn more abundant action representations for being robust to complex and dynamic video scenarios. Specifically, we design a new Progressive Augmentation (P-Aug) mechanism that implements the weak/strong augmentations first at the frame level, and further implements the perturbation at the feature level, to obtain abundant four types of augmented features in broader perturbation spaces. Moreover, we present an evolved Multihead Pseudo-Labeling (MPL) scheme to promote the consistency of features across different augmented versions based on the pseudo labels. We conduct extensive experiments on several public datasets to demonstrate that our F2-FixMatch achieves the performance gain compared with current state-of-the-art methods. The source codes of F2-FixMatch are publicly available at https://github.com/zwtu/F2FixMatch.
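The two mechanisms named in the abstract can be sketched in code. The sketch below is a minimal NumPy illustration of the ideas only, not the authors' implementation (see their repository for that): P-Aug produces four augmented feature versions by crossing {weak, strong} frame-level augmentation with {clean, perturbed} feature-level variants, and an MPL-style step turns confident predictions on the weak view into pseudo labels that the other versions are trained to match. All function names, the stand-in encoder, and the specific augmentations are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def weak_aug(frames):
    # Frame-level weak augmentation: horizontal flip (illustrative stand-in).
    return frames[..., ::-1]

def strong_aug(frames):
    # Frame-level strong augmentation: flip plus additive noise (illustrative).
    return frames[..., ::-1] + rng.normal(0.0, 0.5, frames.shape)

def encode(frames):
    # Stand-in encoder: mean-pool a (T, C, H, W) clip to a C-dim feature.
    return frames.mean(axis=(0, 2, 3))

def feature_perturb(feat):
    # Feature-level perturbation: random channel dropout (illustrative).
    return feat * (rng.random(feat.shape) > 0.2)

def progressive_augment(frames):
    """P-Aug sketch: four augmented feature versions,
    {weak, strong} frame-level x {clean, perturbed} feature-level."""
    feats = {}
    for name, aug in (("weak", weak_aug), ("strong", strong_aug)):
        f = encode(aug(frames))
        feats[name] = f
        feats[name + "+featpert"] = feature_perturb(f)
    return feats

def multihead_pseudo_label(feats, heads, threshold=0.95):
    """MPL sketch: each head classifies the weak view; a confident weak
    prediction becomes the pseudo label, and the remaining augmented
    versions incur a cross-entropy loss toward that label."""
    losses = []
    for W in heads:  # each head is a (C, num_classes) weight matrix
        logits = feats["weak"] @ W
        probs = np.exp(logits) / np.exp(logits).sum()
        if probs.max() >= threshold:
            pseudo = probs.argmax()
            for key in ("strong", "weak+featpert", "strong+featpert"):
                p = np.exp(feats[key] @ W)
                p /= p.sum()
                losses.append(-np.log(p[pseudo] + 1e-12))
    return float(np.mean(losses)) if losses else 0.0
```

For example, an 8-frame clip `rng.random((8, 3, 16, 16))` yields exactly four feature versions from `progressive_augment`, and `multihead_pseudo_label` returns a non-negative scalar loss (zero when no head is confident enough).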
doi_str_mv 10.1145/3655025
format Article
publisher New York, NY: ACM
startdate 2024-04-11
rights Copyright held by the owner/author(s). Publication rights licensed to ACM.
lds50 peer_reviewed
oa free_for_read
fulltext fulltext
identifier ISSN: 1551-6857
ispartof ACM transactions on multimedia computing communications and applications, 2024-04
issn 1551-6857
1551-6865
language eng
recordid cdi_crossref_primary_10_1145_3655025
source ACM Digital Library Complete
subjects Do Not Use This Code, Generate the Correct Terms for Your Paper
title Leveraging Frame- and Feature-Level Progressive Augmentation for Semi-supervised Action Recognition