ReBotNet: Fast Real-time Video Enhancement
Most video restoration networks are slow, have high computational load, and can't be used for real-time video enhancement. In this work, we design an efficient and fast framework to perform real-time video enhancement for practical use-cases like live video calls and video streams. Our proposed...
Saved in:
Main authors: | Valanarasu, Jeya Maria Jose; Garg, Rahul; Toor, Andeep; Tong, Xin; Xi, Weijuan; Lugmayr, Andreas; Patel, Vishal M; Menini, Anne |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Computer Vision and Pattern Recognition |
Online access: | Order full text |
creator | Valanarasu, Jeya Maria Jose; Garg, Rahul; Toor, Andeep; Tong, Xin; Xi, Weijuan; Lugmayr, Andreas; Patel, Vishal M; Menini, Anne |
description | Most video restoration networks are slow, have high computational load, and
can't be used for real-time video enhancement. In this work, we design an
efficient and fast framework to perform real-time video enhancement for
practical use-cases like live video calls and video streams. Our proposed
method, called Recurrent Bottleneck Mixer Network (ReBotNet), employs a
dual-branch framework. The first branch learns spatio-temporal features by
tokenizing the input frames along the spatial and temporal dimensions using a
ConvNext-based encoder and processing these abstract tokens using a bottleneck
mixer. To further improve temporal consistency, the second branch employs a
mixer directly on tokens extracted from individual frames. A common decoder
then merges the features from the two branches to predict the enhanced frame.
In addition, we propose a recurrent training approach where the last frame's
prediction is leveraged to efficiently enhance the current frame while
improving temporal consistency. To evaluate our method, we curate two new
datasets that emulate real-world video call and streaming scenarios, and show
extensive results on multiple datasets where ReBotNet outperforms existing
approaches with lower computations, reduced memory requirements, and faster
inference time. |
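The abstract above describes a dual-branch design: one branch mixes spatio-temporal tokens from a stack of frames (plus the previous prediction, via recurrence), a second branch mixes tokens from the current frame alone, and a shared decoder merges both. The following is a minimal, illustrative NumPy sketch of that token-mixing idea only — it is not the authors' implementation; all shapes, function names, and the use of plain random linear maps in place of learned MLPs are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def tokenize(frames, patch=4):
    # frames: (T, H, W) grayscale stack; split each frame into
    # non-overlapping patch x patch tokens and flatten each token.
    T, H, W = frames.shape
    t = frames.reshape(T, H // patch, patch, W // patch, patch)
    t = t.transpose(0, 1, 3, 2, 4).reshape(T, -1, patch * patch)
    return t  # (T, num_tokens, token_dim)

def mix(tokens, w_token, w_dim):
    # One MLP-mixer-style step: mix across the token axis, then the
    # feature axis (random linear maps stand in for learned MLPs).
    x = np.einsum('tnd,nm->tmd', tokens, w_token)
    return np.einsum('tnd,de->tne', x, w_dim)

T, H, W, patch = 4, 16, 16, 4
frames = rng.standard_normal((T, H, W))
prev_pred = np.zeros((H, W))  # recurrent input: last frame's prediction

# Branch 1: spatio-temporal tokens from the frame stack + previous output.
stack = np.concatenate([frames, prev_pred[None]], axis=0)
tok1 = tokenize(stack, patch)
n, d = tok1.shape[1], tok1.shape[2]
tok1 = mix(tok1, rng.standard_normal((n, n)) / n,
           rng.standard_normal((d, d)) / d)

# Branch 2: tokens from the current frame alone, for temporal consistency.
tok2 = tokenize(frames[-1:], patch)
tok2 = mix(tok2, rng.standard_normal((n, n)) / n,
           rng.standard_normal((d, d)) / d)

# Decoder stand-in: merge the two branches, fold tokens back into a frame.
merged = tok1.mean(axis=0) + tok2[0]
pred = merged.reshape(H // patch, W // patch, patch, patch)
pred = pred.transpose(0, 2, 1, 3).reshape(H, W)
print(pred.shape)  # (16, 16)
```

The recurrent training loop in the paper would feed `pred` back in as `prev_pred` for the next frame; here a zero frame stands in for that first step.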
doi_str_mv | 10.48550/arxiv.2303.13504 |
format | Article |
creationdate | 2023-03-23 |
rights | http://arxiv.org/licenses/nonexclusive-distrib/1.0 (free to read) |
link | https://arxiv.org/abs/2303.13504 |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2303.13504 |
language | eng |
recordid | cdi_arxiv_primary_2303_13504 |
source | arXiv.org |
subjects | Computer Science - Computer Vision and Pattern Recognition |
title | ReBotNet: Fast Real-time Video Enhancement |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-15T02%3A05%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ReBotNet:%20Fast%20Real-time%20Video%20Enhancement&rft.au=Valanarasu,%20Jeya%20Maria%20Jose&rft.date=2023-03-23&rft_id=info:doi/10.48550/arxiv.2303.13504&rft_dat=%3Carxiv_GOX%3E2303_13504%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |