Asymmetric Learned Image Compression with Multi-Scale Residual Block, Importance Scaling, and Post-Quantization Filtering

Recently, deep learning-based image compression has made significant progress and has achieved better rate-distortion (R-D) performance than the latest traditional method, H.266/VVC, in both the MS-SSIM metric and the more challenging PSNR metric. However, a major problem is that the complexities of...

Detailed description

Bibliographic details
Published in:	IEEE transactions on circuits and systems for video technology 2023-08, Vol.33 (8), p.1-1
Main authors:	Fu, Haisheng; Liang, Feng; Liang, Jie; Li, Binglin; Zhang, Guohe; Han, Jingning
Format:	Article
Language:	eng
Subjects:	Adaptive sampling; Asymmetry; Bit rate; Coders; Complexity; Complexity theory; Decoding; Deep learning; Entropy coding; Image coding; Image compression; Importance Scaling; Learning-based Image Compression; Measurement; Multi-scale Residual Block; Post-Quantization Filter; Quantization (signal); Source code; State of the art

container_end_page	1
container_issue	8
container_start_page	1
container_title	IEEE transactions on circuits and systems for video technology
container_volume	33
creator	Fu, Haisheng; Liang, Feng; Liang, Jie; Li, Binglin; Zhang, Guohe; Han, Jingning
description	Recently, deep learning-based image compression has made significant progress and has achieved better rate-distortion (R-D) performance than the latest traditional method, H.266/VVC, in both the MS-SSIM metric and the more challenging PSNR metric. However, a major problem is that the complexity of many leading learned schemes is too high. In this paper, we propose an efficient and effective image coding framework that achieves similar R-D performance with lower complexity than the state of the art. First, we develop an improved multi-scale residual block (MSRB) that can expand the receptive field and capture global information more efficiently, which further reduces the spatial correlation of the latent representations. Second, an importance scaling network is introduced to directly scale the latents to achieve content-adaptive bit allocation without sending side information, which is more flexible than previous importance map methods. Third, we apply a post-quantization filter (PQF) to reduce the quantization error, motivated by the Sample Adaptive Offset (SAO) filter in video coding. Moreover, our experiments show that the performance of the system is less sensitive to the complexity of the decoder. Therefore, we design an asymmetric paradigm in which the encoder employs three stages of MSRBs to improve the learning capacity, whereas the decoder uses only one stage of MSRB, which reduces the decoder complexity and still yields satisfactory performance. Experimental results show that, compared to the state-of-the-art method, the proposed method is about 17 times faster in encoding and decoding, and its R-D performance is reduced by only about 1% on both the Kodak and Tecnick-40 datasets, which is still better than H.266/VVC (4:4:4) and other leading learning-based methods. Our source code is publicly available at https://github.com/fengyurenpingsheng.
doi_str_mv	10.1109/TCSVT.2023.3237274
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_2845761460</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10018275</ieee_id><sourcerecordid>2845761460</sourcerecordid><originalsourceid>FETCH-LOGICAL-c296t-2172646053aa5080e6b4f7352415042aea54fc042661831a317fd917642af0153</originalsourceid><addsrcrecordid>eNpNkMtOwzAQRSMEEqXwA4iFJbZN8fgRp8tSUahUxKOFbWTSSXHJo9iOUPl6HMqC1Yw0596RThSdAx0C0NHVcrJ4XQ4ZZXzIGVdMiYOoB1KmMWNUHoadSohTBvI4OnFuQymIVKhetBu7XVWhtyYnc9S2xhWZVXqNZNJUW4vOmaYmX8a_k_u29CZe5LpE8ozOrFpdkuuyyT8GIbJtrNd1jqQDTL0eEF2vyGPjfPzU6tqbb-27qqkpPdoAnEZHhS4dnv3NfvQyvVlO7uL5w-1sMp7HORslPmagWCISKrnWkqYUkzdRKC6ZAEkF06ilKPKwJQmkHDQHVaxGoJJwKyhI3o8u971b23y26Hy2aVpbh5cZS4VUCYT2QLE9ldvGOYtFtrWm0naXAc06xdmv4qxTnP0pDqGLfcgg4r8AhZQpyX8Aj_t3eQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2845761460</pqid></control><display><type>article</type><title>Asymmetric Learned Image Compression with Multi-Scale Residual Block, Importance Scaling, and Post-Quantization Filtering</title><source>IEEE Electronic Library (IEL)</source><creator>Fu, Haisheng ; Liang, Feng ; Liang, Jie ; Li, Binglin ; Zhang, Guohe ; Han, Jingning</creator><creatorcontrib>Fu, Haisheng ; Liang, Feng ; Liang, Jie ; Li, Binglin ; Zhang, Guohe ; Han, Jingning</creatorcontrib><description>Recently, deep learning-based image compression has made significant progresses, and has achieved better rate-distortion (R-D) performance than the latest traditional method, H.266/VVC, in both MS-SSIM metric and the more challenging PSNR metric. However, a major problem is that the complexities of many leading learned schemes are too high. In this paper, we propose an efficient and effective image coding framework, which achieves similar R-D performance with lower complexity than the state of the art. 
First, we develop an improved multi-scale residual block (MSRB) that can expand the receptive field and capture global information more efficiently, which further reduces the spatial correlation of the latent representations. Second, an importance scaling network is introduced to directly scale the latents to achieve content-adaptive bit allocation without sending side information, which is more flexible than previous importance map methods. Third, we apply a post-quantization filter (PQF) to reduce the quantization error, motivated by the Sample Adaptive Offset (SAO) filter in video coding. Moreover, our experiments show that the performance of the system is less sensitive to the complexity of the decoder. Therefore, we design an asymmetric paradigm, in which the encoder employs three stages of MSRBs to improve the learning capacity, whereas the decoder only uses one stage of MSRB, which reduces the decoder complexity and still yields satisfactory performance. Experimental results show that compared to the state-of-the-art method, the encoding and decoding time of the proposed method are about 17 times faster, and the R-D performance is only reduced by about 1% on both Kodak and Tecnick-40 datasets, which is still better than H.266/VVC(4:4:4) and other leading learning-based methods. 
Our source code is publicly available at https://github.com/fengyurenpingsheng.</description><identifier>ISSN: 1051-8215</identifier><identifier>EISSN: 1558-2205</identifier><identifier>DOI: 10.1109/TCSVT.2023.3237274</identifier><identifier>CODEN: ITCTEM</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Adaptive sampling ; Asymmetry ; Bit rate ; Coders ; Complexity ; Complexity theory ; Decoding ; Deep learning ; Entropy coding ; Image coding ; Image compression ; Importance Scaling ; Learning-based Image Compression ; Measurement ; Multi-scale Residual Block ; Post-Quantization Filter ; Quantization (signal) ; Source code ; State of the art</subject><ispartof>IEEE transactions on circuits and systems for video technology, 2023-08, Vol.33 (8), p.1-1</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c296t-2172646053aa5080e6b4f7352415042aea54fc042661831a317fd917642af0153</citedby><cites>FETCH-LOGICAL-c296t-2172646053aa5080e6b4f7352415042aea54fc042661831a317fd917642af0153</cites><orcidid>0000-0001-7168-2254 ; 0000-0002-9393-6224 ; 0000-0003-3003-4343 ; 0000-0003-2798-1710 ; 0000-0002-0113-5500 ; 0000-0001-8092-8009</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10018275$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27901,27902,54733</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10018275$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Fu, Haisheng</creatorcontrib><creatorcontrib>Liang, Feng</creatorcontrib><creatorcontrib>Liang, Jie</creatorcontrib><creatorcontrib>Li, Binglin</creatorcontrib><creatorcontrib>Zhang, 
Guohe</creatorcontrib><creatorcontrib>Han, Jingning</creatorcontrib><title>Asymmetric Learned Image Compression with Multi-Scale Residual Block, Importance Scaling, and Post-Quantization Filtering</title><title>IEEE transactions on circuits and systems for video technology</title><addtitle>TCSVT</addtitle><description>Recently, deep learning-based image compression has made significant progresses, and has achieved better rate-distortion (R-D) performance than the latest traditional method, H.266/VVC, in both MS-SSIM metric and the more challenging PSNR metric. However, a major problem is that the complexities of many leading learned schemes are too high. In this paper, we propose an efficient and effective image coding framework, which achieves similar R-D performance with lower complexity than the state of the art. First, we develop an improved multi-scale residual block (MSRB) that can expand the receptive field and capture global information more efficiently, which further reduces the spatial correlation of the latent representations. Second, an importance scaling network is introduced to directly scale the latents to achieve content-adaptive bit allocation without sending side information, which is more flexible than previous importance map methods. Third, we apply a post-quantization filter (PQF) to reduce the quantization error, motivated by the Sample Adaptive Offset (SAO) filter in video coding. Moreover, our experiments show that the performance of the system is less sensitive to the complexity of the decoder. Therefore, we design an asymmetric paradigm, in which the encoder employs three stages of MSRBs to improve the learning capacity, whereas the decoder only uses one stage of MSRB, which reduces the decoder complexity and still yields satisfactory performance. 
Experimental results show that compared to the state-of-the-art method, the encoding and decoding time of the proposed method are about 17 times faster, and the R-D performance is only reduced by about 1% on both Kodak and Tecnick-40 datasets, which is still better than H.266/VVC(4:4:4) and other leading learning-based methods. Our source code is publicly available at https://github.com/fengyurenpingsheng.</description><subject>Adaptive sampling</subject><subject>Asymmetry</subject><subject>Bit rate</subject><subject>Coders</subject><subject>Complexity</subject><subject>Complexity theory</subject><subject>Decoding</subject><subject>Deep learning</subject><subject>Entropy coding</subject><subject>Image coding</subject><subject>Image compression</subject><subject>Importance Scaling</subject><subject>Learning-based Image Compression</subject><subject>Measurement</subject><subject>Multi-scale Residual Block</subject><subject>Post-Quantization Filter</subject><subject>Quantization (signal)</subject><subject>Source code</subject><subject>State of the art</subject><issn>1051-8215</issn><issn>1558-2205</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpNkMtOwzAQRSMEEqXwA4iFJbZN8fgRp8tSUahUxKOFbWTSSXHJo9iOUPl6HMqC1Yw0596RThSdAx0C0NHVcrJ4XQ4ZZXzIGVdMiYOoB1KmMWNUHoadSohTBvI4OnFuQymIVKhetBu7XVWhtyYnc9S2xhWZVXqNZNJUW4vOmaYmX8a_k_u29CZe5LpE8ozOrFpdkuuyyT8GIbJtrNd1jqQDTL0eEF2vyGPjfPzU6tqbb-27qqkpPdoAnEZHhS4dnv3NfvQyvVlO7uL5w-1sMp7HORslPmagWCISKrnWkqYUkzdRKC6ZAEkF06ilKPKwJQmkHDQHVaxGoJJwKyhI3o8u971b23y26Hy2aVpbh5cZS4VUCYT2QLE9ldvGOYtFtrWm0naXAc06xdmv4qxTnP0pDqGLfcgg4r8AhZQpyX8Aj_t3eQ</recordid><startdate>20230801</startdate><enddate>20230801</enddate><creator>Fu, Haisheng</creator><creator>Liang, Feng</creator><creator>Liang, Jie</creator><creator>Li, Binglin</creator><creator>Zhang, Guohe</creator><creator>Han, Jingning</creator><general>IEEE</general><general>The 
Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0001-7168-2254</orcidid><orcidid>https://orcid.org/0000-0002-9393-6224</orcidid><orcidid>https://orcid.org/0000-0003-3003-4343</orcidid><orcidid>https://orcid.org/0000-0003-2798-1710</orcidid><orcidid>https://orcid.org/0000-0002-0113-5500</orcidid><orcidid>https://orcid.org/0000-0001-8092-8009</orcidid></search><sort><creationdate>20230801</creationdate><title>Asymmetric Learned Image Compression with Multi-Scale Residual Block, Importance Scaling, and Post-Quantization Filtering</title><author>Fu, Haisheng ; Liang, Feng ; Liang, Jie ; Li, Binglin ; Zhang, Guohe ; Han, Jingning</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c296t-2172646053aa5080e6b4f7352415042aea54fc042661831a317fd917642af0153</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Adaptive sampling</topic><topic>Asymmetry</topic><topic>Bit rate</topic><topic>Coders</topic><topic>Complexity</topic><topic>Complexity theory</topic><topic>Decoding</topic><topic>Deep learning</topic><topic>Entropy coding</topic><topic>Image coding</topic><topic>Image compression</topic><topic>Importance Scaling</topic><topic>Learning-based Image Compression</topic><topic>Measurement</topic><topic>Multi-scale Residual Block</topic><topic>Post-Quantization Filter</topic><topic>Quantization (signal)</topic><topic>Source code</topic><topic>State of the art</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Fu, Haisheng</creatorcontrib><creatorcontrib>Liang, Feng</creatorcontrib><creatorcontrib>Liang, Jie</creatorcontrib><creatorcontrib>Li, 
Binglin</creatorcontrib><creatorcontrib>Zhang, Guohe</creatorcontrib><creatorcontrib>Han, Jingning</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998–Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on circuits and systems for video technology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Fu, Haisheng</au><au>Liang, Feng</au><au>Liang, Jie</au><au>Li, Binglin</au><au>Zhang, Guohe</au><au>Han, Jingning</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Asymmetric Learned Image Compression with Multi-Scale Residual Block, Importance Scaling, and Post-Quantization Filtering</atitle><jtitle>IEEE transactions on circuits and systems for video technology</jtitle><stitle>TCSVT</stitle><date>2023-08-01</date><risdate>2023</risdate><volume>33</volume><issue>8</issue><spage>1</spage><epage>1</epage><pages>1-1</pages><issn>1051-8215</issn><eissn>1558-2205</eissn><coden>ITCTEM</coden><abstract>Recently, deep learning-based image compression has made significant progresses, and has achieved better rate-distortion (R-D) performance than the latest traditional method, H.266/VVC, in both MS-SSIM metric and the more challenging PSNR metric. 
However, a major problem is that the complexities of many leading learned schemes are too high. In this paper, we propose an efficient and effective image coding framework, which achieves similar R-D performance with lower complexity than the state of the art. First, we develop an improved multi-scale residual block (MSRB) that can expand the receptive field and capture global information more efficiently, which further reduces the spatial correlation of the latent representations. Second, an importance scaling network is introduced to directly scale the latents to achieve content-adaptive bit allocation without sending side information, which is more flexible than previous importance map methods. Third, we apply a post-quantization filter (PQF) to reduce the quantization error, motivated by the Sample Adaptive Offset (SAO) filter in video coding. Moreover, our experiments show that the performance of the system is less sensitive to the complexity of the decoder. Therefore, we design an asymmetric paradigm, in which the encoder employs three stages of MSRBs to improve the learning capacity, whereas the decoder only uses one stage of MSRB, which reduces the decoder complexity and still yields satisfactory performance. Experimental results show that compared to the state-of-the-art method, the encoding and decoding time of the proposed method are about 17 times faster, and the R-D performance is only reduced by about 1% on both Kodak and Tecnick-40 datasets, which is still better than H.266/VVC(4:4:4) and other leading learning-based methods. 
Our source code is publicly available at https://github.com/fengyurenpingsheng.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TCSVT.2023.3237274</doi><tpages>1</tpages><orcidid>https://orcid.org/0000-0001-7168-2254</orcidid><orcidid>https://orcid.org/0000-0002-9393-6224</orcidid><orcidid>https://orcid.org/0000-0003-3003-4343</orcidid><orcidid>https://orcid.org/0000-0003-2798-1710</orcidid><orcidid>https://orcid.org/0000-0002-0113-5500</orcidid><orcidid>https://orcid.org/0000-0001-8092-8009</orcidid></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1051-8215
ispartof	IEEE transactions on circuits and systems for video technology, 2023-08, Vol.33 (8), p.1-1
issn	1051-8215 1558-2205
language	eng
recordid	cdi_proquest_journals_2845761460
source	IEEE Electronic Library (IEL)
subjects	Adaptive sampling; Asymmetry; Bit rate; Coders; Complexity; Complexity theory; Decoding; Deep learning; Entropy coding; Image coding; Image compression; Importance Scaling; Learning-based Image Compression; Measurement; Multi-scale Residual Block; Post-Quantization Filter; Quantization (signal); Source code; State of the art
title	Asymmetric Learned Image Compression with Multi-Scale Residual Block, Importance Scaling, and Post-Quantization Filtering
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T09%3A49%3A27IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Asymmetric%20Learned%20Image%20Compression%20with%20Multi-Scale%20Residual%20Block,%20Importance%20Scaling,%20and%20Post-Quantization%20Filtering&rft.jtitle=IEEE%20transactions%20on%20circuits%20and%20systems%20for%20video%20technology&rft.au=Fu,%20Haisheng&rft.date=2023-08-01&rft.volume=33&rft.issue=8&rft.spage=1&rft.epage=1&rft.pages=1-1&rft.issn=1051-8215&rft.eissn=1558-2205&rft.coden=ITCTEM&rft_id=info:doi/10.1109/TCSVT.2023.3237274&rft_dat=%3Cproquest_RIE%3E2845761460%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2845761460&rft_id=info:pmid/&rft_ieee_id=10018275&rfr_iscdi=true
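
Note on the description field: the abstract says that an importance scaling network scales the latents before quantization to obtain content-adaptive bit allocation. The following is only a minimal NumPy sketch of the general idea that scaling a value before uniform rounding changes the effective quantization step. It is not the authors' implementation. The function name quantize_with_scaling, the latent values, and the fixed scale factors are all hypothetical; in the paper the factors come from a learned network, and the abstract does not say how the decoder treats them.

```python
import numpy as np


def quantize_with_scaling(latents, scale):
    """Scale latents element-wise, then round to the nearest integer."""
    return np.round(latents * scale)


rng = np.random.default_rng(0)
latents = rng.normal(0.0, 2.0, size=(4, 4))  # hypothetical latent values

# Hypothetical importance factors: the left half is treated as "important".
scale = np.ones_like(latents)
scale[:, :2] = 4.0

symbols = quantize_with_scaling(latents, scale)

# Dividing by the scale is used here only to measure the effective step size
# in the original latent domain. It is not a claim about the paper's decoder.
error = np.abs(symbols / scale - latents)

print("max error where scale = 4:", error[:, :2].max())
print("max error where scale = 1:", error[:, 2:].max())
```

With a scale of 4 the rounding error in the original domain is bounded by 0.125, against 0.5 for a scale of 1, so regions with larger factors are represented more finely and would typically cost more bits.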