A Unified Image Compression Method for Human Perception and Multiple Vision Tasks

Recent advancements in end-to-end image compression demonstrate the potential to surpass traditional codecs regarding rate-distortion performance. However, current methods either prioritize human perceptual quality or solely optimize for one or a few predetermined downstream tasks, neglecting a more...

Detailed Description


Bibliographic Details
Main authors:	Guo, Sha ; Sui, Lin ; Zhang, Chenlin ; Chen, Zhuo ; Yang, Wenhan ; Duan, Lingyu
Format:	Book chapter
Language:	eng
Subjects:	Coding for machine ; Image compression ; Multiple tasks
Online access:	Full text

container_end_page	359
container_issue
container_start_page	342
container_title
container_volume	15129
creator	Guo, Sha ; Sui, Lin ; Zhang, Chenlin ; Chen, Zhuo ; Yang, Wenhan ; Duan, Lingyu
description	Recent advancements in end-to-end image compression demonstrate the potential to surpass traditional codecs regarding rate-distortion performance. However, current methods either prioritize human perceptual quality or solely optimize for one or a few predetermined downstream tasks, neglecting a more common scenario that involves a variety of unforeseen machine vision tasks. In this paper, we propose a Diffusion-based Multiple-Task Unified Image Compression framework that aims to expand the boundary of traditional image compression by incorporating human perception and multiple vision tasks in open-set scenarios. Our proposed method comprises a Multi-Task Collaborative Embedding module and a Diffusion-based Invariant Knowledge Learning module. The former module facilitates collaborative embedding for multiple tasks, while the latter module boosts generalization toward unforeseen tasks by distilling the invariant knowledge from seen vision tasks. Experiments show that the proposed method extracts compact and versatile embeddings for human and machine vision collaborative compression, resulting in superior performance. Specifically, our method outperforms the state-of-the-art by 52.25%/51.68%/48.87%/48.07%/6.29% BD-rate reduction in terms of mAP/mAP/aAcc/PQ-all/accuracy on the MS-COCO for object detection/instance segmentation/semantic segmentation/panoptic segmentation and video question answering tasks, respectively.
doi_str_mv	10.1007/978-3-031-73209-6_20
format	Book Chapter
sourceid	proquest_sprin
sourcerecordid	EBC31749088_291_427
recordtype	book_chapter
genre	bookitem
ristype	CHAP
contributor	Russakovsky, Olga ; Ricci, Elisa ; Sattler, Torsten ; Leonardis, Aleš ; Roth, Stefan ; Varol, Gül
btitle	Computer Vision - ECCV 2024
seriestitle	Lecture Notes in Computer Science
creationdate	2024
pages	342-359
tpages	18
eissn	1611-3349
isbn	9783031732089 ; 3031732081
eisbn	303173209X ; 9783031732096
oclcid	1467876101
lccallnum	TA1501-1820
publisher	Switzerland: Springer
rights	The Author(s), under exclusive license to Springer Nature Switzerland AG 2025
toplevel	peer_reviewed ; online_resources
collection	ProQuest Ebook Central - Book Chapters - Demo use only
delcategory	Remote Search Resource
orcidid	0000-0002-3168-1852 ; 0009-0008-9111-4084 ; 0000-0003-0563-1760 ; 0000-0002-7307-0443 ; 0000-0002-1692-0069 ; 0000-0002-4491-2023
fulltext	fulltext
identifier	ISSN: 0302-9743
ispartof	Computer Vision - ECCV 2024, 2024, Vol.15129, p.342-359
issn	0302-9743 ; 1611-3349
language	eng
recordid	cdi_proquest_ebookcentralchapters_31749088_291_427
source	Springer Books
subjects	Coding for machine ; Image compression ; Multiple tasks
title	A Unified Image Compression Method for Human Perception and Multiple Vision Tasks
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T20%3A32%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_sprin&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=bookitem&rft.atitle=A%20Unified%20Image%20Compression%20Method%20for%20Human%20Perception%20and%20Multiple%20Vision%20Tasks&rft.btitle=Computer%20Vision%20-%20ECCV%202024&rft.au=Guo,%20Sha&rft.date=2024&rft.volume=15129&rft.spage=342&rft.epage=359&rft.pages=342-359&rft.issn=0302-9743&rft.eissn=1611-3349&rft.isbn=9783031732089&rft.isbn_list=3031732081&rft_id=info:doi/10.1007/978-3-031-73209-6_20&rft_dat=%3Cproquest_sprin%3EEBC31749088_291_427%3C/proquest_sprin%3E%3Curl%3E%3C/url%3E&rft.eisbn=303173209X&rft.eisbn_list=9783031732096&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=EBC31749088_291_427&rft_id=info:pmid/&rfr_iscdi=true