A Unified Image Compression Method for Human Perception and Multiple Vision Tasks

Bibliographic details
Main authors: Guo, Sha; Sui, Lin; Zhang, Chenlin; Chen, Zhuo; Yang, Wenhan; Duan, Lingyu
Format: Book chapter
Language: eng
Subjects:
Online access: Full text
container_end_page 359
container_issue
container_start_page 342
container_title
container_volume 15129
creator Guo, Sha
Sui, Lin
Zhang, Chenlin
Chen, Zhuo
Yang, Wenhan
Duan, Lingyu
description Recent advancements in end-to-end image compression demonstrate the potential to surpass traditional codecs regarding rate-distortion performance. However, current methods either prioritize human perceptual quality or solely optimize for one or a few predetermined downstream tasks, neglecting a more common scenario that involves a variety of unforeseen machine vision tasks. In this paper, we propose a Diffusion-based Multiple-Task Unified Image Compression framework that aims to expand the boundary of traditional image compression by incorporating human perception and multiple vision tasks in open-set scenarios. Our proposed method comprises a Multi-Task Collaborative Embedding module and a Diffusion-based Invariant Knowledge Learning module. The former module facilitates collaborative embedding for multiple tasks, while the latter module boosts generalization toward unforeseen tasks by distilling the invariant knowledge from seen vision tasks. Experiments show that the proposed method extracts compact and versatile embeddings for human and machine vision collaborative compression, resulting in superior performance. Specifically, our method outperforms the state-of-the-art by 52.25%/51.68%/48.87%/48.07%/6.29% BD-rate reduction in terms of mAP/mAP/aAcc/PQ-all/accuracy on the MS-COCO for object detection/instance segmentation/semantic segmentation/panoptic segmentation and video question answering tasks, respectively.
doi_str_mv 10.1007/978-3-031-73209-6_20
format Book Chapter
fullrecord <record><control><sourceid>proquest_sprin</sourceid><recordid>TN_cdi_proquest_ebookcentralchapters_31749088_291_427</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>EBC31749088_291_427</sourcerecordid><originalsourceid>FETCH-LOGICAL-p174t-c7b4b5b44faf8c160b78a1cf5b27d84e39a8b3523352b02e70f7554dbe351a913</originalsourceid><addsrcrecordid>eNo1kFtPwjAYhusxAvIPvOgfqPa4tpeEqJhA1ASMd027dbAwttmO_28HevHlTd7Dd_EA8EDwI8FYPmmpEEOYESQZxRplhuILMGbJORnfl2BEMkIQY1xfgWnq_2dKX4MRZpgiLTm7BWPCM6lkRjC5A9MYK4eFpCLjgo7A5wxumqqsfAHfDnbr4bw9dMGnVtvAle93bQHLNsDF8WAb-OFD7rt-yGxTwNWx7quu9vCrOvXXNu7jPbgpbR399E8nYPPyvJ4v0PL99W0-W6KOSN6jXDruhOO8tKXKSYadVJbkpXBUFop7pq1yTFCWzmHqJS6lELxwngliNWETQM9_YxeqZuuDcW27j4ZgMxA0iYhhJjExJ2BmIJhG4jzqQvtz9LE3fljlvumDrfOd7XofokkgucZKGaqJ4VSyX72Xbv4</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>book_chapter</recordtype><pqid>EBC31749088_291_427</pqid></control><display><type>book_chapter</type><title>A Unified Image Compression Method for Human Perception and Multiple Vision Tasks</title><source>Springer Books</source><creator>Guo, Sha ; Sui, Lin ; Zhang, Chenlin ; Chen, Zhuo ; Yang, Wenhan ; Duan, Lingyu</creator><contributor>Russakovsky, Olga ; Ricci, Elisa ; Sattler, Torsten ; Leonardis, Ales ; Roth, Stefan ; Varol, Gül ; Sattler, Torsten ; Leonardis, Aleš ; Ricci, Elisa ; Varol, Gül ; Roth, Stefan ; Russakovsky, Olga</contributor><creatorcontrib>Guo, Sha ; Sui, Lin ; Zhang, Chenlin ; Chen, Zhuo ; Yang, Wenhan ; Duan, Lingyu ; Russakovsky, Olga ; Ricci, Elisa ; Sattler, Torsten ; Leonardis, Ales ; Roth, Stefan ; Varol, Gül ; Sattler, Torsten ; Leonardis, Aleš ; Ricci, Elisa ; Varol, Gül ; Roth, Stefan ; Russakovsky, Olga</creatorcontrib><description>Recent advancements in end-to-end image compression demonstrate the potential to surpass traditional codecs regarding rate-distortion performance. 
However, current methods either prioritize human perceptual quality or solely optimize for one or a few predetermined downstream tasks, neglecting a more common scenario that involves a variety of unforeseen machine vision tasks. In this paper, we propose a Diffusion-based Multiple-Task Unified Image Compression framework that aims to expand the boundary of traditional image compression by incorporating human perception and multiple vision tasks in open-set scenarios. Our proposed method comprises a Multi-Task Collaborative Embedding module and a Diffusion-based Invariant Knowledge Learning module. The former module facilitates collaborative embedding for multiple tasks, while the latter module boosts generalization toward unforeseen tasks by distilling the invariant knowledge from seen vision tasks. Experiments show that the proposed method extracts compact and versatile embeddings for human and machine vision collaborative compression, resulting in superior performance. Specifically, our method outperforms the state-of-the-art by 52.25%/51.68%/48.87%/48.07%/6.29% BD-rate reduction in terms of mAP/mAP/aAcc/PQ-all/accuracy on the MS-COCO for object detection/instance segmentation/semantic segmentation/panoptic segmentation and video question answering tasks, respectively.</description><identifier>ISSN: 0302-9743</identifier><identifier>ISBN: 9783031732089</identifier><identifier>ISBN: 3031732081</identifier><identifier>EISSN: 1611-3349</identifier><identifier>EISBN: 303173209X</identifier><identifier>EISBN: 9783031732096</identifier><identifier>DOI: 10.1007/978-3-031-73209-6_20</identifier><identifier>OCLC: 1467876101</identifier><identifier>LCCallNum: TA1501-1820</identifier><language>eng</language><publisher>Switzerland: Springer</publisher><subject>Coding for machine ; Image compression ; Multiple tasks</subject><ispartof>Computer Vision - ECCV 2024, 2024, Vol.15129, p.342-359</ispartof><rights>The Author(s), under exclusive license to Springer Nature 
Switzerland AG 2025</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0000-0002-3168-1852 ; 0009-0008-9111-4084 ; 0000-0003-0563-1760 ; 0000-0002-7307-0443 ; 0000-0002-1692-0069 ; 0000-0002-4491-2023</orcidid><relation>Lecture Notes in Computer Science</relation></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Uhttps://ebookcentral.proquest.com/covers/31749088-l.jpg</thumbnail><link.rule.ids>777,778,782,791,27908</link.rule.ids></links><search><contributor>Russakovsky, Olga</contributor><contributor>Ricci, Elisa</contributor><contributor>Sattler, Torsten</contributor><contributor>Leonardis, Ales</contributor><contributor>Roth, Stefan</contributor><contributor>Varol, Gül</contributor><contributor>Sattler, Torsten</contributor><contributor>Leonardis, Aleš</contributor><contributor>Ricci, Elisa</contributor><contributor>Varol, Gül</contributor><contributor>Roth, Stefan</contributor><contributor>Russakovsky, Olga</contributor><creatorcontrib>Guo, Sha</creatorcontrib><creatorcontrib>Sui, Lin</creatorcontrib><creatorcontrib>Zhang, Chenlin</creatorcontrib><creatorcontrib>Chen, Zhuo</creatorcontrib><creatorcontrib>Yang, Wenhan</creatorcontrib><creatorcontrib>Duan, Lingyu</creatorcontrib><title>A Unified Image Compression Method for Human Perception and Multiple Vision Tasks</title><title>Computer Vision - ECCV 2024</title><description>Recent advancements in end-to-end image compression demonstrate the potential to surpass traditional codecs regarding rate-distortion performance. However, current methods either prioritize human perceptual quality or solely optimize for one or a few predetermined downstream tasks, neglecting a more common scenario that involves a variety of unforeseen machine vision tasks. 
In this paper, we propose a Diffusion-based Multiple-Task Unified Image Compression framework that aims to expand the boundary of traditional image compression by incorporating human perception and multiple vision tasks in open-set scenarios. Our proposed method comprises a Multi-Task Collaborative Embedding module and a Diffusion-based Invariant Knowledge Learning module. The former module facilitates collaborative embedding for multiple tasks, while the latter module boosts generalization toward unforeseen tasks by distilling the invariant knowledge from seen vision tasks. Experiments show that the proposed method extracts compact and versatile embeddings for human and machine vision collaborative compression, resulting in superior performance. Specifically, our method outperforms the state-of-the-art by 52.25%/51.68%/48.87%/48.07%/6.29% BD-rate reduction in terms of mAP/mAP/aAcc/PQ-all/accuracy on the MS-COCO for object detection/instance segmentation/semantic segmentation/panoptic segmentation and video question answering tasks, respectively.</description><subject>Coding for machine</subject><subject>Image compression</subject><subject>Multiple tasks</subject><issn>0302-9743</issn><issn>1611-3349</issn><isbn>9783031732089</isbn><isbn>3031732081</isbn><isbn>303173209X</isbn><isbn>9783031732096</isbn><fulltext>true</fulltext><rsrctype>book_chapter</rsrctype><creationdate>2024</creationdate><recordtype>book_chapter</recordtype><recordid>eNo1kFtPwjAYhusxAvIPvOgfqPa4tpeEqJhA1ASMd027dbAwttmO_28HevHlTd7Dd_EA8EDwI8FYPmmpEEOYESQZxRplhuILMGbJORnfl2BEMkIQY1xfgWnq_2dKX4MRZpgiLTm7BWPCM6lkRjC5A9MYK4eFpCLjgo7A5wxumqqsfAHfDnbr4bw9dMGnVtvAle93bQHLNsDF8WAb-OFD7rt-yGxTwNWx7quu9vCrOvXXNu7jPbgpbR399E8nYPPyvJ4v0PL99W0-W6KOSN6jXDruhOO8tKXKSYadVJbkpXBUFop7pq1yTFCWzmHqJS6lELxwngliNWETQM9_YxeqZuuDcW27j4ZgMxA0iYhhJjExJ2BmIJhG4jzqQvtz9LE3fljlvumDrfOd7XofokkgucZKGaqJ4VSyX72Xbv4</recordid><startdate>2024</startdate><enddate>2024</enddate><creator>Guo, Sha</creator><creator>Sui, 
Lin</creator><creator>Zhang, Chenlin</creator><creator>Chen, Zhuo</creator><creator>Yang, Wenhan</creator><creator>Duan, Lingyu</creator><general>Springer</general><general>Springer Nature Switzerland</general><scope>FFUUA</scope><orcidid>https://orcid.org/0000-0002-3168-1852</orcidid><orcidid>https://orcid.org/0009-0008-9111-4084</orcidid><orcidid>https://orcid.org/0000-0003-0563-1760</orcidid><orcidid>https://orcid.org/0000-0002-7307-0443</orcidid><orcidid>https://orcid.org/0000-0002-1692-0069</orcidid><orcidid>https://orcid.org/0000-0002-4491-2023</orcidid></search><sort><creationdate>2024</creationdate><title>A Unified Image Compression Method for Human Perception and Multiple Vision Tasks</title><author>Guo, Sha ; Sui, Lin ; Zhang, Chenlin ; Chen, Zhuo ; Yang, Wenhan ; Duan, Lingyu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-p174t-c7b4b5b44faf8c160b78a1cf5b27d84e39a8b3523352b02e70f7554dbe351a913</frbrgroupid><rsrctype>book_chapters</rsrctype><prefilter>book_chapters</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Coding for machine</topic><topic>Image compression</topic><topic>Multiple tasks</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Guo, Sha</creatorcontrib><creatorcontrib>Sui, Lin</creatorcontrib><creatorcontrib>Zhang, Chenlin</creatorcontrib><creatorcontrib>Chen, Zhuo</creatorcontrib><creatorcontrib>Yang, Wenhan</creatorcontrib><creatorcontrib>Duan, Lingyu</creatorcontrib><collection>ProQuest Ebook Central - Book Chapters - Demo use only</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Guo, Sha</au><au>Sui, Lin</au><au>Zhang, Chenlin</au><au>Chen, Zhuo</au><au>Yang, Wenhan</au><au>Duan, Lingyu</au><au>Russakovsky, Olga</au><au>Ricci, Elisa</au><au>Sattler, Torsten</au><au>Leonardis, Ales</au><au>Roth, Stefan</au><au>Varol, Gül</au><au>Sattler, 
Torsten</au><au>Leonardis, Aleš</au><au>Ricci, Elisa</au><au>Varol, Gül</au><au>Roth, Stefan</au><au>Russakovsky, Olga</au><format>book</format><genre>bookitem</genre><ristype>CHAP</ristype><atitle>A Unified Image Compression Method for Human Perception and Multiple Vision Tasks</atitle><btitle>Computer Vision - ECCV 2024</btitle><seriestitle>Lecture Notes in Computer Science</seriestitle><date>2024</date><risdate>2024</risdate><volume>15129</volume><spage>342</spage><epage>359</epage><pages>342-359</pages><issn>0302-9743</issn><eissn>1611-3349</eissn><isbn>9783031732089</isbn><isbn>3031732081</isbn><eisbn>303173209X</eisbn><eisbn>9783031732096</eisbn><abstract>Recent advancements in end-to-end image compression demonstrate the potential to surpass traditional codecs regarding rate-distortion performance. However, current methods either prioritize human perceptual quality or solely optimize for one or a few predetermined downstream tasks, neglecting a more common scenario that involves a variety of unforeseen machine vision tasks. In this paper, we propose a Diffusion-based Multiple-Task Unified Image Compression framework that aims to expand the boundary of traditional image compression by incorporating human perception and multiple vision tasks in open-set scenarios. Our proposed method comprises a Multi-Task Collaborative Embedding module and a Diffusion-based Invariant Knowledge Learning module. The former module facilitates collaborative embedding for multiple tasks, while the latter module boosts generalization toward unforeseen tasks by distilling the invariant knowledge from seen vision tasks. Experiments show that the proposed method extracts compact and versatile embeddings for human and machine vision collaborative compression, resulting in superior performance. 
Specifically, our method outperforms the state-of-the-art by 52.25%/51.68%/48.87%/48.07%/6.29% BD-rate reduction in terms of mAP/mAP/aAcc/PQ-all/accuracy on the MS-COCO for object detection/instance segmentation/semantic segmentation/panoptic segmentation and video question answering tasks, respectively.</abstract><cop>Switzerland</cop><pub>Springer</pub><doi>10.1007/978-3-031-73209-6_20</doi><oclcid>1467876101</oclcid><tpages>18</tpages><orcidid>https://orcid.org/0000-0002-3168-1852</orcidid><orcidid>https://orcid.org/0009-0008-9111-4084</orcidid><orcidid>https://orcid.org/0000-0003-0563-1760</orcidid><orcidid>https://orcid.org/0000-0002-7307-0443</orcidid><orcidid>https://orcid.org/0000-0002-1692-0069</orcidid><orcidid>https://orcid.org/0000-0002-4491-2023</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 0302-9743
ispartof Computer Vision - ECCV 2024, 2024, Vol.15129, p.342-359
issn 0302-9743
1611-3349
language eng
recordid cdi_proquest_ebookcentralchapters_31749088_291_427
source Springer Books
subjects Coding for machine
Image compression
Multiple tasks
title A Unified Image Compression Method for Human Perception and Multiple Vision Tasks
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T20%3A32%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_sprin&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=bookitem&rft.atitle=A%20Unified%20Image%20Compression%20Method%20for%20Human%20Perception%20and%20Multiple%20Vision%20Tasks&rft.btitle=Computer%20Vision%20-%20ECCV%202024&rft.au=Guo,%20Sha&rft.date=2024&rft.volume=15129&rft.spage=342&rft.epage=359&rft.pages=342-359&rft.issn=0302-9743&rft.eissn=1611-3349&rft.isbn=9783031732089&rft.isbn_list=3031732081&rft_id=info:doi/10.1007/978-3-031-73209-6_20&rft_dat=%3Cproquest_sprin%3EEBC31749088_291_427%3C/proquest_sprin%3E%3Curl%3E%3C/url%3E&rft.eisbn=303173209X&rft.eisbn_list=9783031732096&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=EBC31749088_291_427&rft_id=info:pmid/&rfr_iscdi=true