A Unified Image Compression Method for Human Perception and Multiple Vision Tasks
Recent advancements in end-to-end image compression demonstrate the potential to surpass traditional codecs regarding rate-distortion performance. However, current methods either prioritize human perceptual quality or solely optimize for one or a few predetermined downstream tasks, neglecting a more...
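Main authors: | Guo, Sha ; Sui, Lin ; Zhang, Chenlin ; Chen, Zhuo ; Yang, Wenhan ; Duan, Lingyu |
---|---|
Format: | Book chapter |
Language: | eng |
Subjects: | Coding for machine ; Image compression ; Multiple tasks |
Online access: | Full text |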
container_end_page | 359 |
---|---|
container_issue | |
container_start_page | 342 |
container_title | |
container_volume | 15129 |
creator | Guo, Sha ; Sui, Lin ; Zhang, Chenlin ; Chen, Zhuo ; Yang, Wenhan ; Duan, Lingyu |
description | Recent advancements in end-to-end image compression demonstrate the potential to surpass traditional codecs regarding rate-distortion performance. However, current methods either prioritize human perceptual quality or solely optimize for one or a few predetermined downstream tasks, neglecting a more common scenario that involves a variety of unforeseen machine vision tasks. In this paper, we propose a Diffusion-based Multiple-Task Unified Image Compression framework that aims to expand the boundary of traditional image compression by incorporating human perception and multiple vision tasks in open-set scenarios. Our proposed method comprises a Multi-Task Collaborative Embedding module and a Diffusion-based Invariant Knowledge Learning module. The former module facilitates collaborative embedding for multiple tasks, while the latter module boosts generalization toward unforeseen tasks by distilling the invariant knowledge from seen vision tasks. Experiments show that the proposed method extracts compact and versatile embeddings for human and machine vision collaborative compression, resulting in superior performance. Specifically, our method outperforms the state-of-the-art by 52.25%/51.68%/48.87%/48.07%/6.29% BD-rate reduction in terms of mAP/mAP/aAcc/PQ-all/accuracy on the MS-COCO for object detection/instance segmentation/semantic segmentation/panoptic segmentation and video question answering tasks, respectively. |
doi_str_mv | 10.1007/978-3-031-73209-6_20 |
format | Book Chapter |
fullrecord | <record><control><sourceid>proquest_sprin</sourceid><recordid>TN_cdi_proquest_ebookcentralchapters_31749088_291_427</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>EBC31749088_291_427</sourcerecordid><originalsourceid>FETCH-LOGICAL-p174t-c7b4b5b44faf8c160b78a1cf5b27d84e39a8b3523352b02e70f7554dbe351a913</originalsourceid><addsrcrecordid>eNo1kFtPwjAYhusxAvIPvOgfqPa4tpeEqJhA1ASMd027dbAwttmO_28HevHlTd7Dd_EA8EDwI8FYPmmpEEOYESQZxRplhuILMGbJORnfl2BEMkIQY1xfgWnq_2dKX4MRZpgiLTm7BWPCM6lkRjC5A9MYK4eFpCLjgo7A5wxumqqsfAHfDnbr4bw9dMGnVtvAle93bQHLNsDF8WAb-OFD7rt-yGxTwNWx7quu9vCrOvXXNu7jPbgpbR399E8nYPPyvJ4v0PL99W0-W6KOSN6jXDruhOO8tKXKSYadVJbkpXBUFop7pq1yTFCWzmHqJS6lELxwngliNWETQM9_YxeqZuuDcW27j4ZgMxA0iYhhJjExJ2BmIJhG4jzqQvtz9LE3fljlvumDrfOd7XofokkgucZKGaqJ4VSyX72Xbv4</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>book_chapter</recordtype><pqid>EBC31749088_291_427</pqid></control><display><type>book_chapter</type><title>A Unified Image Compression Method for Human Perception and Multiple Vision Tasks</title><source>Springer Books</source><creator>Guo, Sha ; Sui, Lin ; Zhang, Chenlin ; Chen, Zhuo ; Yang, Wenhan ; Duan, Lingyu</creator><contributor>Russakovsky, Olga ; Ricci, Elisa ; Sattler, Torsten ; Leonardis, Ales ; Roth, Stefan ; Varol, Gül ; Sattler, Torsten ; Leonardis, Aleš ; Ricci, Elisa ; Varol, Gül ; Roth, Stefan ; Russakovsky, Olga</contributor><creatorcontrib>Guo, Sha ; Sui, Lin ; Zhang, Chenlin ; Chen, Zhuo ; Yang, Wenhan ; Duan, Lingyu ; Russakovsky, Olga ; Ricci, Elisa ; Sattler, Torsten ; Leonardis, Ales ; Roth, Stefan ; Varol, Gül ; Sattler, Torsten ; Leonardis, Aleš ; Ricci, Elisa ; Varol, Gül ; Roth, Stefan ; Russakovsky, Olga</creatorcontrib><description>Recent advancements in end-to-end image compression demonstrate the potential to surpass traditional codecs regarding rate-distortion performance. 
However, current methods either prioritize human perceptual quality or solely optimize for one or a few predetermined downstream tasks, neglecting a more common scenario that involves a variety of unforeseen machine vision tasks. In this paper, we propose a Diffusion-based Multiple-Task Unified Image Compression framework that aims to expand the boundary of traditional image compression by incorporating human perception and multiple vision tasks in open-set scenarios. Our proposed method comprises a Multi-Task Collaborative Embedding module and a Diffusion-based Invariant Knowledge Learning module. The former module facilitates collaborative embedding for multiple tasks, while the latter module boosts generalization toward unforeseen tasks by distilling the invariant knowledge from seen vision tasks. Experiments show that the proposed method extracts compact and versatile embeddings for human and machine vision collaborative compression, resulting in superior performance. Specifically, our method outperforms the state-of-the-art by 52.25%/51.68%/48.87%/48.07%/6.29% BD-rate reduction in terms of mAP/mAP/aAcc/PQ-all/accuracy on the MS-COCO for object detection/instance segmentation/semantic segmentation/panoptic segmentation and video question answering tasks, respectively.</description><identifier>ISSN: 0302-9743</identifier><identifier>ISBN: 9783031732089</identifier><identifier>ISBN: 3031732081</identifier><identifier>EISSN: 1611-3349</identifier><identifier>EISBN: 303173209X</identifier><identifier>EISBN: 9783031732096</identifier><identifier>DOI: 10.1007/978-3-031-73209-6_20</identifier><identifier>OCLC: 1467876101</identifier><identifier>LCCallNum: TA1501-1820</identifier><language>eng</language><publisher>Switzerland: Springer</publisher><subject>Coding for machine ; Image compression ; Multiple tasks</subject><ispartof>Computer Vision - ECCV 2024, 2024, Vol.15129, p.342-359</ispartof><rights>The Author(s), under exclusive license to Springer Nature 
Switzerland AG 2025</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0000-0002-3168-1852 ; 0009-0008-9111-4084 ; 0000-0003-0563-1760 ; 0000-0002-7307-0443 ; 0000-0002-1692-0069 ; 0000-0002-4491-2023</orcidid><relation>Lecture Notes in Computer Science</relation></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Uhttps://ebookcentral.proquest.com/covers/31749088-l.jpg</thumbnail><link.rule.ids>777,778,782,791,27908</link.rule.ids></links><search><contributor>Russakovsky, Olga</contributor><contributor>Ricci, Elisa</contributor><contributor>Sattler, Torsten</contributor><contributor>Leonardis, Ales</contributor><contributor>Roth, Stefan</contributor><contributor>Varol, Gül</contributor><contributor>Sattler, Torsten</contributor><contributor>Leonardis, Aleš</contributor><contributor>Ricci, Elisa</contributor><contributor>Varol, Gül</contributor><contributor>Roth, Stefan</contributor><contributor>Russakovsky, Olga</contributor><creatorcontrib>Guo, Sha</creatorcontrib><creatorcontrib>Sui, Lin</creatorcontrib><creatorcontrib>Zhang, Chenlin</creatorcontrib><creatorcontrib>Chen, Zhuo</creatorcontrib><creatorcontrib>Yang, Wenhan</creatorcontrib><creatorcontrib>Duan, Lingyu</creatorcontrib><title>A Unified Image Compression Method for Human Perception and Multiple Vision Tasks</title><title>Computer Vision - ECCV 2024</title><description>Recent advancements in end-to-end image compression demonstrate the potential to surpass traditional codecs regarding rate-distortion performance. However, current methods either prioritize human perceptual quality or solely optimize for one or a few predetermined downstream tasks, neglecting a more common scenario that involves a variety of unforeseen machine vision tasks. 
In this paper, we propose a Diffusion-based Multiple-Task Unified Image Compression framework that aims to expand the boundary of traditional image compression by incorporating human perception and multiple vision tasks in open-set scenarios. Our proposed method comprises a Multi-Task Collaborative Embedding module and a Diffusion-based Invariant Knowledge Learning module. The former module facilitates collaborative embedding for multiple tasks, while the latter module boosts generalization toward unforeseen tasks by distilling the invariant knowledge from seen vision tasks. Experiments show that the proposed method extracts compact and versatile embeddings for human and machine vision collaborative compression, resulting in superior performance. Specifically, our method outperforms the state-of-the-art by 52.25%/51.68%/48.87%/48.07%/6.29% BD-rate reduction in terms of mAP/mAP/aAcc/PQ-all/accuracy on the MS-COCO for object detection/instance segmentation/semantic segmentation/panoptic segmentation and video question answering tasks, respectively.</description><subject>Coding for machine</subject><subject>Image compression</subject><subject>Multiple tasks</subject><issn>0302-9743</issn><issn>1611-3349</issn><isbn>9783031732089</isbn><isbn>3031732081</isbn><isbn>303173209X</isbn><isbn>9783031732096</isbn><fulltext>true</fulltext><rsrctype>book_chapter</rsrctype><creationdate>2024</creationdate><recordtype>book_chapter</recordtype><recordid>eNo1kFtPwjAYhusxAvIPvOgfqPa4tpeEqJhA1ASMd027dbAwttmO_28HevHlTd7Dd_EA8EDwI8FYPmmpEEOYESQZxRplhuILMGbJORnfl2BEMkIQY1xfgWnq_2dKX4MRZpgiLTm7BWPCM6lkRjC5A9MYK4eFpCLjgo7A5wxumqqsfAHfDnbr4bw9dMGnVtvAle93bQHLNsDF8WAb-OFD7rt-yGxTwNWx7quu9vCrOvXXNu7jPbgpbR399E8nYPPyvJ4v0PL99W0-W6KOSN6jXDruhOO8tKXKSYadVJbkpXBUFop7pq1yTFCWzmHqJS6lELxwngliNWETQM9_YxeqZuuDcW27j4ZgMxA0iYhhJjExJ2BmIJhG4jzqQvtz9LE3fljlvumDrfOd7XofokkgucZKGaqJ4VSyX72Xbv4</recordid><startdate>2024</startdate><enddate>2024</enddate><creator>Guo, Sha</creator><creator>Sui, 
Lin</creator><creator>Zhang, Chenlin</creator><creator>Chen, Zhuo</creator><creator>Yang, Wenhan</creator><creator>Duan, Lingyu</creator><general>Springer</general><general>Springer Nature Switzerland</general><scope>FFUUA</scope><orcidid>https://orcid.org/0000-0002-3168-1852</orcidid><orcidid>https://orcid.org/0009-0008-9111-4084</orcidid><orcidid>https://orcid.org/0000-0003-0563-1760</orcidid><orcidid>https://orcid.org/0000-0002-7307-0443</orcidid><orcidid>https://orcid.org/0000-0002-1692-0069</orcidid><orcidid>https://orcid.org/0000-0002-4491-2023</orcidid></search><sort><creationdate>2024</creationdate><title>A Unified Image Compression Method for Human Perception and Multiple Vision Tasks</title><author>Guo, Sha ; Sui, Lin ; Zhang, Chenlin ; Chen, Zhuo ; Yang, Wenhan ; Duan, Lingyu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-p174t-c7b4b5b44faf8c160b78a1cf5b27d84e39a8b3523352b02e70f7554dbe351a913</frbrgroupid><rsrctype>book_chapters</rsrctype><prefilter>book_chapters</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Coding for machine</topic><topic>Image compression</topic><topic>Multiple tasks</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Guo, Sha</creatorcontrib><creatorcontrib>Sui, Lin</creatorcontrib><creatorcontrib>Zhang, Chenlin</creatorcontrib><creatorcontrib>Chen, Zhuo</creatorcontrib><creatorcontrib>Yang, Wenhan</creatorcontrib><creatorcontrib>Duan, Lingyu</creatorcontrib><collection>ProQuest Ebook Central - Book Chapters - Demo use only</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Guo, Sha</au><au>Sui, Lin</au><au>Zhang, Chenlin</au><au>Chen, Zhuo</au><au>Yang, Wenhan</au><au>Duan, Lingyu</au><au>Russakovsky, Olga</au><au>Ricci, Elisa</au><au>Sattler, Torsten</au><au>Leonardis, Ales</au><au>Roth, Stefan</au><au>Varol, Gül</au><au>Sattler, 
Torsten</au><au>Leonardis, Aleš</au><au>Ricci, Elisa</au><au>Varol, Gül</au><au>Roth, Stefan</au><au>Russakovsky, Olga</au><format>book</format><genre>bookitem</genre><ristype>CHAP</ristype><atitle>A Unified Image Compression Method for Human Perception and Multiple Vision Tasks</atitle><btitle>Computer Vision - ECCV 2024</btitle><seriestitle>Lecture Notes in Computer Science</seriestitle><date>2024</date><risdate>2024</risdate><volume>15129</volume><spage>342</spage><epage>359</epage><pages>342-359</pages><issn>0302-9743</issn><eissn>1611-3349</eissn><isbn>9783031732089</isbn><isbn>3031732081</isbn><eisbn>303173209X</eisbn><eisbn>9783031732096</eisbn><abstract>Recent advancements in end-to-end image compression demonstrate the potential to surpass traditional codecs regarding rate-distortion performance. However, current methods either prioritize human perceptual quality or solely optimize for one or a few predetermined downstream tasks, neglecting a more common scenario that involves a variety of unforeseen machine vision tasks. In this paper, we propose a Diffusion-based Multiple-Task Unified Image Compression framework that aims to expand the boundary of traditional image compression by incorporating human perception and multiple vision tasks in open-set scenarios. Our proposed method comprises a Multi-Task Collaborative Embedding module and a Diffusion-based Invariant Knowledge Learning module. The former module facilitates collaborative embedding for multiple tasks, while the latter module boosts generalization toward unforeseen tasks by distilling the invariant knowledge from seen vision tasks. Experiments show that the proposed method extracts compact and versatile embeddings for human and machine vision collaborative compression, resulting in superior performance. 
Specifically, our method outperforms the state-of-the-art by 52.25%/51.68%/48.87%/48.07%/6.29% BD-rate reduction in terms of mAP/mAP/aAcc/PQ-all/accuracy on the MS-COCO for object detection/instance segmentation/semantic segmentation/panoptic segmentation and video question answering tasks, respectively.</abstract><cop>Switzerland</cop><pub>Springer</pub><doi>10.1007/978-3-031-73209-6_20</doi><oclcid>1467876101</oclcid><tpages>18</tpages><orcidid>https://orcid.org/0000-0002-3168-1852</orcidid><orcidid>https://orcid.org/0009-0008-9111-4084</orcidid><orcidid>https://orcid.org/0000-0003-0563-1760</orcidid><orcidid>https://orcid.org/0000-0002-7307-0443</orcidid><orcidid>https://orcid.org/0000-0002-1692-0069</orcidid><orcidid>https://orcid.org/0000-0002-4491-2023</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0302-9743 |
ispartof | Computer Vision - ECCV 2024, 2024, Vol.15129, p.342-359 |
issn | 0302-9743 ; 1611-3349 |
language | eng |
recordid | cdi_proquest_ebookcentralchapters_31749088_291_427 |
source | Springer Books |
subjects | Coding for machine ; Image compression ; Multiple tasks |
title | A Unified Image Compression Method for Human Perception and Multiple Vision Tasks |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T20%3A32%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_sprin&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=bookitem&rft.atitle=A%20Unified%20Image%20Compression%20Method%20for%20Human%20Perception%20and%20Multiple%20Vision%20Tasks&rft.btitle=Computer%20Vision%20-%20ECCV%202024&rft.au=Guo,%20Sha&rft.date=2024&rft.volume=15129&rft.spage=342&rft.epage=359&rft.pages=342-359&rft.issn=0302-9743&rft.eissn=1611-3349&rft.isbn=9783031732089&rft.isbn_list=3031732081&rft_id=info:doi/10.1007/978-3-031-73209-6_20&rft_dat=%3Cproquest_sprin%3EEBC31749088_291_427%3C/proquest_sprin%3E%3Curl%3E%3C/url%3E&rft.eisbn=303173209X&rft.eisbn_list=9783031732096&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=EBC31749088_291_427&rft_id=info:pmid/&rfr_iscdi=true |