Private, Augmentation-Robust and Task-Agnostic Data Valuation Approach for Data Marketplace

Evaluating datasets in data marketplaces, where the buyer aims to purchase valuable data, is a critical challenge. In this paper, we introduce an innovative task-agnostic data valuation method called PriArTa, an approach for computing the distance between the distribution of the buyer's...

Bibliographic details
Main authors: Jahani-Nezhad, Tayyebeh; Moradi, Parsa; Maddah-Ali, Mohammad Ali; Caire, Giuseppe
Format: Article
Language: eng
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Jahani-Nezhad, Tayyebeh
Moradi, Parsa
Maddah-Ali, Mohammad Ali
Caire, Giuseppe
description Evaluating datasets in data marketplaces, where the buyer aims to purchase valuable data, is a critical challenge. In this paper, we introduce an innovative task-agnostic data valuation method called PriArTa, an approach for computing the distance between the distribution of the buyer's existing dataset and the seller's dataset, allowing the buyer to determine how effectively the new data can enhance its dataset. PriArTa is communication-efficient, enabling the buyer to evaluate datasets without needing access to the entire dataset from each seller. Instead, the buyer requests that sellers perform specific preprocessing on their data and then send back the results. Using this information and a scoring metric, the buyer can evaluate the dataset. The preprocessing is designed to allow the buyer to compute the score while preserving the privacy of each seller's dataset, mitigating the risk of information leakage before the purchase. A key feature of PriArTa is its robustness to common data transformations, ensuring consistent value assessment and reducing the risk of purchasing redundant data. The effectiveness of PriArTa is demonstrated through experiments on real-world image datasets, showing its ability to perform privacy-preserving, augmentation-robust data valuation in data marketplaces.
doi_str_mv 10.48550/arxiv.2411.00745
format Article
fullrecord <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2411_00745</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2411_00745</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2411_007453</originalsourceid><addsrcrecordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMjE01DMwMDcx5WSIDijKLEssSdVRcCxNz03NK0ksyczP0w3KTyotLlFIzEtRCEksztZ1TM_LLy7JTFZwSSxJVAhLzCkFq1NwLCgoyk9MzlBIyy-CyPkmFmWnlhTkJCan8jCwpiXmFKfyQmluBnk31xBnD12wM-ILijJzE4sq40HOiQc7x5iwCgDOtUBG</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Private, Augmentation-Robust and Task-Agnostic Data Valuation Approach for Data Marketplace</title><source>arXiv.org</source><creator>Jahani-Nezhad, Tayyebeh ; Moradi, Parsa ; Maddah-Ali, Mohammad Ali ; Caire, Giuseppe</creator><creatorcontrib>Jahani-Nezhad, Tayyebeh ; Moradi, Parsa ; Maddah-Ali, Mohammad Ali ; Caire, Giuseppe</creatorcontrib><description>Evaluating datasets in data marketplaces, where the buyer aim to purchase valuable data, is a critical challenge. In this paper, we introduce an innovative task-agnostic data valuation method called PriArTa which is an approach for computing the distance between the distribution of the buyer's existing dataset and the seller's dataset, allowing the buyer to determine how effectively the new data can enhance its dataset. PriArTa is communication-efficient, enabling the buyer to evaluate datasets without needing access to the entire dataset from each seller. Instead, the buyer requests that sellers perform specific preprocessing on their data and then send back the results. Using this information and a scoring metric, the buyer can evaluate the dataset. The preprocessing is designed to allow the buyer to compute the score while preserving the privacy of each seller's dataset, mitigating the risk of information leakage before the purchase. 
A key feature of PriArTa is its robustness to common data transformations, ensuring consistent value assessment and reducing the risk of purchasing redundant data. The effectiveness of PriArTa is demonstrated through experiments on real-world image datasets, showing its ability to perform privacy-preserving, augmentation-robust data valuation in data marketplaces.</description><identifier>DOI: 10.48550/arxiv.2411.00745</identifier><language>eng</language><subject>Computer Science - Distributed, Parallel, and Cluster Computing ; Computer Science - Learning</subject><creationdate>2024-11</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,778,883</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2411.00745$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2411.00745$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Jahani-Nezhad, Tayyebeh</creatorcontrib><creatorcontrib>Moradi, Parsa</creatorcontrib><creatorcontrib>Maddah-Ali, Mohammad Ali</creatorcontrib><creatorcontrib>Caire, Giuseppe</creatorcontrib><title>Private, Augmentation-Robust and Task-Agnostic Data Valuation Approach for Data Marketplace</title><description>Evaluating datasets in data marketplaces, where the buyer aim to purchase valuable data, is a critical challenge. In this paper, we introduce an innovative task-agnostic data valuation method called PriArTa which is an approach for computing the distance between the distribution of the buyer's existing dataset and the seller's dataset, allowing the buyer to determine how effectively the new data can enhance its dataset. 
PriArTa is communication-efficient, enabling the buyer to evaluate datasets without needing access to the entire dataset from each seller. Instead, the buyer requests that sellers perform specific preprocessing on their data and then send back the results. Using this information and a scoring metric, the buyer can evaluate the dataset. The preprocessing is designed to allow the buyer to compute the score while preserving the privacy of each seller's dataset, mitigating the risk of information leakage before the purchase. A key feature of PriArTa is its robustness to common data transformations, ensuring consistent value assessment and reducing the risk of purchasing redundant data. The effectiveness of PriArTa is demonstrated through experiments on real-world image datasets, showing its ability to perform privacy-preserving, augmentation-robust data valuation in data marketplaces.</description><subject>Computer Science - Distributed, Parallel, and Cluster Computing</subject><subject>Computer Science - Learning</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMjE01DMwMDcx5WSIDijKLEssSdVRcCxNz03NK0ksyczP0w3KTyotLlFIzEtRCEksztZ1TM_LLy7JTFZwSSxJVAhLzCkFq1NwLCgoyk9MzlBIyy-CyPkmFmWnlhTkJCan8jCwpiXmFKfyQmluBnk31xBnD12wM-ILijJzE4sq40HOiQc7x5iwCgDOtUBG</recordid><startdate>20241101</startdate><enddate>20241101</enddate><creator>Jahani-Nezhad, Tayyebeh</creator><creator>Moradi, Parsa</creator><creator>Maddah-Ali, Mohammad Ali</creator><creator>Caire, Giuseppe</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20241101</creationdate><title>Private, Augmentation-Robust and Task-Agnostic Data Valuation Approach for Data Marketplace</title><author>Jahani-Nezhad, Tayyebeh ; Moradi, Parsa ; Maddah-Ali, Mohammad Ali ; Caire, 
Giuseppe</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2411_007453</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Distributed, Parallel, and Cluster Computing</topic><topic>Computer Science - Learning</topic><toplevel>online_resources</toplevel><creatorcontrib>Jahani-Nezhad, Tayyebeh</creatorcontrib><creatorcontrib>Moradi, Parsa</creatorcontrib><creatorcontrib>Maddah-Ali, Mohammad Ali</creatorcontrib><creatorcontrib>Caire, Giuseppe</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Jahani-Nezhad, Tayyebeh</au><au>Moradi, Parsa</au><au>Maddah-Ali, Mohammad Ali</au><au>Caire, Giuseppe</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Private, Augmentation-Robust and Task-Agnostic Data Valuation Approach for Data Marketplace</atitle><date>2024-11-01</date><risdate>2024</risdate><abstract>Evaluating datasets in data marketplaces, where the buyer aim to purchase valuable data, is a critical challenge. In this paper, we introduce an innovative task-agnostic data valuation method called PriArTa which is an approach for computing the distance between the distribution of the buyer's existing dataset and the seller's dataset, allowing the buyer to determine how effectively the new data can enhance its dataset. PriArTa is communication-efficient, enabling the buyer to evaluate datasets without needing access to the entire dataset from each seller. Instead, the buyer requests that sellers perform specific preprocessing on their data and then send back the results. Using this information and a scoring metric, the buyer can evaluate the dataset. 
The preprocessing is designed to allow the buyer to compute the score while preserving the privacy of each seller's dataset, mitigating the risk of information leakage before the purchase. A key feature of PriArTa is its robustness to common data transformations, ensuring consistent value assessment and reducing the risk of purchasing redundant data. The effectiveness of PriArTa is demonstrated through experiments on real-world image datasets, showing its ability to perform privacy-preserving, augmentation-robust data valuation in data marketplaces.</abstract><doi>10.48550/arxiv.2411.00745</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2411.00745
ispartof
issn
language eng
recordid cdi_arxiv_primary_2411_00745
source arXiv.org
subjects Computer Science - Distributed, Parallel, and Cluster Computing
Computer Science - Learning
title Private, Augmentation-Robust and Task-Agnostic Data Valuation Approach for Data Marketplace
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-15T19%3A58%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Private,%20Augmentation-Robust%20and%20Task-Agnostic%20Data%20Valuation%20Approach%20for%20Data%20Marketplace&rft.au=Jahani-Nezhad,%20Tayyebeh&rft.date=2024-11-01&rft_id=info:doi/10.48550/arxiv.2411.00745&rft_dat=%3Carxiv_GOX%3E2411_00745%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true