Private, Augmentation-Robust and Task-Agnostic Data Valuation Approach for Data Marketplace
Evaluating datasets in data marketplaces, where buyers aim to purchase valuable data, is a critical challenge. In this paper, we introduce PriArTa, a task-agnostic data valuation method that computes the distance between the distribution of the buyer's...
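Main authors: | Jahani-Nezhad, Tayyebeh; Moradi, Parsa; Maddah-Ali, Mohammad Ali; Caire, Giuseppe |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Distributed, Parallel, and Cluster Computing; Computer Science - Learning |
Online access: | Order full text |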
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Jahani-Nezhad, Tayyebeh; Moradi, Parsa; Maddah-Ali, Mohammad Ali; Caire, Giuseppe |
description | Evaluating datasets in data marketplaces, where buyers aim to purchase
valuable data, is a critical challenge. In this paper, we introduce PriArTa, a
task-agnostic data valuation method that computes the distance between the
distribution of the buyer's existing dataset and that of the seller's dataset,
allowing the buyer to determine how effectively the new data can enhance its
own. PriArTa is communication-efficient: the buyer can evaluate datasets
without needing access to each seller's entire dataset. Instead, the buyer
requests that sellers perform specific preprocessing on their data and send
back the results. Using this information and a scoring metric, the buyer
can evaluate the dataset. The preprocessing is designed to let the buyer
compute the score while preserving the privacy of each seller's dataset,
mitigating the risk of information leakage before the purchase. A key feature
of PriArTa is its robustness to common data transformations, which keeps value
assessments consistent and reduces the risk of purchasing redundant data.
The effectiveness of PriArTa is demonstrated through experiments on real-world
image datasets, showing its ability to perform privacy-preserving,
augmentation-robust data valuation in data marketplaces. |
doi_str_mv | 10.48550/arxiv.2411.00745 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2411.00745 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2411_00745 |
source | arXiv.org |
subjects | Computer Science - Distributed, Parallel, and Cluster Computing; Computer Science - Learning |
title | Private, Augmentation-Robust and Task-Agnostic Data Valuation Approach for Data Marketplace |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-15T19%3A58%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Private,%20Augmentation-Robust%20and%20Task-Agnostic%20Data%20Valuation%20Approach%20for%20Data%20Marketplace&rft.au=Jahani-Nezhad,%20Tayyebeh&rft.date=2024-11-01&rft_id=info:doi/10.48550/arxiv.2411.00745&rft_dat=%3Carxiv_GOX%3E2411_00745%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |