Guided data repair

In this paper we present GDR, a Guided Data Repair framework that incorporates user feedback in the cleaning process to enhance and accelerate existing automatic repair techniques while minimizing user involvement. GDR consults the user on the updates that are most likely to be beneficial in improvi...

Detailed description

Bibliographic details
Published in: Proceedings of the VLDB Endowment, 2011-02, Vol. 4 (5), p. 279-289
Main authors: Yakout, Mohamed; Elmagarmid, Ahmed K.; Neville, Jennifer; Ouzzani, Mourad; Ilyas, Ihab F.
Format: Article
Language: eng
Online access: Full text
container_end_page 289
container_issue 5
container_start_page 279
container_title Proceedings of the VLDB Endowment
container_volume 4
creator Yakout, Mohamed
Elmagarmid, Ahmed K.
Neville, Jennifer
Ouzzani, Mourad
Ilyas, Ihab F.
description In this paper we present GDR, a Guided Data Repair framework that incorporates user feedback in the cleaning process to enhance and accelerate existing automatic repair techniques while minimizing user involvement. GDR consults the user on the updates that are most likely to be beneficial in improving data quality. GDR also uses machine learning methods to identify and apply the correct updates directly to the database without the actual involvement of the user on these specific updates. To rank potential updates for consultation with the user, we first group these repairs and quantify the utility of each group using the decision-theory concept of value of information (VOI). We then apply active learning to order updates within a group based on their ability to improve the learned model. User feedback is used to repair the database and to adaptively refine the training set for the model. We empirically evaluate GDR on a real-world dataset and show significant improvement in data quality using our user-guided repair process. We also assess the trade-off between user effort and the resulting data quality.
doi_str_mv 10.14778/1952376.1952378
format Article
fulltext fulltext
identifier ISSN: 2150-8097
ispartof Proceedings of the VLDB Endowment, 2011-02, Vol.4 (5), p.279-289
issn 2150-8097
2150-8097
language eng
recordid cdi_crossref_primary_10_14778_1952376_1952378
source ACM Digital Library
title Guided data repair
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T10%3A40%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Guided%20data%20repair&rft.jtitle=Proceedings%20of%20the%20VLDB%20Endowment&rft.au=Yakout,%20Mohamed&rft.date=2011-02-01&rft.volume=4&rft.issue=5&rft.spage=279&rft.epage=289&rft.pages=279-289&rft.issn=2150-8097&rft.eissn=2150-8097&rft_id=info:doi/10.14778/1952376.1952378&rft_dat=%3Ccrossref%3E10_14778_1952376_1952378%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
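
The two-level ranking described in the abstract (score groups of candidate updates, then order the updates inside each group) can be illustrated with a small sketch. This is not the authors' implementation: the record does not give the VOI formula or the learning model, so the group score below is a stand-in (expected benefit, the sum of `p_correct * benefit`), and the within-group order uses plain uncertainty sampling as a stand-in for the active-learning criterion. The names `Update`, `p_correct`, `benefit`, and `rank_for_consultation` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    """A candidate repair (all fields are illustrative assumptions)."""
    group: str        # grouping key, e.g. the attribute being repaired
    tuple_id: int     # identifier of the tuple the update would change
    new_value: str    # suggested replacement value
    p_correct: float  # model's estimated probability the update is correct
    benefit: float    # assumed quality gain if the update is applied


def group_utility(members):
    """Stand-in for a VOI score: expected benefit summed over the group."""
    return sum(u.p_correct * u.benefit for u in members)


def rank_for_consultation(updates):
    """Return [(group, ordered_updates)], best group first.

    Inside a group, updates the model is least sure about (p_correct
    closest to 0.5) come first, since feedback on them is assumed to
    be the most informative for retraining.
    """
    by_group = defaultdict(list)
    for u in updates:
        by_group[u.group].append(u)

    ordered_groups = sorted(
        by_group.items(),
        key=lambda item: group_utility(item[1]),
        reverse=True,
    )
    return [
        (name, sorted(members, key=lambda u: abs(u.p_correct - 0.5)))
        for name, members in ordered_groups
    ]


if __name__ == "__main__":
    candidates = [
        Update("city", 1, "Chicago", 0.90, 2.0),
        Update("city", 2, "Chicago", 0.55, 1.0),
        Update("zip", 3, "46360", 0.60, 1.0),
    ]
    for name, members in rank_for_consultation(candidates):
        print(name, [u.tuple_id for u in members])
```

In a feedback loop of the kind the abstract outlines, the user's answers on the top-ranked updates would be added to the training set and the probabilities re-estimated before ranking again; that loop is omitted here.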