Synthesizing products for online catalogs

A comprehensive product catalog is essential to the success of Product Search engines and shopping sites such as Yahoo! Shopping, Google Product Search, and Bing Shopping. Given the large number of products and the speed at which they are released to the market, keeping catalogs up-to-date becomes a...

Bibliographic details
Published in: Proceedings of the VLDB Endowment 2011-04, Vol.4 (7), p.409-418
Main authors: Nguyen, Hoa, Fuxman, Ariel, Paparizos, Stelios, Freire, Juliana, Agrawal, Rakesh
Format: Article
Language: eng
Online access: Full text
container_end_page 418
container_issue 7
container_start_page 409
container_title Proceedings of the VLDB Endowment
container_volume 4
creator Nguyen, Hoa
Fuxman, Ariel
Paparizos, Stelios
Freire, Juliana
Agrawal, Rakesh
description A comprehensive product catalog is essential to the success of Product Search engines and shopping sites such as Yahoo! Shopping, Google Product Search, and Bing Shopping. Given the large number of products and the speed at which they are released to the market, keeping catalogs up-to-date becomes a challenging task, calling for the need of automated techniques. In this paper, we introduce the problem of product synthesis, a key component of catalog creation and maintenance. Given a set of offers advertised by merchants, the goal is to identify new products and add them to the catalog, together with their (structured) attributes. A fundamental challenge in product synthesis is the scale of the problem. A Product Search engine receives data from thousands of merchants about millions of products; the product taxonomy contains thousands of categories, where each category has a different schema; and merchants use representations for products that are different from the ones used in the catalog of the Product Search engine. We propose a system that provides an end-to-end solution to the product synthesis problem, and addresses issues involved in data extraction from offers, schema reconciliation, and data fusion. For the schema reconciliation component, we developed a novel and scalable technique for schema matching which leverages knowledge about previously-known instance-level associations between offers and products; and it is trained using automatically created training sets (no manually-labeled data is needed). We present an experimental evaluation using data from Bing Shopping for more than 800K offers, a thousand merchants, and 400 categories. The evaluation confirms that our approach is able to automatically generate a large number of accurate product specifications. Furthermore, the evaluation shows that our schema reconciliation component outperforms state-of-the-art schema matching techniques in terms of precision and recall.
doi_str_mv 10.14778/1988776.1988777
format Article
eissn 2150-8097
date 2011-04-01
genre article
tpages 10
lds50 peer_reviewed
oa free_for_read
sourceid crossref
sourcetype Aggregation Database
collection CrossRef
fulltext fulltext
identifier ISSN: 2150-8097
ispartof Proceedings of the VLDB Endowment, 2011-04, Vol.4 (7), p.409-418
issn 2150-8097
2150-8097
language eng
recordid cdi_crossref_primary_10_14778_1988776_1988777
source ACM Digital Library Complete
title Synthesizing products for online catalogs
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-19T13%3A05%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Synthesizing%20products%20for%20online%20catalogs&rft.jtitle=Proceedings%20of%20the%20VLDB%20Endowment&rft.au=Nguyen,%20Hoa&rft.date=2011-04-01&rft.volume=4&rft.issue=7&rft.spage=409&rft.epage=418&rft.pages=409-418&rft.issn=2150-8097&rft.eissn=2150-8097&rft_id=info:doi/10.14778/1988776.1988777&rft_dat=%3Ccrossref%3E10_14778_1988776_1988777%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true