TACLR: A Scalable and Efficient Retrieval-based Method for Industrial Product Attribute Value Identification

Product Attribute Value Identification (PAVI) involves identifying attribute values from product profiles, a key task for improving product search, recommendations, and business analytics on e-commerce platforms. However, existing PAVI methods face critical challenges, such as inferring implicit val...

Bibliographic details
Main authors:	Su, Yindu; Zou, Huike; Sun, Lin; Zhang, Ting; Yang, Haiyang; Chen, Liyu; Lo, David; Zhang, Qingheng; Han, Shuguang; Chen, Jufeng
Format:	Article
Language:	eng
Subjects:	Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Information Retrieval

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Su, Yindu; Zou, Huike; Sun, Lin; Zhang, Ting; Yang, Haiyang; Chen, Liyu; Lo, David; Zhang, Qingheng; Han, Shuguang; Chen, Jufeng
description	Product Attribute Value Identification (PAVI) involves identifying attribute values from product profiles, a key task for improving product search, recommendations, and business analytics on e-commerce platforms. However, existing PAVI methods face critical challenges, such as inferring implicit values, handling out-of-distribution (OOD) values, and producing normalized outputs. To address these limitations, we introduce Taxonomy-Aware Contrastive Learning Retrieval (TACLR), the first retrieval-based method for PAVI. TACLR formulates PAVI as an information retrieval task by encoding product profiles and candidate values into embeddings and retrieving values based on their similarity to the item embedding. It leverages contrastive training with taxonomy-aware hard negative sampling and employs adaptive inference with dynamic thresholds. TACLR offers three key advantages: (1) it effectively handles implicit and OOD values while producing normalized outputs; (2) it scales to thousands of categories, tens of thousands of attributes, and millions of values; and (3) it supports efficient inference for high-load industrial scenarios. Extensive experiments on proprietary and public datasets validate the effectiveness and efficiency of TACLR. Moreover, it has been successfully deployed in a real-world e-commerce platform, processing millions of product listings daily while supporting dynamic, large-scale attribute taxonomies.
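The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy three-dimensional embeddings, the function names `cosine` and `retrieve_value`, and the use of a per-attribute "null" embedding as the dynamic threshold are all assumptions made for illustration. The abstract states only that values are retrieved by similarity to the item embedding and that inference uses dynamic thresholds.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_value(item_emb, candidates, null_emb):
    """Return the candidate value most similar to the item, or None.

    ASSUMPTION: the threshold is the item's similarity to a hypothetical
    per-attribute "null" embedding, so it changes with every item and
    attribute. The source does not specify how its thresholds are computed.
    """
    threshold = cosine(item_emb, null_emb)
    best_value, best_score = None, threshold
    for value, emb in candidates.items():
        score = cosine(item_emb, emb)
        if score > best_score:
            best_value, best_score = value, score
    return best_value


# Hypothetical embeddings for the candidate values of a "color" attribute.
COLOR_CANDIDATES = {
    "red": [1.0, 0.0, 0.0],
    "blue": [0.0, 1.0, 0.0],
}
# Hypothetical "null" embedding for the same attribute.
COLOR_NULL = [0.5, 0.5, 0.7]

if __name__ == "__main__":
    # An item close to "red" yields a value; an unrelated item yields None.
    print(retrieve_value([0.9, 0.1, 0.0], COLOR_CANDIDATES, COLOR_NULL))
    print(retrieve_value([0.0, 0.0, 1.0], COLOR_CANDIDATES, COLOR_NULL))
```

Because the output is always chosen from the candidate set, it is normalized by construction, and a value never seen in training can still be returned as long as it has an embedding. Comparing against a per-item reference score rather than one global cutoff is only one plausible reading of "dynamic".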
doi_str_mv	10.48550/arxiv.2501.03835
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2501_03835</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2501_03835</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2501_038353</originalsourceid><addsrcrecordid>eNqFjsEKgkAURWfTIqoPaNX7AW3MBGknYiQURElbeTpPGpjGGEepv2-S9q0ul3s5HMaWAfe3cRTxNZqXHPxNxAOfh3EYTZkqkvR42UEC1xoVVooAtYCsaWQtSVu4kDWSBlRehR0JOJG9twKa1kCuRd-5FRWcTSv62kJiXa96S3BD1RPkwjGkY6GVrZ6zSYOqo8UvZ2y1z4r04I1e5dPIB5p3-fUrR7_w_-MDLHdGFg</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>TACLR: A Scalable and Efficient Retrieval-based Method for Industrial Product Attribute Value Identification</title><source>arXiv.org</source><creator>Su, Yindu ; Zou, Huike ; Sun, Lin ; Zhang, Ting ; Yang, Haiyang ; Chen, Liyu ; Lo, David ; Zhang, Qingheng ; Han, Shuguang ; Chen, Jufeng</creator><creatorcontrib>Su, Yindu ; Zou, Huike ; Sun, Lin ; Zhang, Ting ; Yang, Haiyang ; Chen, Liyu ; Lo, David ; Zhang, Qingheng ; Han, Shuguang ; Chen, Jufeng</creatorcontrib><description>Product Attribute Value Identification (PAVI) involves identifying attribute values from product profiles, a key task for improving product search, recommendations, and business analytics on e-commerce platforms. However, existing PAVI methods face critical challenges, such as inferring implicit values, handling out-of-distribution (OOD) values, and producing normalized outputs. To address these limitations, we introduce Taxonomy-Aware Contrastive Learning Retrieval (TACLR), the first retrieval-based method for PAVI. TACLR formulates PAVI as an information retrieval task by encoding product profiles and candidate values into embeddings and retrieving values based on their similarity to the item embedding. 
It leverages contrastive training with taxonomy-aware hard negative sampling and employs adaptive inference with dynamic thresholds. TACLR offers three key advantages: (1) it effectively handles implicit and OOD values while producing normalized outputs; (2) it scales to thousands of categories, tens of thousands of attributes, and millions of values; and (3) it supports efficient inference for high-load industrial scenarios. Extensive experiments on proprietary and public datasets validate the effectiveness and efficiency of TACLR. Moreover, it has been successfully deployed in a real-world e-commerce platform, processing millions of product listings daily while supporting dynamic, large-scale attribute taxonomies.</description><identifier>DOI: 10.48550/arxiv.2501.03835</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Computation and Language ; Computer Science - Information Retrieval</subject><creationdate>2025-01</creationdate><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2501.03835$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2501.03835$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Su, Yindu</creatorcontrib><creatorcontrib>Zou, Huike</creatorcontrib><creatorcontrib>Sun, Lin</creatorcontrib><creatorcontrib>Zhang, Ting</creatorcontrib><creatorcontrib>Yang, Haiyang</creatorcontrib><creatorcontrib>Chen, Liyu</creatorcontrib><creatorcontrib>Lo, David</creatorcontrib><creatorcontrib>Zhang, Qingheng</creatorcontrib><creatorcontrib>Han, 
Shuguang</creatorcontrib><creatorcontrib>Chen, Jufeng</creatorcontrib><title>TACLR: A Scalable and Efficient Retrieval-based Method for Industrial Product Attribute Value Identification</title><description>Product Attribute Value Identification (PAVI) involves identifying attribute values from product profiles, a key task for improving product search, recommendations, and business analytics on e-commerce platforms. However, existing PAVI methods face critical challenges, such as inferring implicit values, handling out-of-distribution (OOD) values, and producing normalized outputs. To address these limitations, we introduce Taxonomy-Aware Contrastive Learning Retrieval (TACLR), the first retrieval-based method for PAVI. TACLR formulates PAVI as an information retrieval task by encoding product profiles and candidate values into embeddings and retrieving values based on their similarity to the item embedding. It leverages contrastive training with taxonomy-aware hard negative sampling and employs adaptive inference with dynamic thresholds. TACLR offers three key advantages: (1) it effectively handles implicit and OOD values while producing normalized outputs; (2) it scales to thousands of categories, tens of thousands of attributes, and millions of values; and (3) it supports efficient inference for high-load industrial scenarios. Extensive experiments on proprietary and public datasets validate the effectiveness and efficiency of TACLR. 
Moreover, it has been successfully deployed in a real-world e-commerce platform, processing millions of product listings daily while supporting dynamic, large-scale attribute taxonomies.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Computation and Language</subject><subject>Computer Science - Information Retrieval</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2025</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNqFjsEKgkAURWfTIqoPaNX7AW3MBGknYiQURElbeTpPGpjGGEepv2-S9q0ul3s5HMaWAfe3cRTxNZqXHPxNxAOfh3EYTZkqkvR42UEC1xoVVooAtYCsaWQtSVu4kDWSBlRehR0JOJG9twKa1kCuRd-5FRWcTSv62kJiXa96S3BD1RPkwjGkY6GVrZ6zSYOqo8UvZ2y1z4r04I1e5dPIB5p3-fUrR7_w_-MDLHdGFg</recordid><startdate>20250107</startdate><enddate>20250107</enddate><creator>Su, Yindu</creator><creator>Zou, Huike</creator><creator>Sun, Lin</creator><creator>Zhang, Ting</creator><creator>Yang, Haiyang</creator><creator>Chen, Liyu</creator><creator>Lo, David</creator><creator>Zhang, Qingheng</creator><creator>Han, Shuguang</creator><creator>Chen, Jufeng</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20250107</creationdate><title>TACLR: A Scalable and Efficient Retrieval-based Method for Industrial Product Attribute Value Identification</title><author>Su, Yindu ; Zou, Huike ; Sun, Lin ; Zhang, Ting ; Yang, Haiyang ; Chen, Liyu ; Lo, David ; Zhang, Qingheng ; Han, Shuguang ; Chen, Jufeng</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2501_038353</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2025</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Computation and Language</topic><topic>Computer Science - Information Retrieval</topic><toplevel>online_resources</toplevel><creatorcontrib>Su, Yindu</creatorcontrib><creatorcontrib>Zou, 
Huike</creatorcontrib><creatorcontrib>Sun, Lin</creatorcontrib><creatorcontrib>Zhang, Ting</creatorcontrib><creatorcontrib>Yang, Haiyang</creatorcontrib><creatorcontrib>Chen, Liyu</creatorcontrib><creatorcontrib>Lo, David</creatorcontrib><creatorcontrib>Zhang, Qingheng</creatorcontrib><creatorcontrib>Han, Shuguang</creatorcontrib><creatorcontrib>Chen, Jufeng</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Su, Yindu</au><au>Zou, Huike</au><au>Sun, Lin</au><au>Zhang, Ting</au><au>Yang, Haiyang</au><au>Chen, Liyu</au><au>Lo, David</au><au>Zhang, Qingheng</au><au>Han, Shuguang</au><au>Chen, Jufeng</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>TACLR: A Scalable and Efficient Retrieval-based Method for Industrial Product Attribute Value Identification</atitle><date>2025-01-07</date><risdate>2025</risdate><abstract>Product Attribute Value Identification (PAVI) involves identifying attribute values from product profiles, a key task for improving product search, recommendations, and business analytics on e-commerce platforms. However, existing PAVI methods face critical challenges, such as inferring implicit values, handling out-of-distribution (OOD) values, and producing normalized outputs. To address these limitations, we introduce Taxonomy-Aware Contrastive Learning Retrieval (TACLR), the first retrieval-based method for PAVI. TACLR formulates PAVI as an information retrieval task by encoding product profiles and candidate values into embeddings and retrieving values based on their similarity to the item embedding. It leverages contrastive training with taxonomy-aware hard negative sampling and employs adaptive inference with dynamic thresholds. 
TACLR offers three key advantages: (1) it effectively handles implicit and OOD values while producing normalized outputs; (2) it scales to thousands of categories, tens of thousands of attributes, and millions of values; and (3) it supports efficient inference for high-load industrial scenarios. Extensive experiments on proprietary and public datasets validate the effectiveness and efficiency of TACLR. Moreover, it has been successfully deployed in a real-world e-commerce platform, processing millions of product listings daily while supporting dynamic, large-scale attribute taxonomies.</abstract><doi>10.48550/arxiv.2501.03835</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2501.03835
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2501_03835
source	arXiv.org
subjects	Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Information Retrieval
title	TACLR: A Scalable and Efficient Retrieval-based Method for Industrial Product Attribute Value Identification
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T05%3A39%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=TACLR:%20A%20Scalable%20and%20Efficient%20Retrieval-based%20Method%20for%20Industrial%20Product%20Attribute%20Value%20Identification&rft.au=Su,%20Yindu&rft.date=2025-01-07&rft_id=info:doi/10.48550/arxiv.2501.03835&rft_dat=%3Carxiv_GOX%3E2501_03835%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true