TACLR: A Scalable and Efficient Retrieval-based Method for Industrial Product Attribute Value Identification
Product Attribute Value Identification (PAVI) involves identifying attribute values from product profiles, a key task for improving product search, recommendations, and business analytics on e-commerce platforms. However, existing PAVI methods face critical challenges, such as inferring implicit val...
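Main authors: | Su, Yindu; Zou, Huike; Sun, Lin; Zhang, Ting; Yang, Haiyang; Chen, Liyu; Lo, David; Zhang, Qingheng; Han, Shuguang; Chen, Jufeng |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Information Retrieval |
Online access: | Order full text |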
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Su, Yindu; Zou, Huike; Sun, Lin; Zhang, Ting; Yang, Haiyang; Chen, Liyu; Lo, David; Zhang, Qingheng; Han, Shuguang; Chen, Jufeng |
description | Product Attribute Value Identification (PAVI) involves identifying attribute
values from product profiles, a key task for improving product search,
recommendations, and business analytics on e-commerce platforms. However,
existing PAVI methods face critical challenges, such as inferring implicit
values, handling out-of-distribution (OOD) values, and producing normalized
outputs. To address these limitations, we introduce Taxonomy-Aware Contrastive
Learning Retrieval (TACLR), the first retrieval-based method for PAVI. TACLR
formulates PAVI as an information retrieval task by encoding product profiles
and candidate values into embeddings and retrieving values based on their
similarity to the item embedding. It leverages contrastive training with
taxonomy-aware hard negative sampling and employs adaptive inference with
dynamic thresholds. TACLR offers three key advantages: (1) it effectively
handles implicit and OOD values while producing normalized outputs; (2) it
scales to thousands of categories, tens of thousands of attributes, and
millions of values; and (3) it supports efficient inference for high-load
industrial scenarios. Extensive experiments on proprietary and public datasets
validate the effectiveness and efficiency of TACLR. Moreover, it has been
successfully deployed in a real-world e-commerce platform, processing millions
of product listings daily while supporting dynamic, large-scale attribute
taxonomies. |
doi_str_mv | 10.48550/arxiv.2501.03835 |
format | Article |
fullrecord | <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2501_03835</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2501_03835</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2501_038353</originalsourceid><addsrcrecordid>eNqFjsEKgkAURWfTIqoPaNX7AW3MBGknYiQURElbeTpPGpjGGEepv2-S9q0ul3s5HMaWAfe3cRTxNZqXHPxNxAOfh3EYTZkqkvR42UEC1xoVVooAtYCsaWQtSVu4kDWSBlRehR0JOJG9twKa1kCuRd-5FRWcTSv62kJiXa96S3BD1RPkwjGkY6GVrZ6zSYOqo8UvZ2y1z4r04I1e5dPIB5p3-fUrR7_w_-MDLHdGFg</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>TACLR: A Scalable and Efficient Retrieval-based Method for Industrial Product Attribute Value Identification</title><source>arXiv.org</source><creator>Su, Yindu ; Zou, Huike ; Sun, Lin ; Zhang, Ting ; Yang, Haiyang ; Chen, Liyu ; Lo, David ; Zhang, Qingheng ; Han, Shuguang ; Chen, Jufeng</creator><creatorcontrib>Su, Yindu ; Zou, Huike ; Sun, Lin ; Zhang, Ting ; Yang, Haiyang ; Chen, Liyu ; Lo, David ; Zhang, Qingheng ; Han, Shuguang ; Chen, Jufeng</creatorcontrib><description>Product Attribute Value Identification (PAVI) involves identifying attribute
values from product profiles, a key task for improving product search,
recommendations, and business analytics on e-commerce platforms. However,
existing PAVI methods face critical challenges, such as inferring implicit
values, handling out-of-distribution (OOD) values, and producing normalized
outputs. To address these limitations, we introduce Taxonomy-Aware Contrastive
Learning Retrieval (TACLR), the first retrieval-based method for PAVI. TACLR
formulates PAVI as an information retrieval task by encoding product profiles
and candidate values into embeddings and retrieving values based on their
similarity to the item embedding. It leverages contrastive training with
taxonomy-aware hard negative sampling and employs adaptive inference with
dynamic thresholds. TACLR offers three key advantages: (1) it effectively
handles implicit and OOD values while producing normalized outputs; (2) it
scales to thousands of categories, tens of thousands of attributes, and
millions of values; and (3) it supports efficient inference for high-load
industrial scenarios. Extensive experiments on proprietary and public datasets
validate the effectiveness and efficiency of TACLR. Moreover, it has been
successfully deployed in a real-world e-commerce platform, processing millions
of product listings daily while supporting dynamic, large-scale attribute
taxonomies.</description><identifier>DOI: 10.48550/arxiv.2501.03835</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Computation and Language ; Computer Science - Information Retrieval</subject><creationdate>2025-01</creationdate><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2501.03835$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2501.03835$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Su, Yindu</creatorcontrib><creatorcontrib>Zou, Huike</creatorcontrib><creatorcontrib>Sun, Lin</creatorcontrib><creatorcontrib>Zhang, Ting</creatorcontrib><creatorcontrib>Yang, Haiyang</creatorcontrib><creatorcontrib>Chen, Liyu</creatorcontrib><creatorcontrib>Lo, David</creatorcontrib><creatorcontrib>Zhang, Qingheng</creatorcontrib><creatorcontrib>Han, Shuguang</creatorcontrib><creatorcontrib>Chen, Jufeng</creatorcontrib><title>TACLR: A Scalable and Efficient Retrieval-based Method for Industrial Product Attribute Value Identification</title><description>Product Attribute Value Identification (PAVI) involves identifying attribute
values from product profiles, a key task for improving product search,
recommendations, and business analytics on e-commerce platforms. However,
existing PAVI methods face critical challenges, such as inferring implicit
values, handling out-of-distribution (OOD) values, and producing normalized
outputs. To address these limitations, we introduce Taxonomy-Aware Contrastive
Learning Retrieval (TACLR), the first retrieval-based method for PAVI. TACLR
formulates PAVI as an information retrieval task by encoding product profiles
and candidate values into embeddings and retrieving values based on their
similarity to the item embedding. It leverages contrastive training with
taxonomy-aware hard negative sampling and employs adaptive inference with
dynamic thresholds. TACLR offers three key advantages: (1) it effectively
handles implicit and OOD values while producing normalized outputs; (2) it
scales to thousands of categories, tens of thousands of attributes, and
millions of values; and (3) it supports efficient inference for high-load
industrial scenarios. Extensive experiments on proprietary and public datasets
validate the effectiveness and efficiency of TACLR. Moreover, it has been
successfully deployed in a real-world e-commerce platform, processing millions
of product listings daily while supporting dynamic, large-scale attribute
taxonomies.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Computation and Language</subject><subject>Computer Science - Information Retrieval</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2025</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNqFjsEKgkAURWfTIqoPaNX7AW3MBGknYiQURElbeTpPGpjGGEepv2-S9q0ul3s5HMaWAfe3cRTxNZqXHPxNxAOfh3EYTZkqkvR42UEC1xoVVooAtYCsaWQtSVu4kDWSBlRehR0JOJG9twKa1kCuRd-5FRWcTSv62kJiXa96S3BD1RPkwjGkY6GVrZ6zSYOqo8UvZ2y1z4r04I1e5dPIB5p3-fUrR7_w_-MDLHdGFg</recordid><startdate>20250107</startdate><enddate>20250107</enddate><creator>Su, Yindu</creator><creator>Zou, Huike</creator><creator>Sun, Lin</creator><creator>Zhang, Ting</creator><creator>Yang, Haiyang</creator><creator>Chen, Liyu</creator><creator>Lo, David</creator><creator>Zhang, Qingheng</creator><creator>Han, Shuguang</creator><creator>Chen, Jufeng</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20250107</creationdate><title>TACLR: A Scalable and Efficient Retrieval-based Method for Industrial Product Attribute Value Identification</title><author>Su, Yindu ; Zou, Huike ; Sun, Lin ; Zhang, Ting ; Yang, Haiyang ; Chen, Liyu ; Lo, David ; Zhang, Qingheng ; Han, Shuguang ; Chen, Jufeng</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2501_038353</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2025</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Computation and Language</topic><topic>Computer Science - Information Retrieval</topic><toplevel>online_resources</toplevel><creatorcontrib>Su, Yindu</creatorcontrib><creatorcontrib>Zou, Huike</creatorcontrib><creatorcontrib>Sun, Lin</creatorcontrib><creatorcontrib>Zhang, Ting</creatorcontrib><creatorcontrib>Yang, Haiyang</creatorcontrib><creatorcontrib>Chen, 
Liyu</creatorcontrib><creatorcontrib>Lo, David</creatorcontrib><creatorcontrib>Zhang, Qingheng</creatorcontrib><creatorcontrib>Han, Shuguang</creatorcontrib><creatorcontrib>Chen, Jufeng</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Su, Yindu</au><au>Zou, Huike</au><au>Sun, Lin</au><au>Zhang, Ting</au><au>Yang, Haiyang</au><au>Chen, Liyu</au><au>Lo, David</au><au>Zhang, Qingheng</au><au>Han, Shuguang</au><au>Chen, Jufeng</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>TACLR: A Scalable and Efficient Retrieval-based Method for Industrial Product Attribute Value Identification</atitle><date>2025-01-07</date><risdate>2025</risdate><abstract>Product Attribute Value Identification (PAVI) involves identifying attribute
values from product profiles, a key task for improving product search,
recommendations, and business analytics on e-commerce platforms. However,
existing PAVI methods face critical challenges, such as inferring implicit
values, handling out-of-distribution (OOD) values, and producing normalized
outputs. To address these limitations, we introduce Taxonomy-Aware Contrastive
Learning Retrieval (TACLR), the first retrieval-based method for PAVI. TACLR
formulates PAVI as an information retrieval task by encoding product profiles
and candidate values into embeddings and retrieving values based on their
similarity to the item embedding. It leverages contrastive training with
taxonomy-aware hard negative sampling and employs adaptive inference with
dynamic thresholds. TACLR offers three key advantages: (1) it effectively
handles implicit and OOD values while producing normalized outputs; (2) it
scales to thousands of categories, tens of thousands of attributes, and
millions of values; and (3) it supports efficient inference for high-load
industrial scenarios. Extensive experiments on proprietary and public datasets
validate the effectiveness and efficiency of TACLR. Moreover, it has been
successfully deployed in a real-world e-commerce platform, processing millions
of product listings daily while supporting dynamic, large-scale attribute
taxonomies.</abstract><doi>10.48550/arxiv.2501.03835</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2501.03835 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2501_03835 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Information Retrieval |
title | TACLR: A Scalable and Efficient Retrieval-based Method for Industrial Product Attribute Value Identification |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T05%3A39%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=TACLR:%20A%20Scalable%20and%20Efficient%20Retrieval-based%20Method%20for%20Industrial%20Product%20Attribute%20Value%20Identification&rft.au=Su,%20Yindu&rft.date=2025-01-07&rft_id=info:doi/10.48550/arxiv.2501.03835&rft_dat=%3Carxiv_GOX%3E2501_03835%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
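
The description field frames PAVI as retrieval: a product profile and each candidate value are encoded as embeddings, and values are retrieved by their similarity to the item embedding, subject to a threshold. The following is a minimal sketch of that idea only, not the authors' implementation. The function names, the toy three-dimensional embeddings, and the caller-supplied fixed threshold are all assumptions for illustration; the abstract mentions dynamic thresholds but does not say how they are computed.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_value(item_emb, candidates, threshold):
    """Return the candidate value most similar to the item, or None.

    candidates: dict mapping a normalized value string to its embedding.
    threshold:  hypothetical cutoff; below it, no value is assigned.
    """
    best_value, best_score = None, -1.0
    for value, emb in candidates.items():
        score = cosine(item_emb, emb)
        if score > best_score:
            best_value, best_score = value, score
    return best_value if best_score >= threshold else None


# Toy data (assumed): one item embedding and two candidate values
# for a hypothetical "color" attribute.
item = [0.9, 0.1, 0.0]
colors = {"red": [1.0, 0.0, 0.0], "blue": [0.0, 1.0, 0.0]}

print(retrieve_value(item, colors, threshold=0.5))
```

Because candidates come from a fixed value set, the returned value is already normalized, and returning `None` when no candidate clears the threshold is one simple way to model an attribute that has no applicable value.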