PUMGPT: A Large Vision-Language Model for Product Understanding
Main authors: | , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | E-commerce platforms benefit from accurate product understanding to enhance
user experience and operational efficiency. Traditional methods often focus on
isolated tasks such as attribute extraction or categorization, so they adapt
poorly to evolving tasks and are hard to use with
noisy data from the internet. Current Large Vision Language Models (LVLMs) lack
domain-specific fine-tuning, thus falling short in precision and instruction
following. To address these issues, we introduce PumGPT, the first e-commerce
specialized LVLM designed for multi-modal product understanding tasks. We
collected and curated a dataset of over one million products from AliExpress,
filtering out non-inferable attributes using a universal hallucination
detection framework, resulting in 663k high-quality data samples. PumGPT
focuses on five essential tasks aimed at enhancing workflows for e-commerce
platforms and retailers. We also introduce PumBench, a benchmark to evaluate
product understanding across LVLMs. Our experiments show that PumGPT
outperforms five other open-source LVLMs and GPT-4V in product understanding
tasks. We also conduct extensive analytical experiments to examine why
PumGPT performs better, demonstrating the necessity for a specialized model
in the e-commerce domain. |
---|---|
DOI: | 10.48550/arxiv.2308.09568 |