Vernacular Search Query Translation with Unsupervised Domain Adaptation
Saved in:
Main authors: | , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | With the democratization of e-commerce platforms, an increasingly
diversified user base is opting to shop online. To provide a comfortable and
reliable shopping experience, it is important to enable users to interact with
the platform in the language of their choice. An accurate query translation is
essential for Cross-Lingual Information Retrieval (CLIR) with vernacular
queries. Due to internet-scale operations, e-commerce platforms receive
millions of search queries every day. However, creating a parallel training set
to train an in-domain translation model is cumbersome. This paper proposes an
unsupervised domain adaptation approach to translate search queries without
using any parallel corpus. We use an open-domain translation model (trained on
a public corpus) and adapt it to the query data using only monolingual queries
from the two languages. In addition, fine-tuning with a small labeled set
further improves the results. For demonstration, we show results for Hindi to
English query translation and use the mBART-large-50 model as the baseline to
improve upon. Experimental results show that, without using any parallel
corpus, we obtain an improvement of more than 20 BLEU points over the baseline,
while fine-tuning with a small labeled set of 50k pairs yields an improvement
of more than 27 BLEU points over the baseline. |
---|---|
DOI: | 10.48550/arxiv.2208.03711 |
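
The abstract names mBART-large-50 as the open-domain baseline. As a rough illustration of what that baseline looks like in practice (not the paper's adaptation or fine-tuning method), the following is a minimal sketch of zero-shot Hindi-to-English query translation using the publicly released many-to-many mBART-50 checkpoint from Hugging Face Transformers; the sample query string is made up for illustration.

```python
# Minimal sketch: zero-shot Hindi -> English query translation with the
# open-domain mBART-large-50 baseline, using the public many-to-many
# checkpoint from Hugging Face Transformers. This only illustrates the
# baseline model named in the abstract, not the paper's domain adaptation.
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

model_name = "facebook/mbart-large-50-many-to-many-mmt"
tokenizer = MBart50TokenizerFast.from_pretrained(model_name, src_lang="hi_IN")
model = MBartForConditionalGeneration.from_pretrained(model_name)

query_hi = "लाल साड़ी"  # illustrative e-commerce query, roughly "red saree"
inputs = tokenizer(query_hi, return_tensors="pt")

# Force English as the target language during generation.
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.lang_code_to_id["en_XX"],
    max_length=32,
    num_beams=5,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```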