Datastore Distillation for Nearest Neighbor Machine Translation

Bibliographic Details
Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024, Vol. 32, pp. 807-817
Authors: Dai, Yuhan; Zhang, Zhirui; Du, Yichao; Liu, Shengcai; Liu, Lemao; Xu, Tong
Format: Article
Language: English
Description
Abstract: Nearest neighbor machine translation (kNN-MT) is a promising approach to enhancing translation quality by equipping pre-trained neural machine translation (NMT) models with nearest neighbor retrieval. Despite its success, kNN-MT typically requires ample space to store its token-level datastore, making it less practical on edge devices or in online scenarios. In this paper, inspired by the concept of knowledge distillation, we provide a new perspective on easing the storage overhead through datastore distillation, which we formalize as a constrained optimization problem. We further design a novel model-agnostic iterative nearest neighbor merging method for the datastore distillation problem to obtain an effective and efficient solution. Experiments on three benchmark datasets indicate that our approach not only reduces the volume of the datastore by up to 50% without significant performance degradation, but also outperforms other baselines by a large margin at the same compression rate. A further experiment on WikiText-103 demonstrates the effectiveness of our method on the language modeling task.
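For readers unfamiliar with the mechanism the abstract refers to, the sketch below (not taken from the paper) illustrates the standard kNN-MT retrieval-and-interpolation step and a deliberately simplified nearest neighbor merging pass that roughly halves a token-level datastore. The function names, distance temperature, interpolation weight, and greedy same-token pairing are all illustrative assumptions; the paper's method solves a constrained optimization problem with an iterative merging procedure rather than this single greedy pass.

```python
import numpy as np

def knn_mt_distribution(query, keys, values, vocab_size, k=8, temperature=10.0):
    """Retrieve the k nearest datastore entries for one decoder hidden state
    and turn them into a probability distribution over the target vocabulary."""
    dists = np.sum((keys - query) ** 2, axis=1)      # squared L2 distance to every key
    nn = np.argsort(dists)[:k]                       # indices of the k nearest keys
    logits = -dists[nn] / temperature                # closer entries get higher weight
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    for idx, w in zip(nn, weights):
        p_knn[values[idx]] += w                      # aggregate weight per target token
    return p_knn

def merge_nearest_pairs(keys, values):
    """Toy compression pass: greedily pair each entry with its nearest
    unused neighbor that shares the same target token and replace the pair
    by their mean key, roughly halving the datastore."""
    new_keys, new_values = [], []
    for tok in np.unique(values):
        idx = np.where(values == tok)[0]
        pts = keys[idx]
        used = np.zeros(len(idx), dtype=bool)
        for i in range(len(idx)):
            if used[i]:
                continue
            used[i] = True
            rest = np.where(~used)[0]
            if rest.size:
                d = np.sum((pts[rest] - pts[i]) ** 2, axis=1)
                j = rest[int(np.argmin(d))]
                used[j] = True
                new_keys.append((pts[i] + pts[j]) / 2.0)   # merged key
            else:
                new_keys.append(pts[i])                     # leftover entry kept as-is
            new_values.append(tok)
    return np.stack(new_keys), np.array(new_values)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, vocab, n = 16, 100, 1000
    keys = rng.normal(size=(n, d)).astype(np.float32)       # decoder hidden states
    values = rng.integers(0, vocab, size=n)                 # target tokens
    query = rng.normal(size=d).astype(np.float32)
    p_nmt = np.full(vocab, 1.0 / vocab)                     # stand-in for the NMT softmax
    p = 0.5 * p_nmt + 0.5 * knn_mt_distribution(query, keys, values, vocab)
    small_keys, small_values = merge_nearest_pairs(keys, values)
    print(n, "->", len(small_values), "entries; interpolated distribution sums to", round(p.sum(), 3))
```

Restricting merges to entries that share a target token keeps every stored token reachable after compression, which is one plausible reading of the "nearest neighbor merging" idea; the actual constraint set and iteration scheme are specified in the paper itself.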
ISSN: 2329-9290, 2329-9304
DOI: 10.1109/TASLP.2023.3337633