Hiformer: Sequence Modeling Networks With Hierarchical Attention Mechanisms
Published in: | IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023, Vol. 31, pp. 3993-4003 |
Main authors: | , , , , , |
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Order full text |
Abstract: | The attention-based encoder-decoder structure, such as the Transformer, has achieved state-of-the-art performance on various sequence modeling tasks, e.g., machine translation (MT) and automatic speech recognition (ASR), benefiting from the superior capability of the layer-wise self-attention mechanism in the encoder/decoder to access long-distance contextual information. Recently, analyses of Transformer layers have shown that different levels of information, e.g., phoneme level, word level, and semantic level, are represented at different layers. Effectively integrating information from these various levels is important for structured prediction. However, the self-attention in the conventional Transformer structure only performs intra-layer integration and does not explicitly model inter-layer information relationships. Likewise, attention across the encoder and decoder (cross-coder) only attends to the top encoder layer and ignores the intermediate layers. In this article, we propose a sequence modeling structure equipped with a hierarchical attention mechanism, named Hiformer, that exploits inter-layer and cross-coder hierarchical information to improve structured prediction performance. Extensive experiments conducted on both MT and ASR tasks demonstrate the effectiveness of the proposed Hiformer model. |
ISSN: | 2329-9290, 2329-9304 |
DOI: | 10.1109/TASLP.2023.3313428 |
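
The abstract above describes attention that integrates information both across encoder layers (inter-layer) and between the decoder and all encoder layers (cross-coder), rather than only the top encoder layer. The snippet below is a minimal, hypothetical PyTorch sketch of that general idea, not the paper's actual Hiformer architecture: the module name `HierarchicalCrossAttention`, the shared per-layer attention, and the layer-level fusion step are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's exact method): decoder states attend to
# every encoder layer's output, then a second, layer-level attention fuses the
# per-layer contexts into one representation per target position.
import torch
import torch.nn as nn


class HierarchicalCrossAttention(nn.Module):
    """Attend to all encoder layers, then fuse across layers (illustrative)."""

    def __init__(self, d_model: int, n_heads: int, n_enc_layers: int):
        super().__init__()
        # One cross-attention module shared across encoder layers (assumption).
        self.token_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Layer-level attention over the n_enc_layers per-layer summaries.
        self.layer_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.n_enc_layers = n_enc_layers

    def forward(self, dec_states, enc_layer_outputs):
        # dec_states: (batch, tgt_len, d_model)
        # enc_layer_outputs: list of n_enc_layers tensors, each (batch, src_len, d_model)
        per_layer = []
        for enc_out in enc_layer_outputs:
            ctx, _ = self.token_attn(dec_states, enc_out, enc_out)
            per_layer.append(ctx)                      # (batch, tgt_len, d_model)
        stacked = torch.stack(per_layer, dim=2)        # (batch, tgt_len, n_layers, d_model)
        b, t, l, d = stacked.shape
        # Treat the layer axis as a short sequence and attend over it.
        q = dec_states.reshape(b * t, 1, d)            # one query per target position
        kv = stacked.reshape(b * t, l, d)
        fused, _ = self.layer_attn(q, kv, kv)
        return fused.reshape(b, t, d)


if __name__ == "__main__":
    layers, batch, src_len, tgt_len, d_model = 6, 2, 11, 7, 64
    enc_outs = [torch.randn(batch, src_len, d_model) for _ in range(layers)]
    dec = torch.randn(batch, tgt_len, d_model)
    module = HierarchicalCrossAttention(d_model, n_heads=4, n_enc_layers=layers)
    print(module(dec, enc_outs).shape)  # torch.Size([2, 7, 64])
```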