SignAttention: On the Interpretability of Transformer Models for Sign Language Translation
Format: Article
Language: English
Abstract: This paper presents the first comprehensive interpretability analysis of a Transformer-based Sign Language Translation (SLT) model, focusing on the translation from video-based Greek Sign Language to glosses and text. Leveraging the Greek Sign Language Dataset, we examine the attention mechanisms within the model to understand how it processes and aligns visual input with sequential glosses. Our analysis reveals that the model attends to clusters of frames rather than individual ones, and that a diagonal alignment pattern emerges between poses and glosses which becomes less distinct as the number of glosses increases. We also explore the relative contributions of cross-attention and self-attention at each decoding step, finding that the model initially relies on the video frames but shifts its focus to previously predicted tokens as the translation progresses. This work contributes to a deeper understanding of SLT models, paving the way for more transparent and reliable translation systems essential for real-world applications.
DOI: 10.48550/arxiv.2410.14506
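
As an illustration of the kind of analysis the abstract describes, the sketch below averages a decoder's cross-attention maps over layers and heads and plots them as a heatmap, which is one simple way to look for the pose-gloss diagonal alignment pattern. This is not the authors' code: the tensor layout, the `cross_attn` input, and the `plot_pose_gloss_alignment` helper are assumptions standing in for attention weights collected from a hypothetical encoder-decoder SLT model.

```python
# Minimal sketch (assumed shapes, not the paper's actual pipeline):
# visualize pose-gloss alignment by averaging decoder cross-attention.
import torch
import matplotlib.pyplot as plt

def plot_pose_gloss_alignment(cross_attn: torch.Tensor) -> None:
    """cross_attn: (num_layers, num_heads, num_glosses, num_frames),
    softmax weights assumed collected from an encoder-decoder SLT model."""
    # Average over layers and heads; each row (one decoded gloss) still
    # sums to 1 over the input video frames, so a bright diagonal band
    # indicates a roughly monotonic pose-to-gloss alignment.
    avg = cross_attn.mean(dim=(0, 1))  # (num_glosses, num_frames)
    plt.imshow(avg.detach().cpu().numpy(), aspect="auto", cmap="viridis")
    plt.xlabel("video frame (pose) index")
    plt.ylabel("decoded gloss index")
    plt.title("Mean decoder cross-attention")
    plt.colorbar(label="attention weight")
    plt.show()

# Usage with random weights standing in for a real model's attention:
if __name__ == "__main__":
    dummy = torch.softmax(torch.randn(2, 4, 10, 60), dim=-1)
    plot_pose_gloss_alignment(dummy)
```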