SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation
Spurred by the demand for interpretable models, research on eXplainable AI for language technologies has experienced significant growth, with feature attribution methods emerging as a cornerstone of this progress. While prior work in NLP explored such methods for classification tasks and textual app...
Gespeichert in:
Hauptverfasser: | , , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Spurred by the demand for interpretable models, research on eXplainable AI
for language technologies has experienced significant growth, with feature
attribution methods emerging as a cornerstone of this progress. While prior
work in NLP explored such methods for classification tasks and textual
applications, explainability intersecting generation and speech is lagging,
with existing techniques failing to account for the autoregressive nature of
state-of-the-art models and to provide fine-grained, phonetically meaningful
explanations. We address this gap by introducing Spectrogram Perturbation for
Explainable Speech-to-text Generation (SPES), a feature attribution technique
applicable to sequence generation tasks with autoregressive models. SPES
provides explanations for each predicted token based on both the input
spectrogram and the previously generated tokens. Extensive evaluation on speech
recognition and translation demonstrates that SPES generates explanations that
are faithful and plausible to humans. |
---|---|
DOI: | 10.48550/arxiv.2411.01710 |