SPGM: Prioritizing Local Features for enhanced speech separation performance
Dual-path is a popular architecture for speech separation models (e.g. Sepformer) which splits long sequences into overlapping chunks for its intra- and inter-blocks that separately model intra-chunk local features and inter-chunk global relationships. However, it has been found that inter-blocks, w...
Gespeichert in:
Hauptverfasser: | , , , , , , , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Dual-path is a popular architecture for speech separation models (e.g.
Sepformer) which splits long sequences into overlapping chunks for its intra-
and inter-blocks that separately model intra-chunk local features and
inter-chunk global relationships. However, it has been found that inter-blocks,
which comprise half a dual-path model's parameters, contribute minimally to
performance. Thus, we propose the Single-Path Global Modulation (SPGM) block to
replace inter-blocks. SPGM is named after its structure consisting of a
parameter-free global pooling module followed by a modulation module comprising
only 2% of the model's total parameters. The SPGM block allows all transformer
layers in the model to be dedicated to local feature modelling, making the
overall model single-path. SPGM achieves 22.1 dB SI-SDRi on WSJ0-2Mix and 20.4
dB SI-SDRi on Libri2Mix, exceeding the performance of Sepformer by 0.5 dB and
0.3 dB respectively and matches the performance of recent SOTA models with up
to 8 times fewer parameters. Model and weights are available at
huggingface.co/yipjiaqi/spgm |
---|---|
DOI: | 10.48550/arxiv.2309.12608 |