Cycle GAN-Based Audio Source Separation Using Time–Frequency Masking
Published in: | Circuits, systems, and signal processing, 2023-02, Vol.42 (2), p.1163-1180 |
---|---|
Main authors: | , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Audio source separation is addressed using time–frequency filtering and conditional adversarial networks. First, pitch tracks in the mixed audio are estimated using a multi-pitch tracking algorithm, and a binary mask is generated for each pitch track. Next, the spectrogram of the input audio is filtered in the time–frequency domain using the generated binary masks. The filtered spectrogram is then enhanced using conditional adversarial networks. Individual audio sources are reconstructed from the refined spectrogram using the phase of the mixed signal. Performance is assessed through objective and subjective evaluation and compared with that of the frequency-domain deep clustering model and the time-domain Conv-TasNet model. The proposed model performs competitively with these baseline models. |
---|---|
ISSN: | 0278-081X 1531-5878 |
DOI: | 10.1007/s00034-022-02178-1 |
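
The masking and reconstruction steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the harmonic mask shape, the function names, and the parameters `n_harmonics` and `half_width_bins` are assumptions made for illustration, and the adversarial enhancement stage is left out entirely. The spectrogram is represented as a plain list of frequency-bin rows, each holding one complex value per frame.

```python
import cmath


def harmonic_binary_mask(f0_track, n_bins, sample_rate, n_fft,
                         n_harmonics=10, half_width_bins=1):
    """Build mask[bin][frame]: 1 near the harmonics of f0, else 0.

    Assumption: a source is approximated by narrow bands around
    integer multiples of its pitch. f0 <= 0 marks an unvoiced frame.
    """
    bin_hz = sample_rate / n_fft  # frequency spacing between STFT bins
    mask = [[0] * len(f0_track) for _ in range(n_bins)]
    for t, f0 in enumerate(f0_track):
        if f0 <= 0:
            continue  # no pitch in this frame, mask stays all zero
        for h in range(1, n_harmonics + 1):
            centre = round(h * f0 / bin_hz)
            for k in range(centre - half_width_bins,
                           centre + half_width_bins + 1):
                if 0 <= k < n_bins:
                    mask[k][t] = 1
    return mask


def masked_magnitude(mixture_stft, mask):
    """Time-frequency filtering: keep the magnitude only where mask is 1."""
    return [[abs(x) * m for x, m in zip(row, mask_row)]
            for row, mask_row in zip(mixture_stft, mask)]


def combine_with_mixture_phase(magnitude, mixture_stft):
    """Pair a (possibly enhanced) magnitude with the mixture's phase."""
    return [[cmath.rect(mag, cmath.phase(x)) for mag, x in zip(mag_row, row)]
            for mag_row, row in zip(magnitude, mixture_stft)]
```

In a full pipeline, the output of `masked_magnitude` would be passed through the enhancement network before `combine_with_mixture_phase`, and an inverse STFT would then turn the resulting complex spectrogram back into a waveform.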