Cycle GAN-Based Audio Source Separation Using Time–Frequency Masking


Bibliographic details
Published in: Circuits, Systems, and Signal Processing, 2023-02, Vol. 42 (2), pp. 1163-1180
Main authors: Joseph, Sujo; Rajan, Rajeev
Format: Article
Language: English
Online access: Full text
Description
Summary: Audio source separation is addressed using time–frequency filtering and conditional adversarial networks. First, pitch tracks in the mixed audio are estimated using a multi-pitch tracking algorithm, and a binary mask is generated for each pitch track. Next, the spectrogram of the input audio is filtered in the time–frequency domain using the generated binary masks. The filtered spectrogram is then enhanced using conditional adversarial networks. Individual audio sources are reconstructed from the refined spectrogram using the phase of the mixed signal. Performance is assessed through objective and subjective evaluation and compared with that of the frequency-domain deep clustering model and the time-domain Conv-TasNet model. The proposed model performs competitively with these baseline models.
ISSN: 0278-081X, 1531-5878
DOI: 10.1007/s00034-022-02178-1
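
The masking and reconstruction steps named in the summary can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The record does not say how the binary masks are derived from the pitch tracks, so the harmonic-comb rule used here (marking bins near integer multiples of the pitch) is an assumption, as are the function names, the parameters n_harmonics and width_bins, and the synthetic input. The multi-pitch tracker and the adversarial enhancement stage are omitted.

```python
import numpy as np

def harmonic_binary_mask(f0_track, n_bins, sr, n_fft, n_harmonics=10, width_bins=1):
    """Hypothetical helper: binary mask of shape (n_bins, n_frames).

    Assumption: a bin is set to 1 if it lies within `width_bins` of a
    harmonic of the frame's pitch. Frames with f0 <= 0 stay all zero.
    """
    mask = np.zeros((n_bins, len(f0_track)), dtype=np.float32)
    bin_hz = sr / n_fft  # frequency spacing between adjacent STFT bins
    for t, f0 in enumerate(f0_track):
        if f0 <= 0:
            continue
        for h in range(1, n_harmonics + 1):
            k = int(round(h * f0 / bin_hz))  # nearest bin to the h-th harmonic
            if k >= n_bins:
                break
            lo = max(k - width_bins, 0)
            hi = min(k + width_bins + 1, n_bins)
            mask[lo:hi, t] = 1.0
    return mask

def apply_mask_with_mixture_phase(mix_spec, mask):
    """Filter the mixture magnitude with the mask and reuse the mixture phase."""
    magnitude = np.abs(mix_spec) * mask
    phase = np.angle(mix_spec)
    return magnitude * np.exp(1j * phase)

# Synthetic stand-in for a mixture spectrogram (complex STFT coefficients).
sr, n_fft = 16000, 512
n_bins = n_fft // 2 + 1
f0_track = [200.0, 200.0, 0.0, 250.0]  # one pitch value per frame, 0 = unvoiced
rng = np.random.default_rng(0)
mix_spec = rng.normal(size=(n_bins, 4)) + 1j * rng.normal(size=(n_bins, 4))

mask = harmonic_binary_mask(f0_track, n_bins, sr, n_fft)
source_spec = apply_mask_with_mixture_phase(mix_spec, mask)
```

In a full pipeline, source_spec would be passed through the enhancement network and then through an inverse STFT to obtain a waveform. Reusing the mixture phase avoids having to estimate a phase for each source.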