Singing voice separation using mono-channel mask
Published in: | International journal of speech technology 2018-06, Vol.21 (2), p.309-318 |
---|---|
Main authors: | , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Separating the singing voice from a monaural song recording is a highly difficult task. It is nevertheless important because it has many applications, such as singer identification, lyrics recognition, and melody extraction. The difficulty arises from the many musical instruments involved and the time-varying spectral overlap between singing voice and music. The goal of singing voice separation is to extract the singing voice from a given monaural song recording with minimum artefacts and musical interference. We propose a three-stage system for singing voice separation that helps to improve the intelligibility and perceptual quality of the separated output. In the first stage, a modified sub-harmonic summation algorithm finds the pitch of the singing voice and its harmonic components. In this stage, we create a binary mask. In the second stage, frames (i.e., the masked spectral amplitudes) are classified as singing or non-singing using a combination of Gammatone frequency cepstral coefficient (GFCC) and Mel-frequency cepstral coefficient (MFCC) features. Lastly, a mono-channel mask is created and signal amplitude correction is performed using a kurtosis measure. We synthesize the estimate of the singing voice using both the binary mask and the mono-channel mask. The singing voice separated using the mono-channel mask shows an improved GNSDR score. The proposed system is compared with other methods, where it shows excellent improvement in terms of GNSDR, producing higher GNSDR scores on two different datasets. |
---|---|
ISSN: | 1381-2416, 1572-8110 |
DOI: | 10.1007/s10772-018-9509-6 |
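
The first stage described in the summary (pitch estimation followed by a binary mask over the harmonics) can be illustrated with a minimal sketch. This is not the authors' method: it uses plain sub-harmonic summation rather than the modified variant the paper refers to, and the function names, the decay factor of 0.84, the number of harmonics, and the 20 Hz mask half-width are all assumptions chosen for illustration.

```python
import numpy as np

def shs_pitch(mag, freqs, f0_grid, n_harmonics=5, decay=0.84):
    """Plain sub-harmonic summation (illustrative, not the paper's modified variant).

    Scores each F0 candidate by the decay-weighted sum of the spectral
    magnitude at its first n_harmonics harmonics and returns the best one.
    """
    scores = np.zeros(len(f0_grid))
    for h in range(1, n_harmonics + 1):
        # Magnitude at the h-th harmonic of every candidate, by linear
        # interpolation; harmonics above the last bin contribute zero.
        scores += decay ** (h - 1) * np.interp(h * f0_grid, freqs, mag, right=0.0)
    return f0_grid[np.argmax(scores)]

def harmonic_binary_mask(freqs, f0, n_harmonics=5, half_width_hz=20.0):
    """Binary mask that keeps bins within half_width_hz of each harmonic of f0."""
    mask = np.zeros(len(freqs), dtype=bool)
    for h in range(1, n_harmonics + 1):
        mask |= np.abs(freqs - h * f0) <= half_width_hz
    return mask

# Synthetic single-frame magnitude spectrum (hypothetical data):
# five Gaussian peaks at the harmonics of a 220 Hz voice.
freqs = np.arange(0.0, 4000.0, 5.0)            # bin centre frequencies in Hz
mag = sum(np.exp(-0.5 * ((freqs - h * 220.0) / 10.0) ** 2) for h in range(1, 6))

f0_grid = np.arange(80.0, 800.0, 1.0)          # candidate pitches in Hz
f0 = shs_pitch(mag, freqs, f0_grid)
mask = harmonic_binary_mask(freqs, f0)
voice_mag = mag * mask                         # masked spectral amplitudes
```

In a full system the same two steps would run on every frame of a short-time Fourier transform, and the masked amplitudes would then feed the singing/non-singing classification stage.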