SECP: A Speech Enhancement-Based Curation Pipeline For Scalable Acquisition Of Clean Speech
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Summary: | As more speech technologies rely on a supervised deep learning approach with
clean speech as the ground truth, a methodology to onboard such speech at scale
is needed. However, this approach needs to minimize the dependency on human
listening and annotation, only requiring a human-in-the-loop when needed. In
this paper, we address this issue by outlining the Speech Enhancement-based
Curation Pipeline (SECP), which serves as a framework to onboard clean speech.
This clean speech can then train a speech enhancement model, which can further
refine the original dataset and thus close the iterative loop. By running two
iterative rounds, we observe that enhanced output used as ground truth does not
degrade model performance according to $\Delta_{PESQ}$, a metric used in this
paper. We also show through comparative mean opinion score (CMOS) based
subjective tests that both the highest and lowest bounds of the refined data
are perceptually better than the original data. |
---|---|
DOI: | 10.48550/arxiv.2402.12482 |
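
The iterative loop described in the summary (select clean speech, train an enhancement model on it, use that model to refine the dataset, repeat) can be pictured with the minimal Python sketch below. It is an illustration under stated assumptions, not the authors' implementation: the names `curate`, `train`, and `is_clean` are hypothetical, clips are stand-in lists of samples, and applying the model to the current data in each round is an assumption the summary does not confirm.

```python
from typing import Callable, List, Tuple

Clip = List[float]                      # stand-in for an audio clip (assumption)
Enhancer = Callable[[Clip], Clip]       # maps a clip to a refined clip


def curate(
    clips: List[Clip],
    train: Callable[[List[Clip]], Enhancer],   # hypothetical training routine
    is_clean: Callable[[Clip], bool],          # hypothetical cleanliness check
    rounds: int = 2,                           # the summary reports two rounds
) -> Tuple[List[Clip], Enhancer]:
    """Refine a dataset with a model trained on its currently clean subset."""
    data = [list(c) for c in clips]

    def enhance(clip: Clip) -> Clip:
        # Identity placeholder, used only if no model is ever trained.
        return clip

    for _ in range(rounds):
        clean = [c for c in data if is_clean(c)]
        if not clean:
            break                              # nothing to train on yet
        enhance = train(clean)                 # train on the clean subset
        data = [enhance(c) for c in data]      # refined output feeds next round
    return data, enhance


if __name__ == "__main__":
    # Toy stand-ins: a clip counts as "clean" when its peak is small, and the
    # "model" simply halves every sample. Neither reflects a real system.
    toy_clips = [[0.1, -0.1], [0.8, -0.8]]
    refined, _ = curate(
        toy_clips,
        train=lambda clean: (lambda c: [x * 0.5 for x in c]),
        is_clean=lambda c: max(abs(x) for x in c) <= 0.5,
    )
    print(refined)
```

Passing `train` and `is_clean` as callables keeps the loop independent of any particular enhancement model or quality metric. A human review step, which the summary says is required only when needed, would fit most naturally inside `is_clean`.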