Enhancing Spoofing Speech Detection Using Rhythm Information
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Summary: | Current spoofing speech detection systems need more convincing evidence.
This paper analyzes the flaws in the rhythm information of TTS-generated
speech in order to increase the reliability of detection systems. TTS models
take text as input and use acoustic models to predict rhythm information, which
introduces artifacts into that rhythm information. After the vocal tract
response is filtered out, the remaining glottal flow, which carries the rhythm
information, can still be used to detect TTS-generated speech. Based on these
analyses, a rhythm perturbation module is proposed to enhance the
copy-synthesis data augmentation method. Fake utterances generated by the
proposed method force the detection model to attend to the artifacts in rhythm
information and effectively improve the ability of anti-spoofing
countermeasures to detect TTS-generated speech. |
---|---|
DOI: | 10.48550/arxiv.2310.12014 |
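
The summary mentions filtering out the vocal tract response so that only the glottal flow remains. The record does not say which filtering method the authors use. One common approximation is linear-prediction (LPC) inverse filtering, and the block below is a minimal sketch of that technique, offered as an assumption rather than as the paper's method. The function names (`autocorrelation`, `levinson_durbin`, `inverse_filter`) are hypothetical and chosen only for illustration.

```python
def autocorrelation(frame, max_lag):
    """Return autocorrelation values r[0..max_lag] for one frame of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations with the Levinson-Durbin recursion.

    Returns [1, a1, ..., a_order], the coefficients of the inverse filter
    A(z) = 1 + a1*z^-1 + ... + a_order*z^-order.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate frame (e.g. silence): stop early
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this step
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a


def inverse_filter(frame, a):
    """Apply the FIR filter A(z) to the frame and return the LPC residual.

    The all-pole model 1/A(z) stands in for the vocal tract (assumption),
    so the residual is a rough estimate of the excitation signal.
    """
    return [sum(a[k] * frame[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(frame))]
```

In practice this would be applied per short frame (for example 20 to 30 ms) with an LPC order chosen for the sampling rate; both choices are assumptions and are not taken from the record above.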