Analyzing the Impact of Splicing Artifacts in Partially Fake Speech Signals
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Summary: Speech deepfake detection has recently gained significant attention within the multimedia forensics community. Related issues have also been explored, such as the identification of partially fake signals, i.e., tracks that include both real and fake speech segments. However, generating high-quality spliced audio is not as straightforward as it may appear. Spliced signals are typically created through basic signal concatenation, a process that can introduce noticeable artifacts and make the generated data easier to detect. We analyze spliced audio tracks resulting from signal concatenation, investigate their artifacts, and assess whether such artifacts introduce any bias in existing datasets. Our findings reveal that by analyzing splicing artifacts, we can achieve detection EERs of 6.16% and 7.36% on the PartialSpoof and HAD datasets, respectively, without needing to train any detector. These results underscore the complexities of generating reliable spliced audio data and lead to discussions that can help improve future research in this area.
DOI: 10.48550/arxiv.2408.13784
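As an illustration of the kind of splicing artifact the abstract refers to, the sketch below naively concatenates two waveform segments and scores the sample-to-sample jump at the splice point against the signal's typical jump. This is a minimal, hypothetical Python example using numpy; the synthetic segments and the first-difference scoring heuristic are illustrative assumptions and do not reproduce the paper's analysis or its reported EERs.

```python
# Minimal sketch (not the paper's method): basic signal concatenation can leave
# an abrupt discontinuity at the splice boundary, which a simple first-difference
# measure can highlight. All signals and thresholds here are illustrative.
import numpy as np


def splice(segment_a: np.ndarray, segment_b: np.ndarray) -> np.ndarray:
    """Basic concatenation, as commonly used to build partially fake tracks."""
    return np.concatenate([segment_a, segment_b])


def boundary_discontinuity(signal: np.ndarray, splice_idx: int) -> float:
    """Compare the jump across the splice boundary with the typical jump elsewhere."""
    diffs = np.abs(np.diff(signal))
    local = diffs[splice_idx - 1]        # jump between last sample of A and first of B
    typical = np.median(diffs) + 1e-12   # robust baseline for ordinary sample-to-sample jumps
    return float(local / typical)


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    real = 0.5 * np.sin(2 * np.pi * 220 * t)        # stand-in for a real speech segment
    fake = 0.5 * np.sin(2 * np.pi * 330 * t + 1.0)  # stand-in for a synthetic segment
    spliced = splice(real, fake)
    print("discontinuity score at splice point:", boundary_discontinuity(spliced, len(real)))
```

In this toy setup the score at the splice point is markedly larger than 1, i.e., the boundary jump stands out against the rest of the signal, which is the intuition behind training-free detection of concatenation artifacts.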