Detection of audio copy-move-forgery with novel feature matching on Mel spectrogram
Published in: | Expert Systems with Applications, March 2023, Vol. 213, Article 118963 |
---|---|
Authors: | , , |
Format: | Article |
Language: | English |
Abstract: | • Audio copy-move forgery detection and localization. • Detection of forgeries even when post-processing operations are applied to the forged speech to hide traces of forgery. • A keypoint-based approach on the Mel spectrogram representation of audio. • Robustness against common post-processing operations such as noise addition, filtering, and especially compression.
Audio copy-move forgery, created by copying one or more segments of an audio file and pasting them at a different position within the same file, is one of the most widely used forgery methods encountered in audio forensics. This type of forgery is easy to apply but difficult to detect when post-processing operations are applied to the forged speech to hide traces of tampering. This paper proposes a robust method for detecting and localizing audio copy-move forgery using a keypoint-based approach on the Mel spectrogram representation of audio. The method first creates a Mel spectrogram image from the input audio. SIFT keypoints are then extracted from each RGB color channel of this image. The keypoints of each channel are matched via their feature vectors to reveal clues of forged regions, and the image sub-blocks centered on the matched keypoints are labeled as forged blocks. Blocks in the neighborhood of the forged blocks are then examined to determine whether they are also forged. A proposed post-processing stage completes the determination of the forged regions: it eliminates possible false positives and marks the forged areas in the spectrogram image. Finally, the forged segments are marked in the audio file using the positions of the forged regions in the spectrogram image. Experiments are carried out on two pitch-based datasets built from TIMIT and the Arabic Speech Corpus, and the paper reports detailed performance results of widely cited reference methods on these datasets. The results show that the proposed method is more robust than these methods against common post-processing operations such as noise addition, filtering, and especially compression. |
---|---|
ISSN: | 0957-4174 (print), 1873-6793 (online) |
DOI: | 10.1016/j.eswa.2022.118963 |
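The keypoint-matching step in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: it assumes SIFT keypoints and descriptors have already been extracted from one color channel of the spectrogram image (for example with OpenCV's SIFT detector), and it swaps in a standard nearest-neighbor ratio test (as used in Lowe's SIFT matching) plus a minimum spatial distance to find duplicated content within the same image. The function names `match_within_image` and `forged_boxes`, and the values `ratio=0.6`, `min_spatial=10.0`, and a 16-pixel block size, are hypothetical choices for illustration. The neighborhood check and the post-processing stage are not sketched.

```python
import math
from typing import List, Sequence, Set, Tuple

# Hypothetical keypoint layout: (x, y) position in the spectrogram image
# plus its descriptor vector (e.g. a 128-value SIFT descriptor).
Keypoint = Tuple[int, int, Sequence[float]]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def match_within_image(
    kps: List[Keypoint], ratio: float = 0.6, min_spatial: float = 10.0
) -> List[Tuple[int, int]]:
    """Find keypoint pairs (i, j), i < j, that look like copies of each other.

    A copy-move forgery duplicates content inside one file, so every keypoint
    is compared with all *other* keypoints of the same image.
    """
    pairs: Set[Tuple[int, int]] = set()
    for i, (xi, yi, di) in enumerate(kps):
        # Descriptor distances to every other keypoint, nearest first.
        cands = sorted(
            (_dist(di, dj), j) for j, (_, _, dj) in enumerate(kps) if j != i
        )
        if len(cands) < 2:
            continue
        (d1, j), (d2, _) = cands[0], cands[1]
        # Ratio test: keep the match only if the nearest neighbor is clearly
        # closer than the second nearest; ambiguous matches are discarded.
        if d1 >= ratio * d2:
            continue
        xj, yj, _ = kps[j]
        # Reject matches between spatially adjacent keypoints, which tend to be
        # similar neighboring texture rather than copied content.
        if math.hypot(xi - xj, yi - yj) < min_spatial:
            continue
        pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)


def forged_boxes(
    kps: List[Keypoint], pairs: List[Tuple[int, int]], size: int = 16
) -> Set[Tuple[int, int, int, int]]:
    """Square sub-blocks (x0, y0, x1, y1) centered on every matched keypoint."""
    half = size // 2
    boxes: Set[Tuple[int, int, int, int]] = set()
    for pair in pairs:
        for k in pair:
            x, y, _ = kps[k]
            boxes.add((x - half, y - half, x + half, y + half))
    return boxes
```

Running the matcher on each RGB channel separately and taking the union of the resulting boxes would loosely mirror the per-channel processing described in the abstract. The ratio test is one common way to limit false matches in repetitive image texture; whether the paper uses it, or these thresholds, is not stated in this record.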
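The final localization step, marking forged segments in the audio from the positions of forged regions in the spectrogram image, essentially converts spectrogram columns (STFT frames) into time. The sketch below is a hypothetical illustration, not the paper's procedure: it assumes one image column per frame, a non-centered STFT in which frame k starts at sample k·hop_length, and caller-supplied values for `hop_length` and the sample rate `sr`, none of which the source specifies. The name `columns_to_intervals` is also hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def columns_to_intervals(
    columns: Iterable[int], hop_length: int, sr: int
) -> List[Tuple[float, float]]:
    """Turn forged spectrogram columns into (start, end) times in seconds.

    Runs of consecutive columns are merged into a single segment; column k is
    assumed to cover samples [k * hop_length, (k + 1) * hop_length).
    """
    intervals: List[Tuple[float, float]] = []
    start: Optional[int] = None
    prev: Optional[int] = None
    for c in sorted(set(columns)):
        if start is not None and c == prev + 1:
            prev = c  # extend the current run of forged columns
            continue
        if start is not None:
            # Close the previous run before starting a new one.
            intervals.append((start * hop_length / sr, (prev + 1) * hop_length / sr))
        start = prev = c
    if start is not None:
        intervals.append((start * hop_length / sr, (prev + 1) * hop_length / sr))
    return intervals
```

For example, with `hop_length=512` and `sr=16000`, forged columns 3, 4, and 5 map to one segment from 0.096 s to 0.192 s. With a centered STFT, each frame is centered on, rather than starting at, sample k·hop_length, so the segment boundaries would shift by half an analysis window.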