Audio-Visual CNN using Transfer Learning for TV Commercial Break Detection

The TV commercial detection problem is a hard challenge due to the variety of programs and TV channels. The usage of deep learning methods to solve this problem has shown good results. However, it takes a long time with many training epochs to get high accuracy.    This research uses transfer learni...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IJCCS (Indonesian Journal of Computing and Cybernetics Systems) 2023-07, Vol.17 (3), p.291-300
Hauptverfasser: Pudya Wardana, Muhammad Zha'farudin, Wibowo, Moh. Edi
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:The TV commercial detection problem is a hard challenge due to the variety of programs and TV channels. The usage of deep learning methods to solve this problem has shown good results. However, it takes a long time with many training epochs to get high accuracy.    This research uses transfer learning techniques to reduce training time and limits the number of training epochs to 20. From video data, the audio feature is extracted with Mel-spectrogram representation, and the visual features are picked from a video frame. The datasets were gathered by recording programs from various TV channels in Indonesia. Pre-trained CNN models such as MobileNetV2, InceptionV3, and DenseNet169 are re-trained and are used to detect commercials at the shot level. We do post-processing to cluster the shots into segments of commercials and non-commercials.    The best result is shown by Audio-Visual CNN using transfer learning with an accuracy of 93.26% with only 20 training epochs. It is faster and better than the CNN model without using transfer learning with an accuracy of 88.17% and 77 training epochs. The result by adding post-processing increases the accuracy of Audio-Visual CNN using transfer learning to 96.42%.
ISSN:1978-1520
2460-7258
DOI:10.22146/ijccs.76058