Automated Event Detection and Classification in Soccer: The Potential of Using Multiple Modalities

Detailed Description

Bibliographic Details
Published in: Machine Learning and Knowledge Extraction, 2021-12, Vol. 3 (4), pp. 1030–1054
Authors: Nergård Rongved, Olav Andre; Stige, Markus; Hicks, Steven Alexander; Thambawita, Vajira Lasantha; Midoglu, Cise; Zouganeli, Evi; Johansen, Dag; Riegler, Michael Alexander; Halvorsen, Pål
Format: Article
Language: English
Online Access: Full Text
Description
Abstract: Detecting events in videos is a complex task, and many different approaches, aimed at a large variety of use-cases, have been proposed in the literature. Most approaches, however, are unimodal and consider only the visual information in the videos. This paper presents and evaluates different approaches based on neural networks in which we combine visual features with audio features to detect (spot) and classify events in soccer videos. We employ model fusion to combine different modalities such as video and audio, and test these combinations against different state-of-the-art models on the SoccerNet dataset. The results show that a multimodal approach is beneficial. We also analyze how the tolerance for delays in classification and spotting time, and the tolerance for prediction accuracy, influence the results. Our experiments show that using multiple modalities improves event detection performance for certain types of events.
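The model fusion described in the abstract can be illustrated with a minimal late-fusion sketch: two unimodal classifiers (visual and audio) each produce per-class confidence scores, which are combined by a weighted average before picking the final event label. This is an illustrative sketch only; the event classes, the `late_fusion` helper, and the weighting scheme are assumptions, not the paper's actual architecture.

```python
# Hypothetical event classes; the actual SoccerNet label set differs.
EVENTS = ["goal", "card", "substitution", "background"]

def late_fusion(visual_scores, audio_scores, audio_weight=0.5):
    """Combine per-class scores from two unimodal models by weighted averaging.

    `visual_scores` / `audio_scores`: dicts mapping event class -> confidence.
    An illustrative late-fusion scheme, not the paper's exact method.
    """
    return {
        e: (1 - audio_weight) * visual_scores.get(e, 0.0)
           + audio_weight * audio_scores.get(e, 0.0)
        for e in EVENTS
    }

# Example: a strong audio cue (crowd roar) reinforces a visually ambiguous goal.
visual = {"goal": 0.55, "card": 0.30, "substitution": 0.10, "background": 0.05}
audio = {"goal": 0.80, "card": 0.05, "substitution": 0.05, "background": 0.10}
fused = late_fusion(visual, audio)
best = max(fused, key=fused.get)  # -> "goal"
```

A learned fusion layer (or mid-level feature concatenation) would replace the fixed `audio_weight` in practice; the sketch only shows why a second modality can resolve cases where the visual score alone is inconclusive.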
ISSN: 2504-4990
DOI: 10.3390/make3040051