A video indexing and retrieval computational prototype based on transcribed speech

Bibliographic details
Published in: Multimedia Tools and Applications, 2021-10, Vol. 80 (25), p. 33971-34017
Main authors: Spolaôr, Newton; Lee, Huei Diana; Takaki, Weber Shoity Resende; Ensina, Leandro Augusto; Parmezan, Antonio Rafael Sabino; Oliva, Jefferson Tales; Coy, Claudio Saddy Rodrigues; Wu, Feng Chung
Format: Article
Language: English
Description
Summary: Using the voice to interact with systems is attractive in medicine and other areas because of its friendliness and flexibility. Video indexing and retrieval have benefited from this resource. However, few initiatives use speech recognition to support both tasks. This work aims to develop and evaluate a prototype system that indexes and retrieves videos from speech transcription. In particular, the user can narrate each video’s content, generating an utterance that the computational system captures, transcribes into text and timestamps. Simple text processing techniques are then applied to the resulting transcript before indexing. Afterward, the user can query by speech or text to find relevant, previously indexed videos. We conducted an experimental evaluation of the prototype on sets of 50 and 10 public videos. As part of this process, one collaborator manually narrated the 50 videos, while four others narrated a subset of 13 videos. An automatic narration scheme was also applied to this subset and to the set of 10 videos. The evaluation showed promising results for Brazilian Portuguese speech recognition and retrieval performance. For example, the average word error rate was as low as 0.03 and the mean average precision reached 1.00. Besides performing well, the computational tool is flexible, since few changes are required to support other languages.
ISSN: 1380-7501, 1573-7721
DOI: 10.1007/s11042-021-11401-1
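
The summary reports two evaluation measures, word error rate (WER) for speech recognition and mean average precision (MAP) for retrieval. The sketch below shows the textbook definitions of both in Python. It is an illustration only: this record does not state the article's exact evaluation protocol, so the tokenization (lowercasing and whitespace splitting), the function names and the sample data are all assumptions.

```python
from typing import Iterable, Sequence, Set, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: lowercase and split on whitespace; the article may normalize differently.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def average_precision(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """Average of precision@k taken at each rank k where a relevant video appears."""
    if not relevant_ids:
        return 0.0
    hits = 0
    total = 0.0
    for rank, video_id in enumerate(ranked_ids, start=1):
        if video_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of average_precision over (ranking, relevant set) pairs, one pair per query."""
    scores = [average_precision(ranking, relevant) for ranking, relevant in runs]
    if not scores:
        raise ValueError("at least one query is required")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical transcript pair: one substituted word out of four reference words.
    print(word_error_rate("o paciente está bem", "o paciente esta bem"))
    # Hypothetical query results: relevant videos retrieved at ranks 1 and 3.
    print(mean_average_precision([(["v1", "v2", "v3"], {"v1", "v3"})]))
```

Under these definitions, a WER of 0.03 corresponds to roughly three word errors per hundred reference words, and a MAP of 1.00 means that every relevant video was ranked ahead of every non-relevant one for each query.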