Content-based Video Summarization in Object Maps

Bibliographic Details
Author:	Martos Asensio, Manuel
Format:	Dissertation
Language:	English
Keywords:	Computer vision; content-based description; Telecommunication engineering; face recognition; object detection; object maps; Image and video signal processing; Signal processing; UPC subject areas

Description
Abstract:	Project carried out within a mobility programme with the Technische Universität Wien (TU Wien). [ENGLISH] The amount of digital video content available on the web is constantly increasing, and handling it requires efficient technologies: text search over large databases returns a great number of videos to users, and the content of each result is accessible through a description. Users need a fast, visual way to access relevant video content effectively. Quickly visualising content through a static image summary is a challenging problem, but one worth solving because it may ease video navigation: users can rapidly get an idea of a video without browsing through it with a slider bar, as is normally done. In this work, a system for automatic video summarisation is developed. It creates an object map whose segments are extracted from an input video. By generating a visual index, it enhances video browsing and the management of large video databases, so that the user can rapidly grasp the most relevant content. Accessing this content with a single action requires several technologies that together form a complex information-processing pipeline. First, shot boundary detection algorithms are required to reduce the temporal redundancy of the video. Second, relevant objects of different classes (faces, cars, etc.) are extracted from each keyframe. We also describe a workflow for training detection models with multiple open-source tools. Faces are a particularly relevant semantic class, so we use clustering methods to recognise them in an unsupervised process. The final stage of the architecture composes all selected objects and faces into a single image. Since composition is the combination of distinct parts into a whole, the objects have to be rendered on the map in a visually attractive manner.
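The abstract names shot boundary detection as the first stage of the pipeline but does not say which algorithm the thesis uses. The following is a minimal sketch of one common baseline, not the author's method: assuming grayscale frames given as lists of pixel rows (values 0 to 255), it compares grey-level histograms of consecutive frames, starts a new shot when their L1 distance exceeds a threshold (0.5 here, an arbitrary assumption), and takes the middle frame of each shot as its keyframe. All function names are hypothetical.

```python
from typing import List, Sequence

# Assumption: a frame is a grayscale image given as rows of 0-255 intensities.
Frame = Sequence[Sequence[int]]


def histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Normalised grey-level histogram of one frame (hypothetical helper)."""
    counts = [0] * bins
    total = 0
    for row in frame:
        for value in row:
            counts[value * bins // 256] += 1  # map 0-255 onto 0..bins-1
            total += 1
    return [c / total for c in counts]


def hist_distance(h1: List[float], h2: List[float]) -> float:
    """L1 distance between two normalised histograms (ranges from 0 to 2)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def detect_shots(frames: List[Frame], threshold: float = 0.5,
                 bins: int = 16) -> List[range]:
    """Split frames into shots wherever consecutive histograms differ by more
    than `threshold` (an assumed value, to be tuned per video)."""
    if not frames:
        return []
    hists = [histogram(f, bins) for f in frames]
    boundaries = [0]
    for i in range(1, len(frames)):
        if hist_distance(hists[i - 1], hists[i]) > threshold:
            boundaries.append(i)  # frame i starts a new shot
    boundaries.append(len(frames))
    return [range(s, e) for s, e in zip(boundaries, boundaries[1:])]


def keyframes(shots: List[range]) -> List[int]:
    """Pick the middle frame of each shot as its keyframe."""
    return [shot[len(shot) // 2] for shot in shots]


if __name__ == "__main__":
    dark = [[20] * 4 for _ in range(4)]     # synthetic 4x4 dark frame
    bright = [[230] * 4 for _ in range(4)]  # synthetic 4x4 bright frame
    video = [dark] * 3 + [bright] * 4       # a hard cut after frame 2
    shots = detect_shots(video)
    print([list(s) for s in shots])  # [[0, 1, 2], [3, 4, 5, 6]]
    print(keyframes(shots))          # [1, 5]
```

Histogram differences are cheap and tolerate small object motion within a shot, but they detect hard cuts far more reliably than gradual transitions such as fades; the later stages (object detection, face clustering, composition) would then run only on the returned keyframes.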
To validate our approach and assess end-user satisfaction, we conducted a user study in which we assess the system against requirements collected by analysing related literature. We analyse redundancy, informativeness and pleasantness. The results show that our approach effectively creates an image representation of videos and is able to summarise customisable content in an attractive way. [SPANISH] The amount of digital video content available on the web is constantly increasing. Its handling requires efficient technologies: text searches over large databases give access to a large …