Algorithms for Cross-Media Alignment of Equivalent Content (Algoritmen voor cross-mediale alignering van equivalente inhoud)

Bibliographic Details
Main Author: Pham, Phi The
Format: Dissertation
Language: Dutch (dut)
Description
Summary: With the rapid growth of mass communication, users' information needs are no longer limited to the traditional textual medium but extend to images, audio, video, bioinformation, and so on. The central question is how to index and organize these data collections to satisfy those needs, given that the data are usually described only by noisy text. This thesis explores ways to align recognized content across different media, including images, videos and their corresponding textual descriptions. Depending on the quality of the visual medium and on the amount and relevance of the accompanying text, the alignment algorithms range from fully unsupervised to semi-supervised, with the specific aim of reducing human annotation as much as possible. First, we introduce novel unsupervised methods for cross-media alignment of names and faces in image-text pairs, where the visual and textual content appear in nearly parallel data objects. Second, we propose new semi-supervised methods for name and face alignment in news videos with subtitles or transcripts. Here the learning methods must cope with two difficulties: the poor quality of the visual data, which degrades low-level feature extraction, comparison and categorization, and the limited parallelism between what is shown in the video and what is mentioned in the text. Third, we propose unsupervised, machine-translation-based methods to align names and faces in soap opera videos, incorporating the weak supervision of fan-written narrative texts that describe the events in the video. These unsupervised alignment methods are then extended to allow for user feedback. Finally, we develop a novel method for content-based video indexing and retrieval that effectively combines and indexes cross-modally recognized concepts and makes them available to users.