Detection of artificial and scene text in images and video frames

Textual information in images and video frames constitutes a valuable source of high-level semantics for multimedia indexing and retrieval systems. Text detection is the most crucial step in a multimedia text extraction system and although it has been extensively studied the past decade still, it do...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Pattern analysis and applications : PAA 2013-08, Vol.16 (3), p.431-446
Hauptverfasser:	Anthimopoulos, Marios, Gatos, Basilis, Pratikakis, Ioannis
Format:	Artikel
Sprache:	eng
Schlagworte:	Applied sciences Artificial intelligence Computer Science Computer science control theory systems Data processing. List processing. Character string processing Exact sciences and technology Industrial and Commercial Application Information retrieval. Graph Information systems. Data bases Memory organisation. Data processing Pattern Recognition Pattern recognition. Digital image processing. Computational geometry Software Theoretical computing
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Textual information in images and video frames constitutes a valuable source of high-level semantics for multimedia indexing and retrieval systems. Text detection is the most crucial step in a multimedia text extraction system and although it has been extensively studied the past decade still, it does not exist a generic architecture that would work for artificial and scene text in multimedia content. In this paper we propose a system for text detection of both artificial and scene text in images and video frames. The system is based on a machine learning stage which uses an Random Forest classifier and a highly discriminative feature set produced by using a new texture operator called Multilevel Adaptive Color edge Local Binary Pattern (MACeLBP). MACeLBP describes the spatial distribution of color edges in multiple adaptive levels of contrast. Then, a gradient-based algorithm is applied to achieve distinction among text lines as well as refinement in the localization of the text lines. The whole algorithm is situated in a multiresolution framework to achieve invariance to scale for the detection of text lines. Finally, an optional connected-component step segments text lines into words based on the distances between the resulting components. The experimental results are produced by applying a concise evaluation methodology and prove the superior performance achieved by the proposed text detection system for artificial and scene text in images and video frames.
ISSN:	1433-7541 1433-755X
DOI:	10.1007/s10044-011-0237-7