A Review of Document Binarization: Main Techniques, New Challenges, and Trends

Bibliographic details
Published in:	Electronics (Basel) 2024-04, Vol.13 (7), p.1394
Main authors:	Yang, Zhengxian; Zuo, Shikai; Zhou, Yanxi; He, Jinlong; Shi, Jianwen
Format:	Article
Language:	English
Subjects:	Algorithms; Deep learning; Documents; Image processing; Image segmentation; Machine learning; Medical imaging equipment; Methods; Neural networks; Optical character recognition; Standard deviation; Telecommunications equipment industry
Online access:	Full text

Description
Summary:	Document image binarization is a challenging task, especially for text segmentation in degraded document images. As a pre-processing step for Optical Character Recognition (OCR), binarization is one of the most fundamental and commonly used segmentation methods. It separates the foreground text from the background of the document image to facilitate subsequent image processing. Because document images vary in their degree of degradation, researchers have proposed a variety of solutions. This paper summarizes some of the challenges and difficulties in the field of document image binarization. It covers approximately 60 document image binarization methods, including traditional algorithms and deep learning-based algorithms, and evaluates the performance of 25 image binarization techniques on the H-DIBCO2016 dataset to help guide future research.
ISSN:	2079-9292
DOI:	10.3390/electronics13071394