OCR-Diff: A Two-Stage Deep Learning Framework for Optical Character Recognition Using Diffusion Model in Industrial Internet of Things

Bibliographic details
Published in:	IEEE Internet of Things Journal, 2024-08, Vol. 11 (15), p. 25997-26000
Main authors:	Park, Chae-Won; Palakonda, Vikas; Yun, Sangseok; Kim, Il-Min; Kang, Jae-Mo
Format:	Article
Language:	English
Subjects:	Deep learning; Deep learning (DL); Diffusion models; Diffusion processes; Feature extraction; Feature recognition; generative diffusion model; Image quality; Image recognition; Image resolution; Industrial applications; Industrial Internet of Things; industrial Internet of Things (IIoT); Internet of Things; low resolution text image; Optical character recognition; optical character recognition (OCR); Text recognition

Description
Summary:	Optical character recognition (OCR) is one of the key enabling technologies in industrial Internet of Things (IIoT) for extracting and utilizing useful textual information, but it is technically challenging due to poor environmental conditions. To deal with such challenges, in this letter, we propose a novel two-stage deep learning framework for OCR using a generative diffusion model, namely, OCR-Diff. In the first stage, our customized conditional U-Net is pretrained jointly with a feature extractor with the aid of the forward diffusion process such that the quality of a low-resolution text image is improved via the reverse diffusion process. In the next stage, the pretrained conditional U-Net and feature extractor are jointly fine-tuned for an off-the-shelf text recognizer to precisely recognize the texts in the image. Experimental results on TextZoom data sets substantiate the superiority and effectiveness of the proposed scheme.
ISSN:	2327-4662
DOI:	10.1109/JIOT.2024.3390700
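
Note on the forward diffusion process: the summary mentions a forward diffusion process but this record does not give its equations. As a rough illustration only, the sketch below assumes the commonly used DDPM-style closed form, in which a noisy sample at step t is drawn as x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps with eps standard Gaussian noise. The linear noise schedule, the step count of 1000, and every function name here are assumptions made for illustration; they are not taken from the article, and the conditional U-Net, feature extractor, and text recognizer described in the summary are not reproduced.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_diffuse(x0, t, alpha_bars, rng):
    """Draw x_t from q(x_t | x_0) for a flat list of pixel values.

    t is a zero-based step index. Returns the noisy sample and the noise
    that was added (the quantity a denoising network is usually trained
    to predict).
    """
    a_bar = alpha_bars[t]
    signal_scale = math.sqrt(a_bar)
    noise_scale = math.sqrt(1.0 - a_bar)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal_scale * x + noise_scale * e for x, e in zip(x0, eps)]
    return xt, eps


# Hypothetical usage on a tiny "image" of four pixel values in [-1, 1].
rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
pixels = [0.0, 0.5, 1.0, -1.0]
xt_early, eps_early = forward_diffuse(pixels, 0, alpha_bars, rng)
xt_late, eps_late = forward_diffuse(pixels, 999, alpha_bars, rng)
```

Under this assumed schedule, an early step leaves the pixels almost unchanged, while the last step is dominated by noise because alpha_bar_t has decayed close to zero. The reverse process referred to in the summary would start from such a noisy sample and denoise it step by step, conditioned on the low-resolution text image.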