OCR-Diff: A Two-Stage Deep Learning Framework for Optical Character Recognition Using Diffusion Model in Industrial Internet of Things
Published in: | IEEE Internet of Things Journal, 2024-08, Vol. 11 (15), pp. 25997-26000 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | Optical character recognition (OCR) is a key enabling technology in the industrial Internet of Things (IIoT) for extracting and using textual information, but it is technically challenging because of poor environmental conditions. To address these challenges, in this letter we propose a novel two-stage deep learning framework for OCR that uses a generative diffusion model, named OCR-Diff. In the first stage, our customized conditional U-Net is pretrained jointly with a feature extractor with the aid of the forward diffusion process, so that the quality of a low-resolution text image is improved via the reverse diffusion process. In the second stage, the pretrained conditional U-Net and feature extractor are jointly fine-tuned so that an off-the-shelf text recognizer can precisely recognize the text in the image. Experimental results on the TextZoom data sets substantiate the superiority and effectiveness of the proposed scheme. |
---|---|
ISSN: | 2327-4662 |
DOI: | 10.1109/JIOT.2024.3390700 |
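
The abstract refers to a forward diffusion process used during pretraining but gives no schedule or network details. As a rough illustration only, the following is a minimal sketch of the standard closed-form forward noising step from denoising diffusion models, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. It is not the authors' implementation: the function names, the linear schedule, and its start and end values are all assumptions, and the conditional U-Net, feature extractor, and text recognizer are not modeled.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (values are assumptions)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s), one value per step."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_diffuse(x0, t, alpha_bars, rng):
    """Draw x_t from q(x_t | x_0) in closed form and return (x_t, noise).

    x0 is a flat list of pixel intensities standing in for a text image
    (hypothetical input format). The returned noise is what a denoising
    network would typically be trained to predict.
    """
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return xt, noise


if __name__ == "__main__":
    rng = random.Random(0)                       # fixed seed for repeatability
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    image = [0.0, 0.25, 0.5, 0.75, 1.0]          # toy "image" of five pixels
    noisy, eps = forward_diffuse(image, 500, alpha_bars, rng)
    print(noisy)
```

In a setup like the one the abstract describes, the reverse process would start from noise and denoise step by step, conditioned on features of the low-resolution text image; that part depends on details the record does not provide and is left out here.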