DocParseNet: Advanced Semantic Segmentation and OCR Embeddings for Efficient Scanned Document Annotation
Main author: | , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
Abstract: | Automating the annotation of scanned documents is challenging, requiring a
balance between computational efficiency and accuracy. DocParseNet addresses
this by combining deep learning and multi-modal learning to process both text
and visual data. This model goes beyond traditional OCR and semantic
segmentation, capturing the interplay between text and images to preserve
contextual nuances in complex document structures. Our evaluations show that
DocParseNet significantly outperforms conventional models, achieving mIoU
scores of 49.12 on validation and 49.78 on the test set. This reflects a 58%
accuracy improvement over state-of-the-art baseline models and an 18% gain
compared to the UNext baseline. Remarkably, DocParseNet achieves these results
with only 2.8 million parameters, reducing the model size by approximately 25
times and speeding up training by 5 times compared to other models. These
metrics, coupled with a computational efficiency of 0.039 TFLOPs (BS=1),
highlight DocParseNet's high performance in document annotation. The model's
adaptability and scalability make it well-suited for real-world corporate
document processing applications. The code is available at
https://github.com/ahmad-shirazi/DocParseNet |
DOI: | 10.48550/arxiv.2406.17591 |
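The mIoU scores reported in the abstract follow the standard mean Intersection-over-Union metric for segmentation: per-class IoU averaged over the classes present. A minimal sketch of that computation (label maps, class count, and the toy arrays below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection-over-Union over classes.

    pred, target: integer label maps of identical shape.
    Classes absent from both prediction and target are skipped,
    so they do not drag the mean toward zero.
    """
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

# Toy 2x2 example with two classes (hypothetical data).
pred = np.array([[0, 1], [1, 1]])
target = np.array([[0, 0], [1, 1]])
print(mean_iou(pred, target, num_classes=2))  # class IoUs 1/2 and 2/3 -> mean 7/12
```

The paper's 49.12/49.78 figures would be this quantity, scaled to percent, over the validation and test splits.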