Test-Time Adaptation for Visual Document Understanding
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Summary: For visual document understanding (VDU), self-supervised pretraining has been shown to successfully generate transferable representations, yet effective adaptation of such representations to distribution shifts at test time remains an unexplored area. We propose DocTTA, a novel test-time adaptation method for documents that performs source-free domain adaptation using unlabeled target document data. DocTTA leverages cross-modality self-supervised learning via masked visual language modeling, as well as pseudo-labeling, to adapt models learned on a source domain to an unlabeled target domain at test time. We introduce new benchmarks using existing public datasets for various VDU tasks, including entity recognition, key-value extraction, and document visual question answering. On these benchmarks, DocTTA significantly improves over the source model's performance, by up to 1.89% (F1 score), 3.43% (F1 score), and 17.68% (ANLS score), respectively. Our benchmark datasets are available at https://saynaebrahimi.github.io/DocTTA.html.
DOI: 10.48550/arxiv.2206.07240
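
To make the recipe in the abstract concrete, below is a minimal PyTorch-style sketch of a source-free test-time adaptation loop that combines a masked visual language modeling (MVLM) objective with confidence-filtered pseudo-labeling on unlabeled target documents. It is an illustration under stated assumptions, not the authors' implementation: the model's `mvlm_head` and `task_head` modules, the masking scheme, the confidence threshold, and the loss weights are all hypothetical.

```python
import torch
import torch.nn.functional as F


def adapt_on_target(model, target_loader, steps=100, lr=1e-5,
                    conf_threshold=0.9, mvlm_weight=1.0, pl_weight=1.0,
                    mask_prob=0.15, mask_token_id=103):
    """Adapt a source-pretrained document model using only unlabeled
    target batches of token ids and visual features.

    Assumes (hypothetically) that `model` exposes:
      - model.mvlm_head(tokens, visual) -> (B, L, vocab) logits
      - model.task_head(tokens, visual) -> (B, L, classes) logits
    """
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for step, batch in enumerate(target_loader):
        if step >= steps:
            break
        tokens, visual_feats = batch["tokens"], batch["visual"]

        # MVLM loss: mask a fraction of text tokens and predict them
        # from the remaining text plus the visual modality.
        mask = torch.rand_like(tokens, dtype=torch.float) < mask_prob
        masked_tokens = tokens.masked_fill(mask, mask_token_id)
        mvlm_logits = model.mvlm_head(masked_tokens, visual_feats)
        mvlm_loss = F.cross_entropy(mvlm_logits[mask], tokens[mask])

        # Pseudo-label loss: treat the model's own high-confidence
        # task predictions on the unmasked input as training targets.
        task_logits = model.task_head(tokens, visual_feats)
        conf, pseudo = task_logits.softmax(dim=-1).max(dim=-1)
        keep = conf > conf_threshold
        pl_loss = (F.cross_entropy(task_logits[keep], pseudo[keep])
                   if keep.any() else task_logits.sum() * 0.0)

        loss = mvlm_weight * mvlm_loss + pl_weight * pl_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model
```

The two terms play complementary roles: the MVLM loss provides a label-free supervision signal from the target distribution itself, while the pseudo-label loss keeps the task head aligned with the labels the source model already predicts confidently; the threshold simply guards against reinforcing low-confidence, likely-wrong predictions.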