An efficient weakly semi-supervised method for object automated annotation

Object annotation is essential for computer vision tasks, and more high-quality annotated data can effectively improve the performance of vision models. However, manual annotation is time-consuming (annotating a box takes 35s). Recent studies have explored faster automated annotation, among which we...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Multimedia tools and applications 2024, Vol.83 (3), p.9417-9440
Hauptverfasser:	Wang, Xingzheng, Wei, Guoyao, Chen, Songwei, Liu, Jiehao
Format:	Artikel
Sprache:	eng
Schlagworte:	Annotations Automation Boxes Computer Communication Networks Computer Science Computer vision Data Structures and Information Theory Labels Multimedia Information Systems Special Purpose and Application-Based Systems Track 6: Computer Vision for Multimedia Applications Training
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Object annotation is essential for computer vision tasks, and more high-quality annotated data can effectively improve the performance of vision models. However, manual annotation is time-consuming (annotating a box takes 35s). Recent studies have explored faster automated annotation, among which weakly supervised methods stand out. Weakly supervised methods learn to automatically localize objects in images from weakly labeled annotations, e.g., class tags or points, replacing manual bounding box annotations. Although using a single weakly labeled annotation can reduce a large amount of time, it leads to poor annotation quality, particularly for the complex scenes containing multiple objects. To balance annotation time and quality, we propose a weakly semi-supervised automated annotation method. Its main idea is to incorporate point-labeled and fully labeled annotations into a teacher-student framework for training, to jointly localize the object bounding boxes on all point-labeled images. We also propose two effective techniques within this framework to better use of these mixed annotations. The first is a point-guided sample assignment technique which optimizes the loss calculation. The second is a pseudo-label filtering technique which generate accurate pseudo labels for model training by utilizing the points and boxes localization confidences. Extensive experiments on MSCOCO demonstrate that our method outperforms existing automated annotation methods. In particular, when using 95% point-labeled and 5% fully labeled data, our approach reduces the annotation time by approximately 52% and achieves an annotation quality of 87.4% mIoU.
ISSN:	1380-7501 1573-7721
DOI:	10.1007/s11042-023-15305-0