An automatic quality evaluator for video object segmentation masks
Published in: Measurement: Journal of the International Measurement Confederation, 2022-05, Vol. 194, p. 111003, Article 111003
Main authors: , , , ,
Format: Article
Language: English
Online access: Full text
Abstract: Video object segmentation (VOS) has been a research hotspot in recent years. However, evaluating the performance of different VOS methods requires labor-intensive and time-consuming manually labeled mask annotations, making it hard to validate algorithm quality in field tests. In this paper, we tackle the problem of automatically measuring mask quality for video object segmentation tasks without access to manual annotations. We propose that, with a carefully designed network structure, quality-sensitive features can be extracted to predict mask quality scores without ground-truth labels. To achieve this, we train an end-to-end convolutional neural network to capture quality-sensitive features using both a spatial reference and a temporal reference. In the proposed Video Object Segmentation Evaluation Network, the VOSE-Net, the corresponding video frame and motion amplitude information serve as the spatial and temporal references, respectively. Instead of directly concatenating mask and reference features, we extract spatial quality cues with feature correlation, which is more rational and effective for this specific task. Taking as input the segmented mask, its corresponding frame image, and an optical flow map, the VOSE-Net provides an accurate quality estimate without human intervention. To train and verify the proposed network, we construct a new dataset from the DAVIS video segmentation benchmark and the results of many public video object segmentation algorithms. We also demonstrate the robustness and usefulness of the proposed method in several applications, i.e., proposal selection, parameter optimization, and arbitrary video mask evaluation. The experimental results and analysis show that the VOSE-Net is fast, effective, and of practical use.
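The abstract contrasts feature correlation with direct feature concatenation for extracting spatial quality cues. As a rough illustration of that idea (not the actual VOSE-Net layers, whose details are not given here), the following sketch computes a per-location cosine correlation between a hypothetical mask feature map and a reference feature map, producing a single-channel spatial cue map:

```python
import numpy as np

def correlation_map(mask_feat, ref_feat, eps=1e-8):
    """Per-pixel cosine correlation between two (C, H, W) feature maps.

    A hypothetical sketch of using feature correlation (rather than
    channel concatenation) to derive a spatial quality-cue map; the
    function name and shapes are illustrative assumptions, not the
    paper's actual architecture.
    """
    # Dot product across the channel dimension at each spatial location.
    num = (mask_feat * ref_feat).sum(axis=0)
    # Product of channel-wise norms, with eps to avoid division by zero.
    denom = (np.linalg.norm(mask_feat, axis=0)
             * np.linalg.norm(ref_feat, axis=0) + eps)
    return num / denom  # shape (H, W), values in [-1, 1]

# Toy check: identical features correlate (near-)perfectly everywhere.
rng = np.random.default_rng(0)
feat = rng.random((8, 4, 4))
corr = correlation_map(feat, feat)
```

A concatenation-based design would instead stack the two maps into a (2C, H, W) tensor and let subsequent convolutions learn the comparison; explicit correlation builds the mask-vs-reference comparison directly into the representation.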
ISSN: 0263-2241, 1873-412X
DOI: 10.1016/j.measurement.2022.111003