Iterative multi-path tracking for video and volume segmentation with sparse point supervision
Published in: Medical image analysis, 2018-12, Vol. 50, pp. 65-81
Main authors: , ,
Format: Article
Language: English
Abstract:
Highlights
• Ground-truth annotations are necessary to train machine learning models.
• We annotate video and volumetric sequences using a single 2D point per frame.
• No constraints on the appearance, shape, or motion/displacement of the object of interest.
• Promising results on surgical tool and slitlamp videos, brain MRI, and CT scans of the inner ear.
Recent machine learning strategies for segmentation tasks have shown great ability when trained on large pixel-wise annotated image datasets. It remains a major challenge, however, to aggregate such datasets, as the time and monetary cost of collecting extensive annotations is extremely high. This is particularly the case for generating precise pixel-wise annotations in video and volumetric image data. To this end, this work presents a novel framework to produce pixel-wise segmentations using minimal supervision. Our method relies on 2D point supervision, whereby a single 2D location within an object of interest is provided for each image of the data. Our method then estimates the object appearance by learning object- and image-specific features and using these in a semi-supervised learning framework. The resulting object model is used in a graph-based optimization problem that takes into account all provided locations and the image data in order to infer the complete pixel-wise segmentation. In practice, we solve this optimally as a tracking problem using a K-shortest-path approach. Both the object model and the segmentation are then refined iteratively to further improve the final result. We show that, by collecting 2D locations with a gaze tracker, our approach provides state-of-the-art segmentations on a range of objects and image modalities (video and 3D volumes), and that these segmentations can then be used to train supervised machine learning classifiers.
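The pipeline described in the abstract (point seeds, a semi-supervised appearance model, a graph-based tracking step, and iterative refinement) can be sketched in a few lines of Python. The snippet below is an illustrative approximation only, not the authors' implementation: block centres on a regular grid stand in for the candidate regions, a logistic-regression colour model stands in for the learned object- and image-specific features, and node-disjoint shortest paths are extracted greedily with networkx rather than solved optimally with the K-shortest-path formulation of the paper. All function and variable names (segment_sequence, block_centers, and so on) are assumptions.

```python
import numpy as np
import networkx as nx
from sklearn.linear_model import LogisticRegression

def block_centers(shape, step=16):
    """Regular grid of candidate block centres used as per-frame graph nodes
    (a stand-in for the candidate regions used in the paper)."""
    rows = np.arange(step // 2, shape[0], step)
    cols = np.arange(step // 2, shape[1], step)
    return [(int(r), int(c)) for r in rows for c in cols]

def segment_sequence(frames, clicks, n_iters=3, k_paths=3, step=16):
    """frames: list of (H, W, C) arrays; clicks: one (row, col) 2D point per frame."""
    centers = block_centers(frames[0].shape[:2], step)
    # Seed the foreground of each frame with the block closest to its 2D point.
    fg = [{min(range(len(centers)),
               key=lambda i: (centers[i][0] - r) ** 2 + (centers[i][1] - c) ** 2)}
          for (r, c) in clicks]

    for _ in range(n_iters):
        # 1. Fit a simple colour-based appearance model from the current labels
        #    (the paper instead learns object- and image-specific features).
        X = np.array([f[r, c] for f in frames for (r, c) in centers], dtype=float)
        y = np.array([1 if i in fg[t] else 0
                      for t in range(len(frames)) for i in range(len(centers))])
        clf = LogisticRegression(max_iter=500).fit(X, y)

        # 2. Trellis graph over frames: entering block j of frame t+1 costs -log P(object).
        probs = [clf.predict_proba(np.array([f[r, c] for (r, c) in centers],
                                            dtype=float))[:, 1]
                 for f in frames]
        G = nx.DiGraph()
        for t in range(len(frames) - 1):
            for i, (r1, c1) in enumerate(centers):
                for j, (r2, c2) in enumerate(centers):
                    if abs(r1 - r2) + abs(c1 - c2) <= 2 * step:   # bounded displacement
                        G.add_edge((t, i), (t + 1, j),
                                   weight=-np.log(probs[t + 1][j] + 1e-8))
        for i in range(len(centers)):
            G.add_edge('S', (0, i), weight=-np.log(probs[0][i] + 1e-8))
            G.add_edge((len(frames) - 1, i), 'T', weight=0.0)

        # 3. Greedily extract k node-disjoint shortest paths (an approximation of the
        #    optimal K-shortest-path solution) and take them as the refined segmentation.
        fg = [set() for _ in frames]
        for _ in range(k_paths):
            try:
                path = nx.shortest_path(G, 'S', 'T', weight='weight')
            except nx.NetworkXNoPath:
                break
            for t, i in path[1:-1]:
                fg[t].add(i)
            G.remove_nodes_from(path[1:-1])

    return fg   # per-frame sets of foreground block indices
```

In the paper the K paths are found optimally and the provided 2D locations also constrain the paths themselves; the outer loop above only gestures at this alternation between refining the appearance model and re-solving the tracking problem.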
ISSN: 1361-8415, 1361-8423
DOI: 10.1016/j.media.2018.08.007