Keyframe extraction from laparoscopic videos based on visual saliency detection
Published in: Computer Methods and Programs in Biomedicine, 2018-10, Vol. 165, pp. 13-23
Main authors: , , ,
Format: Article
Language: English
Online access: Full text
Abstract:
• A novel method for keyframe extraction from laparoscopic videos.
• Video shot segmentation using an objectness model.
• Frame decomposition into color, motion and texture saliency maps.
• Shot segmentation into states with different temporal saliency patterns.
• Extensive evaluation based on content and temporal consistency with ground truth.
Background and objective: Laparoscopic surgery offers the potential for video recording of the operation, which is important for technique evaluation, cognitive training, patient briefing and documentation. An effective way to represent video content is to extract a limited number of keyframes carrying semantic information. In this paper we present a novel method for keyframe extraction from individual shots of the operational video.
Methods: The laparoscopic video was first segmented into video shots using an objectness model, which was trained to capture significant changes in the endoscope's field of view. Each frame of a shot was then decomposed into three saliency maps in order to model the preference of human vision for regions that stand out in color, motion and texture. The accumulated responses from each map provided a 3D time series of saliency variation across the shot. The time series was modeled as a multivariate autoregressive process with hidden Markov states (HMMAR model), which allowed the temporal segmentation of the shot into a predefined number of states. A representative keyframe was extracted from each state based on the highest state-conditional probability of the corresponding saliency vector.
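To make the pipeline concrete, here is a minimal sketch, not the authors' implementation: the per-frame color, motion and texture responses are crude numpy proxies for the paper's saliency maps, and hmmlearn's GaussianHMM stands in for the HMMAR model (hmmlearn offers no autoregressive variant). All function names, feature proxies and the toy shot are assumptions for illustration.

```python
# Sketch of the saliency-to-keyframe pipeline under the assumptions above.
import numpy as np
from hmmlearn import hmm

def saliency_series(frames):
    """Collapse each RGB frame (H x W x 3, floats in [0, 1]) into a 3D vector
    of accumulated [color, motion, texture] saliency responses."""
    feats, prev = [], None
    for f in frames:
        # Color: mean distance of pixels from the frame's mean color.
        color = np.linalg.norm(f - f.mean(axis=(0, 1)), axis=2).mean()
        # Motion: mean absolute difference from the previous frame.
        motion = 0.0 if prev is None else np.abs(f - prev).mean()
        # Texture: mean gradient magnitude of the gray-level image.
        gy, gx = np.gradient(f.mean(axis=2))
        texture = np.hypot(gx, gy).mean()
        feats.append([color, motion, texture])
        prev = f
    return np.asarray(feats)                 # shape (T, 3)

def extract_keyframes(frames, n_states=3):
    """Segment a shot into n_states temporal states and pick, per state, the
    frame with the highest state posterior (a simplification of the paper's
    state-conditional probability criterion)."""
    X = saliency_series(frames)
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                            n_iter=100, random_state=0)
    model.fit(X)
    states = model.predict(X)                # Viterbi state label per frame
    posteriors = model.predict_proba(X)      # (T, n_states) posteriors
    keyframes = []
    for s in range(n_states):
        idx = np.where(states == s)[0]
        if idx.size:                         # a state may go unused
            keyframes.append(idx[np.argmax(posteriors[idx, s])])
    return sorted(keyframes)

# Toy usage: random frames standing in for a decoded laparoscopic shot.
rng = np.random.default_rng(0)
shot = [rng.random((48, 64, 3)) for _ in range(120)]
print(extract_keyframes(shot))
```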
Results: Our method was tested on 168 video shots extracted from various laparoscopic cholecystectomy operations in the publicly available Cholec80 dataset. Four state-of-the-art methodologies were used for comparison. The evaluation was based on two assessment metrics: the Color Consistency Score (CCS), which measures the color distance between the ground truth (GT) and the closest keyframe, and the Temporal Consistency Score (TCS), which considers the temporal proximity between GT and extracted keyframes. About 81% of the extracted keyframes matched the color content of the GT keyframes, compared to 77% for the second-best method. The TCS of the proposed and the second-best method was close to 1.9 and 1.4, respectively.
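The abstract does not give the exact formulas for these metrics, so the following is only an illustrative reading of the CCS idea: the fraction of GT keyframes whose closest extracted keyframe falls within a color-histogram distance threshold. The histogram, distance and threshold are assumptions, not the published definitions; TCS is omitted because its scale (values near 1.9) cannot be reconstructed from the abstract alone.

```python
# Illustrative CCS-style score under the assumptions stated above.
import numpy as np

def color_hist(frame, bins=8):
    """Joint RGB histogram, L1-normalized, for a frame with values in [0, 1]."""
    h, _ = np.histogramdd(frame.reshape(-1, 3), bins=bins, range=[(0, 1)] * 3)
    return h.ravel() / h.sum()

def ccs(gt_frames, kf_frames, thresh=0.3):
    """Fraction of GT keyframes whose closest extracted keyframe lies within
    an L1 color-histogram distance of `thresh` (assumed threshold)."""
    kf_hists = [color_hist(f) for f in kf_frames]
    hits = 0
    for g in gt_frames:
        gh = color_hist(g)
        if min(np.abs(gh - kh).sum() for kh in kf_hists) <= thresh:
            hits += 1
    return hits / len(gt_frames)

# Toy check: identical frame sets give a perfect score of 1.0.
rng = np.random.default_rng(1)
gt = [rng.random((48, 64, 3)) for _ in range(3)]
print(ccs(gt, gt))  # -> 1.0
```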
Conclusions: Our results demonstrated that the proposed method yields superior performance in terms of content and temporal consistency with the ground truth.
ISSN: 0169-2607, 1872-7565
DOI: 10.1016/j.cmpb.2018.07.004