Selfie Segmentation in Video Using N-Frames Ensemble

Detailed Description

Bibliographic Details
Published in:	IEEE Access, 2021, Vol. 9, pp. 163348-163362
Main authors:	Kim, Yong-Woon; Byun, Yung-Cheol; Krishna, Addapalli V. N.; Krishnan, Balachandran
Format:	Article
Language:	English
Subjects:	Computational modeling; Computer architecture; Computing time; Deep learning; Efficiency; ensemble; Feature extraction; Frames (data processing); Image segmentation; multi-frames; neural network; Performance evaluation; Power consumption; Power efficiency; Real-time systems; selfie; Semantics; soft voting; Streaming media; video; Videoconferencing; Websites
Online access:	Full text

Description
Summary:	Many camera apps and online video conferencing solutions support instant selfie segmentation or a virtual background function for entertainment, aesthetic, privacy, and security reasons. A number of studies show that a deep-learning-based segmentation model (DSM) is a reasonable choice for selfie segmentation, and that an ensemble of multiple DSMs can improve the precision of the segmentation result. However, these approaches do not fit well when applied directly to image segmentation in a video. This paper proposes an N-Frames (NF) ensemble approach for selfie segmentation in video that uses an ensemble of multiple DSMs to achieve high-performance automatic segmentation. Unlike the N-Models (NM) ensemble, which executes multiple DSMs at once for every single video frame, the proposed NF ensemble executes only one DSM on the current video frame and combines its output with the segmentation results of previous frames to produce the final result. For the experiments, four state-of-the-art image segmentation models were used to build the ensembles. The proposed approach was evaluated on a dataset of 81 single-person videos collected from publicly available websites. Segmentation performance was measured using Intersection over Union (IoU), IoU standard deviation, false prediction rate, Memory Efficiency Rate, and Computing power Efficiency Rate. The average IoU values of the Two-Models NM ensemble, Two-Frames NF ensemble, Three-Models NM ensemble, and Three-Frames NF ensemble were 95.1868%, 95.1253%, 95.3667%, and 95.1734%, respectively, whereas the average IoU value of the single models was 92.9653%. The results show that the proposed NF ensemble approach improves the accuracy of selfie segmentation by more than 2% on average. The cost-efficiency measurements show that the proposed method consumes little computing power, comparable to that of single models.
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2021.3133276
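
The N-Frames idea described in the summary (run a single DSM on the current frame, then combine its output with the results kept from previous frames) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that the models are used in round-robin order from one frame to the next, that each model returns a per-pixel foreground probability map, and that "soft voting" means averaging those maps and thresholding at 0.5. The names `NFramesEnsemble`, `segment`, and `iou` are hypothetical and chosen only for illustration.

```python
from collections import deque
from itertools import cycle

import numpy as np


def iou(pred, truth):
    """Intersection over Union of two binary masks of the same shape."""
    pred = np.asarray(pred).astype(bool)
    truth = np.asarray(truth).astype(bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        # Both masks are empty; treat them as a perfect match.
        return 1.0
    return float(np.logical_and(pred, truth).sum() / union)


class NFramesEnsemble:
    """Illustrative N-Frames ensemble (assumed design, not the paper's code).

    Each call to segment() runs exactly one model on the current frame and
    soft-votes its probability map with the maps kept from earlier frames.
    """

    def __init__(self, models, n_frames=2, threshold=0.5):
        # Assumption: models are rotated round-robin across frames.
        self._models = cycle(models)
        # Keeps only the probability maps of the latest n_frames frames.
        self._recent = deque(maxlen=n_frames)
        self._threshold = threshold

    def segment(self, frame):
        model = next(self._models)
        # One model inference per frame, unlike an N-Models ensemble.
        self._recent.append(np.asarray(model(frame), dtype=float))
        # Soft voting: average the stored probability maps pixel by pixel.
        mean_prob = np.mean(np.stack(self._recent), axis=0)
        return mean_prob >= self._threshold
```

In this sketch each element of `models` is any callable that maps a frame to a probability map, so `NFramesEnsemble([model_a, model_b], n_frames=2).segment(frame)` costs one inference per frame while still averaging two models' opinions. Averaging maps from different frames implicitly assumes that the subject moves little between consecutive frames.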