VSumVis: Interactive Visual Understanding and Diagnosis of Video Summarization Model
With the rapid development of mobile Internet, the popularity of video capture devices has brought a surge in multimedia video resources. Utilizing machine learning methods combined with well-designed features, we could automatically obtain video summarization to relax video resource consumption and...
Saved in:
Published in: | ACM transactions on intelligent systems and technology 2021-08, Vol.12 (4), p.1-28 |
---|---|
Main authors: | Sun, Guodao ; Wu, Hao ; Zhu, Lin ; Xu, Chaoqing ; Liang, Haoran ; Xu, Binwei ; Liang, Ronghua |
Format: | Article |
Language: | eng |
Online access: | Full text |
container_end_page | 28 |
---|---|
container_issue | 4 |
container_start_page | 1 |
container_title | ACM transactions on intelligent systems and technology |
container_volume | 12 |
creator | Sun, Guodao ; Wu, Hao ; Zhu, Lin ; Xu, Chaoqing ; Liang, Haoran ; Xu, Binwei ; Liang, Ronghua |
description | With the rapid development of the mobile Internet, the popularity of video capture devices has brought a surge in multimedia video resources. Utilizing machine learning methods combined with well-designed features, we can automatically obtain video summarizations that ease video resource consumption and retrieval. However, there is always a gap between the summarization produced by the model and the ones annotated by users. How to help users understand this difference, provide insights for improving the model, and enhance trust in the model remains challenging. To address these challenges, we propose VSumVis, a visual analysis system designed under a user-centered methodology that offers multi-feature examination and multi-level exploration, helping users explore and analyze video content as well as the intrinsic relationships that exist in our video summarization model. The system contains multiple coordinated views, i.e., a video view, a projection view, a detail view, and a sequential frames view. A multi-level analysis process that integrates video events and frames is presented with cluster and node visualizations in our system. Temporal patterns concerning the difference between the manual annotation score and the saliency score produced by our model are further investigated and distinguished in the sequential frames view. Moreover, we propose a set of rich user interactions that enable an in-depth, multi-faceted analysis of the features in our video summarization model. We conduct case studies and interviews with domain experts to provide anecdotal evidence of the effectiveness of our approach. Quantitative feedback from a user study confirms the usefulness of our visual system for exploring the video summarization model. |
doi_str_mv | 10.1145/3458928 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 2157-6904 |
ispartof | ACM transactions on intelligent systems and technology, 2021-08, Vol.12 (4), p.1-28 |
issn | 2157-6904 2157-6912 |
language | eng |
recordid | cdi_crossref_primary_10_1145_3458928 |
source | ACM Digital Library Complete |
title | VSumVis: Interactive Visual Understanding and Diagnosis of Video Summarization Model |
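The abstract's core diagnostic idea, comparing per-frame manual annotation scores against model-produced saliency scores to surface where they disagree, can be sketched as follows. This is an illustrative sketch with hypothetical data and function names, not code or data from the paper:

```python
# Hypothetical sketch: locating frames where manual annotation scores and
# model saliency scores diverge most, the kind of temporal difference a
# system like VSumVis visualizes. Scores are illustrative, not from the paper.

def score_gaps(annotation, saliency):
    """Return per-frame differences (annotation - saliency)."""
    if len(annotation) != len(saliency):
        raise ValueError("score sequences must be the same length")
    return [a - s for a, s in zip(annotation, saliency)]

def largest_gap_frames(annotation, saliency, k=3):
    """Indices of the k frames where model and annotators disagree most."""
    gaps = score_gaps(annotation, saliency)
    return sorted(range(len(gaps)), key=lambda i: abs(gaps[i]), reverse=True)[:k]

annotation = [0.9, 0.2, 0.8, 0.1, 0.7]   # manual importance score per frame
saliency   = [0.3, 0.2, 0.9, 0.6, 0.7]   # model saliency score per frame
print(largest_gap_frames(annotation, saliency, k=2))  # → [0, 3]
```

Frames 0 and 3 have the largest annotation-versus-model gaps here, which is exactly the kind of disagreement a sequential frames view would highlight for closer inspection.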