VSumVis: Interactive Visual Understanding and Diagnosis of Video Summarization Model

Bibliographic Details
Published in: ACM Transactions on Intelligent Systems and Technology, 2021-08, Vol. 12 (4), p. 1-28
Authors: Sun, Guodao; Wu, Hao; Zhu, Lin; Xu, Chaoqing; Liang, Haoran; Xu, Binwei; Liang, Ronghua
Format: Article
Language: English
Online access: Full text
Description: With the rapid development of the mobile Internet, the popularity of video capture devices has brought a surge in multimedia video resources. Using machine learning methods combined with well-designed features, video summaries can be generated automatically to ease video resource consumption and retrieval. However, there is always a gap between the summaries produced by the model and those annotated by users. How to help users understand this difference, provide insights for improving the model, and enhance trust in the model remains challenging. To address these challenges, we propose VSumVis, a visual analysis system designed under a user-centered methodology that offers multi-feature examination and multi-level exploration, helping users explore and analyze video content as well as the intrinsic relationships within our video summarization model. The system contains multiple coordinated views, i.e., a video view, a projection view, a detail view, and a sequential frames view. A multi-level analysis process that integrates video events and frames is presented through cluster and node visualizations. Temporal patterns concerning the difference between the manual annotation score and the saliency score produced by our model are further investigated and distinguished in the sequential frames view. Moreover, we propose a set of rich user interactions that enable an in-depth, multi-faceted analysis of the features in our video summarization model. We conduct case studies and interviews with domain experts to provide anecdotal evidence of the effectiveness of our approach. Quantitative feedback from a user study confirms the usefulness of our visual system for exploring the video summarization model.
DOI: 10.1145/3458928
ISSN: 2157-6904
EISSN: 2157-6912
Source: ACM Digital Library Complete