VSumVis: Interactive Visual Understanding and Diagnosis of Video Summarization Model
With the rapid development of mobile Internet, the popularity of video capture devices has brought a surge in multimedia video resources. Utilizing machine learning methods combined with well-designed features, we could automatically obtain video summarization to relax video resource consumption and...
Saved in:
Published in: | ACM transactions on intelligent systems and technology 2021-08, Vol.12 (4), p.1-28 |
---|---|
Main authors: | Sun, Guodao ; Wu, Hao ; Zhu, Lin ; Xu, Chaoqing ; Liang, Haoran ; Xu, Binwei ; Liang, Ronghua |
Format: | Article |
Language: | eng |
Online access: | Full text |
container_end_page | 28 |
---|---|
container_issue | 4 |
container_start_page | 1 |
container_title | ACM transactions on intelligent systems and technology |
container_volume | 12 |
creator | Sun, Guodao ; Wu, Hao ; Zhu, Lin ; Xu, Chaoqing ; Liang, Haoran ; Xu, Binwei ; Liang, Ronghua |
description | With the rapid development of the mobile Internet, the popularity of video capture devices has brought a surge in multimedia video resources. Utilizing machine learning methods combined with well-designed features, we can automatically obtain video summarizations that ease video resource consumption and retrieval. However, there is always a gap between the summarization produced by the model and the ones annotated by users. How to help users understand this difference, provide insights for improving the model, and enhance trust in the model remains challenging. To address these challenges, we propose VSumVis, a visual analysis system designed under a user-centered methodology that offers multi-feature examination and multi-level exploration, helping users explore and analyze video content as well as the intrinsic relationships that exist in our video summarization model. The system contains multiple coordinated views, i.e., a video view, a projection view, a detail view, and a sequential frames view. A multi-level analysis process that integrates video events and frames is presented with cluster and node visualizations in our system. Temporal patterns concerning the difference between the manual annotation score and the saliency score produced by our model are further investigated and distinguished in the sequential frames view. Moreover, we propose a set of rich user interactions that enable an in-depth, multi-faceted analysis of the features in our video summarization model. We conduct case studies and interviews with domain experts to provide anecdotal evidence of the effectiveness of our approach. Quantitative feedback from a user study confirms the usefulness of our visual system for exploring the video summarization model. |
doi_str_mv | 10.1145/3458928 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 2157-6904 |
ispartof | ACM transactions on intelligent systems and technology, 2021-08, Vol.12 (4), p.1-28 |
issn | 2157-6904 2157-6912 |
language | eng |
recordid | cdi_crossref_primary_10_1145_3458928 |
source | ACM Digital Library Complete |
title | VSumVis: Interactive Visual Understanding and Diagnosis of Video Summarization Model |
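The abstract's core diagnostic idea, comparing per-frame manual annotation scores against model-produced saliency scores to surface where they disagree, can be sketched as follows. This is an illustrative sketch with hypothetical data and function names, not code or data from the paper:

```python
# Hypothetical sketch: locating frames where manual annotation scores and
# model saliency scores diverge most, the kind of temporal difference a
# system like VSumVis visualizes. Scores are illustrative, not from the paper.

def score_gaps(annotation, saliency):
    """Return per-frame differences (annotation - saliency)."""
    if len(annotation) != len(saliency):
        raise ValueError("score sequences must be the same length")
    return [a - s for a, s in zip(annotation, saliency)]

def largest_gap_frames(annotation, saliency, k=3):
    """Indices of the k frames where model and annotators disagree most."""
    gaps = score_gaps(annotation, saliency)
    return sorted(range(len(gaps)), key=lambda i: abs(gaps[i]), reverse=True)[:k]

annotation = [0.9, 0.2, 0.8, 0.1, 0.7]   # manual importance score per frame
saliency   = [0.3, 0.2, 0.9, 0.6, 0.7]   # model saliency score per frame
print(largest_gap_frames(annotation, saliency, k=2))  # → [0, 3]
```

Frames 0 and 3 have the largest annotation-versus-model gaps here, which is exactly the kind of disagreement a sequential frames view would highlight for closer inspection.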