Interactive video retrieval in the age of effective joint embedding deep models: lessons from the 11th VBS

This paper presents findings of the eleventh Video Browser Showdown competition, where sixteen teams competed in known-item and ad-hoc search tasks. Many of the teams utilized state-of-the-art video retrieval approaches that demonstrated high effectiveness in challenging search scenarios. In this pa...

Full description

Bibliographic details
Published in:	Multimedia systems 2023-12, Vol.29 (6), p.3481-3504
Main authors:	Lokoč, Jakub; Andreadis, Stelios; Bailer, Werner; Duane, Aaron; Gurrin, Cathal; Ma, Zhixin; Messina, Nicola; Nguyen, Thao-Nhu; Peška, Ladislav; Rossetto, Luca; Sauter, Loris; Schall, Konstantin; Schoeffmann, Klaus; Khan, Omar Shahbaz; Spiess, Florian; Vadicamo, Lucia; Vrochidis, Stefanos
Format:	Article
Language:	English
Subjects:	Computer Communication Networks; Computer Graphics; Computer Science; Cryptology; Data Storage Representation; Multimedia Information Systems; Operating Systems; Regular Paper; Retrieval; Searching; Teams
Online access:	Full text

container_end_page	3504
container_issue	6
container_start_page	3481
container_title	Multimedia systems
container_volume	29
creator	Lokoč, Jakub; Andreadis, Stelios; Bailer, Werner; Duane, Aaron; Gurrin, Cathal; Ma, Zhixin; Messina, Nicola; Nguyen, Thao-Nhu; Peška, Ladislav; Rossetto, Luca; Sauter, Loris; Schall, Konstantin; Schoeffmann, Klaus; Khan, Omar Shahbaz; Spiess, Florian; Vadicamo, Lucia; Vrochidis, Stefanos
description	This paper presents findings of the eleventh Video Browser Showdown competition, where sixteen teams competed in known-item and ad-hoc search tasks. Many of the teams utilized state-of-the-art video retrieval approaches that demonstrated high effectiveness in challenging search scenarios. In this paper, a broad survey of all utilized approaches is presented in connection with an analysis of the performance of participating teams. Specifically, both high-level performance indicators are presented with overall statistics as well as in-depth analysis of the performance of selected tools implementing result set logging. The analysis reveals evidence that the CLIP model represents a versatile tool for cross-modal video retrieval when combined with interactive search capabilities. Furthermore, the analysis investigates the effect of different users and text query properties on the performance in search tasks. Last but not least, lessons learned from search task preparation are presented, and a new direction for ad-hoc search based tasks at Video Browser Showdown is introduced.
doi_str_mv	10.1007/s00530-023-01143-5
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2890829537</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2890829537</sourcerecordid><originalsourceid>FETCH-LOGICAL-c363t-b4c0d506487819901c670916ae796aa1f572d6c9edd651d2949f9a8ad48e5ffe3</originalsourceid><addsrcrecordid>eNp9kMtOwzAQRS0EEqXwA6wssTaM7diJ2UHFo1IlFjy2lhtP2lRJXOy0En9PaJDYsZrNOfeOLiGXHK45QH6TAJQEBkIy4DyTTB2RyXAF40UhjskETCZYZrQ4JWcpbQB4riVMyGbe9Rhd2dd7pPvaY6AR-1jj3jW07mi_RupWSENFsapw5Dah7nqK7RK9r7sV9Yhb2gaPTbqlDaYUukSrGNqDznm_ph_3r-fkpHJNwovfOyXvjw9vs2e2eHmaz-4WrJRa9myZleAV6KzIC24M8FLnYLh2mBvtHK9ULrwuzdCtFffCZKYyrnA-K1ANL8opuRpztzF87jD1dhN2sRsqrSgMFMIomQ-UGKkyhpQiVnYb69bFL8vB_mxqx03tsKk9bGrVIMlRSgPcrTD-Rf9jfQPqMHlD</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2890829537</pqid></control><display><type>article</type><title>Interactive video retrieval in the age of effective joint embedding deep models: lessons from the 11th VBS</title><source>SpringerNature Journals</source><creator>Lokoč, Jakub ; Andreadis, Stelios ; Bailer, Werner ; Duane, Aaron ; Gurrin, Cathal ; Ma, Zhixin ; Messina, Nicola ; Nguyen, Thao-Nhu ; Peška, Ladislav ; Rossetto, Luca ; Sauter, Loris ; Schall, Konstantin ; Schoeffmann, Klaus ; Khan, Omar Shahbaz ; Spiess, Florian ; Vadicamo, Lucia ; Vrochidis, Stefanos</creator><creatorcontrib>Lokoč, Jakub ; Andreadis, Stelios ; Bailer, Werner ; Duane, Aaron ; Gurrin, Cathal ; Ma, Zhixin ; Messina, Nicola ; Nguyen, Thao-Nhu ; Peška, Ladislav ; Rossetto, Luca ; Sauter, Loris ; Schall, Konstantin ; Schoeffmann, Klaus ; Khan, Omar Shahbaz ; Spiess, Florian ; Vadicamo, Lucia ; Vrochidis, Stefanos</creatorcontrib><description>This paper presents findings of the eleventh Video Browser Showdown competition, where sixteen teams competed in known-item and ad-hoc search tasks. 
Many of the teams utilized state-of-the-art video retrieval approaches that demonstrated high effectiveness in challenging search scenarios. In this paper, a broad survey of all utilized approaches is presented in connection with an analysis of the performance of participating teams. Specifically, both high-level performance indicators are presented with overall statistics as well as in-depth analysis of the performance of selected tools implementing result set logging. The analysis reveals evidence that the CLIP model represents a versatile tool for cross-modal video retrieval when combined with interactive search capabilities. Furthermore, the analysis investigates the effect of different users and text query properties on the performance in search tasks. Last but not least, lessons learned from search task preparation are presented, and a new direction for ad-hoc search based tasks at Video Browser Showdown is introduced.</description><identifier>ISSN: 0942-4962</identifier><identifier>EISSN: 1432-1882</identifier><identifier>DOI: 10.1007/s00530-023-01143-5</identifier><language>eng</language><publisher>Berlin/Heidelberg: Springer Berlin Heidelberg</publisher><subject>Computer Communication Networks ; Computer Graphics ; Computer Science ; Cryptology ; Data Storage Representation ; Multimedia Information Systems ; Operating Systems ; Regular Paper ; Retrieval ; Searching ; Teams</subject><ispartof>Multimedia systems, 2023-12, Vol.29 (6), p.3481-3504</ispartof><rights>The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2023. Springer Nature or its licensor (e.g. 
a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c363t-b4c0d506487819901c670916ae796aa1f572d6c9edd651d2949f9a8ad48e5ffe3</citedby><cites>FETCH-LOGICAL-c363t-b4c0d506487819901c670916ae796aa1f572d6c9edd651d2949f9a8ad48e5ffe3</cites><orcidid>0000-0002-9218-1704 ; 0000-0002-3396-1516 ; 0000-0002-2505-9178 ; 0000-0003-2442-4900 ; 0000-0002-5389-9465 ; 0000-0001-9720-3645 ; 0000-0003-3548-0537 ; 0000-0003-2903-3968 ; 0000-0001-7182-7038 ; 0000-0002-9127-4175 ; 0000-0002-5519-1962 ; 0000-0001-8046-0362 ; 0000-0002-9825-1654 ; 0000-0003-3011-2487 ; 0000-0001-8082-4509 ; 0000-0003-1356-9434 ; 0000-0002-3558-4144</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s00530-023-01143-5$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s00530-023-01143-5$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,780,784,27924,27925,41488,42557,51319</link.rule.ids></links><search><creatorcontrib>Lokoč, Jakub</creatorcontrib><creatorcontrib>Andreadis, Stelios</creatorcontrib><creatorcontrib>Bailer, Werner</creatorcontrib><creatorcontrib>Duane, Aaron</creatorcontrib><creatorcontrib>Gurrin, Cathal</creatorcontrib><creatorcontrib>Ma, Zhixin</creatorcontrib><creatorcontrib>Messina, Nicola</creatorcontrib><creatorcontrib>Nguyen, Thao-Nhu</creatorcontrib><creatorcontrib>Peška, Ladislav</creatorcontrib><creatorcontrib>Rossetto, Luca</creatorcontrib><creatorcontrib>Sauter, 
Loris</creatorcontrib><creatorcontrib>Schall, Konstantin</creatorcontrib><creatorcontrib>Schoeffmann, Klaus</creatorcontrib><creatorcontrib>Khan, Omar Shahbaz</creatorcontrib><creatorcontrib>Spiess, Florian</creatorcontrib><creatorcontrib>Vadicamo, Lucia</creatorcontrib><creatorcontrib>Vrochidis, Stefanos</creatorcontrib><title>Interactive video retrieval in the age of effective joint embedding deep models: lessons from the 11th VBS</title><title>Multimedia systems</title><addtitle>Multimedia Systems</addtitle><description>This paper presents findings of the eleventh Video Browser Showdown competition, where sixteen teams competed in known-item and ad-hoc search tasks. Many of the teams utilized state-of-the-art video retrieval approaches that demonstrated high effectiveness in challenging search scenarios. In this paper, a broad survey of all utilized approaches is presented in connection with an analysis of the performance of participating teams. Specifically, both high-level performance indicators are presented with overall statistics as well as in-depth analysis of the performance of selected tools implementing result set logging. The analysis reveals evidence that the CLIP model represents a versatile tool for cross-modal video retrieval when combined with interactive search capabilities. Furthermore, the analysis investigates the effect of different users and text query properties on the performance in search tasks. 
Last but not least, lessons learned from search task preparation are presented, and a new direction for ad-hoc search based tasks at Video Browser Showdown is introduced.</description><subject>Computer Communication Networks</subject><subject>Computer Graphics</subject><subject>Computer Science</subject><subject>Cryptology</subject><subject>Data Storage Representation</subject><subject>Multimedia Information Systems</subject><subject>Operating Systems</subject><subject>Regular Paper</subject><subject>Retrieval</subject><subject>Searching</subject><subject>Teams</subject><issn>0942-4962</issn><issn>1432-1882</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><recordid>eNp9kMtOwzAQRS0EEqXwA6wssTaM7diJ2UHFo1IlFjy2lhtP2lRJXOy0En9PaJDYsZrNOfeOLiGXHK45QH6TAJQEBkIy4DyTTB2RyXAF40UhjskETCZYZrQ4JWcpbQB4riVMyGbe9Rhd2dd7pPvaY6AR-1jj3jW07mi_RupWSENFsapw5Dah7nqK7RK9r7sV9Yhb2gaPTbqlDaYUukSrGNqDznm_ph_3r-fkpHJNwovfOyXvjw9vs2e2eHmaz-4WrJRa9myZleAV6KzIC24M8FLnYLh2mBvtHK9ULrwuzdCtFffCZKYyrnA-K1ANL8opuRpztzF87jD1dhN2sRsqrSgMFMIomQ-UGKkyhpQiVnYb69bFL8vB_mxqx03tsKk9bGrVIMlRSgPcrTD-Rf9jfQPqMHlD</recordid><startdate>20231201</startdate><enddate>20231201</enddate><creator>Lokoč, Jakub</creator><creator>Andreadis, Stelios</creator><creator>Bailer, Werner</creator><creator>Duane, Aaron</creator><creator>Gurrin, Cathal</creator><creator>Ma, Zhixin</creator><creator>Messina, Nicola</creator><creator>Nguyen, Thao-Nhu</creator><creator>Peška, Ladislav</creator><creator>Rossetto, Luca</creator><creator>Sauter, Loris</creator><creator>Schall, Konstantin</creator><creator>Schoeffmann, Klaus</creator><creator>Khan, Omar Shahbaz</creator><creator>Spiess, Florian</creator><creator>Vadicamo, Lucia</creator><creator>Vrochidis, Stefanos</creator><general>Springer Berlin Heidelberg</general><general>Springer Nature 
B.V</general><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0002-9218-1704</orcidid><orcidid>https://orcid.org/0000-0002-3396-1516</orcidid><orcidid>https://orcid.org/0000-0002-2505-9178</orcidid><orcidid>https://orcid.org/0000-0003-2442-4900</orcidid><orcidid>https://orcid.org/0000-0002-5389-9465</orcidid><orcidid>https://orcid.org/0000-0001-9720-3645</orcidid><orcidid>https://orcid.org/0000-0003-3548-0537</orcidid><orcidid>https://orcid.org/0000-0003-2903-3968</orcidid><orcidid>https://orcid.org/0000-0001-7182-7038</orcidid><orcidid>https://orcid.org/0000-0002-9127-4175</orcidid><orcidid>https://orcid.org/0000-0002-5519-1962</orcidid><orcidid>https://orcid.org/0000-0001-8046-0362</orcidid><orcidid>https://orcid.org/0000-0002-9825-1654</orcidid><orcidid>https://orcid.org/0000-0003-3011-2487</orcidid><orcidid>https://orcid.org/0000-0001-8082-4509</orcidid><orcidid>https://orcid.org/0000-0003-1356-9434</orcidid><orcidid>https://orcid.org/0000-0002-3558-4144</orcidid></search><sort><creationdate>20231201</creationdate><title>Interactive video retrieval in the age of effective joint embedding deep models: lessons from the 11th VBS</title><author>Lokoč, Jakub ; Andreadis, Stelios ; Bailer, Werner ; Duane, Aaron ; Gurrin, Cathal ; Ma, Zhixin ; Messina, Nicola ; Nguyen, Thao-Nhu ; Peška, Ladislav ; Rossetto, Luca ; Sauter, Loris ; Schall, Konstantin ; Schoeffmann, Klaus ; Khan, Omar Shahbaz ; Spiess, Florian ; Vadicamo, Lucia ; Vrochidis, Stefanos</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c363t-b4c0d506487819901c670916ae796aa1f572d6c9edd651d2949f9a8ad48e5ffe3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer Communication Networks</topic><topic>Computer Graphics</topic><topic>Computer Science</topic><topic>Cryptology</topic><topic>Data Storage Representation</topic><topic>Multimedia Information 
Systems</topic><topic>Operating Systems</topic><topic>Regular Paper</topic><topic>Retrieval</topic><topic>Searching</topic><topic>Teams</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Lokoč, Jakub</creatorcontrib><creatorcontrib>Andreadis, Stelios</creatorcontrib><creatorcontrib>Bailer, Werner</creatorcontrib><creatorcontrib>Duane, Aaron</creatorcontrib><creatorcontrib>Gurrin, Cathal</creatorcontrib><creatorcontrib>Ma, Zhixin</creatorcontrib><creatorcontrib>Messina, Nicola</creatorcontrib><creatorcontrib>Nguyen, Thao-Nhu</creatorcontrib><creatorcontrib>Peška, Ladislav</creatorcontrib><creatorcontrib>Rossetto, Luca</creatorcontrib><creatorcontrib>Sauter, Loris</creatorcontrib><creatorcontrib>Schall, Konstantin</creatorcontrib><creatorcontrib>Schoeffmann, Klaus</creatorcontrib><creatorcontrib>Khan, Omar Shahbaz</creatorcontrib><creatorcontrib>Spiess, Florian</creatorcontrib><creatorcontrib>Vadicamo, Lucia</creatorcontrib><creatorcontrib>Vrochidis, Stefanos</creatorcontrib><collection>CrossRef</collection><jtitle>Multimedia systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Lokoč, Jakub</au><au>Andreadis, Stelios</au><au>Bailer, Werner</au><au>Duane, Aaron</au><au>Gurrin, Cathal</au><au>Ma, Zhixin</au><au>Messina, Nicola</au><au>Nguyen, Thao-Nhu</au><au>Peška, Ladislav</au><au>Rossetto, Luca</au><au>Sauter, Loris</au><au>Schall, Konstantin</au><au>Schoeffmann, Klaus</au><au>Khan, Omar Shahbaz</au><au>Spiess, Florian</au><au>Vadicamo, Lucia</au><au>Vrochidis, Stefanos</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Interactive video retrieval in the age of effective joint embedding deep models: lessons from the 11th VBS</atitle><jtitle>Multimedia systems</jtitle><stitle>Multimedia 
Systems</stitle><date>2023-12-01</date><risdate>2023</risdate><volume>29</volume><issue>6</issue><spage>3481</spage><epage>3504</epage><pages>3481-3504</pages><issn>0942-4962</issn><eissn>1432-1882</eissn><abstract>This paper presents findings of the eleventh Video Browser Showdown competition, where sixteen teams competed in known-item and ad-hoc search tasks. Many of the teams utilized state-of-the-art video retrieval approaches that demonstrated high effectiveness in challenging search scenarios. In this paper, a broad survey of all utilized approaches is presented in connection with an analysis of the performance of participating teams. Specifically, both high-level performance indicators are presented with overall statistics as well as in-depth analysis of the performance of selected tools implementing result set logging. The analysis reveals evidence that the CLIP model represents a versatile tool for cross-modal video retrieval when combined with interactive search capabilities. Furthermore, the analysis investigates the effect of different users and text query properties on the performance in search tasks. 
Last but not least, lessons learned from search task preparation are presented, and a new direction for ad-hoc search based tasks at Video Browser Showdown is introduced.</abstract><cop>Berlin/Heidelberg</cop><pub>Springer Berlin Heidelberg</pub><doi>10.1007/s00530-023-01143-5</doi><tpages>24</tpages><orcidid>https://orcid.org/0000-0002-9218-1704</orcidid><orcidid>https://orcid.org/0000-0002-3396-1516</orcidid><orcidid>https://orcid.org/0000-0002-2505-9178</orcidid><orcidid>https://orcid.org/0000-0003-2442-4900</orcidid><orcidid>https://orcid.org/0000-0002-5389-9465</orcidid><orcidid>https://orcid.org/0000-0001-9720-3645</orcidid><orcidid>https://orcid.org/0000-0003-3548-0537</orcidid><orcidid>https://orcid.org/0000-0003-2903-3968</orcidid><orcidid>https://orcid.org/0000-0001-7182-7038</orcidid><orcidid>https://orcid.org/0000-0002-9127-4175</orcidid><orcidid>https://orcid.org/0000-0002-5519-1962</orcidid><orcidid>https://orcid.org/0000-0001-8046-0362</orcidid><orcidid>https://orcid.org/0000-0002-9825-1654</orcidid><orcidid>https://orcid.org/0000-0003-3011-2487</orcidid><orcidid>https://orcid.org/0000-0001-8082-4509</orcidid><orcidid>https://orcid.org/0000-0003-1356-9434</orcidid><orcidid>https://orcid.org/0000-0002-3558-4144</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 0942-4962
ispartof	Multimedia systems, 2023-12, Vol.29 (6), p.3481-3504
issn	0942-4962 1432-1882
language	eng
recordid	cdi_proquest_journals_2890829537
source	SpringerNature Journals
subjects	Computer Communication Networks; Computer Graphics; Computer Science; Cryptology; Data Storage Representation; Multimedia Information Systems; Operating Systems; Regular Paper; Retrieval; Searching; Teams
title	Interactive video retrieval in the age of effective joint embedding deep models: lessons from the 11th VBS
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-22T14%3A10%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Interactive%20video%20retrieval%20in%20the%20age%20of%20effective%20joint%20embedding%20deep%20models:%20lessons%20from%20the%2011th%20VBS&rft.jtitle=Multimedia%20systems&rft.au=Loko%C4%8D,%20Jakub&rft.date=2023-12-01&rft.volume=29&rft.issue=6&rft.spage=3481&rft.epage=3504&rft.pages=3481-3504&rft.issn=0942-4962&rft.eissn=1432-1882&rft_id=info:doi/10.1007/s00530-023-01143-5&rft_dat=%3Cproquest_cross%3E2890829537%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2890829537&rft_id=info:pmid/&rfr_iscdi=true