Modal Keywords, Ontologies, and Reasoning for Video Understanding

We propose a novel framework for video content understanding that uses rules constructed from knowledge bases and multimedia ontologies. Our framework consists of an expert system that combines a rule-based engine, domain knowledge, visual detectors (for objects and scenes), and metadata (text from automatic speech recognition, related text, etc.). We introduce the idea of modal keywords, which are keywords that represent perceptual concepts in the following categories: visual (e.g., sky), aural (e.g., scream), olfactory (e.g., vanilla), tactile (e.g., feather), and taste (e.g., candy). A method is presented to automatically classify keywords from speech recognition, queries, or related text into these categories using WordNet and TGM I. For video understanding, the following operations are performed automatically: scene cut detection, automatic speech recognition, feature extraction, and visual detection (e.g., sky, face, indoor). The results of these operations are used by a rule-based engine that exploits context information (e.g., text from speech) to enhance visual detection results. We discuss semi-automatic construction of multimedia ontologies and present experiments in which visual detector outputs are modified by simple rules that use context information available with the video.
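The modal-keyword idea lends itself to a short sketch. The Python snippet below is a minimal illustration, not the authors' implementation (which also uses the TGM I thesaurus): it walks WordNet hypernym paths via NLTK and assigns a keyword to any perceptual category whose seed lemmas appear on a path. The seed sets are invented for illustration.

```python
# Minimal sketch, assuming NLTK with the WordNet corpus installed
# (nltk.download("wordnet")). The seed lemmas below are illustrative
# guesses, not the paper's actual WordNet/TGM I mapping.
from nltk.corpus import wordnet as wn

# Hypothetical seed lemmas anchoring each perceptual category.
MODAL_SEEDS = {
    "visual": {"light", "color", "visual_property"},
    "aural": {"sound", "auditory_communication"},
    "olfactory": {"smell", "odor"},
    "tactile": {"touch", "texture"},
    "taste": {"taste", "flavor"},
}

def modal_categories(keyword):
    """Collect every category whose seed lemmas occur on a WordNet
    hypernym path of any sense of the keyword."""
    categories = set()
    for synset in wn.synsets(keyword):
        for path in synset.hypernym_paths():
            lemmas = {name for s in path for name in s.lemma_names()}
            for category, seeds in MODAL_SEEDS.items():
                if lemmas & seeds:
                    categories.add(category)
    return categories

# e.g. modal_categories("scream") should pick up "aural" via the
# utterance -> auditory communication hypernyms of its noun senses.
```

Because a word can have senses in several categories, the function returns a set rather than a single label; the abstract's examples (sky, scream, vanilla, feather, candy) suggest one dominant modality per keyword, but WordNet senses do not enforce that.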

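The rule-based enhancement step can be illustrated the same way. The sketch below assumes additive confidence boosts and invented rule contents; the paper's actual engine, rules, and detector scores are not reproduced here.

```python
# Toy illustration of a context rule, not the authors' engine: if the
# ASR transcript for a shot mentions a cue word associated with a
# visual detector, nudge that detector's confidence score upward.
from dataclasses import dataclass

@dataclass
class ContextRule:
    detector: str         # visual detector name, e.g. "sky"
    cue_words: frozenset  # context words from ASR or related text
    boost: float          # additive confidence adjustment (assumed form)

RULES = [
    ContextRule("sky", frozenset({"beach", "outdoors", "clouds"}), 0.15),
    ContextRule("indoor", frozenset({"kitchen", "office", "meeting"}), 0.15),
]

def apply_rules(scores, asr_text):
    """Return detector scores adjusted by any rule whose cue words
    appear in the shot's ASR text, clamped to [0, 1]."""
    words = set(asr_text.lower().split())
    adjusted = dict(scores)
    for rule in RULES:
        if rule.detector in adjusted and words & rule.cue_words:
            adjusted[rule.detector] = min(1.0, adjusted[rule.detector] + rule.boost)
    return adjusted

# apply_rules({"sky": 0.55, "face": 0.20}, "we walked along the beach")
# -> {"sky": 0.70, "face": 0.20}
```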

Bibliographic Details
Main Authors: Jaimes, Alejandro; Tseng, Belle L.; Smith, John R.
Format: Book Chapter
Language: English
Online Access: Full text
DOI: 10.1007/3-540-45113-7_25
Publisher: Springer Berlin / Heidelberg, Germany
Series: Lecture Notes in Computer Science
Editors: Sebe, Nicu; Lew, Michael S.; Bakker, Erwin M.; Zhou, Xiang Sean; Huang, Thomas S.
ISBN: 3540406344; 9783540406341
EISBN: 9783540451136; 3540451137
OCLC: 958560026
Published in: Image and Video Retrieval, 2003, Vol. 2728, pp. 248-259
ISSN: 0302-9743
EISSN: 1611-3349
Source: Springer Books
Subjects:
    Applied sciences
    Artificial intelligence
    Automatic Speech Recognition
    Computer science; control theory; systems
    Confidence Score
    Exact sciences and technology
    Pattern recognition. Digital image processing. Computational geometry
    Reasoning Engine
    Related Text
    Video Shot