Integrating YOLO and WordNet for automated image object summarization

The demand for methods that automatically create text summaries from images containing many things has recently grown. Our research introduces a fresh and creative way to achieve this. We bring together the WordNet dictionary and the YOLO model to make this happen. YOLO helps us find where the thing...

Detailed description

Bibliographic details
Published in: Signal, image and video processing, 2024-12, Vol.18 (12), p.9465-9481
Main authors: Saqib, Sheikh Muhammad; Aftab, Aamir; Mazhar, Tehseen; Iqbal, Muhammad; Shahazad, Tariq; Almogren, Ahmad; Hamam, Habib
Format: Article
Language: English
Online access: Full text
container_end_page 9481
container_issue 12
container_start_page 9465
container_title Signal, image and video processing
container_volume 18
creator Saqib, Sheikh Muhammad
Aftab, Aamir
Mazhar, Tehseen
Iqbal, Muhammad
Shahazad, Tariq
Almogren, Ahmad
Hamam, Habib
description The demand for methods that automatically create text summaries from images containing many things has recently grown. Our research introduces a fresh and creative way to achieve this. We bring together the WordNet dictionary and the YOLO model to make this happen. YOLO helps us find where the things are in the images, while WordNet provides their meanings. Our process then crafts a summary for each object found. This new technique can have a big impact on computer vision and natural language processing. It can make understanding complicated images, filled with lots of things, much simpler. To test our approach, we used 1381 pictures from the Google Image search engine. Our results showed high accuracy, with 72% for object detection. The precision was 85%, the recall was 72%, and the F1-score was 74%.
doi_str_mv 10.1007/s11760-024-03560-z
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_3124100978</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3124100978</sourcerecordid><originalsourceid>FETCH-LOGICAL-c200t-a3e50b044dcb2bcc90ab87d05fddaf3042f4537283f8f9ee534b7616be6690203</originalsourceid><addsrcrecordid>eNp9kD9PwzAQxS0EElXpF2CyxBw4_0nsjKgqUKmiCwgxWXZiR61IXGxnoJ8eQxBs3HJveO_d6YfQJYFrAiBuIiGiggIoL4CVWR1P0IzIihVEEHL6q4Gdo0WMe8jDqJCVnKHVeki2Czrthg6_bjdbrIcWv_jQPtqEnQ9Yj8n3OtkW73rdWezN3jYJx7Hvddgdc9IPF-jM6bdoFz97jp7vVk_Lh2KzvV8vbzdFQwFSoZktwQDnbWOoaZoatJGihdK1rXYMOHW8ZIJK5qSrrS0ZN6IilbFVVQMFNkdXU-8h-PfRxqT2fgxDPqkYoTzTqIXMLjq5muBjDNapQ8i_hw9FQH0RUxMxlYmpb2LqmENsCsVsHjob_qr_SX0CkfZuIA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3124100978</pqid></control><display><type>article</type><title>Integrating YOLO and WordNet for automated image object summarization</title><source>Springer Nature - Complete Springer Journals</source><creator>Saqib, Sheikh Muhammad ; Aftab, Aamir ; Mazhar, Tehseen ; Iqbal, Muhammad ; Shahazad, Tariq ; Almogren, Ahmad ; Hamam, Habib</creator><creatorcontrib>Saqib, Sheikh Muhammad ; Aftab, Aamir ; Mazhar, Tehseen ; Iqbal, Muhammad ; Shahazad, Tariq ; Almogren, Ahmad ; Hamam, Habib</creatorcontrib><description>The demand for methods that automatically create text summaries from images containing many things has recently grown. Our research introduces a fresh and creative way to achieve this. We bring together the WordNet dictionary and the YOLO model to make this happen. YOLO helps us find where the things are in the images, while WordNet provides their meanings. Our process then crafts a summary for each object found. This new technique can have a big impact on computer vision and natural language processing. It can make understanding complicated images, filled with lots of things, much simpler. 
To test our approach, we used 1381 pictures from the Google Image search engine. Our results showed high accuracy, with 72% for object detection. The precision was 85%, the recall was 72%, and the F1-score was 74%.</description><identifier>ISSN: 1863-1703</identifier><identifier>EISSN: 1863-1711</identifier><identifier>DOI: 10.1007/s11760-024-03560-z</identifier><language>eng</language><publisher>London: Springer London</publisher><subject>Computer Imaging ; Computer Science ; Computer vision ; Image Processing and Computer Vision ; Multimedia Information Systems ; Natural language processing ; Object recognition ; Original Paper ; Pattern Recognition and Graphics ; Search engines ; Signal,Image and Speech Processing ; Vision</subject><ispartof>Signal, image and video processing, 2024-12, Vol.18 (12), p.9465-9481</ispartof><rights>The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c200t-a3e50b044dcb2bcc90ab87d05fddaf3042f4537283f8f9ee534b7616be6690203</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s11760-024-03560-z$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s11760-024-03560-z$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,776,780,27901,27902,41464,42533,51294</link.rule.ids></links><search><creatorcontrib>Saqib, Sheikh 
Muhammad</creatorcontrib><creatorcontrib>Aftab, Aamir</creatorcontrib><creatorcontrib>Mazhar, Tehseen</creatorcontrib><creatorcontrib>Iqbal, Muhammad</creatorcontrib><creatorcontrib>Shahazad, Tariq</creatorcontrib><creatorcontrib>Almogren, Ahmad</creatorcontrib><creatorcontrib>Hamam, Habib</creatorcontrib><title>Integrating YOLO and WordNet for automated image object summarization</title><title>Signal, image and video processing</title><addtitle>SIViP</addtitle><description>The demand for methods that automatically create text summaries from images containing many things has recently grown. Our research introduces a fresh and creative way to achieve this. We bring together the WordNet dictionary and the YOLO model to make this happen. YOLO helps us find where the things are in the images, while WordNet provides their meanings. Our process then crafts a summary for each object found. This new technique can have a big impact on computer vision and natural language processing. It can make understanding complicated images, filled with lots of things, much simpler. To test our approach, we used 1381 pictures from the Google Image search engine. Our results showed high accuracy, with 72% for object detection. 
The precision was 85%, the recall was 72%, and the F1-score was 74%.</description><subject>Computer Imaging</subject><subject>Computer Science</subject><subject>Computer vision</subject><subject>Image Processing and Computer Vision</subject><subject>Multimedia Information Systems</subject><subject>Natural language processing</subject><subject>Object recognition</subject><subject>Original Paper</subject><subject>Pattern Recognition and Graphics</subject><subject>Search engines</subject><subject>Signal,Image and Speech Processing</subject><subject>Vision</subject><issn>1863-1703</issn><issn>1863-1711</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNp9kD9PwzAQxS0EElXpF2CyxBw4_0nsjKgqUKmiCwgxWXZiR61IXGxnoJ8eQxBs3HJveO_d6YfQJYFrAiBuIiGiggIoL4CVWR1P0IzIihVEEHL6q4Gdo0WMe8jDqJCVnKHVeki2Czrthg6_bjdbrIcWv_jQPtqEnQ9Yj8n3OtkW73rdWezN3jYJx7Hvddgdc9IPF-jM6bdoFz97jp7vVk_Lh2KzvV8vbzdFQwFSoZktwQDnbWOoaZoatJGihdK1rXYMOHW8ZIJK5qSrrS0ZN6IilbFVVQMFNkdXU-8h-PfRxqT2fgxDPqkYoTzTqIXMLjq5muBjDNapQ8i_hw9FQH0RUxMxlYmpb2LqmENsCsVsHjob_qr_SX0CkfZuIA</recordid><startdate>20241201</startdate><enddate>20241201</enddate><creator>Saqib, Sheikh Muhammad</creator><creator>Aftab, Aamir</creator><creator>Mazhar, Tehseen</creator><creator>Iqbal, Muhammad</creator><creator>Shahazad, Tariq</creator><creator>Almogren, Ahmad</creator><creator>Hamam, Habib</creator><general>Springer London</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>20241201</creationdate><title>Integrating YOLO and WordNet for automated image object summarization</title><author>Saqib, Sheikh Muhammad ; Aftab, Aamir ; Mazhar, Tehseen ; Iqbal, Muhammad ; Shahazad, Tariq ; Almogren, Ahmad ; Hamam, 
Habib</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c200t-a3e50b044dcb2bcc90ab87d05fddaf3042f4537283f8f9ee534b7616be6690203</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Imaging</topic><topic>Computer Science</topic><topic>Computer vision</topic><topic>Image Processing and Computer Vision</topic><topic>Multimedia Information Systems</topic><topic>Natural language processing</topic><topic>Object recognition</topic><topic>Original Paper</topic><topic>Pattern Recognition and Graphics</topic><topic>Search engines</topic><topic>Signal,Image and Speech Processing</topic><topic>Vision</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Saqib, Sheikh Muhammad</creatorcontrib><creatorcontrib>Aftab, Aamir</creatorcontrib><creatorcontrib>Mazhar, Tehseen</creatorcontrib><creatorcontrib>Iqbal, Muhammad</creatorcontrib><creatorcontrib>Shahazad, Tariq</creatorcontrib><creatorcontrib>Almogren, Ahmad</creatorcontrib><creatorcontrib>Hamam, Habib</creatorcontrib><collection>CrossRef</collection><jtitle>Signal, image and video processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Saqib, Sheikh Muhammad</au><au>Aftab, Aamir</au><au>Mazhar, Tehseen</au><au>Iqbal, Muhammad</au><au>Shahazad, Tariq</au><au>Almogren, Ahmad</au><au>Hamam, Habib</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Integrating YOLO and WordNet for automated image object summarization</atitle><jtitle>Signal, image and video processing</jtitle><stitle>SIViP</stitle><date>2024-12-01</date><risdate>2024</risdate><volume>18</volume><issue>12</issue><spage>9465</spage><epage>9481</epage><pages>9465-9481</pages><issn>1863-1703</issn><eissn>1863-1711</eissn><abstract>The demand for methods that automatically create text summaries from 
images containing many things has recently grown. Our research introduces a fresh and creative way to achieve this. We bring together the WordNet dictionary and the YOLO model to make this happen. YOLO helps us find where the things are in the images, while WordNet provides their meanings. Our process then crafts a summary for each object found. This new technique can have a big impact on computer vision and natural language processing. It can make understanding complicated images, filled with lots of things, much simpler. To test our approach, we used 1381 pictures from the Google Image search engine. Our results showed high accuracy, with 72% for object detection. The precision was 85%, the recall was 72%, and the F1-score was 74%.</abstract><cop>London</cop><pub>Springer London</pub><doi>10.1007/s11760-024-03560-z</doi><tpages>17</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1863-1703
ispartof Signal, image and video processing, 2024-12, Vol.18 (12), p.9465-9481
issn 1863-1703
1863-1711
language eng
recordid cdi_proquest_journals_3124100978
source Springer Nature - Complete Springer Journals
subjects Computer Imaging
Computer Science
Computer vision
Image Processing and Computer Vision
Multimedia Information Systems
Natural language processing
Object recognition
Original Paper
Pattern Recognition and Graphics
Search engines
Signal,Image and Speech Processing
Vision
title Integrating YOLO and WordNet for automated image object summarization
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-01T20%3A28%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Integrating%20YOLO%20and%20WordNet%20for%20automated%20image%20object%20summarization&rft.jtitle=Signal,%20image%20and%20video%20processing&rft.au=Saqib,%20Sheikh%20Muhammad&rft.date=2024-12-01&rft.volume=18&rft.issue=12&rft.spage=9465&rft.epage=9481&rft.pages=9465-9481&rft.issn=1863-1703&rft.eissn=1863-1711&rft_id=info:doi/10.1007/s11760-024-03560-z&rft_dat=%3Cproquest_cross%3E3124100978%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3124100978&rft_id=info:pmid/&rfr_iscdi=true