Improving Pre-trained CNN-LSTM Models for Image Captioning with Hyper-Parameter Optimization
The issue of image captioning, which comprises automatic text generation to understand an image's visual information, has become feasible with the developments in object recognition and image classification. Deep learning has received much interest from the scientific community and can be very useful in real-world applications. The proposed image captioning approach involves the use of Convolutional Neural Network (CNN) pre-trained models combined with Long Short-Term Memory (LSTM) to generate image captions. The process includes two stages. The first stage entails training the CNN-LSTM models using baseline hyper-parameters, and the second stage encompasses training the CNN-LSTM models by optimizing and adjusting the hyper-parameters of the previous stage. Improvements include the use of a new activation function, regular parameter tuning, and an improved learning rate in the later stages of training. The experimental results on the Flickr8k dataset showed a noticeable and satisfactory improvement in the second stage, with a clear increase in the evaluation metrics BLEU-1 to BLEU-4, METEOR, and ROUGE-L. This increase confirmed the effectiveness of the alterations and highlighted the importance of hyper-parameter tuning in improving the performance of CNN-LSTM models in image captioning tasks.
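The abstract describes a two-stage process: first train with baseline hyper-parameters, then retrain with an adjusted activation function and a learning rate that is reduced in the later stages of training. The sketch below illustrates that idea in plain Python; the specific values, the step-decay schedule, and the activation choices are illustrative assumptions, not the authors' published configuration.

```python
# Illustrative two-stage hyper-parameter setup in the spirit of the
# paper's approach. All concrete values here are hypothetical.

def step_decay_lr(epoch, base_lr=0.001, drop=0.5, epochs_per_drop=10):
    """Halve the learning rate every `epochs_per_drop` epochs, so the
    later stages of training use a smaller step size (assumed schedule)."""
    return base_lr * (drop ** (epoch // epochs_per_drop))

# Stage 1: baseline hyper-parameters (fixed learning rate).
baseline = {"activation": "tanh", "learning_rate": 0.001}

# Stage 2: adjusted hyper-parameters (new activation, decaying learning rate).
tuned = {"activation": "relu", "lr_schedule": step_decay_lr}

print(step_decay_lr(0))   # early training: full base rate
print(step_decay_lr(25))  # later training: rate reduced twice
```

A schedule like this would typically be plugged into the training loop (e.g. via a learning-rate scheduler callback) so that stage-two training converges more finely than the baseline run.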
Published in: Engineering, Technology & Applied Science Research, 2024-10, Vol. 14 (5), pp. 17337-17343
Main authors: Khassaf, Nuha M.; Ali, Nada Hussein M.
Format: Article
Language: English
Online access: Full text
DOI: 10.48084/etasr.8455
ISSN: 2241-4487
EISSN: 1792-8036