Generalized Image Captioning for Multilingual Support

Image captioning is a problem of viewing images and describing images in language. This is an important problem that can be solved by understanding the image, and combining two fields of image processing and natural language processing into one. The purpose of image captioning research so far has be...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Applied sciences 2023-02, Vol.13 (4), p.2446
Hauptverfasser:	Cho, Suhyun, Oh, Hayoung
Format:	Artikel
Sprache:	eng
Schlagworte:	Accuracy Computational linguistics Datasets Dictionaries Domains Equipment and supplies image captioning Image processing Information processing Language Language processing Learning Methods multimodal Natural language Natural language interfaces Natural language processing Neural networks NLP Pattern recognition People with disabilities Semantics vision
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Image captioning is a problem of viewing images and describing images in language. This is an important problem that can be solved by understanding the image, and combining two fields of image processing and natural language processing into one. The purpose of image captioning research so far has been to create general explanatory captions in the learning data. However, various environments in reality must be considered for practical use, as well as image descriptions that suit the purpose of use. Image caption research requires processing new learning data to generate descriptive captions for specific purposes, but it takes a lot of time and effort to create learnable data. In this study, we propose a method to solve this problem. Popular image captioning can help visually impaired people understand their surroundings by automatically recognizing and describing images into text and then into voice and is an important issue that can be applied to many places such as image search, art therapy, sports commentary, and real-time traffic information commentary. Through the domain object dictionary method proposed in this study, we propose a method to generate image captions without the need to process new learning data by adjusting the object dictionary for each domain application. The method proposed in the study is to change the dictionary of the object to focus on the domain object dictionary rather than processing the learning data, leading to the creation of various image captions by intensively explaining the objects required for each domain. In this work, we propose a filter captioning model that induces generation of image captions from various domains while maintaining the performance of existing models.
ISSN:	2076-3417 2076-3417
DOI:	10.3390/app13042446