Evolution of visual data captioning Methods, Datasets, and evaluation Metrics: A comprehensive survey

Bibliographic details
Published in: Expert systems with applications, 2023-07, Vol. 221, p. 119773, Article 119773
Main authors: Sharma, Dhruv; Dhiman, Chhavi; Kumar, Dinesh
Format: Article
Language: English
Description
Abstract: Automatic Visual Captioning (AVC) generates syntactically and semantically correct sentences that describe the important objects in a scene, their attributes, and their relationships with each other. It is classified into two categories: image captioning and video captioning. It is widely used in applications such as assistance for the visually impaired, human-robot interaction, video surveillance systems, and scene understanding. With the unprecedented success of deep learning in Computer Vision and Natural Language Processing, the past few years have seen a surge of research in this domain. In this survey, state-of-the-art methods are classified by how they conceptualize the captioning problem, viz., traditional approaches, which cast visual description as either retrieval or template-based description, and deep learning approaches. A detailed review of existing methods is presented, highlighting their pros and cons, their societal impact as measured by the number of citations, the architectures used, the datasets experimented on, and their GitHub links. The survey also provides an overview of the benchmark image and video datasets and of the evaluation measures that have been developed to assess the quality of machine-generated captions. It is observed that dense or paragraph caption generation and Change Image Captioning (CIC) are attracting growing attention from the research community because of their near-human abstraction ability. Finally, the paper explores future directions in the area of automatic visual caption generation.
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2023.119773