From Visual Forms to Metaphors: Targeting Cultural Competence in Image Analysis

Bibliographic Details
Main Authors:	Oestreicher, Lars; von Bonsdorff, Jan
Format:	Conference Proceeding
Language:	English
Subjects:	Art History; Computer Systems; cultural competence; Datorsystem; General Language Studies and Linguistics; high-level image content; Jämförande språkvetenskap och allmän lingvistik; Konstvetenskap; Multi-modal machine learning; pictorial conventions; visual metaphors

Description
Summary:	Machine learning has brought major advances in image analysis. Recognizing images and their constituent parts (faces, objects, etc.) is now a relatively common machine-learning task. However, there is still a large gap between recognizing the content of a picture and understanding the meaning of the image. In the current project we take an interdisciplinary approach to this problem, drawing on art history, machine learning and computational linguistics. Current approaches pay close attention to details of the image when describing what is in the picture, with the result that, for example, smiling faces support an interpretation of the image as “positive” or “happy” even when the picture itself shows a scary scene. Other problematic issues are irony and other polyvalent messages whose high degree of ambiguity enables, for example, humorous interpretations of a picture. As a starting point, we have chosen to identify visual agency, i.e., how and why pictures, when regarded as acting agents, can effectively catch the attention of the viewer. Our objective for this first phase of the project is to investigate the capacity of multi-modal models to recognize high-level image content such as context, agency, visual narration, and metaphors. Ultimately, the goal is to improve the cultural competence and visual literacy of neural networks through art-historical and humanities expertise. In the paper we describe our current approach, the general ideas behind it, and the methods that will be used.