

CUTIE: A human-in-the-loop interface for the generation of personalised and contextualised image captions

Aliki Anagnostopoulou; Sara-Jane Bittner; Lavanya Govindaraju; Hasan Md Tusfiqur Alam; Daniel Sonntag
In: Mensch und Computer 2025 - Workshopband. Mensch und Computer (MuC-2025), MCI-WS04: ABIS 2025 - International Workshop on Personalization and Recommendation, August 31 - September 3, Chemnitz, Germany, Gesellschaft für Informatik e.V. 9/2025.

Abstract

Image captioning is an AI-complete task that bridges computer vision and natural language processing; its goal is to generate textual descriptions for a given image. However, general-purpose image captioning often fails to capture contextual information, such as the people present or the location where the image was taken. To address this challenge, we propose a web-based tool that leverages automated image captioning, large foundation models, and additional deep learning modules, such as object recognition and metadata analysis, to accelerate the generation of contextualised and personalised image captions. User interactions and feedback given to the various components are stored and later used for domain adaptation of the respective components. In a user study comparing our system to a proprietary baseline, the latter received slightly higher scores; however, our system demonstrated competitive performance while offering greater transparency and customisability. Our ultimate goal is to improve the efficiency and accuracy of creating personalised and contextualised image captions.
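
The abstract describes a pipeline that enriches a generic caption with recognised people, location metadata, and stored user feedback. The following is a minimal sketch of that idea, not the authors' implementation: all names (ImageContext, build_contextualisation_prompt, FeedbackRecord, record_feedback) are hypothetical, and the actual system uses large foundation models and dedicated recognition modules rather than these placeholders.

```python
# Hypothetical sketch of contextualised captioning with feedback logging.
# The class and function names below are illustrative only.
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ImageContext:
    base_caption: str                                   # output of a general-purpose captioning model
    people: list[str] = field(default_factory=list)     # e.g. from an object/person recognition module
    location: str | None = None                         # e.g. from EXIF/GPS metadata analysis


def build_contextualisation_prompt(ctx: ImageContext) -> str:
    """Assemble a prompt asking a foundation model to rewrite the generic
    caption so it mentions the recognised people and the location."""
    parts = [f"Generic caption: {ctx.base_caption}"]
    if ctx.people:
        parts.append("People present: " + ", ".join(ctx.people))
    if ctx.location:
        parts.append(f"Location: {ctx.location}")
    parts.append("Rewrite the caption so it naturally includes this context.")
    return "\n".join(parts)


@dataclass
class FeedbackRecord:
    """A user's edit to a suggested caption, stored for later domain adaptation."""
    suggested: str
    accepted: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


feedback_log: list[FeedbackRecord] = []


def record_feedback(suggested: str, accepted: str) -> None:
    """Persist the pair of suggested and user-corrected captions."""
    feedback_log.append(FeedbackRecord(suggested=suggested, accepted=accepted))


# Example usage with fabricated placeholder values:
ctx = ImageContext(base_caption="Two people standing on a beach.",
                   people=["Alice", "Bob"],
                   location="Chemnitz, Germany")
print(build_contextualisation_prompt(ctx))
record_feedback(suggested="Alice and Bob at the beach in Chemnitz.",
                accepted="Alice and Bob enjoying a day at the lake near Chemnitz.")
```

In the described system, the stored interaction records serve as training signal for adapting the individual components to a user's domain, rather than just being logged as shown here.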
