

Image Captioning in the Wild: How People Caption Images on Flickr

Philipp Blandfort; Tushar Karayil; Damian Borth; Andreas Dengel
In: Proceedings of the 2017 ACM on Multimedia Conference. Multimodal Understanding of Social, Affective and Subjective Attributes (MUSA2-17), An ACM MM'17 Workshop, located at ACM Multimedia 2017, ACM, 10/2017.


Automatic image captioning is a well-known problem in the field of artificial intelligence. To solve this problem efficiently, it is also necessary to understand how people caption images naturally, that is, when they are not given a set of rules telling them to caption in a certain way. This dimension of the problem is rarely discussed. To investigate it, we performed a crowdsourcing study on specific subsets of the Yahoo Flickr Creative Commons 100 Million Dataset (YFCC100M), in which annotators evaluated captions with respect to subjectivity, visibility, appeal and intent. We use the resulting data to systematically characterize the variations in image captions that appear "in the wild". We publish our findings here along with the annotated dataset.