Real-time Analysis and Visualization of the YFCC100m Dataset

Sebastian Kalkowski; Damian Borth; Christian Schulze; Andreas Dengel
In: Proceedings of the 2015 Workshop on Community-Organized Multimodal Mining: Opportunities for Novel Solutions. ACM International Conference on Multimedia (ACM MM-15), October 26, Brisbane, Australia, Pages 25-30, ISBN 978-1-4503-3744-1, ACM, New York, NY, USA, 2015.


With the Yahoo Flickr Creative Commons 100 Million (YFCC100m) dataset, a novel dataset was introduced to the computer vision and multimedia research community. To maximize its benefit for the research community and realize its potential, the dataset has to be made accessible through tools that allow searching for target concepts and mechanisms for browsing its images and videos. Following best practice from data collections such as ImageNet and MS COCO, this paper presents means of accessing the YFCC100m dataset. These include a global analysis of the dataset and an online browser for exploring and investigating subsets of the dataset in real time. Statistics on the queried images and videos enable researchers to refine their queries successively, so that the user's desired subset of interest can be narrowed down quickly. The final set of images and videos can be downloaded as URLs from the browser for further processing.



