Topic Models for Semantics-preserving Video Compression

Jörn Wanke, Adrian Ulges, Christoph Lampert, Thomas Breuel

In: Proceedings of the ACM International Conference on Multimedia Information Retrieval (MIR '10), March 29–31, Philadelphia, Pennsylvania, USA. ACM, New York, NY, 2010.


Content-based video understanding tasks such as auto-annotation or clustering rely on low-level descriptors of video content, which should be compact in order to optimize storage requirements and efficiency. In this paper, we address the semantic compression of video, i.e. the reduction of low-level descriptors to a few semantically expressive dimensions. To achieve this, topic models have been proposed, which cluster visual content into a small number of latent aspects and have previously been applied successfully to still images. In this paper, we investigate topic models for the video domain, addressing several key questions that have remained unanswered so far: (1) data: first, we confirm the good performance of topic models for concept detection on web video data, showing that a performance comparable to bag-of-visual-words descriptors can be reached at a compression rate of 1/20; (2) diversity: we demonstrate that topic models perform best when trained on large-scale, diverse datasets, i.e. no tedious manual pre-selection is required; (3) multi-modal integration: we show how topic models can benefit from an integration of multi-modal features, such as motion and patches; and finally (4) temporal structure: by extending topic models such that the shot structure of video is taken into account, we show that a better correspondence between topics and semantic categories can be achieved.
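To make the notion of semantic compression concrete: given a trained topic-word matrix, a high-dimensional bag-of-visual-words histogram can be reduced to a short vector of topic proportions. The following is a minimal, hypothetical sketch (not the paper's implementation) using pLSA-style "folding-in" with plain NumPy; the toy vocabulary size, topic count, and random topic-word matrix are illustrative assumptions only.

```python
import numpy as np

def fold_in(hist, topics, n_iter=50):
    """Estimate topic proportions for one bag-of-visual-words histogram
    via EM folding-in (pLSA-style), given a fixed topic-word matrix.

    hist:   (V,) visual-word counts for a video/shot
    topics: (K, V) rows are per-topic word distributions
    Returns a (K,) vector summing to 1 -- the compressed descriptor.
    """
    K, V = topics.shape
    theta = np.full(K, 1.0 / K)  # uniform initial topic proportions
    for _ in range(n_iter):
        # E-step: responsibilities p(topic | word) under current theta
        joint = theta[:, None] * topics               # (K, V)
        resp = joint / joint.sum(axis=0, keepdims=True)
        # M-step: re-estimate proportions from the word counts
        theta = (resp * hist[None, :]).sum(axis=1)
        theta /= theta.sum()
    return theta

rng = np.random.default_rng(0)
V, K = 1000, 50  # 1000-word visual vocabulary -> 50 topics (20x compression)
topics = rng.dirichlet(np.full(V, 0.1), size=K)  # toy topic-word matrix
hist = rng.multinomial(200, topics[3]).astype(float)  # counts drawn from topic 3
theta = fold_in(hist, topics)
print(theta.argmax())  # the dominant topic typically matches the generating one
```

The 50-dimensional `theta` replaces the 1000-dimensional histogram as the stored descriptor, which is the kind of dimensionality reduction the abstract's 1/20 compression rate refers to.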


mir066-wanke.pdf (PDF, 2 MB)

German Research Center for Artificial Intelligence