Statistical Modelling of Online Video Content

The automatic detection of semantic concepts such as objects, locations, and events in video streams is becoming an urgent problem as the amount of digital video being stored and published grows rapidly. Such tagging systems are usually trained on a dataset of manually annotated videos. Acquiring this training data is time-consuming and costly, so current standard benchmarks provide high-quality but small training sets.

In contrast, the human visual system learns continuously from a plethora of visual information, parts of which are digitized and publicly available in large-scale video archives such as YouTube. The overall goal of the MOONVID project is to exploit such web video portals for visual learning. In particular, three scientific questions of fundamental importance are addressed:

  1. How can proper features for the inference of semantics from video be selected and combined?
  2. How can visual learning be made robust with respect to irrelevant content and weak annotations?
  3. Can motion segmentation, which separates objects from the background, be used to improve object detection?

Publications about the project

In: Proceedings of the International Conference on Multimedia. ACM International Conference on Multimedia (ACM MM-2011), November 28-December 1, Scottsdale, Arizona, United States. ACM, 11/2011.


In: Proceedings of the International Conference on Multimedia and Expo. IEEE International Conference on Multimedia and Expo (ICME-2011), July 11-15, Barcelona, Spain. IEEE, 7/2011.

Marcel Worring,

In: Sheila S. Hemami (editor). IEEE Transactions on Multimedia (TransMM), 13(3), pages 1-12. IEEE Computer Society, 4/2011.


German Research Center for Artificial Intelligence
Deutsches Forschungszentrum für Künstliche Intelligenz