Visual Concept Learning from Weakly Labeled Web Videos

Adrian Ulges, Damian Borth, Thomas Breuel

In: Dan Schonfeld, Caifeng Shan, Dacheng Tao, Liang Wang (eds.). Video Search and Mining. Chapter 8, Springer, 2010.


Concept detection is a core component of video database search, concerned with the automatic recognition of visually diverse categories of objects ("airplane"), locations ("desert"), or activities ("interview"). The task poses a difficult challenge as the amount of accurately labeled data available for supervised training is limited and coverage of concept classes is poor. In order to overcome these problems, we describe the use of videos found on the web as training data for concept detectors, using tagging and folksonomies as annotation sources. This permits us to scale up training to very large data sets and concept vocabularies. In order to take advantage of user-supplied tags on the web, we need to overcome problems of label weakness; web tags are context-dependent, unreliable and coarse. Our approach to addressing this problem is to automatically identify and filter non-relevant material. We demonstrate on a large database of videos retrieved from the web that this approach, called relevance filtering, leads to significant improvements over supervised learning techniques for categorization. In addition, we show how the approach can be combined with active learning to achieve additional performance improvements at moderate annotation cost.



Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence