Relevance Filtering meets Active Learning: Improving Web-based Concept Detectors

Damian Borth; Adrian Ulges; Thomas Breuel
In: Proceedings of the International Conference on Multimedia Information Retrieval. ACM International Conference on Multimedia Information Retrieval (MIR-2010), March 29-31, Philadelphia, Pennsylvania, USA, ACM, New York, US, 3/2010.


We address the challenge of training visual concept detectors on web video as available from portals such as YouTube. In contrast to high-quality but small manually acquired training sets, this setup permits us to scale up concept detection to very large training sets and concept vocabularies. On the downside, web tags are only weak indicators of concept presence, and web video training data contains a large amount of non-relevant content. So far, there are two general strategies to overcome this label noise problem, both targeted at discarding non-relevant training content: (1) a manual refinement supported by active learning sample selection, and (2) an automatic refinement using relevance filtering. In this paper, we present a highly efficient approach combining these two strategies in an interleaved setup: manually refined samples are directly used to improve relevance filtering, which in turn provides a good basis for the next active learning sample selection. Our results demonstrate that the proposed combination, called active relevance filtering, outperforms both a purely automatic filtering and a manual one based on active learning. For example, by using 50 manual labels per concept, an improvement of 5% over automatic filtering is achieved, and of 6% over active learning. By labeling only 25% of the weak positive labels in the training set, a performance comparable to training on ground truth labels is reached.
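The interleaved setup described above can be illustrated with a minimal sketch. The abstract does not specify the relevance model or the sample selection criterion, so everything concrete here is an assumption made for illustration: a small logistic regression with soft targets stands in for relevance filtering, uncertainty sampling stands in for active learning sample selection, and the names `fit_logreg`, `active_relevance_filtering`, `oracle`, and `budget` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(X):
    # Append a constant column so the model learns an offset.
    return np.hstack([X, np.ones((len(X), 1))])

def fit_logreg(X, targets, steps=500, lr=0.1):
    """Gradient-descent logistic regression accepting soft targets in [0, 1].
    Assumption: a stand-in for the paper's relevance filtering model."""
    Xb = add_bias(X)
    theta = np.zeros(Xb.shape[1])
    for _ in range(steps):
        grad = Xb.T @ (sigmoid(Xb @ theta) - targets) / len(Xb)
        theta -= lr * grad
    return theta

def active_relevance_filtering(X_weak, X_neg, oracle, budget):
    """Alternate between re-estimating relevance and querying one manual label.

    X_weak: features of weakly labeled (tagged) samples
    X_neg:  features of samples assumed not to show the concept
    oracle: callable(index) -> 0 or 1, the manual annotator (hypothetical)
    budget: number of manual labels to spend
    """
    n = len(X_weak)
    relevance = np.full(n, 0.5)          # initial guess for every weak positive
    labeled = np.zeros(n, dtype=bool)    # which samples were refined manually
    X_all = np.vstack([X_weak, X_neg])
    for _ in range(budget + 1):
        # Relevance filtering step: refit using current soft relevance targets.
        targets = np.concatenate([relevance, np.zeros(len(X_neg))])
        theta = fit_logreg(X_all, targets)
        scores = sigmoid(add_bias(X_weak) @ theta)
        relevance[~labeled] = scores[~labeled]   # manual labels stay fixed
        if labeled.sum() >= budget or labeled.all():
            break
        # Active learning step: query the most uncertain unlabeled sample.
        candidates = np.flatnonzero(~labeled)
        pick = candidates[np.argmin(np.abs(relevance[candidates] - 0.5))]
        relevance[pick] = float(oracle(pick))
        labeled[pick] = True
    return relevance, labeled
```

In this sketch each manual label immediately becomes a fixed target for the next refit, and the refreshed relevance scores decide which sample is queried next, mirroring the interleaving of the two strategies. The resulting `relevance` values could then serve as sample weights when training the final concept detector; that last step is not shown and is likewise an assumption.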


