Multiple Instance Learning on Weakly Labeled Videos

Adrian Ulges; Christian Schulze; Thomas Breuel

In: Workshop on Cross-Media Information Analysis, Extraction and Management, located at SAMT 2008, December 3, Koblenz, Germany, SAMT Workshop on Cross-Media Information Analysis and Retrieval, Springer, 12/2008.


Automatic video tagging systems aim to assign semantic concepts ("tags") to videos by linking textual descriptions with the audio-visual video content. To train such systems, we investigate online video from portals such as YouTube as a large-scale, freely available knowledge source. Tags provided by video owners serve as weak annotations: they indicate that a target concept appears in a video, but not when it appears. This situation resembles the multiple instance learning (MIL) scenario, in which classifiers are trained on labeled bags (videos) of unlabeled samples (the frames of a video). We study MIL in quantitative experiments on real-world online videos. Our key findings are: (1) Conventional MIL tends to neglect valuable information in the training data and thus performs poorly. (2) By relaxing the MIL assumption, a tagging system can be built that performs comparably to or better than its supervised counterpart. (3) Improvements by MIL are minor compared to a kernel-based model we proposed recently.
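The bag-and-instance setting described above can be illustrated with a minimal sketch. It assumes that some frame-level classifier has already produced a score in [0, 1] for each frame of a video. The function names, the example scores, and in particular the averaging rule used as a "relaxed" aggregation are hypothetical choices for illustration only; the abstract does not specify how the MIL assumption is relaxed.

```python
from typing import List


def bag_score_standard(frame_scores: List[float]) -> float:
    """Standard MIL assumption: a bag (video) is positive if at least one
    instance (frame) is positive, so the bag score is the maximum frame score."""
    return max(frame_scores)


def bag_score_relaxed(frame_scores: List[float]) -> float:
    """Hypothetical relaxation (an assumption, not the paper's method):
    let every frame contribute by averaging the frame scores."""
    return sum(frame_scores) / len(frame_scores)


# Hypothetical frame scores for two videos tagged with the same concept.
video_a = [0.1, 0.2, 0.9, 0.1]  # the concept is visible in a single frame only
video_b = [0.6, 0.7, 0.6, 0.7]  # the concept is moderately visible throughout

for name, scores in [("video_a", video_a), ("video_b", video_b)]:
    print(name, bag_score_standard(scores), bag_score_relaxed(scores))
```

Under the maximum rule a single high-scoring frame decides the label of the whole video and the remaining frames are ignored, whereas the averaging rule lets all frames contribute. This is one simple way to picture why a strict MIL formulation might discard information that is present in the training data.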


Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence