DFKI-LT - Cross-Modal Learning Of Visual Categories Using Different Levels of Supervision
Proceedings of the International Conference on Computer Vision Systems (ICVS 2007), Bielefeld, Germany, ICVS, 3/2007
Today's object categorization methods use either supervised or unsupervised training. While supervised methods tend to produce more accurate results, unsupervised methods are highly attractive because they can draw on far larger amounts of unlabeled training data. This paper proposes a novel method that uses unsupervised training to obtain visual groupings of objects and a cross-modal learning scheme to overcome inherent limitations of purely unsupervised training. The method uses a unified and scale-invariant object representation that allows labeled as well as unlabeled information to be handled in a coherent way. One of the potential settings is to learn object category models from many unlabeled observations and a few dialogue interactions that can be ambiguous or even erroneous. First experiments demonstrate the ability of the system to learn meaningful generalizations across objects from only a few dialogue interactions.
Files: BibTeX, fritz+etal-icvs07.pdf