Learning Task Structure from Video Examples for Workflow Tracking and Authoring

Nils Petersen, Didier Stricker

In: Proceedings of the 11th IEEE International Symposium on Mixed and Augmented Reality (ISMAR 2012), November 5-8, 2012, Atlanta, Georgia, United States. IEEE Computer Society Press, 2012.


We present a robust, real-time capable, and simple framework for segmenting video sequences and live streams of manual workflows into their constituent single tasks. Using classifiers trained on these segments, we can follow a user who is performing the workflow in real time, as well as learn task variants from additional video examples. Our proposed method requires neither object detection nor high-level features. Instead, we propose a novel measure derived from image distance that evaluates image properties jointly, without prior segmentation. Our method can cope with repetitive and free-hand activities, and its results are in many cases comparable or equal to manual task segmentation. One important application of our method is the automatic creation of a step-by-step task documentation from a video demonstration. The entire process of automatically creating a fully functional augmented reality manual is explained in detail, and results are shown.

Deutsches Forschungszentrum für Künstliche Intelligenz (German Research Center for Artificial Intelligence)