Fine-Grained Semantic Segmentation of Motion Capture Data using Convolutional Neural Networks

Noshaba Cheema
Master's thesis, Saarland University, 3/2019.


Human motion capture data is widely used in data-driven character animation. To generate realistic, natural-looking motions, most data-driven approaches require considerable pre-processing effort, including motion segmentation and annotation. Existing (semi-)automatic solutions either require hand-crafted features for motion segmentation or do not produce the semantic annotations required for motion synthesis and for building large-scale motion databases. In this thesis, a semi-automatic framework for semantic segmentation of motion capture data is developed, based on (semi-)supervised machine learning techniques. The motion capture data is first transformed into a “motion image” so that common convolutional neural networks for image segmentation can be applied. Convolutions over the time domain enable the extraction of temporal information, and dilated convolutions are used to enlarge the receptive field exponentially with comparatively few layers and parameters. The final dilated temporal fully-convolutional model is compared against state-of-the-art models in action segmentation, as well as against a popular network for sequence modeling. The models are further tested on noisy and inaccurate training labels, and the developed model is found to be surprisingly robust and self-correcting.
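
The claim that dilated convolutions enlarge the receptive field exponentially can be illustrated with a short calculation. The sketch below is not taken from the thesis: the kernel size of 3, the six layers, and the doubling dilation schedule are assumptions chosen only for illustration, and `receptive_field` is a hypothetical helper. It compares a stack of stride-1 temporal convolutions with and without dilation.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field, in frames, of a stack of stride-1 1D convolutions.

    Each layer with dilation d widens the receptive field by
    (kernel_size - 1) * d frames.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Assumed settings for illustration only (not the thesis configuration).
num_layers = 6
kernel_size = 3

# Dilation doubles at every layer: 1, 2, 4, 8, 16, 32.
dilated = [2 ** i for i in range(num_layers)]
# Ordinary convolutions: dilation 1 everywhere.
plain = [1] * num_layers

rf_dilated = receptive_field(kernel_size, dilated)
rf_plain = receptive_field(kernel_size, plain)

print(f"dilated stack: {rf_dilated} frames")  # 127 frames
print(f"plain stack:   {rf_plain} frames")    # 13 frames
```

Both stacks have the same number of layers and the same number of weights per layer. With doubling dilations the temporal context grows roughly as 2^L in the number of layers L, whereas without dilation it grows only linearly.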