

Towards Visual-Inertial SLAM for Mobile Augmented Reality

Gabriele Bleser-Taetz
PhD-Thesis, Technische Universität Kaiserslautern, 2009.


The basic idea of augmented reality is to augment the view of a user or camera with virtual objects. Real-time camera tracking is an enabling technology for augmented reality. Besides the high estimation precision needed for pixel-accurate augmentations, mobile augmented reality applications impose further requirements on the tracking method. One important aspect is robustness to the quick and erratic camera motions that are typical of a handheld or head-mounted camera. This thesis investigates the robustness and accuracy of real-time markerless camera tracking for mobile augmented reality applications, considering both known and unknown environments of different complexity. The major solution strategies are visual-inertial sensor fusion, decoupling of pose and structure estimation, and unification of computer vision and recursive filtering techniques. First, a model-based camera tracking system that fuses visual and inertial measurements in an extended Kalman filter is developed. It uses an affine, illumination-invariant image processing method that exploits the pose prediction obtained from the sensor fusion algorithm and a textured CAD model of the environment to predict the appearance of corner features in the camera images. In several experiments, the system is shown to work robustly in realistic environments of different complexity, under varying lighting conditions, during fast and erratic camera motions, and even through short periods without visible features. Compared to vision-only tracking, the system shows less jitter, and its computational cost is reduced, owing both to an accurate prediction of feature appearances and locations in the images and to a generally reduced demand for features. Tracking in partially known environments is addressed by developing a vision-only system that requires minimal prior knowledge about the structure of the target scene and derives 3D information online.
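To make the visual-inertial fusion step more concrete, the following is a minimal one-dimensional sketch of an extended Kalman filter in which inertial samples drive the prediction and camera measurements drive the correction. It is not the thesis implementation. Assumptions (all hypothetical): the state is only position and velocity along one axis, a noise-free accelerometer runs at 100 Hz, and a 10 Hz camera measures the bearing to a single known model point at a fixed depth, standing in for the CAD-model corner features. The function names `ekf_predict`, `ekf_update` and `run_demo` are illustrative only.

```python
import math

# Hypothetical 1-D toy: state x = [p, v]; not the 6-DoF tracker from the thesis.


def ekf_predict(x, P, accel, dt, accel_var):
    """Inertial prediction: propagate [p, v] with one accelerometer sample."""
    p, v = x
    x_new = [p + v * dt + 0.5 * accel * dt * dt, v + accel * dt]
    F = [[1.0, dt], [0.0, 1.0]]  # state transition Jacobian
    G = [0.5 * dt * dt, dt]      # how accelerometer noise enters the state
    FP = [[sum(F[i][k] * P[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    # P' = F P F^T + G * accel_var * G^T
    P_new = [[sum(FP[i][k] * F[j][k] for k in range(2)) + G[i] * accel_var * G[j]
              for j in range(2)] for i in range(2)]
    return x_new, P_new


def ekf_update(x, P, z, landmark, depth, meas_var):
    """Visual correction: fuse a bearing z = atan2(landmark - p, depth) to a known point."""
    dx = landmark - x[0]
    z_pred = math.atan2(dx, depth)                  # predicted feature location
    H = [-depth / (dx * dx + depth * depth), 0.0]   # Jacobian of h w.r.t. [p, v]
    PHt = [P[i][0] * H[0] + P[i][1] * H[1] for i in range(2)]
    S = H[0] * PHt[0] + H[1] * PHt[1] + meas_var    # innovation variance
    K = [PHt[i] / S for i in range(2)]              # Kalman gain
    innovation = z - z_pred
    x_new = [x[i] + K[i] * innovation for i in range(2)]
    # P' = (I - K H) P
    P_new = [[P[i][j] - K[i] * (H[0] * P[0][j] + H[1] * P[1][j]) for j in range(2)]
             for i in range(2)]
    return x_new, P_new


def run_demo():
    """100 Hz accelerometer, 10 Hz camera; the filter starts 0.5 m off in position."""
    accel, dt, landmark, depth = 0.2, 0.01, 1.0, 2.0
    x, P = [0.5, 0.0], [[1.0, 0.0], [0.0, 1.0]]
    p_true = 0.0
    for k in range(1, 201):
        t = k * dt
        p_true = 0.5 * accel * t * t
        x, P = ekf_predict(x, P, accel, dt, accel_var=1e-3)
        if k % 10 == 0:  # a camera frame arrives every 10th IMU sample
            z = math.atan2(landmark - p_true, depth)  # noise-free for the demo
            x, P = ekf_update(x, P, z, landmark, depth, meas_var=1e-4)
    return x, P, p_true


x, P, p_true = run_demo()
print(f"estimated p = {x[0]:.3f}, true p = {p_true:.3f}, var(p) = {P[0][0]:.2e}")
```

In this toy, `z_pred` plays the role of the predicted feature location that a model-based tracker could use to restrict its image search, and the high-rate inertial prediction is what bridges the gaps between camera frames.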
The system combines robust and efficient sequential structure-from-motion methods with a simplified stochastic model and recursive localisation of 3D point features. Because pose and structure estimation are decoupled, it scales well with the size of the map. Compared to a conventional sequential structure-from-motion system, the developed method provides higher accuracy in both camera and feature localisation and significantly less drift. This is demonstrated in experiments on simulated data and in realistic mid-scale environments. Based on these results, a conceptual solution for visual-inertial simultaneous localisation and mapping (SLAM) in large-scale environments is developed. The idea is to combine the marginalised particle filter for visual-inertial pose estimation with undelayed feature initialisation and localisation on a per-particle basis. The essential algorithms for pose and structure estimation are developed, and a proof-of-concept implementation is evaluated in a simple test environment, demonstrating that the novel tracking strategy unifies and enhances the previous developments.
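The per-particle idea can be illustrated with a heavily simplified, one-dimensional Rao-Blackwellised particle filter in the FastSLAM style: each particle samples its pose from a noisy motion increment and carries its own Kalman filter for a single landmark, initialised from the very first observation without delay. This is a swapped-in toy, not the thesis algorithm. A marginalised particle filter in general also treats linear-Gaussian parts of the motion state analytically, which this sketch omits, and the real system handles 6-DoF poses and many 3D points. All names (`Particle`, `filter_step`, `run_demo`) and noise values are hypothetical.

```python
import math
import random

# Hypothetical 1-D toy: z = landmark - pose + noise; not the thesis implementation.


class Particle:
    """One pose hypothesis plus its own Gaussian estimate of the landmark."""

    def __init__(self, pose, lm_mean=None, lm_var=None):
        self.pose = pose        # sampled part of the state
        self.lm_mean = lm_mean  # landmark estimate conditioned on this particle's path
        self.lm_var = lm_var
        self.weight = 1.0


def gaussian_pdf(x, var):
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def filter_step(particles, odom, z, rng, odom_var, meas_var):
    """Propagate, update per-particle landmark filters, reweight and resample."""
    for pt in particles:
        # Sample the pose from the motion model driven by the measured increment.
        pt.pose += odom + rng.gauss(0.0, math.sqrt(odom_var))
        if pt.lm_mean is None:
            # Undelayed initialisation: z = L - pose + noise gives L ~ N(z + pose, meas_var).
            pt.lm_mean, pt.lm_var, pt.weight = z + pt.pose, meas_var, 1.0
        else:
            # Closed-form Kalman update: the landmark is linear-Gaussian given the pose.
            innov = z - (pt.lm_mean - pt.pose)
            s = pt.lm_var + meas_var
            pt.weight = gaussian_pdf(innov, s)  # predictive likelihood of z
            gain = pt.lm_var / s
            pt.lm_mean += gain * innov
            pt.lm_var *= 1.0 - gain
    total = sum(pt.weight for pt in particles)
    weights = [pt.weight / total for pt in particles] if total > 0 else None
    # Multinomial resampling; fresh copies keep duplicated particles independent.
    chosen = rng.choices(particles, weights=weights, k=len(particles))
    return [Particle(p.pose, p.lm_mean, p.lm_var) for p in chosen]


def run_demo(steps=50, n_particles=200, seed=1):
    rng = random.Random(seed)
    landmark, u, odom_var, meas_var = 5.0, 0.1, 0.02 ** 2, 0.05 ** 2
    particles = [Particle(0.0) for _ in range(n_particles)]
    p_true = 0.0
    for _ in range(steps):
        p_true += u
        odom = u + rng.gauss(0.0, math.sqrt(odom_var))              # noisy motion reading
        z = landmark - p_true + rng.gauss(0.0, math.sqrt(meas_var))  # noisy relative position
        particles = filter_step(particles, odom, z, rng, odom_var, meas_var)
    pose_est = sum(pt.pose for pt in particles) / n_particles
    lm_est = sum(pt.lm_mean for pt in particles) / n_particles
    return pose_est, lm_est, p_true, landmark


pose_est, lm_est, p_true, landmark = run_demo()
print(f"pose {pose_est:.3f} (true {p_true:.3f}), landmark {lm_est:.3f} (true {landmark:.3f})")
```

Because the landmark is linear-Gaussian once the pose is fixed, each particle can update it in closed form, and only the pose has to be represented by samples; this is what keeps the number of particles manageable as the map grows.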