On Learning Models of Appearance for Robust Long-term Visual Navigation

Abstract

Simultaneous localization and mapping (SLAM) is a class of techniques that allow robots to navigate unknown environments using onboard sensors. With inexpensive commercial cameras as the primary sensor, visual SLAM has become an important and widely used approach to enabling mobile robot autonomy. However, traditional visual SLAM algorithms use only a fraction of the information available from conventional cameras: in addition to the basic geometric cues typically used in visual SLAM, colour images encode information about the camera itself, environmental illumination, surface materials, vehicle motion, and other factors influencing the image formation process. Moreover, visual localization performance degrades quickly in long-term deployments due to environmental appearance changes caused by lighting, weather, or seasonal effects. This is especially problematic when continuous metric localization is required to drive vision-in-the-loop systems such as autonomous route following. This thesis explores several novel approaches to exploiting additional information from cameras to improve the accuracy and reliability of metric visual SLAM algorithms in short- and long-term deployments. First, we develop a technique for reducing drift error in visual odometry (VO) by estimating the position of a known light source such as the sun using indirect illumination cues available from existing image streams. We build and evaluate hand-engineered and learned models for single-image sun detection and achieve significant reductions in drift error over 30 km of driving in urban and planetary analogue environments. Second, we explore deep image-to-image translation as a means of improving metric visual localization under time-varying illumination.
Using images captured under different illumination conditions in a common environment, we demonstrate that localization accuracy and reliability can be substantially improved by learning a many-to-one mapping to a user-selected canonical appearance condition. Finally, we develop a self-supervised method for learning a canonical appearance optimized for high-quality localization. By defining a differentiable surrogate loss function related to the performance of a non-differentiable localization pipeline, we train an optimal RGB-to-grayscale mapping for a given environment, sensor, and pipeline. Using synthetic and real-world long-term vision datasets, we demonstrate significant improvements in localization performance compared to standard grayscale images, enabling continuous metric localization over day-night cycles using a single mapping experience.
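
As a rough illustration of replacing standard grayscale conversion with a mapping tuned to a particular environment, the sketch below applies a weighted RGB-to-grayscale mapping with adjustable weights. The linear form, the function name `rgb_to_gray`, and the example weights are assumptions made purely for illustration. The abstract does not specify the form of the learned mapping or of the surrogate loss, and the thesis's actual model may differ.

```python
# Hypothetical sketch: a weighted RGB-to-grayscale mapping with adjustable weights.
# The linear form and every name below are illustrative assumptions,
# not the model described in the thesis.

def rgb_to_gray(image, weights=(0.299, 0.587, 0.114)):
    """Map an RGB image (rows of (r, g, b) tuples in [0, 1]) to grayscale.

    The default weights are the ITU-R BT.601 luma coefficients used by
    standard grayscale conversion; a learned mapping would supply its own.
    """
    wr, wg, wb = weights
    return [[wr * r + wg * g + wb * b for (r, g, b) in row] for row in image]


# A 1x2 toy image: one reddish pixel and one bluish pixel.
image = [[(0.8, 0.2, 0.1), (0.1, 0.2, 0.8)]]

# Standard grayscale conversion.
standard = rgb_to_gray(image)

# Hypothetical alternative weights that emphasize the blue channel.
custom = rgb_to_gray(image, weights=(0.1, 0.2, 0.7))
```

In a learned setting, the weights (or a richer nonlinear mapping) would be fitted by minimizing a loss tied to localization performance rather than chosen by hand as they are here.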

Keywords

Appearance estimation, Computer vision, Deep learning, Long-term autonomy, Mobile robotics, State estimation

Creative Commons license

Except where otherwise noted, this item's license is described as Attribution 4.0 International

Items in TSpace are protected by copyright, with all rights reserved, unless otherwise indicated.