
How much can we trust self-supervised monocular depth estimation? From a single input image, we estimate depth and uncertainty maps.

Abstract. Self-supervised paradigms for monocular depth estimation are very appealing since they do not require ground truth annotations at all. Despite the astonishing results yielded by such methodologies, learning to reason about the uncertainty of the estimated depth maps is of paramount importance for practical applications, yet uncharted in the literature. Purposely, we explore for the first time how to estimate the uncertainty for this task and how this affects depth accuracy, proposing a novel peculiar technique specifically designed for self-supervised approaches. On the standard KITTI dataset, we exhaustively assess the performance of each method with different self-supervised paradigms. Such evaluation highlights that our proposal i) always improves depth accuracy significantly and ii) yields state-of-the-art results concerning uncertainty estimation when training on sequences and competitive results uniquely deploying stereo pairs.

A network T is trained in a self-supervised fashion, e.g. on monocular sequences [t-1, t, t+1]. A new instance S of the same network is then trained under the supervision of T.
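The teacher/student scheme in the caption above can be illustrated with a toy loss. This is only a minimal sketch under stated assumptions: it supposes that S predicts, for each pixel, a depth and a log standard deviation, and that it is penalized with a Laplacian-style negative log-likelihood against the depth predicted by T. The loss form, the function name `self_teaching_loss`, and the toy values are assumptions made for illustration and are not taken from this page.

```python
import math


def self_teaching_loss(student_depth, student_log_sigma, teacher_depth):
    """Mean over pixels of |d_S - d_T| / sigma_S + log(sigma_S).

    Assumed (hypothetical) loss form: the student S is supervised by the
    teacher's depth d_T and predicts its own uncertainty sigma_S, given
    here as log(sigma_S) for numerical stability.
    """
    total = 0.0
    for d_s, log_s, d_t in zip(student_depth, student_log_sigma, teacher_depth):
        # Residual scaled by 1/sigma, plus a penalty for large sigma.
        total += abs(d_s - d_t) * math.exp(-log_s) + log_s
    return total / len(student_depth)


# Toy "images" flattened to three pixels (values are made up).
teacher = [2.0, 5.0, 10.0]   # depth predicted by T, used as pseudo-labels
student = [2.0, 5.5, 8.0]    # depth predicted by S

# Same depth error, two different uncertainty predictions.
confident = [0.0, 0.0, 0.0]           # sigma = 1 everywhere
hedged = [0.0, 0.0, math.log(2.0)]    # sigma = 2 on the worst pixel

loss_confident = self_teaching_loss(student, confident, teacher)
loss_hedged = self_teaching_loss(student, hedged, teacher)
```

In this toy setting, raising the predicted uncertainty only where S disagrees most with T gives a lower loss than being uniformly confident, which is the behaviour one would want from an uncertainty estimate trained this way.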

Citation:

@inproceedings{Poggi_CVPR_2020,
    title={On the uncertainty of self-supervised monocular depth estimation},
    author={Poggi, Matteo and Aleotti, Filippo and Tosi, Fabio 
            and Mattoccia, Stefano},
    booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
    note={CVPR},
    year={2020}
}