3D computer vision

Why 3D Computer Vision?

Building an intelligent system that can see and understand visual information remains a challenging problem. Although some state-of-the-art computer vision systems based on deep learning surpass human performance on specific tasks (e.g., unconstrained face recognition, classification of wildlife), they are not yet smart in a broader sense: they require significant human effort, such as large numbers of human-provided labels and specifically designed, domain-dependent architectures.

I have been addressing this problem through the concept of '3D computer vision'. Explicit 3D representations are the key to this approach. Given an image or a sequence of images, we infer the 3D objects and scene, render the 2D images expected from that inferred 3D, and compare them with the input data. We can say the system understands the scene if it can continuously predict the scene from the inferred 3D.
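The infer, render, and compare loop can be illustrated with a toy sketch. This is not the actual system: it assumes a scene made only of 3D points in camera coordinates, an ideal pinhole camera with a hypothetical focal length `focal`, and a small set of candidate 3D hypotheses to choose from. The function names (`render`, `reprojection_error`, `best_hypothesis`) are made up for this illustration.

```python
import math


def project(point, focal=1.0):
    """Pinhole projection of a 3D point (x, y, z) with z > 0 onto the image plane."""
    x, y, z = point
    return (focal * x / z, focal * y / z)


def render(points_3d, focal=1.0):
    """'Render' a hypothesized 3D scene: here, just project every 3D point to 2D."""
    return [project(p, focal) for p in points_3d]


def reprojection_error(points_3d, observed_2d, focal=1.0):
    """Mean 2D distance between the rendered prediction and the observed input."""
    predicted = render(points_3d, focal)
    total = sum(math.dist(p, o) for p, o in zip(predicted, observed_2d))
    return total / len(observed_2d)


def best_hypothesis(hypotheses, observed_2d, focal=1.0):
    """Pick the 3D hypothesis whose rendering best explains the observation."""
    return min(hypotheses, key=lambda h: reprojection_error(h, observed_2d, focal))


# Toy data (assumed for illustration): the observation is rendered from true_scene.
true_scene = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 1.0, 4.0)]
wrong_scene = [(0.0, 0.0, 2.0), (1.0, 0.0, 4.0), (0.0, 1.0, 4.0)]
observed = render(true_scene)

chosen = best_hypothesis([wrong_scene, true_scene], observed)
print(chosen is true_scene)
```

In a real system the hypothesis set would be replaced by an inference or optimization step, and the comparison would be made on rendered images rather than on a handful of projected points. Note also that a single view cannot distinguish a scene from a uniformly scaled copy of it, since both project to the same 2D points.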

This approach combines detection, tracking, 3D modeling, spatio-temporal pattern analysis, prediction, and active vision in a single framework.

Applications of 3D computer vision

  • 3D face/body modeling for biometrics, AR/VR, games, and medical applications
  • 3D structure reconstruction for scientific research
  • City-scale building reconstruction for self-driving cars, movies, energy, and communications
  • Object and scene reconstruction for service robotics
  • 3D object detection, recognition, tracking, and prediction