In ICHumans, we envision methods and technical systems that permit low-cost, unobtrusive 3D quantification of human motion and the semantic interpretation of human activities from markerless visual input. In an era when visual content is produced at explosive rates, such methods and systems unlock tremendous opportunities in application domains limited only by human imagination.
Accurate methods for human motion capture and interpretation do exist, but they are invasive (i.e., they require the person to wear special suits and/or markers) and rely on expensive, cumbersome hardware that is deployable only in laboratory environments. In ICHumans, we aim to develop methods that permit accurate, markerless human motion capture and interpretation with affordable, commonly available cameras. To do so, we build on our past, award-winning, pioneering work on model-based approaches to human motion capture and interpretation and combine it with rapidly evolving machine learning approaches. We advocate that hybrid methods, which combine approaches based on domain knowledge with deep learning/CNN-based ones, are the key innovative methodological element that will drive us to achieve our ambitious goals.