Research Activities


Computer vision

My research activities in computer vision mainly deal with geometry, 3D reconstruction, and structure from motion for (general) cameras.

Stereo Camera tracking on mobile phone

(work in progress results)


Stereo Computer Vision on Mobile Devices (project MOOV3D)

The objective of MOOV3D is to build a prototype mobile platform equipped with a stereoscopic camera and several different types of 3D display output: an auto-stereoscopic screen on the mobile device, 3D glasses, or a 3D high-definition television connected to the mobile device.
    In collaboration with the other industrial partners, augmented reality and 3D navigation applications are being developed: using the 3D output of the platform, image analysis techniques are used to precisely locate objects in the surrounding environment and to insert and display synthetic objects in relief in the scene. The 3D image quality will be studied in relation to the modelling of human visual perception of depth. The reactions of users will be evaluated during the project and will guide its direction.


    Some more information and work in progress results here
    Project page here

    References: [C.18] [C.19]


    Automatic Shoulder Surfing


    Fast, Automatic Shoulder Surfing for Mobile Phones.

    Spying on a person is an easy and effective method to obtain sensitive information, even when the victim is well protected against common digital attacks. Modern mobile devices allow people to perform information-sensitive actions in unsafe places, where anyone could easily observe the victim while typing. What if your mobile phone has a touchscreen interface that gives you graphical feedback as you type (iPhone, Android, BlackBerry Torch)? Does it make shoulder surfing easier or, worse, automatable? We believe so, and to demonstrate it, we developed a practical shoulder-surfing attack that automatically reconstructs the sequence of keystrokes by aiming a camera at the target touchscreen while the victim is typing.

    Our attack exploits feedback such as magnified keys, which often appear in predictable positions. This feedback mechanism has been adopted by the top three touchscreen vendors (Apple iOS, Google Android, RIM BlackBerry); in newer versions of these mobile OSs, the user has no way to disable it. To demonstrate the effectiveness of the approach, we implemented it against the iPhone (the most popular device), but it can easily be adapted to similar devices with minor modifications. The attack takes into account that, in real-world scenarios, neither the victim's device nor the attacker's spying camera stands in a fixed position. To compensate for their movements and misalignments, the system detects and rectifies the target screen before identifying keystrokes. By doing so, we are able to automatically recognize up to 97.07% of the keystrokes, with as low as 1.15% errors and an average processing speed that makes the attack a fast, quasi-real-time alternative to manual shoulder surfing.
    Reference: [C.17]
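    The screen-rectification step can be sketched with a plane homography: once the four corners of the target screen are detected in a frame, a 3×3 homography maps them onto a canonical, fronto-parallel screen, so that key positions become predictable. The Python sketch below is only an illustration of this idea (it is not the paper's implementation, and the corner coordinates are made up); it estimates the homography with the standard DLT algorithm:

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate the 3x3 homography mapping src points to dst points (DLT)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of A (last right singular vector).
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def rectify_point(H, p):
    """Map an image point into the rectified (fronto-parallel) screen frame."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]

# Hypothetical example: four detected screen corners (a skewed quadrilateral)
# mapped onto a canonical 320x480 upright screen.
corners = [(100, 50), (420, 80), (400, 560), (80, 520)]
canonical = [(0, 0), (320, 0), (320, 480), (0, 480)]
H = homography_dlt(corners, canonical)
```

    In the rectified frame, each magnified-key event can be compared against the known keyboard layout, which is what makes the keystroke identification automatable.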


    Real time camera Tracker


    Real-time Match Moving for Cinema (ANR Project ROM)

    The aim of the project ROM ("Real-time On set Matchmoving") is to bridge the gap between the production and post-production processes in filmmaking. The main idea is to develop a system that allows the director of the movie to have at least a rough preview of the digital effects (3D rendering) that will later be added during the post-production process. This requires the development of tools for real-time camera tracking, able to recover the position of the camera using natural features, artificial reference markers, or both.
    The novelty of the research relies on three main aspects:
    • Flexibility: the system can work in many different scenarios, such as production studios, indoor scenes, or even outdoor settings. In particular, the system can work with artificial landmarks (e.g. AR markers), with natural features, or with a combination of the two. The number of markers placed in the scene can therefore be reduced, and less effort is required in the post-production step to digitally remove them.
    • Visual re-localization: the innovative feature of the system is that a (visual) database is built during a pre-shooting step, in which the tracked features (artificial and/or natural) are stored by means of their descriptors (e.g. SIFT, SURF...) together with their associated 3D points. In addition, some key-frames of the video sequence are stored to create a visual dictionary; the dictionary can later be queried during any subsequent shot to determine the most similar key-frame and thus, using the feature database, the initial position of the camera in the scene.
    • Real-time preview: using the database of collected features, during the final shooting the 3D artifacts are rendered in real time thanks to an innovative architecture based on state-of-the-art hardware and software. Moreover, thanks to the modularity of the system, a plug-in version of the software can run in the Maya environment.
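    The key-frame retrieval step can be sketched as a nearest-neighbour search over descriptor sets. The snippet below is a minimal illustration, not the project's actual implementation: it scores each stored key-frame by the number of query descriptors that find a close match, and returns the most similar one. The descriptor dimensionality and the distance threshold are arbitrary placeholder values.

```python
import numpy as np

def match_count(query_desc, frame_desc, thresh=0.5):
    """Count query descriptors whose nearest neighbour in the key-frame
    is closer than `thresh` (Euclidean distance)."""
    # Pairwise squared distances: |q|^2 + |f|^2 - 2 q.f
    d2 = (np.sum(query_desc ** 2, axis=1)[:, None]
          + np.sum(frame_desc ** 2, axis=1)[None, :]
          - 2.0 * query_desc @ frame_desc.T)
    nearest = np.sqrt(np.maximum(d2, 0.0)).min(axis=1)
    return int(np.sum(nearest < thresh))

def most_similar_keyframe(query_desc, keyframes):
    """Return the index of the key-frame with the most descriptor matches."""
    scores = [match_count(query_desc, kf) for kf in keyframes]
    return int(np.argmax(scores))
```

    Once the best key-frame is found, its stored 2D-3D correspondences can be used to bootstrap the camera pose for the new shot.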

    Some more information and results here

    Plane-Based Calibration of Central Catadioptric Cameras.

    We present a novel calibration technique for all central catadioptric cameras using images of planar grids. We adopted the well-known sphere camera model to describe the catadioptric projection. Using the so-called lifted coordinates, a linear relation mapping the grid points to the corresponding points on the image plane can be written as a 6×6 matrix Hcata, which acts like the classical 3×3 homography for perspective cameras and encodes the calibration parameters of the camera. Thus, just as in the classic perspective case, the image of the absolute conic (IAC) can be computed from at least 3 homographies, and from it the intrinsic parameters of the catadioptric camera can be recovered. In the case of paracatadioptric cameras, one such homography is enough to estimate the IAC, thus allowing calibration from a single image.

    Reference: [C.15]
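    To make the perspective analogue concrete, the sketch below implements Zhang-style calibration for a perspective camera (the 3×3 case; the 6×6 catadioptric case follows the same scheme on lifted coordinates). Each homography H = [h1 h2 h3] yields two linear constraints on the IAC ω, namely h1ᵀωh2 = 0 and h1ᵀωh1 = h2ᵀωh2, and K is then recovered from ω = K⁻ᵀK⁻¹ by Cholesky factorization. This is a textbook sketch, not the catadioptric method itself.

```python
import numpy as np

def v_ij(H, i, j):
    """Constraint row h_i' w h_j for the IAC parameters
    [w11, w12, w22, w13, w23, w33]."""
    hi, hj = H[:, i], H[:, j]
    return np.array([hi[0] * hj[0],
                     hi[0] * hj[1] + hi[1] * hj[0],
                     hi[1] * hj[1],
                     hi[0] * hj[2] + hi[2] * hj[0],
                     hi[1] * hj[2] + hi[2] * hj[1],
                     hi[2] * hj[2]])

def iac_from_homographies(Hs):
    """Estimate the image of the absolute conic from >= 3 plane homographies."""
    rows = []
    for H in Hs:
        rows.append(v_ij(H, 0, 1))                   # h1' w h2 = 0
        rows.append(v_ij(H, 0, 0) - v_ij(H, 1, 1))   # h1' w h1 = h2' w h2
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    w = Vt[-1]
    W = np.array([[w[0], w[1], w[3]],
                  [w[1], w[2], w[4]],
                  [w[3], w[4], w[5]]])
    return W / W[2, 2]   # fixes the sign: the true IAC has w33 > 0

def intrinsics_from_iac(W):
    """Recover K from omega = K^-T K^-1 via Cholesky (W = L L^T, K = (L^T)^-1)."""
    L = np.linalg.cholesky(W)
    K = np.linalg.inv(L.T)
    return K / K[2, 2]
```

    With exact homographies the recovery is exact up to numerical precision; with noisy grids, the same linear system is solved in a least-squares sense.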

    Multi-View Geometry from Lines for General Cameras

    In this work we introduced a new hierarchy of camera models that describes a camera according to the number of points and lines having a non-empty intersection with all of its projection rays: fully non-central cameras (rays are unconstrained), axial cameras (rays cross a common line), x-slit cameras (rays cross two common lines), and central cameras (rays meet at a common point). Using this hierarchy, we extended the multi-view geometry of general camera models by developing novel multi-view matching tensors for line images, i.e. tensors that operate on projection rays crossing the same 3D line. By imposing that the projection rays associated with the images of the same 3D line across different views cross a common line in space, we provided the theoretical formulation and a complete characterization of the matching tensors for any kind of camera described by the model.

    Reference: [B.1] [C.13]

    Calibration of Off-Axis Cameras Using Single Images of Lines

    We present a novel calibration method for off-axis catadioptric cameras, i.e. standard perspective cameras placed in a generic position w.r.t. an axial-symmetric mirror of unknown shape. The proposed method estimates the intrinsic parameters of the natural perspective camera, the 3D shape of the mirror and its pose w.r.t. the camera. The peculiarity of our approach is that, unlike several other calibration methods, we do not require any cross-section of the mirror to be visible in the image. Instead, we require that the catadioptric image contains at least the image of one generic space line. We then derive some constraints that, combined with the harmonic homology relating the apparent contours of the mirror, allow us to calibrate the off-axis camera.

    Reference: [B.12]

    3D reconstruction of lines from a single catadioptric image

    In this project we investigate the geometry of catadioptric cameras (a standard perspective camera placed in front of a curved mirror). The aim is to determine sufficient conditions for the 3D reconstruction of straight lines in space from a single image taken with a non-central catadioptric camera. We first investigated axial symmetric systems, obtained by placing the camera pinhole on the symmetry axis of a rotationally symmetric mirror, and found sufficient conditions both for general systems and for systems based on conical mirrors [B.4]. We then dealt with the more general and challenging case in which the camera is in a generic position with respect to the mirror [B.5]. Off-axis systems are easier to set up, since axial symmetric ones require an accurate alignment that is, in general, difficult to verify. We determined sufficient conditions for 3D reconstruction with a general class of mirrors [B.7], and we also proposed robust methods for line localization [B.11].

    3D reconstruction of spheres from single catadioptric images

    The aim of this project is to reconstruct spheres from a single catadioptric image. Using standard perspective and, more generally, central cameras, spheres cannot be reconstructed from a single image unless a priori knowledge is available (e.g. either the radius or the distance). Non-central catadioptric cameras, instead, can be exploited as multi-viewpoint systems. We are developing methods for the 3D reconstruction of spheres that determine both the radius and the position using only a single image taken with a non-central catadioptric sensor. We have devised a method that exploits the axial symmetry of a catadioptric system, providing sufficiently accurate estimates of both radius and position [A.2].
    Recently, we devised a general method that relaxes the axial symmetry constraint and allows the 3D reconstruction of the sphere when the camera is in a generic position with respect to the mirror [B.10]. The method has many applications in computer vision and, in particular, in robot vision, where it can be used to improve a robot's ability to play with a flying ball in RoboCup contests.

    Smart Cameras for videosurveillance

    The aim of the project was to develop a method that automatically detects robbery situations, e.g. by detecting the "hands up" pose. The method was implemented aboard a smart camera equipped with a Texas Instruments TMS320C6400 processor. The proposed method exploits simple computer vision techniques (e.g. background subtraction, skin and face detection) in order to suit the limited computational and memory resources available.
    Some demos of the system during development (~15MB each, external links):

    • Demo 1 Hands up recognition using background segmentation, skin detection and blob analysis
    • Demo 2 Same as previous with face detection
    • Demo 3 Testing face detection by template matching

    MEPOS – "Optical MEasurements of POsition and Size of wood panels for intelligent automation of sanding machines"

    MEPOS was a European project aimed at the automation of wood-sanding machines through the development of a robotic agency. To control the sanding process and to optimize the panel surface finish, the panel dimensions, shape, and position must be measured on-line, while the panel is moving on the transport belt and entering the machine; the actuators must be force-controlled so as to apply the required force distribution on the panel surface, machining each part of the panel at the desired pressure level. Within this project, we developed the vision system that measured the panel position and dimensions. The system was composed of a high-performance camera and a LED illuminator that projects a shadow profile onto the panel surface: by observing and processing the image of the shadow profile, the 3D profile of the panel can be recovered. The system was able to recover the 3D profile of the panel with an accuracy of about 0.1 mm and to provide up to 50 profile measurements per second. The prototype of the vision system was installed aboard the sanding machine and successfully tested.
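    The shadow-profile measurement rests on simple triangulation: with the camera looking straight down and the light source hitting the belt at a known elevation angle, the panel height at each point is proportional to the lateral shift of the shadow edge in the image. A minimal sketch of this geometry (all numbers are hypothetical, not the calibrated values of the real system):

```python
import math

def profile_height(shift_px, px_to_mm, elev_deg):
    """Height (mm) of the panel edge from the observed shadow-edge shift,
    assuming a camera looking straight down at the belt and light rays
    hitting the belt at elevation angle elev_deg.

    A panel of height h blocks the light and displaces the shadow edge
    horizontally by d = h / tan(elev), hence h = d * tan(elev)."""
    shift_mm = shift_px * px_to_mm
    return shift_mm * math.tan(math.radians(elev_deg))

# Hypothetical calibration: 0.05 mm per pixel, light at 45 degrees.
h = profile_height(shift_px=20, px_to_mm=0.05, elev_deg=45.0)
```

    Scanning the shadow edge line by line while the panel moves on the belt yields the full 3D profile.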

     Official site of the project.


Robotics

    My research activities on robotics mainly deal with map building and visual odometry for mobile robots.

    Map Building without odometry information

    This project aims to build a global geometric segment-based map by integrating scans collected by laser range scanners, without using any knowledge about the robot's poses. Current techniques for building maps with a mobile robot work incrementally, integrating a sequence of partial maps acquired by the robot's exploration sensors. Usually, this integration relies on other sensors, called localization sensors, that provide information on the robot's position within the already built map. This project aims at overcoming this circularity between exploration and localization by proposing a map building approach that integrates partial maps only on the basis of their geometrical features, without using any information about the robot's position. We have proposed methods for integrating both two partial maps and a sequence of partial maps.

    References: [A.1] [B.1] [B.2] [B.3]
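    The core geometric step when integrating two partial maps without pose information is estimating the rigid transform that best aligns matched features, e.g. segment endpoints. The sketch below shows a standard 2D least-squares alignment (Kabsch/Procrustes), assuming the correspondences are already known; finding those correspondences from the geometrical features alone is precisely the hard part the method addresses.

```python
import numpy as np

def rigid_align_2d(P, Q):
    """Least-squares rotation R and translation t with R @ P_i + t ~ Q_i.
    P, Q: (n, 2) arrays of corresponding 2D points (e.g. segment endpoints)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    # Cross-covariance of the centred point sets.
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:       # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cq - R @ cp
    return R, t
```

    Chaining such pairwise alignments over a sequence of scans yields the global map, with the residual alignment error used to reject wrong feature matches.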

    Good Experimental Methodologies for robot mapping

    We propose a methodology for performing experimental activities in the area of robotic mapping. More specifically, we concentrate on mapping methods that operate on segment-based maps. The proposed methodology prescribes a number of issues that, when addressed in the experimental validation of a mapping method, will enable the replication and cross-checking of experiments and the comparison with other methods. We also present the application of the proposed methodology to a specific mapping system we have developed in previous work.

    References: [B.8], EURON GEM SIG

    Visual Odometry

    In this project we developed a technique for visual odometry on the ground plane, based on a single, uncalibrated camera rigidly mounted on a mobile robot. The odometric estimate is based on the observation of features (e.g., salient points) on the floor by means of the camera. The technique produces an estimate of the transformation between the ground plane before and after a displacement. In addition, it estimates the homographic transformation between the ground plane and the image plane, which makes it possible to determine the 2D structure of the observed features on the ground. A method to estimate both transformations from the points extracted from two images has been developed.

    References: [B.9]
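    The relation between the two estimated transformations can be written compactly: if G is the homography from the ground plane to the image plane and H is the image-to-image homography induced by the robot displacement, then for a ground point p the image points are x = Gp before and x' = GMp after the motion M, so H = GMG⁻¹ and the ground-plane motion is recovered as M = G⁻¹HG (up to scale). A sketch with made-up matrices, purely to illustrate the conjugation:

```python
import numpy as np

def ground_motion(G, H):
    """Recover the ground-plane motion M (3x3 homogeneous 2D transform)
    from the ground-to-image homography G and the image-to-image
    homography H, via the conjugation M = G^-1 H G (up to scale)."""
    M = np.linalg.inv(G) @ H @ G
    return M / M[2, 2]
```

    In practice H is estimated from the tracked floor features across the two images, so the accuracy of the odometric estimate is tied to the quality of that homography.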

    Simone Gasparini,
    23 Jul 2012, 05:19