ROM project


The aim of the ROM project ("Real-time On set Matchmoving") is to bridge the gap between production and post-production in filmmaking. The main idea is to develop a system that gives the director of the movie at least a rough preview of the digital effects (3D rendering) that will be added later during post-production. This requires tools for real-time camera tracking that can recover the position of the camera using natural features, artificial reference markers, or both.
The novelty of the research rests on three main aspects:
  • Flexibility: the system can work in many different scenarios, such as production studios, indoor scenes or even outdoor settings. In particular, it can work with artificial landmarks (e.g. AR markers), with natural features, or with a combination of the two. Fewer markers are therefore needed on the scene, and less effort is required in post-production to remove them digitally.
  • Visual re-localization: a visual database is built during a pre-shooting step, in which the tracked features (artificial and/or natural) are stored as descriptors (e.g. SIFT, SURF) together with their associated 3D points. Some key-frames of the video sequence are also stored to create a visual dictionary; during any subsequent shot the dictionary can be queried for the most similar key-frame, which, together with the features database, gives the initial position of the camera in the scene.
  • Real-time preview: using the database of collected features, the 3D artifacts are rendered in real time during the final shooting, thanks to an innovative architecture based on state-of-the-art hardware and software. Thanks to the modularity of the system, a plug-in version of the software can also run in the Maya environment.
Partners of the project were

Real-Time Camera Tracker


Camera Tracker

The camera tracker implements a real-time Structure-from-Motion pipeline that uses natural features and markers to recover the position of the camera. In this demo we used AR markers and ARToolKitPlus to detect them, but any other type of marker and marker detector can easily be integrated into the system. The final version of the software integrates the Concentric Circles Tag (CCTag) detector, which has been developed at IRIT (see next demo).

The tracker is flexible in that it works with any combination of natural features and artificial markers: natural features only, artificial markers only, or both at the same time. When both are used, AR markers are considered more reliable than natural features, so they are given higher priority in all the algorithms that recover the camera position.
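The page does not say how this priority is implemented. One common way to express it, shown here purely as an illustrative sketch, is to weight each correspondence in the pose cost so that marker residuals count more than natural-feature residuals. The names `Correspondence` and `weightedCost` and the weight values are assumptions made for this example and are not taken from the ROM software.

```cpp
#include <cassert>
#include <vector>

// One 2D-3D correspondence used when estimating the camera pose.
// Hypothetical type: the residual is assumed to be computed elsewhere.
struct Correspondence {
    double residual;  // reprojection error in pixels
    bool fromMarker;  // true if the feature is a marker, false if natural
};

// Assumed weights: the values are arbitrary, only their ordering matters.
constexpr double kMarkerWeight = 4.0;
constexpr double kNaturalWeight = 1.0;

// Weighted sum of squared residuals. A pose that fits the markers well is
// preferred over one that fits only the natural features equally well.
double weightedCost(const std::vector<Correspondence>& matches) {
    double cost = 0.0;
    for (const Correspondence& m : matches) {
        assert(m.residual >= 0.0);  // a reprojection error is a distance
        const double w = m.fromMarker ? kMarkerWeight : kNaturalWeight;
        cost += w * m.residual * m.residual;
    }
    return cost;
}
```

With these weights, a 1-pixel error on a marker costs as much as a 2-pixel error on a natural feature. Other schemes, such as using the markers first and falling back to natural features, would satisfy the same description.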

The markers themselves, however, are used as plain features, i.e. only their unique ID and their position in the image are exploited to compute the camera position.

The video shows the camera trajectory (green), the reconstructed 3D points (colored points) and the detected markers along with their 3D points (red squares). Once an AR marker has been detected, its middle point is taken as the feature point representing the marker. In the camera view, you can see the reprojection of the markers in the image, with the red squares overlapping the marker images.
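As a minimal sketch of how a marker can be reduced to a single feature point, the function below takes the four detected corners of a square marker and returns the intersection of its diagonals. This is an assumption about what "middle point" means: the intersection of the diagonals is the image of the marker's true centre under perspective, whereas the plain average of the corners is not. `Point2` and `markerCentre` are hypothetical names and do not belong to ARToolKitPlus or to the ROM code.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Point2 {
    double x;
    double y;
};

// Corners must be given in order around the quadrilateral, so that
// c[0]-c[2] and c[1]-c[3] are its two diagonals.
Point2 markerCentre(const std::array<Point2, 4>& c) {
    // Direction vectors of the two diagonals.
    const double d1x = c[2].x - c[0].x, d1y = c[2].y - c[0].y;
    const double d2x = c[3].x - c[1].x, d2y = c[3].y - c[1].y;

    // Solve c0 + t * d1 = c1 + s * d2 for t using 2D cross products.
    const double denom = d1x * d2y - d1y * d2x;
    assert(std::fabs(denom) > 1e-12);  // parallel diagonals: degenerate quad
    const double t =
        ((c[1].x - c[0].x) * d2y - (c[1].y - c[0].y) * d2x) / denom;

    return Point2{c[0].x + t * d1x, c[0].y + t * d1y};
}
```

The returned 2D point, paired with the marker ID, is all that the rest of the pipeline would need from the detector under this assumption.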

Maya plug-in

The camera tracker has been integrated into a real-time plug-in for Maya, so that it can be used in the common VFX pipeline for post-production work.

In this version, CCTag markers are used.

Relocalization with Vocabulary tree

This is a proof of concept for the relocalization of the camera in the scene. A vocabulary tree is built from the key-frames and the features extracted during a previous shooting of the scene. In a new video of the same scene, the camera is then relocalized by searching for the closest key-frame, matching features against it, and using the 3D points associated with the matched features to compute the camera position.
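The key-frame search can be sketched as follows. This is a simplified illustration and not the project's implementation: it assumes that every image has already been turned into a bag of visual words (the step a vocabulary tree performs by quantizing each descriptor), and it compares images with a plain cosine similarity and a linear scan. `BagOfWords`, `cosineSimilarity` and `closestKeyFrame` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

// Sparse bag of words: visual word id -> number of occurrences in the image.
using BagOfWords = std::map<int, double>;

// Cosine similarity between two sparse histograms (1 = same direction).
double cosineSimilarity(const BagOfWords& a, const BagOfWords& b) {
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (const auto& entry : a) {
        assert(entry.second >= 0.0);  // counts cannot be negative
        normA += entry.second * entry.second;
        const auto it = b.find(entry.first);
        if (it != b.end()) dot += entry.second * it->second;
    }
    for (const auto& entry : b) normB += entry.second * entry.second;
    if (normA == 0.0 || normB == 0.0) return 0.0;  // empty histogram
    return dot / std::sqrt(normA * normB);
}

// Index of the stored key-frame most similar to the query image,
// or -1 when the database is empty.
int closestKeyFrame(const std::vector<BagOfWords>& keyFrames,
                    const BagOfWords& query) {
    int best = -1;
    double bestScore = -1.0;
    for (std::size_t i = 0; i < keyFrames.size(); ++i) {
        const double score = cosineSimilarity(keyFrames[i], query);
        if (score > bestScore) {
            bestScore = score;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

Once the closest key-frame is known, its stored features would be matched against the query image and the resulting 2D-3D correspondences passed to a pose solver. That step is outside this sketch.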

In this demo, after building a vocabulary tree from an initial video, we tracked the position of the camera in a new video of the same scene by searching for the closest key-frame for every frame. In a real application, relocalization is only done at the beginning of the video and whenever tracking is lost.
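The difference between the demo and a real application can be illustrated with a small two-state sketch. It assumes a hypothetical per-frame flag saying whether the pose was recovered, and counts how many frames would need the expensive database query when relocalization runs only at the start and after a tracking failure. `TrackerState` and `countRelocalizations` are names invented for this example.

```cpp
#include <cassert>
#include <vector>

enum class TrackerState { Lost, Tracking };

// frameSucceeded[i] tells whether the camera pose could be recovered on
// frame i (by relocalization if the tracker was lost, by tracking otherwise).
// Returns the number of frames on which relocalization has to run.
int countRelocalizations(const std::vector<bool>& frameSucceeded) {
    TrackerState state = TrackerState::Lost;  // no pose known before frame 0
    int relocalizations = 0;
    for (bool ok : frameSucceeded) {
        if (state == TrackerState::Lost) {
            ++relocalizations;  // query the database for the closest key-frame
        }
        // In the Tracking state, frame-to-frame tracking would run instead.
        state = ok ? TrackerState::Tracking : TrackerState::Lost;
    }
    return relocalizations;
}
```

For a five-frame sequence in which only the third frame fails, relocalization runs twice (on the first and fourth frames), whereas the demo queries the database on all five.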

The original video was full HD (1080p), and the result has been sped up 8x. Note that in this case the main bottleneck is SIFT extraction (a few seconds), while the query to the vocabulary tree is fast (~5 ms).

In the next video, a teapot has been placed in the scene with our RomHost and used to assess the quality of the localization. 

Project History 

(powered by gource)

ROM History

Technical Notes

  • C/C++
Other libraries
Building Framework
