Research

My research activity is mainly devoted to the video processing area, in which I have been working since 2004, when I started my M.Sc. thesis. My contributions to this area can be briefly summarised as follows.


Video surveillance (top)

Video surveillance aims to study techniques and methodologies to automatically monitor indoor and outdoor environments in order to prevent terrorist attacks, robberies and vandalism. During my M.Sc. thesis, my colleague Dr. Davide Migliore and I developed a video surveillance algorithm capable of detecting and tracking moving objects across the monitored scene. This algorithm addresses the well-known problems of foreground aperture, ghosting and waking person in the motion detection phase, and is also able to follow objects of generic shape. You can find the full system description in my M.Sc. thesis, the paper presented at the ACM international workshop on Video Surveillance and Sensor Networks (VSSN) or the paper presented at the Workshop on Ambient Intelligence at the Conference of the Italian Association for Artificial Intelligence. Finally, some demos of the whole system can be downloaded below.


Example 1: An object enters the scene and is first detected and then followed by the system

Example 2: Two people enter the scene together; when their trajectories diverge, two separate objects are created


Example 3: Two people enter the scene separately; when they meet, their objects are merged


Example 4: Three people enter the scene and then leave; one of them re-enters and is recognised again
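The sketch below gives a rough idea of the kind of pipeline involved in these demos: a running-average background model, a threshold to mark foreground pixels, and a simple flood fill to group them into object blobs. It is only a minimal illustration, not the algorithm described in the thesis (which additionally handles foreground aperture, ghosting and waking persons); thresholds and the adaptation rate are illustrative assumptions.

```python
import numpy as np


def update_background(background, frame, alpha=0.02):
    """Running-average background model: slowly adapt to scene changes."""
    return (1.0 - alpha) * background + alpha * frame


def detect_foreground(background, frame, threshold=25.0):
    """Mark pixels whose deviation from the background exceeds a threshold."""
    return np.abs(frame - background) > threshold


def label_blobs(mask):
    """Group foreground pixels into connected blobs (4-connectivity, flood fill)."""
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for y, x in zip(*np.nonzero(mask)):
        if labels[y, x]:
            continue
        current += 1
        stack = [(y, x)]
        while stack:
            cy, cx = stack.pop()
            if not (0 <= cy < mask.shape[0] and 0 <= cx < mask.shape[1]):
                continue
            if not mask[cy, cx] or labels[cy, cx]:
                continue
            labels[cy, cx] = current
            stack.extend([(cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)])
    return labels, current
```

Frames are assumed to be greyscale float arrays; each labelled blob would then be matched frame to frame by a tracker to follow the corresponding object.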

Video transcoding (top)

Video transcoding technologies provide methodologies to convert a bitstream coded in one format (X) into another one (Y). This conversion may involve either bitstream parameters, such as bitrate, frame rate or error resiliency, or syntax parameters (i.e. the standard used):

The first case is called homogeneous transcoding, while the latter is heterogeneous transcoding. In my Ph.D. research I investigated both scenarios. For the heterogeneous case, I considered the H.264/AVC to MPEG-2 conversion. This kind of transcoding is particularly interesting for Digital Terrestrial Television (DTT) applications: broadcasters are increasingly interested in transmitting their content in the H.264/AVC format, while many deployed set-top boxes only support the MPEG-2 standard, so a sudden migration towards H.264/AVC would be unfeasible for most consumers. The proposed transcoding architecture maximises the reuse of the information present in the incoming bitstream, so that the conversion is performed at low computational complexity and with limited deployment cost. More details are available in my Ph.D. Thesis.
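As a point of reference, the snippet below shows the straightforward cascaded approach to heterogeneous transcoding: fully decode the H.264/AVC bitstream and re-encode it as MPEG-2 (here via the ffmpeg command-line tool, an assumption of this sketch). The architecture proposed in my thesis avoids this full decode/re-encode cycle by reusing information already present in the incoming bitstream.

```python
import subprocess


def cascade_transcode(src, dst, bitrate="8M", fps=25):
    """Baseline heterogeneous transcoder: full H.264/AVC decode + MPEG-2 re-encode.

    This is the straightforward (and computationally expensive) cascade; a
    smarter transcoder reuses motion and mode information from the input.
    """
    cmd = [
        "ffmpeg", "-y",
        "-i", src,                # H.264/AVC input
        "-c:v", "mpeg2video",     # re-encode with the MPEG-2 video codec
        "-b:v", bitrate,          # target bitrate of the output bitstream
        "-r", str(fps),           # output frame rate
        dst,                      # e.g. an .m2v or .ts output file
    ]
    subprocess.run(cmd, check=True)


# Hypothetical usage:
# cascade_transcode("input_h264.mp4", "output_mpeg2.m2v")
```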


Error resilient video coding (top)

Error resilient video coding provides methodologies to add channel redundancy to a coded bitstream in order to minimise the distortion induced by channel losses. In this area, my contribution can be summarised in two proposals. The first one is an error resilient transcoder which exploits the H.264/AVC Flexible Macroblock Ordering (FMO) tool to cluster the macroblocks of each frame into two slice groups, A and B, where slice group A receives more channel redundancy. Macroblock classification is performed using only data obtained from entropy decoding. This avoids more complex processing such as motion compensation and keeps the classification phase at low complexity. You can find more details on this error resilient transcoder in my Ph.D. Thesis as well as in this paper.
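A minimal sketch of this kind of classification is shown below: macroblocks are assigned to slice groups A or B using only features available after entropy decoding. The particular importance score (motion vector magnitude plus number of non-zero coefficients) and the threshold are illustrative assumptions, not the exact criterion used in the paper.

```python
def classify_macroblocks(macroblocks, threshold=8.0):
    """Split macroblocks into FMO slice groups A (important) and B (less important).

    Each macroblock is described only by data available after entropy decoding
    (motion vector and number of non-zero transform coefficients), so no motion
    compensation or pixel reconstruction is required.
    """
    group_a, group_b = [], []
    for mb in macroblocks:
        mvx, mvy = mb["mv"]                          # decoded motion vector
        activity = abs(mvx) + abs(mvy) + mb["nnz"]   # illustrative importance score
        (group_a if activity > threshold else group_b).append(mb["addr"])
    return group_a, group_b


# Hypothetical usage: macroblocks = [{"addr": 0, "mv": (3, -1), "nnz": 12}, ...]
# Slice group A is then protected with more channel redundancy than slice group B.
```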


The second proposal in the error resilient video coding area is a scheme which adds channel protection "on top" of a coded bitstream in an unequal fashion. The channel protection is added by means of coding tools derived from distributed source coding theory. More details on this error resilient scheme can be found in my Ph.D. Thesis, this conference paper and also in this journal paper.
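To illustrate only the "unequal" aspect, the sketch below splits a fixed redundancy budget across bitstream portions in proportion to an importance weight; in the proposed scheme the protection itself is generated with distributed source coding tools rather than this simple allocation.

```python
def allocate_redundancy(importance, total_parity_bytes):
    """Distribute a fixed parity budget unequally across bitstream portions.

    'importance' maps each portion (e.g. a NAL unit class) to a weight; more
    important portions receive proportionally more protection.
    """
    total_weight = sum(importance.values())
    return {
        portion: round(total_parity_bytes * weight / total_weight)
        for portion, weight in importance.items()
    }


# Hypothetical usage: intra-coded data is protected more than inter-coded data.
# allocate_redundancy({"intra": 3, "inter": 1}, total_parity_bytes=1200)
# -> {"intra": 900, "inter": 300}
```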



Video quality assessment (top)

Video quality assessment aims to provide techniques to subjectively or objectively evaluate the quality of transmitted videos. In particular, the main focus is the design of quality metrics that predict the subjective rating given by human observers. My contribution in this area mainly regards the design of the NORM algorithm (NO-Reference video quality Monitoring), which estimates the Mean Square Error (MSE) induced by channel losses:


Furthermore, I have also assessed, by means of subjective tests, that the NORM MSE estimate can be used as a good predictor of the perceived quality of videos impaired by channel losses. Below are some examples of the results from the subjective test campaign.

 

 


More details on the NORM algorithm can be found in my Ph.D. Thesis and in this journal paper. Regarding the subjective study, the details are in this paper. Finally, I have also collaborated in the design of a subjective database which collects subjective scores for videos transmitted over error-prone IP networks. Check it out at http://vqa.como.polimi.it/.
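As a toy illustration of no-reference estimation of channel-induced distortion at the decoder, the sketch below assigns an assumed concealment distortion to each lost macroblock and propagates it through motion-compensated prediction with a leakage factor. The model, its parameters and the data layout are all assumptions made for this sketch; the actual NORM algorithm is described in the thesis and the journal paper.

```python
def estimate_channel_mse(frames, leakage=0.9, concealment_mse=120.0):
    """Toy no-reference estimate of the per-frame MSE induced by channel losses.

    'frames' is a list of frames, each a list of macroblock records with a
    'lost' flag and the index 'ref_mb' of the macroblock used as prediction in
    the previous frame (None for intra-coded macroblocks).  Lost macroblocks
    receive an assumed concealment distortion; distortion then propagates to
    predicted macroblocks, attenuated by a leakage factor that models spatial
    filtering and intra refresh.
    """
    prev = []
    per_frame_mse = []
    for frame in frames:
        curr = []
        for mb in frame:
            if mb["lost"]:
                d = concealment_mse
            elif prev and mb["ref_mb"] is not None:
                d = leakage * prev[mb["ref_mb"]]   # error propagated from reference
            else:
                d = 0.0                            # intra-coded, correctly received
            curr.append(d)
        per_frame_mse.append(sum(curr) / len(curr))
        prev = curr
    return per_frame_mse
```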

Perceptual video coding (top)

The Human Visual System (HVS) shows a space- and time-varying sensitivity to the distortion introduced by quantisation in lossy coding. This varying distortion sensitivity is related to visual masking phenomena of the HVS, as exemplified by the following image, which has been coded with the JPEG standard using a uniform quantisation step:


As you may notice, although the quantisation step is uniform over the whole image, there are image areas where the coding artifacts (blockiness) are more noticeable (e.g. the actor's face). Perceptual video coding exploits this space-varying distortion sensitivity to design video coding algorithms which improve compression efficiency by performing coarser quantisation in image areas where the HVS is less sensitive, and finer quantisation elsewhere. In this context, I have studied the integration of some state-of-the-art Just Noticeable Distortion (JND) models into the H.264/AVC video coding standard in order to develop a novel perceptual video codec architecture denoted Instituto Superior Técnico - Perceptual Video Codec (IST-PVC). You can download some IST-PVC video demos here. The full codec description is available in this journal paper. The JND model integration has also been extended to the codec considered when the HEVC standardisation started (the TMuC codec); the results can be found in this paper presented at ICASSP in 2011. Since joining BBC R&D I have worked on a simplified version of this JND model integration for the HEVC standard, which was then presented for consideration at the JCT-VC. Together with Prof. David Bull's research group at the University of Bristol, I have also collaborated on studies on the design of luminance masking models and perceptual quantisation for the compression of High Dynamic Range (HDR) video content. The main results can be found in one paper presented at ICIP and another at PCS in 2013.
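For illustration, the sketch below computes a per-block luminance-adaptation JND threshold with a commonly used piecewise curve and maps it to a QP offset (coarser quantisation where the threshold is higher). Both the curve parameters and the mapping are illustrative assumptions and do not reproduce the IST-PVC integration.

```python
import numpy as np


def luminance_jnd(mean_luma):
    """Luminance-adaptation JND threshold for a block (8-bit luma).

    Piecewise curve in the spirit of classic spatial JND models: very dark and
    very bright regions tolerate larger distortion than mid-grey regions.
    (Parameter values here are illustrative.)
    """
    if mean_luma <= 127:
        return 17.0 * (1.0 - np.sqrt(mean_luma / 127.0)) + 3.0
    return 3.0 / 128.0 * (mean_luma - 127.0) + 3.0


def perceptual_qp_offset(block, base_jnd=3.0, max_offset=4):
    """Map the block JND threshold to a QP offset: higher tolerance -> coarser QP."""
    jnd = luminance_jnd(float(np.mean(block)))
    offset = int(round(np.log2(jnd / base_jnd) * 2.0))   # illustrative mapping
    return int(np.clip(offset, -max_offset, max_offset))


# Hypothetical usage on a 16x16 luma block:
# offset = perceptual_qp_offset(np.full((16, 16), 20.0))  # dark block -> positive offset
```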


 Standardisation of video compression technology (top)

Video coding standards play a fundamental role in video compression technology: by specifying the bitstream syntax and semantics, they enable interoperability among the different parties involved in the video content production and delivery chain (e.g. codec manufacturers, broadcasters, etc.). A video coding standard specifies the bitstream syntax and semantics from the decoder perspective, as illustrated in the following figure:



Since I joined BBC R&D in 2011 I have had the opportunity to closely follow the standardisation of HEVC and its related extensions (scalable, range, etc.). In this area I mainly study and develop coding tools useful to a broadcaster such as the BBC. More precisely, I developed and proposed a perceptual quantisation tool based on the luminance masking phenomenon of the human visual system (see this ICME paper for more details). I also collaborated on the development of the transform skip mode (see this paper for further details) and, after the finalisation of HEVC Version 1 in January 2013, my research focus moved towards the Range Extensions (RExt). In this area I proposed an extension of the residual DPCM tool to inter coded blocks (see this PCS paper) and I designed a Supplemental Enhancement Information (SEI) message to carry useful information associated with alpha channels for studio and post-production video coding applications.
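As an illustration of the general residual DPCM idea (shown here in horizontal mode for a transform-skipped residual block), each residual sample is predicted from its left neighbour and only the difference is coded. This is a simplified sketch, not the exact normative process specified in RExt.

```python
import numpy as np


def rdpcm_horizontal_encode(residual):
    """Horizontal residual DPCM: code each residual sample as the difference
    from its left neighbour (the first column is left untouched)."""
    out = residual.copy()
    out[:, 1:] = residual[:, 1:] - residual[:, :-1]
    return out


def rdpcm_horizontal_decode(coded):
    """Invert horizontal residual DPCM by cumulative summation along each row."""
    return np.cumsum(coded, axis=1)


# Round-trip check on a random 4x4 residual block:
# r = np.random.randint(-32, 32, size=(4, 4))
# assert np.array_equal(rdpcm_horizontal_decode(rdpcm_horizontal_encode(r)), r)
```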
 
 Coding algorithms optimisation for UHD content storage and delivery (top)

HEVC shows excellent performance when coding High Definition (HD) and Ultra High Definition (UHD) content thanks to its novel coding tools. This compression efficiency improvement comes at the cost of an increased computational complexity.



In practical video coding applications dealing with UHD content, it is crucial to research optimisation techniques which speed up the coding process without sacrificing compression performance. In this area I have worked on a project co-funded by the Technology Strategy Board (TSB) to devise such algorithms for UHD content storage and transmission (web page). The main output of the project is a software HEVC encoder, released as open source and known as the Turing codec, for which I am one of the main developers working on its extension and maintenance.
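A common family of such optimisations cuts down the encoder's partitioning search. The sketch below shows a generic early-termination heuristic that skips evaluating further CU splits when the cost found without splitting is already very low; it is an illustrative example of this kind of speed-up technique, not the actual logic implemented in the Turing codec.

```python
def search_cu(cu, evaluate_cost, split, depth=0, max_depth=3, skip_threshold=1000.0):
    """Generic rate-distortion search over a CU quad-tree with early termination.

    evaluate_cost(cu) returns the RD cost of coding 'cu' without splitting;
    split(cu) returns its four sub-CUs.  If the unsplit cost is already below
    'skip_threshold' (e.g. a nearly perfect prediction), the recursion into the
    sub-CUs is skipped, trading a negligible compression loss for encoder speed.
    """
    best_cost = evaluate_cost(cu)
    if depth == max_depth or best_cost < skip_threshold:
        return best_cost, False                      # do not split
    split_cost = sum(
        search_cu(sub, evaluate_cost, split, depth + 1, max_depth, skip_threshold)[0]
        for sub in split(cu)
    )
    if split_cost < best_cost:
        return split_cost, True                      # splitting pays off
    return best_cost, False
```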