Research

My research activity is mainly devoted to video processing and compression, an area in which I have been working since 2004, when I started my M.Sc. thesis. Over my professional career I have worked on and contributed to several topics, which are outlined below. Use the hyperlinks to jump to a section of interest.

Video surveillance

Video surveillance studies techniques and methodologies to automatically monitor indoor and outdoor environments in order to prevent terrorist attacks, robberies and vandalism. During my M.Sc. thesis, my colleague Dr. Davide Migliore and I developed a video surveillance algorithm capable of detecting and tracking moving objects across the monitored scene. The algorithm addresses the well-known problems of foreground aperture, ghosting and waking person in the motion detection phase, and is also able to follow objects of generic shape. You can find the whole system description in my M.Sc. thesis, in this paper presented at the ACM international workshop on Video Surveillance and Sensor Networks (VSSN), or in the paper presented at the Workshop on Ambient Intelligence of the Conference of the Italian Association for Artificial Intelligence. Finally, some demos of the whole system can be downloaded/streamed below.

Example 1: An object enters the scene; the system detects it and then follows it across the frame.

Example 2: Two people enter the scene together, so they are initially treated as a single object. As soon as they split, two objects are created and tracked.

Example 3: Two people enter the scene separately; once they meet, a single object is created and tracked.

Example 4: Three people traverse the scene. One of them re-enters, and the system recognises him.
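
To give a flavour of how the motion detection phase of such systems typically operates, here is a minimal background-subtraction sketch in Python. It uses a generic running-average background model; it is an illustration under simplifying assumptions, not the algorithm from the thesis, and all names and thresholds are invented.

    import numpy as np

    def update_background(bg, frame, alpha=0.02):
        # Running-average background model: slowly blend each new frame in,
        # so gradual illumination changes are absorbed into the background.
        return (1.0 - alpha) * bg + alpha * frame

    def detect_foreground(bg, frame, threshold=25.0):
        # Pixels differing from the background beyond a threshold are
        # flagged as moving foreground.
        return np.abs(frame - bg) > threshold

    # Toy usage on synthetic grayscale frames.
    h, w = 120, 160
    bg = np.full((h, w), 100.0)          # current background estimate
    frame = bg.copy()
    frame[40:80, 60:100] = 200.0         # a bright "object" enters the scene
    mask = detect_foreground(bg, frame)  # True where motion is detected
    bg = update_background(bg, frame)    # adapt the model for the next frame
    print(mask.sum(), "foreground pixels detected")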

Video transcoding

Video transcoding technologies provide methodologies to convert one coded format (X) into another (Y). The conversion might involve either bitstream parameters, such as bitrate, frame rate or error resiliency, or syntax parameters (i.e. the compression standard used):

The first case is called homogeneous transcoding, the latter heterogeneous transcoding. In my Ph.D. research I investigated both scenarios. For the latter, I considered the H.264/AVC to MPEG-2 conversion. This kind of transcoding is particularly interesting in Digital Terrestrial Television (DTT) applications: broadcasters are increasingly interested in transmitting their content in the H.264/AVC format, while many set-top boxes are still only compliant with the MPEG-2 standard, and a sudden migration towards H.264/AVC might not be affordable for most consumers. The proposed transcoding architecture maximises the reuse of the information present in the incoming bitstream, so that the conversion requires low computational complexity and limits the deployment cost. More details are available in my Ph.D. thesis.
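
As a point of reference, the simplest (cascaded) form of heterogeneous transcoding can be reproduced with off-the-shelf tools: fully decode, then re-encode. The sketch below assumes ffmpeg is installed and uses placeholder file names; an efficient transcoder such as the one proposed avoids this full decode/encode cycle by reusing motion vectors and mode decisions from the incoming bitstream.

    import subprocess

    # Cascaded heterogeneous transcoding: fully decode the H.264/AVC source
    # and re-encode it as MPEG-2. This is the brute-force reference approach
    # that smart transcoders improve upon by reusing incoming decisions.
    subprocess.run([
        "ffmpeg", "-i", "input_h264.mp4",  # H.264/AVC source (placeholder)
        "-c:v", "mpeg2video",              # target syntax: MPEG-2 video
        "-b:v", "6M",                      # example target bitrate
        "output_mpeg2.mpg",
    ], check=True)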

Error resilient video coding

Error resilient video coding provides methodologies to add channel redundancy to a coded bitstream in order to minimise the distortion induced by channel losses. In this area, my contribution can be summarised in two proposals. The first is an error resilient transcoder which exploits the H.264/AVC Flexible Macroblock Ordering (FMO) tool to cluster the macroblocks of each frame into two slice groups, A and B, with slice group A receiving more channel redundancy. Macroblock classification uses only data provided by entropy decoding; this avoids more complex processing such as motion compensation and keeps the classification phase at low complexity. You can find more details on this error resilient transcoder both in my Ph.D. thesis and in this paper.
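
A toy version of such a classification step might look as follows; the fields, weights and threshold are illustrative assumptions, not the classifier from the paper, but they show how quantities available right after entropy decoding (motion vectors, number of non-zero coefficients) can drive the slice group assignment.

    def classify_macroblocks(mbs, threshold=10.0):
        # Assign each macroblock to FMO slice group A (important) or B,
        # using only entropy-decoded data: no motion compensation needed.
        groups = {"A": [], "B": []}
        for mb in mbs:
            # Crude importance score: motion activity plus residual energy.
            score = abs(mb["mv_x"]) + abs(mb["mv_y"]) + 0.5 * mb["nz_coeffs"]
            groups["A" if score > threshold else "B"].append(mb["addr"])
        return groups

    mbs = [
        {"addr": 0, "mv_x": 12, "mv_y": -3, "nz_coeffs": 14},  # busy block
        {"addr": 1, "mv_x": 0,  "mv_y": 0,  "nz_coeffs": 1},   # static block
    ]
    print(classify_macroblocks(mbs))  # {'A': [0], 'B': [1]}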

The second proposal in the error resilient video coding area is a scheme which adds channel protection "on top" of a coded bitstream, in an unequal fashion. The channel protection is added by means of coding tools derived from distributed source coding principles. More details on this error resilient scheme can be found in my Ph.D. thesis, this conference paper and also in this journal paper.
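
The "unequal fashion" can be illustrated with a toy parity allocation: spend more redundancy on the high-priority portion of the bitstream than on the rest. Plain XOR parity stands in below for the distributed source coding tools of the actual scheme, and the payload split and protection ratios are invented.

    def xor_parity(block):
        # One parity byte protecting a block of payload bytes.
        p = 0
        for b in block:
            p ^= b
        return p

    def protect(data, parity_every):
        # One parity byte per `parity_every` payload bytes
        # (smaller spacing = stronger protection).
        return [xor_parity(data[i:i + parity_every])
                for i in range(0, len(data), parity_every)]

    high_priority = b"headers+motion"        # loss here is catastrophic
    low_priority = b"residual-coefficients"  # loss here is tolerable
    strong = protect(high_priority, parity_every=2)  # dense parity
    weak = protect(low_priority, parity_every=8)     # sparse parity
    print(len(strong), "vs", len(weak), "parity bytes")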

Video quality assessment

Video quality assessment aims to provide techniques to subjectively or objectively evaluate the quality of transmitted videos. The main focus of the field is the design of quality metrics that predict the subjective ratings given by human observers. My contribution in this area mainly regards the design of the NORM algorithm (NO-Reference video quality Monitoring), which estimates the Mean Square Error (MSE) induced by channel losses:

Furthermore, I have assessed, by means of subjective tests, that the NORM MSE estimate is a good predictor of the perceived quality of videos impaired by channel losses. Some example results from the subjective campaign are shown below.

More details on the NORM algorithm can be found in my Ph.D. thesis and this journal paper. Regarding the subjective study, the details are in this paper. Finally, I also collaborated in the design of a subjective database which collects subjective scores for videos transmitted over error prone IP networks. Check it out at http://vqa.como.polimi.it/.
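
To put an MSE estimate like NORM's into perspective, it is commonly mapped to PSNR with the standard formula for 8-bit video. The helper below shows only that mapping, not the NORM estimator itself.

    import math

    def psnr_from_mse(mse, peak=255.0):
        # Standard MSE-to-PSNR mapping for 8-bit content.
        if mse <= 0.0:
            return float("inf")  # no distortion
        return 10.0 * math.log10(peak * peak / mse)

    print(psnr_from_mse(50.0))  # ~31.1 dB: visible but moderate degradation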

Perceptual video coding

The Human Visual System (HVS) shows a space- and time-varying sensitivity to the distortion introduced by quantisation in lossy coding. This varying distortion sensitivity is related to visual masking phenomena of the HVS, as exemplified in the following image, which has been coded with the JPEG standard using a quantisation step kept constant throughout the whole picture:

As you may notice, there are image areas where the coding artefacts (blockiness) are more noticeable (e.g. the man's face). Perceptual video coding relies on this space-varying distortion sensitivity to design video coding algorithms which improve compression efficiency by performing coarser quantisation in image areas where the HVS is less sensitive, and finer quantisation elsewhere. In this context, I studied the integration of some state-of-the-art Just Noticeable Distortion (JND) models into the H.264/AVC video coding standard, developing a novel perceptual video codec architecture denoted as Instituto Superior Técnico - Perceptual Video Codec (IST-PVC). You can download some IST-PVC video demos here. The whole codec description is available in this journal paper. The JND model integration was also extended to the codec considered when the HEVC standardisation started (the TMuC codec); the results can be found in this paper presented at ICASSP in 2011. After joining BBC R&D, I worked on a simplified version of this JND model integration in the HEVC standard, which was then presented for consideration at the JCT-VC. Together with Prof. David Bull's research group at the University of Bristol, I collaborated on studies on the design of luminance masking models and perceptual quantisation for the compression of High Dynamic Range (HDR) video content. The main results can be found in one paper presented at ICIP and another at PCS in 2013, which was eventually extended into a journal contribution.
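
The core idea of JND-driven quantisation can be sketched in a few lines: perturb the base quantisation parameter (QP) per block according to a JND weight, coarser where distortion is masked and finer where it is visible. The weights below are invented for illustration; a real JND model derives them from luminance and contrast masking analysis.

    import math

    def perceptual_qp(base_qp, jnd_weight):
        # In H.264/AVC and HEVC the quantisation step roughly doubles every
        # 6 QP, hence the 6*log2 mapping from a multiplicative JND weight.
        delta = round(6.0 * math.log2(jnd_weight))
        return max(0, min(51, base_qp + delta))  # clip to the valid QP range

    print(perceptual_qp(30, 2.0))  # well-masked area -> QP 36 (coarser)
    print(perceptual_qp(30, 0.5))  # sensitive area   -> QP 24 (finer)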

Standardisation of video compression technology

Video coding standards play a fundamental role in video compression technology by specifying the bitstream's syntax and semantics to enable interoperability among the different parties involved in the video content production and delivery chain (e.g. codec manufacturers, broadcasters, etc.). Any video coding standard specifies the bitstream syntax and semantics from the decoder's perspective, as illustrated in the following figure:

Since joining BBC R&D in 2011, I have had the opportunity to closely follow the standardisation of HEVC and its related extensions (scalable, range, etc.). In this area I have mainly studied and developed coding tools useful to a broadcaster such as the BBC. More precisely, I developed and proposed a perceptual quantisation scheme based on the luminance masking phenomenon of the human visual system (see this ICME paper for more details). I also collaborated on the development of the transform skip mode (see this paper for further details) and, after the finalisation of HEVC Version 1 in January 2013, my research focus moved towards the Range Extensions (RExt). In this area I proposed an extension of the residual DPCM tool to inter coded blocks (see this PCS paper) and designed a Supplemental Enhancement Information (SEI) message to carry useful information associated with alpha channels for studio and post-production video coding applications.
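
To illustrate the residual DPCM idea: instead of transforming the prediction residual, each residual sample is predicted from its neighbour and only the difference is coded, which suits residuals that vary smoothly. The sketch below shows a horizontal pass over one row; the standardised tool also has a vertical mode and applies when the transform is skipped.

    import numpy as np

    def rdpcm_forward(row):
        # Replace each sample (except the first) with the difference from
        # its left neighbour; smooth residuals yield small values to code.
        out = row.copy()
        out[1:] = row[1:] - row[:-1]
        return out

    def rdpcm_inverse(coded):
        # Invert by accumulating the differences back into samples.
        return np.cumsum(coded)

    residual = np.array([10, 12, 13, 13, 11])
    coded = rdpcm_forward(residual)  # [10, 2, 1, 0, -2]
    assert np.array_equal(rdpcm_inverse(coded), residual)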

Coding algorithm optimisation for UHD content storage and delivery

HEVC shows excellent performance when coding High Definition (HD) and Ultra High Definition (UHD) content thanks to its novel coding tools. This compression efficiency improvement comes at the cost of increased computational complexity.

In practical video coding applications dealing with UHD content, it is crucial to research optimisation techniques which speed up the encoding process without sacrificing compression performance. In this area I worked on a project co-funded by the Technology Strategy Board (TSB) to devise such algorithms for UHD content storage and transmission (web page). The main output of the project is a software HEVC encoder, released as open source and known as the Turing codec, for which I was one of the main developers working on its extension and maintenance.
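
A typical family of such optimisations is early termination of the encoder's partitioning search: cheap statistics decide whether it is worth evaluating further coding unit (CU) splits at all. The heuristic below is a deliberately simple illustration of the principle, not the Turing codec's actual logic.

    import numpy as np

    def should_evaluate_split(block, var_threshold=20.0):
        # Nearly flat blocks rarely benefit from being split further, so the
        # encoder can skip the (expensive) recursive search for them.
        return float(block.var()) > var_threshold

    flat = np.full((32, 32), 128.0)
    busy = np.random.default_rng(0).integers(0, 255, (32, 32)).astype(float)
    print(should_evaluate_split(flat), should_evaluate_split(busy))  # False True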

High dynamic range imaging

With the advent of Ultra High Definition TV (UHDTV) services, viewers will be provided not only with more pixels but also with better pixels. High Dynamic Range (HDR) is one of the features that most enables the delivery of better pixels, giving the so-called "wow effect" when watching content. Two different HDR systems are standardised in the ITU-R BT.2100 recommendation: the Perceptual Quantiser (PQ) and the Hybrid Log-Gamma (HLG), which are display-referred and scene-referred systems, respectively. Scene-referred systems are usually preferred in broadcasting, given that the same content might be delivered to many users, each with displays of different brightness and capabilities. BBC R&D, together with the Science and Technology Research Laboratories (STRL) of the Japanese public broadcaster NHK, developed the HLG system to enable the delivery of HDR content in broadcasting applications.
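
For reference, the HLG opto-electronic transfer function (OETF) defined in BT.2100 is compact enough to write down directly: scene-linear light in [0, 1] follows a square-root (gamma-like) segment below the knee and a logarithmic segment above it, which is what extends the dynamic range.

    import math

    # HLG OETF constants from ITU-R BT.2100.
    A = 0.17883277
    B = 1.0 - 4.0 * A
    C = 0.5 - A * math.log(4.0 * A)

    def hlg_oetf(e):
        # Map scene-linear light e in [0, 1] to the non-linear signal E'.
        if e <= 1.0 / 12.0:
            return math.sqrt(3.0 * e)           # square-root segment
        return A * math.log(12.0 * e - B) + C   # logarithmic segment

    print(hlg_oetf(1.0 / 12.0))  # 0.5 at the knee point
    print(hlg_oetf(1.0))         # ~1.0 at peak scene light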

Besides the definition of an HDR system from the "pixel's point of view", it is also important to ensure that coding systems are able to handle the associated content and convey the necessary information. In this area, I contributed by leading the preparation of the BBC's response to MPEG's Call for Evidence (CfE) on high dynamic range imaging and wide colour gamut, which was presented at the 112th MPEG meeting in Warsaw. The contribution also included the proposal of the so-called "alternative transfer characteristics" SEI message, which enables HDR services in a heterogeneous environment where both HDR and Standard Dynamic Range (SDR) receivers exist. While seconded at NHK STRL, I also studied the effect of compressing HDR material graded with HLG using a dynamic-range-agnostic HEVC encoder. The main goal of that study was to understand whether a particular adjustment of the quantisation parameters was required for HLG-graded signals. You can find more details in this JVET contribution.

Video compression for universal docking

Universal docking refers to the use of USB ports to connect docking stations which expand the user's desktop onto multiple screens. Unlike conventional docking stations, no bespoke ports or connections are required, so the same docking station can be used with different laptops, desktops, etc. The transmission link used (USB) is limited in bandwidth, so the pixels grabbed from the graphics card of the user's computer need to be compressed to fit the channel capacity. Accordingly, universal docking resorts to image and video compression techniques whereby the encoding happens on the user's machine (the so-called host) whilst decoding is performed on the docking station (also denoted as the device).

A compression scheme for universal docking should be designed to meet the following (conflicting) requirements:

    • Visually lossless decoded quality

    • Low latency to guarantee a seamless and optimal quality of experience

    • Low encoding complexity, given that computational resources on the host are shared with other processes and battery life is constrained

While at DisplayLink, I worked in the video codec research team with a leading role in developing coding tools which met the aforementioned requirements and enabled new potential applications.
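
One widely used way to reconcile the requirements above in screen compression is tile-based change detection: divide the desktop into tiles and only re-encode the tiles that changed since the previous frame, so static content costs almost nothing in bandwidth or CPU. The sketch below is a generic illustration of this idea, not DisplayLink's actual scheme.

    import numpy as np

    TILE = 64  # tile size in pixels

    def dirty_tiles(prev, cur):
        # Return the top-left corners of tiles that changed between frames;
        # only these need to be re-encoded and sent over the USB link.
        h, w = cur.shape
        dirty = []
        for y in range(0, h, TILE):
            for x in range(0, w, TILE):
                if not np.array_equal(prev[y:y + TILE, x:x + TILE],
                                      cur[y:y + TILE, x:x + TILE]):
                    dirty.append((x, y))
        return dirty

    prev = np.zeros((256, 256), dtype=np.uint8)
    cur = prev.copy()
    cur[10:20, 70:90] = 255        # e.g. a window moved in this region
    print(dirty_tiles(prev, cur))  # [(64, 0)]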