The First IEEE Workshop on Delay-Sensitive Video Computing in the Cloud (DSVCC)

as part of CloudCom 2015, Vancouver, Canada, Nov 30 - Dec 3, 2015.

Motivation and Goal

Video applications are now among the most widely used on the Internet and a daily fact of life for the great majority of Internet users. In 2013, video data accounted for 78% of all Internet traffic in the USA and 66% of all Internet traffic worldwide; by 2018, these figures are projected to grow to 84% and 79%, respectively. While presentational video services such as those provided by YouTube and Netflix dominate this video data, conversational video services such as video conferencing, video gaming, telepresence, tele-learning, collaborative shared environments, and screencasting (a.k.a. screen sharing, or remote desktop) also see significant usage. At the same time, with the advent of mobile networks and cloud computing, we are seeing a paradigm shift in which the computationally intensive components of these conversational video services are moving to the cloud, while the end user’s mobile device serves as an interface to access the service. In this way, even mobile devices without high-end graphical and computational capabilities can access a high-fidelity application with high-end graphics, because all the processing and rendering is done in the cloud and the result is sent to the user as video, which any mobile device today can display. A practical example is cloud gaming, where the game events are processed and the game scene is rendered in the cloud, with the resulting video streamed to the players.

What distinguishes these cloud-based conversational video systems from other video systems is that they are highly delay sensitive. While users tolerate buffering and interruptions of even a few seconds in presentational video applications, conversational video applications require a much tighter end-to-end delay, usually in the range of 150 to 250 milliseconds; otherwise the application “fails” because it does not respond to user interactions fast enough. There have been many recent proposals for cloud-based encoding of video, with the great majority focusing on video-on-demand applications and mostly using the well-known Hadoop and MapReduce technologies to break the video into multiple chunks and encode or transcode each chunk on a worker node, resulting in parallel and therefore faster encoding of the video as a whole. But this approach will not work in a “live” video scenario, since there is not sufficient time for such operations: the video must be processed live as it comes in and delivered to the user without violating delay thresholds. The fact that the cloud acts as a central node potentially adds a bottleneck and possibly further delays. Delay-sensitive processing and rendering of video in the cloud has therefore become an emerging area of interest.