ACM Multimedia 2019 TPC Meeting and Workshop

June 16 - 17, 2019, Amherst, Massachusetts, USA

TPC Workshop: Multimedia Analytics Meets Systems

The TPC meeting for ACM Multimedia 2019 is a rare occasion where multimedia researchers gather together. This year's TPC meeting is held in conjunction with the ACM MMSys conference and provides an opportunity for further interaction between the broader multimedia research community and multimedia systems researchers.

We are organizing an invitation-only workshop after the TPC meeting, on the afternoon of June 17, 2019 (1-5pm), with the theme of "Multimedia Analytics meets Systems". The workshop aims to foster discussion and closer interactions between multimedia researchers and systems researchers, exploring the following questions:

  • What system performance challenges are multimedia researchers facing?
  • How can systems research help improve the performance of multimedia analytics?
  • How could advances in multimedia understanding and engagement lead to a better understanding or improved performance of a system?

As an outcome of this workshop, we hope to come up with a list of the most important problems to be solved at the intersection of multimedia systems and the other multimedia tracks.

The workshop will have a mixture of short talks and discussion sessions.

Registration

If you are interested in attending this workshop, please indicate so in the Registration form. The deadline for registration is May 2, 2019.

Schedule

1300 - 1420 Session 1

    • Enhancing Video Experiences with Analytics, by Vishy Swaminathan, Adobe Research [Slides]
    • Towards Evolving Multimedia Summarization, by Joao Magalhaes, Universidade NOVA de Lisboa [Slides]
    • Improving Quality of Compressed Video Using GAN, by Lorenzo Seidenari, University of Florence [Slides]
    • Streaming on Steroids: Are We There Yet? by Ali C. Begen, Ozyegin University / Networked Media [Slides]

1420 - 1440 Break

1440 - 1600 Session 2

    • Inferring User’s Mental States during Unconstrained Mobile Learning Interactions, by Oya Celiktutan, King's College London [Slides]
    • Detecting Panoramic Saliency for 360-degree Video Streaming, by Zhisheng Yan, Georgia State University [Slides]
    • Multimedia Analytics with 5G Edge Nodes, by Bhojan Anand, National University of Singapore [Slides]
    • Cervical Cancer Screening via Mobile Deep Learning, by Roger Zimmermann, National University of Singapore [Slides]

1600 - 1630 Break-Out Discussion

1630 - 1700 Summary

Abstracts

Enhancing Video Experiences with Analytics

Vishy Swaminathan, Adobe Research

The first generation of our video research at Adobe focused on simple insights from content, while the second focused on insights from video consumption (behavioral) data. Now, with the explosion in compute, it is possible to derive insights simultaneously from both content and behavioral data, closing the feedback loop to improve multimedia content experiences. I will start with a glimpse of the analytics available from user sessions for all types of videos, including on-demand and live scenarios. I will elaborate on how this fine-grained consumption data can be combined with deep-learning-powered multimedia understanding to enhance multimedia experiences. Some sample demos of such applications will be shown, and systems aspects highlighted.
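
As a rough illustration of the feedback loop sketched above, the toy example below joins hypothetical per-session playback data with per-segment content labels to flag segments where viewers tend to drop off. All field names and the threshold are illustrative assumptions, not material from the talk.

```python
# Illustrative sketch (not from the talk): correlate viewer drop-off with
# per-segment content labels to close a simple content/behaviour feedback loop.
# All field names and the threshold below are assumptions for the example.
from collections import defaultdict

sessions = [  # hypothetical consumption data: which segment each viewer left at
    {"viewer": "a", "exit_segment": 3},
    {"viewer": "b", "exit_segment": 3},
    {"viewer": "c", "exit_segment": 7},
]
segment_labels = {3: "mid-roll ad", 7: "credits"}  # hypothetical content-understanding output

drop_offs = defaultdict(int)
for s in sessions:
    drop_offs[s["exit_segment"]] += 1

# Flag segments where an unusually large share of viewers leave.
threshold = 0.5 * len(sessions)
for seg, count in sorted(drop_offs.items()):
    if count >= threshold:
        print(f"segment {seg} ({segment_labels.get(seg, 'unknown')}) loses {count} viewers")
```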

Detecting Panoramic Saliency for 360-degree Video Streaming

Zhisheng Yan, Georgia State University

Emerging 360-degree video streaming systems require significant bandwidth to stream the panoramic content. Fetching the spatial region that users are likely to view can potentially tackle the bandwidth challenge. To ensure fetching efficiency, head movement prediction becomes a key enabler. Unfortunately, most existing efforts have been made towards rate and viewport adaptation of 360-degree video streaming. In this talk, I steer to another dimension that explores head movement prediction to enhance streaming performance by detecting 360-degree saliency. I will first identify the problems of traditional saliency detection models for regular images/videos. I will then present a head movement prediction framework using the proposed panoramic saliency detection and demonstrate its measurable gain in prediction accuracy and streaming efficiency.
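
To make the pipeline concrete, here is a minimal sketch of how a predicted panoramic saliency map and a short head-yaw history could drive tile prefetching. The linear extrapolation, tile grid and weighting are illustrative assumptions, not the framework presented in the talk.

```python
# Minimal sketch (assumptions, not the talk's actual framework): combine a
# panoramic saliency map with a simple head-motion extrapolation to decide
# which tiles of a 360-degree frame to fetch at high quality.
import numpy as np

def predict_viewport_center(head_yaw_history, horizon=1.0):
    """Linearly extrapolate yaw (degrees) one prediction horizon ahead."""
    if len(head_yaw_history) < 2:
        return head_yaw_history[-1] % 360.0
    velocity = head_yaw_history[-1] - head_yaw_history[-2]
    return (head_yaw_history[-1] + velocity * horizon) % 360.0

def select_tiles(saliency, predicted_yaw, n_tiles=8, top_k=3):
    """Score each horizontal tile by its mean saliency plus proximity to the
    predicted viewport centre, and return the top-k tile indices."""
    cols_per_tile = saliency.shape[1] // n_tiles
    tile_saliency = np.array([saliency[:, i * cols_per_tile:(i + 1) * cols_per_tile].mean()
                              for i in range(n_tiles)])
    tile_centers = (np.arange(n_tiles) + 0.5) * (360.0 / n_tiles)
    dist = np.abs(tile_centers - predicted_yaw)
    angular_dist = np.minimum(dist, 360.0 - dist)
    score = tile_saliency + (1.0 - angular_dist / 180.0)  # illustrative weighting
    return np.argsort(score)[-top_k:][::-1]

saliency_map = np.random.rand(8, 16)   # stand-in for a learned panoramic saliency map
yaw_history = [40.0, 55.0, 70.0]       # recent head yaw samples in degrees
center = predict_viewport_center(yaw_history)
print("fetch at high quality, tile indices:", select_tiles(saliency_map, center))
```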

Improving Quality of Compressed Video Using GAN

Lorenzo Seidenari, University of Florence

Video streaming is ubiquitous, and streams are compressed to reduce the bandwidth required for transmission, presenting a challenging trade-off between user experience and business sustainability for provider firms. In this talk I will discuss recent advancements we made in removing compression artifacts using Generative Adversarial Networks. I will discuss recent results published in our IEEE TMM paper "Deep Universal Generative Adversarial Compression Artifact Removal" (2019). This will cover how to deal with compression artifacts using GANs without knowing coding parameters in advance, along with some caveats on the evaluation of results. Furthermore, I will discuss some recent unpublished evaluation protocols we devised that exploit semantic tasks instead of signal-based metrics. Finally, from a systems point of view, I will discuss how the generator architecture can be modified to attain real-time performance on the user end.
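
A minimal PyTorch-style sketch of the underlying idea, assuming a small fully convolutional residual generator trained with a plain reconstruction loss; this is not the architecture from the TMM paper, which additionally uses an adversarial discriminator and does not assume known coding parameters.

```python
# Illustrative sketch only (not the paper's architecture): a tiny fully
# convolutional generator trained to remove compression artifacts, here with
# a plain L1 reconstruction loss instead of the full adversarial objective.
import torch
import torch.nn as nn

class ArtifactRemovalGenerator(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)  # residual correction of the compressed frame

generator = ArtifactRemovalGenerator()
optimizer = torch.optim.Adam(generator.parameters(), lr=1e-4)

compressed = torch.rand(4, 3, 64, 64)   # stand-in batch of compressed patches
original = torch.rand(4, 3, 64, 64)     # corresponding pristine patches

restored = generator(compressed)
loss = nn.functional.l1_loss(restored, original)
loss.backward()
optimizer.step()
print("reconstruction loss:", loss.item())
```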

Streaming on Steroids: Are We There Yet?

Ali C. Begen, Ozyegin University / Networked Media

We all strive to deliver the best viewer experience in streaming, but it is not as easy as it sounds. Large deployments of adaptive streaming over HTTP have shown that streaming still faces problems in many ways for both the content/service providers and consumers. In this talk, I will present some of the challenges regarding content generation, delivery and consumption, based on the distillation of my own work as well as collaboration with some big providers. I will also touch on the latest developments on the standards front, followed by a discussion of which open issues we should tackle in this space.
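
As background for readers less familiar with HTTP adaptive streaming, the sketch below shows a very simple throughput-based bitrate selection rule; deployed players of the kind discussed in the talk use far more sophisticated adaptation logic, and all numbers here are illustrative.

```python
# Simple throughput-based ABR rule for HTTP adaptive streaming (illustrative
# only; real players use considerably more sophisticated adaptation logic).
def choose_bitrate(throughput_history_kbps, ladder_kbps, safety=0.8):
    """Pick the highest rung of the bitrate ladder below a safety margin
    of the recent average throughput."""
    if not throughput_history_kbps:
        return min(ladder_kbps)
    estimate = sum(throughput_history_kbps) / len(throughput_history_kbps)
    budget = safety * estimate
    affordable = [r for r in sorted(ladder_kbps) if r <= budget]
    return affordable[-1] if affordable else min(ladder_kbps)

ladder = [400, 800, 1600, 3200, 6000]               # kbps, illustrative encoding ladder
print(choose_bitrate([5200, 4800, 5100], ladder))   # -> 3200
```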

Multimedia Analytics with 5G Edge Nodes

Bhojan Anand, National University of Singapore

With the growing popularity of X-Reality (VR/AR/MR), the generation of global media data is expected to accelerate at a much higher rate. Much of this data needs real-time processing, ranging from real-time facial feature/emotion recognition to environmental context recognition/mapping, beyond the capabilities of an MR glass alone. The demand for high communication bandwidth and ultra-low latency (<15 ms motion-to-photon latency) will drive the community to exploit the capabilities of telco edge nodes and push telcos to build more powerful edge nodes. The real-time analytics of media data under strict real-time constraints will be the ‘killer application’ of 5G networks and edge computing. In addition, we believe multimedia analytics and 5G/edge computing will be key to driving mass adoption of XR devices. In this talk, we will explore suitable methods and architectures for running media analytics on 5G edge nodes.
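
A rough sketch of the offloading decision such an architecture implies: given a motion-to-photon budget, the client keeps inference on the device unless the edge node can return results within the budget and faster than the device can. The latency figures and the simple additive latency model are assumptions for illustration only.

```python
# Illustrative offloading decision for XR analytics with a 5G edge node.
# The latency numbers and the additive model are assumptions, not
# measurements from the talk.
def run_at_edge(network_rtt_ms, edge_inference_ms, device_inference_ms,
                motion_to_photon_budget_ms=15.0):
    """Offload to the edge only if its end-to-end latency fits the budget
    and beats on-device inference."""
    edge_total = network_rtt_ms + edge_inference_ms
    return edge_total <= motion_to_photon_budget_ms and edge_total < device_inference_ms

print(run_at_edge(network_rtt_ms=4.0, edge_inference_ms=6.0, device_inference_ms=30.0))   # True
print(run_at_edge(network_rtt_ms=12.0, edge_inference_ms=8.0, device_inference_ms=30.0))  # False
```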

Towards Evolving Multimedia Summarization

Joao Magalhaes, Universidade NOVA de Lisboa

Understanding how the semantics of multimodal information changes over time is crucial to many information processing tasks. An information timeline, such as a medical case description or a social media stream, often contains textual and visual data depicting a sequence of highly correlated events. Humans naturally understand these timelines by relying on their working and episodic memory to freely associate visual and verbal concepts across time. The long-term vision presented in this talk aims to bring search engines closer to the way humans process information: it aims at developing methods to capture and search the semantics of information that spans text, images and time. For this breakthrough to happen, recent advances have laid the foundations in evolving information and time-aware retrieval models that significantly depart from the current state of the art. Researchers have investigated pioneering retrieval models that capture multimodal temporal patterns in a semantic space. New methods relate multimodal events from multiple documents and enable the expression of user information needs with complex time constraints. The scientific advances brought by the area are of fundamental importance for a wide range of high-impact areas that deal with complex information and strong temporal requirements, such as the summarization of live events in large-scale social media.
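
One way to picture a time-aware multimodal retrieval score of the kind mentioned above: embed text and images into a shared semantic space and discount matches that lie far from the time the query targets. The embeddings and the exponential time decay below are purely illustrative assumptions, not the models developed in this line of work.

```python
# Illustrative time-aware multimodal retrieval score (assumptions only):
# documents carry an embedding in a shared text/image semantic space plus a
# timestamp; relevance combines cosine similarity with an exponential time decay.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def time_aware_score(query_vec, query_time, doc_vec, doc_time, half_life_hours=6.0):
    decay = 0.5 ** (abs(query_time - doc_time) / half_life_hours)
    return cosine(query_vec, doc_vec) * decay

docs = [
    {"id": "tweet-1", "vec": [0.9, 0.1, 0.0], "time": 1.0},   # hours since event start
    {"id": "photo-7", "vec": [0.8, 0.2, 0.1], "time": 20.0},
]
query = {"vec": [1.0, 0.0, 0.0], "time": 2.0}
ranked = sorted(docs, key=lambda d: time_aware_score(query["vec"], query["time"],
                                                     d["vec"], d["time"]), reverse=True)
print([d["id"] for d in ranked])
```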

Inferring User’s Mental States during Unconstrained Mobile Learning Interactions

Oya Celiktutan, King's College London

Mobile learning is becoming a commonplace educational tool due to its low cost and high portability. It enables continuous access to the learning process: users can learn wherever and whenever they prefer. However, current mobile learning systems do not take the user's state sufficiently into consideration and thus suffer from low engagement and a lack of personalisation. In this talk, I will present our recent work on how to automatically recognise users' mental states, such as engagement and boredom, during an educational game using the front-facing camera of mobile devices. Considering the widespread use of mobile devices and the potential of technology-enhanced learning and e-health applications, our work has implications for adapting system behaviours to users' individual profiles and needs beyond just clicks.
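
A skeletal version of such a pipeline, assuming per-frame facial features from the front-facing camera are already extracted and a classifier has been trained on annotated sessions; the feature set, windowing and classifier choice are illustrative assumptions, not the method from the talk.

```python
# Skeletal engagement-recognition pipeline (illustrative assumptions only):
# aggregate per-frame facial features over a short window and apply a
# trained classifier to label the learner's mental state.
import numpy as np
from sklearn.linear_model import LogisticRegression

LABELS = ["engaged", "bored"]

def window_features(frame_features):
    """Summarise a window of per-frame features (e.g. gaze, head pose, action
    units) with their mean and standard deviation."""
    f = np.asarray(frame_features)
    return np.concatenate([f.mean(axis=0), f.std(axis=0)])

# Toy training data standing in for annotated mobile-learning sessions.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
y = rng.integers(0, 2, size=40)
clf = LogisticRegression(max_iter=200).fit(X, y)

# One 30-frame window of 3 hypothetical facial features from the front camera.
window = rng.normal(size=(30, 3))
state = LABELS[clf.predict(window_features(window).reshape(1, -1))[0]]
print("predicted state:", state)
```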

Cervical Cancer Screening via Mobile Deep Learning

Roger Zimmermann, National University of Singapore

Many aspects of healthcare are undergoing rapid evolution and facing many challenges. Computer vision and image processing methods have progressed tremendously within the last few years. One of the reasons is the excellent performance that machine learning algorithms are achieving in many fields of image processing, especially through deep learning techniques. There exist various application areas where computer-based image classification and object detection methods are making meaningful contributions. Yet these data-intensive methods encounter a unique set of challenges in the medical domain, which often suffers from a scarcity of large public datasets while still requiring reliable, high-precision analysis. This talk will present some recent work in the area of image analytics for cervical cancer screening in the context of low-resource settings (i.e., mobile devices). The work is in collaboration with Dr. Pamela Tan from Singapore’s KK Hospital and MobileODT, a medical device and software-enabled services company. In this joint project, our group’s work focuses on machine learning algorithms for the medical analysis of cervix images acquired via unconventional consumer imaging devices such as smartphones, based on their appearance and for the purpose of screening for cervical cancer precursor lesions.
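
As a hedged illustration of deploying such a classifier in a low-resource mobile setting, the sketch below fine-tunes a lightweight MobileNetV2 backbone for binary cervix-image classification; the class names, data and hyperparameters are assumptions for illustration, not details of the joint project.

```python
# Illustrative sketch (assumptions only, not the project's actual model):
# fine-tune a lightweight MobileNetV2 backbone for binary classification of
# cervix images, a common pattern for low-resource / on-device deployment.
import torch
import torch.nn as nn
from torchvision import models

model = models.mobilenet_v2(weights=None)                # small backbone suited to mobile devices
model.classifier[1] = nn.Linear(model.last_channel, 2)   # "normal" vs "suspicious" (hypothetical classes)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# Stand-in batch; a real pipeline would load smartphone-acquired cervix images.
images = torch.rand(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))

logits = model(images)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
print("training loss:", loss.item())
```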