About
Fig. 1: This diagram summarizes the scope of communication and collaboration around data.
Fig. 2: The area enclosed by the black contour indicates the thematic overlap between prior events.
Scope
Over the last two decades, we have seen an increasing demand for remote synchronous communication and collaboration solutions across professional and educational settings. However, it was the COVID-19 pandemic and the urgent shift to remote and hybrid work and education that fundamentally altered how people communicate, collaborate, and teach at a distance. Across government and enterprise organizations, people need to achieve consensus and make decisions grounded in data, and these activities now often take place via teleconference applications and collaborative productivity tools. Meanwhile, educators at all levels have had to adapt to teaching remotely, and this adaptation has presented both challenges and opportunities to innovate with respect to teaching STEM (science, technology, engineering, mathematics) subjects and those that depend on numeracy, abstract representations of data, and spatial reasoning. Popular teleconference tools such as Zoom, Cisco Webex Meetings, Slack Huddles, Google Meet, and others typically afford multimodal communication including multi-party video and audio conferencing, screen sharing, breakout rooms, polls, reactions, and side-channel text chat functionality. Often these tools are used in conjunction with collaborative productivity tools and with collaboration platforms organized by channels and threads, such as Slack or Microsoft Teams, forming synchronous episodes within a larger timeline of asynchronous communication.
Despite the multimodal nature of these meetings and presentations, the presenter and audience experiences are often a poor substitute for co-located communication, particularly when a speaker presents complex or dynamic multimedia content such as data visualization via screen-sharing (see Brehmer & Kosara, VIS 2021). When screen-sharing, the presenter is often relegated to a secondary thumbnail video frame, and only they can interact with the shared content using mouse and keyboard controls. In contrast, consider co-located communication scenarios such as those in meeting rooms or lecture halls, where all participants can use their physical presence and body language to interact with and point to the multimedia content being discussed. In particular, embodied cognition research suggests that nonverbal hand gestures are essential for comprehending complex or abstract content, such as in mathematics education, in engineering and design, and in business decision-making.
Some teleconferencing tools have recently introduced ways to restore the missing embodied presence of a presenter as they share multimedia content such as slides, data visualization, diagrams, and interactive interfaces. For instance, Cisco Webex Meetings and Microsoft Teams offer functionality to segment the presenter’s outline from their webcam video and composite it in front of screen-shared content. Virtual camera applications have also become popular, including mmhmm and OBS Studio; these tools allow for considerable flexibility with respect to video compositing, and are compatible with most teleconferencing applications. However, even with these tools, only a single presenter can interact with shared content, and they must do so using standard mouse and keyboard interfaces.
Meanwhile, advancements in extended reality (XR) and immersive analytics suggest multimodal approaches that bypass standard desktop environments. For instance, Flow Immersive is an application that allows people to present complex 3D data visualization content in mobile augmented reality (AR) or within an immersive virtual reality (VR) environment. More broadly, VR meeting spaces are an emerging trend in enterprise settings, in which all participants join a meeting as an avatar in an immersive 3D conference room. While XR shows promise for remote multimodal communication around data, it also has several limitations. The first issue is the lack of general access to affordable and comfortable hardware devices, including depth sensors and head-mounted displays; moreover, many XR applications require multi-device coordination with hand-held pointing devices or simultaneous touch-screen interaction. A second issue is the relatively greater fatigue induced by XR applications that incorporate head-mounted displays. A third issue is the difficulty of maintaining side-channel chat conversations in an XR environment. Lastly, sharing and interacting with complex and dynamic multimedia content in XR remains tedious and error-prone.
Recently, an exciting alternative approach to presenting rich multimedia content to remote audiences has emerged: the combination of publicly available computer vision and speech recognition models with commodity webcams and microphones has the potential to bring the immersive experience of XR to remote communication without abandoning a familiar desktop environment. This combination allows for real-time video compositing and background segmentation, pose and gesture recognition, and voice commands, thereby giving people multiple modalities with which to interact with representations of data. We have already seen applications in this space for presenting business intelligence content in enterprise scenarios (see Hall et al., UIST 2022), for presenting STEM topics in online education and personalized product marketing (see Liao et al., UIST 2022), and for interacting with large cultural collections (see Rodighiero et al., Information Design Journal 2023), such as those in gallery and museum archives.
Goals and Research Questions
The MERCADO meetings will focus on emerging technologies and research that can be applied to multimodal interaction for remote communication and collaboration around data. They will also serve as a forum to identify new application scenarios, expand upon existing ones, and test new interaction techniques applicable across these scenarios. We anticipate that the results of this meeting series may include contributions to several academic communities, including those affiliated with the IEEE VGTC (VIS, PacificVis, ISMAR, EuroVis) and ACM SIGCHI (CHI, CSCW, UIST, ISS).
We will initiate discussion around the following topics and research questions:
MULTIMODAL REMOTE COLLABORATION AROUND DATA. Considering recent work exploring the design space of synchronous remote collaboration around data via a shared WIMP interface, how can these techniques be extended to incorporate additional modalities?
ADAPTING CO-LOCATED COLLABORATION APPROACHES TO REMOTE / HYBRID SETTINGS. Considering techniques for co-located collaborative work around data, whether using conventional desktop workstations or immersive augmented reality head-mounted displays, how can these techniques be expanded to support hybrid or remote collaboration and communication?
COMMUNICATION AROUND DATA TO LARGE BROADCAST AUDIENCES. Considering presentation techniques employed by television news broadcasters for presenting technical or data-rich stories (e.g., weather, finance, sports), how can these techniques be applied (while keeping production costs low) and expanded to support multi-party multimodal interaction around data at a distance?
LIVESTREAMING ABOUT DATA. Considering presentation and video compositing techniques employed by livestreamers (e.g., Twitch, YouTube, Facebook Live) and recorded video content creators (e.g., YouTube, TikTok), how can these techniques be applied during synchronous communication and collaboration around data, as well as in conjunction with multimodal interaction (e.g., pose / gesture input, voice prompts, proxemic interaction)?
ADAPTING COLLABORATIVE XR EXPERIENCES. Considering the techniques by which individuals display and interact with representations of data in XR (AR / VR), how can we extend or adapt these techniques? In other words, how can techniques initially designed for expensive or exclusive hardware be adapted to low-cost, accessible commodity input and output devices? Similarly, how could augmented video techniques designed for use with depth sensors and pointing devices be adapted?
SUPPORTING DIFFERENT COMMUNICATION ROLES. Currently, most teleconference applications assume a single speaker / presenter, with other participants in audience roles. How can we support multi-party augmented video interaction? Alternatively, how can we support both ‘sage on the stage’ and ‘guide on the side’ style communication, differentiating an orator from a discussion facilitator? Similarly, how can we support formal, linear, and scripted presentations as well as informal, unscripted, interruption-prone, and collaborative discussions?
DEFINING A DESIGN SPACE. Overall, what are the dimensions of the design space for multimodal and synchronous communication and collaboration around data? Where does existing work fit within this design space and which parts remain underexplored?
Ultimately, our aim is to identify a timely and urgent research agenda that the community can pursue. We aim to report the emerging research directions that we identify in a top-quality research venue such as IEEE TVCG or ACM CHI.
Related Workshops / Seminars
Immersive Analytics: A new multidisciplinary initiative to explore future interaction technologies for data analytics (Shonan, 2016)
Immersive Analytics (Dagstuhl, 2016)
Data-Driven Storytelling (Dagstuhl, 2016)
Visualization for Communication (VisComm, IEEE VIS 2018 – 2023)
Immersive Analytics (IEEE VIS 2017, ACM CHI 2019, 2020, 2022)
Death of the Desktop – Envisioning Visualization without Desktop Computing (IEEE VIS 2014)
Emerging Telepresence Technologies in Hybrid Learning Environments (ACM CHI 2022)
Social VR (ACM CHI 2020, 2021)