HCVM 2025: The 2nd International Workshop on Human-Centered Vision and Media Technologies

*This image was created with the assistance of ChatGPT 4o.

Date

Friday, May 23, 2025, 13:50 - 18:10

Venue

Convention Hall, 2nd Floor, An Block, Institute of Industrial Science
The University of Tokyo (Komaba II Campus)

Registration

Participation in the workshop is free, but please register for the workshop using the form below.
If the number of participants exceeds the room capacity, we may close the registration.

https://forms.gle/HLmqgkAGff8T67AcA

Program

13:30 Venue Opens

13:50 - 14:00 Opening Session

14:00 - 14:30 Invited Talk: Wenzel Jakob

14:30 - 15:00 Invited Talk: Mohamed Khamis

15:00 - 16:00 Poster Session 1 & Coffee Break

16:00 - 16:30 Invited Talk: René Schuster

16:30 - 17:00 Invited Talk: Leslie Wöhler

17:00 - 18:00 Poster Session 2 & Coffee Break

18:00 - 18:10 Closing Session

Invited Speakers

Wenzel Jakob (École Polytechnique Fédérale de Lausanne [EPFL])

Title: Differentiable Simulation of Light

Abstract: Their realism of rendering techniques has steadily grown within the last decade, to the extent that renderings are now often indistinguishable from reality. *Inverse* rendering flips this process around: the images (e.g. photos) are now the input, and we seek seek a virtual world that explains them. This is a more difficult problem with applications in diverse scientific fields that require turning pictures into 3D models or other physical parameters. My group works on methods that solve this task by propagating derivatives through a simulation. Although intuitive, this idea leads to numerous theoretical and practical difficulties. I will give an overview of the key challenges and recent progress towards building robust and efficient differentiable rendering methods.

Bio: Wenzel's research revolves around inverse graphics, material appearance modeling and physically based rendering algorithms. He is interested in solving real-world problems using invertible simulations and developing algorithms and systems to do so at scale. Wenzel has received the ACM SIGGRAPH Significant Researcher award, the Eurographics Young Researcher Award, and an ERC Starting Grant. His group develops the Mitsuba renderer, a research-oriented rendering system, and he has created widely used open source frameworks, including pybind11, nanobind, Instant Meshes (SGP Software Award recipient), and Dr.Jit.

Mohamed Khamis (University of Glasgow)

Title: Security and Privacy in the age of Ubiquitous Computing

Abstract: Today, a thermal camera can be bought for < £150 and used to track the heat traces your fingers produced when entering your password on your keyboard. We recently found that thermal imaging can reveal 100% of PINs entered on smartphones up to 30 seconds after they have been entered. Other ubiquitous sensors are continuously becoming more powerful and affordable. They can now be maliciously exploited even by average non-tech-savvy users. The ubiquity of smartphones can itself be a threat to privacy; with personal data being accessible essentially everywhere, sensitive information can easily become subject to prying eyes. There is a significant increase in the number of novel platforms in which users need to perform secure transactions (e.g., payments in VR stores), yet we still use technologies from the 1960s to secure access to them. Mohamed will talk about the implications of these developments and his work in this area with a focus on the challenges, opportunities, and directions for future work.

Bio: Dr. Mohamed Khamis is an Associate Professor at the University of Glasgow’s School of Computing Science, where he leads the SIRIUS Lab. Mohamed and his team conduct research in Human-Computer Interaction, with a focus on Human-centered Security. His work focuses on assessing threats to privacy, security and safety that are caused or exacerbated by ubiquitous technologies, and then developing user-centered systems that mitigate said threats. He regularly publishes in top HCI and Security venues, such as CHI, TOCHI, IMWUT, CSCW, USENIX Security, IEEE S&P and Privacy-Enhancing Technologies. As a PI, his research has received PI on more than £1 million in funding from various bodies including UKRI, the UK National Cybersecurity Centre, the UK Foreign, Commonwealth and Development Office, Meta Reality Labs, the UK National Research Centre on Privacy, Harm Reduction and Adversarial Influence Online, the UK National Research Centre of Excellence for IoT Systems, and the Royal Society of Edinburgh. He regularly serves on editorial boards, funding panels, and program and organizing committees for top conferences.

René Schuster (German Research Center for Artificial Intelligence [DFKI])

Title: Visual Continual Learning - Beyond Current Incremental Setups

Abstract: Continual learning is a cornerstone for robust and adaptive visual perception in real-world systems. While existing research often focuses on incremental setups - where new classes or domains are introduced sequentially - these settings fall short of covering a range of crucial aspects for life-long learning. In this talk, we first provide a general introduction and motivation for continual learning and then explore visual continual learning from a broader and more practical perspective. In particular, a specialized variant of domain-incremental learning and a generalized version of class-incremental learning will be presented. The latter allows a model to not just add new information, but also to refine existing knowledge. The talk will be concluded by our thoughts on how continual learning will pave the way for AGI (artificial general intelligence).

Bio: Dr. René Schuster is a Senior Researcher at the German Research Center for Artificial Intelligence (DFKI). He leads the Automotive Scene Understanding team within the Augmented Vision department, where his work focuses on visual perception for assisted and autonomous driving. His expertise includes resource-efficient learning through neuromorphic systems and Continual Learning. Dr. Schuster holds a Ph.D. in Computer Science from the Technical University of Kaiserslautern. Prior to that, he earned his M.Sc. in Computational Engineering from Technical University of Darmstadt. Dr. Schuster is lecturer of various seminars and courses, including advanced topics in computer vision and deep learning. He also serves as an active reviewer for leading conferences in computer vision, robotics, and automotive technologies.

Leslie Wöhler (Science & Technology Research Laboratories, Japan Broadcasting Corporation [NHK STRL])

Title: Analyzing the Perception of Generative AI-based Media Editing Across Display Modalities

Abstract: Progress in the domains of generative AI and immersive viewing technology is changing the way we create, interact, and experience media content. As both immersive viewing and generative AI allow highly realistic representations of content, it might become harder for viewers to differentiate between the real and virtual world as well as real and generated media. Due to this, it is essential to not only learn about both technologies individually but also how generative content and immersive viewing interact. In this talk, I will present the results of perceptual experiments that provide insights on how viewers perceive generative AI content and discuss how the display modality affects the responses of viewers to identify promising and safe application scenarios.

Bio: Leslie Wöhler is a postdoctoral researcher at the Science & Technology Research Laboratories of the Japan Broadcasting Corporation (NHK). Her expertise lies in the perceptual assessment of computer graphics and vision technologies to enable the design of novel techniques focusing on the benefit of users. She obtained her PhD (Dr.-Ing.) from the TU Braunschweig. After graduating, she was awarded a postdoctoral fellowship by the Japan Society for the Promotion of Science (JSPS), which allowed her to conduct research at the University of Tokyo. Her research interests include human computer interaction, applied perception, and virtual reality.

Poster Presentations

Poster Session 1

Poster ID: Presenter (Affiliation), Title

P1-1: Zaiying Zhao, (The University of Tokyo), Exploring Fairness across Fine-Grained Attributes in Large Vision-Language Models
P1-2: Tomoya Sugihara, (The University of Tokyo), Personalizable Language-Guided Video Summarization
P1-3: Kaede Shiohara, (The University of Tokyo), PetFace: A Large-Scale Dataset and Benchmark for Animal Identification
P1-4: Nicolas Michel, (The University of Tokyo), Online Prototypes and Class-Wise Hypergradients for Online Continual Learning with Pre-Trained Models
P1-5: Ziling Huang, (National Institute of Informatics), LoA-Trans: Enhancing Visual Grounding by Location-Aware Transformers
P1-6: Zhaohui Zhu, (The University of Tokyo), Computer-Assisted Noise Pareidolia Tests Through Patient Emulation
P1-7: Xiangyu Chen, (The University of Tokyo), Balancing Efficiency and Accuracy: An Analysis of Sampling for Video Copy Detection
P1-8: Yidan Zhang, (The University of Tokyo), IRGen: Generative Modeling for Image Retrieval
P1-9: Zhifan Zhu, (University of Bristol), Recovering 3D Hand and Object Interactions in Egocentric Activities
P1-10: Liangyang Ouyang (The University of Tokyo), ActionVOS: Actions as Prompts for Video Object Segmentation

Poster Session 2

Poster ID: Presenter (Affiliation), Title

P2-1: Ryosuke Furuta, (The University of Tokyo), Learning Gaussian Data Augmentation in Feature Space for One-shot Object Detection in Manga
P2-2: Yifei Huang, (The University of Tokyo), Leveraging Egocentric VLMs for Wearable Smart Assistant
P2-3: Takumi Nishiyasu, (The University of Tokyo), Gaze Scanpath Transformer: Predicting Visual Search Target by Spatiotemporal Semantic Modeling of Gaze Scanpath
P2-4: Takehiko Ohkawa, (The University of Tokyo), Exo2EgoDVC: Dense Video Captioning of Egocentric Human Activities Using Web Instructional Videos
P2-5: Mingfang Zhang, (The University of Tokyo), Egocentric Action-aware Inertial Localization in Point Clouds
P2-6: Ruicong Liu, (The University of Tokyo), Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation
P2-7: Nie Lin, (The University of Tokyo), SiMHand: Mining Similar Hands for Large-Scale 3D Hand Pose Pre-training
P2-8: Jiawei Qin, (The University of Tokyo), Domain-Agnostic Gaze Estimation via Masked Autoencoder Pre-training on Facial Data
P2-9: Yilin Wen, (The University of Tokyo), Continuous Self-Supervised Adaptation for Personalized Human Pose Estimation
P2-10: Zhehao Zhu, (The University of Tokyo), Prompt-augmented Boundary Attentive Learning for Weakly-supervised Temporal Sentence Grounding

Organizers

Yoichi Sato (The University of Tokyo)
Shin'ichi Satoh (National Institute of Informatics)
Toshihiko Yamasaki (The University of Tokyo)
Yusuke Sugano (The University of Tokyo)
Ryosuke Furuta (The University of Tokyo)

Report abuse