Abstract: In recent years, a number of AI-enhanced diet recording and management applications have been released worldwide and are now in widespread use by the general public. Our team has developed "FoodLog Athl", a dietary management app that lets users submit images of their meals and receive estimated nutritional information: the app identifies the dishes and infers their ingredients. A distinctive feature of FoodLog Athl is its focus on supporting Japanese users and facilitating their interactions with human dietitians. In this talk, I will introduce the application's design within the context of Japanese food culture and the data we have collected through real-world use. Unlike food photos commonly found online, which are often intended for sharing (recipe photos or restaurant menus, for example), images collected through diet management apps like FoodLog Athl are personal records not intended for others to see. As such, they offer a very different data profile, one that is more representative of everyday meals.
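To make the abstract's image-to-nutrients flow concrete, here is a minimal Python sketch of the pipeline it describes: recognize the dish, decompose it into ingredients, and sum per-ingredient nutrient values. Every name and number below (recognize_dish, INGREDIENT_DB, the toy recipe and nutrient figures) is a hypothetical illustration, not FoodLog Athl's actual API or data.

from dataclasses import dataclass

# Hypothetical per-100 g nutrient table keyed by ingredient.
INGREDIENT_DB = {
    "rice":   {"kcal": 168, "protein_g": 2.5},
    "salmon": {"kcal": 139, "protein_g": 22.5},
    "miso":   {"kcal": 182, "protein_g": 12.5},
}

# Hypothetical dish -> (ingredient, grams) decomposition.
DISH_RECIPES = {
    "grilled salmon set": [("rice", 150), ("salmon", 80), ("miso", 20)],
}

@dataclass
class NutrientEstimate:
    kcal: float
    protein_g: float

def recognize_dish(image_path):
    """Stand-in for an image classifier; a real app would run a trained model here."""
    return "grilled salmon set"  # placeholder prediction

def estimate_nutrients(image_path):
    dish = recognize_dish(image_path)
    kcal = protein = 0.0
    for ingredient, grams in DISH_RECIPES[dish]:
        per_100g = INGREDIENT_DB[ingredient]
        kcal += per_100g["kcal"] * grams / 100
        protein += per_100g["protein_g"] * grams / 100
    return NutrientEstimate(kcal=kcal, protein_g=protein)

print(estimate_nutrients("meal.jpg"))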
Dr. Yoko Yamakata received her Ph.D. in Informatics from Kyoto University in 2007. From 2010 to 2016, she served as a Lecturer and later an Associate Professor at Kyoto University. In 2015, she became a JSPS Research Fellow and spent time as a Visiting Researcher at the University of Sussex in the UK, accompanied by her two children. In 2019, she joined the University of Tokyo as an Associate Professor in the Graduate School of Information Science and Technology, and in 2024 became a Professor at the Information Technology Center of the University of Tokyo. Her research interests focus on multimedia information processing, particularly deep learning techniques for text and image analysis, with a strong interest in AI technologies that support food-related applications. She has served as Organizing Chair for CEA 2009-2014, Program Chair for CEA 2015-2019, and Program Committee Member for MADiMa 2016-2019.
Title: Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Abstract: When asked “how to make apple juice”, can AI generate step-by-step instructions along with a video-frame illustration for each step? Exciting models have been developed for multimodal video understanding and generation, such as video LLMs and video diffusion models. One emerging pathway to the ultimate intelligence is to create one single foundation model that can do both understanding and generation. After all, humans use only one brain for both tasks. Towards such unification, recent attempts employ a base language model for multimodal understanding but require an additional pre-trained diffusion model for visual generation, which still leaves two separate components. In this work, we present Show-o, one single transformer that handles both multimodal understanding and generation. Unlike fully autoregressive models, Show-o is the first to unify autoregressive and discrete diffusion modeling, flexibly supporting a wide range of vision-language tasks, including visual question answering, text-to-image generation, text-guided inpainting/extrapolation, and mixed-modality generation of any input/output format, all within one single 1.3B-parameter transformer. Across various benchmarks, Show-o demonstrates comparable or superior performance, shedding light on building the next-generation foundation model.
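As a rough, non-authoritative illustration of what unifying autoregressive and discrete diffusion modeling can mean at the loss level, the Python sketch below pairs a causal next-token objective for text tokens with a MaskGIT-style masked-token denoising objective for discretized image tokens, both computed from the logits of one shared transformer. Tensor shapes, names, and the 0.5 masking rate are assumptions for illustration, not the actual Show-o training code.

import torch
import torch.nn.functional as F

def autoregressive_loss(logits, targets):
    """Next-token prediction for text tokens (causal attention assumed)."""
    # logits: (batch, seq, vocab); targets: (batch, seq) token ids
    return F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),  # predictions for positions 1..T-1
        targets[:, 1:].reshape(-1),                   # shifted targets
    )

def discrete_diffusion_loss(logits, targets, mask):
    """MaskGIT-style denoising: predict the original image tokens only at
    positions that were corrupted with a [MASK] token."""
    # mask: (batch, seq) boolean, True where the input token was masked
    return F.cross_entropy(logits[mask], targets[mask])

# Toy usage with random tensors standing in for transformer outputs.
B, T, V = 2, 16, 1024
logits = torch.randn(B, T, V)
text_targets = torch.randint(0, V, (B, T))
image_targets = torch.randint(0, V, (B, T))
mask = torch.rand(B, T) < 0.5  # illustrative masking rate
total = autoregressive_loss(logits, text_targets) + discrete_diffusion_loss(logits, image_targets, mask)
print(total.item())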
Dr. Mike Zheng Shou is a tenure-track Assistant Professor at the National University of Singapore, working on multimodal AI and video understanding. His recent Show-o work was among the first single foundation models able to perform both understanding and generation, unifying autoregressive and diffusion modelling. He was previously a Research Scientist at Facebook AI in the Bay Area and obtained his Ph.D. at Columbia University. He was a Best Paper Finalist at CVPR 2022 and received a Best Student Paper Nomination at CVPR 2017, the PREMIA Best Paper Award 2023, and the EgoVis Distinguished Paper Award 2022/23. His team won first place in international challenges including ActivityNet 2017, EPIC-Kitchens 2022, and Ego4D 2022 & 2023. He is a Singapore Technologies Engineering Distinguished Professor, a Fellow of the National Research Foundation Singapore, and on the Forbes 30 Under 30 Asia list. Dr. Shou focuses on developing new deep learning methods that allow machines to understand actions and complex events in videos, which can power a range of perception applications.
Abstract: We present an AI-powered system that integrates data from smartphones and wearable sensors to deliver real-time, personalized support for insulin-treated diabetes care. The system architecture combines food image recognition, reinforcement-learning-based algorithms, and multi-source data, including lifestyle, clinical, and treatment inputs, within a modular AI framework. Specialized AI modules perform short-term risk prediction, adaptive insulin dosing, and context-aware nutritional guidance. This solution is the result of over a decade of interdisciplinary research and is currently being validated in a multi-country European clinical trial that aims to bridge state-of-the-art AI methods with clinical expertise and the lived experience of people with diabetes.
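To give a flavor of the modular framework described above, the sketch below shows how independent modules (a short-term risk flag and a bolus suggestion) could operate on a shared patient record. The module names, thresholds, and the textbook carb-ratio-plus-correction formula are illustrative assumptions for exposition, not the trial system's validated algorithms.

from dataclasses import dataclass

@dataclass
class PatientRecord:
    glucose_mgdl: list                 # recent CGM readings (mg/dL)
    meal_carbs_g: float = 0.0          # carbohydrate estimate from food image recognition
    insulin_sensitivity: float = 50.0  # mg/dL drop per insulin unit (illustrative)

def risk_module(rec):
    """Toy short-term risk flag based on the latest glucose reading."""
    latest = rec.glucose_mgdl[-1]
    if latest < 80:
        return "hypoglycemia risk"
    if latest > 250:
        return "hyperglycemia risk"
    return "in range"

def dosing_module(rec, carb_ratio=10.0, target_mgdl=120.0):
    """Toy bolus suggestion: carbohydrate coverage plus a correction term.
    A real system would adapt these parameters, e.g., via reinforcement learning."""
    correction = max(rec.glucose_mgdl[-1] - target_mgdl, 0) / rec.insulin_sensitivity
    return rec.meal_carbs_g / carb_ratio + correction

rec = PatientRecord(glucose_mgdl=[140, 155, 180], meal_carbs_g=60)
print(risk_module(rec), "-", round(dosing_module(rec), 1), "units")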
Dr. Stavroula-Georgia G. Mougiakakou is an Associate Professor with a Ph.D. in Electrical and Computer Engineering from the National Technical University of Athens, Greece. Since 2008, she has been affiliated with the Faculty of Medicine at the University of Bern, where she leads the AI in Health and Nutrition Laboratory at the ARTORG Center for Biomedical Engineering Research, Bern, Switzerland. Her research focuses on the design, development, and validation of artificial intelligence and machine learning approaches for the analysis of multimodal data, and their applications in the prevention, personalized diagnosis, prognosis, and treatment of acute and chronic diseases, including obesity, diabetes, and lung diseases. In addition, her work covers the entire AI-based pipeline for translating food data into nutrient content, including food detection and segmentation, recognition, and volume estimation. This approach leverages various smartphone sensors and utilizes extensive food-related databases, from nutrient composition to recipes, alongside recent advancements in large language models. The research is applied in contexts such as tackling malnutrition in hospitals, dietary monitoring and disease self-management, and providing recommendations and evaluating diet compliance, such as adherence to the Mediterranean diet. Stavroula is also a member of the Executive Team of the Center for Artificial Intelligence in Medicine (CAIM) at the University of Bern and serves as Co-Director of the MSc program in AI in Medicine. So far, she has supervised several MS students, 19 PhD students (12 completed, 7 ongoing), and 8 postdocs (2 ongoing). Over the years, she and her research group have successfully participated in numerous competitive national, European, and international research projects, resulting in many publications, software tools, several patents, and technology transfer activities.