Radu Tudor Ionescu
University of Bucharest (Romania) and co-founder at SecurifAI, Romania
Text-to-Image Generation via Diffusion: What Can Synthetic Images Tell Us?
Abstract. Diffusion models have achieved impressive performance in text-to-image generation, producing realistic and prompt-aligned images. In this talk, we analyze synthetic images generated by text-to-image diffusion models from two perspectives. First, we aim to determine how well the input prompt used to generate an image can be recovered. To this end, we present a framework that learns to recover the prompt from generated images. In addition, we report an interesting discovery: training a diffusion model on the prompt recovery task makes the model generate images that are much better aligned with the input prompts, when the recovery model is directly reused for text-to-image generation. Second, we present a comprehensive dataset of human annotations reflecting how relevant a synthetic image is to the input prompt. We employ this dataset to train prompt performance predictors and evaluate them in multiple settings (cross-model, cross-dataset, etc.). We find that synthetic images contain strong clues revealing both the content and the difficulty of the input prompt.
Bio. Radu Tudor Ionescu is a professor at the University of Bucharest (Romania) and co-founder of SecurifAI. He completed his PhD at the University of Bucharest in 2013 and received the 2014 Award for Outstanding Doctoral Research in Computer Science from the Romanian Ad Astra Association. His research interests include machine learning, computer vision, image processing, medical imaging, computational linguistics, and text mining. He has published over 140 articles in international peer-reviewed conferences and journals, as well as a research monograph with Springer. He received the "Caianiello Best Young Paper Award" at ICIAP 2013 for the paper entitled "Kernels for Visual Words Histograms". Radu also received the "Young Researchers in Science and Engineering" Prize for young Romanian researchers, and the "Danubius Young Scientist Award 2018 for Romania", awarded by the Austrian Federal Ministry of Education, Science and Research and the Institute for the Danube Region and Central Europe. Together with his co-authors, he has obtained strong rankings at several international competitions: 4th place in the Facial Expression Recognition Challenge of WREPL 2013, 3rd place in the NLI Shared Task of BEA-8 2013, 2nd place in the ADI Shared Task of VarDial 2016, 1st place in the ADI Shared Task of VarDial 2017, 1st place in the NLI Shared Task of BEA-12 2017, 1st place in the ADI Shared Task of VarDial 2018, and 1st place in the ACM Multimedia 2023 Computational Paralinguistics Challenge (ComParE) on request and complaint detection.
Maty Bohacek
Stanford University, USA
Detecting AI-Generated and Fabricated Content in the Era of Model Collapse
Abstract. This talk delves into the rapid evolution of generative AI through the lens of deepfake creation and detection. Drawing on a case study of creating a photorealistic deepfake of CNN's Anderson Cooper in 2023, the speaker tracks how advancements in open-source tools reduced the time needed to generate similar deepfakes from four weeks to just minutes. The discussion emphasizes the importance of shifting detection strategies from reactive approaches to proactive methods rooted in the generative process itself. The speaker will present two detection techniques developed in response to deepfake advancements: one that learns a specific person's mannerisms and another that exploits vulnerabilities in lip-sync architectures. The talk will also explore how generative and detection research can mutually benefit each other, focusing on the alignment of incentives, the challenges of retraining models on their own outputs, and the critical need to differentiate real from AI-generated content for robust detection.
Bio. Maty Bohacek is a student researcher at Stanford University advised by Hany Farid, focusing on AI, computer vision, and digital forensics. In 2022, he developed a deepfake detection method based on facial, gestural, and vocal mannerisms, demonstrating its efficacy on Ukrainian President Volodymyr Zelenskyy. Since then, he has created multiple methods to detect lip-sync and text-to-video deepfakes (e.g., Sora, Veo) and co-developed DeepSpeak, an open-source audio and video deepfake dataset. While at Google, he created DeepAction, an open-source dataset for text-to-video detection. Maty's recent work revealed that generative models collapse when trained on their own generations. He also introduced a black-box membership inference method for text-to-image models that allows one to determine whether an image was used to train such a model.