AVGenL: Audio-Visual Generation and Learning

ECCV 2024 Workshop

Sep. 29, 2024, Milano, Italy



In recent years, we have witnessed significant advances in visual generation, which have reshaped the research presented at computer vision conferences such as ECCV, ICCV, and CVPR. However, in a world where information is conveyed through a rich tapestry of sensory experiences, the fusion of audio and visual modalities has become increasingly essential for understanding and replicating the intricacies of human perception and for supporting diverse real-world applications. Indeed, the integration of audio and visual information has emerged as a critical area of research in computer vision and machine learning, with applications ranging from immersive gaming environments to lifelike simulations for medical training, as well as multimedia analysis, virtual reality, advertising, and cinema.

Despite these strong motivations, research on understanding and generating audio-visual content has received far less attention than traditional vision-only approaches and applications. Given the recent prominence of multi-modal foundation models, embracing the fusion of audio and visual data is expected to further advance current research efforts and practical applications within the computer vision community. This makes the workshop a timely addition to ECCV that can catalyze progress in this burgeoning field.

In this workshop, we aim to shine a spotlight on this exciting yet under-investigated field by prioritizing new approaches in audio-visual generation, while also covering a wide range of topics in audio-visual learning, where the convergence of auditory and visual signals opens up many opportunities for advancing creativity, understanding, and machine perception. We hope our workshop will bring together researchers, practitioners, and enthusiasts from diverse disciplines in both academia and industry to discuss the latest developments, challenges, and breakthroughs in audio-visual generation and learning.

Call for Papers

The workshop will mainly cover the topics presented below.

We invite three types of submissions: workshop papers, extended abstracts (up to 4 pages, excluding references), and papers already accepted at ECCV 2024.


Paper Submission Deadline: July 28, 2024 (TBD)

Paper Notification to Authors: TBD

Paper Camera Ready Deadline: TBD

Invited Speakers

Imperial College London 

Korea University 

University of Rochester

Beihang University 

University of Michigan

UMass Amherst

Program Details

(timezone: Central European Summer Time, UTC+2)

14:00 - 14:10 Opening remarks and welcome

14:10 - 14:40 Invited talk 1: TBD.

14:40 - 15:10 Invited talk 2: TBD.

15:10 - 15:40 Invited talk 3: TBD.

15:40 - 16:20 Poster Session and Coffee Break.

16:20 - 16:50 Invited talk 4: TBD.

16:50 - 17:20 Invited talk 5: TBD.

17:20 - 17:50 Invited talk 6: TBD.

17:50 - 18:00 Closing remarks


University of Tokyo/RIISE

Ludwig Maximilian University of Munich

Imperial College London

Nankai University