PixFoundation CVPR2025
1st Workshop on Pixel-level Vision Foundation Models
12 June 2025
Location: Nashville
Motivation
In recent years, foundation models have gained significant traction and success, particularly in natural language processing, as exemplified by the GPT series. These models are large-scale and trained on diverse datasets, primarily through self-supervised learning or vision-language modelling. Such foundation models have been shown to adapt effectively across various downstream tasks, with strong generalization capabilities, especially in zero-shot and few-shot scenarios. However, while language foundation models are well established, their counterparts in the vision domain and their adoption across tasks are still in the early to middle stages of development. Despite this, there is growing interest and progress in vision foundation models (VFMs). Some of the latest models include those trained with self-supervision, such as the DINO series, and those trained on image-text data, such as CLIP, Flamingo, and LLaVA. Various pixel-level vision foundation models have also emerged recently, such as OMG-LLaVA and the SAM series.
Our workshop aims to bring together researchers dedicated to developing and adapting vision foundation models for pixel-level understanding tasks, including image segmentation, video segmentation, tracking, actor-action segmentation, depth estimation, and motion estimation. We will explore major directions in pixel-level understanding with vision foundation models and discuss the opportunities they present, particularly in low-resource settings where they could have a positive societal impact. This is especially relevant for marginalized communities that lack access to large-scale labeled datasets tailored to their needs. Additionally, we will discuss the risks associated with these models and explore methods to mitigate them. The workshop features seven invited talks, mixing emerging and established researchers, along with two poster sessions and selective spotlight presentations. We encourage submissions related to any research or application of pixel-level understanding with vision foundation models.
Invited Speakers
News and Updates:
Jan 06, 2025: Paper submission is open at https://cmt3.research.microsoft.com/PixFoundation2025
Participation:
We encourage submissions on any of the topics of interest listed below, and we also welcome other interesting and relevant research on pixel-level understanding with vision foundation models.
Vision foundation models in pixel-level image and video understanding tasks, including: pixel-level grounding and reasoning, image segmentation, referring segmentation and its video counterpart, video segmentation, tracking, actor-action segmentation, depth estimation, motion estimation, etc.
Adaptation, generalization, and prompting of vision foundation models.
Interpretation and benchmarking of vision foundation models and their training data.
Real-world applications with a focus on the societal impact of vision foundation models.
Papers will be peer-reviewed under a double-blind policy; the submission deadline is 4 March 2025. Accepted papers will be presented in the poster sessions, with some selected as oral presentations, and one paper will receive the best paper award.
(Only students/researchers are eligible to participate. Government officials, public sector officials, and employees of entities that do business in the public sector are not eligible to participate.)