Vision Language Models For All: Building Geo-Diverse and Culturally Aware Vision-Language Models
(VLMs-4-All)
CVPR 2025 workshop (June 12, 2025)
The winners of the CulturalVQA and GlobalRG challenges have been announced! Visit the Challenges page to see the full results and details.
The remarkable advancements of vision-language models (VLMs) on tasks such as visual question answering, image captioning, visual grounding, text-to-image retrieval, and text-to-image generation have unlocked vast potential for AI applications, especially in global contexts where people from diverse backgrounds engage with these systems. As VLMs increasingly drive decision-making across critical sectors like education, healthcare, and public services, ensuring that they are inclusive of a broad spectrum of cultural values and perspectives is no longer optional; it is imperative. Failure to account for these cultural nuances risks alienating underrepresented communities and perpetuating biases, while success in this area can foster greater trust, fairness, and accessibility in AI systems.
However, while VLMs continue to evolve, the challenge of integrating diverse cultural insights into these models remains underexplored and requires a focused, interdisciplinary effort. To advance this critical objective, we propose the Vision Language Models For All workshop. The workshop has two primary goals. First, it seeks to unite researchers from computer vision, natural language processing, and cultural anthropology to collaboratively explore how we can develop geo-diverse and culturally aware vision-language models, and AI systems more broadly. Key focus areas include identifying effective evaluation tasks, benchmarks, and metrics to assess cultural alignment in these models, and developing new methodologies for improving the cultural awareness of AI systems. To facilitate this exchange of ideas, the workshop will feature invited talks and panel discussions with leading researchers in the field. In addition, we will invite short paper submissions of up to 4 pages on these topics. All accepted papers will be presented in a poster session to foster discussion and disseminate novel ideas within the community.
The second goal of the workshop is to benchmark progress in the geo-diverse and cultural understanding of vision-language models by hosting challenges on the recently developed CulturalVQA and GlobalRG benchmarks: the former evaluates the cultural understanding of VLMs, while the latter evaluates the cultural diversity of VLMs' outputs. The results of the challenges and the winning entries will be presented at the workshop to highlight the latest advancements in cultural understanding. These challenges are designed to drive innovation in vision-language model development and to inspire further research into culturally aware AI. By fostering collaboration and healthy competition, we aim to catalyze the creation of AI systems with truly global cultural awareness.