CultureShift: Mapping Temporal Cultural Evolution in Vision-Language Models
Gautam Jajoo, Harsh Deshpande, Hamna, Pranjal A Chitale
Nayana: A Foundation for Document-Centric Vision-Language Models via Multi-Task, Multimodal, and Multilingual Data Synthesis
Adithya S Kolavi, Samarth P, Vyoman Jain
Cultural Awareness in Vision-Language Models: A Cross-Country Exploration
Avinash Madasu, Vasudev Lal, Phillip Howard
Beyond Words: Exploring Cultural Value Sensitivity in Multimodal Models
Srishti Yadav, Zhi Zhang, Daniel Hershcovich, Ekaterina Shutova
GeoDiv: Measuring Concept Diversity of Images Across Geographical Regions
Abhipsa Basu, Mohana Singh, Venkatesh Babu Radhakrishnan
CONCAP: Seeing Beyond English with Retrieval-Augmented Captioning
George Ibrahim, Rita Ramos, Yova Kementchedjhieva
JEEM: Vision-Language Understanding in Four Arabic Dialects
Karima Kadaoui, Hanin Atwany, Hamdan Al-Ali, Abdelrahman Mohamed, Ali Mekky, Sergei Tilga, Natalia Fedorova, Ekaterina Artemova, Hanan Aldarmaki, Yova Kementchedjhieva
Chitrakshara: A Large Multilingual Multimodal Dataset for Indian Languages
Shaharukh Khan, Ali Faraz, Abhinav Ravi, Mohd Nauman, Mohd Sarfraz, Akshat Patidar, Raja Kolla, Chandra Khatri, Shubham Agarwal
Why do LLaVA Vision-Language Models Reply to Images in English?
Musashi Hinck, Carolin Holtermann, Matthew Lyle Olson, Florian Schneider, Sungduk Yu, Anahita Bhiwandiwalla, Anne Lauscher, Shao-Yen Tseng, Vasudev Lal
Synthetic Document Question Answering in Hungarian
Jonathan Lingjie Li, Zoltan Csaki, Nidhi Hiremath, Etash Kumar Guha, Fenglu Hong, King Chun Ma, Urmish Thakker
Enhancing Vision-Language Models for Global Cultural Understanding through Semantic Expansion and Diversity Reranking
Zirui Hou, Xiangzhe Yin, Guangyu Gao
Challenging Multimodal LLMs with African Standardized Exams: A Document VQA Evaluation
Victor Tolulope Olufemi, Oreoluwa Boluwatife Babatunde, Emmanuel Bolarinwa, Kausar Yetunde Moshood
RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives
Chirag Parikh, Deepti Rawat, Rakshitha R. T., Tathagata Ghosh, Ravi Kiran Sarvadevabhatla
Behind Maya: Building a Multilingual Vision Language Model
Nahid Alam, Karthik Reddy Kanjula, Surya Guthikonda, Timothy Chung, Bala Krishna S Vegesna, Abhipsha Das, Anthony Susevski, Ryan Sze-Yin Chan, S M Iftekhar Uddin, Shayekh Bin Islam, Roshan Santhosh, Snegha A, Drishti Sharma, Chen Cecilia Liu, Isha Chaturvedi, Genta Indra Winata, Ashvanth.S, Snehanshu Mukherjee, Alham Fikri Aji
CultureVLM: Characterizing and Improving Cultural Understanding of Vision-Language Models for over 100 Countries
Shudong Liu, Yiqiao Jin, Cheng Li, Derek F. Wong, Qingsong Wen, Lichao Sun, Haipeng Chen, Xing Xie, Jindong Wang