Perceive, Ground, Reason, and Act: A Benchmark for General-purpose Visual Representation

Jiangyong Huang1,3∗, William Yicheng Zhu1∗, Baoxiong Jia1,2, Zan Wan1,4, Xiaojian Ma1,2, Qing Li1, Siyuan Huang1

1Beijing Institute for General Artificial Intelligence, 2University of California, Los Angeles, 3Peking University, 4Beijing Institute of Technology

∗ indicates equal contribution