Perceive, Ground, Reason, and Act: A Benchmark for
General-purpose Visual Representation
Jiangyong Huang1,3∗, William Yicheng Zhu1∗, Baoxiong Jia1,2, Zan Wang1,4, Xiaojian Ma1,2, Qing Li1, Siyuan Huang1
1Beijing Institute for General Artificial Intelligence, 2University of California, Los Angeles, 3Peking University, 4Beijing Institute of Technology
∗ indicates equal contribution