Abstract
Our understanding of the visual world goes beyond naming objects: it encompasses our ability to parse objects into meaningful parts, attributes, and relations. In this work, we leverage natural language descriptions of a diverse set of 2K procedurally generated objects to identify the parts people use and the principles that lead people to favor these parts over others. Specifically, we formalize our problem as a search over a space of program libraries that contain different part concepts, using tools from machine translation to evaluate how well programs expressed in each library align with human language. While a library containing only the simplest shape primitives (e.g., circle) explains some variance, we discover that libraries containing part concepts of intermediate complexity (e.g., wheel) provide efficient compression of these objects and better predict people's descriptions. Our findings highlight the value of jointly leveraging structured program representations and naturalistic language at scale to study how our perceptual experience is organized.
arXiv paper [extended with supplement]: http://arxiv.org/abs/2205.05666
A version of this paper originally appeared in the CogSci 2022 proceedings.
Contact: catwong@mit.edu