Concept Learning with Energy-Based Models

Many hallmarks of human intelligence, such as generalizing from limited experience, abstract reasoning and planning, analogical reasoning, creative problem solving, and capacity for language, require the ability to consolidate experience into concepts, which act as basic building blocks of understanding and reasoning. I will present a framework that defines a concept by an energy function over events in the environment, as well as an attention mask over entities participating in the event. Given a few demonstration events, our method uses inference-time optimization procedures to generate events involving similar concepts or to identify entities involved in the concept. We evaluate our framework on learning visual, quantitative, spatial, and relational concepts from demonstration events in an unsupervised manner. Our approach is able to successfully generate and identify concepts in a few-shot setting as well as transfer learned concepts between domains.

Experimental Video Results

Spatial Region Concepts: given demonstration 2D points (left), an energy function over point placement is inferred (middle); stochastic gradient descent over the energy is then used to generate new points (right)
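The generation step in this caption can be illustrated with a minimal sketch. It does not reproduce the learned energy function from the videos: the energy below is a hand-written stand-in (the negative log of a Gaussian kernel density around the demonstration points), and the names `energy`, `energy_grad`, `generate_points`, and all parameter values are assumptions chosen for illustration. New points start at random positions and follow noisy gradient descent on that energy.

```python
import numpy as np

def energy(x, demos, scale=0.5):
    """Stand-in energy (an assumption, not the learned model):
    low near the demonstration points, high far away from them."""
    d2 = ((x[:, None, :] - demos[None, :, :]) ** 2).sum(-1)   # (n, m) squared distances
    z = -d2 / (2 * scale ** 2)
    zmax = z.max(axis=1, keepdims=True)
    # Negative log-sum-exp, computed in a numerically stable way.
    return -(zmax[:, 0] + np.log(np.exp(z - zmax).sum(axis=1)))

def energy_grad(x, demos, scale=0.5):
    """Analytic gradient of the stand-in energy with respect to x."""
    diff = x[:, None, :] - demos[None, :, :]                   # (n, m, 2)
    z = -(diff ** 2).sum(-1) / (2 * scale ** 2)
    w = np.exp(z - z.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                          # softmax weights over demos
    return (w[:, :, None] * diff).sum(axis=1) / scale ** 2

def generate_points(demos, n_points=16, steps=200, step_size=0.05,
                    noise=0.01, seed=0):
    """Generate points by noisy gradient descent on the energy."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-3, 3, size=(n_points, 2))                 # random initial placements
    for _ in range(steps):
        x = x - step_size * energy_grad(x, demos) + noise * rng.normal(size=x.shape)
    return x

if __name__ == "__main__":
    demos = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]])    # hypothetical demonstrations
    print(generate_points(demos))
```

In this sketch the generated points settle in the low-energy region around the demonstrations; with a learned energy function the same loop would apply, with the gradient obtained by automatic differentiation instead of a closed form.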

Spatial Placement Concepts: demonstrations place a marker either north, south, east, or west of an entity or between two entities of a particular color (top). Inference is used to generate similar placements under novel arrangements (bottom)

Spatial Relational Concepts: demonstrations contain entities of a particular color either joining or forming a line, triangle, or square (top). Inference is used to generate similar spatial relationships under novel arrangements (bottom)

Proximity Concepts: demonstration events either bring attention to the entity closest to or furthest from the marker, or bring the marker closest to or furthest from an entity of a particular color (top). Inference is used to generate attention masks for the closest or furthest entity (recognition) or to place the marker closest to or furthest from an entity (generation) (bottom)
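The recognition case can be sketched in the same spirit: the event is held fixed and the optimization runs over the attention mask instead. The energy here is again a hand-written assumption for the "closest to the marker" concept (the attention-weighted distance to the marker), not the learned model, and `closest_energy`, `infer_attention`, and the softmax parameterization of the mask are hypothetical choices made for illustration.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def closest_energy(attention, entities, marker):
    """Stand-in energy (an assumption): attention-weighted distance to the marker.
    It is lowest when all attention sits on the closest entity."""
    dists = np.linalg.norm(entities - marker, axis=1)
    return float(attention @ dists)

def infer_attention(entities, marker, steps=500, step_size=0.5):
    """Infer an attention mask by gradient descent on the mask logits."""
    dists = np.linalg.norm(entities - marker, axis=1)
    logits = np.zeros(len(entities))                 # start from uniform attention
    for _ in range(steps):
        a = softmax(logits)
        # Gradient of sum_i a_i * dists_i with respect to the logits.
        grad = a * (dists - a @ dists)
        logits -= step_size * grad
    return softmax(logits)

if __name__ == "__main__":
    entities = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 3.0]])  # hypothetical entity positions
    marker = np.array([1.8, 0.1])                               # hypothetical marker position
    print(infer_attention(entities, marker))
```

Negating the energy would give the "furthest" variant, and optimizing over the marker position with the attention held fixed would correspond to the generation case described in the caption.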

Quantity Concepts: demonstration attention is placed on one, two, three, or more than three entities (top). Inference is used to generate attention masks of similar quantity (bottom)

Concept Transfer Between Domains: the energy function for the "between" concept is transferred to a simulated Fetch robot domain, where the marker corresponds to the robot gripper position