NaturalVLM: Leveraging Fine-grained Natural Language for
Affordance-Guided Visual Manipulation
Overview
Figure 1. Illustration of the fine-grained instructions. The leftmost and rightmost pairs represent the action-prompt and perception-prompt bases, respectively. The center column shows the manipulation steps for "Slide the top drawer open", each accompanied by fine-grained language instructions. If a manipulation step of the current task shares the same action or noun phrase with a manipulation step of another task in its fine-grained language instruction, cross-modal alignment is conducted using the features of the action-prompt base and the perception-prompt base.
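The grouping idea in Figure 1 can be illustrated with a minimal sketch. It only shows how steps from different tasks might be indexed by a shared action or noun phrase; it is not the authors' implementation, and it does not perform any feature alignment. The `Step` class, the function names, and the assumption that each step already comes with an extracted verb and noun phrase are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One manipulation step (hypothetical structure, not from the source)."""
    task: str          # high-level task the step belongs to
    instruction: str   # fine-grained language instruction
    action: str        # verb, assumed to be extracted beforehand
    noun_phrase: str   # object phrase, assumed to be extracted beforehand


def build_prompt_bases(steps):
    """Index steps by verb (action-prompt base) and by noun phrase
    (perception-prompt base)."""
    action_base = defaultdict(list)
    perception_base = defaultdict(list)
    for step in steps:
        action_base[step.action].append(step)
        perception_base[step.noun_phrase].append(step)
    return dict(action_base), dict(perception_base)


def alignment_partners(step, action_base, perception_base):
    """Return steps from OTHER tasks that share the action or the noun phrase."""
    same_action = [s for s in action_base.get(step.action, []) if s.task != step.task]
    same_noun = [s for s in perception_base.get(step.noun_phrase, []) if s.task != step.task]
    return same_action, same_noun


steps = [
    Step("close drawer", "Push the drawer to close it", "push", "drawer"),
    Step("close microwave", "Push the door of the microwave to close the microwave with the gripper", "push", "microwave door"),
    Step("open drawer", "Slide the top drawer open", "slide", "drawer"),
]
action_base, perception_base = build_prompt_bases(steps)
same_action, same_noun = alignment_partners(steps[0], action_base, perception_base)
```

Here the "close drawer" step would be paired with the "close microwave" step through the shared verb and with the "open drawer" step through the shared noun phrase. In a learned system, those pairs would be the candidates whose features are aligned.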
NrVLM Benchmark
Figure 2. We introduce NrVLM, a comprehensive benchmark comprising multiple manipulation tasks annotated with fine-grained natural language instructions. The top two rows visualize selected tasks from the benchmark. We also introduce different task variations to enrich the diversity and complexity of the benchmark, as shown in the bottom two rows.
Framework
Figure 3. The overall framework. The bottom part shows the manipulation process, where the Instruction Selection network (InstrSel) selects the appropriate fine-grained language instruction, the Affordance network (AFF-NET) predicts the object-centric affordance map, and the Actor network (ACT-NET) predicts the gripper action. The top part shows the optional perception-prompt and action-prompt modules, which enhance the Affordance and Actor networks by aligning the noun-related perception-prompt set and the verb-related action-prompt set. The two dotted arrows before the Affordance and Actor networks indicate that the prompt modules are optional. The entire method is trained in an end-to-end manner.
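The data flow of the manipulation process in Figure 3 can be sketched as three stages called in sequence. This is only an illustration of the ordering described in the caption: the three `toy_*` functions are hypothetical stand-ins for InstrSel, AFF-NET, and ACT-NET, the observation is a toy 2-D grid rather than real sensor input, and the optional prompt modules are left out.

```python
from typing import Callable, List, Sequence, Tuple

Grid = List[List[float]]   # toy stand-in for an observation or an affordance map
Action = Tuple[int, int]   # toy gripper action: the (row, col) cell to move to


def manipulation_step(
    obs: Grid,
    instructions: Sequence[str],
    instr_sel: Callable[[Grid, Sequence[str]], int],
    aff_net: Callable[[Grid, str], Grid],
    act_net: Callable[[Grid, str, Grid], Action],
) -> Tuple[str, Grid, Action]:
    """Run one step: select an instruction, predict affordance, predict action."""
    instruction = instructions[instr_sel(obs, instructions)]
    affordance = aff_net(obs, instruction)
    action = act_net(obs, instruction, affordance)
    return instruction, affordance, action


# Hypothetical stand-ins; real networks would be learned models.
def toy_instr_sel(obs: Grid, instructions: Sequence[str]) -> int:
    return 0  # always pick the first instruction


def toy_aff_net(obs: Grid, instruction: str) -> Grid:
    total = sum(sum(row) for row in obs) or 1.0
    return [[value / total for value in row] for row in obs]  # normalize to sum 1


def toy_act_net(obs: Grid, instruction: str, affordance: Grid) -> Action:
    cells = [(r, c) for r in range(len(affordance)) for c in range(len(affordance[0]))]
    return max(cells, key=lambda rc: affordance[rc[0]][rc[1]])  # highest-affordance cell


obs = [[0.0, 1.0], [3.0, 0.0]]
instructions = ["Move the gripper near the drawer", "Push the drawer to close it"]
instruction, affordance, action = manipulation_step(
    obs, instructions, toy_instr_sel, toy_aff_net, toy_act_net
)
```

Passing the three stages in as callables keeps the sketch independent of any particular model; the actor consumes both the selected instruction and the affordance map, mirroring the order in the caption.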
Training Tasks
close box
close door
open drawer
push button
slide cabinet open
turn tap
take umbrella out of stand
open wine bottle
Novel Tasks
close drawer
close laptop lid
close microwave
lamp on
open door
open grill
take item out of drawer
More Tasks
take lid off saucepan
......
play jenga
open microwave
Instances of Fine-grained Instructions
High level instructions
Shut the bottom drawer
Turn the lamp on
Close the microwave door
......
Fine-grained instructions (different annotations shown in different colors)
1-step1. Move the gripper near the drawer. | The gripper moved down to the front of the bottom drawer and closed the jaws of the plier.
1-step2. Push the drawer to close it. | The gripper pushed the bottom drawer forward and closed the drawer.
2-step1. The gripper glides from top to bottom of the faucet and seizes it. | The gripper crosses above the tap tube and plans to grip the tap.
2-step2. The gripper gyrates the tap. | The gripper clinches the wheel and turns it at 360 degrees.
3-step1. The gripper down is approaching the door of the microwave and position it behind the door of the microwave. | The gripper moves to the back of the microwave door.
3-step2. Push the door of the microwave to close the microwave with the gripper. | The gripper slides towards the right to shut the microwave door.
......