VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation