CrafText Benchmark
Advancing Instruction Following in Complex Open-Ended World

ArXiv Paper, Environment (PyPI), Environment (GIT), Baselines (GIT)

Here, we present CrafText, a benchmark for evaluating instruction following in a multimodal environment with diverse instructions and dynamic interactions. It includes 3,924 instructions with 3,423 unique words, spanning Localization, Conditional, Building, and Achievement tasks. In addition, we propose an evaluation protocol that measures an agent’s ability to generalize to novel instruction formulations and dynamically evolving task configurations, providing a rigorous test of both linguistic understanding and adaptive decision-making.

craftext_video.mp4
