CrafterDojo: A Suite of Foundation Models
for Building Open-Ended Agents in Crafter
Abstract
Developing general-purpose embodied agents is a core challenge in AI. Minecraft provides rich complexity and internet-scale data, but its slow speed and engineering overhead make it unsuitable for rapid prototyping. Crafter offers a lightweight alternative that retains key challenges from Minecraft, yet its use has remained limited to narrow tasks because it lacks the foundation models that have driven progress in the Minecraft setting. In this paper, we present CrafterDojo, a suite of foundation models and tools that unlock the Crafter environment as a lightweight, prototyping-friendly, and Minecraft-like testbed for general-purpose embodied agent research. CrafterDojo fills this gap by introducing CrafterVPT, CrafterCLIP, and CrafterSteve-1 for behavior priors, vision-language grounding, and instruction following, respectively. In addition, we provide toolkits for generating behavior and caption datasets (CrafterPlay and CrafterCaption), reference agent implementations, benchmark evaluations, and a complete open-source codebase.
Overview
CrafterDojo provides a comprehensive suite of Foundation Models and Scalable & Extensible Toolkits for building agents in Crafter.
Toolkits
Expert Behavior Generator: Scalable pipeline that automatically generates diverse expert behavior demonstrations.
Caption Generator: Automated captioning toolkit that produces contextual descriptions for agent actions and environment states.
Foundation Models
CrafterVPT (C-VPT): Pre-trained behavioral foundation model.
CrafterCLIP (C-CLIP): Vision-language grounding Video CLIP model.
CrafterSteve-1 (C-Steve-1): Instruction-following policy.
Demo
CrafterVPT Behavioral Foundation
CrafterVPT exhibits diverse behavioral patterns across all 22 achievements, as well as emergent survival tactics and adaptive strategies
Hierarchical Planning with CrafterSteve-1
Place Plant → Place Table
Make Wood Sword → Obtain Sapling
Experiments
CrafterVPT Performance
CrafterVPT achieves a 61.4% Crafter Score, outperforming all baseline methods by up to 29.6%
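The page does not spell out how the Crafter Score is computed. Assuming it follows the definition used by the original Crafter benchmark (a geometric-mean-style aggregate over per-achievement success rates, with an offset of 1 inside the logarithm), a minimal Python sketch might look like this. The function name `crafter_score` and the example rates are illustrative only and are not values from the paper.

```python
import math


def crafter_score(success_rates_percent):
    """Aggregate per-achievement success rates (each in [0, 100]) into one score.

    Assumed formula (from the original Crafter benchmark, not from this page):
        score = exp(mean(ln(1 + s_i))) - 1
    """
    if not success_rates_percent:
        raise ValueError("need at least one success rate")
    for s in success_rates_percent:
        if not 0.0 <= s <= 100.0:
            raise ValueError("success rates must be percentages in [0, 100]")
    # Average in log space so that rarely unlocked achievements still matter.
    mean_log = sum(math.log(1.0 + s) for s in success_rates_percent) / len(
        success_rates_percent
    )
    return math.exp(mean_log) - 1.0


# Illustrative input: one achievement never unlocked, one unlocked 99% of the time.
example = crafter_score([0.0, 99.0])  # about 9.0, far below the arithmetic mean of 49.5
```

Because the average is taken in log space, an agent that unlocks many achievements occasionally scores higher than one that unlocks a few achievements reliably, which is why the score is a common proxy for behavioral breadth.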
CrafterCLIP Performance
CrafterCLIP achieves 89.9% R@1 vs. 1.7% for a general-purpose VideoCLIP model
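R@1 (recall at 1) is the fraction of queries whose top-ranked retrieval is the correct match. The page does not state the retrieval direction (text-to-video or video-to-text) or the size of the candidate pool, so the sketch below only illustrates the metric itself. It assumes a square similarity matrix in which candidate `i` is the ground-truth match for query `i`; the function name `recall_at_k` and the toy scores are hypothetical.

```python
def recall_at_k(similarity, k=1):
    """Fraction of queries whose correct candidate appears in the top-k results.

    similarity[i][j] is the score of query i against candidate j.
    Assumption: candidate i is the correct match for query i.
    """
    if not similarity:
        raise ValueError("similarity matrix is empty")
    hits = 0
    for i, row in enumerate(similarity):
        # Rank candidate indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda j: -row[j])
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Toy 3x3 example: queries 0 and 1 retrieve their match first, query 2 does not.
toy_similarity = [
    [0.9, 0.1, 0.0],
    [0.2, 0.8, 0.1],
    [0.7, 0.2, 0.3],
]
r_at_1 = recall_at_k(toy_similarity, k=1)
r_at_2 = recall_at_k(toy_similarity, k=2)
```

In this toy case two of the three queries rank their match first, and the third ranks it second, so widening the cutoff from 1 to 2 recovers the remaining query.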
CrafterSteve-1 Performance
CrafterSteve-1 achieves near-perfect success rates with shorter task completion times across 5 instruction-following tasks
Hierarchical Planning for Long-Horizon Tasks
PPO-Steve, which integrates our foundation models, achieves competitive performance across 4 long-horizon tasks
CrafterSteve-1 with instruction chaining from a heuristic planner achieves higher success rates across all long-horizon tasks, highlighting the importance of hierarchical planning for complex, long-horizon tasks in Crafter, as has also been observed in Minecraft-based research
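Instruction chaining can be pictured as a simple outer loop: a planner emits an ordered list of instructions, and an instruction-following policy is run on each one until the sub-goal is reached or a step budget runs out. The sketch below is a toy illustration of that control flow only. It is not the CrafterDojo API: `run_chain`, the callables it takes, and the toy state are all hypothetical stand-ins for the planner, CrafterSteve-1, and the Crafter environment.

```python
def run_chain(plan, policy, step, achieved, state, budget=50):
    """Execute instructions in order; stop at the first sub-goal that fails.

    plan:     ordered list of instruction strings (from a planner)
    policy:   callable (state, instruction) -> action
    step:     callable (state, action) -> next state
    achieved: callable (state, instruction) -> bool
    budget:   maximum number of environment steps per instruction
    """
    completed = []
    for instruction in plan:
        steps = 0
        while not achieved(state, instruction) and steps < budget:
            state = step(state, policy(state, instruction))
            steps += 1
        if not achieved(state, instruction):
            break  # later instructions usually depend on earlier ones
        completed.append(instruction)
    return completed, state


# Toy stand-ins: each instruction needs three units of "progress".
def toy_policy(state, instruction):
    return instruction  # the action is simply "work on this instruction"


def toy_step(state, action):
    new_state = dict(state)
    new_state[action] = new_state.get(action, 0) + 1
    return new_state


def toy_achieved(state, instruction):
    return state.get(instruction, 0) >= 3


plan = ["Place Plant", "Place Table"]  # instruction pair taken from the demo above
done, final_state = run_chain(plan, toy_policy, toy_step, toy_achieved, {})
```

Stopping at the first failed sub-goal is one possible design choice; a real planner might instead replan or retry, and the page does not say which strategy the heuristic planner uses.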
BibTeX
@article{park2025crafterdojo,
title={{C}rafter{D}ojo: A Suite of Foundation Models for Building Open-Ended Embodied Agents in Crafter},
author={Park, Junyeong and Cho, Hyeonseo and Ahn, Sungjin},
journal={arXiv preprint arXiv:2508.13530},
year={2025}
}