We propose Creative Agents, the first framework for handling creative tasks in an open-ended world. Using this framework, we implement a variety of embodied agents through different combinations of imaginators and controllers. Creative Agents is an initial attempt in this direction, aimed at raising awareness of building intelligent agents with creativity.
Figure 1. Overview of creative agents for open-ended creative tasks. A creative agent consists of two components: an imaginator and a controller. Given a free-form language instruction describing the creative task, the imaginator first generates an imagination in the form of text (via an LLM with Chain-of-Thought (CoT) prompting) or an image (via a diffusion model); the controller then realizes the imagination by executing actions in the environment, leveraging either the code-generation capability of a vision-language model (VLM) or a behavior-cloning (BC) policy learned from data. We implement three combinations of imaginator and controller: ① CoT+GPT-4, ② Diffusion+GPT-4V, and ③ Diffusion+BC.
Basic Requirements
MineDojo requires Ubuntu>=18.04, Python>=3.9, and Java JDK 1.8.0_171, while Voyager works on both Ubuntu>=20.04 and Windows and requires Java 17. On Windows 10, we recommend a virtual environment with Python>=3.10.
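To verify the toolchain before installing, you can check the versions on your PATH (a minimal sketch; assumes java and python3 are already installed):

java -version      # MineDojo expects JDK 1.8.0_171; Voyager expects Java 17
python3 --version  # >=3.9 for MineDojo; >=3.10 recommended on Windows 10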
Install Packages and Environments
Install MineDojo
You can download the modified MineDojo environment for Creative Agents here: Modified MineDojo Environment.
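If the download is a zip archive of the modified source, a typical way to install it in editable mode is shown below (the archive and directory names are our assumptions; follow the instructions shipped with the download if they differ):

unzip MineDojo.zip
cd MineDojo && pip install -e .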
Install Voyager
Please follow the Voyager Install Tutorial. Note that you should complete all four parts of it.
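For reference, the Python part of the Voyager setup typically looks like this (a sketch based on the Voyager repository; the tutorial is authoritative and also covers the Node.js and Minecraft-side setup):

git clone https://github.com/MineDojo/Voyager
cd Voyager
pip install -e .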
Install BC Controller
Download the code of the BC controller at https://doi.org/10.5281/zenodo.10346783, then extract the zip file into your working directory.
Install Dependencies and Download Models and Datasets (for Training)
To install packages for BC Controller, run:
pip install -r ./BC_Controller/requirements.txt
After downloading our models and datasets, merge any split archives by running:
cat NAME.zip.* > NAME.zip
Then unzip the archives, move the models to ./BC_Controller/models, and move the datasets to ./BC_Controller/datasets.
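Concretely, the unpack-and-move step might look like the following (NAME is a placeholder, and the archive's internal layout is our assumption; adjust the paths to whatever the zip actually contains):

unzip NAME.zip
mv models ./BC_Controller/models      # assumed archive layout
mv datasets ./BC_Controller/datasets  # assumed archive layout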
Get Started
Our models, code, and dataset have been released. You can find them at:
https://zenodo.org/records/10251970 (code of GPT-based controller)
https://doi.org/10.5281/zenodo.10346783 (code of BC controller)
https://doi.org/10.5281/zenodo.10275049 (dataset)
Before running CoT+GPT-4 or Diffusion+GPT-4V, please make sure you have an OpenAI API key with access to GPT-4 and GPT-4V(ision).
In our experiments, we use the Minecraft Official Launcher instead of Microsoft Azure, following this link. After you open your world to LAN from the in-game pause menu, Minecraft prints the port number in the chat; pass it as <LAN_PORT> below.
To test with CoT+GPT-4, run:
python cot_gpt4.py --api_key <API_KEY> --task <TASK_DESCRIPTION> --mc_port <LAN_PORT>
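For example, a hypothetical invocation (substitute your own API key, task description, and LAN port):

python cot_gpt4.py --api_key sk-XXXX --task "build a sandstone pyramid in the desert" --mc_port 55415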
To test with Diffusion+GPT-4V, run:
python diffusion_gpt4.py --api_key <API_KEY> --image_path <IMAGE_PATH> --mc_port <LAN_PORT>
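Similarly, for Diffusion+GPT-4V (the image path is a hypothetical example; point it at an image produced by the diffusion imaginator):

python diffusion_gpt4.py --api_key sk-XXXX --image_path ./imaginations/house.png --mc_port 55415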
To test with Diffusion+BC, run:
python ./BC_Controller/BC_pipline.py
Diffusion+BC results (images, voxels, voxel sequences, and the final building) will be saved in ./BC_Controller/results.
To use your own prompts for building, modify ./BC_Controller/results/validation_prompt.txt.
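For instance, validation_prompt.txt could contain free-form building descriptions such as the following (hypothetical examples; we assume one prompt per line):

a cozy wooden cabin with a chimney
a tall stone watchtower with a spiral staircase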
To train the image generation part of Diffusion+BC, run:
python ./BC_Controller/txt2img/train_text_to_image_lora.py
To train the voxel generation part of Diffusion+BC, run:
python ./BC_Controller/img2vox/runner.py
To train the sequence part of Diffusion+BC, run:
python ./BC_Controller/vox2seq/train.py
Showcases and Demonstrations
Figure 2. Examples of the language description, the generated visual imagination, and the created building for each variant of creative agents. The visual imaginations generated by the diffusion model show great diversity, an important manifestation of creativity.