We provide an additional introduction to the different kinds of traces and a simple demonstration of how humans (or agents) can explore and reason in the proposed Conan environment. We hope this material helps reviewers understand how Conan runs and the significance of the benchmark.
We take the playground shown in Fig 1 as an example. You can zoom in for more details.
The playground has been initialized by assigning a random task to the vandal from the task space while ensuring its survival. A probabilistic parser selects a path of subgoals based on a knowledge graph and communicates this decision to a planner. In this scenario, the assigned task is "get diamond".
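To make the parser step concrete, here is a minimal sketch of sampling a subgoal path from a knowledge graph. The item names, recipes, weights, and the function `sample_subgoals` are all hypothetical and chosen for illustration; they are not Conan's actual graph or parser.

```python
import random

# Hypothetical AND-OR knowledge graph (an assumption, not Conan's real one).
# Each item maps to weighted alternatives; each alternative lists prerequisites.
KNOWLEDGE_GRAPH = {
    "diamond": [(1.0, ["iron_pickaxe"])],
    "iron_pickaxe": [(0.5, ["table", "iron"]),
                     (0.5, ["table", "iron", "furnace"])],
    "furnace": [(1.0, ["table", "stone"])],
    "iron": [(1.0, ["stone_pickaxe"])],
    "stone_pickaxe": [(1.0, ["table", "stone"])],
    "stone": [(1.0, ["wood_pickaxe"])],
    "wood_pickaxe": [(1.0, ["table"])],
    "table": [(1.0, ["wood"])],
    "wood": [(1.0, [])],
}


def sample_subgoals(goal, graph, rng, done=None):
    """Return an ordered subgoal list that ends with `goal`.

    At every item one alternative is drawn according to its weight, and its
    prerequisites are resolved depth-first before the item itself is added.
    """
    done = [] if done is None else done
    if goal in done:  # already scheduled earlier in the path
        return done
    weights = [w for w, _ in graph[goal]]
    options = [prereqs for _, prereqs in graph[goal]]
    chosen = rng.choices(options, weights=weights, k=1)[0]
    for prereq in chosen:
        sample_subgoals(prereq, graph, rng, done)
    done.append(goal)
    return done


path = sample_subgoals("diamond", KNOWLEDGE_GRAPH, random.Random(0))
print(path)  # a prerequisite-respecting order that ends with "diamond"
```

In this sketch the resulting list plays the role of the decision handed to the planner, which would then turn each subgoal into low-level actions.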
Once the vandal completes the task, it leaves behind traces in the playground. You can observe its final position at the bottom of the figure. The detective is then placed into the playground, starting at the beginning of the traces. The detective's objective is to deduce the vandal's actions and intentions throughout the entire process. Fig 1 illustrates that questions can be related to the goal, intent, and survival of the vandal. The detective explores the playground to find traces that provide valuable information.
Let's shed some light on the nature of traces. Traces encompass footprints, objects crafted in the environment, and remnants left after certain actions. We designed traces to be diverse, natural, and carefully curated. For example, footprints cannot be left on hard surfaces, and different footprints may overlap. The vandal takes away the objects it crafts on a table, so a crafted object is visible only if it is left on the table.
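The two footprint rules can be sketched as follows. The terrain labels, the set of hard surfaces, and the choice to store a footprint as a step direction are assumptions made for this example only.

```python
# Assumed set of hard surfaces; the real environment may differ.
HARD_SURFACES = {"stone", "iron", "table"}


def leave_footprints(terrain, path):
    """Map each soft cell stepped onto to the list of step directions.

    Steps onto hard surfaces leave nothing. Repeated visits to a soft cell
    append to its list, which models overlapping footprints.
    """
    footprints = {}
    for (r0, c0), (r1, c1) in zip(path, path[1:]):
        if terrain[r1][c1] in HARD_SURFACES:
            continue  # no footprint on a hard surface
        footprints.setdefault((r1, c1), []).append((r1 - r0, c1 - c0))
    return footprints


terrain = [["grass", "stone", "grass"]]
walk = [(0, 0), (0, 1), (0, 2), (0, 1), (0, 2)]
print(leave_footprints(terrain, walk))  # {(0, 2): [(0, 1), (0, 1)]}
```

The stone cell never records a footprint even though it is crossed three times, which is why a trail can appear to break off and resume elsewhere.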
Apart from traces, knowledge is crucial in Conan, including the task dependency graph, survival conditions, and other world knowledge. To grasp the significance of knowledge, think of what knowledge is required to play the popular game Minecraft. In our case, we display the dependency graph of getting a diamond, which the vandal follows.
Returning to the detective: it has a limited observation space and can only see the 9×9 area around itself.
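A limited observation of this kind can be sketched as a window cropped from the full grid. We assume here that the detective sits at the center of the window and that cells beyond the map edge are filled with a placeholder; both are illustrative choices, and `local_view` is a hypothetical helper.

```python
def local_view(grid, row, col, size=9, fill=None):
    """Return the size x size window of `grid` centered on (row, col).

    Cells outside the map are replaced by `fill` (an assumption).
    """
    half = size // 2
    view = []
    for r in range(row - half, row + half + 1):
        line = []
        for c in range(col - half, col + half + 1):
            inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
            line.append(grid[r][c] if inside else fill)
        view.append(line)
    return view


# Toy 20 x 20 map whose cells simply store their own coordinates.
grid = [[(r, c) for c in range(20)] for r in range(20)]
view = local_view(grid, 10, 10)
print(len(view), len(view[0]))  # 9 9
print(view[4][4])               # (10, 10), the detective's own cell
```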
Let's suppose the detective's exploration begins by following footprints. First, we see traces indicating that trees were cut. Then we find the first table left by the vandal. While the detective may not know what was made on the table, it can deduce that it was neither a stone sword nor an iron pickaxe, because there is no sign that stone or iron had been collected. This is an important point in Conan: the detective must handle uncertainty throughout the process by reasoning and by collecting more useful information.
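The deduction at the table amounts to ruling out every item whose required resources have not been observed. A minimal sketch, assuming a hypothetical recipe table (the item names and requirements below are illustrative, not Conan's actual recipes):

```python
# Hypothetical recipes: resources each item needs besides the table itself.
RECIPES = {
    "wood_pickaxe": {"wood"},
    "wood_sword": {"wood"},
    "stone_pickaxe": {"wood", "stone"},
    "stone_sword": {"wood", "stone"},
    "iron_pickaxe": {"wood", "iron"},
}


def feasible_items(observed_resources, recipes=RECIPES):
    """Return items whose required resources were all observed as collected."""
    return sorted(
        item for item, needs in recipes.items() if needs <= observed_resources
    )


# Only tree-cutting traces have been seen so far.
print(feasible_items({"wood"}))  # ['wood_pickaxe', 'wood_sword']
```

More than one candidate remains after the elimination, so the uncertainty is narrowed rather than removed, and later traces are needed to settle it.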
Another challenge arises when footprints cannot be left on hard surfaces. The detective needs an exploration policy to determine the vandal's final destination and simultaneously reason about the events that occurred in that area. Did the vandal collect resources? Were there encounters with monsters or lava?
The reasoning process continues as the detective sees more traces. For instance, subsequent explicit traces show that the vandal fought and killed a zombie despite being injured (indicated by blood and the zombie body). Implicit traces support the assumption that the vandal made an iron pickaxe. At this point, the detective infers that the vandal's goal is to obtain a diamond, as only a diamond requires an iron pickaxe for extraction. To validate this hypothesis, the detective can move towards the diamond. Of course, if the detective directly spots the vandal next to a diamond, it can confidently draw the same conclusion. However, most exploration policies are not perfect, and what the detective observes may not reveal everything that happened. Agents must reason with incomplete information. If they cannot draw the correct conclusion, they need to seek more useful information. This is what makes Conan an intriguing and challenging benchmark.
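One possible way to formalize this kind of inference is a Bayesian update over candidate goals. The sketch below is our illustration only and is not presented as the method used by any agent in Conan. Apart from "get diamond", the goal names are hypothetical, and the prior and likelihood values are made-up numbers.

```python
def update_beliefs(prior, likelihoods):
    """Bayes rule over goals: posterior(g) is proportional to prior(g) * P(trace | g)."""
    unnormalized = {g: prior[g] * likelihoods.get(g, 0.0) for g in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("trace is impossible under every candidate goal")
    return {g: p / total for g, p in unnormalized.items()}


# Uniform prior over three candidate goals (illustrative).
prior = {"get diamond": 1 / 3, "get iron": 1 / 3, "get stone": 1 / 3}
# Assumed likelihood of the trace "an iron pickaxe was made" under each goal.
likelihoods = {"get diamond": 0.9, "get iron": 0.05, "get stone": 0.05}

posterior = update_beliefs(prior, likelihoods)
print(max(posterior, key=posterior.get))  # get diamond
```

If the posterior stays spread across several goals after an update, that indicates the detective should keep exploring for more informative traces.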