Abstract: Living organisms exhibit persistent autonomy through internally generated goals and self-sustaining behavioral organization, yet current embodied agents remain driven by externally scripted objectives. This dependence on predefined task specifications limits their capacity for long-term deployment in dynamic, unstructured environments where continuous human intervention is impractical. We propose that personality traits provide an intrinsic organizational principle for achieving persistent autonomy. Analogous to genotypic biases that shape biological behavioral tendencies, personalities enable agents to generate goals autonomously and sustain behavioral evolution without external supervision. To realize this, we develop PEPA, a cognitive architecture composed of three interacting layers: Sys3 autonomously synthesizes personality-aligned goals and refines them via episodic memory and daily self-reflection; Sys2 performs deliberative reasoning to translate goals into executable action plans; Sys1 grounds the agent in sensorimotor interaction, executing actions and recording experiences. This closed loop of goal generation, execution, memory consolidation, and reflection enables the agent to continuously redefine its objectives and adapt its behavior over extended operational periods. We validate the framework through real-world deployment on a quadruped robot in a multi-floor office building. Operating without external task assignment, the robot autonomously navigates elevators and explores its environment based solely on personality-driven motivations. Quantitative analysis across five distinct personality prototypes demonstrates stable, trait-aligned behaviors: exploratory personalities maximize spatial coverage, while conservative profiles develop efficient, repeated visitation patterns. The results confirm that personality-driven cognitive architectures enable the sustained autonomous operation characteristic of persistent embodied systems.
Fig. 1: Overview of PEPA, the three-layer cognitive architecture. Sys3 generates ultimate/daily goals and intrinsic rewards from personality traits, self-modeling, and accumulated memories. Sys2 combines intrinsic and extrinsic rewards to select optimal actions via MCTS or distilled policies. Sys1 executes actions, monitors system state, and records episodic memories that feed back to Sys3 for goal and reward refinement.
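The loop in Fig. 1 can be illustrated with a toy sketch. This is not the PEPA implementation: the single `curiosity` trait, the visit-count memory, the additive reward combination, and the greedy selection (standing in for MCTS or a distilled policy) are all assumptions made for illustration, and goal synthesis and daily reflection are omitted. All class and function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Sys3:
    """Toy goal layer: one hypothetical trait plus a visit-count memory."""
    curiosity: float  # assumed trait in [0, 1]; 1 = exploratory, 0 = conservative
    visits: Counter = field(default_factory=Counter)  # stand-in for episodic memory

    def intrinsic_reward(self, place: str) -> float:
        n = self.visits[place]
        novelty = 1.0 / (1 + n)      # high for rarely visited places
        familiarity = n / (1 + n)    # high for frequently visited places
        return self.curiosity * novelty + (1 - self.curiosity) * familiarity


def sys2_select(sys3: Sys3, places: list[str], extrinsic: dict[str, float]) -> str:
    """Greedy stand-in for planning: maximise intrinsic + extrinsic reward.
    The plain sum is an assumption; ties go to the earliest place in the list."""
    return max(places, key=lambda p: sys3.intrinsic_reward(p) + extrinsic.get(p, 0.0))


def sys1_execute(sys3: Sys3, place: str) -> None:
    """Placeholder for sensorimotor execution: only records the experience."""
    sys3.visits[place] += 1


def run(sys3: Sys3, places: list[str], extrinsic: dict[str, float], steps: int) -> list[str]:
    """Closed loop: select an action (Sys2), execute and record it (Sys1),
    so that the updated memory changes the next intrinsic rewards (Sys3)."""
    trajectory = []
    for _ in range(steps):
        place = sys2_select(sys3, places, extrinsic)
        sys1_execute(sys3, place)
        trajectory.append(place)
    return trajectory


if __name__ == "__main__":
    places = ["lobby", "lab", "kitchen"]
    print(run(Sys3(curiosity=1.0), places, {}, 3))
    print(run(Sys3(curiosity=0.0), places, {}, 3))
```

In this toy setting, a fully curious agent spreads its visits across all places, whereas a fully conservative one keeps returning to the first place it visited. The sketch only shows how a trait parameter and memory feedback can shape behavior without an external task; it makes no claim about the reward design used on the robot.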
The video below shows the staircase navigation ability of Sys1. The corresponding code is available at https://anonymous.4open.science/r/staircase_navi-B38C
This video demonstrates elevator navigation. The corresponding code is available at https://anonymous.4open.science/r/elevator_staircase_navi-1CC5