Modern intelligent systems operate in increasingly dynamic and complex environments, where they must process heterogeneous data streams and adapt to rapidly changing requirements. Conventional AI agents, often restricted to a single modality or a static data distribution, struggle to maintain performance when faced with the variability of real-world contexts or the need to integrate diverse sensory inputs without catastrophic forgetting.
The purpose of the Special Session on Multimodal Foundation Models for Evolutionary Agent-Based Systems (MEAS) is to investigate how the convergence of multimodal foundation models and agentic architectures can enable a new generation of intelligent systems that perceive, reason, and act across multiple data streams while continuously evolving. The session focuses on methods, models, and frameworks that use multimodal AI not only to integrate vision, language, audio, and sensor data but also to achieve self-improvement, lifelong learning, and adaptive behavior over time.
The scope includes research on unified representational spaces and cross-modal grounding, long-context reasoning for complex task sequences, and evolutionary learning strategies such as neuro-evolution, self-supervised adaptation, and online optimization. We especially welcome contributions on agentic architectures capable of hierarchical planning, robust tool orchestration, memory management (episodic and semantic), and multi-agent coordination in non-stationary environments. The session also invites work on ensuring the safety, alignment, and interpretability of agents as they evolve, alongside techniques for efficient deployment on edge devices and in resource-constrained scenarios. We also strongly encourage empirical studies, industrial applications in domains such as Healthcare, Industry 5.0, and Smart Environments, and demonstrations of tools that showcase the potential of multimodal evolutionary agents.
By bringing together researchers and practitioners from machine learning, autonomous agents, computer vision, and distributed systems, the MEAS session aims to promote the exchange of ideas, highlight emerging challenges, and define future research directions for intelligent and continuously evolving multimodal agent systems.