SIL (Symbiotic Interactive Learning) is a framework for co-adaptive human–robot interaction (HRI). State-of-the-art language-conditioned HRI frameworks perpetuate a master–apprentice model in which the apprentice (the embodied agent) passively receives and executes the master's (the human's) commands without reciprocal learning. This one-way, reactive interaction does not capture the co-adaptive dynamics inherent in everyday multi-turn human–human interaction. SIL reimagines human–agent interaction as a dynamic, co-adaptive process: rather than treating the human as a fixed command source and the agent as a passive executor, SIL models both as adaptive systems that jointly maintain and align their belief states within a shared latent task space, enabling bidirectional co-adaptation akin to natural human–human interaction.
Formalisation: Co-adaptation modelled as belief-state evolution over interaction history.
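A minimal sketch of this formalisation, assuming belief states and utterances are embeddings in the same shared latent space; the update rates `alpha` and `beta` are illustrative assumptions, not the paper's exact update rule:

```python
import numpy as np

def co_adaptive_update(agent_belief, human_belief, utterance_embedding,
                       alpha=0.3, beta=0.1):
    """One co-adaptive step over the interaction history.

    The agent pulls its belief toward the evidence carried by the latest
    utterance, and the (estimated) human belief drifts toward the agent's,
    e.g. after a clarification or proactive suggestion, so the update is
    bidirectional rather than one-way.
    """
    agent_belief = (1 - alpha) * agent_belief + alpha * utterance_embedding
    human_belief = (1 - beta) * human_belief + beta * agent_belief
    return agent_belief, human_belief

# Toy usage: beliefs evolve turn by turn over a short interaction history.
agent_b, human_b = np.zeros(128), np.random.randn(128)
for utterance in (np.random.randn(128) for _ in range(5)):
    agent_b, human_b = co_adaptive_update(agent_b, human_b, utterance)
```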
Capabilities and Features: Anti-forgetting, adaptive learning, proactive suggestions, shared plan refinement, bidirectional interaction, voice recognition, etc.
Architecture (a minimal sketch follows the list below):
Foundation models (FMs) for spatial perception & reasoning.
Lightweight latent encoder for task-specific grounding.
Memory and continual learning safeguards protect against catastrophic forgetting of learned task representations.
Uncertainty-aware language estimation to ensure robust intent recognition and prevent unsafe execution of ambiguous instructions.
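As an illustration of how these components might fit together, here is a minimal sketch (not the released SIL code); the encoder dimensions, the EWC weight `lam`, and the entropy threshold are illustrative assumptions:

```python
import torch
import torch.nn as nn

class LatentTaskEncoder(nn.Module):
    """Lightweight encoder that grounds frozen FM features in a task-specific latent space."""
    def __init__(self, fm_dim=768, latent_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(fm_dim, 256), nn.ReLU(),
                                 nn.Linear(256, latent_dim))

    def forward(self, fm_features):
        return self.net(fm_features)

def ewc_penalty(model, fisher, old_params, lam=100.0):
    """Elastic Weight Consolidation: penalise drift on weights important to earlier tasks."""
    loss = torch.zeros(())
    for name, p in model.named_parameters():
        loss = loss + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return lam * loss

def should_clarify(intent_probs, entropy_threshold=1.0):
    """Uncertainty-aware gate: request clarification instead of executing when intent entropy is high."""
    entropy = -(intent_probs * intent_probs.clamp_min(1e-9).log()).sum()
    return entropy.item() > entropy_threshold
```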
We conducted experiments in both simulated and real-world environments to validate SIL, focusing on its co-adaptive mechanisms for belief alignment, memory retention, and preference learning. We evaluated SIL across five key dimensions: (i) instruction execution under ambiguity and temporal complexity, (ii) long-term memory and retention, (iii) contextual reasoning, (iv) clarification and proactive dialogue, and (v) preference-based personalisation.
Quantitatively, we evaluated SIL with the following metrics: (i) Task Completion Rate (TCR): the percentage of correctly executed tasks. (ii) Belief Alignment (BA): the weighted cosine similarity between human and agent belief embeddings. (iii) Clarification Efficiency (CE): the average number of clarification requests per successful task.
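A minimal sketch of these metrics, assuming belief embeddings are NumPy vectors; the helper names and the optional per-turn weights are illustrative, not the paper's exact evaluation code:

```python
import numpy as np

def task_completion_rate(successes, attempts):
    """TCR: percentage of correctly executed tasks."""
    return 100.0 * successes / attempts

def belief_alignment(human_beliefs, agent_beliefs, weights=None):
    """BA: weighted cosine similarity between human and agent belief embeddings."""
    sims = [np.dot(h, a) / (np.linalg.norm(h) * np.linalg.norm(a))
            for h, a in zip(human_beliefs, agent_beliefs)]
    weights = np.ones(len(sims)) if weights is None else np.asarray(weights)
    return float(np.average(sims, weights=weights))

def clarification_efficiency(n_clarifications, n_successful_tasks):
    """CE: average number of clarification requests per successful task."""
    return n_clarifications / n_successful_tasks
```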
Performance evaluation of SIL across different task domains. Tasks include: Embodied Instruction Following (EIF), Memory-Based Interactive Information Retrieval (MIIR), Query-Oriented Reasoning (QOR), Proactive Dialogue and Suggestion (PDS), and Long-Term Preference Learning (LPL).
Task success rate across domains and ablated variants. Full SIL: All components included. w/o Co-Adaptation: SIL without bidirectional belief updating. w/o EWC: SIL without continual learning. w/o Human Pref.: SIL without preference mechanism. w/o Memory: SIL without memory. w/o Uncertainty: SIL without uncertainty quantification.
Belief alignment (BA) across multi-turn interactions. Full SIL (blue) exhibits rapid convergence toward a stable equilibrium (BA ≈ 0.83), maintaining high alignment throughout. In contrast, ablations without co-adaptation, EWC, human preference modelling, memory, or uncertainty handling exhibit unstable trajectories (BA ≈ 0.52–0.65) and fail to achieve strong alignment.
Ablation study on SIL's core architecture. Metrics are averaged across all task categories. Ablating the co-adaptation mechanism causes the largest performance drop, reducing performance to the level of the static LLM baseline.
Here, we show qualitative examples of SIL in multi-turn interaction tasks. Yellow paths indicate the agent’s navigation trajectories. The user issues a series of commands (Int 1–Int 14) that require logical reasoning over spatial constraints, conditional navigation, anti-forgetting, preference retention, and continual learning.
Int 1: The user issues a multi-step navigation command (Navigate between the passageway and the location (2, 1, 0), and return to the current location).
Int 2: The user probes episodic recall (What task did you just perform?).
Int 3: The user requests a task replay (Repeat Int 1).
Int 4: The user issues a constraint-based navigation command (If the round trip between the coordinates (2, 2) and (-3, -1) would take more than 20s, navigate between them twice; otherwise, make a circle of radius 0.75m at the current location).
Int 5: The user requests a replay of the unmet condition from Int 4 (Move between the coordinates twice).
Int 6: The user directs the SIL agent back to the starting point and requests a scene report (Return to the origin and describe what you can see).
Int 7: The user queries for an object and issues a navigation command to the possible object location (Where can I find a spoon? Take me to the location).
Int 8: The user shifts intent and asks for a location for academic activities (I want to make professional academic inquiries. Take me to the possible location).
Int 9: The user issues a command with an ambiguous reference (Head there and return here quickly).
Int 10: The user rejects the SIL agent’s proactive suggestions from Int 9 and reframes the intent (No, I mean to the location where I can relax and enjoy nature).
Int 11: The user introduces a new preference as a persistent alias (Patrol mode means head to the Prof. and Sec. office and send photos of what you can see).
Int 12: The user issues a distractor task (Go to the hallway and return here).
Int 13: The user invokes the alias (Int 11) after the distractor task in Int 12 (Now patrol mode).
Int 14: The user commands the SIL agent to return to the initial point and make a geometric move (Return to the origin and make a circle of 0.5m radius).
If you use this work in your research, please cite it using the following BibTeX entry:
@inproceedings{xxxxxxxxx,
  author={author1 and author2 and author3 and author4 and author5},
  booktitle={International Conference on xxxxxxxx},
  title={Beyond Master and Apprentice: Grounding Foundation Models for Symbiotic Interactive Learning in a Shared Latent Space},
  year={2026},
  volume={},
  number={},
  pages={xxx-xxx},
  doi={xxxxxx}
}
This work received funding from the xxxxxx (dddd, rrrrrrr) under grant No. #dddddddd (rrrrrrr).