We supervise PhD students in many aspects of NLP, conversational AI, and Human-Robot Interaction. Many of our students are funded by the Centre for Doctoral Training (CDT) in AI and Robotics. Please see Who We Are for a list of potential PhD supervisors and their interests:
Conversational Dynamics - M. Aylett
The current capacity for artificial systems to manage the interactional elements of human conversation is extremely limited. They typically have a 2-3 second delay before responding. They can't be interrupted while they are speaking. They can't generate speech while they are listening. They can't tell when they should speak without leaving an awkward silence. Extending work by Skantze and Ekstedt, this project will develop voice activity prediction (VAP) across multiple human and artificial dialogue partners and the underlying architectures needed to create real-time, fluid conversational interaction. The project will explore floor-taking/ceding strategies and backchannelling for conversational agents (CAs), and evaluate when human turn-taking rules should and should not be applied.
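To make the prediction task concrete, the sketch below shows one possible shape a multi-party VAP model could take: a window of per-speaker acoustic features is encoded and the model outputs near-future voice-activity probabilities for every dialogue partner. The class names, feature dimensions, and prediction horizon are illustrative assumptions, not part of the project specification.

```python
# Illustrative sketch only: a minimal VAP-style predictor over multi-party audio.
# All names and dimensions are hypothetical, not a specification of the project.
import torch
import torch.nn as nn

class MultiPartyVAP(nn.Module):
    """Predicts near-future voice activity for each of `n_speakers`
    from a window of per-speaker acoustic features."""
    def __init__(self, n_speakers=3, feat_dim=40, hidden=128, horizon=20):
        super().__init__()
        self.rnn = nn.GRU(n_speakers * feat_dim, hidden, batch_first=True)
        # One future-activity logit per speaker per future frame.
        self.head = nn.Linear(hidden, n_speakers * horizon)
        self.n_speakers, self.horizon = n_speakers, horizon

    def forward(self, feats):
        # feats: (batch, time, n_speakers * feat_dim)
        _, h = self.rnn(feats)                      # h: (1, batch, hidden)
        logits = self.head(h.squeeze(0))            # (batch, n_speakers * horizon)
        return logits.view(-1, self.n_speakers, self.horizon)

# Toy usage: a window of 100 frames of 40-dim features for 3 speakers.
model = MultiPartyVAP()
window = torch.randn(1, 100, 3 * 40)
future_activity = torch.sigmoid(model(window))      # probabilities per speaker/frame
print(future_activity.shape)                        # torch.Size([1, 3, 20])
```

A real system would drive floor-taking and backchannelling decisions from predictions like these, updated continuously as audio streams in.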
Real-time Mediation of Multi-party Dialog - M. Aylett
In multi-party conversations where there are differences in knowledge and status between participants (e.g. tutor-student, lawyer-client, or researcher-participant), some participants may find it hard to contribute or to question an expert's advice or decisions. In these situations, we envisage that social robots designed to take on the role of a user advocate could significantly help dialogue partners who are reticent to question or engage in conversation. Achieving this requires a paradigm shift in how social robot turn-taking is designed and implemented, so as to create proactive and socially appropriate turn-taking that takes account of the multi-party nature of such dialogues. We aim to move away from the user-led, highly restricted, human-like turn-taking common in current social robot applications, and instead combine science, design, and engineering to produce proactive, user-centred turn-taking mechanisms that are sensitive to the dialogue context, processes, and capabilities of robot agents. Such an approach aims to produce robot advocates that are genuinely useful in practice.
Ad-hoc Human-robot Teamwork using Generative AI - Oliver Lemon
This project will explore how humans, robots, and AIs can collaborate in teams to coordinate on shared tasks. In particular, "ad-hoc" collaboration is the problem of coordinating with previously unseen and unmet agents, which may have different perceptions and knowledge of the world, in the way that humans are able to do via multimodal conversation. Building such capabilities will involve managing conversational interaction to understand tasks, agree plans, resolve ambiguities, correct mistakes, and so on. You will explore the use of LLMs, VLMs, and VLAs (such as LLaMA, LLaVA, GPT, Gemini, Moshi, OpenVLA, etc.) to create systems which can meet and coordinate with previously unseen agents (humans, AIs, or robots) and collaborate with them to complete shared tasks, for example tidying up a room, making breakfast, or building a Lego model. You will use real robots such as Tiago, ARI, Stretch, and Furhat, and/or simulations of them. You will evaluate the system's effectiveness and efficiency in completing different shared tasks with different people, and its abilities for ad-hoc teamwork. Building these capabilities is key to the next generation of collaborative multimodal AI systems.
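As a rough illustration of the kind of perceive-converse-act loop such an agent might run, the sketch below wires an LLM call into a simple coordination step. The `call_llm` stub and the `AdHocTeammate` class are hypothetical placeholders (any LLM, VLM, or VLA backend could sit behind them), not project code.

```python
# Illustrative sketch of an ad-hoc teamwork loop: perceive, converse, act.
# `call_llm` and the environment interface are hypothetical placeholders.
from dataclasses import dataclass, field

def call_llm(prompt: str) -> str:
    """Stub for any LLM/VLM backend (hosted or local)."""
    return "PROPOSE: I will clear the table; please load the dishwasher."

@dataclass
class AdHocTeammate:
    role: str = "robot"
    dialogue: list = field(default_factory=list)

    def step(self, observation: str, partner_utterance: str | None) -> str:
        # Fold the latest observation and partner message into the context.
        if partner_utterance:
            self.dialogue.append(f"partner: {partner_utterance}")
        prompt = (
            f"You are a {self.role} coordinating on a shared task.\n"
            f"Observation: {observation}\n"
            "Dialogue so far:\n" + "\n".join(self.dialogue) +
            "\nReply with a proposal, clarification question, or action."
        )
        reply = call_llm(prompt)
        self.dialogue.append(f"{self.role}: {reply}")
        return reply

# Toy usage with a previously unseen partner.
agent = AdHocTeammate()
print(agent.step("The kitchen table is cluttered.", "What should we each do?"))
```

The interesting research questions sit around this loop: grounding the observations in robot perception, agreeing and revising plans over many turns, and recovering from misunderstandings.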
Leveraging Language Games for Open-Ended Learning Systems - Alessandro Suglia
As humans, we use language as a means to collaborate with other agents and solve our daily tasks; this activity is what Wittgenstein calls "language games". Despite tremendous advances in language models, they are still very limited in the variety of tasks that they can solve. On the other hand, there is convincing evidence in the literature that "open-ended learning leads to generally capable agents" (Open-Ended Learning Team, DeepMind 2021). In this project, you will explore recent advances in Large Language Models and Multimodal Language Models to create systems that can continuously learn new tasks in complex and diverse embodied environments, where agents will have to learn from interaction with the world and with other agents.
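As a sketch of what "continuously learning new tasks from interaction" can look like at the level of a training loop, the toy code below draws tasks from an open-ended stream, records interaction traces, and periodically updates the agent. The task generator, environment, and update step are hypothetical stubs used only to show the loop structure.

```python
# Toy sketch of an open-ended learning loop: the agent keeps playing new
# "language games", storing interaction traces and periodically updating itself.
# The task stream, environment, and update step are hypothetical stubs.
import random

VERBS = ["fetch", "describe", "stack", "ask your partner about"]
OBJECTS = ["the red block", "the kitchen", "the blue mug", "the toolbox"]

def next_task() -> str:
    """Stand-in for an open-ended stream of tasks posed in natural language."""
    return f"{random.choice(VERBS)} {random.choice(OBJECTS)}"

def interact(task: str) -> dict:
    """Stub: one episode of embodied, multimodal interaction on the task."""
    return {"task": task, "success": random.random() > 0.5, "trace": "..."}

def update_agent(experience: list) -> None:
    """Stub: adapt the agent (e.g. fine-tune) on accumulated interaction data."""
    pass

replay = []
for episode in range(20):
    replay.append(interact(next_task()))
    if (episode + 1) % 5 == 0:    # periodically consolidate what was learned
        update_agent(replay)
```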
Developing Truly Multimodal Language Models - Alessandro Suglia
Creating Artificial Intelligence (AI) algorithms that can ingest and understand multimodal content is an essential capability for the next generation of AI technologies. The vision for this project is to design an innovative approach for rapidly building robust AI models for multimodal applications, such as intelligent assistants embedded in assisted-living solutions for visually impaired users. This research will create a novel type of interactive Multimodal Embodied Model (MEM) that learns from interaction to act and reason by leveraging multimodal perceptual inputs (i.e., vision and sound). The proposed Generative AI solution will overcome major bottlenecks of current language models, creating a more robust and realistic way of acquiring grounded language representations by encoding perceptual inputs and interactively learning their meaning without the intermediate representation of words, i.e. "text-less NLP". This is a radical departure from current AI models, which learn language only from symbolic representations derived from a tokenizer. These representations have several shortcomings: 1) they are specific to each language; 2) they are sensitive to noise (e.g., spelling mistakes); and 3) they are hand-crafted and do not represent language input in the multimodal way that humans experience it. Additionally, current language models are static learners: they do not learn by interacting with the world around them. In contrast, in this project you will develop AI that can learn interactively from multiple sources of data and modalities when embodied in an environment.
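A minimal sketch of the "text-less" idea, assuming a PyTorch-style setup: raw audio and images are mapped to embedding sequences by learned encoders and fused by a transformer, with no text tokenizer anywhere in the pipeline. The architecture, layer sizes, and input shapes below are illustrative assumptions, not the proposed MEM.

```python
# Illustrative sketch of a "text-less" multimodal encoder: raw audio and vision
# are mapped to embedding sequences and fused by a transformer, with no tokenizer.
# Architecture and dimensions are hypothetical.
import torch
import torch.nn as nn

class TextlessMultimodalEncoder(nn.Module):
    def __init__(self, d_model=256):
        super().__init__()
        # Strided 1D convs turn a raw waveform into a sequence of audio embeddings.
        self.audio_enc = nn.Sequential(
            nn.Conv1d(1, d_model, kernel_size=10, stride=5), nn.GELU(),
            nn.Conv1d(d_model, d_model, kernel_size=8, stride=4), nn.GELU(),
        )
        # A conv stem turns an image into a sequence of patch-like embeddings.
        self.vision_enc = nn.Conv2d(3, d_model, kernel_size=16, stride=16)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=4)

    def forward(self, waveform, image):
        # waveform: (batch, 1, samples); image: (batch, 3, H, W)
        a = self.audio_enc(waveform).transpose(1, 2)           # (batch, Ta, d)
        v = self.vision_enc(image).flatten(2).transpose(1, 2)  # (batch, Tv, d)
        return self.fusion(torch.cat([a, v], dim=1))           # fused multimodal sequence

model = TextlessMultimodalEncoder()
out = model(torch.randn(2, 1, 16000), torch.randn(2, 3, 224, 224))
print(out.shape)  # (2, Ta + Tv, 256)
```

The research challenge is then to learn grounded meaning over such fused perceptual sequences through interaction, rather than from tokenized text alone.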