CSCI-644: Natural Language Dialogue Systems - Spring 2026

Schedule

The schedule below is tentative and subject to change.

Week 1: January 16

Georgila - Overview, different types of dialogue, example dialogue systems, topics to be covered

Week 2: January 23

Georgila - Continuation of overview, basic principles of dialogue processing (initiative, grounding, dialogue acts, turn-taking), knowledge-based dialogue management (information states, logic-based approaches)
Assignment 1 handed out

Week 3: January 30

Georgila - Reinforcement learning and simulated users for dialogue management (Part 1)

Week 4: February 6

Georgila - Reinforcement learning and simulated users for dialogue management (Part 2)
Assignment 1 due (Thursday February 5, 11:59 pm)

Week 5: February 13

Georgila - Data collection, dialogue corpora and annotation, dialogue evaluation (manual and automatic)
Assignment 2 handed out

Week 6: February 20

Georgila - Speech recognition and speech synthesis for dialogue

Week 7: February 27

Georgila - Deep learning approaches to dialogue (including end-to-end architectures and chatbots), dialogue state tracking
Assignment 2 due (Monday March 2, 11:59 pm)
Selected special topic due (Monday March 2, 11:59 pm)

Week 8: March 6

Georgila - Reinforcement learning from human and AI feedback, natural language understanding, natural language generation
Project white paper due (Thursday March 5, 11:59 pm)

Week 9: March 13

Georgila - Multi-party dialogue, turn-taking, team dialogue, healthcare applications

March 20 - Spring Break

Week 10: March 27

Guest lecture: Prof. David Traum - Grounding

Week 11: April 3

Project proposal due (Thursday April 2, 11:59 pm)
Student topic presentations

Visual dialogue - Kaushal, Grace, Arushi - 30 min

Long, Yuxing, Xiaoqi Li, Wenzhe Cai, and Hao Dong. Discuss before moving: Visual language navigation via multi-expert discussions. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pp. 17380-17387. IEEE, 2024.
Qiao, Yanyuan, Qianyi Liu, Jiajun Liu, Jing Liu, and Qi Wu. LLM as copilot for coarse-grained vision-and-language navigation. In European Conference on Computer Vision, pp. 459-476. Cham: Springer Nature Switzerland, 2024.
Han, Leekyeung, Hyunji Min, Gyeom Hwangbo, Jonghyun Choi, and Paul Hongsuck Seo. DialNav: Multi-turn Dialog Navigation with a Remote Guide. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8514-8523. 2025.

Mixed-initiative dialogue - Runhui, Xixiao - 20 min

Yuxiang Nie, Heyan Huang, Xian-Ling Mao, and Lizi Liao. 2024. Mix-Initiative Response Generation with Dynamic Prefix Tuning. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 8748–8761, Mexico City, Mexico. Association for Computational Linguistics.
Maximillian Chen, Ruoxi Sun, Tomas Pfister, and Sercan O Arik. Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training. ICLR 2025.

Manual and automatic evaluation metrics for task-oriented dialogue - Tiannuo, Linxin - 20 min

Arihant Jain, Purav Aggarwal, Rishav Sahay, Chaosheng Dong, and Anoop Saladi. 2025. AutoEval-ToD: Automated Evaluation of Task-oriented Dialog Systems. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 10133–10148, Albuquerque, New Mexico. Association for Computational Linguistics.
Emre Can Acikgoz, Carl Guo, Suvodip Dey, Akul Datta, Takyoung Kim, Gokhan Tur, and Dilek Hakkani-Tur. 2025. TD-EVAL: Revisiting Task-Oriented Dialogue Evaluation by Combining Turn-Level Precision with Dialogue-Level Comparisons. In Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 113–132, Avignon, France. Association for Computational Linguistics.

Non-cooperative dialogue systems - Athirai - 15 min

Mike Lewis, Denis Yarats, Yann Dauphin, Devi Parikh, and Dhruv Batra. 2017. Deal or No Deal? End-to-End Learning of Negotiation Dialogues. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2443–2453, Copenhagen, Denmark. Association for Computational Linguistics.

Companion dialogue systems - Chi - 15 min

Zheyong Xie, Shaosheng Cao, Zuozhu Liu, Zheyu Ye, Zihan Niu, Chonggang Lu, Tong Xu, Enhong Chen, Zhe Xu, Yao Hu, and Wei Lu. 2025. iPET: An Interactive Emotional Companion Dialogue System with LLM-Powered Virtual Pet World Simulation. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), pages 416–425, Vienna, Austria. Association for Computational Linguistics.

Turn-taking - Nithi - 15 min

Choi, Min Gyeong, and Sun-Young Oh. "Developing L2 turn-taking with ChatGPT: A longitudinal conversation analytic study." System 138 (2026): 103959.

Embodied conversational agents - Fatemeh, Phillip - 20 min

Angus Addlesee, Neeraj Cherakara, Nivan Nelson, Daniel Hernandez Garcia, Nancie Gunson, Weronika Sieińska, Christian Dondrup, and Oliver Lemon. 2024. Multi-party Multimodal Conversations Between Patients, Their Companions, and a Social Robot in a Hospital Memory Clinic. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, pages 62–70, St. Julians, Malta. Association for Computational Linguistics.
Neeraj Cherakara, Finny Varghese, Sheena Shabana, Nivan Nelson, Abhiram Karukayil, Rohith Kulothungan, Mohammed Afil Farhan, Birthe Nesset, Meriam Moujahid, Tanvi Dinkar, Verena Rieser, and Oliver Lemon. 2023. FurChat: An Embodied Conversational Agent using LLMs, Combining Open and Closed-Domain Dialogue with Facial Expressions. In Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 588–592, Prague, Czechia. Association for Computational Linguistics.
Agnes Axelsson and Gabriel Skantze. Do you follow? A fully automated system for adaptive robot presenters. International Conference on Human Robot Interaction, 2023.

NLU for dialogue - Valliammai, Aarushi - 20 min

Kalpa Gunaratna, Vijay Srinivasan, Akhila Yerukola, and Hongxia Jin. Explainable Slot Type Attentions to Improve Joint Intent Detection and Slot Filling. EMNLP Findings, 2022.
Omar Shaikh, Kristina Gligoric, Ashna Khetan, Matthias Gerstgrasser, Diyi Yang, and Dan Jurafsky. 2024. Grounding Gaps in Language Model Generations. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 6279–6296, Mexico City, Mexico. Association for Computational Linguistics.

Week 12: April 10

Assignment 3 handed out
Student topic presentations

Modeling/recognizing affect in dialogue systems - Emily, Bowen - 20 min

Pierre Colombo, Wojciech Witon, Ashutosh Modi, James Kennedy, and Mubbasir Kapadia. 2019. Affect-Driven Dialog Generation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3734–3743, Minneapolis, Minnesota. Association for Computational Linguistics.
Nicholas Sofroniew, Isaac Kauvar, William Saunders, Runjin Chen, Tom Henighan, Sasha Hydrie, Craig Citro, Adam Pearce, Julius Tarng, Wes Gurnee, Joshua Batson, Sam Zimmerman, Kelley Rivoire, Kyle Fish, Chris Olah, and Jack Lindsey. Emotion concepts and their function in a large language model. 2026.

Multi-party conversations - Sheryl, Lydia - 20 min

Hiroki Ouchi and Yuta Tsuboi. 2016. Addressee and Response Selection for Multi-Party Conversation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2133–2143, Austin, Texas. Association for Computational Linguistics.
Maira Gatti de Bayser, Melina Alberio Guerra, Paulo Cavalin, and Claudio Pinhanez. A Hybrid Solution to Learn Turn-Taking in Multi-Party Service-based Chat Groups. 2020.
Nicolò Penzo, Maryam Sajedinia, Bruno Lepri, Sara Tonelli, and Marco Guerini. 2024. Do LLMs suffer from Multi-Party Hangover? A Diagnostic Approach to Addressee Recognition and Response Selection in Conversations. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 11210–11233, Miami, Florida, USA. Association for Computational Linguistics.
Ronald Petrick and Mary Ellen Foster. Planning for social interaction in a robot bartender domain. International Conference on Automated Planning and Scheduling. 2013.

Chelsea, Yihe - 20 min

Lakshya A Agrawal, Shangyin Tan, Dilara Soylu, Noah Ziems, Rishi Khare, Krista Opsahl-Ong, Arnav Singhvi, Herumb Shandilya, Michael J Ryan, Meng Jiang, Christopher Potts, Koushik Sen, Alexandros G. Dimakis, Ion Stoica, Dan Klein, Matei Zaharia, and Omar Khattab. GEPA: Reflective prompt evolution can outperform reinforcement learning. ICLR 2026.
Qizheng Zhang, Changran Hu, Shubhangi Upasani, Boyuan Ma, Fenglu Hong, Vamsidhar Kamanuru, Jay Rainton, Chen Wu, Mengmeng Ji, Hanchen Li, Urmish Thakker, James Zou, and Kunle Olukotun. Agentic context engineering: Evolving contexts for self-improving language models. ICLR 2026.

Manual and automatic evaluation metrics for task-oriented dialogue - Autumn, Vincent-Daniel - 20 min

Abishek Komma, Nagesh Panyam Chandrasekarasastry, Timothy Leffel, Anuj Goyal, Angeliki Metallinou, Spyros Matsoukas, and Aram Galstyan. 2023. Toward More Accurate and Generalizable Evaluation Metrics for Task-Oriented Dialogs. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track), pages 186–195, Toronto, Canada. Association for Computational Linguistics.
Jiseung Hong, Grace Byun, Seungone Kim, and Kai Shu. 2025. Measuring Sycophancy of Language Models in Multi-turn Dialogues. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 2239–2259, Suzhou, China. Association for Computational Linguistics.

Text2LoRA and Doc2LoRA - Daniel, Deyang - 20 min

Rujikorn Charakorn, Edoardo Cetin, Yujin Tang, and Robert Tjarko Lange. Text-to-LoRA: Instant transformer adaption. ICML 2025.
Rujikorn Charakorn, Edoardo Cetin, Shinnosuke Uesaka, and Robert Tjarko Lange. Doc-to-LoRA: Learning to instantly internalize contexts. 2026.

Indigenous ASR - Faith - 15 min

Robbie Jimerson and Emily Prud’hommeaux. 2018. ASR for Documenting Acutely Under-Resourced Indigenous Languages. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association (ELRA).

Juann - 15 min

Runjin Chen, Andy Arditi, Henry Sleight, Owain Evans, and Jack Lindsey. Persona Vectors: Monitoring and Controlling Character Traits in Language Models. 2025.

Leonardo - 15 min

Asma Ghandeharioun, Ann Yuan, Marius Guerard, Emily Reif, Michael A. Lepori, and Lucas Dixon. Who's asking? User personas and the mechanics of latent misalignment. NeurIPS 2024.

Chris - 15 min

Federico Bianchi, Patrick John Chia, Mert Yuksekgonul, Jacopo Tagliabue, Dan Jurafsky, and James Zou. How well can LLMs negotiate? NEGOTIATIONARENA platform and analysis. Proceedings of the 41st International Conference on Machine Learning 2024.

From LLM to chatbot, post-training techniques - Leland, Changhui - 20 min

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS 2020.
Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, and Joseph E. Gonzalez. MemGPT: Towards LLMs as operating systems. 2024.
Percy Liang et al. Holistic evaluation of language models. Transactions on Machine Learning Research 2023.
Stephanie Lin, Jacob Hilton, and Owain Evans. 2022. TruthfulQA: Measuring How Models Mimic Human Falsehoods. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3214–3252, Dublin, Ireland. Association for Computational Linguistics.

Week 13: April 17

Student topic presentations

Dialogue state tracking - Abhijith - 15 min

Zhaojiang Lin, Bing Liu, Andrea Madotto, Seungwhan Moon, Zhenpeng Zhou, Paul Crook, Zhiguang Wang, Zhou Yu, Eunjoon Cho, Rajen Subba, and Pascale Fung. 2021. Zero-Shot Dialogue State Tracking via Cross-Task Transfer. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 7890–7900, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.

Student project presentations

Internalization of biases in large language models - Faith, Leonardo - 30 min

PolyPersona: Cross-lingual structure of personality directions in large language models - Juann, Nithi - 30 min

Training a high-efficiency multi-task transformer - Changhui, Leland - 30 min

Pitch: Cost-aware automatic evaluation for task-oriented dialogue agents - Tiannuo, Linxin - 30 min

LLM sycophancy in multi-turn reasoning dialogues - Autumn, Vincent-Daniel - 30 min

Week 14: April 24

Assignment 3 due (Thursday April 23, 11:59 pm)
Student project presentations

Logic script writer: Multi-agent collaboration for logical consistency in script writing - Chi - 25 min

Socially intelligent LLM tutor - Sheryl - 25 min

End-to-end robotic gesture synthesis from dialogue for Blossom Squish and Stretch Robot - Lydia - 25 min

Vision-language reinforcement learning for multi-turn dialogue agents in maze navigation - Grace, Athirai - 30 min

Grounded dialogue: Ontology-driven scaffolding in conversational AI tutoring system - Kaushal, Arushi - 30 min

Temporal dialogue memory graphs for long-term conversational reasoning - Runhui, Xixiao - 30 min

Dialogue state tracking - Abhijith - 25 min

Week 15: May 1

Student project presentations

Code-switching robustness in voice agents - Chelsea, Yihe - 30 min

Autobiographical interviews - Emily - 25 min

LLM-based reward decomposition for task-oriented dialogue policy learning - Fatemeh, Phillip - 30 min

Evaluating LLM robustness via heterogeneous multi-agent debate (H-MAD) - Deyang, Aarushi, Valliammai - 35 min

A data generation pipeline for empathetic sycophancy in mental health dialogue - Bowen - 25 min

Domain-adaptation - Daniel C. R. - 25 min

TBD - Chris - 25 min

Examination period: May 6

Student project reports due (May 6, 4 pm)
