Readings
Based on the readings, each student should prepare at least one question, and optionally additional comments, about each of the required readings. For example, if there are three required papers, there should be at least three questions overall (one per paper), optionally with additional comments on each paper. These could be questions about aspects of the research that you found interesting or unclear, or comments on the methodology or results, the implications of the work, or how it might be applied to other work. Students should post their questions on the course Piazza site, in the topic for that week's readings, by 11:59 pm on the day before that week's lecture, and come prepared to discuss them in class. When posting questions or comments on Piazza, use the "Readings" tag, include the week the question or comment refers to in the message title, and name the paper it refers to in the message body.
Week 1: Georgila - Overview, different types of dialogue, example dialogue systems, topics to be covered (questions can be posted on Piazza for extra credit before the last class)
Optional
David Traum. Computational Approaches to Dialogue. In The Routledge Handbook of Language and Dialogue, edited by Edda Weigand, Routledge, 2017, pp. 143-161. Pre-release version.
David Traum. Socially Interactive Agent Dialogue. Chapter 15 of The Handbook on Socially Interactive Agents (Volume 2), 2022. Preprint.
Week 2: Georgila - Continuation of overview, basic principles of dialogue processing (initiative, grounding, dialogue acts, turn-taking), knowledge-based dialogue management (information states, logic-based approaches) (questions can be posted on Piazza for extra credit before the last class)
Optional
Daniel Jurafsky and James H. Martin. Conversation and its Structure. Chapter 25 of Speech and Language Processing, draft of January 2026.
David Traum and Staffan Larsson. The Information State Approach to Dialogue Management. In Current and New Directions in Discourse and Dialogue, edited by Jan van Kuppevelt and Ronnie Smith, Kluwer, 2003, pp. 325-354.
Kallirroi Georgila and Oliver Lemon. Adaptive Multimodal Dialogue Management Based on the Information State Update approach. In Online Proceedings of W3C Workshop on Multimodal Interaction, Sophia-Antipolis, France, 2004.
Week 3: Georgila - Reinforcement learning and simulated users for dialogue management (Part 1) (questions should be posted on Piazza by January 29, 11:59 pm)
Required
Jason D. Williams and Steve Young. Scaling POMDPs for spoken dialog management. IEEE Transactions on Audio, Speech, and Language Processing, 15(7):2116-2129, 2007.
Kallirroi Georgila, James Henderson, and Oliver Lemon. User Simulation for Spoken Dialogue Systems: Learning and Evaluation. Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 1065-1068, Pittsburgh, USA, 2006.
Oliver Lemon, Kallirroi Georgila, and James Henderson. Evaluating Effectiveness and Portability of Reinforcement Learned Dialogue Strategies with Real Users: The TALK TownInfo Evaluation. Proceedings of the IEEE/ACL Workshop on Spoken Language Technology (SLT), pp. 178-181, Aruba, 2006.
Jost Schatzmann and Steve Young. The Hidden Agenda User Simulation Model. IEEE Transactions on Audio, Speech, and Language Processing, 17(4):733-747, 2009.
Optional
James Henderson, Oliver Lemon, and Kallirroi Georgila. Hybrid Reinforcement/Supervised Learning of Dialogue Policies from Fixed Datasets. Computational Linguistics, 34(4):487-511, MIT Press, 2008.
Jason D. Williams and Steve Young. Partially observable Markov decision processes for spoken dialog systems. Computer Speech and Language, 21:393-422, 2007.
Milica Gasic and Steve Young. Gaussian processes for POMDP-based dialogue manager optimization. IEEE Transactions on Audio, Speech, and Language Processing, 22(1):28-40, 2014.
Jost Schatzmann, Kallirroi Georgila, and Steve Young. Quantitative Evaluation of User Simulation Techniques for Spoken Dialogue Systems. Proceedings of the 6th Annual SIGdial Meeting on Discourse and Dialogue (SIGDIAL), pp. 45-54, Lisbon, Portugal, 2005.
Kallirroi Georgila, James Henderson, and Oliver Lemon. Learning User Simulations for Information State Update Dialogue Systems. Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 893-896, Lisbon, Portugal, 2005.
Ramesh Manuvinakurike, David DeVault, and Kallirroi Georgila. Using Reinforcement Learning to Model Incrementality in a Fast-Paced Dialogue Game. Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue (SIGDIAL), pp. 331-341, Saarbruecken, Germany, 2017.
Alexandros Papangelis and Kallirroi Georgila. Reinforcement Learning of Multi-Issue Negotiation Dialogue Policies. Proceedings of the 16th Annual SIGdial Meeting on Discourse and Dialogue (SIGDIAL), pp. 154-158, Prague, Czech Republic, 2015.
Kallirroi Georgila, Maria Wolters, and Johanna D. Moore. Simulating the Behaviour of Older versus Younger Users when Interacting with Spoken Dialogue Systems. Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics - Human Language Technologies (ACL-HLT), Short Papers, pp. 49-52, Columbus, Ohio, USA, 2008.
Kallirroi Georgila and David Traum. Reinforcement Learning of Argumentation Dialogue Policies in Negotiation. Proceedings of the 12th Annual Conference of the International Speech Communication Association (INTERSPEECH), Florence, Italy, 2011.
Kallirroi Georgila, Mark G. Core, Benjamin D. Nye, Shamya Karumbaiah, Daniel Auerbach, and Maya Ram. Using Reinforcement Learning to Optimize the Policies of an Intelligent Tutoring System for Interpersonal Skills Training. Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2019.
Week 4: Georgila - Reinforcement learning and simulated users for dialogue management (Part 2) (questions should be posted on Piazza by February 5, 11:59 pm)
Required
Baolin Peng, Xiujun Li, Jianfeng Gao, Jingjing Liu, and Kam-Fai Wong. Deep Dyna-Q: Integrating Planning for Task-Completion Dialogue Policy Learning. ACL 2018.
Lu Chen, Bowen Tan, Sishan Long, and Kai Yu. Structured Dialogue Policy with Graph Neural Networks. Proceedings of COLING, pp. 1257-1268, 2018.
Deborah Cohen, Moonkyung Ryu, Yinlam Chow, Orgad Keller, Ido Greenberg, Avinatan Hassidim, Michael Fink, Yossi Matias, Idan Szpektor, Craig Boutilier, et al. 2022. Dynamic planning in open-ended dialogue using reinforcement learning. arXiv preprint arXiv:2208.02294.
Optional
Florian L. Kreyssig, Inigo Casanueva, Pawel Budzianowski, and Milica Gasic. Neural User Simulator for Corpus-based Policy Optimisation for Spoken Dialogue Systems. SIGDIAL 2018.
Youngsoo Jang, Jongmin Lee, and Kee-Eung Kim. 2022. GPT-Critic: Offline Reinforcement Learning for End-to-End Task-Oriented Dialogue Systems. In International Conference on Learning Representations.
Pei-Hao Su, Paweł Budzianowski, Stefan Ultes, Milica Gasic, and Steve Young. Sample-efficient Actor-Critic Reinforcement Learning with Supervised Data for Dialogue Management. SIGDIAL 2017.
Layla El Asri, Jing He, and Kaheer Suleman. A Sequence-to-Sequence Model for User Simulation in Spoken Dialogue Systems. Interspeech 2016.
Lu Chen, Zhi Chen, Bowen Tan, Sishan Long, Milica Gasic, and Kai Yu. AgentGraph: Towards Universal Dialogue Management with Structured Deep Reinforcement Learning. IEEE/ACM Transactions on Audio, Speech, and Language Processing 2019.
Heriberto Cuayahuitl, Simon Keizer, and Oliver Lemon. Strategic Dialogue Management via Deep Reinforcement Learning. NIPS Workshop on Deep Reinforcement Learning 2015.
Pararth Shah, Dilek Hakkani-Tur, and Larry Heck. Interactive reinforcement learning for task-oriented dialogue management.
Baolin Peng, Xiujun Li, Lihong Li, Jianfeng Gao, Asli Celikyilmaz, Sungjin Lee, and Kam-Fai Wong. Composite Task-Completion Dialogue Policy Learning via Hierarchical Deep Reinforcement Learning. EMNLP 2017.
Tiancheng Zhao and Maxine Eskenazi. Towards end-to-end learning for dialogue state tracking and management using deep reinforcement learning. SIGDIAL 2016.
Inigo Casanueva, Pawel Budzianowski, Pei-Hao Su, Stefan Ultes, Lina Rojas-Barahona, Bo-Hsiang Tseng, and Milica Gasic. Feudal reinforcement learning for dialogue management in large domains. NAACL 2018.
Week 5: Georgila - Data collection, dialogue corpora and annotation, dialogue evaluation (manual and automatic) (questions should be posted on Piazza by February 12, 11:59 pm)
Required
Chia-Wei Liu, Ryan Lowe, Iulian V. Serban, Michael Noseworthy, Laurent Charlin, and Joelle Pineau. How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2122–2132, Austin, Texas, November 1-5, 2016.
Shikib Mehri, Jinho Choi, Luis Fernando D'Haro, Jan Deriu, Maxine Eskenazi, Milica Gasic, Kallirroi Georgila, Dilek Hakkani-Tur, Zekang Li, Verena Rieser, Samira Shaikh, David Traum, Yi-Ting Yeh, Zhou Yu, Yizhe Zhang, Chen Zhang. Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges. In arXiv:2203.10012, 2022.
Clemencia Siro, Mohammad Aliannejadi, and Maarten de Rijke. 2022. Understanding user satisfaction with task-oriented dialogue systems. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’22, pages 2018–2023, New York, NY, USA.
Optional
Ai, H., Raux, A., Bohus, D., Eskenazi, M., and Litman, D. (2007). Comparing spoken dialog corpora collected with recruited subjects versus real users. Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue (SIGdial 2007).
Mehri, Shikib, and Maxine Eskenazi. "Unsupervised Evaluation of Interactive Dialog with DialoGPT." Proceedings of the 21st Annual Meeting of the Special Interest Group on Discourse and Dialogue. 2020.
Ron Artstein. Inter-annotator agreement. In Handbook of Linguistic Annotation, edited by Nancy Ide and James Pustejovsky, pages 297–313. Springer, Dordrecht, 2017.
Marilyn A. Walker, Candace Kamm and Diane J. Litman. Towards Developing General Models of Usability with PARADISE. Natural Language Engineering 2000.
Hone, K. S., and Graham, R. (2000). Towards a tool for the subjective assessment of speech system interfaces (SASSI). Nat. Lang. Eng. 6(3/4), pp. 287–303.
Sudeep Gandhe and David Traum. Evaluation Understudy for Dialogue Coherence Models. In Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue (SIGdial 2008), June 2008.
Sebastian Möller. Assessment and Evaluation of Speech-Based Interactive Systems: From Manual Annotation to Automatic Usability Evaluation. Chapter 15 of Speech Technology, Fang Chen, ed., Springer, 2010.
Emre Can Acikgoz, Carl Guo, Suvodip Dey, Akul Datta, Takyoung Kim, Gokhan Tur, and Dilek Hakkani-Tur. 2025. TD-EVAL: Revisiting Task-Oriented Dialogue Evaluation by Combining Turn-Level Precision with Dialogue-Level Comparisons. In Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 113–132, Avignon, France. Association for Computational Linguistics.
Kallirroi Georgila, Carla Gordon, Volodymyr Yanov, and David Traum. Predicting Ratings of Real Dialogue Participants from Artificial Data and Ratings of Human Dialogue Observers. In Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC), pp. 726-734, Marseille, France, 2020.
Week 6: Georgila - Speech recognition and synthesis for dialogue (questions can be posted on Piazza for extra credit before the last class)
Optional
Kun Wei, Yike Zhang, Sining Sun, Lei Xie, and Long Ma. Conversational speech recognition by learning conversation-level characteristics. ICASSP 2022.
Suyoun Kim and Florian Metze. Dialog-context aware end-to-end speech recognition. SLT 2018.
Kentaro Mitsui, Tianyu Zhao, Kei Sawada, Yukiya Hono, Yoshihiko Nankaku, and Keiichi Tokuda. End-to-end text-to-speech based on latent representation of speaking styles using spontaneous dialogue. Interspeech 2022.
Eva Szekely, Gustav Eje Henter, Jonas Beskow, and Joakim Gustafson. Spontaneous conversational speech synthesis from found data. Interspeech 2019.
Other (for informational purposes only, not for posting questions unless one wants to post questions for these papers too)
Wayne Xiong, Jasha Droppo, Xuedong Huang, Frank Seide, Mike Seltzer, Andreas Stolcke, Dong Yu, and Geoffrey Zweig. Toward human parity in conversational speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(12):2410-2423, 2017.
Wayne Xiong, Lingfeng Wu, Jun Zhang, and Andreas Stolcke. Session-level Language Modeling for Conversational Speech. EMNLP 2018.
Kallirroi Georgila, Anton Leuski, Volodymyr Yanov, and David Traum. Evaluation of Off-the-shelf Speech Recognizers Across Diverse Dialogue Domains. LREC 2020.
Jian Cong, Shan Yang, Na Hu, Guangzhi Li, Lei Xie, and Dan Su. Controllable context-aware conversational speech synthesis. Interspeech 2021.
Johannah O’Mahony, Catherine Lai, and Simon King. Synthesising turn-taking cues using natural conversational data. Speech Synthesis Workshop 2023.
Elijah Gutierrez, Pilar Oplustil-Gallegos, and Catherine Lai. Location, location: Enhancing the evaluation of text-to-speech synthesis using the rapid prosody transcription paradigm. Speech Synthesis Workshop 2019.
Week 7: Georgila - Deep learning approaches to dialogue (including end-to-end architectures and chatbots), dialogue state tracking (questions should be posted on Piazza by February 26, 11:59 pm)
Required
Siddhant Arora, Jinchuan Tian, Jiatong Shi, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, and Shinji Watanabe. Optimizing Conversational Quality in Spoken Dialogue Systems with Reinforcement Learning from AI Feedback. arXiv 2026.
Sunghee Jung, Donghun Lee, Shinbok Lee, Gaeun Seo, Daniel Lee, Byeongil Ko, Junrae Cho, Kihyun Kim, Eunggyun Kim, and Myeongcheol Shin. DiaTool-DPO: Multi-Turn Direct Preference Optimization for Tool-Augmented Large Language Models. SIGDIAL 2025.
Noé Durandard, Saurabh Dhawan, and Thierry Poibeau. LLMs stick to the point, humans to style: Semantic and Stylistic Alignment in Human and LLM Communication. SIGDIAL 2025.
Optional
Jiwei Li, Michel Galley, Chris Brockett, Georgios Spithourakis, Jianfeng Gao, and Bill Dolan. 2016. A Persona-Based Neural Conversation Model. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 994–1003, Berlin, Germany.
Daniel Adiwardana, Minh-Thang Luong, David R So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, and Quoc V. Le. 2020. Towards a human-like open-domain chatbot. arXiv preprint arXiv:2001.09977.
Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, and Bill Dolan. 2020. DIALOGPT: Large-Scale Generative Pre-training for Conversational Response Generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 270–278, Online.
Donghoon Ham, Jeong-Gwan Lee, Youngsoo Jang, and Kee-Eung Kim. 2020. End-to-End Neural Pipeline for Goal-Oriented Dialogue Systems using GPT-2. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 583–592, Online.
Iulian V. Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville, and Joelle Pineau. 2016. Building end-to-end dialogue systems using generative hierarchical neural network models. In Proceedings of the AAAI Conference on Artificial Intelligence.
Antoine Bordes, Y-Lan Boureau, and Jason Weston. 2017. Learning end-to-end goal-oriented dialogue. In Proceedings of the International Conference on Learning Representations (ICLR).
Alessandro Sordoni, Michel Galley, Michael Auli, Chris Brockett, Yangfeng Ji, Margaret Mitchell, Jian-Yun Nie, Jianfeng Gao, and Bill Dolan. 2015. A Neural Network Approach to Context-Sensitive Generation of Conversational Responses. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 196–205, Denver, Colorado.
Ryan Lowe, Nissan Pow, Iulian Serban, and Joelle Pineau. 2015. The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems. In Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 285–294, Prague, Czech Republic.
Miaoran Li, Baolin Peng, Michel Galley, Jianfeng Gao, and Zhu (Drew) Zhang. 2023. Enhancing Task Bot Engagement with Synthesized Open-Domain Dialog. In Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 496–508, Prague, Czechia.
Ethan A. Chi, Ashwin Paranjape, Abigail See, Caleb Chiam, Trenton Chang, Kathleen Kenealy, Swee Kiat Lim, Amelia Hardy, Chetanya Rastogi, Haojun Li, Alexander Iyabor, Yutong He, Hari Sowrirajan, Peng Qi, Kaushik Ram Sadagopan, Nguyet Minh Phu, Dilara Soylu, Jillian Tang, Avanika Narayan, Giovanni Campagna, and Christopher Manning. 2022. Neural Generation Meets Real People: Building a Social, Informative Open-Domain Dialogue Agent. In Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 376–395, Edinburgh, UK.
Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Eric Michael Smith, Y-Lan Boureau, and Jason Weston. 2021. Recipes for Building an Open-Domain Chatbot. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 300–325, Online.
Baolin Peng, Chunyuan Li, Jinchao Li, Shahin Shayandeh, Lars Liden, and Jianfeng Gao. 2021. Soloist: Building Task Bots at Scale with Transfer Learning and Machine Teaching. Transactions of the Association for Computational Linguistics, 9:807–824.
Week 8: Georgila - Reinforcement learning from human and AI feedback, natural language understanding, natural language generation (questions should be posted on Piazza by March 5, 11:59 pm)
Required
Suvodip Dey, Ramamohan Kummara, and Maunendra Desarkar. 2022. Towards Fair Evaluation of Dialogue State Tracking by Flexible Incorporation of Turn-level Performances. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 318–324, Dublin, Ireland. Association for Computational Linguistics.
Ramesh Manuvinakurike, Trung Bui, Walter Chang, and Kallirroi Georgila. Conversational Image Editing: Incremental Intent Identification in a New Dialogue Task. Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue (SIGDIAL), 2018. – Best paper award
Optional
Piotr Zelasko, Raghavendra Pappagari, and Najim Dehak. What Helps Transformers Recognize Conversational Structure? Importance of Context, Punctuation, and Labels in Dialog Act Recognition. Transactions of the Association for Computational Linguistics, 9:1163-1179, 2021.
Baolin Peng, Chenguang Zhu, Chunyuan Li, Xiujun Li, Jinchao Li, Michael Zeng, and Jianfeng Gao. Few-shot Natural Language Generation for Task-Oriented Dialog. Findings of the Association for Computational Linguistics: EMNLP, 2020.
Viet-Trung Dang, Tianyu Zhao, Sei Ueno, Hirofumi Inaguma, and Tatsuya Kawahara. End-to-end speech-to-dialog-act recognition. Proceedings of Interspeech, 2020.
Pierre Colombo, Emile Chapuis, Matteo Manica, Emmanuel Vignon, Giovanna Varni, and Chloe Clavel. Guiding attention in sequence-to-sequence act prediction models for dialogue. Proceedings of AAAI, 2020.
Yang Liu, Kun Han, Zhao Tan, and Yun Lei. Using context information for dialog act classification in DNN framework. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017.
Daniel Ortega and Ngoc Thang Vu. Neural-based context representation learning for dialog act classification. Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue (SIGDIAL), 2017.
Ali Ahmadvand, Jason Ingyu Choi, and Eugene Agichtein. Contextual Dialogue Act Classification for Open-Domain Conversational Agents. Proceedings of SIGIR, 2019.
Arshit Gupta, Peng Zhang, Garima Lalwani, and Mona Diab. CASA-NLU: Context-Aware Self-Attentive Natural Language Understanding for Task-Oriented Chatbots. Proceedings of the Conference on Empirical Methods in Natural Language Processing and the International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019.
Chien-Sheng Wu, Steven C.H. Hoi, Richard Socher, and Caiming Xiong. TOD-BERT: Pre-trained Natural Language Understanding for Task-Oriented Dialogue. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
Jinyu Guo, Kai Shuang, Jijie Li, Zihan Wang, and Yixuan Liu. Beyond the Granularity: Multi-Perspective Dialogue Collaborative Selection for Dialogue State Tracking. Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2022.
Yixuan Su, Lei Shu, Elman Mansimov, Arshit Gupta, Deng Cai, Yi-An Lai, and Yi Zhang. Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System. Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2022.
Lu Chen, Boer Lv, Chi Wang, Su Zhu, Bowen Tan, and Kai Yu. Schema-Guided Multi-Domain Dialogue State Tracking with Graph Attention Neural Networks. Proceedings of AAAI, 2020.
Chenguang Zhu, Michael Zeng, and Xuedong Huang. Multi-task Learning for Natural Language Generation in Task-Oriented Dialogue. Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019.
Mihir Kale and Abhinav Rastogi. Template Guided Text Generation for Task-Oriented Dialogue. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
Week 9: Georgila - Multi-party dialogue, turn-taking, team dialogue, healthcare applications (questions can be posted on Piazza for extra credit before the last class)
Optional
Week 10: Guest lecture: Prof. David Traum - Grounding (questions can be posted on Piazza for extra credit before the last class)
Optional
Clark, H. H., & Schaefer, E. F. (1989). Contributing to discourse. Cognitive Science, 13, 259-294.
David R. Traum. Computational Models of Grounding in Collaborative Systems. In Working Notes of the AAAI Fall Symposium on Psychological Models of Communication, pp. 124-131, November 1999.
Antonio Roque and David Traum. Degrees of Grounding Based on Evidence of Understanding. In Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue (SIGdial 2008), June 2008.
Omar Shaikh, Kristina Gligoric, Ashna Khetan, Matthias Gerstgrasser, Diyi Yang, and Dan Jurafsky. Grounding Gaps in Language Model Generations. NAACL 2024.
Jokinen, Kristiina, Phillip Schneider, and Taiga Mori. Towards Harnessing Large Language Models for Comprehension of Conversational Grounding. arXiv preprint arXiv:2406.01749 (2024).
Janet E. Cahn and Susan E. Brennan. A Psychological Model of Grounding and Repair in Dialog. Proceedings, AAAI Fall Symposium on Psychological Models of Communication in Collaborative Systems (pp. 25-33).
Colin Matheson, Massimo Poesio, and David Traum. Modelling Grounding and Discourse Obligations Using Update Rules. In Proceedings of the 1st Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL 2000), May 2000.
Nakano, Y., Reinstein, G., Stocky, T., and Cassell, J. (2003). "Towards a Model of Face-to-Face Grounding." Proceedings of the Annual Meeting of the Association for Computational Linguistics, July 7-12, Sapporo, Japan.
Di Maro, Maria. Computational Grounding: An Overview of Common Ground Applications in Conversational Agents. IJCoL: Italian Journal of Computational Linguistics, 7(1-2):133-156, 2021.
Week 11: Student topic presentations (questions can be posted on Piazza for extra credit before the last class)
Optional
Long, Yuxing, Xiaoqi Li, Wenzhe Cai, and Hao Dong. Discuss before moving: Visual language navigation via multi-expert discussions. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pp. 17380-17387. IEEE, 2024.
Qiao, Yanyuan, Qianyi Liu, Jiajun Liu, Jing Liu, and Qi Wu. LLM as copilot for coarse-grained vision-and-language navigation. In European Conference on Computer Vision, pp. 459-476. Cham: Springer Nature Switzerland, 2024.
Han, Leekyeung, Hyunji Min, Gyeom Hwangbo, Jonghyun Choi, and Paul Hongsuck Seo. DialNav: Multi-turn Dialog Navigation with a Remote Guide. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8514-8523. 2025.
Yuxiang Nie, Heyan Huang, Xian-Ling Mao, and Lizi Liao. 2024. Mix-Initiative Response Generation with Dynamic Prefix Tuning. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 8748–8761, Mexico City, Mexico. Association for Computational Linguistics.
Maximillian Chen, Ruoxi Sun, Tomas Pfister, and Sercan O Arik. Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training. ICLR 2025.
Arihant Jain, Purav Aggarwal, Rishav Sahay, Chaosheng Dong, and Anoop Saladi. 2025. AutoEval-ToD: Automated Evaluation of Task-oriented Dialog Systems. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 10133–10148, Albuquerque, New Mexico. Association for Computational Linguistics.
Emre Can Acikgoz, Carl Guo, Suvodip Dey, Akul Datta, Takyoung Kim, Gokhan Tur, and Dilek Hakkani-Tur. 2025. TD-EVAL: Revisiting Task-Oriented Dialogue Evaluation by Combining Turn-Level Precision with Dialogue-Level Comparisons. In Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 113–132, Avignon, France. Association for Computational Linguistics.
Mike Lewis, Denis Yarats, Yann Dauphin, Devi Parikh, and Dhruv Batra. 2017. Deal or No Deal? End-to-End Learning of Negotiation Dialogues. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2443–2453, Copenhagen, Denmark. Association for Computational Linguistics.
Zheyong Xie, Shaosheng Cao, Zuozhu Liu, Zheyu Ye, Zihan Niu, Chonggang Lu, Tong Xu, Enhong Chen, Zhe Xu, Yao Hu, and Wei Lu. 2025. iPET: An Interactive Emotional Companion Dialogue System with LLM-Powered Virtual Pet World Simulation. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), pages 416–425, Vienna, Austria. Association for Computational Linguistics.
Choi, Min Gyeong, and Sun-Young Oh. "Developing L2 turn-taking with ChatGPT: A longitudinal conversation analytic study." System 138 (2026): 103959.
Angus Addlesee, Neeraj Cherakara, Nivan Nelson, Daniel Hernandez Garcia, Nancie Gunson, Weronika Sieińska, Christian Dondrup, and Oliver Lemon. 2024. Multi-party Multimodal Conversations Between Patients, Their Companions, and a Social Robot in a Hospital Memory Clinic. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, pages 62–70, St. Julians, Malta. Association for Computational Linguistics.
Neeraj Cherakara, Finny Varghese, Sheena Shabana, Nivan Nelson, Abhiram Karukayil, Rohith Kulothungan, Mohammed Afil Farhan, Birthe Nesset, Meriam Moujahid, Tanvi Dinkar, Verena Rieser, and Oliver Lemon. 2023. FurChat: An Embodied Conversational Agent using LLMs, Combining Open and Closed-Domain Dialogue with Facial Expressions. In Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 588–592, Prague, Czechia. Association for Computational Linguistics.
Agnes Axelsson and Gabriel Skantze. Do you follow? A fully automated system for adaptive robot presenters. International Conference on Human-Robot Interaction, 2023.
Kalpa Gunaratna, Vijay Srinivasan, Akhila Yerukola, and Hongxia Jin. Explainable Slot Type Attentions to Improve Joint Intent Detection and Slot Filling. EMNLP Findings, 2022.
Omar Shaikh, Kristina Gligoric, Ashna Khetan, Matthias Gerstgrasser, Diyi Yang, and Dan Jurafsky. 2024. Grounding Gaps in Language Model Generations. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 6279–6296, Mexico City, Mexico. Association for Computational Linguistics.
Week 12: Student topic presentations (questions can be posted on Piazza for extra credit before the last class)
Optional
Pierre Colombo, Wojciech Witon, Ashutosh Modi, James Kennedy, and Mubbasir Kapadia. 2019. Affect-Driven Dialog Generation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3734–3743, Minneapolis, Minnesota. Association for Computational Linguistics.
Nicholas Sofroniew, Isaac Kauvar, William Saunders, Runjin Chen, Tom Henighan, Sasha Hydrie, Craig Citro, Adam Pearce, Julius Tarng, Wes Gurnee, Joshua Batson, Sam Zimmerman, Kelley Rivoire, Kyle Fish, Chris Olah, and Jack Lindsey. Emotion concepts and their function in a large language model. 2026.
Hiroki Ouchi and Yuta Tsuboi. 2016. Addressee and Response Selection for Multi-Party Conversation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2133–2143, Austin, Texas. Association for Computational Linguistics.
Maira Gatti de Bayser, Melina Alberio Guerra, Paulo Cavalin, and Claudio Pinhanez. A Hybrid Solution to Learn Turn-Taking in Multi-Party Service-based Chat Groups. 2020.
Nicolò Penzo, Maryam Sajedinia, Bruno Lepri, Sara Tonelli, and Marco Guerini. 2024. Do LLMs suffer from Multi-Party Hangover? A Diagnostic Approach to Addressee Recognition and Response Selection in Conversations. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 11210–11233, Miami, Florida, USA. Association for Computational Linguistics.
Ronald Petrick and Mary Ellen Foster. Planning for social interaction in a robot bartender domain. International Conference on Automated Planning and Scheduling. 2013.
Lakshya A Agrawal, Shangyin Tan, Dilara Soylu, Noah Ziems, Rishi Khare, Krista Opsahl-Ong, Arnav Singhvi, Herumb Shandilya, Michael J Ryan, Meng Jiang, Christopher Potts, Koushik Sen, Alexandros G. Dimakis, Ion Stoica, Dan Klein, Matei Zaharia, and Omar Khattab. GEPA: Reflective prompt evolution can outperform reinforcement learning. ICLR 2026.
Qizheng Zhang, Changran Hu, Shubhangi Upasani, Boyuan Ma, Fenglu Hong, Vamsidhar Kamanuru, Jay Rainton, Chen Wu, Mengmeng Ji, Hanchen Li, Urmish Thakker, James Zou, and Kunle Olukotun. Agentic context engineering: Evolving contexts for self-improving language models. ICLR 2026.
Abishek Komma, Nagesh Panyam Chandrasekarasastry, Timothy Leffel, Anuj Goyal, Angeliki Metallinou, Spyros Matsoukas, and Aram Galstyan. 2023. Toward More Accurate and Generalizable Evaluation Metrics for Task-Oriented Dialogs. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track), pages 186–195, Toronto, Canada. Association for Computational Linguistics.
Jiseung Hong, Grace Byun, Seungone Kim, and Kai Shu. 2025. Measuring Sycophancy of Language Models in Multi-turn Dialogues. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 2239–2259, Suzhou, China. Association for Computational Linguistics.
Rujikorn Charakorn, Edoardo Cetin, Yujin Tang, and Robert Tjarko Lange. Text-to-LoRA: Instant transformer adaption. ICML 2025.
Rujikorn Charakorn, Edoardo Cetin, Shinnosuke Uesaka, and Robert Tjarko Lange. Doc-to-LoRA: Learning to instantly internalize contexts. 2026.
Robbie Jimerson and Emily Prud’hommeaux. 2018. ASR for Documenting Acutely Under-Resourced Indigenous Languages. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association (ELRA).
Runjin Chen, Andy Arditi, Henry Sleight, Owain Evans, and Jack Lindsey. Persona Vectors: Monitoring and Controlling Character Traits in Language Models. 2025.
Asma Ghandeharioun, Ann Yuan, Marius Guerard, Emily Reif, Michael A. Lepori, and Lucas Dixon. Who's asking? User personas and the mechanics of latent misalignment. NeurIPS 2024.
Zhaojiang Lin, Bing Liu, Andrea Madotto, Seungwhan Moon, Zhenpeng Zhou, Paul Crook, Zhiguang Wang, Zhou Yu, Eunjoon Cho, Rajen Subba, and Pascale Fung. 2021. Zero-Shot Dialogue State Tracking via Cross-Task Transfer. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 7890–7900, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Federico Bianchi, Patrick John Chia, Mert Yuksekgonul, Jacopo Tagliabue, Dan Jurafsky, and James Zou. How well can LLMs negotiate? NEGOTIATIONARENA platform and analysis. Proceedings of the 41st International Conference on Machine Learning 2024.
Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS 2020.
Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, and Joseph E. Gonzalez. MemGPT: Towards LLMs as operating systems. 2024.
Percy Liang et al. Holistic evaluation of language models. Transactions on Machine Learning Research 2023.
Stephanie Lin, Jacob Hilton, and Owain Evans. 2022. TruthfulQA: Measuring How Models Mimic Human Falsehoods. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3214–3252, Dublin, Ireland. Association for Computational Linguistics.