

  1. Emily Dinan and Gavin Abercrombie and A. Stevie Bergman and Shannon Spruit and Dirk Hovy and Y-Lan Boureau and Verena Rieser. SafetyKit: First Aid for Measuring Safety in Open-domain Conversational Systems. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL) 2022. (long paper)

  2. Gavin Abercrombie and Verena Rieser. Risk-graded Safety for Handling Medical Queries in Conversational AI. 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, AACL-IJCNLP 2022

  3. A. Stevie Bergman, Gavin Abercrombie, Shannon Spruit, Dirk Hovy, Emily Dinan, Y-Lan Boureau and Verena Rieser. Guiding the Release of Safer E2E Conversational AI through Value Sensitive Design. 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL 2022)

  4. Alessandro Suglia, Bhathiya Hemanthage, Malvina Nikandrou, George Pantazopoulos, Amit Parekh, Arash Eshghi, Claudio Greco, Ioannis Konstas, Oliver Lemon and Verena Rieser. Demonstrating EMMA: Embodied MultiModal Agent for Language-guided Action Execution in 3D Simulated Environments. 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL 2022)

  5. Lu Yu and Verena Rieser. Adversarial Robustness of Visual Dialog. ArXiv preprint

  6. Shikib Mehri, Jinho Choi, Luis Fernando D'Haro, Jan Deriu, Maxine Eskenazi, Milica Gasic, Dilek Hakkani-Tur, Zekang Li, Verena Rieser, Samira Shaikh, David Traum, Yi-Ting Yeh, Zhou Yu, Yizhe Zhang, Chen Zhang. Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges. 2022 [Arxiv]

  7. Marco Casadio, Ekaterina Komendantskaya, Verena Rieser, Matthew Daggitt, Daniel Kienitz, Luca Arnaboldi and Wen Kokke. Why Robust Natural Language Understanding is a Challenge. 5th Workshop on Formal Methods for ML-Enabled Autonomous Systems. 2022.

  8. Luca M. Leisten and Verena Rieser. “I Like You, as a Friend”: Voice Assistants’ Response Strategies to Sexual Harassment and Their Relation to Gender. Human Perspectives on Spoken Human-Machine Interaction (SpoHuMa) 2022. [preprint PsyArXiv] [published PDF]


  1. Emily Dinan, Gavin Abercrombie, A. Stevie Bergman, Shannon Spruit, Dirk Hovy, Y-Lan Boureau, Verena Rieser. Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling. 2021 [Arxiv] [tools]

  2. Amanda Cercas Curry, Gavin Abercrombie and Verena Rieser. ConvAbuse: Data, Analysis, and Benchmarks for Nuanced Abuse Detection in Conversational AI. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2021. (long paper) [arXiv]

  3. David Howcroft and Verena Rieser. What happens if you treat ordinal ratings as interval data? Human evaluations in NLP are even more under-powered than you think. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021. (short paper)

  4. Xinnuo Xu, Ondrej Dusek, Verena Rieser and Ioannis Konstas. MIRANEWS: Dataset and Benchmarks for Multi-Resource-Assisted News Summarization. In Findings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2021.

  5. Karin Sevegnani, David Howcroft, Ioannis Konstas and Verena Rieser. One-turn Topic Transitions for Open-Domain Dialogue. ACL 2021 (long paper). [ARXIV]

  6. Xinnuo Xu, Ondrej Dušek, Verena Rieser, Ioannis Konstas. AGGGEN: Ordering and Aggregating while Generating. ACL 2021 (short paper). [Arxiv]

  7. Gavin Abercrombie, Amanda Cercas Curry, Mugdha Pandya and Verena Rieser. Alexa, Google, Siri: What are Your Pronouns? Gender and Anthropomorphism in the Design and Perception of Conversational Assistants. ACL-IJCNLP 2021 3rd Workshop on Gender Bias in Natural Language Processing (GeBNLP 2021) [Arxiv]


  1. Emanuele Bastianelli, Andrea Vanzo, Pawel Swietojanski and Verena Rieser. SLURP: A Spoken Language Understanding Resource Package. The 2020 Conference on Empirical Methods in Natural Language Processing EMNLP 2020 [acl anthology] [data]

  2. Shubham Agarwal, Trung Bui, Joon-Young Lee, Ioannis Konstas and Verena Rieser. History for Visual Dialog: Do we really need it? (Long paper) The 58th Annual Meeting of the Association for Computational Linguistics ACL 2020. [acl anthology] [lay summary]

  3. Xinnuo Xu, Ondřej Dušek, Jingyi Li, Verena Rieser and Ioannis Konstas. Fact-based Content Weighting for Abstractive Summarisation Evaluation. (Short paper) The 58th Annual Meeting of the Association for Computational Linguistics ACL 2020. [acl anthology] [lay summary]

  4. Ondřej Dušek, Jekaterina Novikova, Verena Rieser. Evaluating the state-of-the-art of End-to-End Natural Language Generation: The E2E NLG challenge. Computer Speech & Language, Volume 59, Pages 123-156, 2020. [arxiv preprint] [free journal access]

  5. David M. Howcroft, Anya Belz, Miruna-Adriana Clinciu, Dimitra Gkatzia, Sadid A. Hasan, Saad Mahamood, Simon Mille, Verena Rieser, Sashank Santhanam and Emiel van Miltenburg. Evaluation Sheets. A Checklist Inspired by Twenty Years of Confusion in Human Evaluation for NLG. 13th International Conference on Natural Language Generation (INLG 2020)

  6. Amanda Cercas Curry, Judy Robertson and Verena Rieser. Conversational Assistants and Gender Stereotypes: Public Perceptions and Desiderata for Voice Personas. 2nd Workshop on Gender Bias in Natural Language Processing (GeBNLP) at COLING2020


  1. Verena Rieser. Let's Chat! Can Virtual Agents learn how to have a Conversation? Invited keynote at ACM IVA '19, ACM International Conference on Intelligent Virtual Agents. Paris, France, July 2-5, 2019. [pdf]

  2. Amanda Cercas Curry and Verena Rieser. A Crowd-based Evaluation of Abuse Response Strategies in Conversational Agents. SigDial 2019.

  3. Simon Keizer, Ondřej Dušek, Xingkun Liu and Verena Rieser. User Evaluation of a Multi-dimensional Statistical Dialogue System. SigDial 2019.

  4. Ondřej Dušek, Karin Sevegnani, Ioannis Konstas and Verena Rieser. Automatic Quality Estimation for Natural Language Generation: Ranting (Jointly Rating and Ranking). 12th International Conference on Natural Language Generation INLG2019, Tokyo, 2019.

  5. Ondřej Dušek, David Howcroft and Verena Rieser. Semantic Noise Matters for Neural Natural Language Generation. 12th International Conference on Natural Language Generation INLG2019, Tokyo, 2019.

  6. David M. Howcroft, Karin Sevegnani, Ondrej Dusek and Verena Rieser. Noise and Neural Natural Language Generation: Rubbish in, Rubbish out? EurNLP 2019, London (31.8% acceptance rate)

  7. Xingkun Liu, Arash Eshghi, Pawel Swietojanski and Verena Rieser. Benchmarking Natural Language Understanding Services for building Conversational Agents. Tenth International Workshop on Spoken Dialogue Systems Technology IWSDS 2019. [arxiv] [data]


  1. Xinnuo Xu, Ondrej Dusek, Yannis Konstas, and Verena Rieser. Better conversations by modeling, filtering, and optimizing for coherence and diversity. In: Conference on Empirical Methods in Natural Language Processing EMNLP 2018. [arxiv]

  2. Jekaterina Novikova, Ondrej Dusek and Verena Rieser. RankME: Reliable Human Ratings for Natural Language Generation. In: 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics NAACL 2018. [arxiv] (29% acceptance rate)

  3. Ondřej Dušek, Jekaterina Novikova, Verena Rieser. Findings of the E2E NLG Challenge. 11th International Conference on Natural Language Generation (INLG). Tilburg, 2018. (to appear) [arxiv]

  4. Cercas Curry, A., Papaioannou, I., Suglia, A., Agarwal, S., Shalyminov, I., Xu, X., Dusek, O., Eshghi, A., Konstas, I., Rieser, V., Lemon, O. Alana v2: Entertaining and Informative Open-domain Social Dialogue using Ontologies and Entity Linking. Alexa Prize Proceedings 2018, Amazon RE-INVENT. Finalist of the Amazon Alexa Challenge 2018. [pdf]

  5. Shubham Agarwal, Ondrej Dusek, Ioannis Konstas, Verena Rieser. Improving Context Modelling in Multimodal Dialogue Generation. 11th International Conference on Natural Language Generation (INLG 2018), Tilburg, The Netherlands, November 5-8, 2018. [arxiv]

  6. Shubham Agarwal, Ondrej Dusek, Ioannis Konstas, Verena Rieser. A Knowledge-Grounded Multimodal Search-Based Conversational Agent. The 2nd International Workshop on Search-Oriented Conversational AI (SCAI) at EMNLP 2018, Brussels, Belgium, 2018. [arxiv]

  7. Amanda Cercas Curry and Verena Rieser. #MeToo Alexa: How Conversational Systems Respond to Sexual Harassment. Second NAACL Workshop on Ethics in NLP. New Orleans, 2018. [pdf] [bib]

  8. Amanda Cercas Curry and Verena Rieser. Sexual Harassment and Conversational AI. 2nd NAACL Workshop on Widening Natural Language Processing WiNLP, New Orleans, 2018.


  1. Jekaterina Novikova, Ondrej Dusek and Verena Rieser. The E2E Dataset: New Challenges For End-to-End Generation. 18th Annual SIGdial Meeting on Discourse and Dialogue (SIGDIAL 2017) [arxiv pre-print] [data] Nominated for best paper award!

  2. Jekaterina Novikova, Ondrej Dusek, Amanda Cercas Curry and Verena Rieser. Why We Need New Evaluation Metrics for NLG. EMNLP 2017, Copenhagen. [arxiv pre-print] [data]

  3. Ioannis Papaioannou, Amanda Cercas Curry, Jose L. Part, Igor Shalyminov, Xinnuo Xu, Yanchao Yu, Ondrej Dušek, Verena Rieser, Oliver Lemon. An Ensemble Model with Ranking for Social Dialogue. In: NIPS workshop on Conversational AI: Today's Practice and Tomorrow's Potential, Long Beach, USA, 2017. (selected for oral presentation) [arxiv]

  4. Ioannis Papaioannou, Amanda Cercas Curry, Jose L. Part, Igor Shalyminov, Xinnuo Xu, Yanchao Yu, Ondrej Dušek, Verena Rieser, Oliver Lemon. Alana: Social Dialogue using an Ensemble Model and a Ranker trained on User Feedback. In: AWS re:INVENT, Las Vegas, USA, 2017. [pdf]

  5. Dimitra Gkatzia, Oliver Lemon and Verena Rieser. Data-to-Text Generation Improves Decision-Making Under Uncertainty. IEEE Computational Intelligence Magazine (IEEE CIM), Special Issue on Natural Language Generation with Computational Intelligence. 2017 (Impact Factor: 3.65) [link] [pdf]

  6. Dirk Wollherr, Verena Rieser and Matthew Walter (Eds.) Special Issue on Spatial Reasoning & Interaction for Real-World Robotics. Advanced Robotics, volume 32, issue 5, 2017. [link]

  7. Christian Landsiedel and Verena Rieser and Matthew Walter and Dirk Wollherr. A Review of Spatial Reasoning and Interaction for Real-World Robotics. Advanced Robotics 32(5), 222-241, 2017. [pdf]

  8. Ondrej Dusek and Verena Rieser. Referenceless Quality Estimation for Natural Language Generation. 1st Workshop on Learning to Generate Natural Language (LGNL 2017) at ICML, Sydney 2017. [arxiv pre-print] [code]

  9. Jekaterina Novikova, Ondrej Dusek and Verena Rieser. Data-driven Natural Language Generation: Paving the Road to Success. WiNLP workshop at ACL, Vancouver 2017. [arxiv pre-print] [slides]

  10. Amanda Cercas Curry, Helen Hastie and Verena Rieser. A Review of Evaluation Techniques for Social Dialogue Systems. ICMI Workshop on Investigating Social Interactions with Artificial Agents. Glasgow 2017. [arxiv pre-print]

  11. Simon Keizer and Verena Rieser. Learning Transferable Conversational Skills in a Multi-dimensional Framework. SemDial (Short Papers), Saarbruecken 2017. [pdf] [long arxiv version]


  1. Dimitra Gkatzia, Oliver Lemon and Verena Rieser. Natural Language Generation enhances human decision-making with uncertain information. Annual meeting of the Association for Computational Linguistics ACL 2016. [arXiv] [data]

  2. Eshrag Refaee and Verena Rieser. A Hybrid Approach for Determining Sentiment Intensity of Arabic Twitter Phrases. Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval 2016), 2016. [pdf] Winner of SemEval'16 challenge sub-task 7!

  3. Verena Rieser. Women listen and men look? How to best communicate risk to support decision-making. Understanding Uncertainty, 2016. [link]

  4. Verena Rieser, Amy Isard and Dimitra Gkatzia. Proceedings of the 9th International Conference on Natural Language Generation (INLG'16). Edinburgh, September 2016. [website]

  5. Simon Keizer and Verena Rieser. The MaDrIgAL project: Multi-Dimensional Interaction Management and Adaptive Learning. International Workshop on Domain Adaptation for Dialog Agents (DADA) 2016.

  6. Jekaterina Novikova and Verena Rieser. The aNALoGuE Challenge: Non Aligned Language GEneration. The 9th International Natural Language Generation conference INLG, 2016. [pdf]

  7. Jekaterina Novikova, Oliver Lemon, Verena Rieser. Crowd-sourcing NLG Data: Pictures Elicit Better Data. The 9th International Natural Language Generation conference INLG, 2016. [pdf] [data]

  8. Dimitra Gkatzia, Verena Rieser and Oliver Lemon. How to Talk to Strangers: Minimising Regret when Generating Medical Reports for Unknown Users. IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Vancouver, Canada, 2016. [pdf]

  9. Phil Bartie, William Mackaness, Dimitra Gkatzia, Verena Rieser. The REAL Corpus: A Crowd-Sourced Corpus of Human Generated and Evaluated Spatial References to Real-World Urban Scenes. 10th International Conference on Language Resources and Evaluation (LREC'16). 2016. [pdf][data]

  10. Amanda Cercas Curry and Verena Rieser. A Subjective Evaluation of Chatbot Engines. WOCHAT Shared Task Report. Second Workshop on Chatbots and Conversational Agent Technologies. 2016. [pdf]


  1. Dimitra Gkatzia, Verena Rieser, Phil Bartie and William Mackaness. From the Virtual to the Real World: Referring to Objects in Spatial Real-World Images. In: Proc. of Conference on Empirical Methods in Natural Language Processing and Natural Language Learning EMNLP 2015. (Acceptance rate: 24%) [pdf] [data]

  2. Nina Dethlefs, Helen Hastie, Heriberto Cuayahuitl, Yanchao Yu, Verena Rieser and Oliver Lemon. Information Density and Overlap in Spoken Dialogue. Computer Speech and Language (CSL), 2015. [link]

  3. Dimitra Gkatzia, Amanda Cercas Curry, Verena Rieser and Oliver Lemon. A game-based setup for data collection and task-based evaluation of uncertain information presentation. Demo Paper. 15th European Natural Language Generation 2015 workshop (ENLG 2015). Brighton, UK, 2015. [pdf] [play online]

  4. Amanda Cercas Curry, Dimitra Gkatzia and Verena Rieser. Generating and Evaluating Landmark-based Navigation Instructions in Virtual Environments. Short Paper. 15th European Natural Language Generation 2015 workshop (ENLG 2015). Brighton, UK, 2015. [pdf]

  5. Eshrag Refaee and Verena Rieser. Benchmarking Machine Translated Sentiment Analysis for Arabic Tweets. In: Proc. NAACL, Student Research Workshop 2015. [pdf]

  6. Verena Rieser and Dimitra Gkatzia. Generation for Things Unknown: Accounting for First-Time Users and Hidden Scenes. 1st International Workshop on Data-to-Text Generation. Edinburgh, UK, 2015. [pdf]

  7. Verena Rieser, Dimitra Gkatzia and Amy Isard. Proceedings of the 1st International Workshop on Data-to-Text Generation. [website]


  1. Nina Dethlefs, Heriberto Cuayáhuitl, Helen Hastie, Verena Rieser and Oliver Lemon. Cluster-based Prediction of User Ratings for Stylistic Surface Realisation, EACL 2014. [pdf]

  2. Verena Rieser and Philippe Muller. Proceedings of the 18th Workshop on the Semantics and Pragmatics of Dialogue (SemDial’14 – DialWatt). [website]

  3. Micha Elsner and Verena Rieser. Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL). Discourse and Dialog. 2014, Gothenburg, Sweden.

  4. Eshrag Refaee and Verena Rieser. An Arabic Twitter Corpus for Subjectivity and Sentiment Analysis. 9th International Conference on Language Resources and Evaluation (LREC'14). Reykjavik, Iceland, 2014. [pdf] [data]

  5. Eshrag Refaee and Verena Rieser. Evaluating Distant Supervision for Subjectivity and Sentiment Analysis on Arabic Twitter Feeds. In: Arabic NLP workshop (ANLP, co-located with EMNLP-2014) Qatar, 2014.

  6. Verena Rieser, Oliver Lemon and Simon Keizer. Natural Language Generation as Incremental Planning Under Uncertainty: Adaptive Information Presentation for Statistical Dialogue Systems. IEEE/ACM Transactions on Audio, Speech and Language Processing, Volume 22, Issue 5, 2014.

  7. Oliver Lemon, Srini Janarthanam and Verena Rieser: Reinforcement Learning Approaches to Natural Language Generation in Interactive Systems. In: Srinivas Bangalore and Amanda Stent, editors, Natural Language Generation in Interactive Systems. Cambridge University Press. [link]

  8. Eshrag Refaee and Verena Rieser. Can We Read Emotions from a Smiley Face? Emoticon-based Distant Supervision for Subjectivity and Sentiment Analysis of Arabic Twitter Feeds. 5th International Workshop on EMOTION, SOCIAL SIGNALS, SENTIMENT & LINKED OPEN DATA. Reykjavik, Iceland, 2014. [pdf] [data]

  9. Eshrag Refaee and Verena Rieser. Subjectivity and Sentiment Analysis of Arabic Twitter Feeds with Limited Resources. Workshop on Free/Open-Source Arabic Corpora and Corpora Processing Tools (OSACT). Reykjavik, Iceland, 2014. [pdf] [data]

  10. Verena Rieser. On the Use of Peer Feedback in Large Programming Classes. In: International Computing Education Research (ICER'14). Lightning Talk. Glasgow, 2014.

  11. Dimitra Gkatzia, Verena Rieser, Alexander McSporran, Alistair McGowan, Alasdair Mort and Michaela Dewar. Generating Verbal Descriptions from Medical Sensor Data: A Corpus Study on User Preferences. Health Informatics Scotland (HIS'14). Glasgow, 2014.

  12. Verena Rieser and Amanda Cercas Curry. Towards Generating Route Instructions Under Uncertainty: A Corpus Study. In: 18th Workshop on the Semantics and Pragmatics of Dialogue (SemDial/DialWatt) - Short Papers. Edinburgh, 2014.

  13. Verena Rieser, Srinivasan Janarthanam, Andy Taylor, Yanchao Yu and Oliver Lemon: SpeechCity: A Conversational City Guide based on Open Data. In: 18th Workshop on the Semantics and Pragmatics of Dialogue (SemDial/DialWatt) - Short Papers. Edinburgh, 2014.

  14. Wenshuo Tang, Zhuoran Wang, Verena Rieser and Oliver Lemon. Sample Efficient Learning of Strategic Dialogue Policies. In: 18th Workshop on the Semantics and Pragmatics of Dialogue (SemDial/DialWatt) - Short Papers. Edinburgh, 2014.

  15. Aimilios Vourliotakis, Ioannis Efstathiou and Verena Rieser. Detecting Deception in Non-Cooperative Dialogue: A Smarter Adversary Cannot be Fooled That Easily. In: 18th Workshop on the Semantics and Pragmatics of Dialogue (SemDial/DialWatt) - Short Papers. Edinburgh, 2014.

  16. Callum Main, Zhuoran Wang and Verena Rieser. Towards Deep Learning for Dialogue State Tracking Using Restricted Boltzmann Machines and Pretraining. In: 18th Workshop on the Semantics and Pragmatics of Dialogue (SemDial/DialWatt) - Short Papers. Edinburgh, 2014.

  17. Nina Dethlefs, Heriberto Cuayahuitl, Helen Hastie, Verena Rieser and Oliver Lemon. Getting to Know Users: Accounting for the Variability in User Ratings. In: 18th Workshop on the Semantics and Pragmatics of Dialogue (SemDial/DialWatt) - Short Papers. Edinburgh, 2014.