Trustworthy AI
In recent years, NLP has undergone a paradigm shift from fine-tuning large-scale pre-trained language models (PLMs) on task-specific data to prompt-based learning, in which the task description is embedded into the PLM input, enabling the same model to handle multiple tasks. While both approaches have achieved impressive performance across a wide range of NLP tasks, their opaque nature makes it challenging for humans to comprehend their inner workings and decision-making processes. We conduct research addressing the interpretability concerns surrounding neural models in language understanding. Our work includes a hierarchical interpretable text classifier that goes beyond word-level interpretations, uncertainty interpretation of text classifiers built on PLMs, explainable recommender systems that harness information across diverse modalities, and explainable student answer scoring.
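The idea of embedding the task description into the model input can be illustrated with a minimal sketch. The templates and task names below are purely illustrative examples of prompt-based task formulation, not drawn from any of the publications listed here:

```python
# A minimal sketch of prompt-based learning: rather than training a separate
# classifier head per task, each task is cast as text by embedding its
# description into the model input. The same model can then serve every task,
# because the task identity lives in the input string, not in the architecture.

def build_prompt(task: str, text: str) -> str:
    """Wrap a raw input in a task-specific template (illustrative only)."""
    templates = {
        "sentiment": "Review: {text}\nSentiment (positive/negative):",
        "topic": "Article: {text}\nTopic:",
        "nli": "Input: {text}\nRelation (entailment/contradiction/neutral):",
    }
    return templates[task].format(text=text)

# One hypothetical model interface handles all three tasks unchanged:
prompt = build_prompt("sentiment", "A moving, beautifully shot film.")
print(prompt)
```

A fine-tuning approach would instead attach a new output layer per task and update the model weights on labelled data; in the prompt-based setting, switching tasks only changes the input template.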
Participants
Lin Gui, Jiazheng Li, Hanqi Yan, Runcong Zhao, Yuxiang Zhou, Lixing Zhu
Projects
Event-Centric Framework for Natural Language Understanding (Jan 2021-Dec 2025), Turing AI Fellowship, funded by UKRI.
A Lebesgue Integral based Approximation for Language Modelling (2023-2025), funded by the EPSRC.
Twenty20Insight (2020-2023), funded by the EPSRC.
Publications
H. Yan, Y. Xiang, G. Chen, Y. Wang, L. Gui and Y. He. Encourage or Inhibit Monosemanticity? Revisit Monosemanticity from a Feature Decorrelation Perspective. The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.
J. Lu, J. Li, S. An, M. Zhao, Y. He, D. Yin and X. Sun. Eliminating Biased Length Reliance of Direct Preference Optimization via Down-Sampled KL Divergence. The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.
Y. Zhou, J. Li, Y. Xiang, H. Yan, L. Gui, and Y. He. The Mystery of In-Context Learning: A Comprehensive Survey on Interpretation and Analysis. The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.
S. Qi, Y. He and Z. Yuan. Can We Catch the Elephant? A Survey of the Evolvement of Hallucination Evaluation on Natural Language Generation. arXiv:2404.12041, 2024.
H. Yan, Q. Zhu, X. Wang, L. Gui, and Y. He. Mirror: A Multiple-perspective Self-Reflection Method for Knowledge-rich Reasoning. The 62nd Annual Meeting of the Association for Computational Linguistics (ACL), 2024.
X. Wang, H. Xu, L. Gui, and Y. He. Towards Unified Task Embeddings Across Multiple Models: Bridging the Gap for Prompt-Based Large Language Models and Beyond. Findings of ACL, 2024.
Y. Xiang, H. Yan, L. Gui, and Y. He. Addressing Order Sensitivity of In-Context Demonstration Examples in Causal Language Models. Findings of ACL, 2024.
J. Lu, S. An, M. Zhang, Y. He, D. Yin, and X. Sun. FIPO: Free-form Instruction-oriented Prompt Optimization with Preference Dataset and Modular Fine-tuning Schema, arXiv:2402.11811, 2024.
H. Zhang, L. Gui, Y. Lei, Y. Zhai, Y. Zhang, Y. He, H. Wang, Y. Yu, K.F. Wong, B. Liang, R. Xu. COPR: Continual Human Preference Learning via Optimal Policy Regularization, arXiv:2402.14228, 2024.
H. Zhang, Y. Lei, L. Gui, M. Yang, Y. He, H. Wang, R. Xu. CPPO: Continual Learning for Reinforcement Learning with Human Feedback. The 12th International Conference on Learning Representations (ICLR), 2024.
H. Yan, L. Gui, M. Wang, K. Zhang and Y. He. Explainable Recommender with Geometric Information Bottleneck, IEEE Transactions on Knowledge and Data Engineering, to appear.
J. Li, L. Gui, Y. Zhou, D. West, C. Aloisi and Y. He. Exploring Explainable Automated Student Answer Assessment with ChatGPT, Findings of EMNLP, 2023.
H. Yan, L. Kong, L. Gui, Y. Chi, E. Xing, Y. He, K. Zhang. Counterfactual Generation with Identifiability Guarantees. The 37th Annual Conference on Neural Information Processing Systems (NeurIPS), New Orleans, US, 2023.
J. Li, Z. Sun, B. Liang, L. Gui and Y. He. CUE: An Uncertainty Interpretation Framework for Text Classifiers Built on Pre-Trained Language Models. The 39th Conference on Uncertainty in Artificial Intelligence (UAI), Pittsburgh, PA, USA, Aug. 2023.
J. Li, R. Zhao, Y. He and L. Gui. OverPrompt: Enhancing ChatGPT Capabilities through an Efficient In-Context Learning Approach, arXiv:2305.14973, 2023.
H. Yan, L. Gui, M. Wang, K. Zhang and Y. He. Explainable Recommender with Geometric Information Bottleneck, arXiv:2305.05331, 2023.
H. Li, H. Yan, Y. Li, L. Qian, Y. He and L. Gui. Distinguishability Calibration to In-Context Learning, Findings of EACL, 2023.
Z. Fang, Y. He, R. Procter, L.A. Alqazlan and D. Liu. A User-Centered, Interactive, Human-in-the-Loop Topic Modelling System. The 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL), May 2023.
H. Yan, L. Gui and Y. He. Hierarchical Interpretation of Neural Text Classification, Computational Linguistics, to appear.
H. Yan, L. Gui, W. Li and Y. He. Addressing Token Uniformity in Transformers via Singular Value Transformation. 38th Conference on Uncertainty in Artificial Intelligence (UAI), Eindhoven, Netherlands, Aug. 2022.
L. Zhu, Z. Fang, G. Pergola, R. Procter and Y. He. Disentangled Learning of Stance and Aspect Topics for Vaccine Attitude Detection in Social Media. 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Jul. 2022.