L&T24: Language and thought
Topic leaders
Ivana Kajić | Google DeepMind
Guido Zarrella | MITRE
Co-organizers and invited speakers:
Nicole Sandra-Yaffa Dumont | University of Waterloo - Vector Symbolic Architectures
Owen He | Google DeepMind - Mixture of A Million Experts
Kathryn Simone | University of Waterloo - Neuromodulation and Computation
Jason Eshraghian | University of California, Santa Cruz - Scalable MatMul-free Language Modeling
Steve Abreu | University of Groningen - Quantization and Sparsity in State Space Models
Xuezhe Ma | University of Southern California Information Sciences Institute - Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Tim Welsh | MITRE - AI-Ready Wildfire Simulation, and Theory of Mind in Multi-agent Reinforcement Learning
Goals
The goal of this topic area is to explore how cognitive-science-inspired techniques can improve the performance, efficiency, creativity, and evaluation of large foundation models. Large language models (LLMs) have achieved remarkable results on a wide range of natural language processing tasks, but they also face significant challenges and limitations: high computational cost, a lack of generalization and robustness, difficulty in incorporating or updating prior knowledge and reasoning, and the ethical and social implications of their use. This topic area aims to address these challenges by applying neuromorphic, brain-inspired computing principles that push the frontiers of today’s AI towards better power efficiency, faster learning, lower latency, and improved cognitive abilities. It also connects neuromorphic computing and hardware engineering to other mainstream areas driving the future of artificial intelligence research.
Projects
Biological brains are remarkably efficient at processing structured representations derived from natural environments, whether that means making sense of a complex visual scene or deriving meaning from a sentence in a language we are learning. In this project we propose to explore the integration of different types of structured representations (for example, Vector Symbolic Architectures) into contemporary AI models and algorithms. We will explore how representations that have been successful in modeling various cognitive functions can be incorporated into RL algorithms and LLMs.
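As a flavour of what such an integration could build on, here is a minimal NumPy sketch of binding and unbinding with circular convolution, the core operation of Holographic Reduced Representations (one common Vector Symbolic Architecture). The dimensionality and the random role/filler vectors are purely illustrative.

```python
import numpy as np

def bind(a, b):
    # Circular convolution: the binding operation of Holographic Reduced
    # Representations (HRRs), one common Vector Symbolic Architecture.
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def unbind(c, a):
    # Approximate inverse of binding: convolve with the involution of a.
    a_inv = np.concatenate(([a[0]], a[1:][::-1]))
    return bind(c, a_inv)

d = 1024
rng = np.random.default_rng(0)
role, filler = rng.normal(0, 1 / np.sqrt(d), size=(2, d))  # random symbol vectors

pair = bind(role, filler)        # structured representation of the (role, filler) pair
recovered = unbind(pair, role)   # noisy estimate of the filler

# Cosine similarity shows the filler can be recovered from the bound pair.
cos = recovered @ filler / (np.linalg.norm(recovered) * np.linalg.norm(filler))
print(f"similarity to filler: {cos:.2f}")  # well above chance (~0 for random vectors)
```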
Exploring cognitive-science-inspired evaluations of frontier LLMs and Visual Language Models (VLMs), with possible applications in designing and testing self-reflection and self-rewarding mechanisms, such as meta-learning and intrinsic motivation, to enhance self-awareness, self-improvement, and self-regulation.
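One simple starting point for such evaluations is a small behavioural test battery scored by exact match. The sketch below assumes a hypothetical generate() wrapper around whichever model is under study; the placeholder model and the items are illustrative only.

```python
# Placeholder model call; swap in the LLM or VLM API under evaluation.
def generate(prompt: str) -> str:
    return "3"

# Tiny hand-written test battery with gold answers (illustrative items only).
eval_set = [
    ("How many legs does a spider have? Answer with a number only.", "8"),
    ("How many vowels are in the word 'banana'? Answer with a number only.", "3"),
    ("How many sides does a triangle have? Answer with a number only.", "3"),
]

correct = sum(generate(q).strip() == gold for q, gold in eval_set)
print(f"exact-match accuracy: {correct / len(eval_set):.2f}")
```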
Investigating and implementing neuromorphic techniques for LLMs, such as sparsity, synaptic plasticity, and neuromodulation, that can reduce their resource requirements, increase their adaptability, and facilitate their synchronization and coordination. This could take the form of spiking neural networks (e.g., SpikeGPT), or of exploring how Mixture of Experts (MoE) models use very coarse forms of sparsity that could be informed by bio-inspired techniques.
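For intuition about that coarse sparsity, the following NumPy sketch implements top-k gating as used in MoE layers: only the k selected experts are evaluated per token. The toy experts are random ReLU maps standing in for per-expert feed-forward blocks, and all sizes are illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_forward(x, W_gate, experts, k=2):
    # Route one token through the top-k of len(experts) expert networks.
    # Only k experts run per token, so compute scales with k rather than
    # with the total number of experts: a very coarse form of sparsity.
    logits = W_gate @ x                  # one routing score per expert
    top = np.argsort(logits)[-k:]        # indices of the k highest-scoring experts
    weights = softmax(logits[top])       # renormalize over the selected experts
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 16, 8
W_gate = rng.normal(size=(n_experts, d))

# Toy "experts": random ReLU maps standing in for per-expert MLPs.
expert_mats = rng.normal(size=(n_experts, d, d)) / np.sqrt(d)
experts = [lambda x, W=W: np.maximum(W @ x, 0.0) for W in expert_mats]

x = rng.normal(size=d)
print(moe_forward(x, W_gate, experts, k=2).shape)  # (16,)
```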
Studying and modeling memory consolidation processes in LLMs, such as rehearsal and sleep, that can improve their long-term retention and generalization of learned knowledge and skills. This could include exploring new techniques for developing and evaluating memory-augmented LLMs, such as Retrieval Augmented Generation, that can retrieve and integrate relevant information from external sources to enhance their performance on various tasks.
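As a minimal illustration of the retrieval step in Retrieval Augmented Generation, the sketch below embeds a toy document collection, retrieves the passage most similar to a query, and prepends it to the prompt. The embed() function is a hashing-based stand-in for a trained embedding model, and the documents and query are placeholders.

```python
import numpy as np

def embed(text, dim=64):
    # Toy embedding: hash each (punctuation-stripped) word to a random
    # vector and average. Hashes are process-specific, which is fine within
    # one run; a real system would use a trained sentence-embedding model.
    words = [w.strip(".,?!") for w in text.lower().split()]
    vecs = [np.random.default_rng(abs(hash(w)) % 2**32).normal(size=dim) for w in words]
    v = np.mean(vecs, axis=0)
    return v / np.linalg.norm(v)

documents = [
    "Spiking neural networks communicate with discrete events.",
    "Mixture of Experts layers route tokens to a few specialists.",
    "Circular convolution binds role and filler vectors in HRRs.",
]
doc_matrix = np.stack([embed(doc) for doc in documents])

query = "How do spiking networks communicate?"
scores = doc_matrix @ embed(query)         # cosine similarities (unit vectors)
best = documents[int(np.argmax(scores))]   # highest-scoring passage

# The retrieved passage is prepended to the prompt before generation.
prompt = f"Context: {best}\n\nQuestion: {query}\nAnswer:"
print(prompt)
```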
Recommended reading:
[MoE] Krajewski, Jakub, Jan Ludziejewski, Kamil Adamczewski, Maciej Pióro, Michał Krutul, Szymon Antoniak, Kamil Ciebiera et al. "Scaling Laws for Fine-Grained Mixture of Experts." arXiv preprint arXiv:2402.07871 (2024). [pdf]
[MoE] Mirzadeh, Iman, Keivan Alizadeh, Sachin Mehta, Carlo C. Del Mundo, Oncel Tuzel, Golnoosh Samei, Mohammad Rastegari, and Mehrdad Farajtabar. "ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models." arXiv preprint arXiv:2310.04564 (2023). [pdf]
[SNNs] Dumont, Nicole Sandra-Yaffa, P. Michael Furlong, Jeff Orchard, and Chris Eliasmith. "Exploiting semantic information in a spiking neural SLAM system." Frontiers in Neuroscience 17 (2023): 1190515. [pdf]
[Eval] Kajić, Ivana, Olivia Wiles, Isabela Albuquerque, Matthias Bauer, Su Wang, Jordi Pont-Tuset, and Aida Nematzadeh. "Evaluating Numerical Reasoning in Text-to-Image Models." arXiv preprint arXiv:2406.14774 (2024). [pdf]
[Eval] Frank, Michael C. "Baby steps in evaluating the capacities of large language models." Nature Reviews Psychology 2, no. 8 (2023): 451-452. [pdf]
[Eval] Chang, Yupeng, Xu Wang, Jindong Wang, Yuan Wu, Linyi Yang, Kaijie Zhu, Hao Chen et al. "A survey on evaluation of large language models." ACM Transactions on Intelligent Systems and Technology (2023). [pdf]