Steve Abreu (University of Groningen) s.abreu@rug.nl
Nicole Dumont (University of Waterloo) ns2dumon@uwaterloo.ca
Guido Zarrella (MITRE) guido@mitre.org
Ivana Kajić (Google DeepMind) kivana@google.com
Alessandro Pierro (Intel Labs) alessandro.pierro@intel.com
Johannes von Oswald (Google Research) jvoswald@google.com
Anand Subramoney (Royal Holloway, University of London) anand.subramoney@rhul.ac.uk
Emre Neftci (FZ Juelich) e.neftci@fz-juelich.de
Chris Eliasmith (University of Waterloo) celiasmith@uwaterloo.ca
With this topic area, we explore how neuromorphic principles can enhance the performance, efficiency, and robustness of state-of-the-art machine learning models. While foundation models excel in many tasks, they face significant challenges, such as high computational cost, hallucinations, lack of continual learning and knowledge editing, and limited reasoning abilities.
We aim to bridge disciplines and foster collaboration across fields (mainstream machine learning, robotics, neuromorphic engineering, neuroscience, cognitive science, and psychology) by focusing on sparsity and always-on reasoning as convergence points.
Neuromorphic hardware offers energy-efficient, adaptive alternatives to GPUs/TPUs, enabling scalable and sustainable AI systems for real-world tasks. Participants will collaborate to develop models that leverage neuromorphic hardware for efficient training, inference, and applications.
We will also connect sparsity to interpretability, using sparse autoencoders as tools for mechanistic interpretability while drawing parallels to neuroscience methods for functional understanding of complex systems. Insights from cognitive science, such as human reasoning mechanisms, will complement these efforts, synergizing with neuromorphic techniques to create adaptable, robust AI whose inner workings we can analyze and intervene on.
Cognitive science and always-on learning for improved reasoning
Improve the reasoning abilities of neuromorphic language models through adaptive computation, always-on continual learning, and efficient use of resources during inference-time reasoning, in ways inspired by OpenAI’s recent o1 model (see the sketch after this list).
Explore agent interactions that shape behavior through online learning, using multi-agent reinforcement learning (MARL) principles such as cooperation, competition, and communication to learn collaboratively and improve reasoning.
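As a concrete starting point for the inference-time reasoning bullet above, the sketch below implements self-consistency (sampling several answers and majority-voting over them) with an early-exit rule as a simple form of adaptive computation. This is a minimal illustration, not part of any provided codebase: the `sample_answer` callable is a hypothetical stand-in for whatever language model a project uses (e.g., an SSM-based LM), and the sampling budget and agreement threshold are placeholders.

```python
# Minimal sketch of adaptive inference-time computation via self-consistency:
# sample several reasoning traces, vote on their final answers, and stop early
# once the vote is decisive. `sample_answer` is a hypothetical stand-in for any
# stochastic LM call (e.g., an SSM-based language model).
from collections import Counter
from typing import Callable

def self_consistency(prompt: str,
                     sample_answer: Callable[[str], str],
                     n_samples: int = 8,
                     min_agreement: float = 0.8) -> str:
    """Return the majority-vote answer, spending at most `n_samples` samples."""
    votes: Counter = Counter()
    for i in range(1, n_samples + 1):
        votes[sample_answer(prompt)] += 1
        top_answer, count = votes.most_common(1)[0]
        if i > 1 and count / i >= min_agreement:
            return top_answer  # enough agreement: stop spending compute here
    return votes.most_common(1)[0][0]
```

Easy prompts terminate after a few samples while harder ones use the full budget, which is the kind of input-dependent compute allocation this project aims to study on neuromorphic hardware.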
Edge intelligence: sparse state space models for real-time applications
Use sparse state space models (SSMs) to solve real-time edge tasks such as keyword spotting or robotic control at the frontier of efficiency and performance.
Train and deploy ultra-sparse SSMs on neuromorphic hardware for novel applications, to showcase neuromorphic computing’s potential for energy-efficient, adaptable edge systems (a minimal sparse SSM layer is sketched after this list).
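As a rough illustration of what "sparse SSM" means here, the following is a minimal sketch of a diagonal state space layer in plain PyTorch whose outputs are thresholded to produce event-like activation sparsity. The class and parameter names (SparseDiagonalSSM, A, B, C, threshold) are our own illustration rather than any library's API; the provided SSM codebases remain the intended starting point for real projects.

```python
# Minimal diagonal state space layer with thresholded (sparse) outputs.
import torch
import torch.nn as nn

class SparseDiagonalSSM(nn.Module):
    def __init__(self, d_model: int, d_state: int, threshold: float = 0.1):
        super().__init__()
        # Diagonal state transition, initialized for slow, stable decay (|A| < 1).
        self.A = nn.Parameter(torch.rand(d_state) * 0.1 + 0.9)
        self.B = nn.Parameter(torch.randn(d_state, d_model) * 0.02)
        self.C = nn.Parameter(torch.randn(d_model, d_state) * 0.02)
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model); sequential scan written for clarity, not speed.
        batch, time, _ = x.shape
        h = torch.zeros(batch, self.A.shape[0], device=x.device)
        outputs = []
        for t in range(time):
            h = self.A * h + x[:, t] @ self.B.T           # linear recurrence
            y = h @ self.C.T                              # readout
            y = torch.where(y.abs() > self.threshold, y,  # zero out small values:
                            torch.zeros_like(y))          # event-like sparsity
            outputs.append(y)
        return torch.stack(outputs, dim=1)
```

Efficient implementations would replace the Python loop with a parallel scan or convolutional view, and quantize states and weights for neuromorphic deployment.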
Sparse insights: enhancing interpretability through dynamic sparsity
Improve our understanding of how LLMs make decisions, for greater transparency and accountability, using concepts from neuroscience (sparse coding) and cognitive science (feature binding).
Use sparse autoencoders (e.g., GemmaScope) to analyze the structure of representations in large foundation models and isolate relevant features (see the sketch after this list).
Investigate compositionality in LLM representations using sparse autoencoders – identifying and distinguishing linear composition, tensor product representations, etc. – and compare these to binding operations in vector symbolic architectures and other cognitive science models.
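To make the sparse autoencoder workflow concrete, here is a simplified sketch of an SAE trained on cached residual-stream activations. GemmaScope itself ships pretrained JumpReLU SAEs; the L1-penalized ReLU variant below is a simpler stand-in, and the dictionary size and penalty coefficient are placeholders.

```python
# Simplified sparse autoencoder over cached model activations.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)   # overcomplete feature dictionary
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, acts: torch.Tensor):
        codes = F.relu(self.encoder(acts))          # sparse feature activations
        recon = self.decoder(codes)                 # reconstruction of the input
        return recon, codes

def sae_loss(recon, acts, codes, l1_coeff: float = 1e-3):
    # Reconstruction error plus an L1 penalty that encourages few active features.
    return F.mse_loss(recon, acts) + l1_coeff * codes.abs().mean()
```

In practice one caches activations from a fixed layer of the foundation model, trains the SAE on them, and then inspects which dictionary features activate on inputs of interest (e.g., compositional prompts).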
Bridging scale and efficiency
Use knowledge distillation techniques to transfer the capabilities of open-source LLMs (e.g., Llama-3, Gemma-2) into neuromorphic LLMs and SSMs, bringing the power of state-of-the-art AI to energy-efficient, scalable systems optimized for real-world applications (a minimal distillation-loss sketch follows).
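As a sketch of the distillation step (under the assumption that teacher and student share a tokenizer and vocabulary), the loss below matches temperature-softened teacher and student logits; in a full recipe it would be combined with the standard next-token cross-entropy loss. The temperature value is illustrative.

```python
# Minimal temperature-scaled logit distillation loss (teacher LLM -> student SSM/LLM).
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    """KL divergence between softened teacher and student token distributions."""
    t = temperature
    student_logp = F.log_softmax(student_logits / t, dim=-1)
    teacher_p = F.softmax(teacher_logits / t, dim=-1)
    # Scale by t**2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(student_logp, teacher_p, reduction="batchmean") * (t * t)
```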
Always-on continual learning with neuromorphic principles
Integrate in-context learning (ICL) or Low-Rank Adaptation (LoRA) for state space models (SSMs) with biologically plausible mechanisms such as local three-factor learning rules (see the sketch after this list).
Apply these to continual learning tasks (e.g., learning from streaming data, control in dynamic environments) to demonstrate real-time adaptation and efficient inference in spiking AI systems.
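A minimal sketch of a local three-factor update is shown below: an eligibility trace accumulates pre/post coincidences, and weights change only when a global modulatory signal (reward, surprise, or task error) arrives. Names and constants are illustrative assumptions; in a project this would run alongside an SSM or LoRA adapter on streaming data.

```python
# Minimal three-factor learning rule: local eligibility trace gated by a
# global modulatory signal. All names and constants are illustrative.
import torch

def three_factor_update(weights, eligibility, pre, post, modulator,
                        lr: float = 1e-3, trace_decay: float = 0.9):
    """One online step. Shapes: pre (n_in,), post (n_out,), weights/eligibility (n_out, n_in)."""
    # Factors 1 & 2 (local): decay the trace and add the pre/post coincidence term.
    eligibility = trace_decay * eligibility + torch.outer(post, pre)
    # Factor 3 (global): a scalar modulatory signal decides how much to consolidate.
    weights = weights + lr * modulator * eligibility
    return weights, eligibility
```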
Participants will have access to a variety of tools, including Lava-DL, Nengo, NIR, snnTorch, JAX, PyTorch, and TensorRT. Topic leaders have experience with these tools and will provide tutorials and guidance.
In addition, multiple codebases will be provided as starting points, for example SSMs with event-based data, sparse autoencoders with GemmaScope, and sparse neural networks for Loihi 2 using Lava-DL. Two further codebases will be released before the workshop: (1) sparse and quantized SSMs optimized for neuromorphic hardware, and (2) a neuromorphic language model based on the matmul-free LLM running on a multi-chip Loihi 2 system.
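For orientation, here is a small snnTorch example of the kind of spiking building block these codebases build on: a linear projection driving a leaky integrate-and-fire neuron unrolled over time. The layer sizes, decay constant, and number of time steps are placeholders.

```python
# Small snnTorch example: linear layer feeding a leaky integrate-and-fire neuron.
import torch
import torch.nn as nn
import snntorch as snn

fc = nn.Linear(784, 100)          # input projection
lif = snn.Leaky(beta=0.9)         # leaky integrate-and-fire neuron layer

x = torch.rand(25, 32, 784)       # (time steps, batch, features)
mem = lif.init_leaky()            # initialize membrane potential
spikes = []
for t in range(x.shape[0]):
    cur = fc(x[t])                # input current at this time step
    spk, mem = lif(cur, mem)      # spike output and updated membrane state
    spikes.append(spk)
out = torch.stack(spikes)         # (time, batch, neurons) binary spike trains
```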
Extended reading list for state space models, from Telluride 2024:
Intro to SSMs:
[SSM] Aaron Voelker, Ivana Kajić, Chris Eliasmith (2019). Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks. NeurIPS. [pdf]
[SSM] Antonio Orvieto, Samuel L Smith, Albert Gu, Anushan Fernando, Caglar Gulcehre, Razvan Pascanu, Soham De (2023). Resurrecting recurrent neural networks for long sequences. ICML. [pdf]
[SSM-LLM] Albert Gu, Tri Dao (2023). Mamba: Linear-Time Sequence Modeling with Selective State Spaces. [pdf]
[SSM-LLM] Soham De, (...), Caglar Gulcehre (2024). Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models. [pdf]
[SSM-LLM] Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber (2021). Linear Transformers Are Secretly Fast Weight Programmers. ICML. [pdf]
Activation sparsity:
[MoE] Jakub Krajewski, Jan Ludziejewski, Kamil Adamczewski, Maciej Pióro, Michał Krutul, Szymon Antoniak, Kamil Ciebiera, et al. (2024). Scaling Laws for Fine-Grained Mixture of Experts. arXiv:2402.07871. [pdf]
[MoE] Iman Mirzadeh, Keivan Alizadeh, Sachin Mehta, Carlo C. Del Mundo, Oncel Tuzel, Golnoosh Samei, Mohammad Rastegari, Mehrdad Farajtabar (2023). ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models. arXiv:2310.04564. [pdf]
[SDNN] Peter O’Connor, Max Welling (2017). Sigma Delta Quantized Networks. ICLR. [pdf]
[SDNN] Sumit Bam Shrestha, Jonathan Timcheck, Paxon Frady, Leobardo Campos-Macias, Mike Davies (2024). Efficient Video and Audio Processing with Loihi 2. ICASSP. [pdf]
Reasoning with LLMs:
Spiking neural networks:
[SNNs] Nicole Sandra-Yaffa Dumont, et al. (2023). Biologically-based computation: How neural details and dynamics are suited for implementing a variety of algorithms. Brain Sciences, 13(2), 245. [pdf]
Evaluations:
[Eval] Ivana Kajić, Olivia Wiles, Isabela Albuquerque, Matthias Bauer, Su Wang, Jordi Pont-Tuset, Aida Nematzadeh (2024). Evaluating Numerical Reasoning in Text-to-Image Models. arXiv:2406.14774. [pdf]
[Eval] Michael C. Frank (2023). Baby steps in evaluating the capacities of large language models. Nature Reviews Psychology, 2(8), 451-452. [pdf]
[Eval] Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Linyi Yang, Kaijie Zhu, Hao Chen, et al. (2023). A Survey on Evaluation of Large Language Models. ACM Transactions on Intelligent Systems and Technology. [pdf]