Is there a Mathematical Model of the Mind?

Recently I had the pleasure of organizing a workshop on “Conceptual Understanding of Deep Learning”. The workshop aired live on YouTube for anyone to participate (with questions taken over Twitter). The primary goal of the workshop was to bring together researchers from various disciplines motivated by the basic, difficult question: what is the algorithm behind the brain/mind? Here the mind doesn’t necessarily refer to the human mind but to any artificial intelligent system that has the algorithmic capabilities of the human mind (or perhaps much more).

The basic question of “how the mind works” is a fascinating, age-old mystery. We could look at the mind/brain as a mathematical object: a function that processes inputs from the environment and outputs predictions/actions/decisions. And we don’t have to find out exactly how the human mind does it (although it’s a great source of inspiration); instead we could find an alternate algorithm with the same algorithmic power -- which is exactly the goal of AI.


What is the architecture of such an algorithm -- what are the components of such an architecture? Is the current deep learning framework sufficient to create an intelligent system with power close to that of the mind? For instance, we can remember complex phenomena, index them, and retrieve them later when a related event occurs, to help with the predictions/actions at that time; we can learn skills over time that build on top of each other; we can use logic and language with ease and great power; we can adapt to new situations with dexterity. How would all this be achieved in a deep learning system? And will we be able to mathematically understand and state claims about the capabilities of such a system? The panel discussion on “Is there a mathematical model of the mind” explored these and related questions.



Christos Papadimitriou talking about language and brain assemblies

The speakers and panelists included Turing Award winners Geoffrey Hinton and Leslie Valiant, Gödel Prize winner Christos Papadimitriou, and experts from diverse backgrounds including ML/AI, algorithms, theory, statistics, and neuroscience. Leslie and Christos spoke about the right architecture for AI, which includes a reasoning layer on top of deep-learned modules, and about how language can arise in the brain via neural assemblies. Other talks covered how to make sense of today’s DL models (are they merely correlation extractors?) (Aleksander Madry) and how symbolic, structured interpretations can be assigned to neurons (Jacob Andreas).



For algorithms that interact with the world, there were talks on reinforcement learning (Sergey Levine, Alekh Agarwal) and on handling distributional shift so that systems can adapt to new situations/distributions (Chelsea Finn). Leila Wehbe talked about how AI could benefit from some of the latest findings in neuroscience. Transformers have driven tremendous success in language understanding in recent years, and there were talks on recent empirical systems (Colin Raffel) and on the theoretical understanding of transformers (Srinadh Bhojanapalli). We also had theory talks on contrastive learning (Tengyu Ma), CNNs (Suriya Gunasekar), and gradient descent and generalization (regularization in SGD by Jason Lee, and tuning the step size via learning to learn by Rong Ge).

Leila Wehbe pointing out synergies between AI and neuroscience


Panel discussion

It was an honor to host the panel discussion with five panelists, each of whom is a luminary in their field: Lenore Blum (theory of computation), Geoffrey Hinton (foundations of deep learning), Jack Gallant (neuroscience), Percy Liang (NLP/ML), and Bin Yu (statistics/ML). It was interesting to hear the diversity of perspectives on the basic questions discussed below.

Panelists responding to the main question of the discussion


Is there a Mathematical Model for the Mind? (Here the mind essentially refers to its algorithmic capabilities; it doesn’t have to be the human mind but could be an artificial intelligent system with the same or greater abilities.) Will we be able to find such an algorithm and confidently claim, with a mathematical proof, that it has those abilities, or is this study mostly an empirical science?


The views were very diverse, ranging from “what is the mind?” and “why have a mathematical model of the mind?” to “there may be a model that is too long to describe, so we should look for the right useful level of abstraction” and “how will we know when we have achieved the capability of the mind?”. Geoffrey feels there is a difference between microscopic and macroscopic understanding of the brain. At a microscopic level the rules may be very simple, but there may not be any “simple” understanding at a macroscopic level -- much like the Navier-Stokes equations in fluid dynamics, which explain what is happening at a microscopic level but, in the turbulent regime, don’t help much in providing a macroscopic picture. Something similar may be happening in the brain; especially for intuitive reasoning, it may not be possible to describe the human system in a small set of rules. Similar views were echoed by Jack, who felt that one interpretation of the mind is the software running on the hardware that is the brain. He felt that there certainly is a model of the brain at the level of the microtubules of neurons, but such a model is not a useful algorithmic abstraction; hence we should be looking for good, useful abstractions at a much higher level rather than thinking at the level of neurons.

Percy thinks that AI is more an alien type of intelligence than human intelligence, which can often be very irrational, and recalled the quote that “all models are wrong but some are useful”. In contrast, Bin felt that the human mind is amazingly efficient (consider its energy efficiency, for example) and that, in terms of a mathematical model, there is a lot to learn from evolution and from empirical measurements in neuroscience. Lenore spoke about her work on Conscious AI based on Global Workspace theory and felt that it is important to think of the mind as more than just an input/output system and to come up with a basic theory of the mind, just as Turing machines concretize the notion of an algorithm. (I could hardly elicit a final yes/no answer, but maybe one shouldn’t hope for such a simple response to this difficult, fundamental question, which is subject to so many interpretations and is therefore rather open ended.)



When you look at today's deep learning framework, what seem to be the main architectural components that it is lacking (logic/symbolic reasoning, memory, modularity, continual learning)? Are there major architectural ingredients in the human brain that are missing here?


Geoffrey felt that one thing missing until the advent of transformers was dot products of activity vectors with other activity vectors, instead of just dot products of activity vectors with weight vectors, which allows for causal correlation. One of the main things missing now is not logic but “fast weights” -- weights that adapt and decay rapidly as an overlay on the existing weights -- which would enable true recursion by reusing the same neurons and the same connections for the recursive call, effectively providing a stack. We don’t have enough time scales: in biology there are six different time scales, and much of this is absent today due to hardware limitations. Jack agreed that representations change at different time scales, and that the amount of change particularly depends on attention, which can change spike rates anywhere from 2% in the prefrontal cortex to 15% in the mid-level visual system; this changes the representation, as seen through neurophysiology and MRI. This view of multiple representations is completely missing from today’s frameworks based on the train/test paradigm. It may make sense to include in DL elements of Bernard Baars’s Global Workspace theory, which was the original model of cognition in the 60’s (Lenore). Another missing component could be a rich environment instead of limited “training data” (Bin). And of course, it is important to be able to put together a swarm of individual systems into an intelligent whole (Percy).
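To make the “fast weights” idea a bit more concrete, here is a minimal sketch of my own (not something presented at the workshop): the effective weight matrix is a slow, learned component plus a rapidly written, rapidly decaying overlay driven by recent activity. The dimensions, decay rate, and learning rate are arbitrary assumptions chosen purely for illustration.

```python
import numpy as np

# Illustrative sketch of a "fast weights" overlay: recent activity is written
# into W_fast (Hebbian outer product) and fades quickly, while W_slow plays
# the role of ordinary learned weights. All constants here are assumptions.

dim = 8
rng = np.random.default_rng(0)

W_slow = rng.normal(scale=0.1, size=(dim, dim))  # slow, learned weights
W_fast = np.zeros((dim, dim))                    # short-lived overlay

decay = 0.9          # fast weights fade over a few steps
learning_rate = 0.5  # and are written strongly by recent activity

def step(h):
    """Read through (slow + fast) weights, then write the current
    activity into the fast weights so it can influence the near future."""
    global W_fast
    out = np.tanh((W_slow + W_fast) @ h)
    W_fast = decay * W_fast + learning_rate * np.outer(out, out)
    return out

h = rng.normal(size=dim)
for _ in range(5):
    h = step(h)  # recent activity is temporarily "stored" in W_fast
```

This is only meant to show the extra, faster time scale: information lives briefly in the weights themselves rather than only in activities or in slowly learned parameters.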


How do we remember things? If we meet someone, we seem to remember and later recall who we met and what we talked about, and to relate it to other events and people. Is there a knowledge graph or lookup table of objects (with different “types”) in the brain? In an artificial system using deep learning, how would such a knowledge graph of objects arise automatically?



It’s important not to think of a knowledge graph as just a triple store or propositional attributes (Percy). Jack pointed out that it is useful to think of two memory systems: one is the long-term store (which, as a shortcut, may be thought of as a Hopfield network, though it is very poorly understood), buried in synapses and involving the hippocampus; the other is the more modern cortical working memory, bound up with language and distributed over the human cortex. Incoming sensory information gets distributed throughout the semantic cortical memory, and there are multiple points at which it interacts with long-term priors in the long-term memory. The memory of a unitary concept such as a dog is diffused across different regions: how it looks is stored in the visual system, how it sounds is in the auditory system, and if a dog bit you, that is stored in the prefrontal cortex. A “hammer” has multiple representations -- “a tool”, “its functionality”, “its visual representation” -- that are connected, perhaps, by the long-term memory (Bin), and a “word” is in fact an association between a sound and a meaning (Geoffrey). There is currently no good understanding of long-term memory, although there are some proposals based on Global Workspace theory.
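Since the long-term store was loosely likened to a Hopfield network above, here is a toy classical Hopfield network as a sketch (again my own illustration, with arbitrary sizes and random patterns, not something from the talks): memories are written with a Hebbian rule and recalled from a corrupted cue.

```python
import numpy as np

# Toy classical Hopfield network as an associative (content-addressable)
# memory: store a few +/-1 patterns, then recover one from a noisy probe.

rng = np.random.default_rng(1)
n, num_patterns = 64, 3
patterns = np.sign(rng.normal(size=(num_patterns, n)))  # +/-1 memories

# Hebbian storage: each memory is written as an outer product.
W = sum(np.outer(p, p) for p in patterns) / n
np.fill_diagonal(W, 0)

def recall(probe, steps=10):
    """Iterate the update rule; the state falls toward the nearest stored pattern."""
    s = probe.copy()
    for _ in range(steps):
        s = np.sign(W @ s)
        s[s == 0] = 1
    return s

# Corrupt one memory and retrieve it from the partial cue.
noisy = patterns[0].copy()
flip = rng.choice(n, size=10, replace=False)
noisy[flip] *= -1
recovered = recall(noisy)
print("overlap with stored pattern:", int(recovered @ patterns[0]))  # close to 64
```

The point of the sketch is only the retrieval-from-a-partial-cue behavior, which is the property the panelists were gesturing at; it says nothing about how such a store would be implemented in cortex.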


Is there a library of modules/functions in the human brain -- are they physically separate modules? Is there a program call stack that tracks function calls and passes arguments? Do we need to add such a stack and modules to deep networks? How are new concepts formed (both in an artificial and a real brain)? Or a new instance of an existing concept -- like when we meet a new person?


Geoffrey thinks there is a program stack and that it is implemented using fast weights. As for the presence of different modules, the responses were limited. There was talk about how modules could have arisen through evolution, and how such an evolutionary process can be mimicked in deep learning by neural architecture search (for example, NAS by Quoc Le). Jack said there is no knowledge in neuroscience of how functions are “called”. When asked how functions and subroutines are shared across other functions, Geoffrey suggested we likely have distributed representations of functions, with new concepts formed as attractors. (There is work on Capsule Networks, which are like modules, but somehow that didn’t come up in the discussion.)


As humans, when we make a decision or recognize an object, we seem to be able to give reasons for our decisions. Why is this so difficult for deep learning systems?


The panel unanimously rejected the premise: humans are no better at giving interpretations of their decisions than machines. In fact, machines have the potential to be better at giving “reasons” for their predictions. The appearance or feeling that we are better at making reasoned predictions is more of an illusion and a post hoc rationalization.


Let’s compare programming languages and natural languages -- they seem so different but may have some similarities. Can any of the compiler theory/parsers for PLs be reused for NLP?


Percy pointed out that both are languages, but they are very different in structure. The flexibility, incompleteness, and redundancy of natural languages set them apart. Natural language sentences are not taken literally (for example, the sentence “I’ll try and do it” actually means “I’ll try to do it”). Natural languages do share some of the structure of PLs, such as rules of grammar that are fairly rigid -- but humans have a surprising ability to communicate even with very limited knowledge of a new language (such as by using gestures or context).



Lenore, you have written about consciousness and programs. How do you connect machines with consciousness? Is a CTM (Conscious Turing Machine) actually conscious, with conscious feelings, or does it just simulate those feelings and their effects?


There are two major theories of consciousness. One is the Global Neuronal Workspace theory, grounded in structures in the brain such as short- and long-term memory, which is closer to our theory. The other is Integrated Information Theory, which seems more mathematical and gives a measure of consciousness based somewhat on feedback in the system; one issue with the latter is that, according to this theory, even a thermostat is conscious, whereas we don’t believe computers today are really conscious. In fact, there is a debate over whether animals are conscious. But it’s not just the algorithm in terms of inputs and outputs; it's really the architecture and its dynamics as a system that determine its consciousness. We are looking into something called a mirror test (like a Turing test) to check whether a system is conscious. The short answer is that programs as we know them today are not conscious.



Acknowledgments: The workshop was made possible with great help from Pranjal Awasthi, Manzil Zaheer, and the Google planning/production Team.


Author: Rina Panigrahy, Research Scientist, Google