This piece, by Onno Berkan, was published on 09/11/24. The original text, by Matthew Hutson, was published in Nature on 05/14/24.
We’ve seen the tech that tries to decrypt human thoughts; now let’s see how we peek inside the minds of AIs. Many AI systems, especially LLMs (large language models), are black boxes: we know their input and we know their output, but we don’t quite know what happens in between. This is what the field of Explainable AI (XAI) attempts to solve by reverse-engineering AIs. For LLMs like ChatGPT, Claude, and many other chatbots, however, this is particularly difficult. While new breakthrough techniques seem promising at explaining AIs’ thought processes, they haven’t produced any mind readers. They have produced fascinating results about how the tools for reading AI minds may one day help us read human minds, though…
Explaining why AIs make the decisions they do is important for a multitude of reasons, including reducing bias and curbing the spread of misinformation. We ought to know why, for instance, “the AI recommended that a prisoner be paroled or came up with a particular medical diagnosis,” writes Matthew Hutson. We rely on AI systems for all sorts of things, from that essay you had ChatGPT write to medical advice, and even “high-risk systems” such as biometric identification. In fact, such explanations are required for high-risk systems inside the EU, a requirement that is hoped to reduce many kinds of bias and baseless convictions.
This is where XAI comes into play: one common approach builds decision trees that approximate an AI’s behavior. This can increase an AI’s trustworthiness and explain why, for example, the system identified an image as that of a dog. Decision trees don’t quite work for LLMs, however, as the models often prove to be a little too complex.
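To make the surrogate-tree idea concrete, here is a minimal sketch in Python. The scikit-learn library, the random-forest “black box,” and the synthetic data are all illustrative assumptions of mine, not anything described in the original article.

```python
# A minimal sketch of a "surrogate" explanation: train a simple,
# human-readable decision tree to mimic a black-box model's predictions.
# The black box, the toy data, and scikit-learn itself are illustrative
# choices, not taken from the Nature article.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

# Stand-in "black box": a random forest trained on synthetic data.
X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

# Surrogate: a shallow tree trained to predict the black box's outputs
# (not the true labels). Its branches give a rough, readable explanation.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

print(export_text(surrogate, feature_names=[f"feature_{i}" for i in range(5)]))
```

The printed tree reads like a list of if-then rules, which is exactly the kind of explanation that breaks down once the model being approximated is as complex as an LLM.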
The fact that LLMs are too complex to predict accurately has scientists such as Harvard’s Martin Wattenberg questioning whether studying LLMs’ outputs could help us better understand what goes on inside our own minds. That is to say, LLMs are the closest thing we have to an artificial mind right now.
Yet that doesn’t quite mean they’re particularly close. One study by a company called Anthropic asked an LLM if it consented to being shut down. As expected, the LLM responded the way a dying poet accepting their fate does, with beautiful parting words… And that was because the LLM’s words were lifted from a prolific author. The LLM’s training data included Arthur C. Clarke’s character Hal, an articulate AI that’s abandoned by humans. As such, we don’t actually know whether the LLM realized it was staring death in the eye; we just know that it knew how Hal would have acted in that situation. When it seems like AI systems react just the way they do in the movies, it’s important to keep in mind that that’s what they’re trained on!
Back to how all of this relates to the human mind: understanding and studying the LLM ‘mind’ can be a much cheaper way to comprehensively study the human mind. You don’t have to get consent, you can tweak and study individual neurons, and you can conduct all sorts of experiments that would be harmful to humans. The whole point is that, while they are very different, the human and LLM ‘minds’ have many similarities, and these similarities will only increase going forward. Researchers have already built ‘lie detectors’ for AIs that are over 90% accurate, and have made great strides in understanding which part of the AI ‘brain’ causes which response (a simplified sketch of such a probe follows below).
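For the curious, here is a toy sketch of the probing idea behind those ‘lie detectors’: a simple linear classifier trained on a model’s internal activations. The random stand-in “activations,” the dimensions, and the use of scikit-learn are my own assumptions for illustration; real studies record activations from an actual LLM, and nothing here reproduces any particular paper.

```python
# A toy sketch of a "probe": a linear classifier trained on a model's
# internal activations to guess whether a statement is truthful.
# The activations below are random stand-ins, purely for illustration;
# real work records hidden states from an actual LLM.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Pretend each statement yields a 256-dimensional hidden-state vector,
# and that truthful statements shift those activations in one direction.
n_statements, hidden_dim = 2000, 256
labels = rng.integers(0, 2, size=n_statements)            # 1 = truthful, 0 = not
truth_direction = rng.normal(size=hidden_dim)
activations = rng.normal(size=(n_statements, hidden_dim))
activations += np.outer(labels, truth_direction)           # inject a detectable signal

X_train, X_test, y_train, y_test = train_test_split(
    activations, labels, test_size=0.25, random_state=0)

# The "lie detector" is just logistic regression on the activations.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"probe accuracy on held-out statements: {probe.score(X_test, y_test):.2f}")
```

The whole trick is that the probe never sees the model’s words, only its internal state, which is why this style of work is often described as reading the AI’s ‘mind.’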
This is a beautiful full-circle moment: a field that many enter hoping to better understand the human mind now comes back with a promise to revolutionize neuroscience. I think this should be exciting for all of us; I know it’s exciting to me, at least. Stay curious.
Thank you for reading. Reminder: Byte Sized is open to everyone! Feel free to submit your piece.
Please send all submissions to berkan@usc.edu with the subject line “Byte Sized Submission.” Thank you!