Can You Trust Generative AI With Numbers?
How LLMs Really “Do” Math, and What It Means for Finance, Audit, and Tax
If you’ve ever pasted a few numbers into ChatGPT and gotten a convincing answer back, it’s tempting to think:
“Nice, this can replace my calculator, my spreadsheet, and maybe my junior analyst.”
My experiments below, together with the last few years of research on large language models (LLMs), say: not so fast.
In this explainer I’ll walk through:
How LLMs actually “compute” something as simple as 2 + 2
Why they can be near-perfect on some math tasks and terrible on others
What my experiments reveal about failures on longer arithmetic and multi-step word problems
What this means for finance, audit, and tax professionals who are already using GenAI in day-to-day work
A practical “math safety” checklist you can adopt for your own GenAI policies