Zunchen Huang - A reflection on LLM for coding

On September 11, 2025, I wrote the following to a colleague in an email, giving my personal opinion on GenAI for coding:

But personally speaking, I believe that GenAI will replace most of software engineers even for senior/highly trained software engineers for two reasons. First, the GenAI development will find a way to leverage formal specifications so that the software programs satisfies all design/trade-off requirements (if that is achieved, the worries in this presentation are gone*). Second, GenAI is already able to solve smaller functionality now. The puzzle now is just how to combine the solutions, and I think this may not be too hard.

It is now May 2026, and AI agents can handle large-scale industrial programs, given a generous token budget. I was right on the replacement part but not particularly precise on the scalability part. The first point, meeting the design and trade-off requirements, is handled by giving the model a detailed project description and by iterating on the milestone results. The puzzle of combining the solutions turns out to have been cracked by AI agents. And I truly believe I will not make a living by typing out software code by hand.

The talk "Why Coding Is Solved, and What Comes Next" by Boris Cherny, the creator of Claude Code, is particularly interesting in two respects. First, when he asked how many people write 100% of their code by hand, no one raised a hand. But 50% of the audience raised their hands when he asked who codes 100% with AI. It may already be a fact that the old style of programming between human and machine is gone in industry. What about in schools, universities, and coding geeks' bedrooms? Can students who finish their projects with GenAI, without truly understanding the solution and the theory, graduate with a degree? Second, the tool is ready to make startups blow up. There is concrete evidence that GenAI will spawn more startup companies, which can compete with the existing tech companies without facing internal resistance.

So is coding solved? Here are some opinions I saw against it:

Opinion 1: GenAI code buries bugs now that will bite back in the future at great cost.

Opinion 2: GenAI code still cannot handle certain niche domains.

Opinion 3: GenAI code is too costly, and at the moment costs more than hiring human programmers.

Those points are all debatable. Coding may have been solved for whatever we can propose and describe to the model. But there are some issues.

Token limits are going to be a real issue. A token limit is handy for the GenAI companies: it tracks the user's progress on whatever the user is doing, whether coding, brainstorming, or article writing. It also has a deep psychological effect, forcing and reinforcing the habit of using the product. It is similar to a phone roaming data plan. If a user only has 5 GB of data a month, a certain pattern emerges: less video streaming, no activity after the data cap is reached, and so on.

A token limit is horrible for users in multiple dimensions. The user doesn't know how many tokens a particular project needs. More often than not, the complexity of a complex project can only be determined accurately in its middle or late stages, so it is hard for users to plan their token usage. Also, reaching the token limit feels like an achievement in itself, making the user comfortable with simply stopping there. Only the unlimited-token users are true winners. Like users of unlimited data roaming plans, they carry no mental burden and can fully focus on finishing the project by burning tokens day and night.

The good news is that, even when we are fully engaged in a project, at some point we need to verify the correctness of the code, check a milestone, and add new features, so there is a natural limit to how many tokens we use. Let's rule out someone who is wild enough to intentionally use agents to break the crypto system or to settle famous unsolved problems**. It would be exciting to let an ordinary user run a cheap model without token limits under a reasonable budget. I believe cheaper models (which are still costly to run on a PC) will consume only a small amount of resources in the near future, and users can then truly enjoy building without worrying about token limits.

Functionality is solved. Optimization will be handled by models. Verification of safety and security guarantees will still have issues. I think it is already an exciting time, particularly for programmers with verification, testing, or quality assurance backgrounds. I also think there is only the verification problem. There is no safety/security by design, because every metric comes from model performance, e.g., CWE benchmark performance.

GenAI has its own data distribution patterns. Individual humans also have their own data distributions in their coding output. They are simply not the same domain. The question is whether insight provided by individuals is still relevant to solving a verification problem, i.e., deciding whether the code is safe and secure. The AI agents aren't really customers of the coding artifact. Humans are the end users, so there will be cases discovered in the physical world, related to the code, that weren't caught by the model training process. Yes, the next model will do better, but only after being patched with new experience from the physical world. Human programmers are better at connecting the physical world and the coding world, and their insight into whether the code is vulnerable is still relevant, to an unknown extent***, at a higher level.

Under simple guidance from humans, GenAI has shown the ability to discover tons of unknown high-risk vulnerabilities in critical software, and it is also able to patch them automatically and quickly. But what about the patch quality? Is it consistent with unseen data from the physical world? The model may have patched the code with a hallucinated but plausible solution whose problems will only show up in the future. This is tied to hallucination as a feature of GenAI models. Generating information with GenAI produces useful and sometimes shocking results, but verifying it against existing facts/information**** is costly.

If a project is internally consistent (against a fixed set of logical rules) and does not conflict with external facts (knowledge, common sense, industry standards), can we say this is a good sign that it doesn't have a lot of safety or security issues? I am very eager to see GenAI coding improve on safety and security, e.g., Anthropic's Project Glasswing.

* The presentation I mentioned above is "Why GenAI Won't Replace Software Engineers" by Alexander Pretschner.

** Alas, OpenAI recently disproved the planar unit distance conjecture of Paul Erdős. But looking at their paper, many renowned mathematicians participated in verifying the results, some of whom declared no expertise in this problem or had not even seen it before. I wonder why they decided to verify the correctness of the result. Is it because of internal researchers who have the expertise? Or because the AI output looks correct? Or is it a pure marketing tactic?

*** Could humanoid robots with advanced sensing technologies solve the physical world problem? I can imagine that, in a future testing warehouse, products will be tested by humanoid robots.

**** Not true/false information.