Slot-machine based development is the new black
Last year, I wrote about my experiences working with Claude Code for a month. I was very optimistic about the future of these tools and the fact that you could describe a change, wait a few seconds and watch code appear that actually looked correct.
Almost a year later, I'm less convinced.
My thoughts on these tools change more often than Trump changes his tariffs. This is not a good sign. Especially when these models are supposedly getting better and better, and predictions from people who definitely don't have an agenda /s say we are only 6 months away from software being solved.
I think we are entering the era of slot-machine based development.
Welcome to the Casino
You open claude code/codex/opencode/crush/etc and do the following:
- Pull the lever (or, press the magic button)
- Code appears
- It looks correct
- You 'review' it
- Merge it into master and continue on to the next thing
If you get stuck at step 3, pull the lever and try one more time.
It's super addictive and gives you an illusion of productivity, which is one of the core claims of AI driven development. You supposedly move 100x faster (which, doing a bit of math, is a ridiculous statement).
But you've now lost the mental model of the code in favour of feeling productive. Writing code fast has never been a competitive advantage; writing the right code has. If you can also do that fast, you're golden.
The same fundamentals that make software stable, robust and good haven't changed. Writing software has always been constrained by thinking, design, trade-offs and understanding/mapping real world problems into proper systems.
Letting these models run wild over time seems to get us to a bowl of spaghetti code faster, rather than to more high quality software.
Losing the mental model
It's 2am and you get a call from whatever monitoring tool you are using: production is down. You check the issue, examine the logs, identify where the problematic code is and spend some time fixing it. Release and go back to bed.
It's very tempting to replace that flow with automation so you can get some hard earned sleep. But what happens when the agent(s) can't figure out what's wrong? What happens when you have to dig through an ungodly amount of unfamiliar code, because you are the one responsible?
Add to this that these tools are nondeterministic. It was likely the tool that introduced the bug in the first place; how confident are you that it will actually fix it?
But Morten, don't humans introduce bugs as well? What's the difference?
I hear you, but at least the human has a feel for the code. An idea of how it's composed and structured, and what might be the real underlying issue.
AI tends to write a lot of code. Indirection on top of indirection. Garry Tan (YC CEO) boasts about writing 10k lines of code a day and created a blog built with Rails that consists of some 200-300k lines of code, which is absolutely insane for showing some text files in a browser.
He might be an extreme case. But it happens if you are not consistently monitoring these things, to the point where you might as well write the code yourself. The productivity gain is gone, at least in the form I read in the commonly made arguments from Big AI.
I do get tired of writing code, which happens faster as I've gotten older. Switching over to letting the AI write and me monitor does increase my coding stamina. I can produce code for longer. But I still have to watch it.
Real project experience
I have been building a SaaS tool lately, called DeployCrate. In the pursuit of productivity I offloaded a lot of the simpler model writing to AI. It's not really critical as such, mainly fetching and storing non-essential data.
And it worked. But looking at the code when I started the usual AI refactor process was just horrible. It hadn't followed established practices. It made an obscene amount of indirections and "helper" functions that made reading the code much harder than it needed to be. The codebase is in Go, a notoriously simple language that boasts of being very easy to read.
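A caricature of the pattern I mean (hypothetical names, not actual DeployCrate code): three layers of wrapping around what idiomatic Go expresses as a single map index, whose zero value already handles the missing-key case.

```go
package main

import "fmt"

// AI-flavoured style: a helper calling a helper calling a one-liner.
func getName(u map[string]string) string {
	return extractField(u, "name")
}

func extractField(m map[string]string, key string) string {
	return lookupOrDefault(m, key, "")
}

func lookupOrDefault(m map[string]string, key, def string) string {
	if v, ok := m[key]; ok {
		return v
	}
	return def
}

// Idiomatic style: indexing a map with a missing key returns the
// zero value ("" for strings), so no helper is needed at all.
func name(u map[string]string) string {
	return u["name"]
}

func main() {
	u := map[string]string{"name": "Ada"}
	fmt.Println(getName(u) == name(u)) // prints: true
}
```

Both versions behave identically; the first just gives the reader three extra hops for nothing.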
I also lost touch with the fundamentals of much of the data layer, which I only regained after getting it into a proper shape. The time it took me to do that, plus the time it took the AI to write it, makes me very doubtful that I saved any time at all compared to just writing it myself.
I'm aware that there are a lot of easy counter arguments in favour of AI to what I just wrote. What about Peter Steinberger and his lobster project? That seems to work just fine, if you discount the security incidents. This could also just boil down to a skill issue on my part, true, but I have been using and exploring these models daily for the better part of 2 years now.
If you check the uptime for anthropic/openai/microslop/etc it has seriously degraded. Outages are a common occurrence now, to the point where AWS needed to issue a statement about the use of AI assisted coding.
Add to this that these models seem to perform worse after their initial release, suggesting we can't trust that we get the same quality output for the same amount of money, since providers are incentivised to make the models cheaper to run, e.g. by quietly serving a more heavily quantized version.
Skill decay
Use it or lose it, as my mom would say. When leaning too heavily into AI assisted coding I saw how quickly my skills atrophied.
The small details disappear when you only operate on the high level stuff, but so much of software engineering happens in the little things: patterns, structure, trade-off decisions.
You might still prefer the role of agent manager over that of developer, but I think the discourse around it automatically making you faster needs to change. Faster to what end? 92-95% uptime?
Good code is not only a pipe dream developers chase for the hell of it. There are real business incentives around it. Who wants to pay for a product that is down half the time? Sure, you can just vibe your own solution, which works fine for personal projects, but on a business level you have not saved money; you have gained another responsibility on top of your core business operations that will now also have to be maintained.
In an area where you are not necessarily the expert.
So, it's all just fake?
No. This post might make me seem very anti AI, which is definitely not the case. I have found these models excellent at sparring with me on ideas, explaining and exploring different designs, or surfacing blind spots.
These things have seen way more code than I ever will. They perform very well when the issue at hand is in their training data. Anthropic made a big deal out of building a C compiler using only autonomous agents, but failed to mention that they had a human-crafted test suite to check against. Plus, the thing they were recreating was likely already in the training data. It's impressive that statistical models can re-create something (kinda) of this scale, but it's recreating, not inventing.
I really like Carson Gross's take on this in his article "Yes, and...", where he outlines the areas he uses AI for:
I typically try to use LLMs in the following way:
- To analyze existing code to better understand it and find issues and inconsistencies in it
- To help organize my thoughts for larger projects I want to take on
- To generate relatively small bits of code for systems I am working on
- To generate code that I don’t enjoy writing (e.g. regular expressions & CSS)
- To generate demos/exploratory code that I am willing to throw away or don’t intend to maintain deeply
- To suggest tests for a particular feature I am working on
A more useful workflow
My current approach is slowly shifting.
Instead of asking the model to write large chunks of production code, I increasingly use it to explore designs.
Generate multiple approaches.
Study the tradeoffs.
Throw them away.
Then implement the real solution myself.
That flips the value proposition.
AI stops being the thing that writes the system and becomes the thing that helps you think about the system.
Ironically that might lead to better software than we had before. Exploring five ideas used to be expensive. Now it is cheap.
But the final code still needs human ownership.