Using Claude Code feels like working with an overenthusiastic intern. Smart, fast, and full of ideas, if not always the right ones. I’ll ask for a simple function, and suddenly I’m reviewing a full utility library. “Want some summary statistics while we’re at it?” No, Claude. I really don’t.
It’s like mentoring someone who shows up Tuesday morning with a weekend’s worth of features no one asked for, built on a branch no one can merge. You admire the hustle, but now you’re burning time (and tokens) cleaning up the aftermath.
Still, there’s a lot to like.
The Good
There’s something nostalgic about working with Claude in a terminal editor. Mainframe green screens gave way to bash prompts and blinking cursors, and Claude fits right into that lineage. It’s a smooth collaborator, shuffling files around faster than I ever could.
I keep it open in a split screen with the terminal on one side and code on the other. We pair program like we’re sharing a split keyboard. Watching it work is surprisingly cool. The context-aware, auto-generated git commits feel like magic. It runs shell commands on its own. It even supports MCP servers, so you can plug in custom extensions or APIs. It’s a modern-day green screen with superpowers, and just enough spark to make you pause and think, “Wait… this is actually kind of amazing.”
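An MCP server doesn’t have to be anything elaborate, either. Here’s a rough sketch of one using the official MCP Python SDK’s FastMCP helper (the mcp package on PyPI); the server name and the single tool are placeholders I made up, not anything Claude generated:

```python
# Minimal MCP server sketch using the official Python SDK (pip install mcp).
# The server name and tool below are illustrative placeholders.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("toy-tools")


@mcp.tool()
def word_count(text: str) -> int:
    """Count the words in a block of text."""
    return len(text.split())


if __name__ == "__main__":
    # Runs over stdio by default, so a client can launch it as a subprocess.
    mcp.run()
```

Point Claude Code at a server like that in its MCP config and the function shows up as a tool it can call mid-session.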
And sometimes it really delivers. Watching it debug why my tests weren’t running gave me the same weird, proud thrill I felt the time my intern won a company-wide award. You expect them to help, but not like that. Those little moments are magic.
Of course, magic doesn’t always land cleanly.
The Middle
Things got shakier in the “not quite code” zone. Claude is solid at writing functions and scripts, but it gets wobbly on tasks like generating prompts, drafting light config files, or wrangling structured data.
I had it generate a few prompts for a workflow. They looked great at first glance. Clean output, passed a quick sanity check, seemed ready to go. But a couple hours later, I realized there was a subtle inconsistency in the structure. Because I trusted it, I had already moved forward, which meant I had to regenerate the entire dataset from scratch.
Claude, in this context, felt like an intern trying to impress during a design review. The delivery was confident. The doc looked polished. But underneath, it relied on outdated API versions and a soon-to-be-deprecated service. Not malicious, not broken. Just eager, overconfident, and not quite ready to be left unsupervised.
The Ugly
The code Claude generated technically worked. It went line by line, did what I asked, and returned something functional. But it lacked structure. No separation of concerns, no cleanup, no real sense of how the pieces fit together. When I asked Claude to clean it up, it added a new statistics module I hadn’t asked for.
Out of curiosity, I gave the same prompt to Gemini 2.5 Pro. The difference was immediate. Gemini refactored the code, improved the logging, and added exponential backoff retries to my request handler. These were practical, thoughtful changes that made the code better in ways that actually matter.
If Claude is the intern proposing extra stats, Gemini is the senior engineer quietly fixing brittle parts of the system without needing direction. Both can be useful at different times and in different contexts.
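If you haven’t run into it, exponential backoff just means waiting a little longer after each failed attempt before retrying, usually with some randomness mixed in so a fleet of clients doesn’t retry in lockstep. Here’s a rough sketch in the spirit of what Gemini added, not its actual output, wrapped around a made-up requests-based fetch:

```python
import random
import time

import requests


def fetch_with_backoff(url, max_retries=5, base_delay=1.0):
    """GET a URL, retrying transient failures with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            return response
        except requests.RequestException as exc:
            if attempt == max_retries - 1:
                raise  # out of retries; let the caller see the failure
            # Wait 1s, 2s, 4s, ... plus jitter so simultaneous failures don't retry simultaneously.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            print(f"Request failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)
```

The jitter is the piece that’s easy to skip and easy to regret: without it, every client that failed at the same moment comes back at the same moment too.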
The Cost (And the Vibe)
I didn’t expect to care about the cost metric, but I did. It turned into a mini-game. If something cost $1.39 to generate, my brain instantly went: Can I get this down to a buck?
It wasn’t about the money. It was about control. About getting better at prompting. More precise. More efficient. The number became a signal. It told me when I could tighten the loop, sharpen the tool, and improve how I used it.
This is one of those invisible skills that will start to separate people who build with AI from those who just use it. Just like there are better and worse ways to write code, there are better and worse ways to prompt. And the difference in output can be massive.
Final Thoughts
This all still feels like a step forward. Maybe not a perfect one. Maybe not in a straight line. But forward nonetheless. The idea that your development tools can act more like collaborators than compilers feels right.
But I can’t help thinking about the early days of Uber. My wife and I used to take rides to the grocery store because they were so cheap it didn’t make sense to own a car. It felt like a life hack. Then Uber scaled. Investors wanted returns. Prices went up. The magic wore off.
Right now, AI coding tools are still riding the price-performance curve upward. Every month brings faster models, better results, and cheaper inference. But what happens when that curve levels out? When progress slows and the gains become marginal?
Do we end up locked into walled gardens, paying more and more for slightly better completions and slightly tighter IDE integrations?