Thoughts on building with AI

Those who know me know that I have been an AI skeptic, slow on the uptake. I built a little with GitHub Copilot in VS Code, and it was useful, but it felt like fancy auto-complete. I was also a product manager and didn't need to write much code.
All that changed a couple of weeks ago when I started working on a decently sized Ruby project. I will be focusing on increasing its stability and performance, but I also wanted to get familiar with the codebase so I can troubleshoot issues in the future. I picked up a couple of small bugs to see how much I could accomplish.
I fired up Claude Code and started going through the project, asking questions like “Which function is executed when this endpoint is hit?”, “How do these lines of code even work?” or “I am running into this error; do you have ideas on how to fix it?” Claude was extremely helpful, and I fixed my first bug in about 30 minutes.
This was amazing because I had never written or read any Ruby before! But where it blew me away was when I asked it to write tests for the changes I just made. It looked at the class where I made my changes and wrote tests for it, ran them, and iterated through all the failures until everything passed.
But the thing that really blew my mind and made me realize the potential is what Claude did after all the tests passed. It understood the intent behind my changes and realized that the tests didn't actually test for the issue I fixed. It added a test case for the specific bug without me even describing it. After that, I started using it more and more, and it's made building more fun!
I have since built and deployed a small service from scratch and tried to write a gem with Claude. It's kinda addictive, and I had to force myself to put away the computer lest I be up until 6AM again.
These are some of my thoughts so far; it'll be interesting to revisit them in 6-12 months. A lot of these might be obvious, and I am also extremely new to the space, so I'd love your thoughts and perspectives as well!
Not a difficult skill to pick up
After reading many posts on writing the “correct prompts”, I thought I'd need to master how to utter the right incantations for things to work. But I was pleasantly surprised that I just needed to talk to it, no special words or structure. And like any skill, you will naturally get better at it as you use it more.
I have now become pretty decent at getting Claude to do what I want, and it wasn't because of the wisdom in the posts; it was just through using it regularly for a couple of weeks.
But it's also an expensive skill to get better at if your $work isn't paying for it. Claude Code has plans that cost $200/month (which I am now paying for 🙃), and that's A LOT of money.
Build in small, reviewable increments
I keep seeing people run into their token limits even with the Max plan, while I haven't hit them yet. Then I realized why: people try to one-shot their applications, basically giving Claude a single big prompt and letting it generate the whole application in one go.
I think that's the wrong approach to take. If AI just dumps A LOT of code on you at once, you won't be able to review or fully understand it. Yes, this could open you up to security issues, but it's also a maintenance burden that is bound to bite you in the future.
When I build with Claude, I build incrementally, like I would have if I didn't have Claude. When building https://github.com/gouthamve/hardcover-book-embed/, I worked in multiple steps:
1. Generate a Golang server.
2. Connect to the hardcover API.
3. Cache the responses.
4. Generate an HTML page to show the responses.
5. Make it a JS widget people can import.
6. Add metrics.
7. Add tests.
After each step, I carefully reviewed the diff and committed the code. This way I keep a good mental model of the codebase and can quickly dive in to troubleshoot.
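To give a sense of how small each increment is, here is a rough sketch of what step 3 (caching the responses) amounts to. This is an illustrative, hand-written approximation, not the actual hardcover-book-embed code; the names and the upstream URL are made up.

```go
// Hypothetical sketch of step 3 (caching upstream responses); not the real
// hardcover-book-embed implementation.
package main

import (
	"io"
	"net/http"
	"sync"
	"time"
)

// cachedFetcher wraps an upstream fetch with a single-value in-memory cache.
type cachedFetcher struct {
	mu        sync.Mutex
	body      []byte
	fetchedAt time.Time
	ttl       time.Duration
	fetch     func() ([]byte, error) // the real upstream call, e.g. the Hardcover API
}

func (c *cachedFetcher) get() ([]byte, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	// Serve from the cache while the entry is still fresh.
	if c.body != nil && time.Since(c.fetchedAt) < c.ttl {
		return c.body, nil
	}
	body, err := c.fetch()
	if err != nil {
		return nil, err
	}
	c.body, c.fetchedAt = body, time.Now()
	return body, nil
}

func main() {
	books := &cachedFetcher{
		ttl: 10 * time.Minute,
		fetch: func() ([]byte, error) {
			// Placeholder upstream call; the real project talks to the Hardcover API.
			resp, err := http.Get("https://example.com/books")
			if err != nil {
				return nil, err
			}
			defer resp.Body.Close()
			return io.ReadAll(resp.Body)
		},
	}

	http.HandleFunc("/books", func(w http.ResponseWriter, r *http.Request) {
		body, err := books.get()
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.Write(body)
	})
	http.ListenAndServe(":8080", nil)
}
```

A diff of roughly this size is easy to review and commit in one sitting, which is the whole point.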
Even with all this, sometimes I need to ask Claude to make more significant changes across many files. For that I use planning mode, which can be triggered with shift+tab. I give it a task, and it comes back with a set of steps it will execute. I usually end up correcting a few things in its plan, clarifying others, and then letting it run.
These changes sometimes take like 5-10 minutes to run, so I also make sure I have some chores to do rather than just browse HN and Reddit. It's kinda weird to have a clean kitchen and folded clothes while also building, but I am not complaining 😄.
The productivity boost varies
One thing to note is that I was building it the way I would have built it anyway, because I already knew how to build it. Without Claude, I'd have ended up with the same design; it just would have taken 50%-100% longer, and my clothes would still be in a pile.
A lot of the time, the cost of the service is justified by the supposed productivity boost for engineers. For tasks an engineer already knows how to do well, I think it saves only about 30% of the time. For unfamiliar tasks, there really is a 5x boost in productivity, but it comes with the downside of the engineer not really understanding what's happening.
In the hardcover-embed project, if I were to do the HTML and widget without AI, I'd have taken a day or two to get everything working correctly. This was the daunting bit that was holding me back from starting the project. But with Claude, I got it in 5 minutes, with the downside that I don't really know how to debug it 🙃.
And finally, for senior engineers, writing code is only part of the story. They need to understand which outcomes to drive, build consensus, and do a lot of other things that are NOT code. A 50% boost in efficiency writing code doesn't translate to a 50% boost in their job. I am not really sure if it's a good idea to “fire engineers because AI” just yet.
Feedback loops need to be quick
What is really cool is that Claude can understand the required end state and iterate towards it. For example, it writes the tests and then runs them to take care of any compilation issues or failures. These feedback loops need to be quick.
In the Ruby project, make test ran all the tests, which took a couple of minutes. This isn't necessarily bad, but Claude was waiting for 2 minutes every single time. I modified the Makefile to take the test file as an optional argument (make test SPEC=./path/to/file.rb) to speed up the feedback loop. Claude then iterated by only running the relevant tests and, at the end, ran the whole test suite to make sure everything still worked.
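For reference, the change was roughly this. It's a sketch assuming an RSpec suite; the real target and invocation in my Makefile may differ a little:

```makefile
# SPEC is optional: `make test` runs the whole suite,
# `make test SPEC=./path/to/file.rb` runs just that file.
SPEC ?= spec

.PHONY: test
test:
	bundle exec rspec $(SPEC)
```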
I can see a world where a Devin-like system works: you give it a task via Slack or GitHub, it iterates, and it opens a PR for you to review. I was talking to Tom Braack about this, and he mentioned that while a slow, single-CPU CI system that runs the tests in 10 minutes might work for humans, it's inadequate for AI.
Fight the fatigue
When I first sit down, I review all the changes carefully before committing them. But after a few hours, I get more and more lax until I just YOLO-add them. This is not good for long-term productivity or security.
Right now, I make sure to step away and take a walk when I notice this happening, but I almost always notice a little too late. I don't want to lose my critical thinking and delegate everything to AI, at least not until it generates code better than I can.
One idea that Tom proposed was making the AI force you to review things before committing. Maybe it can ask you questions or put up roadblocks to ensure you've reviewed things before you accept them. This runs counter to what people are shilling, so it will likely never happen, but I think it's a GREAT idea!
Rise of the home-cooked app!
I really love coming across articles like these:


Home-cooked apps are the kind you make for yourself that solve your own problems. Much like a home-cooked meal, they can be shared with friends or family. They don’t have ROI, KPIs, or sales funnels. They don’t scale. They don’t have to.
When I became a PM, I didn't want to lose touch with building, so I did home-cook a couple of apps (GopherCal, Librascan), and I loved it. But they did take a significant amount of time to build.
With AI, building similar apps is much faster and a lot more fun. And I am already seeing more and more people building services and apps for themselves. For example, in the hardcover community, I see people checking if an integration exists, and if it doesn't, they fire up Cursor and share it a day or two later. It's amazing!
I believe there will be an explosion of these apps, and platforms that make it easy to host and manage them will get a nice boost in usage.
Long-term effects
The big question everyone, including me, has is “Will I become a worse engineer if I rely on AI?” I don't think so, as long as you make sure you understand AI's output. If you blindly keep delegating things to AI, maybe, but even then I am not convinced.
What AI does allow you to do is get past the boring, already-solved problems so you can focus on learning new skills faster. I tried writing a custom sampler for opentelemetry-ruby, and within a couple of hours I had a working prototype. Could I recreate the same without AI? Sure, but it'd take me muuuuuch longer, and most of that time would have gone into setting up the boilerplate.
But very soon after starting, I was diving into the core functionality of the gem and learning new things. I think this is powerful and will help you become a better engineer.
It always helps to work with more experienced people. When Claude writes tests, it tries to test every single method possible. This is counterproductive and will slow you down in the future, so I usually end up deleting 80% of the tests and keeping only the ones that cover a few key behaviors. I can do this because I have enough experience to recognize this code smell.
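As a made-up illustration of the difference (not code from either project, and the fresh() helper is invented for the example): for something like the cache sketched earlier, two behavior-level tests tell you more than a dozen per-method ones.

```go
// Hypothetical behavior-focused tests; nothing here is from the real codebase.
package cache

import (
	"testing"
	"time"
)

// fresh reports whether an entry fetched at fetchedAt is still usable at now.
func fresh(fetchedAt, now time.Time, ttl time.Duration) bool {
	return now.Sub(fetchedAt) < ttl
}

// Behavior 1: entries within the TTL are served from the cache.
func TestFreshEntryIsServed(t *testing.T) {
	now := time.Now()
	if !fresh(now.Add(-time.Minute), now, 10*time.Minute) {
		t.Fatal("expected an entry fetched a minute ago to still be fresh")
	}
}

// Behavior 2: entries older than the TTL are treated as stale.
func TestStaleEntryIsNotServed(t *testing.T) {
	now := time.Now()
	if fresh(now.Add(-11*time.Minute), now, 10*time.Minute) {
		t.Fatal("expected an entry older than the TTL to be stale")
	}
}
```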
I am sure there are issues with the HTML or CSS that Claude generated, but I can't find them because I simply don't know enough. And I really wish I had an experienced frontend engineer reviewing my code so I could learn.
On Observability
As I used Claude Code more and more, one thing became clear to me: the SRE assistant is not far away. If Claude can reason about my code and write tests for it, it can also troubleshoot my production issues. A typical troubleshooting loop looks like this:
1. Receive an alert about an SLO being breached.
2. Open the relevant dashboard to understand which service in the system might be causing the issue
3. Understand if there have been any changes in the system, like new deployments, more requests, etc.
4. If no changes, look at logs for error messages.
5. If it is slow requests, look at traces.
6. Form a hypothesis based on the data and iterate.
AI can do this, tbh. It's possible; it just needs to be wired up correctly. Imagine getting an alert and, by the time you acknowledge it, already having the relevant information, potential causes, and remediations in front of you.
I think this is what all the existing observability companies and many new startups are racing towards, and I can't wait to see the space evolve and get disrupted. It's a fun time to be building tools for engineers!
With that said, I have thoughts on what's missing in the space, and if I can sit my ass down and not get distracted, I'd love to build a couple of those ideas in the open soon.
Early Grafana Assistant usage
With all of this context, I decided to take a serious look at the Grafana Assistant, and it's pretty good! Right now it feels like AI coding tools did last year: a nice auto-complete and explainer. Building dashboards manually in Grafana is a slow, laborious process, and the Assistant does a decent job of it. It still hallucinates, and I still need to make manual changes, but I am very pleased with the output and the speed.
I asked it to create an SLO, and it didn't do a good job of it, but I'm sure it'll get there soon. What I am waiting for is the Claude Code with Opus 4 moment, where it'll blow me away.