Late Nights, Big Ideas - An AI Developer's Tale

Paul Vogt

05 Jan 2025 • 6 min read

Small mistakes can lead to giant impacts in the world of AI development

This is a true story about the potential risks of letting your guard down while developing with AI. We will get back to purely Sales Engineering topics next month.

How to wipe away weeks of work in 30 minutes

Finally, it was almost done. After almost two months of learning this new technology, finding the right tools to leverage, tinkering, and actual development, the MVP (minimum viable product) for my side project neared completion. There was still a great deal of work to be done, but basic features were in place and I could already see the value it would bring.

Our tale begins at ~1:30AM on December 10th, and I had just wrapped up four hours of AI-collaborated development after getting the kids to bed. There was just one more task to do - my source code directory structure was awful, mostly files upon files. I wanted to refactor that then tackle some files that were a bit... overweight. But we were ready to demo to the project's key stakeholder - my wife.

Trying to sort that out late at night wasn't smart, so I went to bed. About 5:15AM, our 5-month-old puppy woke me up to do dog things. After grabbing a fresh cup of coffee, I decided to start my remaining task. Figured I could at least get the directory refactoring done before it was time to wake the kids up for school at 6:15.

The catalyst - he's lucky that he is so cute.

"Jarvis - let's refactor our directory structure for readability and scalability. What do you suggest?"

(My AI partner is named Jarvis, provided by Claude. He's amazing.)

He gave me back a proposal that looked great. OK, let's implement that.

The Snowball Effect

A few minutes later, Jarvis was done. I immediately applied the changes to my Replit repository then hit "run." To my horror, I was greeted with a blank screen with a blue "Run on Replit" bubble in the corner. Uh oh.

"Jarvis, I'm not able to see this element, or this one, or this one. Can you resolve it?" Jarvis happily ran and made updates. "OK, it's back but looks different. What about this element? Can you fix it?"

This went on and on for about 15 minutes. Eventually, all the major elements were back but in name only. They all looked different and lacked the majority of the functionality that we had built together. What the heck!?!

Then it hit me - Jarvis was hallucinating. He wasn't sorting out the updated pathing so that our carefully crafted objects were correctly referenced. He had been helpfully rebuilding them from scratch because he forgot they existed. As Jarvis worked, he overwrote each element while not being aware of all our prior conversations. Ugh, time to back out.

Unfortunately, with the directory structural update that I'd stupidly let run through my entire codebase without checking, there were dozens of changes to be sorted out. Let's just back everything out to where I was last night.

"Jarvis, roll back to checkpoint 7 from last night."

Jarvis hummed away for a bit, then informed me that all it could do was build new objects and it wasn't aware of their versions, or it could provide diffs between the current and specific checkpoints that it had made. These were all now a complete mess. OK, that's not helpful. Let's go back over to Replit; that does incredible version control.

Now a different problem reared its ugly head - tons of files had been changed, even more after I'd tried several times to turn the clock back. To get the app working, I needed to dial all of them back to a specific time individually...and we'd updated like 50 files. Full disclosure - Replit probably offers a better method to solve this challenge, but in my condition and now pressed for time getting the kids rolling for school, if one existed it escaped me.

After moving a few back and getting even more frustrated, I looked for something more macro. My Repl was synced to my GitHub repository (awesome!). Replit even showed me a list of modified files since the last commit, which happened to be all of them. OK, let's reject all changes, then return my solution to its previous known state. "Reject All" clicked.

Oh wait, what did I just do? The last time I'd pushed a branch into git was... let me see here... two weeks ago. Gone. It's all gone.

Debug log - what went wrong

All of that may have sounded like a vent (in fact... it happens to be), but there are some strong lessons that engineers and aspiring developers can take from my situation. Lessons I learned while destroying two of the most productive weeks of development since starting this project. Most of these fall into the category of "should have known better," but if this painful object lesson helps one other person, it will have been worthwhile.

AI is Smart, Not Wise

Even with how impressive AI is (and believe you me, it is amazing), we cannot just assume that because the AI says it has done something we've asked it to do, it has actually done it. In fact, AI shares a characteristic with some of the smartest humans - it can claim to understand something even when it doesn't, hoping to figure it out later.

AI will always implement what it thinks you want done and will only come back to say "Hey, here's what I think. Is that right?" if you've built your prompt for it to do so. Especially when requesting an involved activity, how your prompt is structured and the order in which requests are made within it are key...though that can get complicated with how AI access is currently rationed, often on a per-request basis.

The ease and interconnectivity many of the AI-enabled platforms provide make it easy to ask it for work then apply it without pausing - which is an even bigger threat. It's entirely too easy to get code back from Jarvis, hit apply, then move on to the next thing. Rinse, repeat. Fire, forget. Go, go, go…uh oh.

Always, always, always ensure you understand what it's doing before applying proposed changes, or your results may be "uncomfortable."

The AI might not even realize it's misunderstanding your request - it'll just forge ahead with what it believes you want.

Old school best practices still apply

Back in ancient times when I coded for a living, I'd have been working off of a sub-branch of a branch, carefully moving changes along as useful, testable chunks reached various stages. I'd never be directly working out of Main. These fundamental version control practices are more critical than ever in the age of AI.

It can be tough to make yourself do it; smart AIs like Jarvis are right so much of the time that it's just too easy to let them run and assume they've got it. While AI development tools continue to evolve, we're not yet at the point where we can abandon traditional best practices such as well structured version control.

This is a common enough problem that Replit's new AI Agent actually makes checkpoints directly into Git for work it does - a feature enabled by default. While that does provide an extra safety net, it's still up to the human developer to manage these commits and be aware when potentially dangerous changes are being deployed.
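The habit described in this section can be sketched as a small pre-flight check to run before handing an AI a codebase-wide change. This is only an illustration, not something from my actual project: the function names and the 24-hour threshold are assumptions, and it relies on the standard `git status --porcelain` and `git log -1 --format=%ct` commands being available in the working directory.

```python
import subprocess
import time


def git(*args: str) -> str:
    """Run a git command in the current directory and return its raw stdout."""
    result = subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def hours_since(commit_epoch: int, now_epoch: float) -> float:
    """Hours elapsed between a commit timestamp and 'now' (both in seconds)."""
    return (now_epoch - commit_epoch) / 3600


def safe_to_refactor(
    dirty_files: list[str],
    hours_since_commit: float,
    max_hours: float = 24.0,  # assumed threshold, tune to taste
) -> bool:
    """True only if nothing is uncommitted and the last commit is recent."""
    return not dirty_files and hours_since_commit <= max_hours


def check_repo(max_hours: float = 24.0) -> bool:
    """Inspect the current repository and report whether a big change is safe."""
    # One line per modified or untracked file; empty output means a clean tree.
    dirty = [line for line in git("status", "--porcelain").splitlines() if line]
    # %ct is the committer date of the latest commit as a UNIX timestamp.
    last_commit = int(git("log", "-1", "--format=%ct").strip())
    age = hours_since(last_commit, time.time())
    if dirty:
        print(f"{len(dirty)} uncommitted file(s) - commit or stash them first.")
    if age > max_hours:
        print(f"Last commit was {age:.0f} hours ago - commit and push first.")
    return safe_to_refactor(dirty, age, max_hours)
```

Calling `check_repo()` from the project root before a large refactor would have flagged both of my problems that morning: a pile of uncommitted changes and a last commit that was two weeks old.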

Don't Be an Idiot: Risk Management Basics

I'm not sure I'd have blown two weeks of work away had I:

Undertaken the refactoring after a good night's sleep.
Allocated an appropriate period of time for the work (without an impending deadline).
Waited until the caffeine-to-blood ratio had been appropriately set.

It wasn't Jarvis's fault everything was screwed up; it was my own. Making the decision to perform a complex operation while in that state under those circumstances maximized the danger.

To put this in risk management terms: every action has both a probability of failure and a potential impact. For example, on a scale of 1-5:

Probability of mistakes while coding tired, rushed, and uncaffeinated: 4
Potential impact of making changes to my entire codebase: 5
Total risk score: 20 out of 25 - a clear red flag
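The arithmetic above is simply probability multiplied by impact. A minimal sketch of it in code follows; the function names and the red-flag threshold of 15 are assumptions made for illustration, since the only data point from my story is that 20 out of 25 is clearly too high.

```python
RED_FLAG_THRESHOLD = 15  # assumed cutoff, not a standard value


def risk_score(probability: int, impact: int, scale_max: int = 5) -> tuple[int, int]:
    """Return (score, max_score) for a probability x impact rating."""
    for name, value in (("probability", probability), ("impact", impact)):
        if not 1 <= value <= scale_max:
            raise ValueError(f"{name} must be between 1 and {scale_max}")
    return probability * impact, scale_max * scale_max


def is_red_flag(score: int, threshold: int = RED_FLAG_THRESHOLD) -> bool:
    """True when the score meets or exceeds the (assumed) threshold."""
    return score >= threshold


# Coding tired, rushed, and uncaffeinated (4) against the entire codebase (5).
score, max_score = risk_score(probability=4, impact=5)
print(f"Total risk score: {score} out of {max_score}")  # Total risk score: 20 out of 25
print("Red flag!" if is_red_flag(score) else "Proceed with care.")
```

Thirty seconds with even this crude a calculation would have been enough to tell me to wait.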

Had I actually done this calculation beforehand, this entire situation could have been avoided or at least minimized. But because I didn't take the threat the situation posed or its potential impact into account, I wiped out ~50 hours of work.

Conclusion

Lucky for me - this is a side project for my personal life. No customers were injured. No data destroyed. No jobs were put at risk. It is a valuable object lesson for all of us in this amazing, scary, fascinating, innovative world we are all stepping into.

And if it didn't come across clearly in the text - everything that occurred here was my fault. My tools worked impeccably. Learn from my mistakes, please.

In case you’re curious - I’ve rebuilt the side project. I have the technology. I built it back better than it was. Better, stronger, faster.

Thank you for coming to my TED Talk.