What I See When A Vibe-Coded App Lands On My Desk

Dan Haiem is the founder and CEO of AppMakers USA, helping business leaders design, build and scale apps that deliver real-world impact.

The founder usually opens with the same line: "It works. I just need someone to clean it up a little." They built it with an AI tool, the demo runs clean, and they're two weeks from launch. Then my team starts the audit.

By the time we're done, "clean it up a little" has turned into a structural review that uncovers authentication gaps, exposed data and failure points that would have surfaced the first time a real user did something unexpected. My team has rescued 37 apps over the past few years. Of those, 20 came in with problems that traced directly back to how they were built: AI tools, minimal review and a demo that ran clean until it didn't.

That's not a knock on the tools. It's a gap in how founders understand what those tools actually produce.

The Demo Is Not The Product

AI coding tools are genuinely good at one thing: generating code that handles the scenario you described. You describe a login flow; the tool builds one. You describe a checkout process; the tool builds one. What it builds works exactly as well as the prompt that created it.

The problem is that real users don't behave like prompts. They submit forms twice. They lose connection mid-transaction. They find URLs they weren't supposed to find. They try things the founder never thought to describe, because the founder was thinking about what the app should do, not what a user might do to break it.

The code handles the happy path. Everything outside that path is, in most cases, untested and unhandled.

What's Missing, And Why It's Hard To See

Across the apps my team has reviewed, three gaps show up more than anything else.

The first is authentication and authorization logic that only holds under expected conditions. The UI restricts what users can see and do, and that restriction works perfectly when users follow the intended flow. But the backend often doesn't enforce the same rules independently. A user who knows to go directly to an endpoint, rather than through the interface, can sometimes access data that was never meant to be theirs.
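To make the first gap concrete, here is a minimal sketch of a backend check that does not rely on the UI. Everything in it is hypothetical and for illustration only: the `User` type, the in-memory `INVOICES` store and the `get_invoice` handler stand in for whatever a real app uses.

```python
from dataclasses import dataclass


class Forbidden(Exception):
    """Raised when the caller is not allowed to read the record."""


@dataclass(frozen=True)
class User:
    id: int
    is_admin: bool = False


# Hypothetical in-memory store standing in for a real database.
INVOICES = {
    101: {"owner_id": 1, "total_cents": 4200},
    102: {"owner_id": 2, "total_cents": 9900},
}


def get_invoice(current_user: User, invoice_id: int) -> dict:
    """Return an invoice only if the caller owns it or is an admin."""
    invoice = INVOICES.get(invoice_id)
    if invoice is None:
        raise KeyError(invoice_id)
    # The UI may never show this invoice to other users, but a direct
    # request skips the UI, so the handler repeats the check itself.
    if invoice["owner_id"] != current_user.id and not current_user.is_admin:
        raise Forbidden(f"user {current_user.id} cannot read invoice {invoice_id}")
    return invoice
```

The point of the sketch is where the rule lives: inside the handler, so it applies no matter how the request arrives.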

The second is database exposure from missing row-level security. This one is particularly common in apps built on platforms that use cloud database backends. The tables exist, the data is structured correctly and everything looks fine from the front end. What's missing is the rule that says a given user can only read their own records. Without it, the data is accessible to anyone who knows how to ask for it.
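As a rough illustration of the missing rule, the sketch below scopes every read to the requesting user. It uses SQLite only so that it runs anywhere; SQLite has no row-level security feature, so the rule is emulated in the query rather than enforced by the database, which is where a real row-level policy would live. The `notes` table and the function names are assumptions made up for this example.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with two users' notes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE notes ("
        "id INTEGER PRIMARY KEY, owner_id INTEGER NOT NULL, body TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO notes (owner_id, body) VALUES (?, ?)",
        [(1, "alice private"), (2, "bob private")],
    )
    return conn


def notes_for(conn: sqlite3.Connection, user_id: int) -> list:
    """Read notes with the owner filter always applied."""
    # The WHERE clause is the rule the text describes: a given user
    # can only read their own records.
    rows = conn.execute(
        "SELECT body FROM notes WHERE owner_id = ? ORDER BY id", (user_id,)
    )
    return [body for (body,) in rows]


def all_notes_unscoped(conn: sqlite3.Connection) -> list:
    """What an unprotected table returns to anyone who asks."""
    return [body for (body,) in conn.execute("SELECT body FROM notes ORDER BY id")]
```

Comparing `notes_for(conn, 1)` with `all_notes_unscoped(conn)` shows the exposure: the second call hands back every user's records.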

The third is no handling for when external services fail. The app calls a payment processor, an email service, a third-party API. It works when those services respond normally. There's no retry logic, no fallback, no graceful error state for when they don't. The first time a service times out in production, the app either crashes or leaves the user stranded with no feedback.
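One common way to close the third gap is a bounded retry with backoff that ends in a clear error the app can show to the user. The sketch below is one possible shape, not a prescription; `call_with_retry` and `ServiceUnavailable` are hypothetical names, and the attempt count and delays are placeholder values.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ServiceUnavailable(Exception):
    """Raised after every retry has failed, so the caller can show an error state."""


def call_with_retry(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying timeouts and connection errors with backoff."""
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except (TimeoutError, ConnectionError) as exc:
            last_error = exc
            # Wait longer after each failure, but not after the final one.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise ServiceUnavailable("external service did not respond") from last_error
```

A caller would catch `ServiceUnavailable` and show the user a message instead of crashing. Retrying is only safe for operations that can be repeated without side effects; a payment charge, for example, would typically need an idempotency safeguard before being retried.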

Founders don't catch these because they test what they expect to work. The AI produced something that looked right, it passed every scenario the founder ran, and there was no reason to suspect otherwise.

This is the difference between testing your own assumptions and testing what happens when someone else interacts with what you built.

Three Questions Worth Asking Before You Ship

You don't need a full engineering audit to catch the most common problems. You do need someone who didn't build the app to try to break it.

First: Has anyone tried to access something they shouldn't?

Not through the UI, but directly. If your app stores user data, can someone with basic technical knowledge retrieve records that aren't theirs? If the answer is "I don't know," that's worth finding out before launch.
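A check like this can be scripted in a few lines. The sketch below assumes a hypothetical `fetch_status` callable that requests a record while signed in as one user and returns the HTTP status code; in a real test it would wrap whatever HTTP client the tester uses. The record IDs are ones known to belong to someone else.

```python
from typing import Callable, Iterable, List


def find_leaks(
    fetch_status: Callable[[int], int], foreign_ids: Iterable[int]
) -> List[int]:
    """Return the IDs of other users' records that were NOT refused."""
    # 401, 403 and 404 all mean the backend declined the request.
    # Any other status for someone else's record deserves a closer look.
    refused = {401, 403, 404}
    return [rid for rid in foreign_ids if fetch_status(rid) not in refused]
```

An empty result does not prove the app is secure, but a non-empty one answers the question before launch rather than after.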

Second: What happens when something fails?

Pick one external dependency your app relies on and simulate a failure. If your payment processor returns an error, does your app handle it cleanly or does it leave the transaction in an unknown state? This test takes an hour and reveals more than most code reviews.
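Simulating the failure does not require breaking the real service. The sketch below swaps a hypothetical `charge` function for a stand-in that times out, using `unittest.mock.Mock` from the Python standard library, then checks what state the order is left in. The `checkout` function, the status names and the message wording are all assumptions for illustration.

```python
from unittest.mock import Mock


def checkout(order: dict, charge) -> str:
    """Charge the order and record an explicit status either way."""
    try:
        charge(order["total_cents"])
    except TimeoutError:
        # After a timeout the charge may or may not have gone through,
        # so record that explicitly instead of guessing.
        order["status"] = "payment_unconfirmed"
        return "We could not confirm your payment yet. Please do not retry."
    order["status"] = "paid"
    return "Payment received."


def simulate_processor_timeout() -> dict:
    """Run checkout against a fake processor that always times out."""
    order = {"total_cents": 1999, "status": "new"}
    failing_charge = Mock(side_effect=TimeoutError("processor timed out"))
    order["message"] = checkout(order, failing_charge)
    return order
```

If the order comes back from `simulate_processor_timeout()` still marked "new", or the app raises instead of returning a message, that is the unknown state the question is meant to expose.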

Third: Did anyone test inputs you didn't design for?

Submit an empty form. Enter a string where a number is expected. Upload a file type the app wasn't built to handle. The goal isn't to be exhaustive. It's to find out whether the app fails gracefully or fails badly.
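Failing gracefully usually means rejecting the input with a message the user can act on. The sketch below shows that idea for two of the cases above; the function names, the quantity limits and the allowed file types are placeholders, not recommendations.

```python
import os
from typing import Optional

# Hypothetical allowlist of upload types for this example.
ALLOWED_EXTENSIONS = {".png", ".jpg", ".pdf"}


def parse_quantity(raw: Optional[str]) -> int:
    """Turn form input into a quantity, or raise ValueError with a usable message."""
    if raw is None or not raw.strip():
        raise ValueError("Quantity is required.")
    try:
        value = int(raw.strip())
    except ValueError:
        raise ValueError("Quantity must be a whole number.") from None
    if not 1 <= value <= 100:
        raise ValueError("Quantity must be between 1 and 100.")
    return value


def is_allowed_upload(filename: str) -> bool:
    """Accept only file types the app was built to handle."""
    extension = os.path.splitext(filename)[1].lower()
    return extension in ALLOWED_EXTENSIONS
```

The caller catches `ValueError` and shows its message next to the form field. Checking the extension is only a first filter; it says nothing about what the file actually contains.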

The AI did what you asked. Before you ship, the question is whether anyone has tested what you didn't think to ask for.

So What’s The Problem?

Using AI tools to build fast is a legitimate approach. Getting to a working prototype in days instead of months is a real advantage, and it's changed what's possible for small teams and first-time founders.

The mistake is treating the prototype as the finished product. What ships fast in a demo rarely survives contact with real users, edge cases and the kind of low-effort probing that any moderately curious person will apply to a new app. The gap between the prototype and the finished product is where 20 of the rescues my team has handled actually started.

The tool built what you described. Your job, before it goes live, is to make sure someone has asked what you forgot to describe.

Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.
