Vibe coding won't work on a device

The Stack Overflow 2025 survey landed with two numbers worth holding together. Positive sentiment toward AI tools dropped from 72% to 60% year-over-year. And 72% of professional developers say they do not "vibe code," the practice of accepting AI-generated output with minimal review.

The sentiment number is getting the headlines. The review number is the one that matters.

Vibe coding is a sustainable practice in two situations. The first is when the stakes are low — a throwaway script, a personal tool, a prototype nobody will depend on. The second is when the feedback loop is tight enough to catch the failure cheaply. A Next.js app where the dev server reloads in a second and the bug shows up in the browser before you've taken your hand off the mouse tolerates a lot of unreviewed code, because the cost of a regression is a scroll-up in the terminal.

Mobile fails both tests. The stakes aren't low: the code ships to a user device with a version number on it, and a rollback takes days. And the feedback loop is slow enough that an unreviewed failure can sit in the codebase for a full release cycle before it's caught. A Compose recomposition bug. A LaunchedEffect that fires on the wrong key. A navigation intent that misbehaves under a specific entry sequence. These don't show up in a compile pass. They show up on a device, under a condition you didn't think to test.

The 72% of professionals who review AI output aren't being cautious out of taste. They're being cautious because in most real codebases the cost of shipping wrong code exceeds the cost of reading the diff. Mobile codebases sit at the high end of that tradeoff.

What this looks like in my own pipeline:

The agent writes. The engineer reads. Every diff gets read before it lands. Not scanned — read. The agent is fast enough that the bottleneck is the human review, and pretending otherwise ships bugs.
Tests run before the diff is offered for review. The agent is not allowed to hand me code that hasn't been through the compile and unit test gate. That's a pipeline constraint, not a prompt instruction.
Runtime behavior is in scope. The hardest mobile bugs are runtime bugs, so review has to cover what the code will do, not just what it says. That's why the scaffolding around the agent matters more than the agent's raw capability: either the agent can observe the runtime, or the human reviewer carries that weight alone.
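
To make the "pipeline constraint, not a prompt instruction" point concrete, here is a minimal sketch in Python of a gate that sits between the agent and the reviewer. It is an illustration under assumptions, not a description of any particular setup: the names run_gate and offer_for_review are hypothetical, and the Gradle tasks in DEFAULT_STEPS are the usual ones for a debug build of an Android project but may differ in yours.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Step = Tuple[str, Sequence[str]]  # (human-readable name, command to run)

# Assumed gate steps for an Android project; replace with your own build commands.
DEFAULT_STEPS: Tuple[Step, ...] = (
    ("compile", ["./gradlew", "compileDebugKotlin"]),
    ("unit tests", ["./gradlew", "testDebugUnitTest"]),
)


@dataclass
class GateResult:
    passed: bool
    failed_step: Optional[str] = None
    output: str = ""


def run_gate(steps: Sequence[Step] = DEFAULT_STEPS, cwd: Optional[str] = None) -> GateResult:
    """Run each step in order and stop at the first one that fails."""
    for name, command in steps:
        try:
            proc = subprocess.run(command, cwd=cwd, capture_output=True, text=True)
        except OSError as exc:
            # The command could not be started at all (e.g. missing build script).
            return GateResult(False, name, str(exc))
        if proc.returncode != 0:
            return GateResult(False, name, proc.stdout + proc.stderr)
    return GateResult(True)


def offer_for_review(diff: str, steps: Sequence[Step] = DEFAULT_STEPS,
                     cwd: Optional[str] = None) -> Optional[str]:
    """Return the diff only if every gate step passed; otherwise return None."""
    result = run_gate(steps, cwd)
    if not result.passed:
        print(f"gate failed at step: {result.failed_step}", file=sys.stderr)
        return None
    return diff
```

The design choice worth noting is that the agent never decides whether the gate applies. The only path from a diff to a human goes through offer_for_review, so a diff that fails to compile or fails a unit test never reaches the reviewer, whatever the prompt said.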

The sentiment drop in the Stack Overflow survey tracks something real. People have shipped AI-written code, watched it break, and adjusted. The industry is learning the same thing mobile teams have been saying for a while: agents are useful, and the review gate is non-optional. Especially when the code ships to a device, where you can't take it back.