Data McFly.

How to Spot an AI Expert Who's Faking It

Jun 30, 2026 · 5 min read · Roger Stringer

Most "AI experts" you'll talk to this year are running a magic show. The demo works flawlessly on the call. The deck has the right logos. The vocabulary is fluent. And then you hire them, point the thing at your actual business, and it falls apart.

This isn't bad luck. Around 88% of AI-agent pilots never reach production. For you, that's not a statistic. It's roughly a one-in-eight shot at success, paid for with your budget and a quarter you don't get back. The people selling you the pilot rarely mention that number. The ones worth hiring bring it up themselves.

So how do you tell them apart before the money's gone? You don't need to understand the tech. You need to know which questions make a faker sweat.

The buzzword tell

The fastest screen is also the oldest: listen to how they talk about failure.

A prompt-jockey talks about capability. What the model can do, how fast it ships, which framework they're "leveraging." Everything is upside. Ask one of them what happens when the agent gets it wrong, and you'll get a shrug dressed up as confidence — "we'll fine-tune it," "it learns over time," "edge cases are rare."

A real operator can't stop talking about failure modes. Where the thing breaks, how often, how they'd catch it, who gets paged. That's not pessimism. It's what competence sounds like. Anyone who's actually shipped an AI system has watched one confidently invent an answer in production, and the memory makes them careful.

The tell to listen for: a real expert is comfortable telling you what AI can't do. The faker treats every limitation as a personal insult.

Four questions that do the work

You don't need a technical interview. You need four questions and the patience to sit through the answers.

"How will you know it's working?" Watch whether they reach for a number or a vibe. Forrester traced agent-project failures to three root causes, and the biggest — 41% — was unclear success criteria. Translation: four in ten of these projects die because nobody agreed up front what "working" meant. A real expert defines that on the first call, in terms you can check yourself. A faker says "you'll see the difference."

"What happens when it's wrong?" Every AI system is wrong sometimes. The only question is whether someone designed for it. You want to hear about review steps, confidence thresholds, the cases that get escalated to a human. If "it's wrong" sounds like a scenario they've never pictured, that's your answer.

"How will you get it access to my real data?" That same Forrester breakdown puts insufficient tool and data access at 33% of failures. A demo runs on clean, curated examples. Your business runs on messy CRM exports and a decade of half-labeled documents. The fakes hand-wave this. The real ones ask about your data before they ask about your budget, because they know that's where pilots actually go to die.

"Show me something you built that broke, and what you changed." This is the one that separates operators from talkers. Anyone who's shipped real work has a scar and a story. Someone who's only ever demoed will reach for a hypothetical, or worse, claim nothing's gone wrong. Nothing going wrong isn't a flex. It means they've never shipped.

I'll go first. Early on I shipped an agent that summarized incoming customer messages and routed them to the right person. It demoed beautifully. In production it started confidently mis-routing the angry edge cases — the exact messages you most need to get right — because I'd tested it on tidy examples and real customers don't write tidy messages. The fix wasn't a smarter model. It was admitting I had no idea how often it was wrong, building a dead-simple log of every routing decision with the original message next to it, and reviewing it weekly until the failure pattern was obvious. Once I could see it, the prompt fix took an afternoon. I'd skipped the boring measurement step, and the model's confidence had hidden the problem from me for weeks.
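For the technically curious, a log like that can be very small. What follows is a minimal Python sketch, not the code from the story above: the column names, the team names, the sample messages, and the helpers `log_decision` and `review_summary` are all assumptions made up for illustration. The idea is only that each routing decision is stored next to the original message, a human fills in the correct destination during the weekly review, and the error rate and the most common mistakes fall out of a simple count.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone

# Hypothetical column layout; any columns work as long as the
# original message sits next to the decision that was made.
FIELDS = ["timestamp", "message", "routed_to", "correct_team"]


def log_decision(writer, message, routed_to):
    """Append one routing decision with the original message beside it."""
    writer.writerow({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "message": message,
        "routed_to": routed_to,
        "correct_team": "",  # filled in by a human during the weekly review
    })


def review_summary(rows):
    """Return (error_rate, Counter of (routed_to, correct_team)) for reviewed rows."""
    reviewed = [r for r in rows if r["correct_team"]]
    misses = Counter(
        (r["routed_to"], r["correct_team"])
        for r in reviewed
        if r["routed_to"] != r["correct_team"]
    )
    rate = sum(misses.values()) / len(reviewed) if reviewed else 0.0
    return rate, misses


# Demo with an in-memory buffer standing in for a real log file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
log_decision(writer, "Where is my invoice?", "billing")
log_decision(writer, "THIRD time asking. Cancel everything.", "billing")

# Weekly review: a person reads each message and records where it belonged.
rows = list(csv.DictReader(io.StringIO(buffer.getvalue())))
rows[0]["correct_team"] = "billing"
rows[1]["correct_team"] = "retention"

rate, misses = review_summary(rows)
print(f"{rate:.0%} of reviewed messages were mis-routed")
print(misses.most_common(3))
```

The design choice worth noting is that nothing here is clever. The log measures how often the agent is wrong and groups the mistakes by kind, which is exactly the information a confident-sounding model hides.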

Why this is a hiring screen, not an AI screen

Here's what those four questions actually test: judgment. Not whether someone can wire up a model — that part's getting easier every month — but whether they know what to do when it misbehaves, which corners are safe to cut, and when to tell you "no, that won't work."

That's the same screen you'd run on any technical leader. You're not really buying AI. You're buying the judgment that decides whether the AI is any good.

It's how I think about my own work. The agents handle the mechanical 70% — the wiring, the boilerplate, the first draft of everything. The 30% that decides whether the result ships or embarrasses you is senior judgment, and it doesn't come from a model. A faker is selling you the 70% and pretending the 30% comes free. It never does. I'll automate the building all day, but I never hand off the call on what "good enough to ship" means — that judgment is the job, and the day I outsource it to the model is the day I'm just a faker with better tooling.

The cruelest part is that the magic show is convincing precisely because the demo is real. The model genuinely does the thing on the call. What you can't see in a demo is the judgment that keeps it doing the thing six months later, on your data, when you're not watching.

Hire for the 30%. Ask what breaks. The ones who flinch were never going to ship.