writing
Your AI Doesn't Have a Model Problem. It Has a Data Problem.
You upgraded to the newest, most capable model the week it dropped. You rewrote the prompt three times. And the output is still wrong in the same annoying ways. If that sounds familiar, here's the uncomfortable diagnosis: your problem was never the model.
Almost every "the AI isn't good enough" complaint I get called in on turns out to be a data problem wearing a model costume. The model is the part everyone looks at, because it's the part with the brand name and the version number. The data is the part that actually decides whether the output is any good.
The reflex to blame the model
When an agent gives a bad answer, the instinct is to reach for a bigger model. It feels like progress. New version, more parameters, surely smarter. Sometimes it helps a little. Usually it doesn't, because the model was never confused about how to reason. It was missing the information it needed to reason about.
Swapping models to fix a data problem is like hiring a smarter employee and still refusing to tell them how your business works. The new hire is more capable and just as lost.
The model is the commodity now. The data is the moat.
Frontier models are extraordinarily good and getting cheaper by the month. That capability is available to you and to every competitor on the same terms. Nobody wins by having a slightly better model, because everyone has access to the same ones.
What you have that nobody else does is your data. Your customers, your history, your policies, your edge cases, the specific shape of how your business actually operates. The work that separates a useful agent from a demo is getting that data into the right form and in front of the model at the right moment. The model is the easy part. The data is the work, and the work is where the advantage is.
The three ways data lets you down
When an agent underperforms, the data is failing in one of three ways.
It isn't available. The agent is being asked to answer using information it was never given. A support agent with no access to the customer's account history is guessing, and a confident guess is worse than an honest "I don't know."
It's in the wrong shape. The information exists but it's a mess: buried in PDFs, scattered across five systems, inconsistent field names, free text where you needed structure. The agent technically has it and still can't use it.
It's stale or contradictory. Two sources disagree, or the data is six months out of date, and nothing tells the agent which to trust. So it picks one, sometimes the wrong one, and sounds certain doing it.
Notice that none of these are fixed by a better model. They're fixed by work on the data.
What this looks like in practice
Picture a support agent that keeps giving slightly wrong answers about billing. The team's first move is to try a bigger model. No change. The actual problem is that the billing rules live in a help doc last updated a year ago, the current rules live in a spreadsheet the finance team maintains, and the two disagree. No model on earth resolves that. Pointing the agent at the current, authoritative source does, instantly.
That's the pattern almost every time. The fix isn't smarter. It's better-informed.
Why teams skip the part that matters
Data work is unglamorous. Nobody demos their data pipeline. There's no launch-day excitement in "we cleaned up the billing rules and put them in one place." So teams pour their energy into the model and the prompt, the visible parts, and treat the data as cleanup to get to later. Later never comes, and the impressive prototype never becomes a dependable system.
This is the heart of the 70/30 Method. The model does the 70, the reasoning and drafting and classifying. But the 30 that decides whether the 70 is worth anything is largely data and context work. It's also the same thing we mean when we say you have to onboard an agent like a new hire: the context you give it matters more than how clever it is.
If your AI output is disappointing and you're shopping for a better model, stop. Look at the data first. That's the Agentic OS work I do with clients, and it's almost always where the real gains are hiding. Let's talk.