
Cleaning Up Your Data Before You Automate: The Unsexy Prerequisite

Jun 25, 2026 · 3 min read · Roger Stringer

There's a hard rule under every successful automation project that nobody puts on a slide: automating a process built on messy data doesn't fix the mess. It scales it. You take a problem that used to happen occasionally, by hand, and you give it the ability to happen hundreds of times a day, automatically, with confidence.

This is the unglamorous prerequisite. Before you put an agent on a workflow, the data that workflow runs on has to be in good enough shape for the agent to trust. Skip it and the most common outcome isn't a dramatic failure. It's a system that's subtly, consistently wrong in ways nobody catches until a customer does.

Why automation amplifies mess

A human doing a task by hand is a quiet error-correction layer. They notice when a record looks off, when two fields contradict, when something just seems wrong, and they fix it without mentioning it. Most teams have no idea how much their data is being silently patched by the people who deal with it every day.

Automate the task and you remove that layer. The agent does exactly what the data tells it, at speed and scale, with none of the instinctive "that doesn't look right" a person brings. Garbage in, now at a thousand records an hour.

What "clean enough" actually means

Clean enough doesn't mean perfect. It means the data is good enough that an agent acting on it produces results you'd trust. In practice that comes down to a few things.

Consistent and structured. The same kind of information lives in the same place, in the same format, every time. An agent can't reliably act on a field that's a date here, a free-text note there, and empty in a third of the records.

A single source of truth. When two systems hold the same fact, one of them is authoritative and everyone, including the agent, knows which. Contradictory data is where confident-but-wrong answers come from.

Complete where it counts. The fields the workflow actually depends on are reliably filled, and the agent knows what to do when something's missing instead of guessing.

You don't need to boil the ocean. You need the specific slice of data this workflow touches to be trustworthy.
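
To make that concrete, here's a rough sketch of what checking one slice might look like. Everything in it is an assumption for illustration: the field names (email, renewal_date), the two systems (crm, billing), and the format rules are hypothetical, and your own workflow will have its own. It counts, for each field the workflow depends on, how many records are missing it, how many have it in the wrong format, and how many disagree with a second system.

```python
from datetime import datetime


def is_iso_date(value):
    """True if value is a string in YYYY-MM-DD form."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except (TypeError, ValueError):
        return False


def is_nonempty_text(value):
    """True if value is a string with something in it besides whitespace."""
    return isinstance(value, str) and value.strip() != ""


# Hypothetical: only the fields this one workflow depends on, each with a format check.
REQUIRED_FIELDS = {
    "email": is_nonempty_text,
    "renewal_date": is_iso_date,
}


def audit(records, other_system=None, key="id"):
    """Count missing, malformed, and conflicting values per required field."""
    report = {
        field: {"missing": 0, "malformed": 0, "conflicting": 0}
        for field in REQUIRED_FIELDS
    }
    other_by_key = {r[key]: r for r in (other_system or [])}

    for record in records:
        twin = other_by_key.get(record.get(key))
        for field, check in REQUIRED_FIELDS.items():
            value = record.get(field)
            if value in (None, ""):
                report[field]["missing"] += 1
                continue
            if not check(value):
                report[field]["malformed"] += 1
            # A conflict is two systems both holding a value, and disagreeing.
            if twin is not None:
                other_value = twin.get(field)
                if other_value not in (None, "") and other_value != value:
                    report[field]["conflicting"] += 1
    return report


# Made-up sample data, not real results.
crm = [
    {"id": 1, "email": "a@example.com", "renewal_date": "2026-07-01"},
    {"id": 2, "email": "", "renewal_date": "next spring"},
    {"id": 3, "email": "c@example.com", "renewal_date": None},
]
billing = [
    {"id": 1, "email": "a@example.com", "renewal_date": "2026-08-01"},
]

report = audit(crm, billing)
for field, counts in report.items():
    print(field, counts)
```

The numbers that come out aren't a pass/fail grade. They tell you where the cleanup effort goes first, and which conflicts need a decision about which system is authoritative before an agent ever sees the data.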

Do this first, not later

The temptation is to ship the automation now and clean the data "as we go." That cleanup rarely happens: once the thing is running and appears to work, it loses every priority fight. Then a quiet error compounds for months.

Doing the data work first is the same discipline as everything in the 70/30 Method: the 30% that isn't exciting is the part that decides whether the 70% is worth anything. It's also why your AI usually has a data problem, not a model problem. Get the data right and a modest setup works beautifully. Get it wrong and the most sophisticated system in the world just makes mistakes faster.

If you're about to automate a workflow and you're not sure your data is ready for it, that's exactly the assessment I do with clients before building anything. Let's talk.