Data McFly.

Guardrails: The Boring Work That Keeps AI Out of the Headlines

Jun 28, 2026 · 3 min read · Roger Stringer

Every AI failure that makes the news has the same thing in common: a missing guardrail. The chatbot that promised a refund policy that didn't exist. The agent that got talked into saying something awful. The system that took an action nobody intended at a scale nobody checked. None of those were caused by a weak model. They were caused by the absence of the boring constraints that should have been wrapped around it.

Guardrails are that unglamorous work. They're what separates an agent you can confidently put in front of customers from one that's an incident waiting for a date.

What a guardrail is

A guardrail is a constraint that limits what the agent can do, or checks what it did, so that a mistake gets caught or prevented instead of shipped. The model's job is to produce a good answer. The guardrail's job is to assume it sometimes won't, and to make sure that's survivable. You build guardrails not because you expect failure on every request, but because you can't predict which request will be the one.
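As a rough illustration of that split in jobs, here is a minimal Python sketch. The `guarded` wrapper, the `FALLBACK` message, and the `check` callback are all hypothetical names invented for this example, not part of any particular framework. The model produces an answer; the wrapper assumes it might be wrong or might crash, and falls back to something safe either way.

```python
from typing import Callable

# Hypothetical safe response used whenever the model's answer can't be trusted.
FALLBACK = "I'm not able to help with that here. Let me connect you with a person."


def guarded(model: Callable[[str], str], check: Callable[[str], bool]) -> Callable[[str], str]:
    """Wrap a model call so a failed check or a crash degrades to FALLBACK."""

    def wrapped(prompt: str) -> str:
        try:
            answer = model(prompt)
        except Exception:
            # The model call itself failed; the mistake is survivable.
            return FALLBACK
        # The answer ships only if the check passes.
        return answer if check(answer) else FALLBACK

    return wrapped
```

The wrapper does nothing on the requests where the model is right. It only matters on the one you couldn't predict.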

The kinds that matter

A few categories cover most of what you need.

Input checks. Validate and sanitize what comes in before the agent acts on it. Catch the malformed, the out-of-scope, and the deliberately adversarial at the door.
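A sketch of what an input check might look like, in Python. The length limit, the topic list, and the injection pattern are placeholder assumptions for illustration; a real deployment would tune all three, and a single regex is a crude filter rather than a complete defense against adversarial input.

```python
import re
from dataclasses import dataclass

MAX_CHARS = 2000  # assumed limit, chosen for illustration
ALLOWED_TOPICS = {"billing", "shipping", "returns"}  # hypothetical scope
# One example of an adversarial pattern; real filters need far more than this.
SUSPICIOUS = re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE)


@dataclass
class InputVerdict:
    ok: bool
    reason: str = ""


def check_input(text: str, topic: str) -> InputVerdict:
    """Reject malformed, out-of-scope, or obviously adversarial input."""
    if not text.strip():
        return InputVerdict(False, "empty")
    if len(text) > MAX_CHARS:
        return InputVerdict(False, "too_long")
    if topic not in ALLOWED_TOPICS:
        return InputVerdict(False, "out_of_scope")
    if SUSPICIOUS.search(text):
        return InputVerdict(False, "possible_injection")
    return InputVerdict(True)
```

Returning a reason alongside the verdict makes rejections easy to log and count later.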

Output validation. Check what the agent produced before it goes anywhere. Does it fit the expected format, stay inside policy, avoid promising things you don't offer? A bad output caught before it's sent is a non-event. The same output sent is a problem.
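Here is one way that check could be sketched, assuming the agent is asked to reply in JSON with a `reply` and an `intent` field. That schema and the forbidden phrases are assumptions made up for this example. The validator checks format first, then policy, and returns a reason when it refuses.

```python
import json

ALLOWED_KEYS = {"reply", "intent"}  # assumed output schema
ALLOWED_INTENTS = {"answer", "escalate"}
# Hypothetical promises this business does not make.
FORBIDDEN_PHRASES = ("guaranteed refund", "lifetime warranty")


def validate_output(raw: str) -> tuple[bool, str]:
    """Return (ok, reason) for a raw model response."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False, "not_json"
    if not isinstance(data, dict) or set(data) != ALLOWED_KEYS:
        return False, "bad_shape"
    if not isinstance(data["reply"], str) or not isinstance(data["intent"], str):
        return False, "bad_values"
    if data["intent"] not in ALLOWED_INTENTS:
        return False, "bad_values"
    lowered = data["reply"].lower()
    if any(phrase in lowered for phrase in FORBIDDEN_PHRASES):
        return False, "policy_violation"
    return True, "ok"
```

A phrase list is the bluntest possible policy check. It still turns the invented-refund-policy failure into a logged rejection instead of a sent message.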

Scope and permission limits. The agent can touch exactly what its job requires and nothing more. This is the same discipline as scoping a new hire's access: enough to work, not enough to cause an incident.
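In code, that can be as simple as an allowlist. The sketch below is a minimal Python version; the tool functions and the `ScopedToolbox` class are hypothetical stand-ins, not a real agent framework's API. The agent only ever receives the tools its role was granted, so anything else fails loudly.

```python
from typing import Callable, Dict


class ToolNotPermitted(Exception):
    """Raised when an agent asks for a tool outside its scope."""


# Hypothetical tools, stubbed out for illustration.
def lookup_order(order_id: str) -> str:
    return f"order {order_id}: shipped"


def issue_refund(order_id: str) -> str:
    return f"refunded {order_id}"


ALL_TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup_order": lookup_order,
    "issue_refund": issue_refund,
}


class ScopedToolbox:
    """Exposes only the tools named in `allowed`."""

    def __init__(self, allowed: set[str]):
        self._tools = {name: ALL_TOOLS[name] for name in allowed}

    def call(self, name: str, *args: str) -> str:
        tool = self._tools.get(name)
        if tool is None:
            raise ToolNotPermitted(name)
        return tool(*args)
```

A support agent built with `ScopedToolbox({"lookup_order"})` can answer shipping questions all day and still has no path to `issue_refund`, however the conversation goes.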

Human gates. High-stakes or irreversible actions wait for a person to approve. Match the gate to the cost of being wrong, so you're not bottlenecking the cheap, reversible work or rubber-stamping the expensive stuff.
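A minimal sketch of matching the gate to the cost, in Python. The `Action` fields, the dollar threshold, and the `approve` callback are assumptions for illustration; in practice `approve` might post to a review queue and wait for a person's answer.

```python
from dataclasses import dataclass
from typing import Callable

APPROVAL_THRESHOLD_USD = 100.0  # assumed cutoff; set it from the real cost of being wrong


@dataclass(frozen=True)
class Action:
    name: str
    reversible: bool
    cost_usd: float


def needs_human(action: Action) -> bool:
    """Irreversible or expensive actions wait for a person."""
    return (not action.reversible) or action.cost_usd >= APPROVAL_THRESHOLD_USD


def run(action: Action,
        execute: Callable[[Action], None],
        approve: Callable[[Action], bool]) -> str:
    """Execute directly when cheap and reversible; otherwise ask first."""
    if needs_human(action) and not approve(action):
        return "rejected"
    execute(action)
    return "executed"
```

Cheap, reversible actions never call `approve` at all, which is what keeps the gate from becoming a bottleneck.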

Monitoring and a kill switch. You can see what the agent is doing, you get alerted when something looks off, and you can stop it quickly. Trouble you can see early is trouble you can contain.
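The kill switch half of that can be sketched in a few lines of Python. The `KillSwitch` class and its thresholds are hypothetical; the assumption here is that the agent loop checks `stopped` before each action and reports failures through `record_error`. Too many errors inside a time window trips the switch automatically, and a person can trip it by hand.

```python
import threading
import time
from collections import deque
from typing import Optional


class KillSwitch:
    """Stops the agent manually, or automatically after too many recent errors."""

    def __init__(self, max_errors: int, window_seconds: float):
        self.max_errors = max_errors
        self.window_seconds = window_seconds
        self._errors: deque = deque()
        self._lock = threading.Lock()
        self._stopped = threading.Event()

    def record_error(self, now: Optional[float] = None) -> None:
        """Log one error; trip the switch if the window fills up."""
        now = time.monotonic() if now is None else now
        with self._lock:
            self._errors.append(now)
            # Drop errors that have aged out of the window.
            while self._errors and now - self._errors[0] > self.window_seconds:
                self._errors.popleft()
            if len(self._errors) >= self.max_errors:
                self._stopped.set()

    def stop(self) -> None:
        """Manual stop, for when a human sees something off."""
        self._stopped.set()

    @property
    def stopped(self) -> bool:
        return self._stopped.is_set()
```

Once tripped, the switch stays tripped until someone builds a new one. That is deliberate in this sketch: restarting should be a human decision.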

The mindset

The shift that makes guardrails click is to stop designing for the case where the agent is right and start designing for the case where it's wrong. Assume it will occasionally produce a confident, plausible, incorrect answer, because it will, and build the system so that when it does, the guardrail catches it instead of the customer.

This is the heart of the question of who's accountable when the agent is wrong: guardrails are where accountability becomes concrete. It's also the least glamorous slice of the 70/30 Method: the 30% of senior judgment that decides whether aggressive automation is an advantage or a liability. Nobody demos their guardrails. They're just the reason the headline is about someone else.

Designing the right guardrails for the stakes is core to the Agentic OS work I do with clients who want to automate aggressively without betting the company on it. Let's talk.