Agent Operations
Managing AI Agents
Most companies that have deployed AI agents in production cannot tell you who owns each agent, what it's allowed to do, or what happens when its builder leaves the company. The agents are quietly accumulating like an undocumented workforce no one has hired and no one has fired. The fix isn't more AI strategy. It's the discipline HR has been using to manage humans for fifty years — applied to a category of worker we never used to need it for.
From Shadow IT to Shadow AI Agents
For the past two decades, enterprise IT has been chasing a problem called Shadow IT. The pattern is familiar: an employee signs up for Dropbox to share a file. A team buys a SaaS subscription on a corporate card. Someone installs a useful tool on their own laptop. The software gets used for real work, on real company data, but IT doesn't know it exists, can't secure it, and can't review it for compliance. The discipline IT eventually built around that problem (registries, governance, procurement controls, periodic audits) is what kept Shadow IT from becoming a category-defining failure mode.
We are now watching the same pattern emerge one layer up. Call it Shadow AI Agents. An engineer builds an agent to automate part of their job. It works. They put it into production. They tell maybe two coworkers. The agent quietly does real work — touching customer data, classifying tickets, drafting responses, making operational decisions — but nothing about it is in a central registry. When the engineer changes teams or leaves the company, the agent keeps running, unowned and unreviewed.
The response is structurally the same as the Shadow IT response: registries, governance, lifecycle discipline, accountability. The rest of this guide is what that discipline looks like for a category of worker that didn't use to exist.
The shadow workforce
Walk into a mid-sized enterprise today and ask one question: how many AI agents do you have in production? You will get four answers. The CTO will give one. The head of engineering will give another. The risk officer's will be different still. And someone will eventually mention the LangChain experiment that's been running on a forgotten EC2 instance for fourteen months and is, somehow, still answering customer emails.
None of those answers is wrong. They're all measuring different things, because nobody has decided what counts as “an agent in production” and nobody owns the list. This isn't a metaphor — it's the actual state of most enterprise AI footprints I've walked into in 2025 and 2026. Builders ship agents into production. Builders move teams. Builders leave. The agents stay, and nobody picks them up, because nobody was ever told they needed to.
The way to fix this isn't a new framework. It's the boring one HR has been running quietly in the background of every functional company for decades: define the role, document the scope, name the owner, run a real onboarding, run a real offboarding. Apply that to agents and the shadow workforce shrinks.
Three principles for every agent
Every agent in production should have three documented attributes — three things you'd expect to know about anything operating on your behalf, whether it's a vendor, a contractor, a system, or an agent.
Persona
Answers “Who?” The agent's character, voice, the judgments it makes and the ones it refuses. The narrative version of the system prompt, and the contract for behavior the prompt is supposed to implement.
Role
Answers “What?” The work the agent performs, the boundaries of that work, what it must escalate, what it must refuse. Vague phrases like "handles customer issues" aren't a role; they're a wish.
Decision Authority
Answers “What is it empowered to decide?” Not what the agent could do, but what it's authorized to do. Where this lands on the human-in-the-loop spectrum is the most consequential design choice you'll make about it.
See: Operator, Reviewer, Auditor.
None of this is exotic. It's the contract HR has had with every new hire since the invention of the offer letter: who you are, what you do, what you're empowered to do. The mistake in most enterprise AI programs is treating agents as if they don't require this, as if a system prompt buried in a Python file counts as a role definition. It doesn't. A system prompt is implementation. The principles above are the contract that the system prompt is supposed to implement.
The practical test: if I asked your security team today, for any one production agent, “who is this, what is it allowed to do, and who decided?” — could they answer in under two minutes? If not, you haven't hired the agent. You're hosting it.
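One way to make that two-minute test checkable is to keep the three attributes, plus the owner and the approver, in a structured registry record. The sketch below is a minimal illustration, not a prescribed schema: `AgentRecord`, its field names, and `unanswered_questions` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AgentRecord:
    """Hypothetical registry record for one production agent."""
    name: str
    persona: str = ""             # answers "Who?"
    role: str = ""                # answers "What?"
    decision_authority: str = ""  # what the agent is authorized to decide
    owner: str = ""               # named business owner
    approved_by: str = ""         # who signed off on the above


def unanswered_questions(record: AgentRecord) -> list:
    """Return the parts of the practical test this record cannot answer."""
    gaps = []
    if not record.persona.strip():
        gaps.append("who is this")
    if not record.role.strip() or not record.decision_authority.strip():
        gaps.append("what is it allowed to do")
    if not record.owner.strip() or not record.approved_by.strip():
        gaps.append("who decided")
    return gaps
```

An empty list means the record can answer all three questions; anything else is a concrete gap to hand to the agent's owner.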
Onboarding scales to the agent's risk tier
Onboarding isn't a single linear march. It's a set of gates that scale to the consequence the agent carries — a production-critical agent operating on customer money clears more gates than a productivity tool somebody built for themselves. Three roles are accountable across every tier.
- Builder: designs and ships. Accountable for technical correctness and pre-launch validation. Often the person most likely to leave the company; tracking who the Builder was matters even after they're gone.
- Owner: the named business person who accepts the agent into production. Accountable for ongoing operation and decisions when something breaks. Transferable; never empty for production-critical agents.
- Agent Ops: the institutional function for registry, permissions, monitoring, and audit. In a mid-market company this might be one person wearing three hats; in an enterprise it's a small team.
Production-critical (7 gates)
| Gate | Accountable |
|---|---|
| Documented charter — owner, scope, decision authority, sponsor | Builder |
| Named primary owner + named backup owner | Owner |
| Pilot with full logging and a human monitor watching live | Builder |
| Performance bar — accuracy, latency, error rate, cost-per-action | Builder |
| Registry entry + permission envelope + audit log scope | Agent Ops |
| Sponsor approval gate — business + risk sign-off in writing | Owner (secures sponsor) |
| Launch plan — first 72 hours, rollback, paging | Builder + Owner |
Internal-operational (4 gates)
| Gate | Accountable |
|---|---|
| Documented charter | Builder |
| Named primary owner | Owner |
| Registry entry + permission scan | Agent Ops |
| Launch plan | Builder + Owner |
Productivity-tooling (2 gates)
| Gate | Accountable |
|---|---|
| Lightweight registry entry — user is self-owner | User |
| Standard tooling permissions only — no custom write access | Agent Ops default |
Experimental (3 gates)
| Gate | Accountable |
|---|---|
| Charter with explicit sunset date | Builder |
| Sandbox isolation — no production data, no production write access | Agent Ops |
| Auto-decommission at sunset | System (registry-enforced) |
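The onboarding tables above can be kept as data so that a registry, rather than someone's memory, reports which gates an agent still has to clear. This is a sketch under assumptions: the gate names are shortened from the tables, and `ONBOARDING_GATES`, `outstanding_gates`, and `ready_to_launch` are hypothetical names, not an existing API.

```python
# Gate name -> accountable role, per tier, transcribed from the tables above.
ONBOARDING_GATES = {
    "production-critical": [
        ("Documented charter", "Builder"),
        ("Named primary owner + named backup owner", "Owner"),
        ("Pilot with full logging and a human monitor watching live", "Builder"),
        ("Performance bar", "Builder"),
        ("Registry entry + permission envelope + audit log scope", "Agent Ops"),
        ("Sponsor approval gate", "Owner (secures sponsor)"),
        ("Launch plan", "Builder + Owner"),
    ],
    "internal-operational": [
        ("Documented charter", "Builder"),
        ("Named primary owner", "Owner"),
        ("Registry entry + permission scan", "Agent Ops"),
        ("Launch plan", "Builder + Owner"),
    ],
    "productivity-tooling": [
        ("Lightweight registry entry", "User"),
        ("Standard tooling permissions only", "Agent Ops default"),
    ],
    "experimental": [
        ("Charter with explicit sunset date", "Builder"),
        ("Sandbox isolation", "Agent Ops"),
        ("Auto-decommission at sunset", "System (registry-enforced)"),
    ],
}


def outstanding_gates(tier, cleared):
    """Return (gate, accountable) pairs for the tier that are not yet cleared."""
    return [(gate, who) for gate, who in ONBOARDING_GATES[tier] if gate not in cleared]


def ready_to_launch(tier, cleared):
    """An agent is ready only when every gate for its tier has been cleared."""
    return not outstanding_gates(tier, cleared)
```

Because each outstanding gate comes back with its accountable role, the result doubles as a to-do list addressed to the right person.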
Offboarding scales to the agent's risk tier
The question I get asked most often: when does an agent qualify for retirement? Answer: when one of six trigger events fires. Track all six — don't wait for someone to file a ticket.
- Builder leaves — The engineer who built it has left or moved far enough away to be effectively unreachable.
- Scope changes — The workflow the agent supports has shifted enough that the agent is no longer fit for it.
- Performance drifts — Accuracy, error rate, or other monitored metrics break threshold — and stay there.
- Disuse — Nobody is actually relying on the agent's output anymore.
- Regulatory or policy change — An audit finding, new compliance requirement, or internal policy update forces it off.
- Cost overrun / negative ROI — Cost-per-action breaks the budget, or the value no longer justifies the spend.
Whichever trigger fires, document the rationale that justifies it. The trigger itself isn't a gate — it's the entry point. The gates below are what happens next.
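Tracking all six triggers is easier when they are evaluated from recorded facts instead of waiting for a ticket. The following is a rough sketch only: `Trigger`, `AgentSnapshot`, and `fired_triggers` are hypothetical names, and the 14-day and 60-day thresholds are placeholder assumptions rather than recommendations. The value side of "negative ROI" is left to human judgment; only cost-per-action against budget is modeled.

```python
from dataclasses import dataclass
from enum import Enum


class Trigger(Enum):
    BUILDER_LEAVES = "builder leaves"
    SCOPE_CHANGES = "scope changes"
    PERFORMANCE_DRIFTS = "performance drifts"
    DISUSE = "disuse"
    REGULATORY_OR_POLICY_CHANGE = "regulatory or policy change"
    COST_OVERRUN = "cost overrun / negative ROI"


@dataclass
class AgentSnapshot:
    """Hypothetical point-in-time facts about one agent."""
    builder_reachable: bool
    scope_still_fits: bool
    days_below_performance_bar: int  # consecutive days past a metric threshold
    days_since_output_used: int
    policy_forces_retirement: bool
    cost_per_action: float
    budget_per_action: float


def fired_triggers(snap, drift_days=14, disuse_days=60):
    """Return every retirement trigger that currently applies, in list order."""
    checks = [
        (Trigger.BUILDER_LEAVES, not snap.builder_reachable),
        (Trigger.SCOPE_CHANGES, not snap.scope_still_fits),
        # "Break threshold and stay there": require a sustained breach.
        (Trigger.PERFORMANCE_DRIFTS, snap.days_below_performance_bar >= drift_days),
        (Trigger.DISUSE, snap.days_since_output_used >= disuse_days),
        (Trigger.REGULATORY_OR_POLICY_CHANGE, snap.policy_forces_retirement),
        (Trigger.COST_OVERRUN, snap.cost_per_action > snap.budget_per_action),
    ]
    return [trigger for trigger, fired in checks if fired]
```

A non-empty result is the entry point described above, not a retirement decision; the rationale still has to be written down by a person.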
Once a trigger fires, offboarding gates scale with the agent's tier — and the same three roles (Builder, Owner, Agent Ops) carry accountability all the way through. For production-critical retirements, consider a post-mortem as a separate practice afterward — what you learned feeds into the next agent's onboarding.
Production-critical (8 gates)
| Gate | Accountable |
|---|---|
| Dependency map — what downstream breaks when this stops running | Builder + Owner |
| Knowledge capture — what the builder knew that nobody else does | Builder (before unreachable) |
| Transition path — transferred, replaced, or retired (no "we'll figure it out") | Owner (secures sponsor) |
| Permission revocation — all access tokens, credentials, API keys killed | Agent Ops |
| Data lifecycle decision — what happens to logs, decision history, cached data | Agent Ops |
| Stakeholder communication — anyone relying on its output gets told | Owner |
| Final audit — sample the last 30 days for work outside the charter | Agent Ops |
| Archive entry — registry marked archived, not deleted | Agent Ops |
Internal-operational (5 gates)
| Gate | Accountable |
|---|---|
| Dependency map — quick scan of downstream users | Owner |
| Transition path — replaced or retired | Owner |
| Permission revocation — credentials and access killed | Agent Ops |
| Stakeholder communication — affected users notified | Owner |
| Archive entry | Agent Ops |
Productivity-tooling (2 gates)
| Gate | Accountable |
|---|---|
| Permission revocation — user kills their own credentials, or quarterly sweep does it | User |
| Archive entry — historical record preserved | Agent Ops default |
Experimental (2 gates)
| Gate | Accountable |
|---|---|
| Auto-decommission at sunset — system shuts the agent down on its sunset date | System (registry-enforced) |
| Archive entry — historical record preserved | System (registry-enforced) |
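The two registry-enforced gates for experimental agents lend themselves to a scheduled sweep. Here is a minimal sketch, assuming the registry is an in-memory list and that archiving means flipping a status field; `ExperimentalAgent` and `sweep_sunsets` are hypothetical names, and actually stopping the agent and revoking its credentials is outside what this code does.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ExperimentalAgent:
    """Hypothetical registry entry for an experimental-tier agent."""
    name: str
    sunset: date
    status: str = "active"  # "active" or "archived"


def sweep_sunsets(registry, today):
    """Archive every active agent whose sunset date has arrived.

    Entries are marked archived in place and never removed, so the
    historical record is preserved. Returns the names archived in this run.
    """
    archived = []
    for agent in registry:
        if agent.status == "active" and agent.sunset <= today:
            agent.status = "archived"
            archived.append(agent.name)
    return archived
```

Running the sweep twice on the same day is harmless: agents that are already archived are skipped.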
The discipline that scales
A single agent governed this way doesn't look like much. The same discipline applied to fifty agents is the difference between an enterprise AI program that an auditor can defend and a sprawl of unowned code that's one departure away from a customer-visible incident.
What I notice in the field: the companies that get this right aren't the ones with the most sophisticated agents. They're the ones who treated the third agent like the first — with a charter, an owner, a registry entry, and an offboarding plan that exists before anyone needs it. The discipline isn't glamorous. It's the thing that lets the glamorous work scale without collapsing under its own weight.
HR figured this out decades ago. Apply it to agents and the shadow workforce shrinks to a list you can actually manage.