Managing AI Agents

Ryan Carmichael

Managing Partner, Orienteer AI

Most companies that have deployed AI agents in production cannot tell you who owns each agent, what it's allowed to do, or what happens when its builder leaves the company. The agents are quietly accumulating like an undocumented workforce no one has hired and no one has fired. The fix isn't more AI strategy. It's the discipline HR has been using to manage humans for fifty years — applied to a category of worker we never used to need it for.

From Shadow IT to Shadow AI Agents

For the past two decades, enterprise IT has been chasing a problem called Shadow IT. The pattern is familiar: an employee signs up for Dropbox to share a file. A team buys a SaaS subscription on a corporate card. Someone installs a useful tool on their own laptop. The software gets used for real work, on real company data, but IT doesn't know it exists, can't secure it, and can't review it for compliance. The discipline IT eventually built around that problem (registries, governance, procurement controls, periodic audits) is what kept Shadow IT from becoming a category-defining failure mode.

We are now watching the same pattern emerge one layer up. Call it Shadow AI Agents. An engineer builds an agent to automate part of their job. It works. They put it into production. They tell maybe two coworkers. The agent quietly does real work — touching customer data, classifying tickets, drafting responses, making operational decisions — but nothing about it is in a central registry. When the engineer changes teams or leaves the company, the agent keeps running, unowned and unreviewed.

The response is structurally the same as the Shadow IT response: registries, governance, lifecycle discipline, accountability. The rest of this guide is what that discipline looks like for a category of worker that didn't use to exist.

The shadow workforce

Walk into a mid-sized enterprise today and ask one question: how many AI agents do you have in production? You will get four answers. The CTO will give one. The head of engineering will give another. The risk officer's will be different still. And someone will eventually mention the LangChain experiment that's been running on a forgotten EC2 instance for fourteen months and is, somehow, still answering customer emails.

None of those answers is wrong. They're all measuring different things, because nobody has decided what counts as “an agent in production” and nobody owns the list. This isn't a metaphor — it's the actual state of most enterprise AI footprints I've walked into in 2025 and 2026. Builders ship agents into production. Builders move teams. Builders leave. The agents stay, and nobody picks them up, because nobody was ever told they needed to.

The way to fix this isn't a new framework. It's the boring one HR has been running quietly in the background of every functional company for decades: define the role, document the scope, name the owner, run a real onboarding, run a real offboarding. Apply that to agents and the shadow workforce shrinks.

Three principles for every agent

Every agent in production should have three documented attributes — three things you'd expect to know about anything operating on your behalf, whether it's a vendor, a contractor, a system, or an agent.

Persona

Answers “Who?”

The agent's character, voice, the judgments it makes and the ones it refuses. This is the narrative version of the system prompt, and the contract for behavior the prompt is supposed to implement.

Role

Answers “What?”

The work the agent performs, the boundaries of that work, what it must escalate, what it must refuse. Vague phrases like "handles customer issues" aren't a role; they're a wish.

Decision Authority

Answers “What is it empowered to decide?”

Not what the agent could do, but what it's authorized to do. Where this lands on the human-in-the-loop spectrum is the most consequential design choice you'll make about it.

None of this is exotic. It's the contract HR has had with every new hire since the invention of the offer letter: who you are, what you do, what you're empowered to do. The mistake in most enterprise AI programs is treating agents as if they don't require this — as if a system prompt buried in a Python file counts as a role definition. It doesn't. A system prompt is implementation. The principles above are the contract that the system prompt is supposed to implement.

The practical test: if I asked your security team today, for any one production agent, “who is this, what is it allowed to do, and who decided?” — could they answer in under two minutes? If not, you haven't hired the agent. You're hosting it.
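One way to make that two-minute test answerable is to hold the three attributes as a structured record rather than as prose scattered across prompts. The following is a minimal sketch in Python, not a prescribed schema: `AgentCharter`, its field names, and the example agent are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AgentCharter:
    """Hypothetical record holding the three documented attributes of one agent."""
    agent_id: str
    persona: str             # Who: character, voice, judgments made and refused
    role: str                # What: the work, its boundaries, escalations, refusals
    decision_authority: str  # What it is authorized (not merely able) to decide
    approved_by: str         # Who decided


def undocumented_attributes(charter: AgentCharter) -> list[str]:
    """Return the names of attributes that are blank, in declaration order."""
    return [f.name for f in fields(charter) if not getattr(charter, f.name).strip()]


def two_minute_answer(charter: AgentCharter) -> str:
    """Answer: who is this, what is it allowed to do, and who decided?"""
    gaps = undocumented_attributes(charter)
    if gaps:
        return f"{charter.agent_id}: cannot answer, missing {', '.join(gaps)}"
    return (
        f"{charter.agent_id}: {charter.persona} | role: {charter.role} | "
        f"authority: {charter.decision_authority} | decided by: {charter.approved_by}"
    )


# Illustrative agent only; every value here is invented for the example.
triage = AgentCharter(
    agent_id="ticket-triage",
    persona="Terse support triager that never promises refunds",
    role="Classify inbound tickets; escalate billing disputes to a human",
    decision_authority="",  # not yet documented
    approved_by="support operations lead",
)
print(two_minute_answer(triage))
```

An agent whose record comes back with gaps is, in the terms used here, being hosted rather than hired.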

Onboarding

Onboarding scales to the agent's risk tier

Onboarding isn't a single linear march. It's a set of gates that scale to the consequence the agent carries — a production-critical agent operating on customer money clears more gates than a productivity tool somebody built for themselves. Three roles are accountable across every tier.

Three accountable roles

Builder: designs and ships. Accountable for technical correctness and pre-launch validation. Often the person most likely to leave the company; tracking who the Builder was matters even after they're gone.
Owner: the named business person who accepts the agent into production. Accountable for ongoing operation and decisions when something breaks. Transferable; never empty for production-critical agents.
Agent Ops: the institutional function for registry, permissions, monitoring, and audit. In a mid-market company this might be one person wearing three hats; in an enterprise it's a small team.
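The rule that an Owner is transferable but never empty is easy to check mechanically once Builder and Owner are recorded per agent. A rough sketch of such a check, assuming a hypothetical `RegistryEntry` shape and an `active_staff` set pulled from whatever directory the company already has (both are assumptions, as are the example names):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegistryEntry:
    """Hypothetical registry row; the builder stays on record after departure."""
    agent_id: str
    tier: str
    builder: str
    owner: Optional[str]  # None means no owner has been named


def ownership_gaps(registry, active_staff):
    """Return (agent_id, reason) pairs for entries that need Agent Ops attention."""
    gaps = []
    for entry in registry:
        if entry.owner is None or entry.owner not in active_staff:
            gaps.append((entry.agent_id, "owner missing or no longer at the company"))
        elif entry.builder not in active_staff:
            gaps.append((entry.agent_id, "builder has left"))
    return gaps


# Invented entries for illustration.
registry = [
    RegistryEntry("refund-bot", "production-critical", builder="ana", owner="raj"),
    RegistryEntry("email-responder", "production-critical", builder="lee", owner=None),
    RegistryEntry("report-drafter", "internal-operational", builder="sam", owner="raj"),
]
active_staff = {"ana", "raj"}

for agent_id, reason in ownership_gaps(registry, active_staff):
    print(f"{agent_id}: {reason}")
```

Run on a schedule, a check like this surfaces an unowned agent when the departure happens rather than when the incident does.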

Production-critical

7 gates

Gate	Accountable
Documented charter — owner, scope, decision authority, sponsor	Builder
Named primary owner + named backup owner	Owner
Pilot with full logging and a human monitor watching live	Builder
Performance bar — accuracy, latency, error rate, cost-per-action	Builder
Registry entry + permission envelope + audit log scope	Agent Ops
Sponsor approval gate — business + risk sign-off in writing	Owner (secures sponsor)
Launch plan — first 72 hours, rollback, paging	Builder + Owner

Internal-operational

4 gates

Gate	Accountable
Documented charter	Builder
Named primary owner	Owner
Registry entry + permission scan	Agent Ops
Launch plan	Builder + Owner

Productivity-tooling

2 gates

Gate	Accountable
Lightweight registry entry — user is self-owner	User
Standard tooling permissions only — no custom write access	Agent Ops default

Experimental

3 gates

Gate	Accountable
Charter with explicit sunset date	Builder
Sandbox isolation — no production data, no production write access	Agent Ops
Auto-decommission at sunset	System (registry-enforced)
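
The onboarding tables above translate directly into data that a registry could enforce. As a sketch only, the gates can be kept per tier and compared against what has been cleared so far; the gate labels below are abbreviated from the tables, and the function names are hypothetical.

```python
# Gate labels are shortened from the onboarding tables; treat them as
# illustrative identifiers, not an official vocabulary.
ONBOARDING_GATES = {
    "production-critical": [
        ("charter", "Builder"),
        ("primary and backup owner", "Owner"),
        ("monitored pilot", "Builder"),
        ("performance bar", "Builder"),
        ("registry entry, permission envelope, audit log scope", "Agent Ops"),
        ("sponsor approval", "Owner"),
        ("launch plan", "Builder + Owner"),
    ],
    "internal-operational": [
        ("charter", "Builder"),
        ("primary owner", "Owner"),
        ("registry entry and permission scan", "Agent Ops"),
        ("launch plan", "Builder + Owner"),
    ],
    "productivity-tooling": [
        ("lightweight registry entry", "User"),
        ("standard tooling permissions only", "Agent Ops default"),
    ],
    "experimental": [
        ("charter with sunset date", "Builder"),
        ("sandbox isolation", "Agent Ops"),
        ("auto-decommission at sunset", "System"),
    ],
}


def outstanding_gates(tier, cleared):
    """Return (gate, accountable) pairs not yet cleared for the given tier."""
    return [(gate, who) for gate, who in ONBOARDING_GATES[tier] if gate not in cleared]


def ready_to_launch(tier, cleared):
    """An agent launches only when no gate for its tier is outstanding."""
    return not outstanding_gates(tier, cleared)


cleared = {"charter", "primary owner", "launch plan"}
print(outstanding_gates("internal-operational", cleared))
print(ready_to_launch("internal-operational", cleared))
```

Keeping the accountable role next to each gate means an outstanding gate always comes with a name to chase.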

Offboarding

Offboarding scales to the agent's risk tier

The question I get asked most often: when does an agent qualify for retirement? Answer: when one of six trigger events fires. Track all six — don't wait for someone to file a ticket.

The six trigger events

Builder leaves — The engineer who built it has left or moved far enough away to be effectively unreachable.
Scope changes — The workflow the agent supports has shifted enough that the agent is no longer fit for it.
Performance drifts — Accuracy, error rate, or other monitored metrics break threshold — and stay there.
Disuse — Nobody is actually relying on the agent's output anymore.
Regulatory or policy change — An audit finding, new compliance requirement, or internal policy update forces it off.
Cost overrun / negative ROI — Cost-per-action breaks the budget, or the value no longer justifies the spend.

Whichever trigger fires, document the rationale that justifies it. The trigger itself isn't a gate — it's the entry point. The gates below are what happens next.
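
Tracking all six triggers instead of waiting for a ticket implies some routine evaluation per agent. One possible shape for it, as a sketch: the `AgentSnapshot` fields, the three-period default for "stay there", and the use of a per-action budget as the only cost signal are all assumptions, and a real program would also need a value measure to catch negative ROI.

```python
from dataclasses import dataclass
from enum import Enum


class Trigger(Enum):
    BUILDER_LEAVES = "builder leaves"
    SCOPE_CHANGES = "scope changes"
    PERFORMANCE_DRIFTS = "performance drifts"
    DISUSE = "disuse"
    REGULATORY_OR_POLICY_CHANGE = "regulatory or policy change"
    COST_OVERRUN = "cost overrun / negative ROI"


@dataclass
class AgentSnapshot:
    """Hypothetical monitoring inputs; how each one is measured is left open."""
    builder_reachable: bool
    scope_still_fits: bool
    periods_below_bar: int   # consecutive review periods under the performance bar
    active_consumers: int    # people or systems still relying on the output
    blocked_by_policy: bool
    cost_per_action: float
    budget_per_action: float


def fired_triggers(snap, sustained_periods=3):
    """Return every trigger that currently fires, in the order listed above."""
    checks = [
        (Trigger.BUILDER_LEAVES, not snap.builder_reachable),
        (Trigger.SCOPE_CHANGES, not snap.scope_still_fits),
        # "Stay there": a single bad period is not enough to fire.
        (Trigger.PERFORMANCE_DRIFTS, snap.periods_below_bar >= sustained_periods),
        (Trigger.DISUSE, snap.active_consumers == 0),
        (Trigger.REGULATORY_OR_POLICY_CHANGE, snap.blocked_by_policy),
        # Covers only the budget half of this trigger.
        (Trigger.COST_OVERRUN, snap.cost_per_action > snap.budget_per_action),
    ]
    return [trigger for trigger, fired in checks if fired]


snap = AgentSnapshot(
    builder_reachable=False,
    scope_still_fits=True,
    periods_below_bar=1,
    active_consumers=0,
    blocked_by_policy=False,
    cost_per_action=0.02,
    budget_per_action=0.05,
)
print([t.value for t in fired_triggers(snap)])
```

The function returns a list rather than the first match because more than one trigger can fire at once, and the documented rationale should name all of them.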

Once a trigger fires, offboarding gates scale with the agent's tier — and the same three roles (Builder, Owner, Agent Ops) carry accountability all the way through. For production-critical retirements, consider a post-mortem as a separate practice afterward — what you learned feeds into the next agent's onboarding.

Production-critical

8 gates

Gate	Accountable
Dependency map — what downstream breaks when this stops running	Builder + Owner
Knowledge capture — what the builder knew that nobody else does	Builder (before unreachable)
Transition path — transferred, replaced, or retired (no "we'll figure it out")	Owner (secures sponsor)
Permission revocation — all access tokens, credentials, API keys killed	Agent Ops
Data lifecycle decision — what happens to logs, decision history, cached data	Agent Ops
Stakeholder communication — anyone relying on its output gets told	Owner
Final audit — sample the last 30 days for work outside the charter	Agent Ops
Archive entry — registry marked archived, not deleted	Agent Ops
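
The final audit gate is the most mechanical of the eight: take the last 30 days of activity, sample it, and look for actions the charter never authorized. A sketch of that step, assuming the audit log can be reduced to `(day, action)` tuples and the charter to a set of allowed action names; both shapes, the sample size, and the action names are hypothetical.

```python
import random
from datetime import date, timedelta


def final_audit(log, allowed_actions, today, sample_size=50, seed=0):
    """Sample the last 30 days of log entries and return those outside the charter.

    `log` is a list of (day, action) tuples. The shape of the log and the
    `allowed_actions` set are assumptions made for this sketch.
    """
    window_start = today - timedelta(days=30)
    recent = [entry for entry in log if window_start <= entry[0] <= today]
    rng = random.Random(seed)  # fixed seed so the audit sample can be reproduced
    sample = rng.sample(recent, min(sample_size, len(recent)))
    return sorted(entry for entry in sample if entry[1] not in allowed_actions)


# Invented log entries for illustration.
today = date(2026, 3, 31)
log = [
    (date(2026, 1, 15), "issue_refund"),     # older than 30 days, ignored
    (date(2026, 3, 10), "classify_ticket"),
    (date(2026, 3, 20), "issue_refund"),     # inside the window, outside the charter
    (date(2026, 3, 28), "draft_reply"),
]
allowed = {"classify_ticket", "draft_reply"}

for day, action in final_audit(log, allowed, today):
    print(day.isoformat(), action)
```

Sampling keeps the audit tractable for a busy agent; when the window holds fewer entries than the sample size, every entry is reviewed.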

Internal-operational

5 gates

Gate	Accountable
Dependency map — quick scan of downstream users	Owner
Transition path — replaced or retired	Owner
Permission revocation — credentials and access killed	Agent Ops
Stakeholder communication — affected users notified	Owner
Archive entry	Agent Ops

Productivity-tooling

2 gates

Gate	Accountable
Permission revocation — user kills their own credentials, or quarterly sweep does it	User
Archive entry — historical record preserved	Agent Ops default

Experimental

2 gates

Gate	Accountable
Auto-decommission at sunset — system shuts the agent down on its sunset date	System (registry-enforced)
Archive entry — historical record preserved	System (registry-enforced)
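
For the experimental tier, both gates are owned by the system, which means the registry itself has to act on the sunset date. A minimal sketch of that sweep, assuming a hypothetical in-memory registry; a real one would also shut the agent down, which is only noted in a comment here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ExperimentalAgent:
    """Hypothetical registry row for an experimental agent."""
    agent_id: str
    sunset: date
    status: str = "active"  # "active" or "archived"; entries are never deleted


def sunset_sweep(registry, today):
    """Archive every active experimental agent whose sunset date has arrived."""
    archived = []
    for agent in registry:
        if agent.status == "active" and agent.sunset <= today:
            # A real registry would stop the running agent at this point.
            agent.status = "archived"
            archived.append(agent.agent_id)
    return archived


# Invented entries for illustration.
registry = [
    ExperimentalAgent("summarizer-poc", sunset=date(2026, 1, 31)),
    ExperimentalAgent("router-poc", sunset=date(2026, 6, 30)),
]
print(sunset_sweep(registry, today=date(2026, 2, 1)))
print([(a.agent_id, a.status) for a in registry])
```

The sweep changes status instead of removing rows, so the historical record survives the agent.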

The discipline that scales

A single agent governed this way doesn't look like much. The same discipline applied to fifty agents is the difference between an enterprise AI program that an auditor can defend and a sprawl of unowned code that's one departure away from a customer-visible incident.

What I notice in the field: the companies that get this right aren't the ones with the most sophisticated agents. They're the ones who treated the third agent like the first — with a charter, an owner, a registry entry, and an offboarding plan that exists before anyone needs it. The discipline isn't glamorous. It's the thing that lets the glamorous work scale without collapsing under its own weight.

HR figured this out decades ago. Apply it to agents and the shadow workforce shrinks to a list you can actually manage.
