I keep seeing the same pattern.
A leader starts with one helpful AI agent doing a narrow task.
Then a few more appear.
Then suddenly there is a small digital workforce producing drafts, decisions, customer messages, refunds, tickets, summaries, follow-ups, and process changes.
That is when the real question arrives:
Who is actually responsible when an agent gets it wrong?
This post is about that moment.
Managing agentic systems looks a lot like managing people.
You set expectations.
You give boundaries.
You review work.
You coach.
You decide what happens when things go well and what happens when they do not.
At the beginning, you can keep a close eye on everything.
That works with one agent.
It even works with three.
But the moment you want agents across sales, finance, HR, service, and operations, you hit the same ceiling you always hit with humans.
Your attention does not scale.
Your focus does not scale.
So you need layers.
You need shepherds.
When I say agentic sheep, I mean AI agents that can:
- plan and carry out multi-step work on their own
- take actions in your systems, not just suggest them
- produce drafts, decisions, customer messages, and follow-ups at volume
This is powerful.
It is also operationally dangerous if nobody is watching the right things.
Not watching everything.
Watching the right things.
Most agent programmes fail in a very boring way.
Not because the model is bad.
Because leaders become the human router for every edge case.
They become the escalation layer for vague instructions.
They become the approval gate for every risky output.
The workflow problem turns into a leadership bandwidth problem.
Modern work is already full of interruptions.
If your operating model assumes leaders can simply “keep an eye on it”, you are already in trouble.
This is why scaling agentic systems is an operating model challenge first, not a tooling challenge.
Span of control breaks when one manager has too many direct reports to lead well.
Agents recreate the same dynamic, only faster.
Agents produce more work, more quickly, and often with more confidence than is warranted. That creates more output, more exceptions, and more review pressure.
Flattening without shepherding means leaders drown in escalations.
The answer is not more dashboards.
The answer is a structured oversight layer.
This is where shepherds come in.
A shepherd is a supervising layer that:
- reviews agent output against a defined standard
- absorbs routine escalations so leaders do not have to
- enforces guardrails and knows when to pull in a human
A shepherd can be:
- a senior human reviewer
- a supervising agent with explicit authority and limits
- a combination of both
This is not about replacing leaders.
It is about protecting them.
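To make the layer concrete, here is a minimal sketch of a shepherd as code. The function names, the triage policy, and the toy "risky over 100" rule are all assumptions for illustration, not a framework API.

```python
def shepherd(outputs, flag, approve, escalate_to_human):
    """
    Triage agent outputs. `flag` returns a list of reasons an output is
    unsafe; safe work flows through, flagged work goes to a human.
    All four names here are illustrative, not a real library's API.
    """
    for output in outputs:
        reasons = flag(output)
        if reasons:
            escalate_to_human(output, reasons)  # leaders see exceptions only
        else:
            approve(output)                     # the default path at scale

# Toy run: numbers stand in for outputs; anything over 100 is "risky".
approved, escalated = [], []
shepherd(
    outputs=[10, 500, 42],
    flag=lambda o: ["over limit"] if o > 100 else [],
    approve=approved.append,
    escalate_to_human=lambda o, r: escalated.append((o, r)),
)
assert approved == [10, 42] and escalated == [(500, ["over limit"])]
```

The point of the shape: the leader's attention is spent only on the escalated branch, never on the approved one.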
If everything needs attention, you have built nothing.
Decide where humans add disproportionate value:
- decisions that are hard to reverse, such as refunds or process changes
- sensitive customer and colleague conversations
- genuine exceptions that no playbook covers
Everything else should default to agent execution with shepherded review.
Most teams define success.
Very few define failure.
Agents are excellent at being confidently wrong.
So define what bad looks like. Write it down. Make it operational. Make it testable.
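To show what "operational and testable" can mean, here is a minimal sketch of "bad" written down as executable rules. The AgentOutput fields, the thresholds, and both rules are assumptions for the example, not a prescribed schema.

```python
from dataclasses import dataclass

@dataclass
class AgentOutput:
    """One unit of agent work. Field names are illustrative."""
    task_type: str          # e.g. "refund", "ticket_reply"
    amount: float           # monetary impact, 0.0 if none
    confidence: float       # agent's self-reported confidence, 0..1
    cites_source: bool      # did the agent ground its answer?

# "Bad" written down as rules a machine can check, not vibes.
# Each rule returns a reason string when the output must escalate.
BAD_RULES = [
    lambda o: "refund above limit" if o.task_type == "refund" and o.amount > 250 else None,
    lambda o: "high confidence, no source" if o.confidence > 0.9 and not o.cites_source else None,
]

def must_escalate(output: AgentOutput) -> list[str]:
    """Return every rule the output violates; an empty list means it may proceed."""
    return [reason for rule in BAD_RULES if (reason := rule(output))]

# Example: a confidently ungrounded refund gets flagged on both rules.
flagged = must_escalate(AgentOutput("refund", 400.0, 0.95, False))
assert flagged == ["refund above limit", "high confidence, no source"]
```

Once "bad" is a list of rules, it can be versioned, reviewed, and tested like any other asset.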
This line should be printed and pinned wherever your agent programme is being designed:
“The organizations making the most progress are treating AI agents as part of the workforce. They define roles, boundaries, escalation paths, and consequences. They invest as much in governance and monitoring as they do in model capability.”
Natarajan Elayappan
That is the point of shepherding.
Without governance you do not get scale.
You get chaos at speed.
RACI is not fashionable. It is effective.
If you do not define this for an agentic workflow, your organisation will invent it under pressure.
That is when mistakes happen.
Add the shepherd:
- the agent is Responsible for execution
- the shepherd is Responsible for first-line review and routine escalation
- a named leader remains Accountable for the outcome
- affected teams are Consulted and Informed through logs and escalation paths
If your shepherd cannot explain the RACI, it is not a shepherd.
It is a second agent guessing.
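Here is a hedged sketch of what "a shepherd that can explain the RACI" might look like as a record. The field names and the invoice example are illustrative, not a standard.

```python
from dataclasses import dataclass, field

@dataclass
class WorkflowRaci:
    """RACI for one agentic workflow. All names are illustrative."""
    workflow: str
    responsible: list[str]                       # who does the work (agents count)
    accountable: str                             # exactly one named human owner
    consulted: list[str] = field(default_factory=list)
    informed: list[str] = field(default_factory=list)

    def explain(self) -> str:
        """The test from this post: a shepherd must be able to answer this."""
        return (
            f"{self.workflow}: {', '.join(self.responsible)} execute; "
            f"{self.accountable} is accountable; "
            f"consulted: {', '.join(self.consulted) or 'nobody'}; "
            f"informed: {', '.join(self.informed) or 'nobody'}."
        )

invoices = WorkflowRaci(
    workflow="invoice handling",
    responsible=["invoice-agent", "shepherd (review + escalation)"],
    accountable="finance lead",
    consulted=["legal"],
    informed=["ops team"],
)
print(invoices.explain())
```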
This is where most teams struggle.
They tell an agent to “handle invoices” or “manage enquiries” and are surprised when results vary.
You need three definitions.
A task is ready when:
- the inputs exist and are trusted
- the scope and boundaries are explicit
- the escalation path has a name attached
A task is done when:
- the output passes the defined quality checks
- the work is logged and auditable
- anything ambiguous was escalated, not guessed
Good looks like:
- output a reviewer can verify without redoing the work
- consistent results across similar tasks
- escalations that arrive early, with context
This is not paperwork.
This is how scale becomes safe.
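A minimal sketch of the ready and done definitions as checks, assuming a simple task record; the fields are placeholders for whatever your workflow actually tracks.

```python
from dataclasses import dataclass

@dataclass
class Task:
    """Illustrative task record; your fields will differ."""
    inputs_present: bool
    scope_defined: bool
    escalation_path: str | None
    output: str | None = None
    checks_passed: bool = False
    logged: bool = False

def is_ready(t: Task) -> bool:
    # A task may only start when all of these are true.
    return t.inputs_present and t.scope_defined and t.escalation_path is not None

def is_done(t: Task) -> bool:
    # A task is complete only when all of these are true.
    return t.output is not None and t.checks_passed and t.logged

task = Task(inputs_present=True, scope_defined=True, escalation_path="finance lead")
assert is_ready(task)
assert not is_done(task)   # produced nothing yet, so it cannot be "done"
```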
Unchecked autonomy creates incidents.
Incidents create shutdowns.
Guardrails let autonomy exist without damage.
“The real breakthrough lies in finding a balance between harnessing this power and implementing robust safety measures and governance.”
Merve Ayyüce KIZRAK
Guardrails are not rules for the model.
They are rules for the organisation.
They define:
- what an agent may do without approval
- what must be reviewed before it ships
- what is forbidden outright, however confident the agent sounds
As Ian Walker puts it:
“Human-AI teaming transforms span of control from a fixed rule into a strategic variable.”
Only if you design for it:
- start agents with narrow scopes and widen them deliberately
- make escalation triggers explicit, not implied
- increase autonomy in stages, with review at each stage
Each of these is small, testable, and reversible.
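One way to see "small, testable, and reversible" is staged autonomy as plain configuration. The stage names, actions, and limits below are illustrative assumptions.

```python
# Staged autonomy expressed as configuration, not model prompts.
# Stage names, actions, and limits are illustrative assumptions.
AUTONOMY_STAGES = {
    "draft_only": {"may_execute": [], "needs_approval": ["send_reply", "issue_refund"]},
    "low_risk":   {"may_execute": ["send_reply"], "needs_approval": ["issue_refund"]},
    "trusted":    {"may_execute": ["send_reply", "issue_refund"], "needs_approval": []},
}

def allowed(stage: str, action: str) -> str:
    """Return 'execute', 'escalate', or 'forbidden' for an action at a stage."""
    rules = AUTONOMY_STAGES[stage]
    if action in rules["may_execute"]:
        return "execute"
    if action in rules["needs_approval"]:
        return "escalate"
    return "forbidden"   # anything not explicitly granted is out of scope

# Widening scope is one config change; rolling back is the reverse edit.
assert allowed("draft_only", "issue_refund") == "escalate"
assert allowed("low_risk", "delete_account") == "forbidden"
```

Because the stages live in config rather than in the model, the rollback path is an edit, not an incident.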
Good leadership is knowing what not to do.
When shepherding works, leaders stop being doers and become designers of decision-making.
They protect customers, colleagues, and the organisation.
That shows up as practical choices, grounded in basic decency:
- telling people when they are dealing with an agent
- giving colleagues a clear route to challenge an agent's decision
- refusing to let an agent quietly bury a mistake
These are not abstract values.
They are operating decisions.
Purpose
A lightweight control document for any agentic workflow.
Defines ownership, quality, escalation, and trust boundaries before scale.
Workflow Overview
Workflow Name:
Accountability (RACI)
Definition of Ready
A task may only start when all of the following are true.
Definition of Done
A task is complete only when all conditions below are met.
What “Good” Looks Like
What “Bad” Looks Like (Must Escalate)
Human-in-the-Loop Triggers
Logging & Audit Requirements
Rollback & Incident Plan
Approval
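For teams that want the control document to be checkable rather than decorative, here is a sketch of the same skeleton in machine-readable form. Every key mirrors a section above; every value is a placeholder, and the structure itself is an assumption, not a standard.

```python
# The control document as a machine-checkable skeleton.
# Keys mirror the sections above; every value here is a placeholder.
CONTROL_DOC = {
    "workflow_name": "",                      # fill in before any agent runs
    "raci": {"responsible": [], "accountable": "", "consulted": [], "informed": []},
    "definition_of_ready": [],                # conditions, all must hold to start
    "definition_of_done": [],                 # conditions, all must hold to finish
    "good_looks_like": [],
    "bad_must_escalate": [],
    "human_in_the_loop_triggers": [],
    "logging_and_audit": [],
    "rollback_and_incident_plan": "",
    "approval": {"approved_by": "", "date": ""},
}

def _empty(value) -> bool:
    """True when a value, or every value inside a nested dict, is blank."""
    if isinstance(value, dict):
        return all(_empty(v) for v in value.values())
    return not value

def scale_ready(doc: dict) -> list[str]:
    """List the sections still empty; an empty list means the doc is complete."""
    return [key for key, value in doc.items() if _empty(value)]

# A blank template fails loudly: nothing scales until every section is filled.
assert len(scale_ready(CONTROL_DOC)) == len(CONTROL_DOC)
```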
If you want to scale agents, do not start by asking how many tasks they can do.
Start by asking how many decisions you can safely supervise.
Then build shepherds so you do not have to supervise everything.
You will move faster, with fewer surprises, and with more trust from your teams.
Have fun with your shepherds of agentic sheep.
Span of Control: What's the Optimal Team Size for Managers?
https://www.gallup.com/workplace/700718/span-control-optimal-team-size-managers.aspx
Trust rating: high
Reason: Current leadership research on span of control and the risks of overloading managers, directly supporting the “focus and scaling” argument.
Date written: 2026-01-14
RACI Charts: The Ultimate Guide, with Examples [2025]
https://asana.com/resources/raci-chart
Trust rating: high
Reason: Clear, leadership-friendly explanation of RACI with practical examples, used to ground the accountability section.
Date written: 2025-12-03
Guardrails and Governance: A CIO's Blueprint for Responsible Generative and Agentic AI
https://www.cio.com/article/4094586/guardrails-and-governance-a-cios-blueprint-for-responsible-generative-and-agentic-ai.html
Trust rating: high
Reason: Enterprise-focused guidance on governance, auditability, and human-in-the-loop escalation, aligned to the shepherd model.
Date written: 2025-11-24
What is an LLM Evaluation Framework? Workflows and Tools.
https://www.evidentlyai.com/blog/llm-evaluation-framework
Trust rating: high
Reason: Practical guidance on evaluating language model outputs, supporting “definition of done” and repeatable quality checks.
Date written: 2025-08-22
Best User Attention Span Statistics 2025
https://www.amraandelma.com/user-attention-span-statistics/
Trust rating: medium
Reason: Helpful synthesis on attention and interruption pressures, used to support the point that leadership focus is finite.
Date written: 2025-07-22
LinkedIn post by Natarajan Elayappan
https://www.linkedin.com/posts/natdns_the-state-of-ai-in-2025-agents-innovation-activity-7413604726979760128-WVt2
Trust rating: high
Reason: Directly supports the workforce framing and the need for roles, boundaries, escalation paths, and consequences.
Date written: Unknown
LinkedIn post by Merve Ayyüce KIZRAK, Ph.D.
https://www.linkedin.com/posts/merve-ayyuce-kizrak_linkedinnewseurope-activity-7404803185971924992-rsg5
Trust rating: medium
Reason: Reinforces the leadership requirement to balance capability with safety and governance.
Date written: Unknown
LinkedIn post by Ian Walker
https://www.linkedin.com/posts/ian-walker-2a54a8_at-a-time-when-many-organisations-are-looking-activity-7370062616586625025-e2QA
Trust rating: medium
Reason: Validates the “span of control becomes a strategic variable” argument in human AI teaming contexts.
Date written: Unknown