Shepherds of Agentic Sheep
The Leadership Playbook for Scaling AI Without Losing Control
I keep seeing the same pattern.
A leader starts with one helpful AI agent doing a narrow task.
Then a few more appear.
Then suddenly there is a small digital workforce producing drafts, decisions, customer messages, refunds, tickets, summaries, follow-ups, and process changes.
That is when the real question arrives:
Who is actually responsible when an agent gets it wrong?
This post is about that moment.

The metaphor, and why it matters
Managing agentic systems looks a lot like managing people.
- You set expectations.
- You give boundaries.
- You review work.
- You coach.
You decide what happens when things go well and what happens when they do not.
At the beginning, you can keep a close eye on everything.
That works with one agent.
It even works with three.
But the moment you want agents across sales, finance, HR, service, and operations, you hit the same ceiling you always hit with humans.
Your attention does not scale.
Your focus does not scale.
So you need layers.
You need shepherds.
What “agentic sheep” means in plain English
When I say agentic sheep, I mean AI agents that can:
- Take a goal
- Plan steps
- Use tools such as email, CRM, ticketing, spreadsheets, or finance systems
- Execute actions
- Report what they did
This is powerful.
It is also operationally dangerous if nobody is watching the right things.
Not watching everything.
Watching the right things.
The leadership bottleneck is focus, not technology
Most agent programmes fail in a very boring way.
Not because the model is bad.
Because leaders become the human router for every edge case.
They become the escalation layer for vague instructions.
They become the approval gate for every risky output.
The workflow problem turns into a leadership bandwidth problem.
Modern work is already full of interruptions.
If your operating model assumes leaders can simply “keep an eye on it”, you are already in trouble.
This is why scaling agentic systems is an operating model challenge first, not a tooling challenge.
Span of control comes back, but louder
Span of control breaks when one manager has too many direct reports to lead well.
Agents recreate the same dynamic, only faster.
Agents produce more work, more quickly, and often with more confidence than is warranted. That creates more output, more exceptions, and more review pressure.
Flattening without shepherding means leaders drown in escalations.
The answer is not more dashboards.
The answer is a structured oversight layer.
This is where shepherds come in.
What a shepherd actually is
A shepherd is a supervising layer that:
- Reviews what a set of agents did
- Checks outputs against agreed quality and risk rules
- Escalates only what matters
- Feeds learning back so the system improves over time
A shepherd can be:
- A human with a structured review cadence
- An AI agent designed to supervise other agents
- A hybrid of the two, which is where most organisations will land
This is not about replacing leaders.
It is about protecting them.
Two decisions leaders must make up front
1. Where humans should spend attention
If everything needs attention, you have built nothing.
Decide where humans add disproportionate value:
- Risky decisions
- Ambiguous customer situations
- Money movement
- Legal commitments
- Identity and access changes
- Sensitive data use
- External communication under your brand
Everything else should default to agent execution with shepherded review.
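As a sketch of that default, here is what the routing rule might look like in code. The category names and the shape of an action are invented for illustration, not a real framework:

```python
# Minimal sketch: default to agent execution, reserve human attention
# for the high-impact categories above. Category names and the action
# shape are illustrative assumptions.

HUMAN_REVIEW_CATEGORIES = {
    "risky_decision",
    "ambiguous_customer_situation",
    "money_movement",
    "legal_commitment",
    "identity_or_access_change",
    "sensitive_data_use",
    "external_brand_communication",
}

def route(action: dict) -> str:
    """Return 'human' for attention-worthy actions, else 'shepherded_review'."""
    if action["category"] in HUMAN_REVIEW_CATEGORIES:
        return "human"
    return "shepherded_review"

# A routine status update runs; a refund waits for approval.
assert route({"category": "status_update"}) == "shepherded_review"
assert route({"category": "money_movement", "amount": 250.00}) == "human"
```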
2. What “bad” looks like
Most teams define success.
Very few define failure.
Agents are excellent at being confidently wrong.
So define what bad looks like. Write it down. Make it operational. Make it testable.
Governance is not optional
This line should be printed and pinned wherever your agent programme is being designed:
“The organizations making the most progress are treating AI agents as part of the workforce. They define roles, boundaries, escalation paths, and consequences. They invest as much in governance and monitoring as they do in model capability.”
Natarajan Elayappan
That is the point of shepherding.
Without governance you do not get scale.
You get chaos at speed.
Use RACI because it forces clarity
RACI is not fashionable. It is effective.
- Responsible: who does the work
- Accountable: who owns the outcome
- Consulted: who must be asked
- Informed: who must be told
If you do not define this for an agentic workflow, your organisation will invent it under pressure.
That is when mistakes happen.
Example: agentic refund workflow
- Responsible: Refund Agent
- Accountable: Head of Customer Operations
- Consulted: Finance Controller
- Informed: Support Team Lead
Add the shepherd:
- Responsible: Refund Agent (does)
- Responsible: Shepherd Agent (checks)
- Accountable: Head of Customer Operations
If your shepherd cannot explain the RACI, it is not a shepherd.
It is a second agent guessing.
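To make it explainable, the RACI can be written down as a machine-checkable record the shepherd reads. A minimal sketch, with invented keys and role names:

```python
# Illustrative RACI record for the refund workflow above. The keys and
# role names are assumptions for this sketch, not a standard schema.

REFUND_RACI = {
    "workflow": "customer_refunds",
    "responsible": ["refund_agent", "shepherd_agent"],  # does / checks
    "accountable": "head_of_customer_operations",       # exactly one owner
    "consulted": ["finance_controller"],
    "informed": ["support_team_lead"],
}

def validate_raci(raci: dict) -> None:
    """A shepherd should be able to answer: who owns this outcome?"""
    assert isinstance(raci["accountable"], str), "exactly one accountable owner"
    assert raci["responsible"], "someone must do the work"

validate_raci(REFUND_RACI)
```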
Definition of Ready, Done, and “Good Looks Like”
This is where most teams struggle.
They tell an agent to “handle invoices” or “manage enquiries” and are surprised when results vary.
You need three definitions.
Definition of Ready
A task is ready when:
- The goal is explicit
- Required data is present and permitted
- Tools are available
- Policy constraints are clear
- Escalation triggers are defined
Definition of Done
A task is done when:
- Outputs are complete
- Sources are recorded
- Actions are logged
- The right people are informed
- A rollback path exists
Good looks like
A good result is:
- Correct and policy-compliant
- Appropriate for the audience
- Limited in risk exposure
- A measurable improvement
This is not paperwork.
This is how scale becomes safe.
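Encoded as pre- and post-conditions, these definitions become enforceable rather than aspirational. A minimal sketch, assuming invented field names:

```python
# Minimal sketch: Definition of Ready as a pre-condition gate,
# Definition of Done as a post-condition gate. Field names are assumptions.

def is_ready(task: dict) -> bool:
    return all([
        bool(task.get("goal")),                       # the goal is explicit
        bool(task.get("data_present_and_permitted")), # data present and permitted
        bool(task.get("tools_available")),
        bool(task.get("policy_constraints_clear")),
        bool(task.get("escalation_triggers_defined")),
    ])

def is_done(result: dict) -> bool:
    return all([
        bool(result.get("outputs_complete")),
        bool(result.get("sources_recorded")),
        bool(result.get("actions_logged")),
        bool(result.get("stakeholders_informed")),
        bool(result.get("rollback_path_exists")),
    ])

# An agent should refuse work that is not ready, and a shepherd should
# refuse to close work that is not done.
```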
Guardrails let you move faster without hurting people
Unchecked autonomy creates incidents.
Incidents create shutdowns.
Guardrails let autonomy exist without damage.
“The real breakthrough lies in finding a balance between harnessing this power and implementing robust safety measures and governance.”
Merve Ayyüce KIZRAK
Guardrails are not rules for the model.
They are rules for the organisation.
They define:
- Tool access
- Data visibility
- Human approval thresholds
- Logging requirements
- Review expectations
- Stop conditions
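A guardrail set like this can live as a small, versioned policy file per workflow. An illustrative sketch; every key and threshold here is an assumption, not a product configuration:

```python
# Illustrative guardrail policy for one workflow.

REFUND_GUARDRAILS = {
    "tool_access": ["crm_read", "refund_api"],   # least privilege
    "data_visibility": ["order_history"],        # minimise by default
    "human_approval_over": 100.00,               # currency threshold
    "logging": ["decision", "tool_calls", "model_version"],
    "review": "shepherd_first_pass",
    "stop_conditions": [
        "refund_exceeds_original_payment",
        "customer_disputes_identity",
        "daily_spend_limit_reached",
    ],
}

# The agent loads this at start-up; the shepherd checks every action
# against it; any stop condition halts the workflow and escalates.
```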
The scaling pattern leaders can actually run
Stage 1: One agent, one human shepherd
- Heavy human review
- Fast learning of failure modes
- Logging and metrics established
Stage 2: Many agents, one human shepherd with a shepherd agent
- First-pass review by AI
- Human reviews only flagged cases
- Escalation paths formalised
Stage 3: Many agents, many shepherds, humans on exceptions only
- Shepherds supervise flocks
- Humans approve policy and handle high-impact cases
- Incidents are routine, not crises
As Ian Walker puts it:
“Human-AI teaming transforms span of control from a fixed rule into a strategic variable.”
Only if you design for it.
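Stages 2 and 3 share one mechanical loop: the shepherd reviews everything, the human sees only what is flagged. A minimal sketch of that first pass, with invented rule names and a toy example:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    check: Callable[[dict], bool]

def shepherd_review(outputs: list[dict], rules: list[Rule]):
    """First-pass review: escalate only outputs that violate a rule."""
    passed, flagged = [], []
    for out in outputs:
        violations = [r.name for r in rules if not r.check(out)]
        (flagged if violations else passed).append((out, violations))
    return passed, flagged

rules = [Rule("refund_within_limit", lambda o: o.get("amount", 0) <= 100)]
passed, flagged = shepherd_review([{"amount": 40}, {"amount": 250}], rules)
# passed  -> auto-release with logging
# flagged -> human review queue, with the violated rules attached
```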
Twelve practical shepherding moves this quarter
Each of these is small, testable, and reversible.
- Name an owner for every agentic workflow
- Write a one-page RACI for each workflow
- Define explicit stop conditions
- Set human-in-the-loop triggers for high-impact actions
- Apply least-privilege access
- Minimise data by default
- Build a simple evaluation harness (a sketch follows this list)
- Log decisions, tool calls, and versions
- Apply spend, rate, and blast-radius limits
- Create a short incident playbook
- Run blameless postmortems
- Do a dignity and fairness check on outputs
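The evaluation harness in that list does not need to be sophisticated to be useful. A minimal sketch, with invented cases and a toy agent standing in for the real one:

```python
# A deliberately simple evaluation harness: fixed cases, expected actions,
# run on every prompt or policy change. Cases and checks are illustrative.

EVAL_CASES = [
    {"input": "Refund request, order delivered, within policy", "expect": "refund"},
    {"input": "Refund request, suspected fraud", "expect": "escalate"},
]

def run_evals(agent, cases):
    failures = []
    for case in cases:
        action = agent(case["input"])  # the agent under test
        if action != case["expect"]:
            failures.append((case["input"], action))
    return failures

def toy_agent(text: str) -> str:
    return "escalate" if "fraud" in text else "refund"

# Gate deployment on zero failures; log every run with the agent version.
assert run_evals(toy_agent, EVAL_CASES) == []
```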
The quiet truth
Good leadership is knowing what not to do.
When shepherding works, leaders stop being doers and become designers of decision-making.
They protect customers, colleagues, and the organisation.
That shows up as practical choices, grounded in basic decency:
- Do no harm
- Protect privacy
- Be transparent
- Treat people with dignity
- Be accountable
- Avoid exploitation
- Act with restraint
These are not abstract values.
They are operating decisions.
A one-page shepherd contract
Purpose
A lightweight control document for any agentic workflow.
Defines ownership, quality, escalation, and trust boundaries before scale.
Sections
- Workflow Overview (workflow name)
- Accountability (RACI)
- Definition of Ready: a task may only start when all of the following are true
- Definition of Done: a task is complete only when all conditions below are met
- What “Good” Looks Like
- What “Bad” Looks Like (Must Escalate)
- Human-in-the-Loop Triggers
- Logging & Audit Requirements
- Rollback & Incident Plan
- Approval
Closing thought
If you want to scale agents, do not start by asking how many tasks they can do.
Start by asking how many decisions you can safely supervise.
Then build shepherds so you do not have to supervise everything.
You will move faster, with fewer surprises, and with more trust from your teams.
Have fun with your shepherds of agentic sheep.
Links
Span of Control: What's the Optimal Team Size for Managers?
https://www.gallup.com/workplace/700718/span-control-optimal-team-size-managers.aspx
Trust rating: high
Reason: Current leadership research on span of control and the risks of overloading managers, directly supporting the “focus and scaling” argument.
Date written: 2026-01-14

RACI Charts: The Ultimate Guide, with Examples [2025]
https://asana.com/resources/raci-chart
Trust rating: high
Reason: Clear, leadership-friendly explanation of RACI with practical examples, used to ground the accountability section.
Date written: 2025-12-03

Guardrails and Governance: A CIO's Blueprint for Responsible Generative and Agentic AI
https://www.cio.com/article/4094586/guardrails-and-governance-a-cios-blueprint-for-responsible-generative-and-agentic-ai.html
Trust rating: high
Reason: Enterprise-focused guidance on governance, auditability, and human-in-the-loop escalation, aligned to the shepherd model.
Date written: 2025-11-24

What is an LLM Evaluation Framework? Workflows and Tools
https://www.evidentlyai.com/blog/llm-evaluation-framework
Trust rating: high
Reason: Practical guidance on evaluating language model outputs, supporting “definition of done” and repeatable quality checks.
Date written: 2025-08-22

Best User Attention Span Statistics 2025
https://www.amraandelma.com/user-attention-span-statistics/
Trust rating: medium
Reason: Helpful synthesis on attention and interruption pressures, used to support the point that leadership focus is finite.
Date written: 2025-07-22
Quotes
LinkedIn post by Natarajan Elayappan
https://www.linkedin.com/posts/natdns_the-state-of-ai-in-2025-agents-innovation-activity-7413604726979760128-WVt2
Trust rating: high
Reason: Directly supports the workforce framing and the need for roles, boundaries, escalation paths, and consequences.
Date written: Unknown

LinkedIn post by Merve Ayyüce KIZRAK, Ph.D.
https://www.linkedin.com/posts/merve-ayyuce-kizrak_linkedinnewseurope-activity-7404803185971924992-rsg5
Trust rating: medium
Reason: Reinforces the leadership requirement to balance capability with safety and governance.
Date written: Unknown

LinkedIn post by Ian Walker
https://www.linkedin.com/posts/ian-walker-2a54a8_at-a-time-when-many-organisations-are-looking-activity-7370062616586625025-e2QA
Trust rating: medium
Reason: Validates the “span of control becomes a strategic variable” argument in human-AI teaming contexts.
Date written: Unknown