I keep seeing the same pattern.
A leader starts with one helpful AI agent doing a narrow task.
Then a few more appear.
Then suddenly there is a small digital workforce producing drafts, decisions, customer messages, refunds, tickets, summaries, follow-ups, and process changes.
That is when the real question arrives:
Who is actually responsible when an agent gets it wrong?
This post is about that moment.
Managing agentic systems looks a lot like managing people.
You set expectations.
You give boundaries.
You review work.
You coach.
You decide what happens when things go well and what happens when they do not.
At the beginning, you can keep a close eye on everything.
That works with one agent.
It even works with three.
But the moment you want agents across sales, finance, HR, service, and operations, you hit the same ceiling you always hit with humans.
Your attention does not scale.
Your focus does not scale.
So you need layers.
You need shepherds.
When I say agentic sheep, I mean AI agents that can:
- plan and carry out multi-step work on their own
- take actions in your systems, not just suggest them
- produce drafts, decisions, customer messages, and follow-ups at volume
This is powerful.
It is also operationally dangerous if nobody is watching the right things.
Not watching everything.
Watching the right things.
Most agent programmes fail in a very boring way.
Not because the model is bad.
Because leaders become the human router for every edge case.
They become the escalation layer for vague instructions.
They become the approval gate for every risky output.
The workflow problem turns into a leadership bandwidth problem.
Modern work is already full of interruptions.
If your operating model assumes leaders can simply “keep an eye on it”, you are already in trouble.
This is why scaling agentic systems is an operating model challenge first, not a tooling challenge.
Span of control breaks when one manager has too many direct reports to lead well.
Agents recreate the same dynamic, only faster.
Agents produce more work, more quickly, and often with more confidence than is warranted. That creates more output, more exceptions, and more review pressure.
Flattening without shepherding means leaders drown in escalations.
The answer is not more dashboards.
The answer is a structured oversight layer.
This is where shepherds come in.
A shepherd is a supervising layer that:
- reviews agent output against a defined standard
- absorbs routine escalations so leaders do not have to
- enforces guardrails and knows when to pull in a human
A shepherd can be:
- a senior human reviewer
- a supervising agent with explicit authority and limits
- a combination of both
This is not about replacing leaders.
It is about protecting them.
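To make the layer concrete, here is a minimal sketch of a shepherd as code. The function names, the triage policy, and the toy "risky over 100" rule are all assumptions for illustration, not a framework API.

```python
def shepherd(outputs, flag, approve, escalate_to_human):
    """
    Triage agent outputs. `flag` returns a list of reasons an output is
    unsafe; safe work flows through, flagged work goes to a human.
    All four names here are illustrative, not a real library's API.
    """
    for output in outputs:
        reasons = flag(output)
        if reasons:
            escalate_to_human(output, reasons)  # leaders see exceptions only
        else:
            approve(output)                     # the default path at scale

# Toy run: numbers stand in for outputs; anything over 100 is "risky".
approved, escalated = [], []
shepherd(
    outputs=[10, 500, 42],
    flag=lambda o: ["over limit"] if o > 100 else [],
    approve=approved.append,
    escalate_to_human=lambda o, r: escalated.append((o, r)),
)
assert approved == [10, 42] and escalated == [(500, ["over limit"])]
```

The point of the shape: the leader's attention is spent only on the escalated branch, never on the approved one.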
If everything needs attention, you have built nothing.
Decide where humans add disproportionate value:
- decisions that are hard to reverse, such as refunds or process changes
- sensitive customer and colleague conversations
- genuine exceptions that no playbook covers
Everything else should default to agent execution with shepherded review.
Most teams define success.
Very few define failure.
Agents are excellent at being confidently wrong.
So define what bad looks like. Write it down. Make it operational. Make it testable.
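To show what "operational and testable" can mean, here is a minimal sketch of "bad" written down as executable rules. The AgentOutput fields, the thresholds, and both rules are assumptions for the example, not a prescribed schema.

```python
from dataclasses import dataclass

@dataclass
class AgentOutput:
    """One unit of agent work. Field names are illustrative."""
    task_type: str          # e.g. "refund", "ticket_reply"
    amount: float           # monetary impact, 0.0 if none
    confidence: float       # agent's self-reported confidence, 0..1
    cites_source: bool      # did the agent ground its answer?

# "Bad" written down as rules a machine can check, not vibes.
# Each rule returns a reason string when the output must escalate.
BAD_RULES = [
    lambda o: "refund above limit" if o.task_type == "refund" and o.amount > 250 else None,
    lambda o: "high confidence, no source" if o.confidence > 0.9 and not o.cites_source else None,
]

def must_escalate(output: AgentOutput) -> list[str]:
    """Return every rule the output violates; an empty list means it may proceed."""
    return [reason for rule in BAD_RULES if (reason := rule(output))]

# Example: a confidently ungrounded refund gets flagged on both rules.
flagged = must_escalate(AgentOutput("refund", 400.0, 0.95, False))
assert flagged == ["refund above limit", "high confidence, no source"]
```

Once "bad" is a list of rules, it can be versioned, reviewed, and tested like any other asset.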
This line should be printed and pinned wherever your agent programme is being designed:
“The organizations making the most progress are treating AI agents as part of the workforce. They define roles, boundaries, escalation paths, and consequences. They invest as much in governance and monitoring as they do in model capability.”
Natarajan Elayappan
That is the point of shepherding.
Without governance you do not get scale.
You get chaos at speed.
RACI is not fashionable. It is effective.
If you do not define this for an agentic workflow, your organisation will invent it under pressure.
That is when mistakes happen.
Add the shepherd:
- the agent is Responsible for execution
- the shepherd is Responsible for first-line review and routine escalation
- a named leader remains Accountable for the outcome
- affected teams are Consulted and Informed through logs and escalation paths
If your shepherd cannot explain the RACI, it is not a shepherd.
It is a second agent guessing.
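Here is a hedged sketch of what "a shepherd that can explain the RACI" might look like as a record. The field names and the invoice example are illustrative, not a standard.

```python
from dataclasses import dataclass, field

@dataclass
class WorkflowRaci:
    """RACI for one agentic workflow. All names are illustrative."""
    workflow: str
    responsible: list[str]                       # who does the work (agents count)
    accountable: str                             # exactly one named human owner
    consulted: list[str] = field(default_factory=list)
    informed: list[str] = field(default_factory=list)

    def explain(self) -> str:
        """The test from this post: a shepherd must be able to answer this."""
        return (
            f"{self.workflow}: {', '.join(self.responsible)} execute; "
            f"{self.accountable} is accountable; "
            f"consulted: {', '.join(self.consulted) or 'nobody'}; "
            f"informed: {', '.join(self.informed) or 'nobody'}."
        )

invoices = WorkflowRaci(
    workflow="invoice handling",
    responsible=["invoice-agent", "shepherd (review + escalation)"],
    accountable="finance lead",
    consulted=["legal"],
    informed=["ops team"],
)
print(invoices.explain())
```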
This is where most teams struggle.
They tell an agent to “handle invoices” or “manage enquiries” and are surprised when results vary.
You need three definitions.
A task is ready when:
- the inputs exist and are trusted
- the scope and boundaries are explicit
- the escalation path has a name attached
A task is done when:
- the output passes the defined quality checks
- the work is logged and auditable
- anything ambiguous was escalated, not guessed
Good looks like:
- output a reviewer can verify without redoing the work
- consistent results across similar tasks
- escalations that arrive early, with context
This is not paperwork.
This is how scale becomes safe.
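A minimal sketch of the ready and done definitions as checks, assuming a simple task record; the fields are placeholders for whatever your workflow actually tracks.

```python
from dataclasses import dataclass

@dataclass
class Task:
    """Illustrative task record; your fields will differ."""
    inputs_present: bool
    scope_defined: bool
    escalation_path: str | None
    output: str | None = None
    checks_passed: bool = False
    logged: bool = False

def is_ready(t: Task) -> bool:
    # A task may only start when all of these are true.
    return t.inputs_present and t.scope_defined and t.escalation_path is not None

def is_done(t: Task) -> bool:
    # A task is complete only when all of these are true.
    return t.output is not None and t.checks_passed and t.logged

task = Task(inputs_present=True, scope_defined=True, escalation_path="finance lead")
assert is_ready(task)
assert not is_done(task)   # produced nothing yet, so it cannot be "done"
```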
Unchecked autonomy creates incidents.
Incidents create shutdowns.
Guardrails let autonomy exist without damage.
“The real breakthrough lies in finding a balance between harnessing this power and implementing robust safety measures and governance.”
Merve Ayyüce KIZRAK
Guardrails are not rules for the model.
They are rules for the organisation.
They define:
- what an agent may do without approval
- what must be reviewed before it ships
- what is forbidden outright, however confident the agent sounds
As Ian Walker puts it:
“Human-AI teaming transforms span of control from a fixed rule into a strategic variable.”
Only if you design for it:
- start agents with narrow scopes and widen them deliberately
- make escalation triggers explicit, not implied
- increase autonomy in stages, with review at each stage
Each of these is small, testable, and reversible.
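One way to see "small, testable, and reversible" is staged autonomy as plain configuration. The stage names, actions, and limits below are illustrative assumptions.

```python
# Staged autonomy expressed as configuration, not model prompts.
# Stage names, actions, and limits are illustrative assumptions.
AUTONOMY_STAGES = {
    "draft_only": {"may_execute": [], "needs_approval": ["send_reply", "issue_refund"]},
    "low_risk":   {"may_execute": ["send_reply"], "needs_approval": ["issue_refund"]},
    "trusted":    {"may_execute": ["send_reply", "issue_refund"], "needs_approval": []},
}

def allowed(stage: str, action: str) -> str:
    """Return 'execute', 'escalate', or 'forbidden' for an action at a stage."""
    rules = AUTONOMY_STAGES[stage]
    if action in rules["may_execute"]:
        return "execute"
    if action in rules["needs_approval"]:
        return "escalate"
    return "forbidden"   # anything not explicitly granted is out of scope

# Widening scope is one config change; rolling back is the reverse edit.
assert allowed("draft_only", "issue_refund") == "escalate"
assert allowed("low_risk", "delete_account") == "forbidden"
```

Because the stages live in config rather than in the model, the rollback path is an edit, not an incident.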
Good leadership is knowing what not to do.
When shepherding works, leaders stop being doers and become designers of decision-making.
They protect customers, colleagues, and the organisation.
That shows up as practical choices, grounded in basic decency:
- telling people when they are dealing with an agent
- giving colleagues a clear route to challenge an agent's decision
- refusing to let an agent quietly bury a mistake
These are not abstract values.
They are operating decisions.
Purpose
A lightweight control document for any agentic workflow.
Defines ownership, quality, escalation, and trust boundaries before scale.
Workflow Overview
Workflow Name:
Accountability (RACI)
Definition of Ready
A task may only start when all of the following are true.
Definition of Done
A task is complete only when all conditions below are met.
What “Good” Looks Like
What “Bad” Looks Like (Must Escalate)
Human-in-the-Loop Triggers
Logging & Audit Requirements
Rollback & Incident Plan
Approval
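For teams that want the control document to be checkable rather than decorative, here is a sketch of the same skeleton in machine-readable form. Every key mirrors a section above; every value is a placeholder, and the structure itself is an assumption, not a standard.

```python
# The control document as a machine-checkable skeleton.
# Keys mirror the sections above; every value here is a placeholder.
CONTROL_DOC = {
    "workflow_name": "",                      # fill in before any agent runs
    "raci": {"responsible": [], "accountable": "", "consulted": [], "informed": []},
    "definition_of_ready": [],                # conditions, all must hold to start
    "definition_of_done": [],                 # conditions, all must hold to finish
    "good_looks_like": [],
    "bad_must_escalate": [],
    "human_in_the_loop_triggers": [],
    "logging_and_audit": [],
    "rollback_and_incident_plan": "",
    "approval": {"approved_by": "", "date": ""},
}

def _empty(value) -> bool:
    """True when a value, or every value inside a nested dict, is blank."""
    if isinstance(value, dict):
        return all(_empty(v) for v in value.values())
    return not value

def scale_ready(doc: dict) -> list[str]:
    """List the sections still empty; an empty list means the doc is complete."""
    return [key for key, value in doc.items() if _empty(value)]

# A blank template fails loudly: nothing scales until every section is filled.
assert len(scale_ready(CONTROL_DOC)) == len(CONTROL_DOC)
```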
If you want to scale agents, do not start by asking how many tasks they can do.
Start by asking how many decisions you can safely supervise.
Then build shepherds so you do not have to supervise everything.
You will move faster, with fewer surprises, and with more trust from your teams.
Have fun with your shepherds of agentic sheep.
Span of Control: What's the Optimal Team Size for Managers?
https://www.gallup.com/workplace/700718/span-control-optimal-team-size-managers.aspx
Trust rating: high
Reason: Current leadership research on span of control and the risks of overloading managers, directly supporting the “focus and scaling” argument.
Date written: 2026-01-14
RACI Charts: The Ultimate Guide, with Examples [2025]
https://asana.com/resources/raci-chart
Trust rating: high
Reason: Clear, leadership-friendly explanation of RACI with practical examples, used to ground the accountability section.
Date written: 2025-12-03
Guardrails and Governance: A CIO's Blueprint for Responsible Generative and Agentic AI
https://www.cio.com/article/4094586/guardrails-and-governance-a-cios-blueprint-for-responsible-generative-and-agentic-ai.html
Trust rating: high
Reason: Enterprise-focused guidance on governance, auditability, and human-in-the-loop escalation, aligned to the shepherd model.
Date written: 2025-11-24
What is an LLM Evaluation Framework? Workflows and Tools.
https://www.evidentlyai.com/blog/llm-evaluation-framework
Trust rating: high
Reason: Practical guidance on evaluating language model outputs, supporting “definition of done” and repeatable quality checks.
Date written: 2025-08-22
Best User Attention Span Statistics 2025
https://www.amraandelma.com/user-attention-span-statistics/
Trust rating: medium
Reason: Helpful synthesis on attention and interruption pressures, used to support the point that leadership focus is finite.
Date written: 2025-07-22
LinkedIn post by Natarajan Elayappan
https://www.linkedin.com/posts/natdns_the-state-of-ai-in-2025-agents-innovation-activity-7413604726979760128-WVt2
Trust rating: high
Reason: Directly supports the workforce framing and the need for roles, boundaries, escalation paths, and consequences.
Date written: Unknown
LinkedIn post by Merve Ayyüce KIZRAK, Ph.D.
https://www.linkedin.com/posts/merve-ayyuce-kizrak_linkedinnewseurope-activity-7404803185971924992-rsg5
Trust rating: medium
Reason: Reinforces the leadership requirement to balance capability with safety and governance.
Date written: Unknown
LinkedIn post by Ian Walker
https://www.linkedin.com/posts/ian-walker-2a54a8_at-a-time-when-many-organisations-are-looking-activity-7370062616586625025-e2QA
Trust rating: medium
Reason: Validates the “span of control becomes a strategic variable” argument in human AI teaming contexts.
Date written: Unknown