AI & Agents 7 May 2026 · 6 min

The Agent Harness Is Now Architecture

By wGrow Project Team · 7 May 2026

Anthropic’s Triad and the Death of the Omnipotent Prompt

Figure: Agent Triad. Session (state and memory); Harness (orchestration and decision); Sandbox (tool boundary).

We spent two years treating the model as the entire application. That was the wrong level of abstraction, and it cost us in production.

Anthropic’s managed agents documentation formalises what delivery teams learned the hard way: agents are not monolithic. The spec carves the agent surface into three explicit layers. The session holds state and memory across a conversation or task run. The harness is the orchestration loop — it decides what the agent does next, when it stops, and which tools it can call. The sandbox is the execution boundary where tool calls actually run, isolated from the reasoning layer above it.
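One way to make the three layers concrete is to give each its own type with a narrow interface. The sketch below is our own illustration, not Anthropic's API: the class names, the `max_steps` cap, and the `plan` list (a stand-in for model-proposed tool calls) are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Session:
    """State and memory: holds the run history, nothing else."""
    turns: List[str] = field(default_factory=list)

    def record(self, text: str) -> None:
        self.turns.append(text)


class Sandbox:
    """Tool boundary: runs only tools registered here, looked up by name."""

    def __init__(self, tools: Dict[str, Callable[[str], str]]) -> None:
        self._tools = dict(tools)

    def run(self, tool: str, arg: str) -> str:
        if tool not in self._tools:
            raise PermissionError(f"tool not allowed: {tool}")
        return self._tools[tool](arg)


class Harness:
    """Orchestration: decides what runs next and when the run stops."""

    def __init__(self, session: Session, sandbox: Sandbox, max_steps: int = 3) -> None:
        self.session = session
        self.sandbox = sandbox
        self.max_steps = max_steps

    def run(self, plan: List[Tuple[str, str]]) -> List[str]:
        # `plan` stands in for (tool, arg) pairs proposed by the model.
        results = []
        for tool, arg in plan[: self.max_steps]:  # hard stop condition
            out = self.sandbox.run(tool, arg)
            self.session.record(f"{tool}({arg}) -> {out}")
            results.append(out)
        return results
```

Each class answers one ownership question: who holds state, who decides, and who executes. The model itself appears nowhere in the type signatures, which is the point.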

These are not implementation details. They are architectural responsibilities. If you cannot point to who owns each layer in your current deployment, you have a design gap — not a deployment.

The LLM is a text engine. The harness decides what work actually happens.

That sounds obvious now. It wasn’t obvious to us in 2023, or to most of the enterprise teams I reviewed last year. The model dominated the conversation. Prompt engineering was the discipline. Orchestration was usually a thin wrapper — LangChain, AutoGen, something home-grown and under-specified. The framework handled the loop; nobody was accountable for it.

If you’re building on a black-box orchestration framework today and can’t describe the harness loop without opening the library source, you don’t own your system architecture.

Why DAGs Beat ReAct Loops for Compliance Drafting

Figure: Drafting DAG. Step 01: Extract Schema; Step 02: Validate Rubric; Step 03: Draft Section.

We built an automated drafting crew for enterprise compliance reports. The first iteration used ReAct loops: reason, act, observe, repeat. It failed in a specific, repeatable way.

When the agent encountered missing regulatory data — an approval reference not yet filed, a committee sign-off still in transit — it hallucinated completion. Structurally valid compliance text, citing approvals that did not exist. The loop had no mechanism to distinguish “I found the approval” from “I cannot find the approval but the draft requires one.” Those are very different states. The agent treated them as equivalent.
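The missing distinction can be made explicit in the type the lookup returns. This is a minimal sketch under our own assumptions: `LookupStatus`, `ApprovalLookup`, and the `filed` dictionary are hypothetical names, not the production code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class LookupStatus(Enum):
    FOUND = "found"
    MISSING = "missing"


@dataclass(frozen=True)
class ApprovalLookup:
    status: LookupStatus
    reference: Optional[str] = None


def lookup_approval(filed: Dict[str, str], approval_id: str) -> ApprovalLookup:
    """Return an explicit MISSING result instead of a guessed reference."""
    ref = filed.get(approval_id)
    if ref is None:
        return ApprovalLookup(LookupStatus.MISSING)
    return ApprovalLookup(LookupStatus.FOUND, ref)
```

Once "missing" is a value the caller must handle, a draft step has no way to treat it as "found" without someone writing that branch on purpose.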

The fix was to discard ReAct entirely and replace it with a hardcoded Directed Acyclic Graph.

Node A extracts raw data from the document corpus — nothing else. Node B validates extracted fields against a hardcoded rubric: required approvals, mandatory references, field completeness. Validation fails? The DAG halts and surfaces the gap. It does not proceed. Node C drafts only on validated input.
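A minimal sketch of that three-node graph, assuming a rubric of two required fields. The field names, `ValidationHalt`, and the drafted sentence are illustrative inventions; the real nodes call a model and a document store.

```python
from typing import Dict, List

# Hypothetical rubric: the article does not list the real required fields.
REQUIRED_FIELDS = ("approval_ref", "committee_signoff")


class ValidationHalt(Exception):
    """Raised when the rubric fails; carries the gaps to surface."""

    def __init__(self, gaps: List[str]) -> None:
        super().__init__(f"missing required fields: {gaps}")
        self.gaps = gaps


def extract(corpus: Dict[str, str]) -> Dict[str, str]:
    # Node A: pull raw, non-empty fields only. No inference, no drafting.
    return {k: v.strip() for k, v in corpus.items() if v and v.strip()}


def validate(fields: Dict[str, str]) -> Dict[str, str]:
    # Node B: check against the hardcoded rubric and halt on any gap.
    gaps = [name for name in REQUIRED_FIELDS if name not in fields]
    if gaps:
        raise ValidationHalt(gaps)
    return {name: fields[name] for name in REQUIRED_FIELDS}


def draft(validated: Dict[str, str]) -> str:
    # Node C: drafts only from validated input.
    return (
        f"Approval {validated['approval_ref']} is on file; "
        f"committee sign-off {validated['committee_signoff']} is recorded."
    )


def run_dag(corpus: Dict[str, str]) -> str:
    # Fixed edge order A -> B -> C; an exception in B stops the run.
    return draft(validate(extract(corpus)))
```

A blank sign-off field raises `ValidationHalt` with `gaps == ["committee_signoff"]`, so the gap reaches a human instead of the draft.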

Session state passes between nodes as a strict JSON schema. No ambient memory. Node B cannot be influenced by what Node A inferred — it sees only the structured payload. Node C cannot access the extraction layer at all.
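"Strict" here means the receiving node rejects anything outside the agreed shape, including extra keys that could smuggle in upstream inference. The sketch below uses only the standard library; the field names are hypothetical, and a real deployment might use a schema validator instead.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ExtractionPayload:
    # Hypothetical field names; the article does not publish the real schema.
    document_id: str
    approval_ref: str
    committee_signoff: str


def parse_payload(raw: str) -> ExtractionPayload:
    """Parse JSON into the payload; reject missing, extra, or non-string fields."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("payload must be a JSON object")
    expected = {f.name for f in fields(ExtractionPayload)}
    if set(data) != expected:
        missing = sorted(expected - set(data))
        extra = sorted(set(data) - expected)
        raise ValueError(f"schema mismatch: missing={missing} extra={extra}")
    for name, value in data.items():
        if not isinstance(value, str):
            raise ValueError(f"field {name!r} must be a string")
    return ExtractionPayload(**data)
```

Rejecting extra keys is the part that enforces "no ambient memory": a free-text `notes` field from the extraction step never reaches validation.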

In the production logs we reviewed post-deployment, hallucinated approvals stopped appearing. We are not publishing a reduction count until the full log audit is complete. The DAG didn’t make the agent smarter. It made the agent’s failure modes explicit and catchable.

That’s the core lesson. Open-ended autonomy pushes failure into the output, where it’s expensive to detect. Deterministic structure pushes failure into the process, where you catch it cheaply. The trade-off is real — DAGs are rigid. When compliance schemas change, the graph changes with them. That maintenance cost is lower than producing undetected fabrications in regulated documents, but it is not zero.

Sandbox Isolation and the Customer Service Routing Bot

A second project illustrates the sandbox layer. An SME client needed a customer service routing bot: classify inbound queries, route to the right tier, escalate where appropriate.

End-users found the failure mode before we did: prompt injection. A crafted message could rewrite the routing logic embedded in the system instructions. The agent would reclassify Tier 2 issues as Tier 1, or skip escalation paths entirely. The model did exactly what the injected text asked, and the harness had no boundary to stop it.

We applied the triad explicitly. Session state — the user dialogue — is handled by the session layer. It accumulates turn history, nothing else. The harness evaluates session state, extracts intent, and translates it into a constrained JSON routing payload. That payload goes to the sandbox. The sandbox has zero read-access to the original system instructions. It executes the routing call based only on the typed payload it receives.

A prompt injection attack can pollute the session. It cannot reach the sandbox unless the harness passes it through — and the harness passes only typed, schema-validated intent.
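A sketch of that boundary, with heavy assumptions: `classify` is a keyword stand-in for the model call, the intent labels and the `ROUTES` table are invented for illustration, and failing closed to escalation on an unknown label is our choice here, not a detail taken from the deployed bot.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Tier(Enum):
    TIER_1 = 1
    TIER_2 = 2


@dataclass(frozen=True)
class RoutingPayload:
    tier: Tier
    escalate: bool


def classify(turns: List[str]) -> str:
    # Stand-in for the model call; returns an untrusted label string.
    text = " ".join(turns).lower()
    return "billing_dispute" if "refund" in text else "general_question"


# Routing rules live in harness code, not in prompt text a user can reach.
ROUTES = {
    "billing_dispute": RoutingPayload(Tier.TIER_2, escalate=True),
    "general_question": RoutingPayload(Tier.TIER_1, escalate=False),
}


def harness_route(turns: List[str]) -> RoutingPayload:
    label = classify(turns)
    if label not in ROUTES:
        # Unknown intent: fail closed to the escalation path (assumption).
        return RoutingPayload(Tier.TIER_2, escalate=True)
    return ROUTES[label]


def sandbox_execute(payload: RoutingPayload) -> str:
    # The sandbox sees only the typed payload: no dialogue, no instructions.
    if not isinstance(payload, RoutingPayload):
        raise TypeError("sandbox accepts RoutingPayload only")
    queue = f"tier-{payload.tier.value}"
    return f"{queue}:escalated" if payload.escalate else queue
```

An injected message can still nudge which label the classifier picks, but it can only select among routes the harness already defines. It cannot add a route, remove escalation from one, or hand free text to the sandbox.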

We have not seen a successful prompt-injection reroute since deploying the sandbox boundary. That observation stands, but we are not treating it as a published metric until the production volume is audited. The model didn’t change. The sandbox boundary did.

Both projects trace back to the same underlying architecture. The model is interchangeable. The harness and the sandbox boundary are where reliability lives — and, when poorly specified, where brittleness lives too.

Implicit Assumptions and the Stanford HELM Warning

Here’s the underappreciated risk in a harness-dependent architecture: model upgrades are not drop-in replacements.

A prompt that returns clean JSON extraction on one model version may return different field ordering or edge-case handling on the next. The harness that assumes field X is present may silently misroute on that field’s absence. The sandbox call fails — or worse, executes on a degraded payload and nobody notices until a compliance audit.
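The silent misroute usually comes from a default value papering over a missing field. A minimal sketch of the alternative, assuming a hypothetical `priority` field and queue names of our own invention:

```python
from typing import Any, Dict


class ContractError(Exception):
    """The model output broke an assumption the harness depends on."""


def require_field(payload: Dict[str, Any], name: str, expected_type: type) -> Any:
    # Fail loudly on absence or wrong type instead of substituting a default.
    if name not in payload:
        raise ContractError(f"required field missing: {name}")
    value = payload[name]
    if not isinstance(value, expected_type):
        raise ContractError(f"field {name} has type {type(value).__name__}")
    return value


def choose_queue(payload: Dict[str, Any]) -> str:
    # payload.get("priority", "low") would turn a missing field into a
    # silent misroute; require_field turns it into a visible failure.
    priority = require_field(payload, "priority", str)
    return "urgent" if priority == "high" else "standard"
```

The harness then fails at the point where the assumption breaks, before anything reaches the sandbox.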

HELM (Holistic Evaluation of Language Models) is useful here as a reminder: aggregate benchmark scores mask task-level variance. Structured-output behaviour must be regression-tested directly rather than inferred from headline numbers, because an upgrade that scores higher overall can still regress on the exact extraction pattern your harness depends on. That risk is real enough that I’d treat every model upgrade as a harness risk event, not a transparent infrastructure swap.

If your sandbox is wired directly to a third-party orchestration framework that abstracts the model call, your next model swap may break production in ways you cannot easily trace. The framework hid the assumption. The model changed. The assumption is now wrong.

The mitigation isn’t to freeze model versions — that’s security debt. It’s to keep orchestration explicit and testable. Write the harness loop yourself where stakes justify it. Test it against model output directly. Treat every model upgrade as a harness regression test.
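In practice, "test it against model output directly" can be a contract check run over a fixed set of prompts for every candidate model. The sketch below is illustrative only: `model_v1` and `model_v2` are canned stand-ins for real model calls, and the required keys are hypothetical.

```python
import json
from typing import Callable, Dict, List

REQUIRED_KEYS = {"approval_ref", "committee_signoff"}  # hypothetical contract


def check_contract(raw_output: str) -> List[str]:
    """Return the contract violations found in one model output."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(data, dict):
        return ["top level is not an object"]
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - set(data))]
    problems += [
        f"non-string value: {k}"
        for k in sorted(REQUIRED_KEYS & set(data))
        if not isinstance(data[k], str)
    ]
    return problems


def regression_report(
    model: Callable[[str], str], prompts: List[str]
) -> Dict[str, List[str]]:
    """Run each fixture prompt through `model`; collect violations per prompt."""
    report = {}
    for prompt in prompts:
        problems = check_contract(model(prompt))
        if problems:
            report[prompt] = problems
    return report


def model_v1(prompt: str) -> str:
    # Canned stand-in for the current model.
    return '{"approval_ref": "AP-1", "committee_signoff": "CS-9"}'


def model_v2(prompt: str) -> str:
    # Canned stand-in for an upgrade that reorders and renames a field.
    return '{"committee_signoff": "CS-9", "approval": "AP-1"}'
```

Because the check parses into a dictionary, reordered fields pass and a renamed field fails, which is the distinction the harness actually cares about. An empty report for the new model is the gate for the swap.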

If your harness is coupled to a specific model’s output quirks, the model has become load-bearing. That is exactly what the triad is designed to prevent.

Who Owns the Harness

Standard architecture reviews ask: which model are you using?

Wrong question.

The right question is: who owns the harness?

Who wrote the orchestration loop? Who can modify it without touching the model layer? Who is accountable when a session state assumption proves wrong? Who defined the sandbox boundary?

If the answer is “the framework,” or “we use LangChain,” or “the managed agent handles that” — you’ve described who built the plumbing. You haven’t identified who is accountable for the architecture. For low-stakes internal tooling, delegating that accountability to a well-maintained framework may be a reasonable call. For compliance drafting, customer routing, or any regulated workflow, it is not.

Enterprise teams running consequential workloads should build custom orchestration loops, enforce rigid sandbox boundaries with typed payloads and zero ambient instruction bleed, and treat models as interchangeable compute. Then, when a better model ships, the harness can absorb the swap after a regression pass rather than a rewrite.

Anthropic’s triad is useful precisely because it forces the question. Session, harness, sandbox: three separate accountability boundaries. If you can’t name the owner of each in a five-minute architecture review, a model upgrade or a creative end-user will find the gap for you.

Stop asking teams which model they’re using. Ask who owns the harness. The answer tells you whether they’ve built a system — or rented one.
