
# 5. Agent Orchestration

#### 5.1 Playbook model: declarative goals, not free-form agent loops

A playbook is a first-party, versioned artifact with five parts: **objective** (a verifiable predicate over tenant state), **constraints**, **tool allow-list**, **risk policy**, and **required scopes** (§4.5). Abridged example:

```yaml
playbook: invoice_followup
version: 3.2.0
objective: >
  Every open invoice aged >14 days has an active follow-up cadence
  (last outbound touch <= cadence interval) or an owner-acknowledged hold.
constraints:
  - max_touches_per_invoice_per_week: 1
  - no_outbound_before_first_reminder_day: 14
  - tone_profile: tenant.communication_profile
  - never_contact: tenant.suppression_list
tools:
  - quickbooks.invoice.list        # read
  - gmail.thread.search            # read
  - zoho.contact.get               # read
  - gmail.draft.create             # record_mutation (internal)
  - gmail.message.send             # external_communication
risk_policy:
  gmail.message.send: suspend_unless_below_threshold
scopes:
  google: [gmail.readonly, gmail.send]
```

Why declarative-with-closed-tools rather than an open agent loop:

1. **Verifiability.** The objective is a predicate, so "did the playbook work" is computable per tenant per day — the basis of the eval harness (§11.1) and of the correctness claims regulators expect substantiated (§12.2). A free-form loop's success criterion is whatever the model decided it was.
2. **Bounded failure surface.** Anthropic's own agent-design guidance distinguishes *workflows* (LLM-directed steps through predefined paths) from *agents* (open-ended loops), recommending the simplest sufficient pattern \[1]. Back-office tasks are closed-world: the tool set needed to chase an invoice is knowable in advance. Accepting an open loop buys generality Fridays doesn't need at the price of unboundable behavior it can't ship.
3. **Injection containment.** The allow-list is enforced at the gateway (§3.3), but it originates here: an injected email cannot enlarge a playbook's capability because capability is a property of the artifact, not the conversation \[6].

Tenants never author playbooks (§1.3). Per-tenant variation enters through typed parameters (cadence intervals, tone profile, suppression lists) validated against the spec's schema — configuration, not code.
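As a rough illustration of "configuration, not code", the sketch below validates tenant-supplied parameters against a typed schema and rejects anything the schema does not name. The schema contents (`cadence_interval_days`, the allowed tone values, the 1 to 90 day range) and the `validate_params` helper are assumptions made for the example, not the Fridays spec.

```python
# Hypothetical parameter schema for invoice_followup: name -> (type, value check).
# Field names and ranges are illustrative assumptions.
PARAM_SCHEMA = {
    "cadence_interval_days": (int, lambda v: 1 <= v <= 90),
    "tone_profile": (str, lambda v: v in {"formal", "friendly", "neutral"}),
    "suppression_list": (list, lambda v: all(isinstance(x, str) for x in v)),
}


def validate_params(params: dict) -> list[str]:
    """Return validation errors; an empty list means the parameters are accepted."""
    errors = []
    # Tenants can only set declared parameters; anything else is refused.
    for name in params:
        if name not in PARAM_SCHEMA:
            errors.append(f"unknown parameter: {name}")
    for name, (expected_type, check) in PARAM_SCHEMA.items():
        if name not in params:
            errors.append(f"missing parameter: {name}")
            continue
        value = params[name]
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if not isinstance(value, expected_type) or isinstance(value, bool):
            errors.append(f"{name}: expected {expected_type.__name__}")
        elif not check(value):
            errors.append(f"{name}: value out of range")
    return errors
```

Because the validator only accepts declared names and types, a tenant cannot use configuration to add a tool or a step: the capability set stays fixed by the playbook version.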

#### 5.2 Planner/executor separation; plan caching

The **planner** (LLM-backed) compiles a playbook instance against a tenant snapshot into a **plan**: a DAG of typed tool calls with data-flow edges. The **executor** (no LLM) walks the DAG, handling retries, idempotency, and suspension. The split is the cost and determinism boundary:

* **Inference happens at plan time, not per step.** A ReAct-style loop \[2] re-invokes the model after every tool result — for a 9-step invoice sweep that's 9 inference calls with growing context. Fridays' planner emits the DAG in one to two calls; the executor's step transitions are code. Model calls scale with *plan generation*, not *plan length*.
* **Plan caching.** Recurrent instances of the same (playbook version, tenant-stack shape) compile to structurally identical DAGs; only slot values differ (invoice IDs, amounts, recipients). The plan template is cached keyed on (playbook version, connector set, parameter schema hash); cache hits skip inference entirely. This is the mechanism behind the §13.1 claim that inference cost per action is bounded — the same logic as LLM-cascade cost work \[3], applied at plan granularity. Cache invalidation is exact: playbook version bump, connector schema change (§3.3 version pinning), or tenant parameter change.
* **Replanning is an explicit transition.** If a step returns a result outside the plan's declared expectations (contact not found; invoice paid mid-flight), the executor does not improvise: it suspends the branch and re-enters the planner with the failure context. Improvisation in the executor would reintroduce the unbounded behavior that §5.1 excluded.
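The plan-caching bullet above can be sketched as a template cache keyed on (playbook version, connector set, parameter schema hash). This is a minimal sketch under assumptions: `PlanCache`, `fill_slots`, and the template shape (`tool` plus `arg_slots`) are hypothetical names, and `compile_fn` stands in for the LLM-backed planner.

```python
import hashlib
import json


def schema_hash(param_schema: dict) -> str:
    """Hash a parameter schema via canonical JSON so key order does not matter."""
    canonical = json.dumps(param_schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class PlanCache:
    """Caches plan templates; a hit skips the planner (and so inference) entirely."""

    def __init__(self, compile_fn):
        self._compile = compile_fn  # stand-in for the LLM-backed planner
        self._templates = {}
        self.planner_calls = 0

    def get_template(self, playbook_version, connectors, param_schema):
        # Any change to version, connector set, or schema yields a different key,
        # which is how invalidation stays exact in this sketch.
        key = (playbook_version, frozenset(connectors), schema_hash(param_schema))
        if key not in self._templates:
            self.planner_calls += 1
            self._templates[key] = self._compile(playbook_version, connectors)
        return self._templates[key]


def fill_slots(template, slots):
    """Instantiate a cached template with per-run slot values (IDs, recipients)."""
    return [
        {
            "tool": step["tool"],
            "args": {arg: slots[slot] for arg, slot in step["arg_slots"].items()},
        }
        for step in template
    ]
```

Only `get_template` can reach the planner; `fill_slots` is plain code, which matches the claim that model calls scale with plan generation rather than plan length.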

Plans are content-addressed: the plan hash is recorded in the action log (§2.3 step 5), so any executed action is traceable to the exact compiled artifact that requested it.
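One way to content-address a plan is to hash its canonical serialization. The sketch below assumes plans are JSON-serializable dicts and uses SHA-256 over canonical JSON; the `plan_hash` and `log_entry` helpers and the log record shape are illustrative assumptions, not the as-built action log format.

```python
import hashlib
import json


def plan_hash(plan: dict) -> str:
    """Derive a content address: identical plans hash identically regardless of key order."""
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def log_entry(tool: str, plan: dict) -> dict:
    """Build a (hypothetical) action-log record that points back at the compiled plan."""
    return {"tool": tool, "plan_ref": plan_hash(plan)}
```

Step order inside a list still affects the hash, which is intended here: two plans that run the same steps in a different order are different artifacts.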

#### 5.3 Tool-call mediation

Restating §2/§3 from the orchestration side, because the contract is the interface the planner is trained and evaluated against: the executor emits **actions** — `(tool, args, idempotency_key, plan_ref, tenant)` — to the Action Gateway. The gateway validates args against the tool's JSON Schema (malformed model output is rejected before any vendor traffic; schema-constrained decoding at the planner reduces but does not replace this check \[4]), verifies the tool against the playbook allow-list, classifies risk, and either dispatches or suspends (§6). No component below the gateway accepts free text as an instruction. The practical consequence: hallucination degrades to *rejected action* (logged, retried via replanning), never to *unintended vendor mutation*.
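The mediation sequence described above (validate arguments, check the allow-list, classify risk, then dispatch or suspend) might look roughly like this. It is a sketch under assumptions: `TOOL_SPECS` is a hypothetical registry, the type checks are a stdlib stand-in for real JSON Schema validation, and `below_threshold` is a placeholder for the threshold engine in §6.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    tool: str
    args: dict
    idempotency_key: str
    plan_ref: str
    tenant: str


# Hypothetical tool registry: expected argument types and risk class per tool.
TOOL_SPECS = {
    "gmail.draft.create": {"args": {"to": str, "body": str}, "risk": "record_mutation"},
    "gmail.message.send": {"args": {"draft_id": str}, "risk": "external_communication"},
}


def mediate(action: Action, allow_list: set, below_threshold: bool = False) -> str:
    """Return the gateway decision for one action; nothing here accepts free text."""
    spec = TOOL_SPECS.get(action.tool)
    if spec is None:
        return "rejected:unknown_tool"
    # Argument names must match exactly, and every value must have the declared type.
    if set(action.args) != set(spec["args"]):
        return "rejected:bad_args"
    for name, expected in spec["args"].items():
        if not isinstance(action.args[name], expected):
            return "rejected:bad_args"
    # Capability comes from the playbook artifact, not from the conversation.
    if action.tool not in allow_list:
        return "rejected:not_allowed"
    if spec["risk"] == "external_communication" and not below_threshold:
        return "suspended"
    return "dispatched"
```

Every failure path returns a rejection before any vendor call would be made, which is the sense in which a hallucinated action degrades to a rejected action.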

#### 5.4 State management: idempotency, retries, partial failure

Vendor delivery semantics are at-least-once on both planes (webhooks §3.4; API calls under retry). Exactly-once *processing* is achieved the standard way — idempotent effects, not exactly-once transport \[5]:

* **Idempotency keys.** Every mutating action carries a key derived from `(tenant, playbook_instance, step_id, semantic_target)`, e.g., `(t_481, invoice_followup#2026-07-02, send_reminder, invoice:QB-10442, touch:2)`, where the semantic target is the invoice plus its touch number. Retries of the same logical step reuse the key. Wrappers map it to vendor mechanisms (native idempotency headers where offered, etag/If-Match compare-and-set, or query-before-write where the vendor offers nothing; §3.2.2), so the executor sees one contract.
* **Retry policy.** Transient classes (429, 5xx, network) retry with exponential backoff plus full jitter \[7], budgeted per action and metered through the Rate-Limit Governor (§3.5). Permanent classes (4xx validation, revoked auth) do not retry: validation failures route to replanning (§5.2); auth failures route to the re-auth path (§4.4).
* **Partial failure.** A DAG with 40 invoice branches where 3 fail completes the 37, records the 3 with cause, and surfaces one digest entry — not 3 alarming pushes. Branch independence is declared in the plan; the executor never lets one branch's failure poison siblings' state.
* **Crash recovery.** Executor state (per-step status, keys, pending suspensions) is persisted per transition; a resumed executor replays from the last recorded step, and idempotency keys make replay of an already-dispatched step a vendor-side no-op.
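The key derivation and retry rules above can be sketched together. The hashing scheme, the separator, the delay constants (`base`, `cap`), and the `dispatch_with_retry` helper are assumptions for illustration; `send` stands in for a connector call that returns an HTTP status code. The delay formula follows the full-jitter policy in \[7].

```python
import hashlib
import random
import time


def idempotency_key(tenant, playbook_instance, step_id, semantic_target):
    """Derive a stable key; retries of the same logical step produce the same value."""
    raw = "\x1f".join([tenant, playbook_instance, step_id, semantic_target])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def full_jitter_delay(attempt, base=0.5, cap=30.0):
    """Full jitter: sleep a uniform random time in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def is_transient(status):
    """429 and 5xx are retried; other 4xx responses are treated as permanent."""
    return status == 429 or 500 <= status < 600


def dispatch_with_retry(send, key, max_attempts=4, sleep=time.sleep):
    """Call send(key) until success, a permanent failure, or the retry budget runs out."""
    for attempt in range(max_attempts):
        status = send(key)  # the same key is reused on every attempt
        if status < 400:
            return "ok"
        if not is_transient(status):
            return "permanent_failure"  # routed to replanning or re-auth, not retried
        if attempt < max_attempts - 1:
            sleep(full_jitter_delay(attempt))
    return "retry_budget_exhausted"
```

Because the key does not change between attempts, a retry after a timeout whose first request actually succeeded is absorbed by the vendor-side idempotency mechanism rather than producing a second mutation.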

#### 5.5 Cross-vendor consistency: sagas, not distributed transactions

No two-phase commit exists across QuickBooks and Gmail; vendors expose no transactional API surface. Multi-app playbooks therefore run as **sagas** \[8]: an ordered sequence of local actions, each with a declared **compensation**, ordered so that the *hardest-to-compensate action goes last*.

Worked example — `record_payment_and_thank`: (1) match bank deposit to invoice, (2) record payment in QuickBooks, (3) send thank-you/receipt email. Ordering rationale: step 3 (external communication) is uncompensatable — an email cannot be unsent — so it executes only after step 2 confirms. If step 2 fails after step 1's match was cached, compensation is trivial (discard the match). The general rules:

* **Compensation is declared per tool in the connector**, not invented by the planner: `quickbooks.payment.create` ⇄ `quickbooks.payment.delete`; `zoho.deal.update` ⇄ restore-from-prior-snapshot; `gmail.message.send` ⇄ ∅ (uncompensatable, therefore always ordered last and approval-gated; the risk policy and the saga ordering reinforce each other).
* **Semantic locks, not vendor locks.** The saga holds an internal lease on the semantic target (invoice QB-10442) so two concurrent playbook instances cannot interleave on the same object; vendors are never asked to hold locks they don't offer.
* **Compensation runs through the gateway** like any action — audited, risk-classed (a compensating `payment.delete` is itself record mutation), and visible in the same log chain as the forward path (§7.2).
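The saga rules above reduce to a small control loop: run steps in order, and on failure run the compensations of completed steps in reverse. The sketch below is a toy in-process version; `run_saga`, `check_ordering`, and the step tuple shape are assumptions, and it omits the gateway routing, leases, and persistence a real saga engine would need.

```python
def check_ordering(steps):
    """Reject plans where an uncompensatable step (compensation None) is not last."""
    for name, _action, compensate in steps[:-1]:
        if compensate is None:
            raise ValueError(f"uncompensatable step {name!r} must be ordered last")


def run_saga(steps):
    """Run (name, action, compensation) steps; on failure, compensate in reverse order.

    Returns (succeeded, log), where log is an ordered list of event strings.
    """
    check_ordering(steps)
    completed = []
    log = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            # Undo already-completed steps, most recent first.
            for done_name, undo in reversed(completed):
                if undo is not None:
                    undo()
                    log.append(f"compensated:{done_name}")
            return False, log
        log.append(f"done:{name}")
        completed.append((name, compensate))
    return True, log
```

Applied to `record_payment_and_thank`, a failure while recording the payment discards the cached match and never reaches the email step, so the uncompensatable action is never executed on a failed saga.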

#### 5.6 Model selection and routing

Playbook classes are routed to model tiers on a cost/latency/capability matrix, in the spirit of LLM-cascade results showing large cost reductions at matched quality when easy calls are served by smaller models \[3]:

* **Tier A (frontier):** initial plan compilation for complex playbooks; drafting external communications (tone-sensitive, tenant-profile-conditioned); replanning after anomalous failures.
* **Tier B (mid):** cached-plan slot filling, classification (email triage intent labels), extraction (invoice references from thread text).
* **Tier C (small/local-capable):** formatting, canonicalization, deterministic-adjacent transforms where a regex won't survive input variety.

Routing is declared per playbook step class, not decided at runtime by the model itself; escalation (B→A) triggers on schema-validation failure or low-confidence classification. Every call is metered per action (§7.4), so tier assignments are audited against measured quality from the eval harness (§11.1) — a tier downgrade ships only when the playbook's golden-set metrics hold. Provider redundancy: at least two providers per tier, hot-swappable, since §10.1's degraded-mode analysis treats LLM providers as vendors like any other.
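Declared routing with B→A escalation can be sketched as a static table plus one retry at the higher tier. The step-class names, the `confidence_floor` value, and the result shape (`schema_valid`, `confidence`) are assumptions for the example; `call_model` stands in for a provider call at a given tier.

```python
# Hypothetical static routing table: step class -> model tier (declared, not model-chosen).
ROUTING = {
    "plan_compilation": "A",
    "external_draft": "A",
    "slot_filling": "B",
    "classification": "B",
    "extraction": "B",
    "formatting": "C",
}

# Only the B -> A escalation described in the text is modeled.
ESCALATION = {"B": "A"}


def run_step(step_class, call_model, confidence_floor=0.8):
    """Run a step at its declared tier, escalating once on invalid or low-confidence output."""
    tier = ROUTING[step_class]
    result = call_model(tier)
    needs_escalation = (
        not result["schema_valid"] or result.get("confidence", 1.0) < confidence_floor
    )
    if needs_escalation and tier in ESCALATION:
        tier = ESCALATION[tier]
        result = call_model(tier)
    return tier, result
```

The tier decision is made by table lookup and two explicit triggers, so the model never chooses its own tier.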

#### 5.7 Context construction

Planner context is assembled, never accumulated: each plan compilation starts from a fresh, minimal context, so no long-lived conversational state can drift or leak across instances:

* **Tenant snapshot.** The Operational Cache (§2.2) serves a typed business-state projection scoped to the playbook's declared reads: open invoices with aging buckets, recent thread metadata, contact records. Projection, not dump — the planner receives the 12 fields the playbook consumes, not the QuickBooks object graph.
* **PII minimization.** Fields irrelevant to the decision are pseudonymized before entering model context (bank account fragments, SSNs from payroll reads, personal addresses) using deterministic tokenization; the gateway re-hydrates real values at dispatch time from the cache, outside model visibility. Detection/redaction uses a recognizer pipeline of the Presidio class \[9]; §9.2 carries the policy per data class. Consequence: the LLM provider (a sub-processor, §9.3) processes materially less PII than the vendor APIs do, which narrows DPA scope and the §12 disclosure surface.
* **Instruction/content separation.** Untrusted vendor text (email bodies, invoice memos) enters context only inside typed, delimited data fields, never concatenated into the instruction channel; the planner's system instructions are static per playbook version \[6]. This is a mitigation, not a guarantee — §8.2 states plainly that separation reduces injection success rates but the enforcement backstop remains the gateway allow-list plus approval gating (TB2, §2.4).
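The projection and deterministic-tokenization steps above can be sketched with an HMAC-based tokenizer. This is an illustrative stand-in, not the Presidio pipeline cited in \[9]: the `Pseudonymizer` class, the token format, and the in-memory vault are assumptions, and recognizer-based detection is replaced by an explicit list of sensitive fields.

```python
import hashlib
import hmac


class Pseudonymizer:
    """Deterministic tokenization: the same (field, value) always maps to the same token."""

    def __init__(self, secret: bytes):
        self._secret = secret
        self._vault = {}  # token -> real value; kept outside model visibility

    def tokenize(self, field: str, value: str) -> str:
        digest = hmac.new(
            self._secret, f"{field}:{value}".encode("utf-8"), hashlib.sha256
        ).hexdigest()[:12]
        token = f"<{field}:{digest}>"
        self._vault[token] = value
        return token

    def rehydrate(self, token: str) -> str:
        """Restore the real value at dispatch time (gateway side in the text above)."""
        return self._vault[token]


def project(record: dict, consumed_fields, sensitive_fields, pseudo: Pseudonymizer) -> dict:
    """Build planner context: only consumed fields, with sensitive ones tokenized."""
    view = {}
    for name in consumed_fields:
        value = record[name]
        view[name] = pseudo.tokenize(name, str(value)) if name in sensitive_fields else value
    return view
```

Determinism matters here because the planner may need to refer to the same entity in several steps; a stable token preserves that identity without exposing the underlying value.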

***

#### References (Section 5)

\[1] Anthropic, *Building Effective Agents* (Dec 2024) — workflows vs. agents; "find the simplest solution possible." <https://www.anthropic.com/research/building-effective-agents>

\[2] S. Yao et al., *ReAct: Synergizing Reasoning and Acting in Language Models* (ICLR 2023) — the per-step reason/act loop §5.2 deliberately avoids for cost/determinism reasons. <https://arxiv.org/abs/2210.03629>

\[3] L. Chen, M. Zaharia, J. Zou, *FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance* (2023) — LLM cascades; basis for tiered routing and plan-cache economics. <https://arxiv.org/abs/2305.05176>

\[4] OpenAI/Anthropic structured-output and tool-use documentation; JSON Schema (draft 2020-12) as the tool-argument contract. Schema-constrained decoding complements, never replaces, gateway-side validation.

\[5] M. Kleppmann, *Designing Data-Intensive Applications* (O'Reilly, 2017), ch. 8, 11 — exactly-once processing via idempotence over at-least-once delivery; also IETF draft-ietf-httpapi-idempotency-key-header.

\[6] S. Willison, prompt-injection series (2022–2025) — instruction/content separation limits and the case for capability-side enforcement. <https://simonwillison.net/series/prompt-injection/>

\[7] AWS Architecture Blog, *Exponential Backoff and Jitter* (2015) — full-jitter retry policy. <https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/>

\[8] H. Garcia-Molina, K. Salem, *Sagas* (SIGMOD 1987) — compensation-based long-lived transactions; applied here across vendor APIs without distributed locks.

\[9] Microsoft Presidio — PII recognizer/anonymizer pipeline pattern for pre-inference redaction. <https://microsoft.github.io/presidio/>

***

*Next: Section 6 (The Approval Model) — risk classification, threshold engine, approval feed mechanics, GDPR Art. 22 mapping, failure modes.*

***

### As-Built Reconciliation — V1

*Legend/sources as in §1's addendum.*

| §                                                                               | Status                     | As-built (source)                                                                                                                                                                                                                                                                                                                                                            |
| ------------------------------------------------------------------------------- | -------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| 5.1 Playbook model (declarative goal + closed tools)                            | **PLANNED**                | Realized on backend as agent config + routines + issues; no first-party playbook catalog/YAML spec yet. The **heartbeat model** is the as-built expression of "not a free-form loop" — bounded, budget-checked, resumable — and **EXISTS**. AO §2.                                                                                                                           |
| 5.2 Planner/executor split + plan caching                                       | **PARTIAL / PLANNED**      | Backend has **exactly-once plan decomposition** (fingerprinted on `(sourceIssueId, acceptedPlanRevisionId)`) and pre-invocation budget bounding (EXISTS, AO §3, §6). The specific **planner-emits-DAG / executor-walks-DAG** split with cached plan templates is a Fridays design (PLANNED); backend agents run via **adapters** (CLI agents), not a Fridays planner/executor. |
| 5.3 Tool-call mediation (typed, policy-checked, no free text below the gateway) | **PARTIAL / ASPIRATIONAL** | Plugin tools are capability-gated + schema-validated + audited (EXISTS); runtime MCP tool calls are **not** (the hole). "No component below the gateway accepts free text" is not yet true by construction. IS §4.1.                                                                                                                                                         |
| 5.4 State mgmt (idempotency, retries, partial failure, crash recovery)          | **EXISTS (substrate)**     | Atomic checkout (409-never-retried), single-assignee, **one** auto-retry then explicit recovery, silent-run watchdog, reconciliation. AO §3. The Fridays **idempotency-key contract** (`(tenant, instance, step, semantic_target)`) is PLANNED; backend uses checkout/execution locks.                                                                                       |
| 5.5 Cross-vendor sagas + compensations                                          | **ASPIRATIONAL**           | No saga engine in backend; first-class **blocker/dependency** edges are the nearest primitive. AO §3. Compensation-per-tool + semantic leases — PLANNED.                                                                                                                                                                                                                     |
| 5.6 Model selection / routing (tiers, 2 providers/tier)                         | **EXISTS (substrate)**     | Adapter model, runtime-swappable, model-vendor-independent (10 adapters + `process`/`http`). AO §1, TS §4.2. The **tier-routing policy** + hot failover are PLANNED (adapters make them possible).                                                                                                                                                                           |
| 5.7 Context construction + PII minimization (Presidio pseudonymization)         | **PLANNED**                | Log-side redaction EXISTS (SC §7); **prompt-side** pseudonymization does not. → `data-handling-and-privacy-v1.md` §4.                                                                                                                                                                                                                                                        |

**Flag:** planner/executor, the saga engine, and the idempotency-key contract are Fridays designs not present on backend; tool-mediation completeness depends on closing the runtime-MCP hole (§3.3/§5.3). The reliability primitives §5.4 relies on (checkout, retry, watchdog, reconciliation) genuinely **exist** — this subsection is the most as-built-aligned in the section.

