# 9. Data Handling and Privacy

Roles first, because every obligation below hangs on them: for the business data flowing through connectors, the **tenant is the controller** and Fridays is a **processor** under GDPR Art. 28 \[1] (service provider under CCPA/CPRA \[6]). The data subjects who matter most (the tenant's customers, whose emails and invoices Fridays reads) have no contract with Fridays at all; their rights run against the tenant, and Fridays' job is to make the tenant's compliance mechanically possible (§9.4). The architecture already did most of the work: this section largely names the properties that §1.3, §4.3, §5.7, and §7.1 bought.

#### 9.1 Data classification

| Class                           | Contents                                               | Handling anchor                                                                                       |
| ------------------------------- | ------------------------------------------------------ | ----------------------------------------------------------------------------------------------------- |
| **C1 — Credentials/secrets**    | OAuth tokens, verifier tokens, signing keys            | Vault only; never in logs, context, or telemetry (§4.3, §8.4)                                         |
| **C2 — Financial records**      | Invoices, payments, balances, bank-account fragments   | Pseudonymized pre-inference (§5.7); 7-y audit-chain retention for money chains (§7.1)                 |
| **C3 — Communications content** | Email bodies/threads, drafts                           | Highest injection exposure (§8.2) *and* Google Limited-Use-restricted where Gmail-sourced (§9.3)      |
| **C4 — Identity/contact PII**   | Names, emails, phones, addresses of tenant's customers | Deterministic tokenization in model context (§5.7)                                                    |
| **C5 — Employment/payroll**     | Compensation, SSN-class identifiers, benefits          | Highest-liability class; deliberately routed through one aggregator, read-heavy only (§3.2.3)         |
| **C6 — Derived data**           | Embeddings, plans, prompts-as-seen, instance scores    | Inherits source class — embeddings at source-text sensitivity per inversion results (§8.3 \[4 in §8]) |
| **C7 — Telemetry**              | Traces, metrics                                        | Payload-free by construction (§7.5)                                                                   |

Class assignment is per field at the wrapper/connector layer, not per record — an invoice object contains C2 amounts, C4 contact fields, and C3 memo text, each with different context-visibility rules (§5.7).
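A minimal sketch of per-field classification, assuming a connector ships a static field-to-class map. The field names, the `INVOICE_FIELD_CLASSES` map, and the fail-closed behavior on unmapped fields are illustrative assumptions, not the connector layer's actual schema.

```python
from enum import Enum


class DataClass(Enum):
    C1 = "credentials"
    C2 = "financial"
    C3 = "communications"
    C4 = "identity_pii"
    C5 = "employment"
    C6 = "derived"
    C7 = "telemetry"


# Hypothetical field map for one connector's invoice object.
INVOICE_FIELD_CLASSES = {
    "amount_due": DataClass.C2,
    "currency": DataClass.C2,
    "customer_name": DataClass.C4,
    "customer_email": DataClass.C4,
    "memo": DataClass.C3,
}


def classify_record(record: dict, field_classes: dict) -> dict:
    """Group a record's field names by data class.

    Unmapped fields raise (an assumption of this sketch), so a connector
    cannot silently emit an unclassified field.
    """
    grouped: dict = {}
    for name in record:
        if name not in field_classes:
            raise KeyError(f"unclassified field: {name}")
        grouped.setdefault(field_classes[name], []).append(name)
    return grouped
```

One invoice record then yields several classes at once, which is the point of classifying fields rather than records.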

#### 9.2 Minimization and retention

Minimization is enforced at three chokepoints already specified, restated as policy:

1. **Projection at context assembly** (§5.7): the planner receives the playbook's declared read-set — fields, not object graphs. Art. 5(1)(c) "adequate, relevant, limited to what is necessary" \[1] implemented as a schema, which makes it testable in CI rather than aspirational.
2. **Pseudonymization pre-inference** (§5.7): C4/C5 identifiers and C2 account fragments never reach the LLM sub-processor in the clear; re-hydration happens gateway-side at dispatch.
3. **Re-fetch over retain.** §1.3's not-a-system-of-record posture has a retention corollary: anything reconstructible on demand from the vendor API defaults to short-TTL cache, because the vendor *is* the system of record and an archival copy at Fridays is pure breach surface with no product value. The Operational Cache is a working set with TTLs measured in days, not a data lake.
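The first two chokepoints can be sketched together, assuming a playbook declares its read-set as a set of field names and that tokens are derived with a per-tenant HMAC key. The `Pseudonymizer` class, the `tok_` prefix, and the in-memory reverse map are hypothetical; the real token format and mapping store are specified in §5.7, not here.

```python
import hashlib
import hmac


def project(record: dict, read_set: set) -> dict:
    """Keep only the fields a playbook declared; nothing else enters context."""
    return {k: v for k, v in record.items() if k in read_set}


class Pseudonymizer:
    """Deterministic tokenization: the same value maps to the same token per tenant."""

    def __init__(self, tenant_key: bytes):
        self._key = tenant_key
        self._reverse: dict = {}  # token -> original value, held gateway-side only

    def tokenize(self, value: str) -> str:
        digest = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        token = f"tok_{digest[:16]}"  # hypothetical token format
        self._reverse[token] = value
        return token

    def rehydrate(self, token: str) -> str:
        """Gateway-side lookup at dispatch; the model never calls this."""
        return self._reverse[token]


def build_context(record: dict, read_set: set, sensitive_fields: set,
                  pseudonymizer: Pseudonymizer) -> dict:
    """Project first, then tokenize whatever sensitive fields survived projection."""
    projected = project(record, read_set)
    return {
        k: pseudonymizer.tokenize(str(v)) if k in sensitive_fields else v
        for k, v in projected.items()
    }
```

Because the read-set is plain data, a CI test can assert that a playbook's context never contains a field outside it.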

Retention classes (full table in Appendix C):

| Store                   | Class                           | Default retention                             | Basis                                              |
| ----------------------- | ------------------------------- | --------------------------------------------- | -------------------------------------------------- |
| Operational Cache       | C2–C4 working set               | Days–weeks (TTL per entity type)              | Re-fetchable; Art. 5(1)(e) storage limitation \[1] |
| Payload store           | Action request/response bodies  | Per risk class; money-movement-linked longest | Dispute resolution, §7.2                           |
| Audit chain (structure) | Hashes, classes, decisions      | 7 years for `money_movement`-touching chains  | Tax/records horizons, §7.1                         |
| Prompt archive          | Context-as-seen (pseudonymized) | Tied to plan retention                        | Provenance, §7.2; already minimized by (2)         |
| Telemetry               | C7                              | Weeks                                         | Ops only, §7.5                                     |

The GDPR-vs-immutability resolution carries the whole table: payloads are crypto-shreddable per tenant (DEK destruction, §4.3/§7.1) while chain structure persists under Art. 17(3)(b)'s legal-obligation carve-out for the records-retention classes \[1].
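A toy illustration of the shred-payloads, keep-structure split, assuming one DEK per tenant. The `TenantPayloadStore` class is hypothetical, and the SHA-256 counter-mode keystream is a stand-in chosen only to keep the sketch dependency-free; a real store would use an authenticated cipher with keys held in the vault.

```python
import hashlib
import secrets


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode). Illustration only, not for production."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class TenantPayloadStore:
    def __init__(self):
        self._deks: dict = {}   # tenant_id -> DEK (would live in the vault)
        self._blobs: dict = {}  # (tenant_id, object_id) -> (nonce, ciphertext)
        self.chain: list = []   # structural audit entries: hashes only

    def put(self, tenant_id: str, object_id: str, payload: bytes) -> None:
        dek = self._deks.setdefault(tenant_id, secrets.token_bytes(32))
        nonce = secrets.token_bytes(16)
        self._blobs[(tenant_id, object_id)] = (nonce, _keystream_xor(dek, nonce, payload))
        self.chain.append((tenant_id, object_id, hashlib.sha256(payload).hexdigest()))

    def get(self, tenant_id: str, object_id: str) -> bytes:
        dek = self._deks[tenant_id]  # raises KeyError once the DEK is destroyed
        nonce, ciphertext = self._blobs[(tenant_id, object_id)]
        return _keystream_xor(dek, nonce, ciphertext)

    def shred(self, tenant_id: str) -> None:
        """Destroy the DEK; ciphertext may linger but is no longer readable."""
        self._deks.pop(tenant_id, None)
```

After `shred`, the chain entries remain intact while every payload for that tenant becomes unrecoverable, without touching the blobs themselves.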

#### 9.3 Sub-processor architecture

Three sub-processor classes, with the design goal stated bluntly: *minimize the count and the fidelity of data each one sees*, because every sub-processor is a DPA to negotiate, an audit to consume, and a line in the Art. 28(2) disclosure list with advance-notice obligations on change \[1].

* **LLM providers.** Receive pseudonymized, projected context only (§5.7, §9.2) — materially less PII than the vendor APIs themselves carry. Contract terms required: enterprise API terms with **no-training-by-default** on API inputs/outputs and zero-or-minimal retention options, which the major providers' commercial terms now offer \[7]. Two providers per tier (§5.6) means two DPAs — the redundancy cost is accepted; the mitigations are identical.
* **Aggregators (Finch/Merge, Ayrshare-class).** The inversion of the LLM case: they see *full-fidelity* C5 payroll data — that is their function — which is exactly why §3.2.3 chose one aggregator over N direct payroll integrations: one DPA, one SOC 2 report to review, one sub-processor list entry, instead of ADP+Rippling+Gusto+... each with their own. The aggregator consolidates liability surface as much as engineering surface.
* **Cloud infrastructure.** Standard DPAs + region pinning (§9.4).

**The Google Limited Use constraint deserves its own paragraph** because it binds the whole chain, not just Fridays: Gmail-restricted-scope data is subject to Google's API Services User Data Policy Limited Use requirements — use only for user-facing features, no advertising, no human reads outside narrow exceptions, no transfers except as necessary to provide the feature — and Google's Workspace API policy explicitly prohibits using Workspace user data to train **generalized** AI/ML models \[8]. Consequence: the LLM sub-processor's no-training terms in \[7] are not merely Fridays' preference — they are a *flow-down condition of the Gmail integration existing at all*, verified in the CASA/verification process (§3.7). C3 handling, §9.5's training boundary, and the sub-processor contracts are one interlocking obligation set.

#### 9.4 Regional handling and deletion propagation

**Transfers.** EU tenant data relies on the EU–US Data Privacy Framework adequacy decision (July 2023) for DPF-certified sub-processors, with the 2021 Standard Contractual Clauses (Decision 2021/914) executed as fallback \[2]\[3]. This is belt-and-suspenders because DPF's litigation exposure is a known risk: the General Court dismissed the first direct challenge in 2025, but appeals and successor challenges are assumed as a planning matter, hence SCCs are pre-executed, not break-glass. For CCPA/CPRA, service-provider addenda carry no-sale/no-share and purpose-limitation terms \[6]. EU tenants' primary stores are region-pinned where infrastructure sub-processors offer it.

**Deletion, two distinct paths:**

1. **Tenant leaves Fridays.** Sequence: active playbooks halt → vendor tokens actively revoked at each authorization server (RFC 7009 \[5] — revocation, not mere discard, so the grant dies vendor-side too, and several vendor policies require deletion-on-disconnect regardless \[8]) → caches and payload stores crypto-shredded via DEK destruction (§4.3) → chain structure retained per §9.2's carve-out. Nothing needs deleting *inside* the tenant's QuickBooks/Zoho, because Fridays never was the system of record (§1.3) — exit cost and deletion cost were bounded by the same design decision.
2. **Tenant's customer exercises rights against the tenant.** The tenant (controller) deletes/rectifies in their own systems — QuickBooks, Zoho — and Fridays, as processor, must not resurrect the data. Mechanism: the deletion/change arrives as a **webhook like any other change event** (§3.4); the entity's cache rows are invalidated, payload-store objects keyed to that entity are tombstoned, and pseudonym-token mappings for the subject are destroyed (after which the prompt archive's tokens for that subject dereference to nothing — the §5.7 tokenization turns out to be a deletion mechanism, not only a confidentiality one). Short cache TTLs (§9.2) bound the propagation window even if the webhook is missed; the next re-fetch simply finds the vendor's post-deletion state. Audit-chain hashes referencing the entity persist under Art. 17(3)(b) where the underlying action is in a records-retention class \[1].
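The revocation step in path 1 can be sketched as an RFC 7009 request: a form-encoded POST carrying `token` and an optional `token_type_hint`. The endpoint URL in the usage note is a placeholder, and HTTP Basic client authentication is an assumption; each vendor's authorization server documents its own endpoint and client-authentication method. The sketch only builds the request and does not send it.

```python
import base64
import urllib.parse
import urllib.request


def build_revocation_request(revocation_endpoint: str, token: str,
                             client_id: str, client_secret: str,
                             token_type_hint: str = "refresh_token") -> urllib.request.Request:
    """Build (but do not send) an RFC 7009 token revocation request."""
    body = urllib.parse.urlencode(
        {"token": token, "token_type_hint": token_type_hint}
    ).encode("ascii")
    # Assumed client authentication: HTTP Basic with the client credentials.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        revocation_endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Authorization": f"Basic {basic}",
        },
    )
```

A caller would pass something like `build_revocation_request("https://auth.example.com/oauth2/revoke", refresh_token, client_id, client_secret)` to `urllib.request.urlopen`. RFC 7009 has the server answer HTTP 200 even for an already-invalid token, so retrying a revocation is safe.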

The asymmetry is deliberate and honest: path 1 takes effect the moment the DEK is destroyed; path 2 is eventually consistent with a bounded window. The DPA states it that way rather than promising a synchronous global erasure that no distributed system can deliver.
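Path 2's bounded window can be sketched as a delete handler plus a TTL check on every cache read. The `EntityStores` class, its three in-memory stores, and the injected `clock` are illustrative assumptions, not the platform's storage layout.

```python
import time


class EntityStores:
    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self.cache: dict = {}      # entity_id -> (fetched_at, row)
        self.payloads: dict = {}   # object_id -> {"entity_id": ..., "tombstoned": bool}
        self.token_map: dict = {}  # pseudonym token -> (entity_id, cleartext)

    def cache_put(self, entity_id: str, row: dict) -> None:
        self.cache[entity_id] = (self._clock(), row)

    def cache_get(self, entity_id: str):
        """Return the cached row, or None if absent or older than the TTL."""
        entry = self.cache.get(entity_id)
        if entry is None or self._clock() - entry[0] > self._ttl:
            self.cache.pop(entity_id, None)  # expired rows force a re-fetch
            return None
        return entry[1]

    def on_vendor_delete(self, entity_id: str) -> None:
        """Handle a deletion webhook for one entity."""
        self.cache.pop(entity_id, None)
        for obj in self.payloads.values():
            if obj["entity_id"] == entity_id:
                obj["tombstoned"] = True
        # Destroying the mapping leaves archived tokens dereferencing to nothing.
        self.token_map = {
            tok: val for tok, val in self.token_map.items() if val[0] != entity_id
        }
```

If the webhook is missed, `cache_get` still returns `None` once the TTL elapses, so the worst-case staleness for the cache is the TTL itself.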

#### 9.5 Model-training boundary

Stated as three nested prohibitions, each with its enforcement mechanism:

1. **No customer data trains foundation models.** Enforced contractually (no-training API terms \[7]) and required independently by Google's Workspace policy for any Gmail-derived data \[8]. Not a policy Fridays could quietly relax: it is a condition of the highest-value integration.
2. **No cross-tenant fine-tuning or shared learned artifacts.** Excluded at the product level (§8.3), which also deletes the membership-inference/extraction attack class against shared weights. Playbook improvements come from the eval harness over synthetic and consented data (§11), shipping as *versioned playbook releases* — reviewable diffs, not weight updates.
3. **Per-tenant learning is confined to the approval-threshold policy state** (§6.2): band positions per (playbook, class, feature-band) cell. This is parameter estimation over a decision policy — inspectable, resettable, exportable, and erased with the tenant — not gradient updates to any model. The distinction is load-bearing in three venues: vendor API reviews (\[8]'s training prohibition), DPA negotiations (what "processing for improvement" means), and §12's disclosure obligations, where "does the system learn from my data" must have a precise, honest answer: *yes — the thresholds at which it asks you; nothing else.*
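To make "policy state, not weights" concrete, here is a sketch of the threshold state as a plain keyed table. The `ThresholdState` class, the default position of 0.5, and the `nudge` update are hypothetical; §6.2 defines the actual estimation rule. What the sketch shows is that the entire learned state is inspectable, exportable, and resettable data.

```python
import json


class ThresholdState:
    """Per-tenant approval thresholds keyed by (playbook, risk_class, feature_band)."""

    def __init__(self, default: float = 0.5):
        self._default = default
        self._cells: dict = {}

    def get(self, playbook: str, risk_class: str, band: str) -> float:
        return self._cells.get((playbook, risk_class, band), self._default)

    def nudge(self, playbook: str, risk_class: str, band: str, delta: float) -> float:
        """Hypothetical update: move one cell's band position, clamped to [0, 1]."""
        key = (playbook, risk_class, band)
        value = min(1.0, max(0.0, self.get(*key) + delta))
        self._cells[key] = value
        return value

    def export(self) -> str:
        """Serialize every learned cell; this is the complete per-tenant state."""
        rows = [
            {"playbook": p, "class": c, "band": b, "position": v}
            for (p, c, b), v in sorted(self._cells.items())
        ]
        return json.dumps(rows)

    def reset(self) -> None:
        """Erase all learned positions, returning every cell to the default."""
        self._cells.clear()
```

Tenant erasure in this model is deleting one small table, which is what lets the "nothing else" answer be checked rather than asserted.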

***

#### References (Section 9)

\[1] Regulation (EU) 2016/679 (GDPR) — Art. 5(1)(c),(e) (minimization, storage limitation); Art. 17(3)(b) (erasure carve-out for legal obligation); Art. 28 (processor, sub-processor authorization). <https://eur-lex.europa.eu/eli/reg/2016/679/oj>

\[2] European Commission, Adequacy Decision for the EU–US Data Privacy Framework (10 July 2023); General Court dismissal of the first annulment action (2025) — treated as unstable precedent for planning purposes.

\[3] Commission Implementing Decision (EU) 2021/914 — Standard Contractual Clauses (modules 2/3, controller→processor and processor→processor).

\[4] J. Morris et al., *Text Embeddings Reveal (Almost) As Much As Text* (EMNLP 2023) — cited via §8.3 for C6 classification. <https://arxiv.org/abs/2310.06816>

\[5] IETF RFC 7009, *OAuth 2.0 Token Revocation*. <https://datatracker.ietf.org/doc/html/rfc7009>

\[6] California Consumer Privacy Act as amended by CPRA — service-provider definition, no-sale/no-share contract requirements (Cal. Civ. Code §1798.100 et seq.).

\[7] LLM provider commercial/enterprise data-usage terms (Anthropic, OpenAI) — no-training-by-default on API data; retention controls. Current terms verified at DPA execution; provider terms are mutable and re-reviewed on renewal.

\[8] Google, *API Services User Data Policy* (Limited Use requirements for restricted scopes) and Workspace API user-data policy — prohibition on using Workspace API data to train generalized AI/ML models. <https://developers.google.com/terms/api-services-user-data-policy>

***

*Next: Section 10 (Reliability Engineering) — degraded modes per vendor outage, queueing and priority, DR with vault recovery, API deprecation management.*

***

### As-Built Reconciliation — V1

*Legend/sources as in §1's addendum. Whole section → companion `data-handling-and-privacy-v1.md`.*

| §                                                                                                       | Status                  | As-built (source)                                                                                                                                                                                                                      |
| ------------------------------------------------------------------------------------------------------- | ----------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| 9.1 Classification (C1–C7)                                                                              | **PLANNED (framework)** | C1 credentials handled today (secrets subsystem, SC §9). Per-field classification at the wrapper/connector layer — PLANNED (no connectors exist).                                                                                      |
| 9.2 Minimization / retention                                                                            | **PLANNED**             | Projection + pre-inference pseudonymization are prompt-side and unbuilt; **re-fetch-over-retain** is consistent with backend's not-system-of-record posture. TTL/retention classes — PLANNED (ties to `data-model-v1.md` §4).          |
| 9.3 Sub-processors + DPAs + Google Limited-Use flow-down                                                | **PLANNED**             | LLM provider is a sub-processor requiring no-training DPA; aggregators see full-fidelity C5. Companion §4.3–4.4.                                                                                                                       |
| 9.4 Regional / deletion (DPF/SCCs, RFC 7009 revocation, crypto-shred, webhook-driven customer deletion) | **PARTIAL / PLANNED**   | **Company-portability export/import with secret scrubbing EXISTS** (AO §7) as the deletion substrate; crypto-shred assumes per-tenant DEKs (see §4.3 — currently per-instance master key); region pinning rides instance-per-customer. |
| 9.5 Model-training boundary                                                                             | **PLANNED**             | Per-tenant learning = threshold state only (ties to defined-learning-boundary requirement); Google Workspace no-training flow-down. Companion §5.                                                                                      |

**Flag:** the entire section is PLANNED privacy design layered on **two EXISTING** substrates — secret custody (SC §9) and company export/import (AO §7). The piece with no counterpart anywhere in the platform docs is **prompt-content privacy** (what customer PII the LLM sub-processor sees); that is exactly what the companion fills. Note the crypto-shred deletion story (§9.4/§7.1) depends on the per-tenant-DEK vault design of §4.3, which is not the current per-instance-master-key implementation.

