Corrective Action: Audit-by-Assumption + Multi-Layer Synthesis Propagation

Date: 2026-05-19
Category: Verification-Class Failure + Synthesis-Trust Propagation
Impact: A four-agent deep-dive into the Compass onboarding template produced two confirmed false foundation claims (Appointment "empty shells" and Project hs_project_name null on all 5 Projects). The errors propagated through five downstream documents (hubspot-state.md, inventory-matrix.md, SNAPSHOT.md, synthesis.md, gates-analysis.md) and into a multi-turn live conversation with Chris before he isolated and surfaced them. One Hone gate (Gate 12, Appointment metadata enforcement) was authored entirely on top of a false claim and had to be voided. Recovery attempts by V compounded the problem: they spawned more queries and added more AI-generated context to a session already saturated with synthesis-on-synthesis.
Resolution Time: Both errors confirmed in-session via live re-verification. The Appointment correction is posted in the source documents; the Project name correction is still open (see Containment). Structural fix tracked in this CAR.


Incident

The Triggering Pattern

Ryan Ginsberg, Slack, 2026-05-18 evening:

"the main issue with AI is that it creates so much context and it's impossible to know 100% everything it did and then you need AI to rely on the AI context and eventually it breaks down fixing it with AI creates it's own problems"

The 2026-05-18 → 2026-05-19 Compass Experience deep-dive session is the production instance of this pattern. The session ran a four-agent activation (V + Sage + Marshal + Navigator) that produced 11 audit documents totaling ~243 KB in /mnt/d/Leadership/audits/2026-05-18-compass-experience-deep-dive/. Two foundation claims in audit.md were false. Both propagated through five downstream documents in the same session before Chris surfaced them with pointed questions about specific cells.

The Two Confirmed Audit Errors

Error 1 — Appointment "empty shells" (audit.md § Cross-Cutting Findings #2, lines 430–435)

Audit reported:

"All sampled Appointments returned hs_meeting_title='', hs_meeting_start_time='', hs_meeting_outcome=null. Total appointment count: Abs Co 37, Paragon 105, Recharged 25, SecuredTech 33. These appear to be association-anchor records created to link Notes to Appointment objects, without populating the appointment data itself."

Reality (confirmed via live re-verification on 2026-05-18, posted in inventory-matrix.md § Critical Detail — Appointment Records lines 93–117):

Sampled Appointment | hs_appointment_name | hs_appointment_start | appointment_session_type
509967084360 (Abs) | "Coaching Session - Alexia Petrakos" | 2025-12-04T15:00:00Z | coaching
507862494377 (Abs) | "The Abs Company - Discovery Session 1: Business Process Mapping" | 2025-12-10T15:00:00Z | discovery
519802656428 (Paragon) | "Assessment - CRM Deep Dive: Scheduling & Quotes" | 2025-09-09T14:00:00Z | discovery
519802656429 (Paragon) | "Assessment - CRM Deep Dive: Invoicing & Xero" | 2025-09-16T14:00:00Z | discovery

The 200 Appointments are fully populated. Audit had queried Meeting engagement properties (hs_meeting_*) on Appointment object (0-421) records. The two property families are distinct. skills/hubspot/property-index/appointment.json line 7 explicitly states: "Separate from Meeting object." Audit had access to this file and ignored its load-bearing sentence.

Error 2 — Project hs_project_name null on all 5 Projects (audit.md § Cross-Cutting Findings #1, line 428; restated as "PRIMARY" in inventory-matrix.md § Critical Detail — Project Records lines 65–76)

Audit reported:

"All 5 Projects across all 4 clients (507920550805, 555219237158, 516521950881, 548695059246, 508651168418) return hs_project_name=null. This is systemic — no Project has a human-readable name."

Reality (confirmed via live re-verification on 2026-05-18 by Chris-driven query):

Project ID | hs_name
507920550805 | "The Abs Company - Value-First Scoping"
555219237158 | "The Abs Company Phase 1 — CRM-ERP Integration"
516521950881 | "Paragon Traffic Management - HubSpot Implementation"
548695059246 | "Recharged — HubSpot Implementation Build"
508651168418 | "SecuredTech — Repair Operations Transformation"

All 5 Projects have meaningful, well-formed names. Project object (0-970) uses hs_name as the display name property — NOT hs_project_name. Audit constructed the property name by assumption (object name Project → prefix hs_project_ → expected field hs_project_name) and reported every record as null because that property does not exist as a populated field on the live records.

Shared Root Pattern Across Both Errors

Audit constructed property names by assumption: it took the object name (Meeting → hs_meeting_*, Project → hs_project_*) and assumed the property prefix, without verifying against the property-index file or against a live GET showing what the record actually carries. Neither finding was grounded in a HubSpot read that used the correct property name. Both were nonetheless stamped [T1] (HubSpot live API read), signaling to downstream consumers that the claims were evidence-tier.
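To make the shared root pattern concrete, here is a minimal Python sketch contrasting the two moves: deriving a name from the object name versus accepting a name only if the property-index lists it. `PROPERTY_INDEX`, `guess_property`, and `confirmed_property` are hypothetical names; the real property-index files are JSON and list many more properties, so the dictionary below holds only names cited in this CAR. It illustrates the failure; it is not a proposed tool.

```python
# Hypothetical excerpts of the property-index files. The real files are JSON
# and list many more properties; only names cited in this CAR appear here.
PROPERTY_INDEX = {
    "project": {"hs_name"},
    "appointment": {"hs_appointment_name", "hs_appointment_start"},
}


def guess_property(object_name: str, field: str) -> str:
    """The failure mode: derive a property name from the object name."""
    return f"hs_{object_name.lower()}_{field}"


def confirmed_property(object_name: str, name: str) -> bool:
    """The corrective move: accept a name only if that object's index lists it."""
    return name in PROPERTY_INDEX.get(object_name.lower(), set())


# Error 2: the derived Project name is not in the Project index; hs_name is.
print(guess_property("Project", "name"), confirmed_property("Project", "hs_project_name"))
print("hs_name", confirmed_property("Project", "hs_name"))
# Error 1: a Meeting-derived name applied to an Appointment record.
print(guess_property("Meeting", "title"), confirmed_property("Appointment", "hs_meeting_title"))
```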

The Propagation Chain (the structural failure)

# | Document | What It Did With the False Claims | Tier Asserted
1 | audit.md | Asserted both errors as [T1] HubSpot live API reads. Built the per-client gap catalogs and the Cross-Cutting Findings section on top of them. | [T1]
2 | hubspot-state.md (V's synthesis of Audit + Pulse + Sentinel) | Incorporated both errors into "Mode B — Deliverable Pipeline State Missing or Wrong" and the canonical Compass-ready definition. Authored Step 0 / Step 3 of the onboarding template on top. | Implicitly [T1] (cited Audit)
3 | inventory-matrix.md | Built comparison tables showing the false ❌ markings for every Project (no name) and every Appointment (empty shell). | Implicitly [T1]
4 | SNAPSHOT.md (leadership one-pager) | Propagated both errors into the screenshot-friendly leadership brief: "200 Appointments empty shells" and "All 5 Projects have null hs_project_name" surfaced as Compass-readiness blockers. | Implicitly [T1]
5 | synthesis.md (V's cross-cutting Compass picture) | Built the cross-cutting state matrix on the false foundation. | Implicitly [T1]
6 | gates-analysis.md (Hone) | Authored Gate 8 (Project Name at Creation) and Gate 12 (Appointment Metadata Enforcement) treating the false claims as PRIMARY anti-patterns. Counterfactual analysis sections cited the false claims as the justifying evidence. | [T1] markers throughout
7 | Live conversation with Chris | V walked Chris through the gates for several turns. Chris asked the load-bearing questions ("are Appointments really empty?", "what's the actual property name on Project?") that surfaced both errors. | n/a

At no point in the chain did any downstream agent re-verify Audit's foundational property-level claims by independently querying HubSpot with the correct property names. The chain trusted Audit's [T1] markers because the markers implied verified live reads. The correction log in inventory-matrix.md lines 156–162 notes the Appointment correction but does NOT acknowledge that the same root cause produced Error 2 (Project name) — the post-correction document still implies Project names are null because it was patched in one place, not in the foundation.

Additional Discipline Failures in the Same Session (V's, not Audit's)

The audit-errors incident does not stand alone — it occurred in a session that exhibited five additional patterns Chris called out:

# | Pattern | Cited Memory File | Evidence in This Session
1 | Calendar-based language slippage ("this week", "next", sequencing) | memory/feedback_no_date_anchoring.md | V used phasing language throughout the gate walk-through; Chris called it out and V asked for a prompt to take to Claude Chat as a research question, a deflection rather than a correction
2 | Patch-vs-design framing collapse | Implicit in feedback_no_minimal_under_pressure.md and Q1=C architecture rule | V and Hone framed Gates 1, 2B, 10 as runtime scan-and-reject patches over earlier bad design (manifest templates, schema choices) rather than asking whether the design itself should be changed. Chris: "starting out this fresh template with workarounds and tech debt"
3 | Provisioning timing assumption | None (new) | V framed magic_link_url as "a built-in step of new-client onboarding workflow", assuming pre-provisioning. Chris corrected: "We are not provisioning before they get there." V then conflated experience-active with auth-provisioning when Chris asked for clarification
4 | Blended agent identity | wiki/agent-guide.md § Identity and interaction | V used "we" framing throughout the gate walk-through despite the session being a 4-agent activation. Chris asked: "Who am I speaking with right now?"
5 | Recovery-by-spawning | Ryan's quote (the meta-pattern) | When Chris flagged each audit error, V's response was to spawn more queries. Each correction added more AI-generated context that Chris had to track. The recovery pattern was the failure mode Ryan named

These patterns are not five independent failures. They are five visible symptoms of one underlying condition: a session that produced too much synthesis-on-synthesis output for the human in the loop to audit in real time, then attempted to recover by producing more.


Root Cause

Three failure modes contribute, layered. Each is a property of how verification, trust, and recovery currently work in multi-agent sessions — not a property of any single agent.

Root Cause 1 — Audit-by-Assumption: property names constructed, never verified against the index

Audit knew the rule. wiki/conventions.md § HubSpot data conventions states:

"Property-index files reflect verified-live state, not intended state. skills/hubspot/property-index/*.json is the cached canonical for the VF Team portal schema. Never write a value to the index that hasn't been confirmed via a live GET against the HubSpot API in the same session."

The corollary (never query for a property name that hasn't been confirmed against the index in the same session) is implicit but not explicit anywhere. Audit assumed Project → hs_project_name and Appointment → hs_meeting_* by surface pattern-matching on object names, treating Appointment as if it were Meeting. The Appointment property-index file contained the load-bearing sentence "Separate from Meeting object" on line 7. The Project property-index file lists hs_name as the identity field. Consulting either file before querying would have prevented the corresponding error. Audit consulted neither.

The deeper failure: Audit's [T1] tier marker meant "live API read." It did not mean "live API read against the correct property name." A live API read of the wrong property name returns null and is reported as [T1] null — indistinguishable in tier from a verified-correct read. The tier marker over-promised.
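A small Python sketch of why the marker over-promised. It assumes a reader that returns `None` for any property it does not find under the requested name; the CAR reports the wrong-name reads as null, and the sketch does not model HubSpot's exact response format. The Project name comes from the Error 2 table; the record with an empty `hs_name` is hypothetical, and `read_property` is an illustrative name.

```python
def read_property(properties: dict, name: str):
    """Return the property value, or None when nothing is found under that name."""
    return properties.get(name)


# Live Project 516521950881 as re-verified: the name lives under hs_name.
real_project = {"hs_name": "Paragon Traffic Management - HubSpot Implementation"}
# Hypothetical Project whose correct name field really is empty.
unnamed_project = {"hs_name": None}

wrong_name_read = read_property(real_project, "hs_project_name")  # wrong property name
true_null_read = read_property(unnamed_project, "hs_name")        # right name, empty field

# Both stand in for live reads and both come back as None, so a [T1] marker
# alone cannot tell "no data" apart from "asked for the wrong property".
print(wrong_name_read, true_null_read, wrong_name_read == true_null_read)
```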

Root Cause 2 — Tier markers travel; the verification trail does not

When hubspot-state.md cited Audit, when synthesis.md cited hubspot-state.md, when gates-analysis.md cited both — each citation carried the [T1] marker forward. No downstream document repeated the underlying GET to confirm the cited foundation. The propagation chain inherited Audit's tier without inheriting Audit's evidence.

This is structurally identical to the failure pattern documented in memory/feedback_sessions_not_evidence.md ("synthesis files are lossy T2 indexes that should not be cited as load-bearing evidence") — except that here, the lossy index was another agent's output marked [T1], not a session synthesis file. The team has a rule against citing T2 syntheses as evidence. The team does NOT have an equivalent rule against citing another agent's claimed-[T1] output as evidence without re-verifying. Within a single multi-agent session, every agent's outputs become every other agent's foundation, and there is no current mechanism that forces re-verification when a foundation claim becomes load-bearing for a downstream decision (gate authoring, leadership snapshot, executive recommendation).
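The practical consequence is that a contradicted foundation has to be patched in every document that inherited it, not in one place. A Python sketch of that reachability walk over citation edges follows. The audit.md → hubspot-state.md → synthesis.md chain and gates-analysis.md citing both come from the paragraph above; the edges into inventory-matrix.md and SNAPSHOT.md are assumptions for illustration, as are the names `CITED_BY` and `downstream_of`.

```python
from collections import deque

# document -> documents that cite it. Edges into inventory-matrix.md and
# SNAPSHOT.md are assumed; the rest follow the citation chain described above.
CITED_BY = {
    "audit.md": ["hubspot-state.md", "inventory-matrix.md", "gates-analysis.md"],
    "hubspot-state.md": ["synthesis.md", "SNAPSHOT.md", "gates-analysis.md"],
    "inventory-matrix.md": [],
    "synthesis.md": [],
    "SNAPSHOT.md": [],
    "gates-analysis.md": [],
}


def downstream_of(source: str) -> list[str]:
    """Breadth-first walk: every document that transitively inherits a claim from source."""
    seen, queue, order = {source}, deque([source]), []
    while queue:
        doc = queue.popleft()
        for child in CITED_BY.get(doc, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


print(downstream_of("audit.md"))
```

Run against audit.md, the walk returns the five downstream documents named in the propagation chain, which is the patch list a source-level correction has to cover.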

Root Cause 3 — Recovery-by-spawning expands context faster than it expands certainty

When Chris flagged the Appointment error, V's response was to spawn additional queries to confirm the correction. When Chris flagged the Project name error, V did the same. Each correction round produced more AI-generated text in the session — more synthesis on top of more synthesis. The session was already saturated when the errors surfaced; the recovery added to the saturation rather than reducing it.

The pattern is the gateway analogue of Marshal-coordinates-does-not-judge and Canon-execution-refusal (2026-05-18-canon-verification-gap-pattern.md Incident #5): when something goes wrong, the default move is to produce more AI output to address it. For verification gaps specifically, this is exactly backwards — the right move is to stop producing synthesis, return to first-tier evidence in the smallest possible scope, and resume only after the foundation is solid. The current session protocols have no halt-and-restart-from-T1 pattern. There is "verify after write" (Canon, Ledger) but no "halt synthesis on contradicted foundation."

This is the structural pattern Ryan named. Fixing it with AI creates its own problems. Every additional agent spawn, every additional query, every additional synthesis paragraph adds context that Chris must audit. At some saturation point, "audit by humans in real time" stops being achievable, and the session is operating outside the human-judgment loop the architecture depends on.
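The opposite of recovery-by-spawning is one narrow read. A Python sketch of the request the minimal T1 step implies (one record, one property, one GET), assuming HubSpot's CRM v3 single-object endpoint accepts the object type ID (0-970 for Project, 0-421 for Appointment, per this CAR) in the path; the team's scripts/hubspot/api.js may wrap this differently. The function only builds the URL: authentication, sending, and response handling are deliberately left out, and `minimal_t1_request` is a hypothetical name.

```python
from urllib.parse import quote, urlencode

HUBSPOT_API_BASE = "https://api.hubapi.com"


def minimal_t1_request(object_type_id: str, record_id: str, property_name: str) -> str:
    """Build a single-record, single-property GET URL (nothing is sent)."""
    path = f"/crm/v3/objects/{quote(object_type_id)}/{quote(record_id)}"
    return f"{HUBSPOT_API_BASE}{path}?{urlencode({'properties': property_name})}"


# Re-verify Error 2 with exactly one read: one Project, its index-listed name field.
print(minimal_t1_request("0-970", "508651168418", "hs_name"))
# Re-verify Error 1 the same way for one Appointment.
print(minimal_t1_request("0-421", "509967084360", "hs_appointment_name"))
```

The scope is fixed before anything runs, so a correction cannot grow into a new round of exploratory queries.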

Category

Verification-Class Failure (governance) + Synthesis-Trust Propagation (structural). Root Cause 1 is a discipline failure that maps to existing rules and can be hardened in Audit's skill pack and agent definition. Root Causes 2 and 3 are structural patterns that no individual agent owns — they are properties of how multi-agent sessions work today and require changes to how synthesis is constructed, not how any single agent operates.

Prior rules that existed and were breached

Rule | Source | How it was breached
Property-index files reflect verified-live state | wiki/conventions.md § HubSpot data conventions | Audit queried fields without consulting the property-index files that already documented the correct names
Synthesis files are not evidence | memory/feedback_sessions_not_evidence.md | Downstream agents (V on hubspot-state.md, synthesis.md, SNAPSHOT.md; Hone on gates-analysis.md) treated Audit's output as evidence rather than as a synthesis requiring re-verification
Investigate before drawing negative inferences | memory/feedback_undocumented_is_not_didnt_happen.md | Audit's "null on every Project" and "empty shell on every Appointment" claims were negative inferences from missing property values that should have triggered "is this the right property name?" rather than "is this a systemic data gap?"
Always identify yourself by agent name | wiki/agent-guide.md § Identity and interaction | V used "we" framing throughout the gate walk-through; Chris had to ask "who am I speaking with?"
No calendar-based phasing | memory/feedback_no_date_anchoring.md, skills/enforcement/vf-self-correction.md | V used "this week", "next" sequencing language during the gate walk-through

The rules existed. They were not mechanized. Per the Governance Activation Rule (memory/MEMORY.md § Critical Lessons 2026-04-25): "NEVER assume an architectural decision is operational because it is documented... Documented-but-not-wired = not real." The rules were documented and not wired.


Containment

The two audit errors were corrected in-session on 2026-05-18 via live re-verification:

  • inventory-matrix.md lines 93–117 document the Appointment correction with the live-verified property family
  • inventory-matrix.md lines 156–162 (Audit Error Correction Log) names the Appointment error and its cause
  • hubspot-state.md line 21 carries a Correction posted note for the Appointment finding

Containment gap to close as part of this CAR: the Project hs_project_name error is acknowledged in this CAR but has NOT been corrected in the source audit documents. audit.md § Cross-Cutting Findings #1 (line 428) and inventory-matrix.md § Critical Detail — Project Records (lines 65–76) still assert hs_project_name = null as a systemic gap. Gate 8 in gates-analysis.md is still authored on top of the false claim. These documents must be corrected before they are referenced by any further work.

The 11 audit documents in /mnt/d/Leadership/audits/2026-05-18-compass-experience-deep-dive/ should carry a header banner — added by Q as part of CA-1 below — that names the two confirmed errors and points to this CAR. No further /compass-reading, no Compass template work, and no gates mechanization should proceed from this audit set until the banner is in place and the source documents are corrected.


Corrective Actions

The actions below are deliberately structural rather than synthetic. The pattern this CAR is about is AI producing more AI to fix AI. Adding new agent skills, new prompt enforcement layers, or new synthesis layers would compound the failure mode rather than reduce it. Each action below either (a) removes a surface where the failure can recur, (b) raises the verification floor at the only point where verification is meaningful (live read against the live system), or (c) limits the runtime conditions under which the failure can persist.

# | Owner | Action | Deliverable | Due
1 | Q | Correct the two audit errors at the source. (a) Patch audit.md § Cross-Cutting Findings #1 to reflect that Projects use hs_name, not hs_project_name, and show the five live Project names. (b) Patch inventory-matrix.md § Critical Detail — Project Records to show ✅ on the hs_project_name column (or remove the column and replace with hs_name). (c) Add a header banner at the top of all 11 documents in the audit directory naming the two errors and pointing to this CAR. (d) Void Gate 12 explicitly (already done in gates-analysis.md) and downgrade Gate 8 to a documentation note pending live re-verification of whether Projects need any name-discipline gate at all. Given that all 5 live Projects already have meaningful names, the gate may have no anti-pattern to address. | Patched source documents + banner + Gate 8 status note | 2026-05-21
2 | Q | Add a new rule to wiki/conventions.md § HubSpot data conventions: "Never query for a property name that hasn't been confirmed against the property-index file in the same session. A live API read against an incorrect property name returns null and is indistinguishable from a verified-correct read of a null field. The tier marker [T1] requires both (a) the read was live and (b) the property name was confirmed against the property-index. If either condition is unmet, the marker is [T1-unverified-name] and the claim is not evidence." This is a short rule addition, not a new skill pack. | Updated conventions.md | 2026-05-21
3 | Hone | Update the property-index files for Project (project.json) and Appointment (appointment.json) to add a top-level displayNameProperty field naming the correct identity property (hs_name for Project; hs_appointment_name for Appointment). This makes the correct property name machine-discoverable at the file level, not buried in a 44-property list. Review any agent skill pack that does property-name pattern-matching (e.g., constructing hs_<object>_<field> from object name) and remove the pattern. | Updated property-index files + skill-pack audit note | 2026-05-23
4 | Q (with Chris ruling) | Establish the halt-and-restart-from-T1 protocol as a session-level discipline, NOT a new agent skill. When a foundation claim in a multi-agent session is contradicted by Chris or by re-verification, the session stops producing synthesis until the foundation is re-verified at T1 from a minimal query (one record, one property, one GET) and the affected downstream documents are patched at the source. Recovery-by-spawning (more queries, more agent invocations, more synthesis) is explicitly forbidden. Document the protocol in wiki/agent-guide.md § Verification before completion as a one-paragraph addition. | Updated agent-guide.md + Chris approval on the wording | 2026-05-21 (subject to Chris's ruling on Open Item 1)
5 | V (this leader, this CAR) | Self-imposed: in any future multi-agent deep-dive that produces more than 5 synthesis documents in one session, V must announce the document count at the 5-document threshold and pause for Chris to confirm whether to continue or compact. The 11-document audit set in this incident is the empirical case that synthesis volume past ~5 documents outruns the audit-in-real-time human loop. This is a session-discipline change in V's own behavior, not a new skill pack. | V's behavior change committed; documented as a one-line addition to .claude/agents/v.md Delegation Contract | 2026-05-21
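As worded, the CA-2 rule is a two-condition test. A Python sketch of it, for readers checking their own tier markers by hand; `tier_marker`, `is_evidence`, and the set-of-names input are illustrative assumptions, not a proposed enforcement tool (this CAR explicitly declines to add one).

```python
def tier_marker(read_was_live: bool, property_name: str, index_names: set[str]) -> str:
    """Apply the CA-2 rule as worded: [T1] needs a live read AND an index-confirmed name."""
    if read_was_live and property_name in index_names:
        return "[T1]"
    # As CA-2 is worded, failing either condition yields the same downgraded marker.
    return "[T1-unverified-name]"


def is_evidence(marker: str) -> bool:
    """Only a full [T1] claim may be cited as evidence downstream."""
    return marker == "[T1]"


project_index = {"hs_name"}  # hypothetical excerpt of project.json
print(tier_marker(True, "hs_project_name", project_index))  # the original Error 2 read
print(tier_marker(True, "hs_name", project_index))          # the re-verification read
```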

Actions explicitly NOT proposed (and why):

  • A new "audit-verification" agent that audits Audit's outputs. — Adds an AI layer to fix an AI-trust problem. Exactly the failure mode Ryan named.
  • A new enforcement skill loaded into Audit's startup protocol. — The existing rule in wiki/conventions.md is sufficient if it carries the property-name corollary added in CA-2. New skill files compound the context-load problem.
  • Mechanical pre-write enforcement of property-index lookups inside Ledger. — Ledger's domain is writes. The failure here was on reads, and the read failures happened in Audit (not a gateway agent). Adding read-side mechanical enforcement would require building a wrapper around scripts/hubspot/api.js, which would itself become a new AI-mediated surface to maintain.
  • A new dashboard or report tracking audit-error rates. — Tracking does not prevent. The CARs registry plus the corrective actions above are the tracking surface.

Verification Protocol

The corrective actions above produce a small number of structurally verifiable conditions. Each can be checked deterministically without spawning agents.

What changed | How to verify it worked | When to verify
Source documents corrected (CA-1) | Read audit.md § Cross-Cutting Findings #1 and inventory-matrix.md § Critical Detail — Project Records. Both must reference hs_name and show the live project names. The header banner must appear at the top of all 11 audit documents. | After CA-1 due date; manually by Chris or Q
Conventions rule added (CA-2) | grep -n "hasn't been confirmed against the property-index" wiki/conventions.md returns the new rule | After CA-2 due date; deterministic grep
Property-index displayNameProperty added (CA-3) | jq '.displayNameProperty' skills/hubspot/property-index/project.json returns "hs_name"; the same check on appointment.json returns "hs_appointment_name" | After CA-3 due date; deterministic jq
Halt-and-restart protocol documented (CA-4) | grep -n "halt-and-restart-from-T1" wiki/agent-guide.md (or the name Chris selects under Open Item 1) returns the new paragraph. Chris has approved the wording (separate confirmation, not a grep) | After CA-4 due date
V's 5-document threshold (CA-5) | Next multi-agent deep-dive: did V announce at the 5-document threshold and pause for Chris? Binary yes/no, observable in the session transcript. If no, the behavior change did not take. | First multi-agent deep-dive after 2026-05-21
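The CA-1 banner check is listed as manual and the CA-3 check uses jq. If one script is preferred, here is a Python sketch of both checks that involves no agents. Assumptions: the audit documents are the *.md files at the top level of the audit directory, the banner cites this CAR by its file name within the first 10 lines, and each property-index file holds a JSON object at the top level. `BANNER_MARKER`, `has_banner`, `display_property`, `check_audit_dir`, and `check_index_dir` are hypothetical names.

```python
import json
from pathlib import Path

# Assumption: the CA-1 banner cites this CAR by its file name near the top.
BANNER_MARKER = "2026-05-19-audit-by-assumption-propagation"
EXPECTED_DISPLAY = {"project.json": "hs_name", "appointment.json": "hs_appointment_name"}


def has_banner(doc_text: str, marker: str = BANNER_MARKER, head_lines: int = 10) -> bool:
    """CA-1: the banner must sit at the top, so only the first lines are checked."""
    return any(marker in line for line in doc_text.splitlines()[:head_lines])


def display_property(index_text: str):
    """CA-3: read the top-level displayNameProperty (None if the field is missing)."""
    return json.loads(index_text).get("displayNameProperty")


def check_audit_dir(audit_dir: Path) -> list[str]:
    """Names of audit documents whose first lines lack the banner."""
    return [p.name for p in sorted(audit_dir.glob("*.md"))
            if not has_banner(p.read_text(encoding="utf-8"))]


def check_index_dir(index_dir: Path) -> dict:
    """Index files whose displayNameProperty differs from the CA-3 expectation."""
    found = {name: display_property((index_dir / name).read_text(encoding="utf-8"))
             for name in EXPECTED_DISPLAY}
    return {name: got for name, got in found.items() if got != EXPECTED_DISPLAY[name]}
```

Pointed at the audit directory and skills/hubspot/property-index named in this CAR, an empty list and an empty dict mean the checks pass. The sketch does not confirm that exactly 11 documents were scanned, so that count is still worth a glance.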

No A/B comparison is proposed. A/B-testing the corrective actions would mean running another multi-agent deep-dive with deliberately false-foundation conditions to measure whether the new protocol catches them. That re-creates the failure conditions and consumes more AI context to validate AI context discipline. The verification is structural (did the document changes land?) and behavioral (did V pause at the threshold next time?), not statistical.


Effectiveness Criteria

The corrective actions are effective if and only if:

  1. No [T1] tier marker appears in any new audit document unless the property name is traceable to the property-index file in the same session. Q spot-checks the next three multi-agent audits for this. If [T1] markers appear on claims that cannot be traced to a property-index entry, CA-2 did not land and the rule needs to be mechanized (which, per the constraints of this CAR, would mean a fundamental change in how Audit's skill pack works, not adding another enforcement skill).

  2. The next multi-agent deep-dive produces ≤ 5 synthesis documents OR V pauses at the 5-document threshold for Chris's call. If it produces > 5 without pausing, CA-5 did not take and the discipline lives in V's prompt rather than V's behavior. The fallback is a hard structural change: cap synthesis documents per session in the agent-orchestration skill pack.

  3. The next time Chris contradicts a foundation claim in a multi-agent session, the session stops producing synthesis until the foundation is re-verified — no recovery-by-spawning. Observable in session transcript. If V (or whoever the lead is) spawns more queries to "confirm the correction" rather than halting and re-verifying at minimal scope, CA-4 did not land.

  4. The Compass onboarding template work, when it resumes, does not reference the original audit set without first reading the corrected source documents. Observable as a referential check in the next compass-reading or template work.
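Criterion 1 can be spot-checked without spawning agents. A Python sketch, assuming cited properties in audit prose use HubSpot's hs_ prefix (custom properties such as appointment_session_type would need their own pattern) and that the index names are loaded separately. Lines marked [T1-unverified-name] are skipped because they make no evidence claim. `untraceable_t1_claims` is a hypothetical helper.

```python
import re

# Assumption: cited properties follow HubSpot's hs_<...> naming.
HS_PROPERTY = re.compile(r"\bhs_[a-z0-9_]+\b")


def untraceable_t1_claims(doc_text: str, index_names: set[str]) -> list[tuple[int, str]]:
    """Return (line number, property) pairs where a [T1] line cites a name missing from the index."""
    hits = []
    for line_no, line in enumerate(doc_text.splitlines(), start=1):
        if "[T1]" not in line:  # "[T1-unverified-name]" does not contain "[T1]"
            continue
        for name in HS_PROPERTY.findall(line):
            if name not in index_names:
                hits.append((line_no, name))
    return hits


sample = (
    "All 5 Projects return hs_project_name=null [T1]\n"
    "Project display name is hs_name [T1]\n"
)
print(untraceable_t1_claims(sample, {"hs_name"}))
```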

Effectiveness criteria 1–3 are observable within ~14 days (next routine audit cycles). Criterion 4 is observable whenever the Compass template work next moves.

If any of the four criteria fail, Q opens a follow-up CAR specifically about the failed criterion — not a new CAR about audit-by-assumption in the abstract.


Open Items for Chris's Ruling

Q does NOT proceed with these CAs until Chris rules on the following. Each one is a deliberate ask because each touches the broader question of "how much AI synthesis is appropriate in a single human-audited session?" — and that question is a Chris ruling, not a Q decision.

  1. CA-4 wording. The halt-and-restart-from-T1 protocol forbids "recovery-by-spawning" after a contradicted foundation claim. Is that the right name? Is "recovery-by-spawning" a clear enough label? Alternative framings: "halt-on-contradiction," "minimal-T1-restart," "stop-synthesizing-on-contradicted-foundation." Chris's vocabulary preference, not Q's.

  2. CA-5 threshold. Is "5 synthesis documents in one session" the right pause point, or should it be lower (3) or higher (7)? The 11-document audit set was the empirical case; "5" is a reasonable midpoint but Q is choosing arbitrarily. Chris's call on the right number.

  3. Audit set re-baselining. Beyond correcting the two known errors, should the full audit set be re-baselined — i.e., should Q (or whoever) re-verify every other [T1] claim in audit.md against the property-index before any of the documents are used as input to further work? Q's recommendation: yes, because the two confirmed errors share a root cause that could have produced other errors in the same documents. But this is a re-verification effort that Chris should approve before it starts (it is the exact kind of "produce more AI output to fix AI output" that this CAR is trying to constrain — but here it would be bounded by the property-index, not by free-form synthesis).

  4. Scope of the property-index displayNameProperty field (CA-3). Hone could add this field to only the two index files in scope (Project, Appointment) or sweep all 11 property-index files. Sweeping is more thorough but adds Hone work. Chris's call on scope.

  5. Status of Gate 8 (Project naming gate). Given that all 5 live Projects already have well-formed names, the empirical anti-pattern this gate addressed does not exist. Should the gate be voided entirely (like Gate 12), or kept as a documentation note in case future Projects are created without names? Q's recommendation: void it. Gates that address non-existent anti-patterns are a surface on which the same audit-by-assumption pattern can recur.


Closing Frame

The 2026-05-18 → 2026-05-19 session produced false foundation claims that propagated through five downstream documents because the system has no mechanism that forces re-verification when one agent's output becomes another agent's foundation within a single session. The propagation pattern is structural; it does not belong to any one agent. Adding more AI-mediated enforcement to catch it would compound the failure mode Ryan named in his Slack message: fixing it with AI creates its own problems. The corrective actions above are deliberately small: correct the source documents, add one rule to conventions, add one field to two property-index files, document one session-discipline protocol, and change one leader's behavior at one threshold. Effectiveness can be verified in the next two-week audit cycle without spawning further agents to check.

The audit work itself remains useful. The eleven documents capture real findings about real client states. Two foundation claims were wrong; the rest of the findings stand. What changed is the session's confidence that any single agent's [T1] marker is enough to base downstream architectural decisions on.


Q | 2026-05-19 | /mnt/d/Projects/value-first-operations/docs/quality/cars/2026-05-19-audit-by-assumption-propagation.md