Corrective Action: Compass /data-model Page Declared "Engagement Complete" Before Visual Craft Matched the GoRout Reference

Date: 2026-05-15 (Friday)
Category: Governance Activation Failure. The visual-fidelity gate was documented in 5P/PRD prose but never wired into Blueprint deliverable shape, Mirror QA rubric, Q audit framework, or the orchestrator's "complete" call. (7th occurrence of the pattern; see 2026-04-25-corrective-dewey-registrar-activation.md for category history.)
Impact: After 7 waves of multi-agent dispatch with PASS-WITH-ISSUES from Mirror Wave 4 (commit bc3243256), Mirror Wave 5 (commit 0a71372cd), and Q Wave 4 (commit 102c5104f), V called "Engagement Complete." Chris opened the page, opened GoRout next to it, and named the failure: "It reads like a bunch of markdown inside of large section cards. Hard not to be frustrated right now when we go through the effort of a 5p plan and PRD and are nowhere close to the DoD when I am told Engagement Complete." (2026-05-15)
Resolution Time: Wave 7a visual rescue mission dispatched in parallel to this CAR. CAR captures the process failure; Wave 7a captures the rebuild plan. Both land independently.
Co-authored: Q (lead) + V (accountable orchestrator)


Decision Summary

  • CAR file path: /mnt/d/Projects/value-first-operations/docs/quality/cars/2026-05-15-compass-data-model-premature-complete.md
  • Root cause: The audit chain validated the spec, not the experience. Every gate downstream of Blueprint compared the build to text-derived specifications that never carried fidelity-to-the-rendered-reference forward — and the orchestrator declared complete on the agent rubrics passing without ever opening the live page next to the reference.
  • Accountability scope: 6 parties named (V, Blueprint, Showcase, Mirror, Q, 5P/PRD authors); V explicitly accountable as orchestrator.
  • Corrective actions proposed: 5 gate changes (PRD-must-name-reference, Blueprint reference-derived deliverable, Mirror side-by-side as mandatory dimension, Q visual-quality-vs-reference audit dimension, V "open both surfaces" gate before complete) + 1 framework promotion (Q QMS adds visual-craft audit category).
  • Severity classification: Tier 1 process failure (client-facing visual-craft surface; trust-impact incident; commissioned via 5P+PRD with named DoD).

Incident

What Happened

Chris commissioned the Compass /data-model page on 2026-05-14 via /5p-plan (commit e55bc459) followed by /prd-generate. The 5P named GoRout (apps/gorout-walkthrough/public/index.html) as the visual quality reference. The PRD's Definition of Done included three criteria, the third of which was "(c) GoRout-comparable visual quality with first-glance state disambiguation."

Execution ran across seven waves on 2026-05-14 → 2026-05-15:

  • Wave 1 (Blueprint) — ERD, layout mockup, current-vs-future spec
  • Wave 2a (Mirror dispatch + Showcase content mining)
  • Wave 2b (Showcase Listing creates + property spec)
  • Wave 3 (Showcase build, V2 monorepo commit e515d29)
  • Wave 4 (Mirror QA + Q audit + Hone cross-reference sweep)
  • Wave 5 (Showcase Sev-1 fix + Mirror verify, V2 commit 05fe046)
  • Wave 6 (cleanup + close)

Functional outcome at end of Wave 6: page renders FSDM (10/10 sections) + CSB (15/15 sections), pulls live current-state architecture from HubSpot, all four declared render states reachable, sticky sidebar works, mobile section nav present, zero console errors. Mirror Wave 5: PASS-WITH-ISSUES (one persisting Sev-2 mobile overflow). Q Wave 4: PASS-WITH-ISSUES (four Sev-3 process-tracking gaps).

V synthesized those agent reports and called "Engagement Complete."

The next morning, Chris opened https://compass.valuefirstteam.com/abs-company/data-model next to https://gorout-walkthrough.vercel.app/ and rejected the work. Direct quote (2026-05-15):

"It reads like a bunch of markdown inside of large section cards. Hard not to be frustrated right now when we go through the effort of a 5p plan and PRD and are nowhere close to the DoD when I am told Engagement Complete."

The page rendered the right data, in the right structural skeleton, with the right colors. It did not deliver the visual craft of the reference. No layer of the audit chain caught it because no layer was looking for it.

Timeline of Failure (per decision point)

Decision Point 1 — 5P Performance Section

File: docs/plans/compass-data-model-page-5p-plan.md lines 130-153.

What the DoD said about visual fidelity:

"GoRout-comparable visual quality — page matches or exceeds the GoRout walkthrough on typography hierarchy, color contrast, spacing, dark-theme aesthetic, and responsive layout. Verified by Mirror side-by-side screenshot review against apps/gorout-walkthrough/public/index.html rendered output, and Q audit." (line 139)

What was missing: The DoD named the dimensions (typography, color, spacing, dark-theme, responsive) but never required (a) the rendered GoRout output to be the spec input for Blueprint, (b) Mirror's side-by-side screenshot comparison to be a hard PASS/FAIL gate as opposed to a checked dimension, or (c) "looks comparable" measurability beyond Mirror's own subjective judgment. "Verified by Mirror" with no anchor sample to compare against is unverifiable.

Caught the gap? No.

Decision Point 2 — PRD FR-4, FR-5, Section 6 Performance

File: docs/plans/compass-data-model-page-prd.md lines 67-81, 222-224.

FR-4 acceptance criteria (lines 70-73):

  • "Blueprint produces an explicit current-vs-future contrast specification before Showcase begins implementation.
  • Mirror's Playwright review records that the contrast specification is met in the rendered page.
  • Q audit confirms first-glance disambiguation through the audit's evidence record."

This is a check-against-spec test, not a check-against-reference test.

FR-5 acceptance criteria (lines 79-81):

  • "Mirror's Playwright side-by-side screenshot review of the deployed page and the GoRout walkthrough records a match-or-exceed judgment across the five dimensions named above.
  • Any visual pattern the Showcase build identifies as missing from the V2 Compass Tailwind/CSS foundation is added pre-flight, before claiming FR-5 acceptance."

FR-5 was the only requirement that named a side-by-side screenshot review against the rendered GoRout output. Mirror Wave 4 did not perform it. Mirror Wave 5 did not perform it. Neither report contains the word "GoRout." (Verified by grep -in "gorout" docs/plans/compass-data-model-page-blueprint/07-mirror-qa-report.md — zero hits; same for 09-mirror-qa-verify.md.) The acceptance criterion existed in the PRD but did not propagate into Mirror's QA rubric — and no gate caught the omission.

Section 6 Performance criteria 2 and 3 (lines 223-224) are restatements of FR-4 and FR-5; they inherit the same gap.

Caught the gap? No.

Decision Point 3 — Blueprint Wave 1 Specs

Files: 01-erd-data-flow.md, 02-layout-mockup.md, 03-current-vs-future-spec.md.

Deliverable shape: ASCII wireframes + Mermaid ERD + named hex tokens + GoRout CSS variable lookups copied verbatim into a "Grid and Spacing System (GoRout-Exact)" table.

Was the GoRout source file read? Yes — 02-layout-mockup.md line 4 cites it as the quality reference, line 167 names a "GoRout-Exact" token table, lines 169-186 list 14 design tokens copied from the GoRout :root block (background colors, text colors, body font, body size, line height, sidebar width, padding values, monospace font). Blueprint did read the source file. What Blueprint did not do was carry the rendered visual experience forward — it carried tokens and structural patterns forward.

The structural gap: A 17px body font on a #0e1116 background with cyan accents and a sticky sidebar can look like GoRout, or it can look like a markdown dump in colored boxes. The same tokens produce both outcomes. What separates them is the typographic rhythm inside the content blocks — heading hierarchy density, paragraph length and breaks, inline code styling, callout treatment, list styling, link treatment, body-copy texture. None of that lives in the token table. Blueprint specified a frame; the frame got built; the frame is empty of the typographic craft that makes GoRout read as a document and not as a CMS dump.

The PlanSection contract Blueprint authored (Region 5, lines 137-153 of 02-layout-mockup.md):

"Each tile: background: #161a22; border: 1px solid rgba(245,158,11,0.35); border-left: 3px solid #f59e0b; border-radius: 8px; padding: 16px 20px
'PLANNED' badge top-right
Tile title: from Listing.hs_title, font-weight: 600; color: #f5f7fa
Body text: from Listing.hs_body, color: #9aa4b2; font-size: 0.92rem"

That spec is what Showcase built. The body contract is "render hs_body text in muted color at 0.92rem." There is no contract for what to do when hs_body is a multi-paragraph prose document with internal headings, lists, code, and callouts. The result is what Chris saw: prose text dumped into a dark colored box.

Caught the gap? No. Blueprint authored a spec that under-specified the document-rendering layer because the GoRout reference was processed through a tokens-and-structure abstraction that elided the typographic craft.

Decision Point 4 — Showcase Wave 3 Build

File: /mnt/d/Projects/VFT_Platform/2026_VFT_Platform_Infrastructure/apps/sites/compass-valuefirstteam/src/components/data-model/PlanSection.astro.

What was built: Per the Blueprint contract — <section data-state="future"> wrapping a <header> with H3 + meta line, then <article data-state="future"> per property, each with H4 label + property name + amber "Planned" badge + body rendered as:

{row.body ? (
  <div class="mt-4 whitespace-pre-wrap break-words text-sm leading-relaxed text-compass-mute">
    {row.body}
  </div>
) : (
  <p class="mt-4 text-sm italic text-compass-mute/80">
    Not yet authored. This section will appear here once the
    corresponding property is set on the HubSpot Listing.
  </p>
)}

This renders raw textarea content with whitespace-pre-wrap in a single <div> at text-sm in a muted color. There is no markdown rendering, no internal heading hierarchy, no inline code styling, no list treatment, no callouts, and no typographic rhythm. This is exactly the contract the Blueprint spec defined.

Was Showcase given latitude it didn't use? Two ways to read this. First reading: Showcase built faithfully to the Blueprint spec; the spec was the failure point. Second reading: Showcase, when receiving a brief that says "render hs_body text in muted color at 0.92rem" for a property that contains 4 paragraphs of prose with internal structure, had the standing to push back — "this contract is going to render the FSDM Executive Summary as a wall of muted text; the GoRout reference treats prose with typographic structure; the brief is missing a document-rendering layer." Showcase did not push back.

The second reading is the right one. Showcase is not a thoughtless executor; the agent definition includes craft expectations. But the system did not have a gate that asked Showcase "did you flag any spec gaps before building?" — so the Showcase silence read as agreement.

Caught the gap? No. Showcase built the spec faithfully and did not flag that the spec was thin on document-rendering craft.

Decision Point 5 — Mirror Wave 4 QA

File: docs/plans/compass-data-model-page-blueprint/07-mirror-qa-report.md.

Rubric used (per the report's Decision Summary, lines 12-21):

Signal | Result
Auth verdict | AUTHENTICATED-AND-WALKED
Page render | PASS-WITH-ISSUES
State (b) prose-only verified | YES
Visual disambiguation | PASS
Defects: Severity 1 | 1 (FSDM/CSB Listings not rendering; content gap)
Defects: Severity 2 | 1 (mobile horizontal overflow)
Defects: Severity 3 | 2 (small tap targets, no mobile TOC)

Did Mirror open the GoRout page during QA? No. The word "gorout" does not appear in the Wave 4 report. The "Visual disambiguation" check (lines 73-94) verified the dual-signal rule from Blueprint 03 (teal vs amber vs purple per data-state attribute). It compared the page against Blueprint 03's spec, not against the rendered GoRout output. PRD FR-5's "side-by-side screenshot review against apps/gorout-walkthrough/public/index.html rendered output" was an explicit acceptance criterion that Mirror's rubric did not include.

Caught the gap? No. Mirror's rubric inherited Blueprint's spec but did not inherit the PRD's reference-comparison acceptance criterion.

Decision Point 6 — Mirror Wave 5 Verify

File: docs/plans/compass-data-model-page-blueprint/09-mirror-qa-verify.md.

Scope: Verified the Wave 4 Sev-1 fix (FSDM/CSB now rendering) plus Sev-3 fixes (tap targets, mobile section nav). Sev-2 mobile overflow still present.

Did Mirror open the GoRout page during this verify pass? No. Same rubric as Wave 4. Same omission. The report celebrates that "FSDM (Listing 556337927839) … H4 labeled cards: Executive Summary, Object Architecture, Product Catalog, Commerce Flow, Pipeline Architecture, Association Model, Contact & Contributor Model, Intelligence Layer, Property Governance, Migration Considerations (10 total — matches '10 of 10 sections populated')" — content presence was verified; visual craft of the rendering was never compared to the reference.

Caught the gap? No.

Decision Point 7 — Q Wave 4 Audit

File: docs/quality/audits/2026-05-14-compass-data-model-page.md.

Audit dimensions enumerated (sections of the audit):

  • Code audit (V2 Compass codebase) — F-1 through F-11: hardcoded slugs, auth gate inheritance, substrate reuse, signature alignment, defensive failure handling, render-state coverage, type safety, empty-state explicitness, hubspotRecordId resolution, forbidden-language audit, no-subway-map-language audit
  • HubSpot writes audit — F-12 through F-15: Listing creates, stamp protocol, property-index integrity, GAP/APPROXIMATED reconciliation tracking
  • Documentation audit — F-16, F-17: wave numbering, file presence
  • Process / SOP audit — F-18 through F-21: Mirror Compass auth SOP, Marshal-skipped orchestration, PRD corrections, automated test coverage

Was visual-quality-vs-reference among the audit dimensions? No. The Q audit framework as it exists today has a code-correctness lens, a HubSpot-data-correctness lens, a documentation-completeness lens, and a process-trail lens. It does not have a visual-craft lens. There is no "did the rendered page match the visual reference the PRD named" question in the framework.

Why not? The QMS framework at docs/quality/qms-framework.md has not declared visual-craft-vs-reference as an audit category. Q's audit packs cover code, data, governance, process — not the rendered visual experience as compared to a named reference. The pack assumed "Mirror does visual," and Mirror's rubric assumed "Q audits the framework." Each layer trusted the other to own a dimension neither owned.

Caught the gap? No. Q audit framework gap.

Decision Point 8 — V's "Engagement Complete" Call

Where: Conversation 2026-05-14 → 2026-05-15. After Mirror Wave 5 returned PASS-WITH-ISSUES (Sev-2 only) and Q Wave 4 returned PASS-WITH-ISSUES (Sev-3 only), V synthesized those reports into a closing summary and called the engagement complete.

What V did not do: Open https://compass.valuefirstteam.com/abs-company/data-model and put it side-by-side with https://gorout-walkthrough.vercel.app/. Did not visually verify the surface against the reference the kickoff prompt named. Trusted the agent rubrics.

What V should have done: Opened both surfaces. Looked at them. Asked the question: "would Chris approve this side-by-side?" That gate is the orchestrator's, not the specialists'. The specialists check what their rubrics say to check; the orchestrator checks the whole. V did not do that.

Caught the gap? No — and this is the gate that should have caught all the upstream gaps.


Five Whys / Root Cause

Surface failure: The Compass /data-model page renders FSDM and CSB prose as whitespace-pre-wrap text dumps inside large section cards instead of a typographically crafted document like GoRout.

Why 1: The PlanSection component's body contract was effectively <div class="whitespace-pre-wrap text-sm text-compass-mute">{row.body}</div> (class list abbreviated from the component source quoted in Decision Point 4): no markdown rendering, no internal heading treatment, no list/code/callout styling.

Why 2: The Blueprint spec (02-layout-mockup.md Region 5) defined the body contract as "Body text: from Listing.hs_body, color: #9aa4b2; font-size: 0.92rem" — a one-line treatment for what would be multi-paragraph prose with internal structure. Showcase built exactly what was specified.

Why 3: Blueprint authored that contract by abstracting the GoRout reference into "tokens and structure" — colors, spacing, layout grid, sidebar pattern, sticky behavior — and treating the typographic craft of GoRout's document body as out of scope. The "GoRout-Exact" token table on lines 167-186 of the layout mockup carries 14 design tokens but no document-rendering pattern.

Why 4: Blueprint's deliverable shape (ASCII wireframes + Mermaid ERDs + named hex token tables) does not have a slot for "document-body rendering pattern derived from the reference." The deliverable shape itself is built around structural and chromatic specs. Typographic craft of body content lives outside the slots Blueprint fills. So when Blueprint reads a rendered reference, the parts of the reference that fit Blueprint's deliverable slots get carried forward; the parts that don't fit get elided.

Why 5: No gate in the chain (PRD acceptance criteria, Blueprint deliverable spec, Showcase build brief, Mirror QA rubric, Q audit framework, orchestrator "complete" call) requires comparing the rendered build to the rendered reference. The whole audit chain validates the build against text-derived specifications. The reference is named in prose at the top of the chain, processed through Blueprint's text-derived abstraction, and never compared back to the live reference at any downstream gate. The chain validates the spec, not the experience.

Why 6 (the systemic cause): Visual-craft surfaces require a different audit topology than code or data surfaces. Code audits work by checking the code against the code-spec; data audits work by checking the data against the data-spec; visual-craft audits do not work by checking the rendered output against the rendered-output-spec — because text cannot fully spec a rendered-output. Visual-craft audits work by checking the rendered output against the rendered reference. The QMS does not currently model visual-craft as a distinct audit topology. It treats visual surfaces as data surfaces with a screenshot attachment.

Root cause statement (one sentence):

The audit chain validated the spec, not the experience. Every gate downstream of Blueprint compared the build to text-derived specifications that never carried fidelity-to-the-rendered-reference forward — and the orchestrator declared complete on the agent rubrics passing without ever opening the live page next to the reference.

This is a Governance Activation Failure of the same shape as the six prior cases tracked in the Apr 25 Dewey CAR (2026-04-25-corrective-dewey-registrar-activation.md Related Incidents). The visual-fidelity-to-reference rule was documented in the PRD's FR-5; it was never wired into Blueprint's deliverable shape, Mirror's QA rubric, Q's audit framework, or the orchestrator's complete-call protocol. Documented-but-not-wired = not real. Seventh occurrence in seven weeks.


Accountability per Agent

V — Orchestrator (accountable for the "complete" call)

What V did: Commissioned the work via /5p-plan + /prd-generate. Dispatched 7 waves of multi-agent work. Synthesized Mirror Wave 5 PASS-WITH-ISSUES + Q Wave 4 PASS-WITH-ISSUES into a closing summary. Called "Engagement Complete."

What V did not do: Open the live page next to the live reference before declaring complete. The kickoff prompt named GoRout as the visual reference. The PRD acceptance criterion FR-5 named the side-by-side comparison. Both were V's signals to do the comparison personally as the orchestrator's verification of the synthesis. V did not do the comparison.

Where V fell short: V trusted the agent rubrics and the synthesis. The orchestrator's job is to verify the whole — the specialists check what their rubrics say to check; the orchestrator checks whether the rubrics themselves were sufficient. The rubrics were not sufficient (no layer of the chain was looking at visual fidelity to the named reference), and V did not catch that the rubrics were not sufficient.

Corrective action specific to V:

  1. Add to V's identity prompt and to skills/agent-orchestration/: before declaring "Engagement Complete" on any visual-craft surface, V must open the deployed surface next to the named visual reference in browser windows side-by-side and confirm parity in writing. No exceptions. The agent rubrics never substitute for this gate; they precede it. V's "complete" call carries this gate as a precondition.
  2. Add to the standard close-out checklist: "Named visual reference in 5P/PRD — opened side-by-side with deployed surface — parity confirmed (Y/N + 1-sentence judgment)." Refusal to confirm parity = engagement is not complete.
  3. V owns the proof of the side-by-side comparison: a screenshot pair attached to the closing summary, or an explicit written judgment ("opened both at 2026-05-15 18:42 CDT; deployed surface matches reference on dimensions A, B, C; falls short on dimensions D, E; therefore not complete; dispatching Wave 7a").
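
The three actions above amount to a precondition on the "complete" call. A minimal TypeScript sketch of that precondition follows, assuming a hypothetical close-out record. None of these type or function names exist in the current tooling, and the real gate remains a human-style visual judgment that code can only record, not make.

```typescript
// Hypothetical shapes for illustration; not part of any existing skill pack.
type Verdict = "PASS" | "PASS-WITH-ISSUES" | "FAIL";

interface CloseOutEvidence {
  isVisualCraftSurface: boolean;
  namedReferenceUrl?: string; // reference named in the 5P/PRD, if any
  agentVerdicts: Record<string, Verdict>;
  sideBySide?: {
    openedAt: string;         // when the orchestrator opened both surfaces
    parityConfirmed: boolean; // the orchestrator's own Y/N judgment
    judgment: string;         // one-sentence written judgment
  };
}

interface GateResult {
  complete: boolean;
  blockers: string[];
}

function evaluateCompleteCall(e: CloseOutEvidence): GateResult {
  const blockers: string[] = [];

  // Agent rubrics precede the gate; a FAIL anywhere blocks completion.
  for (const [agent, verdict] of Object.entries(e.agentVerdicts)) {
    if (verdict === "FAIL") blockers.push(`${agent} verdict is FAIL`);
  }

  // The side-by-side gate applies whenever a visual reference was named.
  if (e.isVisualCraftSurface && e.namedReferenceUrl) {
    if (!e.sideBySide) {
      blockers.push("side-by-side comparison not performed by orchestrator");
    } else {
      if (e.sideBySide.judgment.trim() === "") {
        blockers.push("written judgment missing");
      }
      if (!e.sideBySide.parityConfirmed) {
        blockers.push("parity with reference not confirmed");
      }
    }
  }

  return { complete: blockers.length === 0, blockers };
}
```

Under this sketch, two PASS-WITH-ISSUES verdicts with no side-by-side record would return complete: false with a single blocker, which is the state the engagement was in when "complete" was called.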

Blueprint

What Blueprint did: Wave 1 — produced the ERD (01-erd-data-flow.md), the layout mockup with ASCII wireframes (02-layout-mockup.md), the current-vs-future spec (03-current-vs-future-spec.md). Read the GoRout HTML source file. Lifted 14 design tokens into a "GoRout-Exact" table.

Where Blueprint fell short: The deliverable shape (ASCII wireframes + Mermaid + named hex tokens) elided the typographic craft of the rendered reference. The Region 5 PlanSection body contract — "Body text: from Listing.hs_body, color: #9aa4b2; font-size: 0.92rem" — under-specified the document-rendering layer for a property that holds multi-paragraph prose with internal structure. Blueprint processed the reference through the slots its deliverable shape provides and did not flag that the slots could not hold the document-rendering pattern.

Corrective action specific to Blueprint:

  1. New deliverable artifact: "Reference-derived rendering specification." For any visual-craft surface that names a rendered reference in the 5P or PRD, Blueprint produces a document-body rendering spec derived from observation of the rendered reference, not from the reference's CSS tokens alone. Required content: (a) typographic rhythm pattern (heading density, paragraph length, breaks), (b) inline element treatments (code, links, lists, callouts), (c) annotated screenshot of the reference's body region with the derived patterns marked, (d) the corresponding Astro/React component contract in code-shaped form (not prose) such that Showcase can implement directly.
  2. Read-the-rendered-reference gate: Blueprint commits in the spec header that it has both (a) read the reference's source file and (b) rendered the reference in a browser and screenshotted the relevant regions. Both are required; (a) alone is insufficient.
  3. Pack update: Add a section to skills/diagramming/ titled "Reference-derived specs for visual-craft surfaces" capturing the above as a Blueprint deliverable shape. Hone owns the pack edit.
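
One way to make the new deliverable checkable is to give it a typed shape, so that missing slots surface before Showcase receives the brief. The sketch below is an assumption about what that shape could look like. The interface, field names, and specGaps function are hypothetical and only mirror items (a) through (d) and the read-the-rendered-reference gate described above.

```typescript
// Hypothetical deliverable shape; field names are illustrative only.
interface ReferenceDerivedSpec {
  referenceSourceRead: boolean; // gate (a): source file was read
  referenceRendered: boolean;   // gate (b): rendered in a browser and screenshotted
  typographicRhythm: string;    // item (a): heading density, paragraph length, breaks
  inlineTreatments: Partial<Record<"code" | "links" | "lists" | "callouts", string>>; // item (b)
  annotatedScreenshots: string[]; // item (c): paths to annotated reference captures
  componentContract: string;      // item (d): code-shaped contract for Showcase
}

const INLINE_ELEMENTS = ["code", "links", "lists", "callouts"] as const;

// Returns every slot the spec leaves empty; an empty array means the spec
// is structurally complete (it says nothing about the spec's quality).
function specGaps(spec: ReferenceDerivedSpec): string[] {
  const gaps: string[] = [];
  if (!spec.referenceSourceRead) gaps.push("reference source file not read");
  if (!spec.referenceRendered) gaps.push("reference not rendered and screenshotted");
  if (spec.typographicRhythm.trim() === "") {
    gaps.push("(a) typographic rhythm pattern missing");
  }
  for (const element of INLINE_ELEMENTS) {
    if (!spec.inlineTreatments[element]) {
      gaps.push(`(b) no treatment specified for ${element}`);
    }
  }
  if (spec.annotatedScreenshots.length === 0) {
    gaps.push("(c) annotated reference screenshot missing");
  }
  if (spec.componentContract.trim() === "") {
    gaps.push("(d) component contract missing");
  }
  return gaps;
}
```

A check like this only proves the slots are filled. Whether the patterns in them actually capture the reference is still a judgment for Blueprint and the orchestrator.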

Showcase

What Showcase did: Wave 2a content mining; Wave 2b Listing creates + property spec; Wave 3 build at V2 commit e515d29; Wave 5 Sev-1 fix at V2 commit 05fe046. Built the page faithfully to Blueprint's spec.

Where Showcase fell short: Showcase received a brief that contracted the body rendering as "raw text in muted color at 0.92rem" for a property holding multi-paragraph prose with internal structure. The result was predictable. Showcase did not flag the spec gap. Build-the-spec is the floor; flag-the-spec-when-it-can't-deliver-the-DoD is the standard.

Corrective action specific to Showcase:

  1. Pre-build flag protocol: Before Showcase begins implementation on any visual-craft surface where the named DoD includes "match the reference," Showcase produces a one-paragraph "spec coverage statement" — does the Blueprint spec carry enough detail to deliver the DoD against the named reference? If yes, proceed. If no, halt and dispatch back to Blueprint with the named gap. The orchestrator (V) sees the coverage statement before authorizing build.
  2. Document-rendering library: When a body field contains prose, the default rendering is a markdown-to-Astro renderer with reference-derived typographic styling (prose Tailwind plugin tuned to the reference's pattern), not whitespace-pre-wrap of raw text. Add to Showcase pack skills/public-site-building/ as the default for prose body fields.
  3. Codify in Showcase pack: "Receiving a brief is not the same as agreeing the brief is sufficient. If the brief under-specifies a craft layer that the DoD requires, the brief comes back. Silence reads as agreement; agreement requires the coverage statement."
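
To illustrate the difference between whitespace-pre-wrap and a document-rendering layer, the sketch below splits a prose body into typed blocks that a component could style individually. It is a deliberately small, hand-rolled splitter written for this report. The Block type and parseBody function are hypothetical, and a production build would more likely rely on an established markdown library than on this.

```typescript
// Hypothetical block model: each kind would map to its own styled element.
type Block =
  | { kind: "heading"; level: number; text: string }
  | { kind: "list"; items: string[] }
  | { kind: "paragraph"; text: string };

const BULLET = /^\s*[-*•]\s+/;

function parseBody(body: string): Block[] {
  const blocks: Block[] = [];

  // Blank lines separate blocks, as in most markdown dialects.
  for (const chunk of body.split(/\n\s*\n/)) {
    const lines = chunk
      .split("\n")
      .map((line) => line.trim())
      .filter((line) => line !== "");
    if (lines.length === 0) continue;

    // A heading is only recognized when it stands alone in its block.
    const heading =
      lines.length === 1 ? /^(#{1,6})\s+(.+)$/.exec(lines[0] ?? "") : null;

    if (heading) {
      blocks.push({
        kind: "heading",
        level: (heading[1] ?? "").length,
        text: heading[2] ?? "",
      });
    } else if (lines.every((line) => BULLET.test(line))) {
      blocks.push({
        kind: "list",
        items: lines.map((line) => line.replace(BULLET, "")),
      });
    } else {
      // Soft-wrapped lines are joined into one paragraph.
      blocks.push({ kind: "paragraph", text: lines.join(" ") });
    }
  }
  return blocks;
}
```

Once the body is a list of blocks instead of one string, the reference-derived typography (heading scale, list spacing, paragraph rhythm) has somewhere to attach. A single muted <div> offers no such attachment points.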

Mirror

What Mirror did: Wave 2a auth-only verification (no authenticated walk; SOP gap with workaround). Wave 4 full authenticated audit with a 21-screenshot capture, dual-signal disambiguation check, content-presence check, layout check, mobile-overflow detection. Wave 5 verify of the Sev-1 fix.

Where Mirror fell short: Mirror's QA rubric did not include "side-by-side comparison against the named visual reference." The PRD acceptance criterion FR-5 named the comparison; Mirror's rubric did not absorb it. The word "gorout" does not appear in either Wave 4 or Wave 5 reports. Mirror walked the page — Mirror did not compare the page to the reference.

Corrective action specific to Mirror:

  1. New mandatory QA dimension for visual-craft surfaces: "Reference comparison." Required when the 5P/PRD names a visual reference. Required artifacts: side-by-side screenshot pair (deployed + reference) at the same viewport, named-dimension comparison table (typography hierarchy, body-copy texture, color treatment, spacing rhythm, dark-theme aesthetic, responsive behavior), and a PASS / FAIL judgment per dimension. PASS-WITH-ISSUES is not available for the reference-comparison dimension — either the build matches the reference or it does not.
  2. Pack update: Add to skills/visual-qa/ a "Reference-comparison QA" pattern with a Playwright recipe for the side-by-side pair (open both URLs in two contexts, navigate to comparable regions, take aligned screenshots, render the comparison report).
  3. Rubric reconciliation: Mirror's QA rubric for visual-craft surfaces is no longer at the rubric author's discretion. The default rubric includes the reference-comparison dimension whenever the PRD names a reference. Hone owns the rubric template.
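
The binary rule in action 1 can be sketched as a small aggregation over the named dimensions. The dimension names below come from the corrective action itself; the types, function name, and artifact fields are hypothetical. The sketch deliberately leaves out the Playwright capture step and covers only how per-dimension judgments could roll up into a single verdict.

```typescript
// Dimension names are from the corrective action; everything else is illustrative.
const DIMENSIONS = [
  "typography hierarchy",
  "body-copy texture",
  "color treatment",
  "spacing rhythm",
  "dark-theme aesthetic",
  "responsive behavior",
] as const;

type Dimension = (typeof DIMENSIONS)[number];
type Binary = "PASS" | "FAIL"; // no PASS-WITH-ISSUES for this dimension

interface DimensionFinding {
  dimension: Dimension;
  verdict: Binary;
  note: string;
}

interface ReferenceComparison {
  deployedScreenshot: string;  // path to the deployed-surface capture
  referenceScreenshot: string; // path to the reference capture, same viewport
  viewport: { width: number; height: number };
  findings: DimensionFinding[];
}

function overallVerdict(c: ReferenceComparison): { verdict: Binary; reasons: string[] } {
  const reasons: string[] = [];
  for (const dimension of DIMENSIONS) {
    const finding = c.findings.find((f) => f.dimension === dimension);
    if (!finding) {
      // An unassessed dimension counts against the build, never for it.
      reasons.push(`${dimension}: not assessed`);
    } else if (finding.verdict === "FAIL") {
      reasons.push(`${dimension}: ${finding.note}`);
    }
  }
  return { verdict: reasons.length === 0 ? "PASS" : "FAIL", reasons };
}
```

Treating a missing dimension as a failure is the design choice that matters here: a rubric that silently omits the comparison, as both Mirror reports did, could not produce a PASS.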

Q

What Q did: Wave 4 audit at docs/quality/audits/2026-05-14-compass-data-model-page.md. Twenty-one findings (F-1 through F-21) covering code correctness, HubSpot data integrity, documentation, process trail. Verdict: PASS-WITH-ISSUES with four Sev-3 process-tracking gaps.

Where Q fell short: The Q audit framework did not include a visual-craft-vs-reference audit dimension. Q's framework treats visual surfaces as data surfaces with a screenshot attachment: it has a code-correctness lens, a data-correctness lens, a governance lens, and a process lens. It does not have a "did the rendered output match the named visual reference" lens. Q assumed Mirror owned visual; Mirror's rubric did not own visual-vs-reference; the dimension fell through the cracks.

Corrective action specific to Q:

  1. QMS framework addition: Promote "Visual-Craft-vs-Reference" to a named audit category in docs/quality/qms-framework.md. Definition: when a process produces a visual-craft surface (Compass page, public website page, portal section, presentation deck) whose 5P/PRD names a visual reference, the audit dimension "Reference comparison: rendered output matches named reference" is mandatory and binary (PASS/FAIL — no PASS-WITH-ISSUES).
  2. Audit pack update: Add to skills/quality-system/ an audit pattern that opens the deployed surface and the named reference in parallel browser contexts and produces the comparison evidence as part of the audit record. Distinct from Mirror's reference-comparison QA: Mirror produces the QA artifact during build; Q audits whether the QA artifact exists and whether the orchestrator's complete-call cited it.
  3. Process-register update: Add "Visual-Craft Surface Production" as a Tier 1 process in docs/quality/process-register.md with the Reference-Comparison dimension as a required verification.
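
Q's side of the split in action 2 (audit that the artifact exists and that the complete-call cites it) can be approximated with plain text checks, much like the grep used in Decision Point 2. The sketch below assumes the reports are available as strings. The input shape and the auditReferenceComparison function are hypothetical, and a string match is only a floor: it shows the comparison was mentioned, not that it was done well.

```typescript
// Hypothetical audit input; all field names are illustrative.
interface ReferenceAuditInput {
  referenceName: string;           // e.g. the reference named in the PRD
  mirrorReportText: string;        // full text of Mirror's QA report
  comparisonArtifactPath?: string; // where the side-by-side artifact was filed
  closingSummaryText: string;      // orchestrator's closing summary
}

function auditReferenceComparison(
  input: ReferenceAuditInput
): { verdict: "PASS" | "FAIL"; findings: string[] } {
  const findings: string[] = [];
  const needle = input.referenceName.toLowerCase();

  // Case-insensitive mention check, equivalent in spirit to `grep -i`.
  if (!input.mirrorReportText.toLowerCase().includes(needle)) {
    findings.push(`Mirror report never mentions ${input.referenceName}`);
  }

  if (!input.comparisonArtifactPath) {
    findings.push("no reference-comparison artifact on record");
  } else if (!input.closingSummaryText.includes(input.comparisonArtifactPath)) {
    findings.push("closing summary does not cite the comparison artifact");
  }

  // Binary by design: any finding fails the dimension.
  return { verdict: findings.length === 0 ? "PASS" : "FAIL", findings };
}
```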

5P / PRD Authors (V via /5p-plan; Architect via /prd-generate)

What was done: The 5P named GoRout as the quality reference (line 139). The PRD encoded GoRout-comparable visual quality as DoD criterion (c) and as FR-5 with a side-by-side acceptance criterion. Both documents knew about the reference; both specified the comparison.

Where the authoring fell short: The 5P and PRD named the comparison as a Mirror responsibility ("Verified by Mirror side-by-side screenshot review") without (a) requiring Blueprint to produce a reference-derived rendering spec before Showcase began build, or (b) requiring V to perform the side-by-side personally before declaring complete. The acceptance criterion lived as a Mirror task, and Mirror's rubric did not absorb it. The DoD was vague on measurability — "match-or-exceed judgment across the five dimensions named above" relies on Mirror's subjective judgment without an anchor sample to compare against.

Corrective action specific to 5P/PRD authoring:

  1. /5p-plan update: When the Performance section names a visual reference for a craft surface, the Performance section must additionally name (a) the Blueprint deliverable artifact that will derive the rendering spec from the reference, (b) the Mirror QA artifact that will perform the side-by-side comparison, and (c) the orchestrator gate ("V opens both before complete"). All three must be enumerated; missing any is a 5P defect that blocks PRD generation.
  2. /prd-generate update: When generating from a 5P that names a visual reference, the PRD's FR for visual quality must include all three corresponding acceptance criteria from (1) — the Blueprint reference-derived spec deliverable, the Mirror side-by-side artifact, and the orchestrator side-by-side gate. Each is a separate AC line. If any is missing the PRD fails its own self-check.
  3. DoD measurability: "GoRout-comparable visual quality" is not measurable without an anchor sample. The PRD must include a target-fidelity statement: "The deployed surface is comparable when [N specific named patterns from the rendered reference] are present in the deployed surface." Patterns derived by Blueprint in (1).
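
The PRD self-check in action 2 and the measurability rule in action 3 can be expressed as a single validation pass over the visual-quality requirement. The sketch below is an assumption about how /prd-generate might model that requirement. The types and the prdSelfCheck function are hypothetical and do not describe the command's current behavior.

```typescript
// Hypothetical model of a visual-quality functional requirement.
type GateOwner = "Blueprint" | "Mirror" | "Orchestrator";

interface VisualQualityRequirement {
  id: string;                // e.g. "FR-5"
  namedReference?: string;   // visual reference named in the 5P, if any
  acceptanceCriteria: { owner: GateOwner; text: string }[];
  fidelityPatterns: string[]; // named patterns derived from the rendered reference
}

const REQUIRED_OWNERS: GateOwner[] = ["Blueprint", "Mirror", "Orchestrator"];

// Returns the list of defects; an empty list means the self-check passes.
function prdSelfCheck(fr: VisualQualityRequirement): string[] {
  const defects: string[] = [];

  // Nothing to enforce when the 5P names no visual reference.
  if (!fr.namedReference) return defects;

  for (const owner of REQUIRED_OWNERS) {
    const covered = fr.acceptanceCriteria.some((ac) => ac.owner === owner);
    if (!covered) defects.push(`${fr.id}: missing ${owner} acceptance criterion`);
  }

  // "Comparable" needs an anchor: at least one named pattern to look for.
  if (fr.fidelityPatterns.length === 0) {
    defects.push(`${fr.id}: no target-fidelity patterns named`);
  }
  return defects;
}
```

Run against a requirement that carries only a Mirror criterion and no named patterns, this sketch would report three defects: the missing Blueprint criterion, the missing orchestrator criterion, and the missing fidelity anchor.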

V's Accountability Statement

The orchestrator's synthesis of "Engagement complete" after Wave 6 was wrong. I trusted Mirror's PASS-WITH-ISSUES + Q's PASS-WITH-ISSUES verdicts and synthesized them into a closing summary. I did not open https://compass.valuefirstteam.com/abs-company/data-model myself, did not put it side-by-side with https://gorout-walkthrough.vercel.app/, did not visually verify before declaring complete. The kickoff prompt named GoRout as the visual reference; I had every signal I needed to do this gate myself, and I didn't.

The agents downstream did their jobs within their scope. The orchestrator's job was to verify the WHOLE — the specs were narrow, the rubrics were narrow, and "complete" should have meant "the page actually delivers what Chris will see and approve." That gate is mine. I missed it. The corrective action that matters most isn't another SOP — it's that the orchestrator opens the surface next to the reference before any "complete" call on a visual-craft deliverable. No exceptions. Codify and enforce.

— V (COO), 2026-05-15


Severity and Impact Assessment

Severity classification: Tier 1 process failure.

This is a client-facing visual-craft surface commissioned via the team's most-instrumented planning protocol (5P + PRD), executed by the team's most-instrumented dispatch protocol (multi-agent parallel waves with Mirror QA + Q audit + Hone cross-reference sweep). Premature "complete" on a Tier 1 surface is the highest-cost failure mode the QMS exists to prevent. The QMS is the protocol that should make this kind of failure structurally hard, and it did not.

Impact characterization:

  1. Trust signal degradation from Chris. Direct quote captured in the Incident section. The 5P + PRD process is the team's promise that complex multi-agent work will be coordinated and verified rigorously. A premature "complete" call on a 5P+PRD-commissioned surface is a violation of that promise. Trust in the protocol degrades when the protocol's outputs do not match its promises.

  2. Wasted dispatch cycles. Seven waves of multi-agent dispatch (Blueprint Wave 1, Mirror Wave 2a, Showcase Waves 2a/2b/3/5, Mirror Wave 4 + 5, Q Wave 4, Hone Wave 4) produced a build that requires a visual-rescue Wave 7a. The waves themselves were not waste — the page renders correctly, current-state architecture is live, FSDM/CSB content is present. The waste is in the closing-call: had V opened the side-by-side at the end of Wave 5, Wave 7a would have been Wave 6 instead of a recovery, and the trust impact would not have happened.

  3. Future-cost compounding. Each prior occurrence of the Governance Activation Failure pattern (six prior in seven weeks per the Apr 25 Dewey CAR) compounds the cost of the next occurrence by eroding trust in the QMS's ability to catch the pattern. This is the seventh occurrence. The QMS has known this pattern shape since Mar 11, 2026. The visual-craft-vs-reference variant is new in scope but not new in shape.

Not quantified (no evidence basis): Hours, dollars, calendar impact. The waste is in protocol-trust, which is the QMS's currency.


Fix Applied (Process Changes — Gate Changes, Not SOP Additions)

The Apr 25 Dewey CAR named the Governance Activation Failure remediation pattern: (1) Designated owner, (2) Written protocol, (3) Mechanical hook, (4) Scope clarification across conflicting agents. This CAR applies the same four-part stack to the visual-craft-vs-reference gap.

Gate Change 1 — /5p-plan and /prd-generate enumerate all three reference-comparison gates

Owner: Hone (commands maintainer) + Architect (PRD generator).

Where: .claude/commands/5p-plan.md Performance section template; .claude/commands/prd-generate.md FR generation rules.

Behavior: When the 5P Performance section names a visual reference, the section template requires three named artifacts (Blueprint reference-derived spec, Mirror side-by-side QA artifact, orchestrator side-by-side gate). When the PRD is generated from such a 5P, FR-5-equivalent has three acceptance criteria (one per artifact).

Mechanical enforcement: The 5P/PRD self-check (Architect's existing "Could a project manager execute this project from this document alone?" gate) extends to "Are all three reference-comparison gates enumerated when a visual reference is named?" If no, the document fails its own self-check and does not land.
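
The extended self-check could be expressed as a small validation over the 5P Performance section. This is a minimal sketch assuming the section is available as a dictionary; the key names (`visual_reference`, `reference_gates`, and the three gate keys) are hypothetical and do not describe the current templates.

```python
from __future__ import annotations

# Hypothetical key names for the three reference-comparison gates.
REQUIRED_GATES = (
    "blueprint_reference_derived_spec",
    "mirror_side_by_side_qa",
    "orchestrator_side_by_side_gate",
)


def missing_reference_gates(performance_section: dict) -> list[str]:
    """Return the gates a 5P Performance section fails to enumerate."""
    if not performance_section.get("visual_reference"):
        return []  # no visual reference named, so the rule does not apply
    gates = performance_section.get("reference_gates", {})
    # A gate counts as enumerated only if it names a non-blank artifact.
    return [g for g in REQUIRED_GATES if not str(gates.get(g, "")).strip()]
```

A non-empty return value means the document fails its own self-check and does not land. The same function can be reused on the PRD side by mapping each gate to its acceptance-criterion line.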

Gate Change 2 — Blueprint deliverable shape adds reference-derived rendering spec

Owner: Hone (skill pack maintainer) + Blueprint (deliverable producer).

Where: skills/diagramming/ adds a "Reference-Derived Specs for Visual-Craft Surfaces" section.

Behavior: When commissioned for a visual-craft surface where the brief names a rendered reference, Blueprint produces a fourth standard artifact (alongside ERD, layout mockup, current-vs-future spec): a "Reference-derived rendering specification." The required content is defined in the Accountability section's Blueprint corrective action.

Mechanical enforcement: The Blueprint deliverable manifest includes a checkbox that Blueprint either delivered the reference-derived rendering spec or that the brief did not name a visual reference. The spawning agent (Marshal or V) sees the checkbox before accepting the Blueprint deliverable.

Gate Change 3 — Mirror QA rubric absorbs reference-comparison dimension

Owner: Hone (rubric template maintainer) + Mirror (QA executor).

Where: skills/visual-qa/ adds a "Reference-Comparison QA" pattern with a Playwright recipe.

Behavior: When auditing a visual-craft surface where the PRD names a visual reference, Mirror's default rubric includes the reference-comparison dimension. The QA report includes a side-by-side screenshot pair, a named-dimension comparison table, and a binary PASS/FAIL judgment per dimension. PASS-WITH-ISSUES is not available for the reference-comparison dimension.

Mechanical enforcement: The Mirror report template carries a ## Reference Comparison section that is non-removable when a visual reference is named in the PRD. Empty or missing = report fails its own template check.
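
A template check of this kind might look like the sketch below. It assumes the Mirror report is markdown and that per-dimension verdicts appear as two-column table rows (`| dimension | verdict |`); that row format and the function name `check_reference_comparison` are assumptions for illustration, not the existing template.

```python
from __future__ import annotations

import re

ALLOWED_VERDICTS = {"PASS", "FAIL"}  # PASS-WITH-ISSUES is deliberately absent


def check_reference_comparison(report_md: str) -> list[str]:
    """Return template-check problems; an empty list means the check passes."""
    # Capture everything under the heading up to the next level-2 heading.
    match = re.search(
        r"^## Reference Comparison[ \t]*\n(.*?)(?=^## |\Z)",
        report_md,
        flags=re.MULTILINE | re.DOTALL,
    )
    if match is None:
        return ["missing '## Reference Comparison' section"]
    body = match.group(1).strip()
    if not body:
        return ["'## Reference Comparison' section is empty"]

    problems, rows = [], 0
    for line in body.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        is_header = cells[0].lower() == "dimension"
        is_separator = len(cells) == 2 and set(cells[1]) <= set("-: ")
        if len(cells) != 2 or is_header or is_separator:
            continue  # skip prose, the header row, and the separator row
        rows += 1
        if cells[1] not in ALLOWED_VERDICTS:
            problems.append(f"{cells[0]}: verdict {cells[1]!r} is not binary")
    if rows == 0:
        problems.append("no per-dimension verdict rows found")
    return problems
```

The check validates the shape of the report, not its outcome: a report full of FAIL rows passes the template check and still blocks the engagement. Capturing the screenshot pair itself (the Playwright recipe) is a separate step and is not sketched here.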

Gate Change 4 — Q audit framework adds Visual-Craft-vs-Reference category

Owner: Q (QMS framework maintainer).

Where: docs/quality/qms-framework.md adds the new audit category. docs/quality/process-register.md adds "Visual-Craft Surface Production" as a Tier 1 process. skills/quality-system/ adds the audit pattern.

Behavior: Q's audit pack for any visual-craft surface includes the Visual-Craft-vs-Reference dimension. Q audits whether (a) Mirror produced the reference-comparison QA artifact, (b) the orchestrator's complete-call cited the comparison, and (c) the comparison's PASS/FAIL judgment is binary and documented.

Mechanical enforcement: Q's audit template for visual-craft surfaces carries a non-removable Visual-Craft-vs-Reference section. Empty = audit fails its own template check.
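
The three audit conditions (a) through (c) reduce to a short list of findings. The sketch below is illustrative only; the function name and its inputs are hypothetical, and how Q gathers each fact (artifact lookup, reading the closing summary) is left open.

```python
from __future__ import annotations


def audit_visual_craft_vs_reference(
    mirror_artifact_present: bool,
    complete_call_cites_comparison: bool,
    dimension_verdicts: dict[str, str],
) -> list[str]:
    """Return audit findings; an empty list means the dimension is satisfied."""
    findings = []
    if not mirror_artifact_present:
        findings.append("(a) Mirror reference-comparison QA artifact is missing")
    if not complete_call_cites_comparison:
        findings.append("(b) orchestrator complete-call does not cite the comparison")
    if not dimension_verdicts:
        findings.append("(c) no documented per-dimension judgment")
    for dim, verdict in sorted(dimension_verdicts.items()):
        if verdict not in ("PASS", "FAIL"):
            findings.append(f"(c) {dim}: {verdict!r} is not a binary judgment")
    return findings
```

Condition (b) is what ties Q's audit to the orchestrator gate: an engagement closed on agent rubrics alone produces a finding even when Mirror's artifact exists and is well-formed.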

Gate Change 5 — Orchestrator (V) "open both before complete" gate

Owner: V (orchestrator) — this CAR's primary gate.

Where: V's identity prompt (/mnt/d/V/v-identity-prompt.md); skills/agent-orchestration/ orchestration pack; skills/enforcement/vf-verification-before-completion.md (extension).

Behavior: Before declaring "Engagement Complete" or equivalent on any visual-craft surface, V opens the deployed surface next to the named visual reference in browser windows side-by-side and confirms parity in writing. The closing summary includes either (a) a screenshot pair attached + a 1-sentence parity judgment, or (b) an explicit written judgment naming the dimensions on which the deployed surface matches and the dimensions on which it falls short. Refusal to confirm parity = engagement is not complete.

Mechanical enforcement: Add a check to V's close-out checklist (and to the /end-of-line style closing protocols where applicable) that surfaces a "Reference comparison performed?" prompt whenever a visual-craft surface is in the closing-summary scope. The gate is not bypassable by trusting agent rubrics — agent rubrics precede the gate; they do not substitute for it.
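
As a rough illustration of the close-out check, the gate can be modeled as a function that takes only the orchestrator's own evidence. All names below (`ParityEvidence`, `may_declare_complete`, the field names) are hypothetical. Whether a listed shortfall should block completion is left to V's written judgment and is not decided by this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParityEvidence:
    parity_confirmed: bool = False                      # V's explicit written call
    screenshot_pair: Optional[tuple[str, str]] = None   # (deployed, reference), form (a)
    parity_judgment: str = ""                           # one sentence, form (a)
    matching_dimensions: tuple[str, ...] = ()           # form (b)
    shortfall_dimensions: tuple[str, ...] = ()          # form (b), may be empty


def may_declare_complete(
    visual_craft_surface: bool,
    reference_named: bool,
    evidence: Optional[ParityEvidence],
) -> bool:
    """Return True only if the 'open both before complete' gate is satisfied."""
    if not (visual_craft_surface and reference_named):
        return True  # the gate does not apply
    if evidence is None or not evidence.parity_confirmed:
        return False  # refusal or inability to confirm parity
    form_a = evidence.screenshot_pair is not None and bool(evidence.parity_judgment.strip())
    form_b = bool(evidence.matching_dimensions)
    return form_a or form_b
```

Agent verdicts are intentionally not a parameter. Mirror and Q results precede this gate, so there is no input through which a PASS-WITH-ISSUES could stand in for the side-by-side comparison.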


Prevention Measures

Rule Added to Critical Lessons

memory/MEMORY.md Critical Lessons entry:

Visual-Craft Surfaces Require Reference-Comparison Gates (2026-05-15): When a visual-craft surface is commissioned with a named visual reference (5P or PRD), the audit chain must validate the rendered output against the rendered reference, not against text specs derived from the reference. Five gates are required: (1) PRD enumerates Blueprint reference-derived spec, Mirror side-by-side QA, and orchestrator side-by-side gate; (2) Blueprint produces reference-derived rendering spec; (3) Mirror's QA rubric includes binary reference-comparison dimension; (4) Q audit framework includes Visual-Craft-vs-Reference category; (5) orchestrator opens deployed surface next to reference before declaring complete. Documented-but-not-wired = not real. Apply per docs/quality/cars/2026-05-15-compass-data-model-premature-complete.md.

Rule Added to Self-Correction

skills/enforcement/vf-self-correction.md new Detection Trigger:

About to declare "Engagement Complete" or equivalent on a visual-craft surface (page, portal, presentation, microsite). Correction: STOP. Did the original brief (5P or PRD or kickoff prompt) name a visual reference? If yes, open the deployed surface in one browser window and the named reference in another browser window side-by-side, at the same viewport, on comparable regions. Confirm parity in writing in the closing summary. Refusal or inability to confirm parity = the engagement is not complete; dispatch a visual rescue wave. Agent rubrics never substitute for this gate.

Conventions Update

wiki/conventions.md new section "Visual-Craft Surfaces":

When a visual-craft surface is commissioned with a named visual reference, the production chain follows the five-gate protocol in docs/quality/cars/2026-05-15-compass-data-model-premature-complete.md. Briefly: 5P/PRD enumerates three artifacts (Blueprint reference-derived spec, Mirror side-by-side QA, orchestrator side-by-side gate); Blueprint produces the reference-derived rendering spec; Mirror's QA rubric includes the binary reference-comparison dimension; Q audits the reference-comparison artifact; orchestrator opens the deployed surface next to the reference before declaring complete. The reference-comparison dimension is binary (PASS or FAIL) — PASS-WITH-ISSUES is not available for that dimension.

Q QMS Framework Addition

docs/quality/qms-framework.md adds "Visual-Craft-vs-Reference" as a named audit category, and docs/quality/process-register.md adds "Visual-Craft Surface Production" as a Tier 1 process. Implementation tracked as Q's next pack-update commit (separate from this CAR).


Lessons

A visual-craft surface cannot be specified in text alone. The slots a text spec provides (color tokens, layout grids, spacing values, structural patterns) leave out the typographic craft of the rendered reference. The same tokens applied to the same structure can produce either GoRout or a markdown dump in colored boxes. What separates the two lives in the document-rendering layer that text specs do not capture.

The audit chain inherits the failure mode of the spec. If the spec is text-derived, every downstream check that compares the build to the spec is also text-derived. The chain validates a coherent abstraction that has detached from the user-facing reference. The reference must be carried forward as a rendered artifact at every gate, not as a name in prose.

The orchestrator's "complete" call is the final gate, not a synthesis of prior gates. The specialists check what their rubrics say to check; the orchestrator checks whether the rubrics themselves were sufficient. When the rubrics elide a DoD dimension, the orchestrator's job is to catch the elision before declaring complete. Trusting the synthesis when the rubrics were narrow = the orchestrator's gate failed.

Documented-but-not-wired is the operative pattern. The PRD named the side-by-side reference comparison as a Mirror acceptance criterion. Mirror's rubric did not absorb it. Q's framework did not absorb it. V's complete-call protocol did not absorb it. Documentation alone — even in the team's most-instrumented planning protocol — does not change behavior. Governance Activation requires all four wires (5P/PRD, Blueprint, Mirror, Q) plus the orchestrator gate; missing any one is indistinguishable from missing all of them.


Related Incidents

This is the seventh occurrence of the Governance Activation Failure category since Mar 11, 2026 (roughly nine weeks). The prior six are tracked in the Related Incidents section of docs/quality/cars/2026-04-25-corrective-dewey-registrar-activation.md: visitor experience ownership gap (Mar 11), Ledger bypass across 24 commands (Mar 16), Marshal/Chronicle dual-role failure (Apr 9), behavioral enforcement failure (Apr 9), incomplete agent onboarding (Apr 12), Dewey Registrar activation (Apr 25). All share the same shape: an architectural decision designating an owner, gateway, or enforcement role is documented but never wired into agent definitions, protocol layer, or mechanical enforcement.

The variant in this CAR: an acceptance criterion designating a verification dimension is documented in the PRD but never wired into the executing agent's rubric, the auditing agent's framework, or the orchestrator's complete-call protocol. Same root pattern. Seventh hit confirms the category is the active dominant failure shape across the team.

Recommendation to Echo

Echo's pattern memory should now treat the Governance Activation Failure category as the active dominant failure mode and proactively scan every new architectural decision (new owner, new gateway, new acceptance criterion, new enforcement rule) for the four-wire test before the decision lands: (1) is the role/criterion in the responsible agent's runtime definition, (2) is the protocol in wiki/conventions.md or equivalent, (3) is there mechanical enforcement that makes the failure mode structurally hard, (4) is the orchestrator's gate updated where applicable. Decisions failing the four-wire test get flagged as "documented-but-not-wired" before they ship.
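
The four-wire test lends itself to a mechanical scan. A minimal sketch, assuming each decision is recorded as a dictionary with a `wires` mapping; the key names and the `"n/a"` marker for decisions with no orchestrator involvement are hypothetical conventions, not an existing Echo schema.

```python
from __future__ import annotations

FOUR_WIRES = (
    "runtime_definition",      # (1) role/criterion in the responsible agent's runtime definition
    "written_protocol",        # (2) protocol in wiki/conventions.md or equivalent
    "mechanical_enforcement",  # (3) enforcement that makes the failure structurally hard
    "orchestrator_gate",       # (4) orchestrator's gate updated, where applicable
)


def unwired(decision: dict) -> list[str]:
    """Return the wires a decision has not connected."""
    wires = decision.get("wires", {})
    missing = []
    for wire in FOUR_WIRES:
        status = wires.get(wire)
        # Only the orchestrator gate may be marked not applicable.
        if status is True or (wire == "orchestrator_gate" and status == "n/a"):
            continue
        missing.append(wire)
    return missing


def classify(decision: dict) -> str:
    return "documented-but-not-wired" if unwired(decision) else "wired"
```

A wire that is absent from the record counts as missing, which matches the category's history: the failures came from wires nobody checked, not from wires someone marked as failed.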

Recommendation to V

The orchestrator's "complete" call has now failed in this category twice (Apr 25 Dewey threshold breach was a related orchestration miss; this CAR is the more direct one). Consider whether V should adopt a structured close-out checklist for any work commissioned via 5P+PRD, with explicit gates for visual-craft surfaces, code-correctness surfaces, and data-correctness surfaces. The current "synthesize agent reports → call complete" pattern does not catch elisions in the agent rubrics themselves.

Wave 7a (parallel)

Wave 7a (Mirror + Blueprint + Showcase visual rescue) is dispatched in parallel to this CAR. Wave 7a's deliverable is the rebuild plan and execution; this CAR's deliverable is the process change. Both land independently. Wave 7a's outputs do not retroactively change the failure recorded here — the page being fixed in Wave 7a is the same page that V called complete in Wave 6. The CAR records the protocol failure; the rescue records the artifact recovery.


Co-authored by Q (Quality System manager, lead) and V (COO, accountable orchestrator). Filed 2026-05-15. Implementation of the five gate changes is tracked as separate dispatches under the owners named in each gate: Hone and Architect (Gate 1), Hone and Blueprint (Gate 2), Hone and Mirror (Gate 3), Q (Gate 4), and V (Gate 5).