Corrective Action Report: Mirror Content Quality Gap
Date: April 10, 2026
Reported by: Marquee (Media BU Leader)
Severity: High -- content shipped to production that a human visitor would immediately recognize as broken, but the QA agent certified as PASS
Incident
On April 9, 2026, during /media-prep, Mirror was spawned to verify the episode page for "Value-First Platform: AI Data Readiness - Apr 8, 2026" at valuefirstteam.com/media/episodes/vf-platform-ai-data-readiness-2026-04-08.
Mirror reported PASS and stated "All key elements are confirmed."
The page had seven defects visible to any human visitor:
- Wall of text. The entire 33,701-character transcript rendered as a single `<p>` element -- no paragraph breaks, no visually distinct speaker labels, no topic navigation. Unreadable.
- No description. Both the `summary` and `description` fields were null. A visitor landing on this page had zero context about what the episode covered.
- No attribution. Trisha Merriam and Erin Wiggers were not listed as hosts or guests. The page gave no indication of who was speaking.
- No duration. No runtime displayed -- a visitor could not assess the time commitment.
- Empty show badge. The show title was null, so the badge rendered blank.
- Unused fullSummary. A 1,696-character summary existed in Sanity, but the episode template never reads the `fullSummary` field, so it was invisible on the page.
- Buried key takeaways. Nine `aiKeyPoints` existed but were rendered only inside the transcript viewer, not as top-level scannable takeaways.
Chris caught the wall of text himself and asked "did you see the wall of text on that page?" The issue was immediately obvious to a human visitor but invisible to Mirror's automated checklist.
Root Cause Analysis
Primary: Mirror had no content quality criteria
Mirror's agent definition (.claude/agents/mirror.md) contained thorough checks for:
- Visual rendering (layout, typography, branding, responsiveness)
- Interactive behavior (click handlers, state changes, keyboard navigation, accessibility)
- Technical health (HTTP status, console errors, broken images, link validation)
It contained zero checks for:
- Is this text readable? (formatting, paragraph structure, breathing room)
- Is this page useful to a first-time visitor? (context, attribution, completeness)
- Are key fields populated with meaningful content? (not just "field exists" but "field serves the visitor")
Mirror's quality model was binary: element rendered or not. A 33,701-character transcript in a single `<p>` tag counted as "transcript present." A page with no description counted as acceptable because the description field was simply absent rather than displaying an error. The distinction between "technically present" and "actually useful" did not exist in Mirror's vocabulary.
This is equivalent to a building inspector verifying that a door frame exists without checking whether the door opens.
Contributing: Transcription pipeline produced unformatted output
The transcription pipeline (apps/website/scripts/transcribe-pending.ts) sends audio to Gemini 2.0 Flash with a detailed TRANSCRIPTION_PROMPT that requests speaker labels, timestamps at topic transitions, paragraph breaks, and topic headings. On this episode, Gemini returned a 0-character response -- twice. The transcript was eventually generated with a simplified prompt (no speaker identification, no timestamps, no topic segmentation), producing 33,701 characters of flat, unformatted text.
This flat text was then patched into Sanity as a single block and rendered by the episode template as a single <p> element. The pipeline has no post-transcription quality check -- it verifies character count (non-zero) but not formatting quality.
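A post-transcription gate could catch this class of failure before the text reaches Sanity. The sketch below is illustrative -- the function name, thresholds, and speaker-label regex are assumptions, not code from transcribe-pending.ts:

```typescript
// Hypothetical post-transcription quality gate. Names and thresholds
// are illustrative; the actual pipeline only checks for non-zero length.
interface TranscriptQuality {
  ok: boolean;
  reasons: string[];
}

function checkTranscriptQuality(text: string): TranscriptQuality {
  const reasons: string[] = [];

  if (text.length === 0) {
    reasons.push("empty transcript");
  }
  // A formatted long transcript should contain blank-line paragraph breaks.
  const paragraphs = text.split(/\n\s*\n/);
  if (text.length > 2000 && paragraphs.length < 2) {
    reasons.push("no paragraph breaks in long transcript");
  }
  // Speaker labels such as "Trisha:" at the start of a line.
  const hasSpeakerLabels = /^[A-Z][\w .]{0,30}:\s/m.test(text);
  if (!hasSpeakerLabels) {
    reasons.push("no speaker labels detected");
  }

  return { ok: reasons.length === 0, reasons };
}
```

A gate like this would have rejected the simplified-prompt output (flat text, no labels) instead of silently saving it.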
Contributing: Episode template has no fallback for missing fields
The episode page template does not fall back to fullSummary when both summary and description are null. The fullSummary field (populated by the transcription pipeline's SUMMARY_PROMPT) existed with genuinely useful content but was inaccessible to the visitor. The template treats fullSummary as a transcript-viewer-only field rather than a page-level content source.
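One possible fix is a simple fallback chain in the template's data layer. This is a sketch: the field names come from the report, but the helper name and the 300-character truncation are illustrative choices, not the actual template code:

```typescript
// Hypothetical description fallback for the episode template.
// Field names (summary, description, fullSummary) are from the report;
// the 300-character blurb limit is an illustrative choice.
interface EpisodeDoc {
  summary?: string | null;
  description?: string | null;
  fullSummary?: string | null;
}

function resolveDescription(episode: EpisodeDoc): string | null {
  const text = episode.summary ?? episode.description ?? episode.fullSummary;
  if (!text) return null;
  // fullSummary can be long; truncate it for the page-level blurb.
  return text.length > 300 ? text.slice(0, 297).trimEnd() + "..." : text;
}
```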
Fix Applied
Mirror's agent definition was updated on April 10, 2026 with a new Mode 3: Content Quality Audit that makes content readability and completeness mandatory before any PASS verdict.
The Stranger Test
Before marking any page PASS, Mirror must confirm that a first-time visitor arriving from a search engine can answer three questions within 10 seconds:
- What is this about? -- Clear title, description, or summary visible above the fold
- Who created this? -- Authors, hosts, guests, or contributors attributed
- Is it worth my time? -- Duration, takeaways, or topic indicators visible to inform engagement decision
If any answer is "no," the page cannot PASS.
Content completeness checklists by page type
Episode pages, article pages, and portal sections each have tiered checklists separating FAIL conditions (missing description, missing attribution, missing show name, unformatted transcript) from WARN conditions (missing duration, missing tags, missing timestamps).
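A tiered checklist like this can be expressed as plain data. The sketch below uses the episode-page conditions from the report; the data shape and `verdict` helper are illustrative:

```typescript
// Illustrative data shape for a page-type checklist. Condition names
// mirror the report; the structure and verdict logic are a sketch.
type Severity = "FAIL" | "WARN";

const episodeChecklist: Record<string, Severity> = {
  "missing description": "FAIL",
  "missing attribution": "FAIL",
  "missing show name": "FAIL",
  "unformatted transcript": "FAIL",
  "missing duration": "WARN",
  "missing tags": "WARN",
  "missing timestamps": "WARN",
};

// Any FAIL condition sinks the page; WARN-only findings downgrade it.
function verdict(findings: string[]): Severity | "PASS" {
  if (findings.some((f) => episodeChecklist[f] === "FAIL")) return "FAIL";
  if (findings.length > 0) return "WARN";
  return "PASS";
}
```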
Readability enforcement
- Any unbroken text block exceeding approximately 500 characters without a visual break (paragraph, heading, speaker label, or list) is flagged as FAIL
- Empty sections showing "null" or "undefined" are flagged as FAIL
- Content with no author or source identified is flagged as FAIL
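The 500-character heuristic is cheap to implement. The sketch below is an assumption about what counts as a "visual break" (blank line, heading, list marker, speaker label), not Mirror's actual implementation:

```typescript
// Flags any run of text longer than ~500 characters with no visual break.
// The break pattern (blank lines, markdown headings, list markers,
// speaker labels) is an assumption for this sketch.
function findWallsOfText(text: string, limit = 500): string[] {
  const breaks = /\n\s*\n|^#{1,6} |^[-*] |^[A-Z][\w .]{0,30}: /gm;
  return text.split(breaks).filter((segment) => segment.length > limit);
}
```

Against the incident page, the 33,701-character transcript would come back as a single flagged segment.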
Transcript-specific checks
- Speaker labels must be visually distinct
- Paragraph breaks must exist at topic transitions
- A 33,000-character transcript as a single `<p>` element is a FAIL, not a suggestion
Updated report format
Mirror's report format now requires a Stranger Test section and uses PASS/FAIL/WARN severity (replacing the old pass/issues binary). The delegation contract quality bar was updated to require content quality checks before any PASS.
Cross-agent quality review
Baldwin (journalist agent) provided the content criteria from a visitor's perspective -- what makes a page worth reading versus technically complete. This cross-agent review pattern (technical QA agent + content specialist defining quality together) produced criteria that neither agent would have generated alone.
Prevention
Structural changes
Mirror cannot certify PASS without content quality. The Stranger Test is embedded in the agent definition's Mode 3 and the delegation contract quality bar. A page that renders correctly but fails to inform a visitor is a FAIL.
Page-type checklists enforce completeness. Episode pages without a visible description, host attribution, or show name are automatic FAILs regardless of how well the video player works.
Wall-of-text detection is automatic. Any text block exceeding approximately 500 characters without a visual break triggers a FAIL. This catches the exact class of defect that was invisible to the old Mirror.
Process changes
- Cross-agent quality gate pattern. When establishing quality criteria for a page type, the technical QA agent (Mirror) should be informed by a domain specialist (Baldwin for content, Pavilion for portal sections, Showcase for walkthrough apps). Mirror checks what the specialist defines as quality.
Open Items
| Item | Owner | Status |
|---|---|---|
| `transcribe-pending.ts` complex prompt causes Gemini 2.0 Flash to return 0-char transcripts intermittently | Marquee / Encore | Open -- needs investigation into prompt length vs. model capacity |
| Episode template does not fall back to `fullSummary` when `summary` and `description` are null | Aegis (website) | Open -- template change needed in `apps/website/src/pages/shows/[show]/[slug].astro` or equivalent |
| This specific episode still has wall-of-text transcript | Canon (Sanity write) | Open -- needs re-generation with formatting or manual cleanup |
| `speakerMap` field is never populated by the transcription pipeline | Marquee / Encore | Open -- requires post-transcription speaker identification step |
Lessons Learned
1. "Present" is not "useful"
Mirror's original criteria treated content as binary: rendered or not rendered. A 33,701-character wall of text is technically present, but it serves nobody. The distinction between "data exists on the page" and "a visitor can use this page" is the difference between QA and quality. Mirror was doing QA. Now it does quality.
2. Quality criteria require domain expertise, not just technical inspection
Mirror is a visual and interactive testing agent. Asking it to independently determine what makes an episode page useful is like asking a structural engineer to evaluate a restaurant menu. Baldwin's content perspective -- "would a stranger find this page worth their time?" -- produced the Stranger Test. Technical agents need domain input to define quality for the surfaces they inspect.
3. Silent pipeline failures compound into visible page failures
The transcription pipeline's fallback to a simplified prompt produced output that "worked" (non-zero characters, saved to Sanity, no error logged) but degraded the visitor experience. Pipeline success metrics should include output quality, not just output existence. A transcript without speaker labels and paragraph breaks is a partial failure even if it contains all the words.
4. The gap between "no errors" and "good experience" is where trust erodes
A visitor who arrives at this episode page sees a video player, a blank space where a description should be, no information about who is speaking, and a wall of text. There are no error messages. There are no broken images. There are no 404s. Everything looks like it "works." But the visitor leaves, because the page communicates nothing about why they should stay. This is the gap that QA automation misses when it only checks for failures rather than checking for value.
Files Changed
| File | Change |
|---|---|
| `.claude/agents/mirror.md` | Added Mode 3: Content Quality Audit, Stranger Test, page-type checklists, readability enforcement, transcript-specific checks, updated report format, updated delegation contract quality bar |