New Capability: Sanity Episode Document Integrity

Date: 2026-03-24
Origin: 12 duplicate Sanity episode documents were created for a single Mar 17, 2026 episode. Root cause: client.create() without deterministic _id in the content pipeline's createEpisodeForAsset() function. No agent was responsible for Sanity data quality -- the gap allowed duplicates to accumulate undetected.
Impact: Episode documents are now validated at both ends of the lifecycle -- pre-stream by Prelude and post-stream by Encore -- with Marquee as the quality authority for all Sanity episode data. Duplicates are detected before they accumulate, and creation standards (deterministic IDs only) are enforced.
Filed by: Q (Instruction Optimizer)
CAR: Leadership/reports/2026-03-24-corrective-sanity-duplicate-episodes.md


Capability

The Value-First media operation produces 11 shows. Every show episode requires a Sanity document that connects the stages of the content pipeline: scheduling, recording, transcription, social distribution, and website display. Before this capability, episode documents were created by automated pipeline code without validation at either end. No agent checked whether an episode document was production-ready before a show aired, or complete after it aired. No agent owned the integrity of the Sanity episode data at all.

The content pipeline's createEpisodeForAsset() function used client.create() -- which generates a random _id on every call -- instead of createIfNotExists() with a deterministic ID. The correct pattern already existed in create-weekly-episodes.ts but had not been applied to the asset-triggered code path. The result was 12 duplicate documents for a single episode, each with a random ID, each missing critical metadata.

Marquee (Media BU Leader) is now the quality authority for Sanity episode data. Two specialists -- Prelude and Encore -- execute validation at the pre-production and post-production boundaries. The pipeline code has been fixed to use deterministic IDs. The organizational gap that allowed duplicates to accumulate undetected is closed.


What Was Built

Pre-Production Validation (Prelude)

Prelude's ownership surface was expanded to include Sanity episode document readiness. Before any show airs, Prelude validates:

Check               | Rule
Deterministic ID    | Format: episode-{show-slug}-{YYYY-MM-DD}
Slug                | Present and correctly formatted
Air date            | Present and matches scheduled date
Show reference      | Valid reference to parent show document
Title               | Follows naming convention
Duplicate detection | Flags episodes with random-format IDs

Health signal: When Prelude runs /media-prep pre-production, zero episodes should have random-format IDs for upcoming shows.
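
The readiness rules above can be expressed as a small pure function. What follows is a minimal TypeScript sketch under stated assumptions, not Prelude's actual implementation. The `show` and `title` field names, the `validateReadiness` helper, and the kebab-case slug pattern are hypothetical. The title check only tests for a non-empty string because the naming convention is not specified in this report, and the show reference check only tests for presence, not that the parent document exists.

```typescript
// Hypothetical episode shape. Only _id, slug, and airDate are named in this
// report; the `show` and `title` field names are assumptions.
interface EpisodeDoc {
  _id: string;
  title?: string;
  slug?: { current?: string };
  airDate?: string; // expected as YYYY-MM-DD
  show?: { _ref?: string };
}

// Assumed slug shape: lowercase kebab-case.
const KEBAB = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Matches episode-{show-slug}-{YYYY-MM-DD} for a kebab-case show slug.
const DETERMINISTIC_ID = /^episode-[a-z0-9]+(?:-[a-z0-9]+)*-\d{4}-\d{2}-\d{2}$/;

// Returns a list of human-readable issues; an empty list means "ready".
function validateReadiness(doc: EpisodeDoc, scheduledDate: string): string[] {
  const issues: string[] = [];

  if (!DETERMINISTIC_ID.test(doc._id)) {
    issues.push(`random-format ID: ${doc._id}`);
  }

  const slug = doc.slug ? doc.slug.current : undefined;
  if (!slug || !KEBAB.test(slug)) {
    issues.push("slug missing or malformed");
  }

  if (doc.airDate !== scheduledDate) {
    issues.push("airDate missing or does not match scheduled date");
  }

  if (!doc.show || !doc.show._ref) {
    issues.push("show reference missing");
  }

  if (!doc.title || doc.title.trim() === "") {
    issues.push("title missing");
  }

  return issues;
}
```

A document created by the old client.create() path, with a random ID and no metadata, would fail every check in this sketch, which matches the health signal of zero random-format IDs for upcoming shows.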

Post-Production Validation (Encore)

Encore's ownership surface was expanded to include Sanity episode document completeness. After a show airs, Encore validates:

Check               | Rule
Mux recording       | Linked to episode document
Transcription       | Status progressing (not stalled)
Description         | Populated within 48 hours of air date
Social snippets     | LinkedIn post populated within 48 hours
Duplicate detection | Flags and reports any duplicate documents
Creation pattern    | All episodes must use createIfNotExists() with deterministic IDs

Health signal: When Encore runs /media-prep post-production, zero episodes should be missing recordings or descriptions beyond the 48-hour window.
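
The completeness rules and the 48-hour window can be sketched the same way. This is an illustrative TypeScript sketch, not Encore's actual implementation. Every field name other than `_id` and `airDate` (`muxRecording`, `transcriptionStatus`, `description`, `linkedinPost`) and the `"stalled"` status value are assumptions. The sketch also assumes the air date is measured from midnight UTC and, following the health signal, treats a missing recording as a gap only once the window has closed.

```typescript
// Hypothetical post-stream fields; none of these names are confirmed by the
// actual Sanity schema.
interface AiredEpisode {
  _id: string;
  airDate: string; // YYYY-MM-DD, treated here as midnight UTC (assumption)
  muxRecording?: { _ref: string };
  transcriptionStatus?: "pending" | "processing" | "complete" | "stalled";
  description?: string;
  linkedinPost?: string;
}

const WINDOW_MS = 48 * 60 * 60 * 1000;

// Returns the completeness gaps for one aired episode as of `now`.
function findCompletenessGaps(doc: AiredEpisode, now: Date): string[] {
  const gaps: string[] = [];

  const airedAt = Date.parse(`${doc.airDate}T00:00:00Z`);
  if (isNaN(airedAt)) {
    return ["airDate is not a parseable YYYY-MM-DD date"];
  }
  const pastWindow = now.getTime() - airedAt > WINDOW_MS;

  // A stalled transcription is reported regardless of the window.
  if (doc.transcriptionStatus === "stalled") {
    gaps.push("transcription stalled");
  }

  // Recording, description, and social copy count as gaps only after 48h.
  if (pastWindow) {
    if (!doc.muxRecording) gaps.push("Mux recording not linked");
    if (!doc.description) gaps.push("description missing after 48h");
    if (!doc.linkedinPost) gaps.push("LinkedIn post missing after 48h");
  }

  return gaps;
}
```

Passing `now` as a parameter rather than reading the clock inside the function keeps the check deterministic, so the same episode can be evaluated at 24 hours and at 72 hours in a test.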

Quality Authority (Marquee)

Marquee coordinates the response when Prelude or Encore flags issues:

  • Duplicate deletion -- remove random-ID duplicates, preserve deterministic-ID canonical documents
  • Pipeline code fixes -- ensure all creation paths use createIfNotExists()
  • Prevention verification -- confirm fixes hold across subsequent episodes

Marquee's ownership table in the agent definition now includes Sanity episode integrity alongside the existing media pipeline responsibilities.
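
The duplicate deletion rule (remove random-ID duplicates, preserve deterministic-ID canonical documents) can be illustrated as a partition over document IDs. This is a hypothetical sketch: `planDuplicateCleanup` and `CleanupPlan` are illustrative names, and the function only produces a plan. It does not touch Sanity, and a real cleanup would first confirm that a canonical document exists for the episode before anything is removed.

```typescript
interface EpisodeStub {
  _id: string;
}

// Hypothetical plan shape: IDs to preserve and IDs proposed for deletion.
interface CleanupPlan {
  keep: string[];
  remove: string[];
}

// Matches episode-{show-slug}-{YYYY-MM-DD}, assuming a kebab-case show slug.
const CANONICAL_ID = /^episode-[a-z0-9]+(?:-[a-z0-9]+)*-\d{4}-\d{2}-\d{2}$/;

// Splits episode documents into canonical (deterministic ID) and
// random-ID candidates for deletion. Produces a plan only.
function planDuplicateCleanup(docs: EpisodeStub[]): CleanupPlan {
  const plan: CleanupPlan = { keep: [], remove: [] };
  for (const doc of docs) {
    if (CANONICAL_ID.test(doc._id)) {
      plan.keep.push(doc._id);
    } else {
      plan.remove.push(doc._id);
    }
  }
  return plan;
}
```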

Pipeline Fix

The createEpisodeForAsset() function in agents/content-pipeline/run-pipeline.ts was fixed:

Before                                  | After
client.create({ _type: 'episode', ... }) | client.createIfNotExists({ _id: episodeId, _type: 'episode', slug, airDate, ... })
Random _id on every invocation          | Deterministic _id: episode-{show-slug}-{YYYY-MM-DD}
No slug or airDate                      | Full metadata matching create-weekly-episodes.ts pattern
Duplicates on every re-run              | Idempotent -- safe to run multiple times
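
To show why the deterministic ID makes creation idempotent, here is a minimal TypeScript sketch. It is not the code in run-pipeline.ts. `EpisodeWriter` is a narrowed stand-in for the one Sanity client method used (`createIfNotExists`, which takes a document carrying its own `_id`), `InMemoryWriter` is a fake used only for demonstration, and `buildEpisodeId`, `ensureEpisode`, and the slug value are hypothetical.

```typescript
interface EpisodeCreate {
  _id: string;
  _type: "episode";
  slug: { _type: "slug"; current: string };
  airDate: string; // YYYY-MM-DD
}

// Narrowed stand-in for the single client method this sketch relies on.
interface EpisodeWriter {
  createIfNotExists(doc: EpisodeCreate): Promise<EpisodeCreate>;
}

// Same inputs always yield the same ID: episode-{show-slug}-{YYYY-MM-DD}.
function buildEpisodeId(showSlug: string, airDate: string): string {
  return `episode-${showSlug}-${airDate}`;
}

// Hypothetical creation helper; the slug value chosen here is an assumption.
async function ensureEpisode(
  client: EpisodeWriter,
  showSlug: string,
  airDate: string
): Promise<EpisodeCreate> {
  return client.createIfNotExists({
    _id: buildEpisodeId(showSlug, airDate),
    _type: "episode",
    slug: { _type: "slug", current: `${showSlug}-${airDate}` },
    airDate,
  });
}

// In-memory fake used only to demonstrate idempotency.
class InMemoryWriter implements EpisodeWriter {
  readonly docs = new Map<string, EpisodeCreate>();

  async createIfNotExists(doc: EpisodeCreate): Promise<EpisodeCreate> {
    const existing = this.docs.get(doc._id);
    if (existing) return existing; // second call is a no-op
    this.docs.set(doc._id, doc);
    return doc;
  }
}
```

Calling `ensureEpisode` twice with the same show slug and air date against the fake leaves exactly one stored document, because the second call resolves to the same `_id`. With a random `_id` per call, each invocation would add a new document, which is the behavior that produced the duplicates.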

Architecture

Lifecycle Validation Flow

Episode Scheduled
    |
    v
Prelude (pre-production validation)
    - Deterministic ID present?
    - Slug, airDate, show reference, title?
    - Any random-ID duplicates?
    |
    v
Show Airs
    |
    v
Content Pipeline (createEpisodeForAsset)
    - createIfNotExists() with deterministic ID
    - Idempotent -- does not create duplicates
    |
    v
Encore (post-production validation)
    - Mux recording linked?
    - Transcription progressing?
    - Description + social within 48h?
    - Any duplicates detected?
    |
    v
Marquee (quality authority)
    - Coordinates response to flagged issues
    - Duplicate deletion
    - Prevention verification

The episode lifecycle is bookended: Prelude validates readiness before the stream, Encore validates completeness after the stream. The content pipeline operates between them with idempotent creation. Marquee has authority over the entire chain.

Ownership Boundary

Owner                 | Scope
Prelude               | Pre-production validation rules, episode readiness checks
Encore                | Post-production validation rules, completeness checks, creation pattern enforcement
Marquee               | Quality authority, duplicate resolution, prevention verification, escalation to V
Content Pipeline code | createEpisodeForAsset() implementation, deterministic ID generation

Prelude and Encore do not modify Sanity data directly. They detect and report. Marquee coordinates the response. Code fixes go through standard development workflow.


Agent Definitions Updated

Agent   | File                      | Changes
Marquee | .claude/agents/marquee.md | Added Sanity episode integrity to ownership table; added validation section under Episode Pipeline Health
Prelude | .claude/agents/prelude.md | Added Sanity pre-validation ownership surface, health signal, validation rules, duplicate detection, escalation protocol
Encore  | .claude/agents/encore.md  | Added Sanity post-validation ownership surface, health signal, validation rules, duplicate detection, creation rule enforcement, escalation protocol

Why It Matters

The Duplicate Accumulation Problem

Twelve duplicate documents for a single episode are not a catastrophic failure; they are a symptom of a governance gap. No single duplicate caused visible damage -- the website, social distribution, and transcription each used the first match. But each duplicate was a document with a random ID and no slug, airDate, or show reference. Together they polluted the Sanity dataset and created ambiguity about which document was canonical.

The client.create() call was the proximate cause, but the underlying cause was not a lack of good code: the correct pattern already existed in create-weekly-episodes.ts. The underlying cause was that nobody was responsible for verifying that the creation pattern was applied consistently across all code paths. When a second creation path (createEpisodeForAsset) was written without the deterministic ID pattern, no validation caught it.

Bookended Validation

The architecture of pre-production and post-production validation means an episode document is checked twice -- once for readiness and once for completeness. This is not redundant. Pre-production catches setup failures (missing metadata, wrong IDs) before they compound. Post-production catches delivery failures (missing recordings, stalled transcription) before the 48-hour window closes. The two checks cover different failure modes at different points in the lifecycle.

Quality Authority

Marquee having explicit quality authority over Sanity episode data means there is a named owner for the question "is our episode data clean?" This is the same pattern as Aegis owning the org chart pages -- the fix is not better auditing, it is ownership. When the next code path is added that creates episode documents, Marquee's team (Prelude and Encore) will detect whether it follows the deterministic ID pattern.


Verification

The capability is confirmed when:

  • Prelude agent definition at .claude/agents/prelude.md contains Sanity pre-validation ownership surface with deterministic ID rules, duplicate detection, and health signal
  • Encore agent definition at .claude/agents/encore.md contains Sanity post-validation ownership surface with completeness rules, creation pattern enforcement, and health signal
  • Marquee agent definition at .claude/agents/marquee.md contains Sanity episode integrity in ownership table with quality authority scope
  • createEpisodeForAsset() in agents/content-pipeline/run-pipeline.ts uses createIfNotExists() with deterministic _id format episode-{show-slug}-{YYYY-MM-DD}
  • No random-ID duplicate episode documents exist for episodes created after the fix

Significance

This capability closes the last unowned segment of the media pipeline. Marquee already owned show scheduling, recording verification, and distribution. Episode document integrity -- the Sanity data that connects all of those stages -- was the gap. Prelude and Encore were already operating at the boundaries of the episode lifecycle for their respective pre-production and post-production responsibilities. Extending them to cover Sanity validation is a natural expansion of their existing ownership surfaces, not a new responsibility grafted onto agents with unrelated domains.

The pattern here is consistent with how the organization has grown: identify a governance gap through an incident, assign ownership to the agent closest to the problem, document the validation rules, and make the health signal measurable. The 12 duplicate documents were the incident. Marquee, Prelude, and Encore were the closest agents. Deterministic IDs and lifecycle validation are the rules. Zero random-ID episodes and zero stale completeness gaps are the signals.

The human role is relationships and judgment. The system role is everything else -- including ensuring its own content data is clean.