Corrective Action: 12 Duplicate Sanity Episode Documents Created by Content Pipeline
Date: 2026-03-24
Category: Idempotency Failure -- Non-Deterministic Document Creation
Impact: 12 phantom episode documents in Sanity CMS polluting GROQ queries for "Value-First Platform: AI Data Readiness - Mar 17, 2026." No user-facing website impact.
Resolution Time: Same session as discovery
Filed by: Q (Instruction Optimizer)
Incident
What Happened
The content pipeline (agents/content-pipeline/run-pipeline.ts) created 12 duplicate episode documents in Sanity CMS between Mar 17 13:40 UTC and Mar 18 04:48 UTC -- roughly one per hour. All 12 duplicates had random auto-generated Sanity IDs, no slug, no airDate, and no meaningful content beyond a title.
The real episode document (episode-vf-platform-ai-data-readiness-2026-03-17) had been created correctly on Mar 15 by create-weekly-episodes.ts with a deterministic _id, slug, airDate, show reference, and full metadata.
What Should Have Happened
When the pipeline encountered an orphaned Mux recording, it should have matched it to the existing episode document by deterministic ID or searched for a matching episode before attempting creation. If a matching episode already existed, no new document should have been created.
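The pre-creation search can be sketched as a single GROQ lookup on show and air date. This is a minimal sketch under stated assumptions. The `episode` document type and the `show` / `airDate` field names are inferred from the metadata listed in this report and have not been checked against the schema. `QueryClient` is a hypothetical synchronous stand-in for the Sanity client, whose real `fetch(query, params)` returns a Promise.

```typescript
// Hypothetical synchronous stand-in for the Sanity client's fetch(query, params).
interface QueryClient {
  fetch(query: string, params: Record<string, string>): { _id: string }[];
}

// GROQ: every episode already pointing at this show on this air date.
// The "episode" type and the show/airDate field names are assumptions.
const EXISTING_EPISODE_QUERY =
  '*[_type == "episode" && show._ref == $showId && airDate == $airDate]{_id}';

// Returns the _id of a matching episode, or null when creation is warranted.
function findExistingEpisode(
  client: QueryClient,
  showId: string,
  airDate: string,
): string | null {
  const matches = client.fetch(EXISTING_EPISODE_QUERY, { showId, airDate });
  return matches.length > 0 ? matches[0]._id : null;
}
```

If the lookup returns an ID, the orphaned recording belongs to that document, and no create call is needed.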
Root Cause
The createEpisodeForAsset() function in run-pipeline.ts (line 556) used client.create() without specifying a deterministic _id. Sanity auto-generates random IDs when _id is omitted. Every time the pipeline ran and found an orphaned Mux recording without a linked episode, it created a new episode document instead of finding the existing one.
The correct pattern already existed in the same codebase. create-weekly-episodes.ts uses createIfNotExists() with a deterministic _id following the format episode-{show-slug}-{date}. This pattern guarantees idempotency -- calling it 100 times produces exactly one document. The orphan-handling path in run-pipeline.ts did not follow this pattern.
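The idempotency difference can be modeled in memory. `MemoryStore` below is a hypothetical stand-in, not the Sanity client. It reproduces only the two behaviors this report relies on:

- `create()` without an `_id` writes a new document under a random ID.
- `createIfNotExists()` does nothing when a document with that `_id` already exists.

`buildEpisodeId` assumes the `{date}` segment is an ISO `YYYY-MM-DD` string, which matches the real document's ID.

```typescript
interface EpisodeDoc {
  _id?: string;
  _type: "episode";
  title: string;
}

// Deterministic IDs in the episode-{show-slug}-{date} convention.
function buildEpisodeId(showSlug: string, isoDate: string): string {
  return `episode-${showSlug}-${isoDate}`;
}

// In-memory model of the two write methods; not the real client.
class MemoryStore {
  readonly docs = new Map<string, EpisodeDoc>();

  // Models create(): a missing _id gets a random one; an existing _id is an error.
  create(doc: EpisodeDoc): string {
    const id = doc._id ?? Math.random().toString(36).slice(2, 12);
    if (this.docs.has(id)) throw new Error(`document ${id} already exists`);
    this.docs.set(id, { ...doc, _id: id });
    return id;
  }

  // Models createIfNotExists(): the first call writes, later calls are no-ops.
  createIfNotExists(doc: EpisodeDoc & { _id: string }): string {
    if (!this.docs.has(doc._id)) this.docs.set(doc._id, { ...doc });
    return doc._id;
  }
}
```

Here is how the model behaves:

- Calling `createIfNotExists()` 100 times with `buildEpisodeId("vf-platform-ai-data-readiness", "2026-03-17")` leaves one document in the store.
- Adding twelve `create()` calls without an `_id` brings the count to 13. That is the same count the title query returned during the incident.

Treating `vf-platform-ai-data-readiness` as the `{show-slug}` segment is an assumption about how the real ID decomposes.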
Two failures compounded:
- No deterministic _id -- client.create() with an auto-generated ID means every invocation creates a new document, regardless of whether a matching document exists
- No pre-creation search -- The function did not query Sanity for an existing episode matching the show and date before attempting to create one
The pipeline runs on a cron schedule. Each cycle followed the same sequence:

1. It detected the same orphaned Mux asset.
2. It attempted to create an episode for it.
3. The create succeeded, because create() always succeeds with a new random ID.
4. It moved on.

The next cycle found the same orphan again, because the newly created episode lacked the metadata needed to link it to the Mux asset. The result was 12 duplicates over approximately 15 hours, roughly one per hour.
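The loop can be reproduced with a small simulation. This is a sketch built on three assumptions:

- An asset counts as orphaned when no episode's hypothetical `muxAssetId` field points at it.
- The orphan handler never sets that field, mirroring the phantoms' missing metadata.
- The asset's show slug and recording date are already known.

The only variable is whether the handler uses `create()` with a random ID or `createIfNotExists()` with the deterministic one.

```typescript
interface Episode {
  _id: string;
  muxAssetId?: string; // hypothetical link field; the handler below never sets it
}

interface MuxAsset {
  assetId: string;
  showSlug: string;
  recordedOn: string; // ISO YYYY-MM-DD
}

// An asset is orphaned when no episode document links to it.
function findOrphans(assets: MuxAsset[], episodes: Map<string, Episode>): MuxAsset[] {
  const linked = new Set<string>();
  episodes.forEach((ep) => {
    if (ep.muxAssetId) linked.add(ep.muxAssetId);
  });
  return assets.filter((a) => !linked.has(a.assetId));
}

// Runs the orphan handler for `cycles` cron runs and returns the episode count.
function simulate(cycles: number, idempotent: boolean): number {
  const realId = "episode-vf-platform-ai-data-readiness-2026-03-17";
  // The real episode exists but is not linked to the asset.
  const episodes = new Map<string, Episode>([[realId, { _id: realId }]]);
  const assets: MuxAsset[] = [
    { assetId: "mux-asset-1", showSlug: "vf-platform-ai-data-readiness", recordedOn: "2026-03-17" },
  ];
  let randomIds = 0;
  for (let run = 0; run < cycles; run++) {
    for (const asset of findOrphans(assets, episodes)) {
      if (idempotent) {
        // createIfNotExists with episode-{show-slug}-{date}: a no-op here
        const id = `episode-${asset.showSlug}-${asset.recordedOn}`;
        if (!episodes.has(id)) episodes.set(id, { _id: id });
      } else {
        // create() with an auto-generated ID and no link to the asset
        const id = `random-${++randomIds}`;
        episodes.set(id, { _id: id });
      }
    }
  }
  return episodes.size;
}
```

With `create()`, twelve runs leave 13 documents (the real one plus 12 phantoms), because the asset is still orphaned on every run. With `createIfNotExists()`, the asset is still detected as orphaned each time. Every write, however, targets the existing real ID, so the count stays at 1. In this model, idempotency alone stops the accumulation even though the linking gap remains.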
Impact
- Data pollution: 12 phantom episode documents in the Sanity dataset with random IDs, no slugs, no air dates, and no content
- Query degradation: GROQ queries filtering by episode title (e.g., show-prep episode lookups) returned 13 results instead of 1, making it difficult to identify the real episode
- Duration: Mar 17 13:40 UTC through Mar 18 04:48 UTC (~15 hours of hourly duplicate creation)
- Scope: Single episode affected ("Value-First Platform: AI Data Readiness - Mar 17, 2026")
- User-facing impact: None -- the duplicates had no slug and would not resolve to a page on the website
Fix Applied
1. Deleted 12 Duplicate Documents
Removed all 12 phantom documents via a Sanity transaction (transaction ID: nVTrH0M6Un046Uvs6SBqBm). The real episode document (episode-vf-platform-ai-data-readiness-2026-03-17) was verified intact with all metadata.
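A cleanup of this kind can be sketched as two steps:

1. Select the phantoms from the title query results.
2. Queue every deletion on one transaction so they commit together.

The selection rule (not the deterministic ID, and missing both slug and airDate) is an assumption drawn from how this report describes the phantoms. `TxBuilder` is a minimal structural interface written to resemble the chainable `delete(id)` and `commit()` methods on the transaction builder in `@sanity/client`. It is not imported from that package.

```typescript
interface EpisodeRecord {
  _id: string;
  title: string;
  slug?: { current: string };
  airDate?: string;
}

// Minimal structural interface resembling a chainable transaction builder.
interface TxBuilder {
  delete(id: string): TxBuilder;
  commit(): Promise<unknown>;
}

// From the documents sharing the affected title, keep only the phantoms.
function selectPhantoms(docs: EpisodeRecord[], realId: string): string[] {
  return docs
    .filter((d) => d._id !== realId && !d.slug && !d.airDate)
    .map((d) => d._id);
}

// Queues one delete per phantom on a single transaction; the caller commits it.
function queueDeletes(tx: TxBuilder, ids: string[]): TxBuilder {
  return ids.reduce((t, id) => t.delete(id), tx);
}
```

With a real client, the intended call is along the lines of `queueDeletes(client.transaction(), ids).commit()`, assuming the client's transaction object satisfies `TxBuilder`. Verifying the real document afterward, as done here, confirms the filter left it alone.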
2. Fixed createEpisodeForAsset() in run-pipeline.ts
Changed the orphan-handling path from client.create() to createIfNotExists() with:
- Deterministic _id following the episode-{show-slug}-{date} convention
- Slug generated from the episode title
- airDate derived from the recording date
- Show reference linking to the correct show document
- Full metadata matching the pattern established in create-weekly-episodes.ts
This makes the function idempotent. Running it against the same orphaned asset any number of times produces exactly one episode document.
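One way the fixed orphan path could assemble its document is sketched below. It is a reconstruction under assumptions, not the actual run-pipeline.ts code. The following are all hypothetical:

- the `OrphanAsset` fields
- the slug rule
- the `show` reference field name
- the synchronous `WriteClient` stand-in (the real `createIfNotExists()` returns a Promise)

The sketch also assumes the recording timestamp's UTC date is the air date. That assumption would need checking for streams that cross midnight UTC.

```typescript
interface OrphanAsset {
  title: string;
  showSlug: string;
  showId: string;
  recordedAt: string; // ISO timestamp, e.g. 2026-03-17T13:40:00Z
}

interface EpisodeDocument {
  _id: string;
  _type: "episode";
  title: string;
  slug: { _type: "slug"; current: string };
  airDate: string;
  show: { _type: "reference"; _ref: string };
}

// Synchronous stand-in for the client's createIfNotExists().
interface WriteClient {
  createIfNotExists(doc: EpisodeDocument): EpisodeDocument;
}

// Lowercase, collapse runs of non-alphanumerics into "-", trim edge dashes.
function slugify(title: string): string {
  return title.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Idempotent orphan handler: the same asset always maps to the same _id.
function ensureEpisodeForAsset(client: WriteClient, asset: OrphanAsset): EpisodeDocument {
  const airDate = asset.recordedAt.slice(0, 10); // UTC date portion of the timestamp
  return client.createIfNotExists({
    _id: `episode-${asset.showSlug}-${airDate}`,
    _type: "episode",
    title: asset.title,
    slug: { _type: "slug", current: slugify(asset.title) },
    airDate,
    show: { _type: "reference", _ref: asset.showId },
  });
}
```

When the real document already exists, as it did in this incident, `createIfNotExists()` leaves it untouched rather than overwriting its metadata.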
3. Ownership Assignment to Marquee Media Team
Sanity episode document integrity is now explicitly assigned to Marquee's media production agents:
- Prelude (pre-production): Validates episode documents are complete before streams -- slug, airDate, and show reference must be present
- Encore (post-production): Validates episode documents are complete after streams -- Mux asset linked, no duplicates exist for the same show and date, transcription status tracked
Timeline
| Date | Event |
|---|---|
| Mar 15 | Real episode document created by create-weekly-episodes.ts with deterministic ID |
| Mar 17 13:40 UTC | First duplicate created by run-pipeline.ts orphan handler |
| Mar 17-18 | Pipeline cron creates roughly one duplicate per hour (12 total) |
| Mar 18 04:48 UTC | Last duplicate created |
| Mar 24 | Duplicates discovered during content audit |
| Mar 24 | 12 duplicates deleted via Sanity transaction |
| Mar 24 | createEpisodeForAsset() fixed to use createIfNotExists() with deterministic ID |
| Mar 24 | Prelude and Encore agents assigned episode document integrity validation |
Prevention
Sanity Document Creation Rule
All Sanity episode creation must use createIfNotExists() with deterministic _id values following the episode-{show-slug}-{date} convention. Using client.create() with auto-generated IDs for episode documents is prohibited. This rule applies to every code path that creates episodes -- not only the weekly creation script, but also orphan handling, backfill scripts, and any future creation path.
Pre-Production Validation (Prelude)
Before any scheduled stream, Prelude validates that the target episode document exists and is complete:
- _id follows the deterministic convention
- slug is set
- airDate is set
- Show reference is present
Missing or incomplete episode documents are flagged before the stream occurs, preventing post-stream orphan scenarios that trigger the faulty creation path.
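Prelude's four checks map onto a small validation function. This is a sketch that assumes episode documents expose `slug.current`, `airDate`, and `show._ref` in the usual Sanity shapes. The ID pattern is one possible encoding of the episode-{show-slug}-{date} convention: a lowercase, hyphenated show slug followed by an ISO date. `preStreamProblems` is a hypothetical name.

```typescript
interface EpisodeCandidate {
  _id: string;
  slug?: { current?: string };
  airDate?: string;
  show?: { _ref?: string };
}

// episode-{show-slug}-{YYYY-MM-DD}; the show slug is lowercase words joined by "-".
const EPISODE_ID_PATTERN = /^episode-[a-z0-9]+(?:-[a-z0-9]+)*-\d{4}-\d{2}-\d{2}$/;

// Returns every problem found; an empty list means the document is ready.
function preStreamProblems(doc: EpisodeCandidate): string[] {
  const problems: string[] = [];
  if (!EPISODE_ID_PATTERN.test(doc._id)) problems.push("_id is not deterministic");
  if (!doc.slug?.current) problems.push("slug is missing");
  if (!doc.airDate) problems.push("airDate is missing");
  if (!doc.show?._ref) problems.push("show reference is missing");
  return problems;
}
```

A document shaped like the phantoms in this incident (random ID, no slug, no airDate) fails the ID, slug, and airDate checks at once.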
Post-Production Validation (Encore)
After any stream completes, Encore validates:
- The Mux asset is linked to the correct episode document
- No duplicate episode documents exist for the same show and date
- Transcription status is tracked
Duplicate detection catches any recurrence before phantom documents accumulate.
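The duplicate check can be sketched as a grouping by show reference and air date. It assumes episode documents store `airDate` and a `show` reference with `_ref`; `auditEpisodes` is a hypothetical name. The phantoms in this incident had no airDate, so a pure show-and-date grouping would skip them. For that reason, the sketch also returns the documents it could not key instead of dropping them.

```typescript
interface EpisodeRow {
  _id: string;
  airDate?: string;
  show?: { _ref?: string };
}

interface AuditResult {
  duplicates: Map<string, string[]>; // "showRef|airDate" -> _ids sharing that key
  unkeyed: string[]; // documents missing airDate or a show reference
}

function auditEpisodes(rows: EpisodeRow[]): AuditResult {
  const groups = new Map<string, string[]>();
  const unkeyed: string[] = [];
  for (const row of rows) {
    const showRef = row.show?._ref;
    if (!row.airDate || !showRef) {
      unkeyed.push(row._id); // cannot be grouped, so surface it instead of skipping
      continue;
    }
    const key = `${showRef}|${row.airDate}`;
    groups.set(key, [...(groups.get(key) ?? []), row._id]);
  }
  // Only keys shared by more than one document are duplicates.
  const duplicates = new Map<string, string[]>();
  groups.forEach((ids, key) => {
    if (ids.length > 1) duplicates.set(key, ids);
  });
  return { duplicates, unkeyed };
}
```

The passing state is an empty `duplicates` map together with an empty `unkeyed` list.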
Enforcement Lesson
When a codebase contains both an idempotent creation pattern (createIfNotExists() with deterministic IDs) and a non-idempotent creation pattern (create() with auto-generated IDs) for the same document type, the non-idempotent path will eventually run in a loop and produce duplicates. The only question is when.
The architectural rule is simple: every document type that can be created by automated processes must use deterministic IDs and idempotent creation. client.create() is appropriate for user-initiated, one-time document creation in Sanity Studio. It is never appropriate for code paths that run on schedules or in retry loops.
The correct pattern already existed in the codebase. The failure was not a lack of knowledge -- it was a lack of consistency. Two scripts creating the same document type used two different creation strategies. The fix is not just patching the broken function; it is establishing that createIfNotExists() with deterministic IDs is the only acceptable pattern for automated episode creation, and assigning ownership (Prelude and Encore) to validate compliance.