Corrective Action: Media Transcription Pipeline Silent Failure — 36 Episodes Stuck for 11 Weeks

Date: March 30, 2026
Category: Verification Failure + Missing Monitoring
Impact: 36 episodes (Jan 13 – Mar 29) accumulated in transcriptionStatus: "processing" without completing. No agent or system detected the stall. Transcription intelligence (summaries, key points, trap detection) was missing from these episodes, and the content multiplication pipeline operated on incomplete data.
Resolution Time: ~30 minutes (batch reset). Root cause investigation ongoing.


Incident

What Happened

During the March 30 /media-prep briefing, Encore flagged that the Tech Stack Reimagined episode (Mar 29, 153 minutes with Nico) had been in transcriptionStatus: "processing" for over 24 hours. Investigation revealed this was not an isolated case — 36 episodes spanning January 13 through March 29, 2026 were stuck in the same state. The Inngest transcription workflow was receiving events (status was being set to "processing") but never completing. Meanwhile, 71 other episodes had completed transcriptions, indicating the pipeline works only intermittently.

Contributing Factors

  1. Inngest credentials were only added to Vercel on March 25, 2026. The 33 episodes from Jan 13 – Mar 24 were set to "processing" by the Mux webhook handler, but the Inngest workflow had no credentials to execute.
  2. No monitoring exists between webhook dispatch and workflow completion. The Mux webhook fires, sets status to "processing", sends an Inngest event, and returns 200. Nothing checks whether Inngest actually picks up and completes the workflow.
  3. /media-recap only checks yesterday's recordings when manually run. It does not scan for systemic pipeline stalls.
  4. /media-prep morning briefing did not surface transcription health. The dashboard shows individual episode transcript status but not pipeline-wide health.
  5. get-social-schedule.js contained a hardcoded "26 undistributed articles" string (lines 180, 223) that was presented as live data, masking the actual distribution state.
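The gap described in factor 2 can be sketched as follows. This is an illustrative TypeScript reconstruction, not the actual handler in apps/website/src/pages/api/webhooks/mux.ts; the `setStatus` and `sendEvent` parameters are hypothetical stand-ins for the Sanity and Inngest clients:

```typescript
// Illustrative shape of the fire-and-forget webhook flow described above.
// setStatus and sendEvent stand in for the Sanity and Inngest clients.
type Deps = {
  setStatus: (episodeId: string, status: string) => Promise<void>;
  sendEvent: (name: string, data: unknown) => Promise<void>;
};

export async function handleAssetReady(
  episodeId: string,
  deps: Deps,
): Promise<number> {
  await deps.setStatus(episodeId, "processing"); // 1. mark the episode
  await deps.sendEvent("mux/video.asset.ready", { episodeId }); // 2. dispatch
  return 200; // 3. report success — nothing ever verifies the workflow completed
}
```

Step 3 is the gap: the handler's 200 asserts only that the event was dispatched, not that transcription finished.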

Timeline

| Date | Event |
| --- | --- |
| Jan 13 | Earliest episode set to processing (AI Daily, Data) |
| Jan 13 – Mar 24 | 33 episodes accumulate in processing without Inngest credentials |
| Mar 25 | INNGEST_EVENT_KEY and INNGEST_SIGNING_KEY added to Vercel |
| Mar 24-29 | 3 more episodes enter processing post-credential-add (different failure mode) |
| Mar 30 05:58 | /media-prep run; Encore flags Tech Stack Reimagined stall |
| Mar 30 06:15 | Investigation reveals 36 total stuck episodes |
| Mar 30 06:25 | Batch reset: all 36 episodes set from "processing" → "pending" |

Root Cause

Two distinct failures:

Failure 1 (33 episodes, Jan 13 – Mar 24): The Mux webhook handler at apps/website/src/pages/api/webhooks/mux.ts successfully received video.asset.ready events and dispatched Inngest events via inngest.send(). However, the Inngest runtime on Vercel had no credentials (INNGEST_EVENT_KEY, INNGEST_SIGNING_KEY) until March 25. The inngest.send() call likely failed silently or the events were accepted but no function could authenticate to process them.

Failure 2 (3 episodes, Mar 24-29): After credentials were added, 3 more episodes still stalled. This indicates a secondary issue — possibly the Inngest function itself failing (missing GEMINI_API_KEY, Gemini API quota, audio download failure, or function timeout on long recordings like the 153-minute Tech Stack Reimagined).

Monitoring failure (all 36): No system checked whether the pipeline completed. The status was set to "processing" by the webhook handler and never updated because the downstream workflow never ran. The only detection mechanism — Encore checking for >24h stalls — is manual and session-dependent.

Category: Verification Failure + Missing Monitoring

This is a pipeline without a circuit breaker. The webhook handler returns success after dispatching the event, but the actual transcription happens asynchronously with no completion verification. When the async step fails, the pipeline enters a permanently stuck state that accumulates silently.
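One shape the missing circuit breaker could take, sketched under the assumption that episode documents carry a transcriptionStatus and an ISO-8601 airDate field (the Episode interface below is illustrative, not the Sanity schema):

```typescript
// Sketch of a completion verifier: any episode still "processing" more than
// 24 hours after its air date is treated as a pipeline failure, not a
// pending job.
interface Episode {
  _id: string;
  transcriptionStatus: string;
  airDate: string; // ISO 8601
}

const DAY_MS = 24 * 60 * 60 * 1000;

export function findStalled(episodes: Episode[], now: Date = new Date()): Episode[] {
  return episodes.filter(
    (ep) =>
      ep.transcriptionStatus === "processing" &&
      now.getTime() - Date.parse(ep.airDate) > DAY_MS,
  );
}
```

Run against all episodes on a schedule, a non-empty result is an alert, not a backlog.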

Additional finding: get-social-schedule.js contained hardcoded strings ("26 undistributed articles") on lines 180 and 223 that were presented as live data by the Broadcast agent. This is a data integrity issue — static strings in operational scripts that agents report as facts.


Fix Applied

Immediate Resolution

  1. Batch reset all 36 episodes from transcriptionStatus: "processing" → "pending" via Sanity client
  2. Removed hardcoded "26 undistributed articles" from get-social-schedule.js
  3. Updated /media-prep command to include yesterday's on-demand audience-ready links (previously missing per Mar 24 feedback)
  4. Updated content-queue.json — "The Leads Trap in Disguise" status corrected from drafted → published
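The batch reset in step 1 amounts to a set of Sanity patch mutations. A sketch that builds the mutation payloads as plain objects so their shape is visible — episode IDs would come from the stuck-episode query, and committing them via @sanity/client is left out:

```typescript
// Build the patch mutations for a batch reset of stuck episodes. Each
// mutation sets transcriptionStatus back to "pending" so the workflow can
// be re-triggered.
export function buildResetMutations(ids: string[]) {
  return ids.map((id) => ({
    patch: {
      id,
      set: { transcriptionStatus: "pending" },
    },
  }));
}
```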

Code/Configuration Changes

| File | Change |
| --- | --- |
| scripts/hubspot/get-social-schedule.js:180 | Removed hardcoded "26 undistributed articles" from empty-schedule message |
| scripts/hubspot/get-social-schedule.js:223 | Removed hardcoded "26 undistributed articles" from gap-detected message |
| .claude/commands/media-prep.md | Added audience-ready verification for yesterday's episodes in schedule post section |
| agents/content-multiplier/data/content-queue.json | Updated "The Leads Trap in Disguise" status: drafted → published |
| 36 Sanity episodes | transcriptionStatus: "processing" → "pending" |

Verification

node scripts/sanity/query.js --query 'count(*[_type == "episode" && transcriptionStatus == "processing"])'
→ 0

node scripts/sanity/query.js --query 'count(*[_type == "episode" && transcriptionStatus == "pending"])'
→ (increased by 36)

Prevention Measures

Rules Added

| Layer | File | Rule |
| --- | --- | --- |
| Critical Lessons | MEMORY.md | NEVER trust async pipeline completion without monitoring. The Mux → Inngest → Gemini transcription pipeline ran for 11 weeks with 36 failures because no agent checked completion. Every async dispatch needs a completion verifier. |
| Critical Lessons | MEMORY.md | NEVER hardcode counts or metrics in operational scripts. get-social-schedule.js had "26 undistributed articles" as a static string. Agents reported it as live data. Every number in an ops script must come from a query. |
| Operations | memory/operations.md | Transcription pipeline status: 36 episodes reset to pending (Mar 30). Inngest credentials added Mar 25. Pipeline needs completion monitoring — no agent currently watches for stalled transcriptions. |

Detection Triggers

For /media-prep and /media-recap: Add a pipeline health check: query `count(*[_type == "episode" && transcriptionStatus == "processing" && dateTime(airDate) < dateTime(now()) - 86400])` — any episode in "processing" for >24 hours is a pipeline failure, not a pending job.

Structural Gaps Identified

  1. No codename for Content Pipeline orchestrator — runs create-weekly-episodes.ts, run-pipeline.ts, check-shows.ts but has no identity. Unnamed agents get less coherent behavior.
  2. No codename for the Inngest transcription workflow — it's infrastructure, not an agent, but it needs monitoring ownership.
  3. No codename for Transcript Harvester or Transcript Backfill — both are manual batch tools with no agent identity.
  4. Encore only runs when Marquee spawns it — no autonomous stall detection.
  5. Content queue scan (lastScanDate) has no freshness monitoring — went 13 days stale without detection.

Lessons

Every async pipeline needs a completion verifier. "Fire and forget" is acceptable for the dispatch — but something must check the other end. In this system, the Mux webhook handler returns 200 and moves on. Nothing asks "did the Inngest workflow finish?" The gap between dispatch and completion is where 36 episodes disappeared for 11 weeks.

Static strings in operational scripts are a particularly insidious form of data drift. Agents trust tool output as ground truth. When a script says "26 undistributed articles," the agent reports it as fact, the briefing presents it as intelligence, and humans act on fabricated data. Every number in an ops script must be computed from a live query.
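A sketch of the corrected approach: the number is derived from two live inputs rather than embedded as a string. Both function and parameter names are illustrative; in get-social-schedule.js the inputs would come from the Sanity article query and the HubSpot broadcast query:

```typescript
// Derive the undistributed-article count from live inputs instead of a
// hardcoded string. publishedArticles and distributedArticles would come
// from Sanity and HubSpot respectively (names illustrative).
export function undistributedCount(
  publishedArticles: number,
  distributedArticles: number,
): number {
  return Math.max(0, publishedArticles - distributedArticles);
}

// Render the message from the computed gap, so the agent can never report
// a stale literal as live data.
export function scheduleMessage(gap: number): string {
  return gap === 0
    ? "No undistributed articles."
    : `${gap} undistributed article${gap === 1 ? "" : "s"} awaiting distribution.`;
}
```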


Related Incidents

  • Mar 8, 2026: Calendar invitees reported as attendees — a different form of "trusting upstream data without verification"
  • Feb 18, 2026: HubSpot Broadcast API published 15 posts immediately — fire-and-forget without verification
  • Mar 4, 2026: esbuild/tsx block comment failures — silent infrastructure failures that accumulate
  • Mar 23, 2026: Background workers shut down because GitHub Actions reports never reached local filesystem — another case of a pipeline producing "success" while delivering nothing

Recommended Next Steps

  1. Assign codenames to all unnamed media pipeline agents (Content Pipeline, Transcript Harvester, Transcript Backfill)
  2. Build a transcription completion monitor — owned by a named agent, runs during /media-prep and /media-recap
  3. Add pipeline health query to media commands — surface stalled transcriptions automatically
  4. Make get-social-schedule.js compute distribution gaps live from Sanity article count vs. HubSpot broadcast count
  5. Investigate Failure 2 — why 3 post-Mar-25 episodes still stalled (Gemini quota? Audio download? Function timeout?)