Corrective Action: Media Transcription Pipeline Silent Failure — 36 Episodes Stuck for 11 Weeks
Date: March 30, 2026
Category: Verification Failure + Missing Monitoring
Impact: 36 episodes (Jan 13 – Mar 29) accumulated in `transcriptionStatus: "processing"` without completing. No agent or system detected the stall. Transcription intelligence (summaries, key points, trap detection) missing from these episodes. Content multiplication pipeline operated on incomplete data.
Resolution Time: ~30 minutes (batch reset). Root cause investigation ongoing.
Incident
What Happened
During the March 30 /media-prep briefing, Encore flagged that the Tech Stack Reimagined episode (Mar 29, 153 minutes with Nico) had been in `transcriptionStatus: "processing"` for over 24 hours. Investigation revealed this was not an isolated case: 36 episodes spanning January 13 through March 29, 2026 were stuck in the same state. The Inngest transcription workflow was receiving events (status was being set to "processing") but never completing. Meanwhile, 71 other episodes had completed transcriptions, indicating the pipeline worked only intermittently.
Contributing Factors
- Inngest credentials were only added to Vercel on March 25, 2026. The 33 episodes from Jan 13 – Mar 24 were set to "processing" by the Mux webhook handler but the Inngest workflow had no credentials to execute.
- No monitoring exists between webhook dispatch and workflow completion. The Mux webhook fires, sets status to "processing", sends an Inngest event, and returns 200. Nothing checks whether Inngest actually picks up and completes the workflow.
- `/media-recap` only checks yesterday's recordings when manually run. It does not scan for systemic pipeline stalls.
- The `/media-prep` morning briefing did not surface transcription health. The dashboard shows individual episode transcript status but not pipeline-wide health.
- `get-social-schedule.js` contained a hardcoded "26 undistributed articles" string (lines 180, 223) that was presented as live data, masking the actual distribution state.
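The gap between dispatch and completion has a simple shape. A minimal sketch, using stub clients and hypothetical names (the real handler lives at `apps/website/src/pages/api/webhooks/mux.ts`):

```typescript
// Minimal sketch of the fire-and-forget shape (stub clients; names are
// illustrative, not the actual handler code).
const statuses = new Map<string, string>();

// Stand-in for the Sanity status write.
const setStatus = (id: string, status: string): void => {
  statuses.set(id, status);
};

// Stand-in for inngest.send(): dispatch "succeeds" whether or not any
// downstream function can authenticate and run.
const sendEvent = async (_name: string, _data: unknown): Promise<void> => {};

async function handleAssetReady(episodeId: string): Promise<number> {
  setStatus(episodeId, "processing"); // optimistic in-flight marker
  await sendEvent("episode/transcribe.requested", { episodeId });
  return 200; // webhook reports success here
  // Nothing ever verifies the workflow finished, so a failed worker
  // leaves the episode in "processing" indefinitely.
}

void handleAssetReady("ep-123");
```

The webhook's 200 response only proves the dispatch happened; the episode's status tells you nothing about whether any worker ever ran.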
Timeline
| Date | Event |
|---|---|
| Jan 13 | Earliest episode set to processing (AI Daily, Data) |
| Jan 13 – Mar 24 | 33 episodes accumulate in processing without Inngest credentials |
| Mar 25 | INNGEST_EVENT_KEY and INNGEST_SIGNING_KEY added to Vercel |
| Mar 25 – 29 | 3 more episodes enter processing post-credential-add (different failure mode) |
| Mar 30 05:58 | /media-prep run. Encore flags Tech Stack Reimagined stall |
| Mar 30 06:15 | Investigation reveals 36 total stuck episodes |
| Mar 30 06:25 | Batch reset: all 36 episodes set from processing → pending |
Root Cause
Two distinct failures:
Failure 1 (33 episodes, Jan 13 – Mar 24): The Mux webhook handler at apps/website/src/pages/api/webhooks/mux.ts successfully received video.asset.ready events and dispatched Inngest events via inngest.send(). However, the Inngest runtime on Vercel had no credentials (INNGEST_EVENT_KEY, INNGEST_SIGNING_KEY) until March 25. The inngest.send() call likely failed silently or the events were accepted but no function could authenticate to process them.
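One mitigation for Failure 1 is to stop treating dispatch as success: if `inngest.send()` throws, roll the status back instead of stranding the episode. A hedged sketch, where the callback shapes are assumptions rather than the real client APIs:

```typescript
// Sketch: wrap the dispatch so a failed send rolls the episode back to
// "pending" instead of leaving it in "processing". The Send/SetStatus
// shapes are hypothetical stand-ins for the Inngest and Sanity clients.
type Send = (name: string, data: unknown) => Promise<void>;
type SetStatus = (id: string, status: string) => Promise<void>;

async function dispatchOrRollback(
  episodeId: string,
  send: Send,
  setStatus: SetStatus,
): Promise<boolean> {
  await setStatus(episodeId, "processing");
  try {
    await send("episode/transcribe.requested", { episodeId });
    return true;
  } catch {
    // Dispatch failed (e.g. missing INNGEST_EVENT_KEY): undo the
    // optimistic status so the episode stays visibly pending.
    await setStatus(episodeId, "pending");
    return false;
  }
}
```

This only catches dispatch-time failures; a worker that accepts the event and then dies still needs the downstream completion monitor.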
Failure 2 (3 episodes, Mar 25 – 29): After credentials were added, 3 more episodes still stalled. This indicates a secondary issue — possibly the Inngest function itself failing (missing GEMINI_API_KEY, Gemini API quota, audio download failure, or function timeout on long recordings like the 153-minute Tech Stack Reimagined).
Monitoring failure (all 36): No system checked whether the pipeline completed. The status was set to "processing" by the webhook handler and never updated because the downstream workflow never ran. The only detection mechanism — Encore checking for >24h stalls — is manual and session-dependent.
Category: Verification Failure + Missing Monitoring
This is a pipeline without a circuit breaker. The webhook handler returns success after dispatching the event, but the actual transcription happens asynchronously with no completion verification. When the async step fails, the pipeline enters a permanently stuck state that accumulates silently.
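The missing circuit breaker is essentially a reconciliation sweep: anything in "processing" past a cutoff is treated as a failure, not a pending job. A sketch of the core predicate, where the episode shape and `startedAt` field are assumptions about the schema:

```typescript
// Sketch of a stall detector. The episode shape and startedAt field are
// assumed, not the actual Sanity schema.
interface EpisodeStatus {
  id: string;
  transcriptionStatus: string;
  startedAt: string; // ISO timestamp when "processing" was set
}

const STALL_MS = 24 * 60 * 60 * 1000; // >24h in "processing" = stalled

function findStalled(episodes: EpisodeStatus[], now: Date): EpisodeStatus[] {
  return episodes.filter(
    (e) =>
      e.transcriptionStatus === "processing" &&
      now.getTime() - Date.parse(e.startedAt) > STALL_MS,
  );
}
```

A sweep like this, run from /media-prep and /media-recap, would have flagged the January 13 episodes within a day rather than 11 weeks later.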
Additional finding: get-social-schedule.js contained hardcoded strings ("26 undistributed articles") on lines 180 and 223 that were presented as live data by the Broadcast agent. This is a data integrity issue — static strings in operational scripts that agents report as facts.
Fix Applied
Immediate Resolution
- Batch reset all 36 episodes from `transcriptionStatus: "processing"` → `"pending"` via Sanity client
- Removed hardcoded "26 undistributed articles" from `get-social-schedule.js`
- Updated `/media-prep` command to include yesterday's on-demand audience-ready links (previously missing per Mar 24 feedback)
- Updated `content-queue.json`: "The Leads Trap in Disguise" status corrected from `drafted` → `published`
Code/Configuration Changes
| File | Change |
|---|---|
| `scripts/hubspot/get-social-schedule.js:180` | Removed hardcoded "26 undistributed articles" from empty-schedule message |
| `scripts/hubspot/get-social-schedule.js:223` | Removed hardcoded "26 undistributed articles" from gap-detected message |
| `.claude/commands/media-prep.md` | Added audience-ready verification for yesterday's episodes in schedule post section |
| `agents/content-multiplier/data/content-queue.json` | Updated "The Leads Trap in Disguise" status: `drafted` → `published` |
| 36 Sanity episodes | `transcriptionStatus`: `processing` → `pending` |
Verification
```shell
node scripts/sanity/query.js --query 'count(*[_type == "episode" && transcriptionStatus == "processing"])'
# → 0
node scripts/sanity/query.js --query 'count(*[_type == "episode" && transcriptionStatus == "pending"])'
# → increased by 36
```
Prevention Measures
Rules Added
| Layer | File | Rule |
|---|---|---|
| Critical Lessons | `MEMORY.md` | NEVER trust async pipeline completion without monitoring. The Mux → Inngest → Gemini transcription pipeline ran for 11 weeks with 36 failures because no agent checked completion. Every async dispatch needs a completion verifier. |
| Critical Lessons | `MEMORY.md` | NEVER hardcode counts or metrics in operational scripts. `get-social-schedule.js` had "26 undistributed articles" as a static string, and agents reported it as live data. Every number in an ops script must come from a query. |
| Operations | `memory/operations.md` | Transcription pipeline status: 36 episodes reset to pending (Mar 30). Inngest credentials added Mar 25. Pipeline needs completion monitoring: no agent currently watches for stalled transcriptions. |
Detection Triggers
For /media-prep and /media-recap:
Add a pipeline health check: query `count(*[_type == "episode" && transcriptionStatus == "processing" && dateTime(airDate) < dateTime(now()) - 86400])`. Any episode in "processing" for more than 24 hours is a pipeline failure, not a pending job.
Structural Gaps Identified
- No codename for the Content Pipeline orchestrator: runs `create-weekly-episodes.ts`, `run-pipeline.ts`, and `check-shows.ts` but has no identity. Unnamed agents get less coherent behavior.
- No codename for the Inngest transcription workflow: it's infrastructure, not an agent, but it needs monitoring ownership.
- No codename for Transcript Harvester or Transcript Backfill: both are manual batch tools with no agent identity.
- Encore only runs when Marquee spawns it: no autonomous stall detection.
- Content queue scan (`lastScanDate`) has no freshness monitoring: went 13 days stale without detection.
Lessons
Every async pipeline needs a completion verifier. "Fire and forget" is acceptable for the dispatch — but something must check the other end. In this system, the Mux webhook handler returns 200 and moves on. Nothing asks "did the Inngest workflow finish?" The gap between dispatch and completion is where 36 episodes disappeared for 11 weeks.
Static strings in operational scripts are a particularly insidious form of data drift. Agents trust tool output as ground truth. When a script says "26 undistributed articles," the agent reports it as fact, the briefing presents it as intelligence, and humans act on fabricated data. Every number in an ops script must be computed from a live query.
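The fix pattern is to make the count a required input to the message, so a literal can never masquerade as data. A sketch, where `formatGapMessage` is a hypothetical helper rather than the current `get-social-schedule.js` code:

```typescript
// Hypothetical helper: the count must arrive as a computed value
// (e.g. Sanity article count minus HubSpot broadcast count); there is
// no literal fallback for the template to report as fact.
function formatGapMessage(undistributed: number): string {
  if (!Number.isInteger(undistributed) || undistributed < 0) {
    throw new Error(`invalid undistributed count: ${undistributed}`);
  }
  if (undistributed === 0) return "No undistributed articles.";
  const noun = undistributed === 1 ? "article" : "articles";
  return `${undistributed} undistributed ${noun} awaiting scheduling`;
}
```

With this shape, an empty-schedule branch has nothing to print unless a query actually ran, and a stale or missing count fails loudly instead of reporting a fabricated number.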
Related Incidents
- Mar 8, 2026: Calendar invitees reported as attendees — a different form of "trusting upstream data without verification"
- Feb 18, 2026: HubSpot Broadcast API published 15 posts immediately — fire-and-forget without verification
- Mar 4, 2026: esbuild/tsx block comment failures — silent infrastructure failures that accumulate
- Mar 23, 2026: Background workers shut down because GitHub Actions reports never reached local filesystem — another case of a pipeline producing "success" while delivering nothing
Recommended Next Steps
- Assign codenames to all unnamed media pipeline agents (Content Pipeline, Transcript Harvester, Transcript Backfill)
- Build a transcription completion monitor, owned by a named agent, that runs during `/media-prep` and `/media-recap`
- Add a pipeline health query to media commands to surface stalled transcriptions automatically
- Make `get-social-schedule.js` compute distribution gaps live from the Sanity article count vs. the HubSpot broadcast count
- Investigate Failure 2: why 3 post-Mar-25 episodes still stalled (Gemini quota? Audio download failure? Function timeout?)