Corrective Action Report: Inngest Transcription Pipeline Never Worked

Date: 2026-03-30
Reporter: V (COO) + Marquee (Media BU Leader)
Severity: Critical
Status: Open — documented, queued for resolution


Incident

During /media-recap on March 29, Chris asked why the Tech Stack Reimagined episode page showed "Generating transcript..." with a permanent spinner. Investigation revealed this is not an isolated issue — 36 episodes dating back to January 2026 are stuck at transcriptionStatus: "processing" with zero transcript content. Zero episodes have ever been successfully transcribed via the Inngest pipeline. The feature has never worked in production.

Every episode page on valuefirstteam.com that has a Mux recording shows a permanent "Transcription in progress..." spinner that will never resolve.

Root Cause

The Pipeline Architecture

The transcription system is an Inngest function chain:

  1. Trigger: Mux webhook (/api/webhooks/mux.ts) fires video.asset.ready event
  2. Event: Inngest receives transcription/requested event with episodeId, assetId, playbackId
  3. Workflow (src/lib/inngest/functions/transcription.ts):
    • Download audio from Mux
    • Upload to Gemini File API
    • Wait for Gemini processing
    • Transcribe via Gemini
    • Generate summary + key points
    • Save results to Sanity
  4. Result: Episode gets transcriptionStatus: "completed", transcript, summary, keyPoints
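The chain above can be sketched as follows. This is a minimal sketch, not the actual workflow code: the step names and payload shape follow the list above, the service calls are stubbed placeholders, and the step runner is injectable so the sequence can be exercised without Inngest (the real code would use inngest.createFunction with step.run for each stage).

```typescript
// Sketch of the transcription chain. Each stage maps to one Inngest step;
// all external service calls are stubs, not the real Mux/Gemini/Sanity code.
type StepRunner = <T>(name: string, fn: () => Promise<T>) => Promise<T>;

// Placeholder service calls (assumed names, not the actual implementations)
const downloadAudioFromMux = async (assetId: string) => `audio-for-${assetId}`;
const uploadToGeminiFiles = async (audio: string) => `gemini-file-${audio}`;
const transcribeWithGemini = async (fileId: string) => `transcript of ${fileId}`;
const summarize = async (_transcript: string) => ({
  summary: "one-paragraph summary",
  keyPoints: ["key point"],
});

async function runTranscription(
  step: StepRunner,
  data: { episodeId: string; assetId: string; playbackId: string },
) {
  const audio = await step("download-audio", () => downloadAudioFromMux(data.assetId));
  const fileId = await step("upload-to-gemini", () => uploadToGeminiFiles(audio));
  const transcript = await step("transcribe", () => transcribeWithGemini(fileId));
  const { summary, keyPoints } = await step("summarize", () => summarize(transcript));
  // A final step would patch the Sanity episode with these fields
  return { transcriptionStatus: "completed" as const, transcript, summary, keyPoints };
}
```

The value of the step architecture is that Inngest persists each step's result and retries steps independently, so a failure in one stage should surface in that step's run logs rather than restarting the whole chain.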

What Actually Happens

The Mux webhook sets transcriptionStatus: "processing" on the Sanity episode, then fires the Inngest event. From there, one of three things happens:

  • Never receives the event (Inngest configuration issue)
  • Receives it but fails silently (Gemini API, file size, Vercel function timeout)
  • Fails with retries exhausted but never sets status to "failed"

The onFailure handler in the transcription workflow should set transcriptionStatus: "failed", but 36 episodes are stuck at "processing" — suggesting the failure handler itself is not executing, or the function is never being invoked at all.

Why It Was Invisible

  1. No monitoring: No alert fires when transcription stays in "processing" beyond a reasonable window
  2. Optimistic UI: The episode page shows a spinner for "processing" status indefinitely — there is no timeout or fallback
  3. Silent failure pattern: The try/catch in the Inngest function swallows errors. The onFailure handler exists but apparently doesn't run.
  4. Previously identified but not resolved: The March 25 CAR (corrective-media-prep-audience-blindness.md, line 37) noted "Transcriptions are stalled (Inngest not configured — see separate finding)" — but no separate finding was ever written and no fix was attempted.

Impact

  • 36 episodes showing permanent "Generating transcript..." spinners since January 2026
  • Zero transcripts have ever been generated via this pipeline
  • Every episode page with a Mux recording displays a broken loading state to visitors
  • Content intelligence gap: No transcripts means no summaries, no key points, no searchable content from recordings
  • Visitor experience: Anyone clicking an episode sees a professional page with a working video player... and a broken spinner that never resolves

Scope of Affected Episodes

Period     Count   Examples
Jan 2026   15      AI Daily (Jan 13-29), Data, Commerce, Delivery, Measurement, Value Path
Feb 2026   11      AI Daily (Feb 4-19), Data, HubSpot Help Line, Wake Up, Unified Views
Mar 2026   3       AI Daily (Mar 27), Data (Mar 24), Tech Stack Reimagined (Mar 29)
Total      36      All with transcriptionStatus: "processing", all with zero transcript

Note: Only 1 episode (from July 2024) has hasTranscript: true with transcriptionStatus: "processing" — likely had a transcript from a different source.

Required Investigation

Before fixing, the following must be diagnosed:

  1. Is Inngest receiving events? Check Inngest dashboard (https://app.inngest.com) for the transcription/requested event history. If events exist, check function run logs for errors.

  2. Is the Gemini API key valid? GEMINI_API_KEY must be in Vercel env vars. The transcription function uses GoogleGenerativeAI and GoogleAIFileManager from @google/generative-ai.

  3. Vercel function timeout? Transcribing a 60-153 minute audio file through Gemini may exceed Vercel's function execution limit (10s on Hobby, 60s on Pro, 300s on Enterprise). The Inngest step architecture should handle this via step functions, but if any single step exceeds the timeout, it fails.

  4. Mux audio download? Step 1 downloads audio from Mux. If the Mux download URL requires authentication that's missing in production, it would fail silently.

  5. Gemini File API upload size? 153-minute recordings could be hundreds of MB. Gemini File API has a 2GB limit, but Vercel's /tmp storage is limited to 512MB.
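If item 3 turns out to be the culprit, the execution ceiling for the Inngest handler route can be raised within the plan's limits. A sketch for a Next.js App Router route (the file path is an assumption; Pages Router API routes use `export const config = { maxDuration: ... }` instead):

```typescript
// app/api/inngest/route.ts (path assumed): raise the execution ceiling for
// the Inngest handler route. The plan's own maximum still applies.
export const maxDuration = 300; // seconds
```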

Corrective Actions

Immediate: Fix the Episode Page UX

The page must not show a permanent spinner. Two options:

  • Option A: If transcriptionStatus === "processing" and episode publishedAt is more than 2 hours ago, show "Transcript unavailable" instead of the spinner
  • Option B: Remove the spinner entirely — show transcript section only when transcriptionStatus === "completed" and actual transcript content exists
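Option A reduces to a small pure function. A sketch, where the two-hour threshold and the field names are assumptions matching the options above:

```typescript
// Decide what the transcript section should render (sketch for Option A).
const STUCK_THRESHOLD_MS = 2 * 60 * 60 * 1000; // two hours

function transcriptUiState(
  transcriptionStatus: string,
  publishedAt: string, // ISO-8601 timestamp from Sanity
  now: Date = new Date(),
): "transcript" | "spinner" | "unavailable" | "hidden" {
  if (transcriptionStatus === "completed") return "transcript";
  if (transcriptionStatus !== "processing") return "hidden";
  const ageMs = now.getTime() - new Date(publishedAt).getTime();
  return ageMs > STUCK_THRESHOLD_MS ? "unavailable" : "spinner";
}
```

Either option removes the possibility of a spinner that outlives any plausible transcription run.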

Immediate: Reset Stuck Episodes

Batch-update all 36 stuck episodes to transcriptionStatus: "failed" or "none" so the UI stops showing broken spinners:

// Sanity mutation sketch: reset all stuck episodes (assumes a configured @sanity/client)
const ids: string[] = await client.fetch(
  `*[_type == "episode" && transcriptionStatus == "processing" && !defined(transcript)]._id`)
for (const id of ids) await client.patch(id).set({ transcriptionStatus: "none" }).commit()

Short-term: Diagnose Inngest Pipeline

  1. Check Inngest dashboard for function execution history
  2. Verify all required env vars exist in Vercel (GEMINI_API_KEY, INNGEST_EVENT_KEY, INNGEST_SIGNING_KEY, MUX_TOKEN_ID, MUX_TOKEN_SECRET, SANITY_API_TOKEN)
  3. Test the transcription workflow with a short episode (<10 min) to isolate the failure point
  4. Add a transcriptionStatus: "failed" fallback that actually fires
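Step 2 can be scripted rather than eyeballed in the Vercel dashboard. A sketch; the variable names are the ones listed above:

```typescript
// Check that every env var the pipeline needs is present and non-empty.
const REQUIRED_ENV = [
  "GEMINI_API_KEY",
  "INNGEST_EVENT_KEY",
  "INNGEST_SIGNING_KEY",
  "MUX_TOKEN_ID",
  "MUX_TOKEN_SECRET",
  "SANITY_API_TOKEN",
] as const;

function missingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV.filter((key) => !env[key]?.trim());
}
// e.g. console.log(missingEnvVars(process.env)) after `vercel env pull`
```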

Medium-term: Add to Media Roadmap

Add the transcription pipeline fix as a dependency for Marquee's M-1 (Content Integrity) or as a standalone brick. Until transcription works, the post-production pipeline is incomplete: recordings get linked to episode pages, but transcripts, summaries, and key points never materialize.

Structural: Add Monitoring

  • Alert when any episode stays at transcriptionStatus: "processing" for >2 hours
  • Weekly scan: count episodes stuck in "processing" — if >0, flag
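Both checks share one predicate. A sketch of the flagging logic with assumed field names (fetching the episode list from Sanity is omitted):

```typescript
interface EpisodeStatus {
  _id: string;
  transcriptionStatus: string;
  publishedAt: string; // ISO-8601 timestamp
}

const PROCESSING_CEILING_MS = 2 * 60 * 60 * 1000; // alert threshold: two hours

// Return every episode that has been "processing" longer than the ceiling.
function stuckEpisodes(episodes: EpisodeStatus[], now: Date): EpisodeStatus[] {
  return episodes.filter(
    (ep) =>
      ep.transcriptionStatus === "processing" &&
      now.getTime() - new Date(ep.publishedAt).getTime() > PROCESSING_CEILING_MS,
  );
}
```

A non-empty result fires the alert; the weekly scan is the same predicate run over all episodes with the count reported.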

Related

  • 2026-03-25-corrective-media-prep-audience-blindness.md — First identification of stalled transcriptions (line 37), no resolution attempted
  • 2026-03-25-capability-audience-ready-episode-standard.md — Defines what "audience-ready" means; transcription is a component
  • Marquee M-1 brick: Content Integrity & Sanity Unification — should include transcription pipeline fix

Lessons

  1. "Processing" is not a terminal state. Any system that sets status to "processing" MUST have a mechanism to detect and recover from stuck processing. A timeout, a health check, a retry with backoff — something. "Processing" without a ceiling is a lie.

  2. Optimistic UI without verification is broken UI. The episode page trusts transcriptionStatus without checking whether actual content exists. Trust the data, not the status field.

  3. "See separate finding" is not a fix. The March 25 CAR identified this exact problem and deferred it. Deferral without a ticket, a brick, or an owner means it disappears. This CAR exists because the last one punted.


Filed by V. This is a Critical severity CAR because every episode page on the public website displays a broken loading state to visitors.