Corrective Action Report — Showcase Orphan Polling Loops

Date: 2026-04-30
Owner: Q (Quality System manager)
Reporter: Marquee (Media BU Lead)
Domain: apps/website + apps/media (Showcase agent operating pattern)
Severity: Medium (UX clutter, masked concurrency status, wasted compute; did not corrupt deliverables)
Status: Diagnosed. Corrective actions proposed. Awaiting Q sign-off + agent-definition update.


1. Incident Summary

During the Another Orange Morning launch promo work (2026-04-30, 07:50–09:15 CT), Chris observed 5 simultaneous "background tasks" in the Claude Code harness UI while only one agent was doing meaningful work. Investigation showed that 4 of the 5 were orphaned polling loops of the form "until <condition>; do sleep N; done", spawned by prior Showcase agent invocations and never cleaned up. A further instance of the same pattern was caught in real time during diagnosis, confirming that the pattern is recurring rather than incidental.

Total instances observed across the session: 6 orphan polling loops from approximately 8 Showcase invocations on this day's launch work.

2. Detection

Detection chain:

  1. Chris noticed 5 background tasks in his harness UI while only one Showcase agent was meant to be active
  2. Marquee inspected /tmp/claude-1000/.../tasks/ and ran ps -ef to enumerate running shells
  3. Identified five processes matching the pattern "until <condition>; do sleep N; done", parented to the Claude Code orchestrator
  4. Killed the five orphans
  5. Caught a sixth instance being spawned during the same Showcase agent's continuing run, providing real-time evidence
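
The enumeration in steps 2 and 3 can be approximated with a small filter over the process table. This is a minimal sketch, not the exact commands Marquee ran: the function name list_polling_loops is hypothetical, and it assumes a ps that accepts POSIX-style -o fields.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list processes whose command line looks like an
# "until ...; do sleep N; done" polling loop.
# Output columns: PID, PPID, elapsed time, command line.
list_polling_loops() {
  # Separate -o options avoid ambiguity in how "field=" headers are parsed.
  ps -e -o pid= -o ppid= -o etime= -o args= |
    awk '/until / && /do sleep/ && !/awk/ { print }'
}

list_polling_loops
```

The PPID column shows whether a loop is still attached to the orchestrator or has already been reparented, and the elapsed-time column helps separate fresh loops from ones left over from earlier invocations.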

Chris's framing on detection: "5 background tasks ... please troubleshoot... why this is taking so long." The orphans were not directly blocking Showcase's progress, but they made the operational state of the system illegible — Chris could not distinguish active work from orphans.

3. Evidence — The Six Orphan Patterns Captured

All six orphans match the same anti-pattern: until <condition>; do sleep N; done; <verification>. Each was spawned by a Showcase invocation as a "wait for X to complete" mechanism for deploy/build verification.

| # | Pattern | Spawned for |
|---|---------|-------------|
| 1 | until [ -f .../bddngeath.output ] && grep -q "Portal.*has back link" ... | Polling another agent's output file for success marker |
| 2 | until curl ... grep -q "aom-launch-page"; do sleep 20 | Polling production URL for deployed page identifier |
| 3 | until [ "$(curl ... = "302" ]; do sleep 15 | Polling for auth-gated route to redirect |
| 4 | until [ -f .../bcfzyv0r1.output ] && grep -qE "(Complete error ... | Polling another task's output file for a completion marker (see Cause 2) |
| 5 | until [ "$(curl ... wc -c)" -gt "1000" ]; do sleep 20 | Polling for deploy by content length heuristic |
| 6 | until grep -q "Finished in" /tmp/.../bc9uo81wo.output; do sleep 10; done; tail -5 .../bc9uo81wo.output | Polling a backgrounded pnpm build's output for completion marker |

All six were unbounded until loops with no max-iterations guard. Each was spawned via Bash and ran until its condition matched. When the parent Showcase agent's main task completed and the agent terminated, the polling loop was not explicitly killed; it was orphaned (reparented to PID 1 / init) and kept running until either (a) its condition happened to match or (b) someone intervened manually.
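
The orphaning behavior can be reproduced outside the harness. The sketch below is illustrative only: FLAG is a hypothetical marker file standing in for the awaited condition, and the inner bash -c plays the role of the agent that starts a loop and then exits.

```shell
#!/usr/bin/env bash
# Demonstrates that a backgrounded polling loop outlives the shell that started it.
# FLAG is a hypothetical marker file standing in for the awaited condition.
FLAG="$(mktemp -u)"

# The inner shell starts the loop in the background, prints the loop's PID,
# and exits immediately. The loop's output goes to /dev/null so the command
# substitution does not wait for it.
LOOP_PID="$(bash -c 'until [ -f "$1" ]; do sleep 1; done >/dev/null 2>&1 & echo $!' _ "$FLAG")"

sleep 1
# The spawning shell is gone, but the loop is still running under a new parent.
if kill -0 "$LOOP_PID" 2>/dev/null; then
  echo "loop $LOOP_PID still alive, parent is now PID $(ps -o ppid= -p "$LOOP_PID" | tr -d ' ')"
fi

# Only satisfying the condition (or a manual kill) ends the loop.
touch "$FLAG"
sleep 2
kill -0 "$LOOP_PID" 2>/dev/null || echo "loop exited after condition matched"
rm -f "$FLAG"
```

The new parent PID depends on the environment (PID 1 on a plain system, possibly a subreaper elsewhere), so the sketch prints it rather than assuming a value.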

4. Root Cause

The Showcase agent operating pattern uses unbounded "until <condition>; do sleep N; done" loops as a deploy/build/verification wait mechanism. The pattern is fundamentally orphan-prone in this harness for three compounding reasons:

Cause 1: Unbounded loop construct with no cleanup contract. An until ... do ... done loop runs until its condition becomes true, which may be never. There is no max-iterations bound, no timeout, and no cleanup hook tied to the parent agent's lifecycle. The loop has no signal that "the agent that wanted me has moved on."

Cause 2: Indirect verification via output-file polling. Several orphans (#1, #4, #6) polled another task's output transcript file for completion markers like "Finished in" or "✓ built". This double indirection means Showcase ran a second background task to monitor a first background task. When the Showcase agent terminated, both the original task and its watcher were orphaned independently.

Cause 3: Redundant local-build verification overlapping with Vercel's deploy. Pattern #6 was spawned after the commit and push had already landed on Vercel. The auto-deploy completed in under 3 minutes, while the Showcase invocation that ran a local pnpm build for verification accumulated ~85 minutes of total agent runtime, duplicating Vercel's server-side build. The redundancy multiplied the orphan-spawning surface area: Showcase wanted to verify locally what Vercel was already verifying server-side, and used a polling pattern to detect local completion.

Root cause statement: Showcase uses unbounded "until ...; do sleep N; done" loops as a deploy verification mechanism, and these loops have no lifecycle binding to the Showcase agent that spawned them. When the agent terminates, the loops are orphaned to init and continue polling for conditions that may have already resolved or will never match.

5. Impact

| Dimension | Impact |
|-----------|--------|
| Deliverable correctness | None. Both routes (/launch and /promo) deployed correctly. The pattern wastes time but doesn't corrupt output. |
| User-visible latency | High. The Showcase invocation that landed the /promo mirror took ~85 minutes for work that should have completed in 5–10 minutes (refactor + new page + commit + push). |
| Operational legibility | High. UI showed "5 active background tasks" with 4 of them polling defunct conditions. Chris could not trust the operational state to reflect actual work. |
| Compute waste | Low to moderate. Each orphan loop polls every 10–20 seconds indefinitely. Network + filesystem I/O scales with orphan count. |
| Cumulative drift | Material. Orphans accumulate across sessions until WSL VM restart. A multi-week session could accumulate dozens of orphan loops. |
| Trust in the agent layer | Moderate-high. The pattern was invisible to the agent's "completed cleanly" report: Showcase reported success while leaving processes running. |

6. Corrective Actions

6.1 — Forbidden pattern (Showcase agent definition)

Update .claude/agents/showcase.md to add a forbidden-patterns section:

NEVER spawn unbounded "until <condition>; do sleep N; done" polling loops in Bash invocations. These loops are orphaned to init when Showcase terminates, polluting the harness background-tasks tracker and consuming compute indefinitely. Use one of the three sanctioned alternatives below.

6.2 — Sanctioned alternatives for "wait for X" patterns

Alternative A — Synchronous Bash call (preferred). When Showcase needs the result of a build or deploy, run it as a foreground Bash call (no run_in_background). Bash blocks until the process exits and returns the exit code. No polling required.

# Correct — sync, returns exit code, no orphan risk
pnpm --filter website build

Alternative B — Bounded loop with explicit max iterations. When polling is genuinely required (e.g., waiting for a deploy to propagate to a CDN), use a for loop with hard bounds:

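# Correct: at most 30 attempts with 10s sleeps (about 5 min of waiting), breaks on success
live=0
for i in {1..30}; do
  # Compare the numeric status code; HTTP/2 status lines carry no "OK" text
  status=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$URL")
  if [ "$status" = "200" ]; then
    echo "URL live on attempt $i"
    live=1
    break
  fi
  sleep 10
done
# Report the timeout explicitly instead of failing silently
[ "$live" -eq 1 ] || echo "URL not live after 30 attempts; giving up" >&2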
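
The bounded-loop idea in Alternative B can also be wrapped in a reusable helper so that every wait has the same hard cap and an explicit timeout result. The following is a sketch under assumptions: poll_until and its argument order are hypothetical, not an existing harness or Showcase function.

```shell
#!/usr/bin/env bash
# poll_until MAX_TRIES INTERVAL_SECONDS COMMAND [ARGS...]
# Hypothetical helper: reruns COMMAND until it succeeds or MAX_TRIES is used up.
# Returns 0 on success and 1 on timeout, so the caller can tell the two apart.
poll_until() {
  local max_tries=$1 interval=$2 i
  shift 2
  for ((i = 1; i <= max_tries; i++)); do
    if "$@"; then
      return 0
    fi
    # Skip the sleep after the final failed attempt.
    if ((i < max_tries)); then
      sleep "$interval"
    fi
  done
  return 1
}

# Example: wait at most 3 attempts, 1s apart, for a marker file (path is illustrative).
if poll_until 3 1 test -f /tmp/example-marker; then
  echo "marker present"
else
  echo "gave up after bounded wait"
fi
```

Because the helper returns a distinct status on timeout, the caller can report "condition never matched" instead of silently continuing, which is the failure mode the unbounded form hides.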

Alternative C — Monitor tool. The harness provides a Monitor tool that streams events from a background process and auto-cleans when the parent agent terminates. Use this for any case where Showcase would otherwise want to watch a run_in_background task's output stream.

6.3 — Forbidden redundancy

Showcase must not run a local pnpm build for verification after a commit has been pushed to a Vercel-deploying branch. Vercel will perform the same build server-side; the local build is redundant, slow, and the source of orphan-loop spawning. Local builds belong only as a pre-push gate (sync, in foreground, before git push), never as post-push verification.

6.4 — Reporting requirement

Showcase's agent-completion summary must explicitly state: "No background polling loops remain." If any background process was started during the agent's run, Showcase must explicitly kill it before reporting completion. This makes the orphan-or-not state visible at the agent boundary instead of hidden in the harness UI.
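
One way a script could meet this requirement is to record every background PID it starts and kill them all in an exit trap. This is a minimal sketch, assuming the work runs inside a single bash script; start_bg, cleanup_bg, and BG_PIDS are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: track every background PID and kill them all before reporting completion.
BG_PIDS=()

start_bg() {
  # Run the given command in the background and remember its PID.
  "$@" &
  BG_PIDS+=("$!")
}

cleanup_bg() {
  local pid
  for pid in "${BG_PIDS[@]}"; do
    kill "$pid" 2>/dev/null
    wait "$pid" 2>/dev/null
  done
  BG_PIDS=()
  echo "No background polling loops remain."
}

# Runs on normal exit as well as on errors that end the script.
trap cleanup_bg EXIT

start_bg sleep 300   # stand-in for a watcher or long-running task
```

Killing a loop's shell does not necessarily stop a sleep that is already in flight; as the ps -ef snapshots in the evidence note, child sleeps can be reparented to init after the parent is killed and linger until they finish.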

7. Preventive Actions

7.1 — Session-end hook

Add a Stop hook (or a SessionEnd hook, with the script in .claude/hooks/) that scans for orphaned "until" loops parented to init or to terminated agent processes. Hook output: list of orphan PIDs and their command lines. Hook action: log a warning only; do not auto-kill, and leave confirmation to a human or Marquee.

Hook implementation candidate:

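#!/bin/bash
# .claude/hooks/check-orphan-polls.sh
# Warn about polling loops that have been reparented to init (PPID 1).
# In ps -ef output, $2 is the PID and $3 is the PPID.
ORPHANS=$(ps -ef | awk '$3 == 1 && /until / && /do sleep/ {print $2}' | paste -sd, -)
if [ -n "$ORPHANS" ]; then
  echo "[orphan-poll-check] WARNING: orphan polling loops detected:"
  # ps -p takes a comma-separated PID list
  ps -p "$ORPHANS" -o pid,args
fi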

7.2 — Agent definition cross-audit

Aegis or Hone audits all agent definitions in .claude/agents/ for the same anti-pattern. Showcase is the documented offender; other agents may share the pattern. Specifically check: any agent that runs builds, deploys, or polls external URLs.

7.3 — Conventions doc update

Add a section to wiki/conventions.md § Engineering standards:

Forbidden polling pattern. No agent or script may spawn unbounded "until <condition>; do sleep N; done" loops. Use bounded for loops, sync Bash calls, or the harness Monitor tool. The unbounded form is orphaned on agent termination and pollutes the harness state.

7.4 — simplify skill enhancement

Update the simplify skill (or equivalent code-review skill) to flag until ... do sleep N; done patterns in any committed shell scripts. This catches the pattern at code-review time instead of at runtime.
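
A review-time check of this kind could start as a plain grep heuristic. The sketch below is an assumption about how such a check might look, not the simplify skill's actual implementation; flag_unbounded_polls is a hypothetical name, and the pattern only catches the single-line form seen in this incident.

```shell
#!/usr/bin/env bash
# Hypothetical check: print file:line:text for every line containing
# "until ...; do sleep N", the single-line form used by the orphan loops.
# Exit status follows grep: 0 if any match was found, 1 if none.
flag_unbounded_polls() {
  grep -nHE '(^|[;&|[:space:]])until[[:space:]].*;[[:space:]]*do[[:space:]]+sleep[[:space:]]' "$@"
}

# Example: scan the shell scripts passed on the command line.
if [ "$#" -gt 0 ]; then
  flag_unbounded_polls "$@"
fi
```

Being a heuristic, it will miss loops split across several lines and may flag comments that quote the pattern, so a hit should prompt review rather than block a commit outright.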

8. Verification

The corrective actions are verified when:

  • .claude/agents/showcase.md has the forbidden-pattern section + three alternatives codified
  • Aegis or Hone has audited all agent definitions and confirmed no other agents use the pattern (or has identified them as additional CAR scope)
  • wiki/conventions.md has the forbidden polling pattern in engineering standards
  • The session-end hook is installed and running
  • The next Showcase invocation that involves a build or deploy finishes cleanly: no orphan loops appear in ps -ef after the agent reports completion
  • Chris can run consecutive Showcase invocations on apps/website without seeing a growing background-tasks count in the UI

9. Evidence Files

  • This CAR — docs/quality/cars/2026-04-30-corrective-showcase-orphan-polling-loops.md
  • Session transcript with 6 orphan patterns captured — orchestrator session 4d0f08e0-a69f-46ee-a3df-3482b412c5fe
  • ps -ef snapshots taken during diagnosis (orphans parented to PID 1, child sleeps reparented to init after parent kill)
  • Showcase final completion message confirming both routes verified — task af1b5d32dccdf08c1 returned 2026-04-30 ~09:15 CT after ~85 minutes runtime, 103 tool calls
  • Three commits pushed to main during the session: 2ce5a13ee, aa693ad6f, abf9cc7eb

10. Sign-Off

| Role | Name | Status |
|------|------|--------|
| Reporter | Marquee | Drafted 2026-04-30 |
| Owner | Q | Pending review |
| Implementer | Aegis (agent definition update), Hone (cross-audit), Squire (hook installation) | Pending assignment |
| Reviewer | Chris Carolan (Advisory Committee) | Pending review |

Drafted by Marquee under direction from Chris during AOM launch promo work, 2026-04-30. Forwarded to Q for QMS publishing through /corrective-action-report workflow.