Corrective Action: scripts/managed-agents/upload-agents.js Missing Update Path
Date: 2026-05-11 Category: Platform Tooling Defect — Anthropic Managed-Agent provisioning script Impact: The
upload-agents.jsscript can create Managed Agents on the Anthropic Agent API but cannot update them in place. Every re-upload of an existing agent requires manual workaround. During the 2026-05-10/11 pack stream, Aegis re-uploaded 4 agents (Vigil, Horizon, Klaxon, Tuner) and discovered the script had no update path — workaround used was unclean. V flagged this for Hone; Q is recording it as a CAR because the cleaner workaround pattern needs to be canonized and the script needs a structural fix. Resolution Status: Open. Stream-closing re-upload of 4 agents completed via workaround. Sentinel (the 5th) is blocked by a separate constraint (bundle cap — see sibling CAR). Workaround pattern is documented below; permanent script fix is the canonical resolution.
Incident
What Happened
scripts/managed-agents/upload-agents.js (per Leadership/reports/ADR-2026-04-21-option-1-system-prompt-injection.md) provisions VFT agents to the Anthropic Managed-Agent surface. The script reads data/managed-agents/agents.json, assembles per-agent system prompts from the registry, and POSTs them to the Anthropic Agent API.
During the 2026-05-10/11 pack stream, Aegis patched 45 V1 agent Startup Protocols (commit 90e594033). Five of those 45 agents are deployed to the Anthropic Managed-Agent surface and required re-upload to pick up the patches:
| Agent | Re-upload outcome | Workaround used |
|---|---|---|
| Vigil | Updated | Manual delete-then-create via API |
| Horizon | Updated | Manual delete-then-create via API |
| Klaxon | Updated | Manual delete-then-create via API |
| Tuner | Updated | Manual delete-then-create via API |
| Sentinel | Blocked at upload | Separate constraint — bundle exceeds 100k cap (sibling CAR) |
The script's behavior: upload-agents.js checks whether an agent with the same ID exists on the Anthropic side. If it does, the script errors out ("Agent already exists; cannot create") and exits. There is no --update flag, no automatic detection of "this is a re-upload, use PUT/PATCH instead of POST," and no documented manual update path.
The workaround Aegis used: manually delete the existing agent via the Anthropic Agent API DELETE endpoint, then re-run upload-agents.js which now succeeds because the agent doesn't exist.
Why This Matters
Three issues compound:
Issue 1 — Delete-then-create breaks any external reference to the agent ID. If anything outside VFT's monorepo holds a reference to the agent ID (a webhook handler, a chained spawn config, a Slack integration), the delete invalidates that reference. The new agent has the same name but a new ID. This is the same pattern as "delete the user and re-create them" — usually a bug.
Issue 2 — Delete-then-create creates a window of unavailability. Between the DELETE and the successful POST, the agent does not exist on the surface. If anything was attempting to spawn it during the gap, the spawn fails. For Tier 1 cadenced agents (Pulse, Sentinel) this could mean a missed cycle. Not a problem during the May 11 stream because re-uploads ran outside cadence windows — but the pattern is fragile.
Issue 3 — Manual API calls outside the script bypass any logging/audit trail the script provides. Aegis's batch report cannot record "I called DELETE on agent X at HH:MM" with the same structure as "I uploaded agent X at HH:MM." The audit trail for this stream's re-uploads is partly outside Aegis's standard output.
Timeline
| Time | Event |
|---|---|
| 2026-04-21 | ADR-2026-04-21 specs Option 1 system-prompt injection via Managed Agents. upload-agents.js written for initial provisioning. |
| 2026-04-21 → 2026-05-11 | Multiple Managed Agent re-uploads happen with the same workaround (manual delete-then-create). The defect is known but not formally tracked. |
| 2026-05-10/11 | Pack stream patches Startup Protocols for 45 agents; 5 require Anthropic re-upload. |
| 2026-05-11 | Aegis re-uploads 4 successfully (workaround), 1 blocked by bundle cap (separate CAR). |
| 2026-05-11 | V flags the missing update path for Hone in the closeout doc. |
| 2026-05-11 | Q files this CAR per V's question: "deserves a CAR or just a ticket?" Q's call: CAR — defects affecting Tier 1 agent provisioning warrant the formal lifecycle even when severity is low. |
Root Cause
Primary Mechanism
upload-agents.js was written to handle the first-time provisioning case (per ADR-2026-04-21). The update case was not in the initial scope. As the agent roster has evolved through pack work, agent definition refinements, and methodology updates, re-upload has become the more common case — but the script never received the update path.
The Anthropic Agent API supports update operations (PUT or PATCH on the agent resource). The script could be extended to detect "agent X already exists" and route to the update endpoint instead of failing.
Secondary Mechanisms
S1 — No registry tracking of "this agent's deployed version vs registry version" exists. Aegis cannot tell from agents.json alone whether the deployed Anthropic version is in sync with the local registry. A deployed_version_hash or similar field in agents.json would let Aegis know "this agent needs re-upload" or "this agent is current."
S2 — The "create" failure on existing agent is the only signal. The current workflow is: try to create, fail, manually intervene. A pre-flight check ("does agent X exist on the surface? If yes, this is an update operation; route accordingly") would catch the case before the failed create.
S3 — Pack stream provoked the issue because it was the first batch re-upload. Single re-uploads pre-stream were rare enough that the workaround was tolerable. Five re-uploads in one batch (with one blocked by an adjacent bundle-cap constraint, requiring extra attention) revealed the friction.
What This Is Not
This is not Aegis's defect. Aegis used the cleanest available workaround given the script's limitations.
This is not an Anthropic API limitation. Anthropic supports the update operation; VFT's script doesn't call it.
This is not blocking. The workaround works. It's just unclean and accumulates risk over time.
Immediate Fix
Stream-closing re-uploads completed via workaround. Sentinel is blocked by a separate constraint, not this one.
For the next re-upload before the permanent fix:
The workaround pattern is canonical for now:
- Call DELETE on the existing agent via direct Anthropic API call (with explicit audit note in Aegis's batch report citing this CAR).
- Run
upload-agents.jsto re-create. - Record the delete-and-recreate cycle in Aegis's batch report.
This is fragile but acceptable for low-frequency updates.
Permanent Prevention
| Action | Owner | Mechanism | Timeline |
|---|---|---|---|
P1 — Add update path to upload-agents.js |
Hone (script ownership) → Squire (implementation) | Detect "agent X already exists." If yes, route to the Anthropic update endpoint (PUT/PATCH). If no, use the existing create path. Preserve agent ID. | Next pack stream — not urgent but escalates if more re-uploads accumulate |
P2 — Add deployed_version_hash field to agents.json |
Hone + Aegis | Track each agent's last-deployed bundle hash. Aegis's pre-deploy check can identify which agents need re-upload (hash mismatch) vs. which are current. | Same change as P1 |
| P3 — Add pre-flight existence check | Squire | Before any upload, query the Anthropic API for "does agent X exist?" — route correctly. | Same change as P1 |
P4 — Document the canonical re-upload flow in Leadership/reports/ADR-2026-04-21-*.md follow-up note |
V (ADR ownership) + Hone | Append to the ADR: "Re-uploads use the update path. Delete-then-create is deprecated. Aegis batch report records hash-changed agents." | After P1 ships |
| P5 — Bundle the pre-flight bundle-size check from the Sentinel CAR (P1) with this work | Squire | Both changes touch upload-agents.js. Ship as one PR. Bundle-size pre-check + update-path detection share the same pre-flight discipline. |
Same PR |
Verification
This CAR's permanent prevention is VERIFIED when:
upload-agents.jscorrectly updates an existing Managed Agent without delete-then-create (P1)- Agent IDs persist across re-uploads (P1)
agents.jsoncarriesdeployed_version_hashper agent (P2)- Aegis batch report shows hash-changed agents pre-deploy (P2)
- Pre-flight existence check is in place (P3)
- ADR follow-up documents the canonical flow (P4)
Until P1 ships: this CAR remains OPEN. The workaround continues to be used as the canonical re-upload pattern.
Related
docs/quality/audits/2026-05-11-pack-stream.md § Required Actions— Q's audit listed this as a CAR-worthy item per V's open questiondocs/quality/cars/2026-05-11-corrective-sentinel-bundle-cap.md— sibling CAR; both affectupload-agents.jspre-flight discipline (P5 bundles them)Leadership/reports/ADR-2026-04-21-option-1-system-prompt-injection.md— the ADR that established the upload pathscripts/managed-agents/upload-agents.js— the script with the missing update pathdata/managed-agents/agents.json— the registry that should carrydeployed_version_hash