Corrective Action: Agent Tool-Grant Staleness Pattern
Date: 2026-05-08
Owner: Q
Severity: Medium
Category: Cross-Cutting Pattern (Agent Definition Maintenance)
Source: Aegis observation, docs/quality/audits/2026-05-08-trtu-session-agent-capability-gaps.md
Incident Summary
In a single TRTU build session on 2026-05-08, three distinct agent capability gaps surfaced. Marshal was briefed to delegate 14 HubSpot task updates through Ledger; lacking the Agent tool in its grant, Marshal executed the writes directly via scripts/hubspot/api.js -- a structural Ledger gateway bypass. Tide was briefed to produce the M1 Interest progression spec; lacking Write and Edit tools, Tide returned the spec content as text and required V to write the file -- an upstream content relay through the orchestrating leader. Showcase looped twice on npx vercel ls calls to discover a deployed URL despite the production URL being a fixed known constant; the prohibition was added to MEMORY.md mid-session but is not present in Showcase's agent definition, so it will not survive future spawns. Substantive outputs in all three cases were correct; the friction is structural.
Pattern (Why This Is a CAR)
Agent definitions encode two things that govern execution: tool grants (what the agent can do) and operational rules (what the agent must or must not do). When V briefs an agent for a task, V relies on the assumption that the agent's tool grant and operational rules are sized to the actual scope of the task. That assumption was valid when each agent definition was written. It is no longer valid because operational scope expands over time -- Marshal is now asked to coordinate gateway writes, Tide is now asked to produce design deliverables, Showcase is now asked to operate against deployed Vercel infrastructure -- and there is no trigger that causes those agent definitions to be re-evaluated against the expanded scope.
The result, observed in a single session: one governance bypass (Marshal -> Ledger), one upstream content relay (Tide -> V), one operational loop on a known-bad pattern (Showcase Vercel CLI). Three distinct failure modes, one root cause: agent-definition maintenance is not triggered by operational signals. This is a maintenance gap, not a design gap. The architecture (Agent tool, gateway pattern, MEMORY.md feedback channel) is correct. The enforcement surface (the agent definitions themselves) drifts behind it.
Root Cause Analysis
5-Whys:
- Why did Marshal bypass Ledger, Tide relay through V, and Showcase loop on Vercel CLI? Because each agent's definition does not match its current operational scope -- Marshal lacks the Agent tool, Tide lacks Write/Edit, Showcase lacks an explicit Vercel CLI prohibition.
- Why are the definitions out of date? Because the operational scope each agent is asked to perform has expanded since the definition was last reviewed.
- Why hasn't anyone updated the definitions? Because no process triggers a review when scope expands. Definitions are written at agent creation and only revisited on ad-hoc maintenance.
- Why is there no trigger? The QMS process register has no lightweight "agent definition maintenance" process. Aegis owns agent definitions but is currently invoked when Chris notices a problem, not when the system surfaces an operational signal that the definition is stale.
- Why does the system not surface operational signals? Operational signals exist (gateway bypass logs, "I produced content but couldn't write it" patterns in agent return text, repeated MEMORY.md feedback entries about the same agent), but no process consumes them as triggers for definition review.
Root cause: Agent-definition maintenance is reactive and human-prompted, not signal-triggered. Three observable operational signals exist that should trigger a review but currently do not.
Proposed Corrective Action
Add a new lightweight process to docs/quality/process-register.md: Agent Definition Maintenance Trigger.
Triggers (any one fires the process):
- An agent produces a gateway bypass within its defined scope (e.g., Marshal writing HubSpot directly while briefed to spawn Ledger). Detection: Ledger gateway enforcer hook output, or session-level observation by V/Aegis.
- An agent returns work product as text and requests that the orchestrating leader write the file (e.g., Tide producing spec content for V to persist). Detection: agent return text contains "please write this file" / "for V to write" / equivalent relay language.
- An agent loops on a known-bad operational pattern (3+ recurrences in a session, or any recurrence of a pattern already filed in ~/.claude/projects/-mnt-d/memory/). Detection: matching against the memory-feedback file index.
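The relay-language and loop triggers above could be detected along these lines. This is a minimal sketch, not an existing hook: the function names, the pattern labels, and the idea that each occurrence of a known-bad pattern is logged as a string label are all assumptions made for illustration. Only the two quoted relay phrases and the "3+ recurrences" threshold come from the trigger list.

```python
import re
from collections import Counter

# The two relay phrases quoted in the trigger list. "Equivalent relay
# language" would need more patterns than these (assumption: matching
# is case-insensitive).
RELAY_PATTERNS = [
    re.compile(r"\bplease write this file\b", re.IGNORECASE),
    re.compile(r"\bfor V to write\b", re.IGNORECASE),
]

LOOP_THRESHOLD = 3  # "3+ recurrences in a session"


def is_content_relay(return_text: str) -> bool:
    """Return True if the agent's return text contains known relay language."""
    return any(p.search(return_text) for p in RELAY_PATTERNS)


def looping_patterns(session_patterns: list[str], filed_patterns: set[str]) -> set[str]:
    """Return the pattern labels that fire the loop trigger.

    session_patterns: one (hypothetical) label per occurrence in the session.
    filed_patterns: labels already filed as memory feedback; any occurrence
    of one of these fires the trigger regardless of count.
    """
    counts = Counter(session_patterns)
    return {
        name
        for name, n in counts.items()
        if n >= LOOP_THRESHOLD or name in filed_patterns
    }
```

Under these assumptions, two occurrences of a pattern that is already filed fire the trigger, while two occurrences of an unfiled pattern do not.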
Action when a trigger fires:
- Aegis (agent-definition owner) audits the agent's tool grant and operational rules against the actual operational scope that produced the trigger.
- Aegis files a one-line update to the agent's definition file at .claude/agents/{name}.md -- adding a tool to the grant, adding an explicit operational rule, or both.
- If a memory-feedback file is the right home for the rule (session-scoped pattern that does not need to live in the agent definition), Aegis writes that file instead and links it from the agent definition.
- Update is verified by re-running the operation that triggered it. The original failure must not reproduce.
Owner: Aegis. Auditor: Q. Cadence: trigger-based, not scheduled. Risk tier: Tier 3 (Standard) -- the process governs other processes' enforcement surface, not customer-facing delivery.
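The audit step (comparing an agent's tool grant against the scope that produced the trigger) might be partly mechanized as below. This is a sketch under a loud assumption: it supposes each definition file carries a line of the form `tools: A, B, C`, which this document does not show. The helper names are hypothetical, and a missing `tools:` line is treated here as an empty grant, which may not match how the real definitions behave.

```python
def granted_tools(definition_text: str) -> set[str]:
    """Extract tool names from a `tools:` line (assumed format, see lead-in)."""
    for line in definition_text.splitlines():
        if line.strip().lower().startswith("tools:"):
            _, _, value = line.partition(":")
            return {t.strip() for t in value.split(",") if t.strip()}
    # Assumption: no `tools:` line means nothing is granted.
    return set()


def missing_tools(definition_text: str, required: set[str]) -> set[str]:
    """Return the tools the task scope requires but the definition does not grant."""
    return required - granted_tools(definition_text)


# Hypothetical definition content, for illustration only.
tide_definition = "---\nname: tide\ntools: Read, Grep\n---\nDesign agent."
gap = missing_tools(tide_definition, {"Write", "Edit"})
print(sorted(gap))
```

Re-running the same check after the definition update gives a quick pre-check before the original operation is re-run for verification.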
Verification Plan
30-day review (2026-06-07):
- Have any of the 3 known incidents recurred? (Marshal Ledger bypass, Tide upstream relay, Showcase Vercel CLI loop.) If yes, the agent-definition fix did not hold.
- Have any new incidents of the same pattern surfaced? (Different agent, same failure class.) Catalog them.
- How many agent-definition updates were triggered by the new process in the first 30 days? Fewer than 3 suggests the pattern is converging or the trigger is too narrow. More than 10 suggests the trigger is too sensitive or there is a deeper coordination issue.
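The update-count thresholds in the 30-day review can be written down as a small classifier. This is a sketch only: the function name is hypothetical, and the label for the 3-to-10 band is an assumption, since the review criteria describe only the two outer bands.

```python
def classify_update_count(updates_in_30_days: int) -> str:
    """Map the 30-day count of triggered definition updates to a review reading."""
    if updates_in_30_days < 3:
        return "pattern converging or trigger too narrow"
    if updates_in_30_days > 10:
        return "trigger too sensitive or deeper coordination issue"
    # Assumption: the source does not name this band.
    return "within expected range"
```

Both boundary values (3 and 10) fall in the middle band, because the criteria say "below 3" and "above 10".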
90-day review (2026-08-06):
- Cumulative count of agent-definition updates triggered by the new process.
- Recurrence rate for any single agent (e.g., Marshal updated 3 times in 90 days would be a sign that scope is still expanding faster than maintenance).
- Whether the trigger detection is working (operational signals being matched, or being missed because no one is watching).
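The per-agent recurrence check for the 90-day review could be computed from a simple update log. A minimal sketch, assuming updates are recorded as (agent name, date) pairs. The record shape, the function name, and the sample log are hypothetical. The threshold of 3 follows the Marshal example above.

```python
from collections import Counter
from datetime import date, timedelta


def agents_with_repeat_updates(updates, review_date, window_days=90, threshold=3):
    """Return {agent: count} for agents updated `threshold`+ times in the window.

    updates: iterable of (agent_name, update_date) pairs (assumed record shape).
    """
    window_start = review_date - timedelta(days=window_days)
    counts = Counter(
        agent for agent, when in updates if window_start <= when <= review_date
    )
    return {agent: n for agent, n in counts.items() if n >= threshold}


# Hypothetical log, for illustration only.
sample_log = [
    ("marshal", date(2026, 5, 10)),
    ("marshal", date(2026, 6, 1)),
    ("marshal", date(2026, 7, 15)),
    ("tide", date(2026, 5, 12)),
]
flagged = agents_with_repeat_updates(sample_log, date(2026, 8, 6))
```

With this sample log, only the agent with three updates inside the window is flagged.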
Acceptance Criteria for Closing This CAR
- Agent Definition Maintenance Trigger added to docs/quality/process-register.md with owner (Aegis), auditor (Q), and the three triggers documented above.
- The 3 known incidents have a tracked fix path: Marshal Agent tool added (.claude/agents/marshal.md), Tide Write/Edit tools added (.claude/agents/tide.md), Showcase Vercel CLI prohibition added (.claude/agents/showcase.md). Each is a separate one-line definition update; Aegis owns implementation.
- 30-day review completed showing no recurrence of the 3 known incidents.
- The process is verified in operation: at least one trigger has fired and produced a definition update during the 30-day window, OR the team confirms zero qualifying triggers occurred (which itself is a signal worth recording).
References
- Aegis observation: docs/quality/audits/2026-05-08-trtu-session-agent-capability-gaps.md
- Memory-feedback file already saved (Showcase pattern): ~/.claude/projects/-mnt-d/memory/feedback_no_vercel_cli_lookups.md
- Related governance bypass CAR: docs/quality/cars/2026-03-16-corrective-ledger-bypass-slash-commands.md (precedent for Ledger bypass class)
- Governance Activation Rule precedent: docs/quality/cars/2026-04-25-corrective-dewey-registrar-activation.md (documented-but-not-wired pattern)
- QMS framework: docs/quality/qms-framework.md
- Verification protocol: docs/quality/verification-protocol.md
Out of Scope (Tracked Separately)
- Implementation of the 3 known fixes (Aegis owns the definition updates).
- Adding the Agent Definition Maintenance Trigger to docs/quality/process-register.md (separate file edit, next round).
- 30-day and 90-day reviews (future work).
- Audit of other agent definitions for similar gaps between assigned scope and granted tools (Aegis observation calls this out as pattern propagation work).