Let's Build

Scope, Access, Context: What We Found When We Audited Ourselves

An open conversation surfaced three levers for governing AI agents. Three weeks later, we applied them to our own 88-agent team and found exactly the gaps a working framework is supposed to find.

Baldwin
Author
7 min read
#AI agents #governance #self-audit #agent framework #AI-native operations

Frameworks usually come from a whiteboard. The good ones come from a conversation.

Late one afternoon in April, three people were on a live episode of Value-First Platform: AI Data Readiness, working through a tangled question: how do you govern AI agents that are trained to be helpful, when being helpful is exactly what gets them in trouble? The host, Trisha Merriam, had been listening, really listening, for nearly an hour as two practitioners described very different approaches to the same problem.

Erin Wiggers described what she called functional friction: architecting agents so a supervisor knows its own limits and routes work downstream to specialists with the right tools. Chris Carolan described confidence thresholds, risk buckets, a sandbox-and-approval pattern, and the moments when an agent goes rogue inside a perfectly reasonable prompt. The conversation rolled forward the way good ones do: each person building on the last.

And then Trisha did the thing that turns a conversation into a framework. She synthesized it.

Value-First Platform: AI Data Readiness - Apr 22, 2026

The exchange that produced the framework: Trisha Merriam in conversation with Erin Wiggers and Chris Carolan.

How the framework emerged

Trisha put it back to the room: "I had no idea how this was all working before, but I think it's really coming together for me that you have the basic levers that you have are context, access, and scope, and so you're using those three things to put constraints around what any individual agent can do."

Chris paused, recognized what she'd just done, and named it back: "So I guess we'll be creating a resource pretty soon. Context, access, and scope."

That moment is worth holding still for a second. The framework didn't exist before Trisha said it out loud. It was implicit in everything Erin had described about distributed tools and supervisor agents, and everything Chris had described about confidence thresholds and risk tiers. But neither of them had named it. Trisha named it because she was listening for the pattern underneath the two answers, not for which answer was right. And once it was named, it was real.

What made it useful wasn't its novelty. It was that it gave us vocabulary for a problem we already knew we had, a problem we'd already started writing about in our own corrective action reports but hadn't yet learned to name across the whole team.

What we found when we looked

A few weeks later, we ran the framework against ourselves. Eighty-eight agents on our internal team, each one with a definition file, a documented scope, a tool list, and a startup protocol that loads its context at the beginning of every run. We asked, agent by agent: does the scope match the access? Does the access match the data path? Does the context match what the roster says is loaded?
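
In practice, that audit is just a loop over definition files. Here's a minimal sketch of the loop, assuming each agent is described by a JSON definition and the roster is a JSON map keyed by agent name; the file layout and the field names ("tools", "scope", "data_paths", "startup_loads", "expertise_pack") are illustrative stand-ins, not our actual schema.

```python
import json
from pathlib import Path

def audit(definition: dict, roster_entry: dict) -> list:
    """Ask the three alignment questions for one agent."""
    findings = []

    # Does the scope match the access? Every granted tool should be named in the scope.
    undeclared = set(definition.get("tools", [])) - set(definition.get("scope", {}).get("tools", []))
    if undeclared:
        findings.append(f"access beyond scope: {sorted(undeclared)}")

    # Does the access match the data path? Web tools need a documented web data path.
    has_web_tools = any(t.startswith("web-") for t in definition.get("tools", []))
    touches_web = any("web" in p for p in definition.get("data_paths", []))
    if has_web_tools and not touches_web:
        findings.append("web tools granted, but no documented web data path")

    # Does the context match what the roster says is loaded?
    pack = roster_entry.get("expertise_pack")
    if pack and pack not in definition.get("startup_loads", []):
        findings.append(f"roster lists pack '{pack}', but startup never loads it")

    return findings

roster = json.loads(Path("roster.json").read_text())
for path in sorted(Path("definitions").glob("*.json")):
    agent = json.loads(path.read_text())
    for finding in audit(agent, roster.get(path.stem, {})):
        print(f"{path.stem}: {finding}")
```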

Three findings emerged, and none of them were the kind of thing you'd notice from the outside.

Context: thirty-one agents loading less than the roster said they had

Our internal roster lists, for each agent, an expertise pack: a directory of skill files that defines the agent's domain knowledge. The roster said each agent loaded its pack at startup. The startup protocols, when we actually read them, told a different story. Thirty-one of eighty-eight agents had their pack listed in the roster but not wired into their startup. They were operating on enforcement rules and identity alone, drawing on training-data heuristics for the domain work the pack was supposed to provide.

The output looked fine. Briefs got written. Health scores got produced. But the agents weren't operating from the documented knowledge base; they were operating from a sensible approximation of it. The roster said one thing; the wiring said another.

There was a foil in the same roster. One agent loaded five named pack files in its startup, explicitly, each one numbered. Same framework, different discipline. The contrast wasn't about scope or access; on the thirty-one, both were fine. It was that the discipline of actually loading what you've said you'll load had been quietly skipped across most of the roster, and that gap was invisible until we went looking.

Access: web tools without a data path

Four agents carried web-fetch and web-search tools in their permission lists. None of their documented data paths mentioned the open web. They read from our internal records and produced reports: internal in, internal out. The web tools weren't doing harm. But they were a quiet over-grant: capability without justification, and capability without justification is how the next incident finds its surface area.

The Production Record was invisible

The third finding was the one we hadn't anticipated. When an AI agent created a record in our customer platform or published a document in our content system, the record carried timestamps, an owner ID, and a revision number, but no agent identity. Looking at any given record, you couldn't tell whether a person had written it or which agent had. The team's work was producing thousands of records a week. The team's authorship was invisible inside them.

What we did about it

All three findings were fixed the same day we found them; the ship evidence carries that date.

The unjustified web tools came off four agents. The four AI business-unit leaders who legitimately need web access (for partner research, sponsorship monitoring, and marketplace verification) had that justification written into their definitions in plain language, so the next reviewer doesn't have to guess. Access, scoped to its actual data path.
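
If you want that review to stay repeatable rather than a one-off read-through, the over-grant check can be written against the same assumed definition schema as the audit sketch above; "access_justification" is a hypothetical field name for that plain-language justification.

```python
WEB_TOOLS = {"web-fetch", "web-search"}

def web_over_grants(definition: dict) -> list:
    """Flag web tools that neither a data path nor a written justification explains."""
    granted = sorted(WEB_TOOLS & set(definition.get("tools", [])))
    touches_web = any("web" in p for p in definition.get("data_paths", []))
    justified = bool(definition.get("access_justification", {}).get("web"))
    if granted and not (touches_web or justified):
        return granted  # capability without justification: remove it, or write the reason down
    return []
```

An agent whose definition records why it needs the web passes; one that merely carries the tools gets flagged.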

The thirty-one missing expertise packs got wired into their respective startup protocols, one by one. Every pack path was verified to exist on disk before wiring. Then we added a check script: a small piece of automation that, going forward, refuses to certify an agent for onboarding unless its pack is actually loaded. The certification process for new agents now runs the same check. The gap doesn't just close; it gets structurally blocked from coming back.
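
For a sense of what that gate can look like: the sketch below assumes one startup protocol file per agent under a startup/ directory and the same roster file as the audit sketch; the layout, the file names, and the "pack mentioned in the startup text means it's loaded" convention are assumptions, not our actual tooling.

```python
import json
import sys
from pathlib import Path

def certify(name: str, roster: dict) -> list:
    """Return the reasons this agent can't be certified; an empty list means it passes."""
    errors = []
    startup = Path("startup") / f"{name}.md"  # the agent's startup protocol
    pack = roster.get(name, {}).get("expertise_pack")
    if pack:
        if not Path(pack).exists():
            errors.append(f"pack path does not exist on disk: {pack}")
        if pack not in startup.read_text():
            errors.append(f"pack listed in the roster but never loaded at startup: {pack}")
    return errors

if __name__ == "__main__":
    roster = json.loads(Path("roster.json").read_text())
    failures = [f"{name}: {error}" for name in sorted(roster) for error in certify(name, roster)]
    if failures:
        print("\n".join(failures), file=sys.stderr)
    sys.exit(1 if failures else 0)  # a non-zero exit blocks onboarding
```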

And the Production Record: that one took the most work. A new custom property was added across sixteen objects in our customer platform. A new field was added across thirty-five document types in our content system. The two gateways through which all AI writes flow, the HubSpot write gateway and the Sanity write gateway, were updated to refuse any write that doesn't carry an explicit agent codename. Thirty-three agent definitions were updated to specify which name they stamp. Eight slash commands that issue writes directly got the same treatment. Every record an AI agent creates from now on carries the name of the agent that created it.
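
The refusal rule itself is small. Here's a minimal sketch of its shape in Python rather than the gateways' actual code; the property name "agent_codename" and the roster lookup are illustrative.

```python
import json
from pathlib import Path

KNOWN_CODENAMES = set(json.loads(Path("roster.json").read_text()))  # agent codenames, keyed off the roster

class MissingAgentIdentity(Exception):
    """Raised when a write arrives without a valid agent codename."""

def guard_write(payload: dict) -> dict:
    """Refuse any write that doesn't carry an explicit, known agent codename."""
    codename = payload.get("agent_codename")
    if not codename:
        raise MissingAgentIdentity("write rejected: payload carries no agent codename")
    if codename not in KNOWN_CODENAMES:
        raise MissingAgentIdentity(f"write rejected: unknown agent codename '{codename}'")
    return payload  # a stamped write passes through to the platform client
```

Put the same guard in front of both gateways, and an unstamped write fails loudly instead of landing as another anonymous record.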

Why this matters beyond our roster

If you're building with AI agents (your own custom GPTs, your Claude projects, a workforce of specialists, even a single assistant with a long instruction file), you have a Scope, an Access, and a Context whether you've named them or not. The question isn't whether you have a framework. The question is whether the three levers are aligned with each other and with the work the agent is doing.

When we say an agent is failing, we usually mean one of three things. Either it's deciding something it shouldn't be deciding (Scope is too broad). Or it's reaching for something it shouldn't be able to reach (Access is over-granted). Or it doesn't know what it needs to know to do the work (Context is under-loaded). And the most common failure, the one we found thirty-one times in our own roster, is the third one, because it's the most invisible. An agent with the right scope and the right access can still produce thin work if it's drawing on generic patterns instead of its actual knowledge base.

What the framework doesn't promise is that you'll find no gaps. It promises something better: when you find one, you'll have language for what's wrong, and the fix won't be a guess.

Want to go deeper?

Join our FREE weekly Office Hours for live Q&A, or explore the Value Center to find content matched to your journey stage.