Capability Report: Content Vault

Date: 2026-04-08 Author: Q (Quality System) Project: Unified Canonical Architecture (Layers 2-4) Status: Operational

What Was Built

The Content Vault is a 56MB SQLite database at /mnt/d/data/content-vault.db that serves as the canonical record of everything in the Value-First system. It stores 4,478 records across 42 tables representing 23 entity types, organized through the Four Unified Views (Customer Health, Company Health, Context Health, Community Health).

What Problem It Solves

Before the Content Vault, agents answered questions like "how many episodes does AI Daily have?" or "which shows is Chris a host on?" by querying multiple systems independently -- Sanity for CMS records, the filesystem for session files, HubSpot for contacts, YouTube for episodes. Each system held a partial picture. No system could answer cross-system questions ("does this show exist in Sanity AND YouTube AND the filesystem?"). Agents guessed at file paths, grepped the codebase for operational knowledge, and produced conflicting counts depending on which system they queried first.

The Content Vault eliminates this by providing a single queryable surface across all systems. An agent queries one database instead of five APIs.

Schema: 42 Tables, 23 Entity Types

Core Entity Tables (31 tables serving the data model)

Unified Customer View -- Customer Health (5 entity types):

Table	Entity	Records	Source Systems
`contributors`	Friend	116	Sanity contributors, `.claude/agents/`, `people/` configs
`clients`	Organization	34	`clients/` filesystem, HubSpot Companies
`signals`	Signal	0	HubSpot-primary (live queries)
`sessions`	Session	351	`clients/*/sessions/` filesystem, Upstash
`events`	Event	0	HubSpot-primary (live queries)

Unified Revenue View -- Company Health (4 entity types):

Table	Entity	Source Systems
`engagements`	Engagement	HubSpot-primary (Deals, Services)
`investments`	Investment	HubSpot-primary (custom object 2-53994804)
`offerings`	Offering	HubSpot-primary (Products)
`deliverables`	Deliverable	HubSpot-primary (custom object 2-18484424)

Unified Business Context -- Context Health (7 entity types):

Table	Entity	Records	Source Systems
`methodology`	Methodology Content	52	Sanity (valueReality, trap, valuePathStage), Obsidian, `skills/methodology/`
`episodes`	Episode (legacy)	1,086	YouTube API, Sanity, Mux
`canonical_episodes`	Episode (canonical)	1,579	YouTube (VF + Profoundly channels), Sanity, HubSpot Events
`articles`	Article	127	Sanity blogPost documents
`plans`	Plan	51	`docs/plans/` filesystem
`reports`	Report	3	`Leadership/reports/` filesystem
`architecture_decisions`	Architecture Decision	0	To be populated from decision records
`reference_content`	Reference Content	177	Sanity (tool, persona, sponsor, presentation, event, course, playbook)
`educational_content`	Educational Content	152	Sanity (course, lesson, goldenPathQA)

Unified Team Enablement -- Community Health (6 entity types):

Table	Entity	Records	Source Systems
`skills`	Skill	157	`skills/` filesystem
`commands`	Command	73	`.claude/commands/` filesystem
`scripts`	Script	86	`scripts/` filesystem
`shows`	Show	42	Sanity show documents, YouTube playlists
`assets`	Asset	0	To be populated from filesystem + Sanity CDN
`tasks`	Task	0	HubSpot-primary (Tasks)

Relationship and Index Tables (5 tables)

Table	Purpose	Records
`contributor_shows`	Many-to-many: which contributors appear on which shows	96
`episode_methodology`	Many-to-many: which episodes discuss which methodology concepts	0
`cross_references`	Generic cross-entity references	0
`transcripts`	Raw transcript index (client session transcripts)	296
`hubspot_contacts` / `hubspot_listings` / `hubspot_projects`	HubSpot record caches	0

Infrastructure Tables (6 tables)

Table	Purpose
`content_fts`	FTS5 virtual table for full-text search (1,987 indexed records)
`content_fts_config`	FTS5 internal
`content_fts_content`	FTS5 internal
`content_fts_data`	FTS5 internal
`content_fts_docsize`	FTS5 internal
`content_fts_idx`	FTS5 internal
`vault_meta`	Schema version tracking (currently v3)
`sync_log`	Sync operation audit trail
`youtube_comments`	YouTube comment data
`youtube_live_chat`	YouTube live chat data
`sqlite_sequence`	SQLite auto-increment tracking

Population: How Data Gets In

Data enters the Content Vault through two paths:

Batch population scripts -- scripts/canonical/populate-methodology.js and similar scripts scan source systems (Sanity CMS, YouTube API, filesystem directories) and write records to the Vault. This was the initial seeding mechanism during Layer 4 (FR-5) of the architecture.
Vault agent (ongoing) -- The Vault agent owns all SQLite writes. When agents process new content (transcripts, episodes, articles), Vault creates or updates the corresponding records.

Tables with zero records (signals, events, engagements, investments, offerings, deliverables, tasks, assets) are intentionally empty. These entities are HubSpot-primary -- their source of truth is live HubSpot data queried through MCP or CLI. The Vault tables exist as the schema for future caching or batch intelligence, not as replacements for the live system.

FTS5 Search Capability

The content_fts virtual table provides full-text search across 1,987 records using SQLite's FTS5 extension. The table indexes five columns:

source_type -- Entity type (episode, session, transcript, skill, article)
source_id -- Primary key reference back to the source table
title -- Searchable title
body_text -- Full content text
tags -- Associated tags/categories

Current FTS distribution:

Source Type	Indexed Records
episode	1,056
session	351
transcript	296
skill	157
article	127

FTS5 enables queries like "find all content mentioning NetSuite" across every entity type in a single query, ranked by relevance. This complements Upstash Vector (semantic search over transcripts/sessions) with structured full-text search over the entire Vault.

Consumers: Who Uses the Content Vault

9 Canonical Query Tools (`scripts/canonical/`)

Every query tool reads from the Content Vault as its primary data source:

show-query.js -- Show roster, cross-system alignment
episode-query.js -- Episode lifecycle, recording pipeline status
article-query.js -- Publishing cadence, author attribution
friend-query.js -- People intelligence (human + AI, cross-system)
methodology-query.js -- Framework content alignment, drift detection
organization-query.js -- Client/company cross-system view
customer-health-query.js -- Session history, relationship signals
company-health-query.js -- Revenue entity health
unified-audit.js -- Full 23-entity census

25+ Slash Commands (rewired in Layer 5)

Commands across all three leaders now consume Vault data through canonical query tools instead of direct system queries. These include daily operations (/daily-ops, /daily-recap), media commands (/media-prep, /media-recap, /show-prep, /show-health), relationship commands (/relationship-brief, /relationship-pulse, /sentinel-check, /meeting-prep), revenue commands (/revenue-brief, /investment-brief), planning commands (/weekly-plan, /weekly-review), BU briefs (/academy-brief, /collective-brief, /apps-brief, /commerce-brief), and content commands (/content-multiply, /media-brief).

Agents

Any agent can query the Vault through canonical tools or direct SQLite queries. Key consumers include Vault (write operations), Catalog (indexing), Beacon (operational output), Prelude/Encore (media intelligence), Sentinel (relationship monitoring), and Pulse (health scoring).

How to Verify It Works

# Check database exists and has current schema
sqlite3 /mnt/d/data/content-vault.db "SELECT * FROM vault_meta;"
# Expected: schema_version|3

# Check record counts
sqlite3 /mnt/d/data/content-vault.db "SELECT 'Total: ' || COUNT(*) FROM (
  SELECT id FROM articles UNION ALL SELECT id FROM episodes
  UNION ALL SELECT id FROM sessions UNION ALL SELECT slug FROM methodology
  UNION ALL SELECT id FROM skills UNION ALL SELECT id FROM commands
);"

# Check FTS5 works
sqlite3 /mnt/d/data/content-vault.db "SELECT source_type, title FROM content_fts WHERE content_fts MATCH 'value path' LIMIT 5;"

# Run unified audit for full census
node /mnt/d/Projects/value-first-operations/scripts/canonical/unified-audit.js

What This Enables

Cross-system alignment detection -- Every canonical query tool has an --audit flag that compares Vault records against their source systems (Sanity, YouTube, filesystem). Discrepancies surface immediately.
Single-query intelligence -- "How many active shows do we have?" is one query, not three API calls to Sanity + YouTube + filesystem.
Entity lifecycle tracking -- Episodes have recording_lifecycle (scheduled through distributed). Sessions have synthesis_status (raw through intelligence_extracted). These lifecycle columns enable pipeline health monitoring.
Foundation for remaining FRs -- FR-7 (Rendered Surface Query Layer), FR-8 (Agent Query Interface), FR-9 (Surface Refactoring), and FR-11 (Functional Validation) all depend on the Content Vault as the canonical data source.

Ownership

Write authority: Vault agent (all SQLite writes)
Schema authority: V (architecture decisions)
Audit authority: Q (quality verification)
Location: /mnt/d/data/content-vault.db
Schema documentation: /mnt/d/Projects/value-first-operations/docs/plans/unified-canonical-data-model.md