Capability Report: Content Vault

Capability Report: Content Vault

Date: 2026-04-08 Author: Q (Quality System) Project: Unified Canonical Architecture (Layers 2-4) Status: Operational


What Was Built

The Content Vault is a 56MB SQLite database at /mnt/d/data/content-vault.db that serves as the canonical record of everything in the Value-First system. It stores 4,478 records across 42 tables representing 23 entity types, organized through the Four Unified Views (Customer Health, Company Health, Context Health, Community Health).

What Problem It Solves

Before the Content Vault, agents answered questions like "how many episodes does AI Daily have?" or "which shows is Chris a host on?" by querying multiple systems independently -- Sanity for CMS records, the filesystem for session files, HubSpot for contacts, YouTube for episodes. Each system held a partial picture. No system could answer cross-system questions ("does this show exist in Sanity AND YouTube AND the filesystem?"). Agents guessed at file paths, grepped the codebase for operational knowledge, and produced conflicting counts depending on which system they queried first.

The Content Vault eliminates this by providing a single queryable surface across all systems. An agent queries one database instead of five APIs.

Schema: 42 Tables, 23 Entity Types

Core Entity Tables (31 tables serving the data model)

Unified Customer View -- Customer Health (5 entity types):

Table Entity Records Source Systems
contributors Friend 116 Sanity contributors, .claude/agents/, people/ configs
clients Organization 34 clients/ filesystem, HubSpot Companies
signals Signal 0 HubSpot-primary (live queries)
sessions Session 351 clients/*/sessions/ filesystem, Upstash
events Event 0 HubSpot-primary (live queries)

Unified Revenue View -- Company Health (4 entity types):

Table Entity Records Source Systems
engagements Engagement 0 HubSpot-primary (Deals, Services)
investments Investment 0 HubSpot-primary (custom object 2-53994804)
offerings Offering 0 HubSpot-primary (Products)
deliverables Deliverable 0 HubSpot-primary (custom object 2-18484424)

Unified Business Context -- Context Health (7 entity types):

Table Entity Records Source Systems
methodology Methodology Content 52 Sanity (valueReality, trap, valuePathStage), Obsidian, skills/methodology/
episodes Episode (legacy) 1,086 YouTube API, Sanity, Mux
canonical_episodes Episode (canonical) 1,579 YouTube (VF + Profoundly channels), Sanity, HubSpot Events
articles Article 127 Sanity blogPost documents
plans Plan 51 docs/plans/ filesystem
reports Report 3 Leadership/reports/ filesystem
architecture_decisions Architecture Decision 0 To be populated from decision records
reference_content Reference Content 177 Sanity (tool, persona, sponsor, presentation, event, course, playbook)
educational_content Educational Content 152 Sanity (course, lesson, goldenPathQA)

Unified Team Enablement -- Community Health (6 entity types):

Table Entity Records Source Systems
skills Skill 157 skills/ filesystem
commands Command 73 .claude/commands/ filesystem
scripts Script 86 scripts/ filesystem
shows Show 42 Sanity show documents, YouTube playlists
assets Asset 0 To be populated from filesystem + Sanity CDN
tasks Task 0 HubSpot-primary (Tasks)

Relationship and Index Tables (5 tables)

Table Purpose Records
contributor_shows Many-to-many: which contributors appear on which shows 96
episode_methodology Many-to-many: which episodes discuss which methodology concepts 0
cross_references Generic cross-entity references 0
transcripts Raw transcript index (client session transcripts) 296
hubspot_contacts / hubspot_listings / hubspot_projects HubSpot record caches 0

Infrastructure Tables (6 tables)

Table Purpose
content_fts FTS5 virtual table for full-text search (1,987 indexed records)
content_fts_config FTS5 internal
content_fts_content FTS5 internal
content_fts_data FTS5 internal
content_fts_docsize FTS5 internal
content_fts_idx FTS5 internal
vault_meta Schema version tracking (currently v3)
sync_log Sync operation audit trail
youtube_comments YouTube comment data
youtube_live_chat YouTube live chat data
sqlite_sequence SQLite auto-increment tracking

Population: How Data Gets In

Data enters the Content Vault through two paths:

  1. Batch population scripts -- scripts/canonical/populate-methodology.js and similar scripts scan source systems (Sanity CMS, YouTube API, filesystem directories) and write records to the Vault. This was the initial seeding mechanism during Layer 4 (FR-5) of the architecture.

  2. Vault agent (ongoing) -- The Vault agent owns all SQLite writes. When agents process new content (transcripts, episodes, articles), Vault creates or updates the corresponding records.

Tables with zero records (signals, events, engagements, investments, offerings, deliverables, tasks, assets) are intentionally empty. These entities are HubSpot-primary -- their source of truth is live HubSpot data queried through MCP or CLI. The Vault tables exist as the schema for future caching or batch intelligence, not as replacements for the live system.

FTS5 Search Capability

The content_fts virtual table provides full-text search across 1,987 records using SQLite's FTS5 extension. The table indexes five columns:

  • source_type -- Entity type (episode, session, transcript, skill, article)
  • source_id -- Primary key reference back to the source table
  • title -- Searchable title
  • body_text -- Full content text
  • tags -- Associated tags/categories

Current FTS distribution:

Source Type Indexed Records
episode 1,056
session 351
transcript 296
skill 157
article 127

FTS5 enables queries like "find all content mentioning NetSuite" across every entity type in a single query, ranked by relevance. This complements Upstash Vector (semantic search over transcripts/sessions) with structured full-text search over the entire Vault.

Consumers: Who Uses the Content Vault

9 Canonical Query Tools (scripts/canonical/)

Every query tool reads from the Content Vault as its primary data source:

  • show-query.js -- Show roster, cross-system alignment
  • episode-query.js -- Episode lifecycle, recording pipeline status
  • article-query.js -- Publishing cadence, author attribution
  • friend-query.js -- People intelligence (human + AI, cross-system)
  • methodology-query.js -- Framework content alignment, drift detection
  • organization-query.js -- Client/company cross-system view
  • customer-health-query.js -- Session history, relationship signals
  • company-health-query.js -- Revenue entity health
  • unified-audit.js -- Full 23-entity census

25+ Slash Commands (rewired in Layer 5)

Commands across all three leaders now consume Vault data through canonical query tools instead of direct system queries. These include daily operations (/daily-ops, /daily-recap), media commands (/media-prep, /media-recap, /show-prep, /show-health), relationship commands (/relationship-brief, /relationship-pulse, /sentinel-check, /meeting-prep), revenue commands (/revenue-brief, /investment-brief), planning commands (/weekly-plan, /weekly-review), BU briefs (/academy-brief, /collective-brief, /apps-brief, /commerce-brief), and content commands (/content-multiply, /media-brief).

Agents

Any agent can query the Vault through canonical tools or direct SQLite queries. Key consumers include Vault (write operations), Catalog (indexing), Beacon (operational output), Prelude/Encore (media intelligence), Sentinel (relationship monitoring), and Pulse (health scoring).

How to Verify It Works

# Check database exists and has current schema
sqlite3 /mnt/d/data/content-vault.db "SELECT * FROM vault_meta;"
# Expected: schema_version|3

# Check record counts
sqlite3 /mnt/d/data/content-vault.db "SELECT 'Total: ' || COUNT(*) FROM (
  SELECT id FROM articles UNION ALL SELECT id FROM episodes
  UNION ALL SELECT id FROM sessions UNION ALL SELECT slug FROM methodology
  UNION ALL SELECT id FROM skills UNION ALL SELECT id FROM commands
);"

# Check FTS5 works
sqlite3 /mnt/d/data/content-vault.db "SELECT source_type, title FROM content_fts WHERE content_fts MATCH 'value path' LIMIT 5;"

# Run unified audit for full census
node /mnt/d/Projects/value-first-operations/scripts/canonical/unified-audit.js

What This Enables

  1. Cross-system alignment detection -- Every canonical query tool has an --audit flag that compares Vault records against their source systems (Sanity, YouTube, filesystem). Discrepancies surface immediately.

  2. Single-query intelligence -- "How many active shows do we have?" is one query, not three API calls to Sanity + YouTube + filesystem.

  3. Entity lifecycle tracking -- Episodes have recording_lifecycle (scheduled through distributed). Sessions have synthesis_status (raw through intelligence_extracted). These lifecycle columns enable pipeline health monitoring.

  4. Foundation for remaining FRs -- FR-7 (Rendered Surface Query Layer), FR-8 (Agent Query Interface), FR-9 (Surface Refactoring), and FR-11 (Functional Validation) all depend on the Content Vault as the canonical data source.

Ownership

  • Write authority: Vault agent (all SQLite writes)
  • Schema authority: V (architecture decisions)
  • Audit authority: Q (quality verification)
  • Location: /mnt/d/data/content-vault.db
  • Schema documentation: /mnt/d/Projects/value-first-operations/docs/plans/unified-canonical-data-model.md