ADR-19: Hierarchical Spec Document Schema

🇰🇷 한국어 버전

Date	Author	Repos
2026-01-12	@specvital	infra, worker, web

Context

The existing spec_view_cache table was a flat key-value store designed for simple AI conversion result caching. The Document View feature requires:

Business domain-based hierarchical organization aligned with BDD/Specification concepts
Level-independent queries (fetch domains without loading all behaviors)
Test case traceability back to source analysis results
Content-hash-based caching with AI model version awareness

The flat cache structure cannot represent the natural hierarchy of specification documents: Domain → Feature → Behavior.

Requirements

Requirement	Description
Hierarchical Representation	Domain → Feature → Behavior structure matching BDD concepts
Level-Independent Queries	Fetch domains for overview without loading all behaviors
Cascade Deletion	Analysis deletion propagates through entire document tree
Test Traceability	Link behaviors back to source test_cases for navigation
Cache Efficiency	Prevent redundant AI API calls via content-hash keying
Model Versioning	Different AI model versions produce different documents

Constraints

Constraint	Impact
PostgreSQL Backend	Must use relational schema patterns
Existing Analysis Schema	Foreign key to analyses table required
sqlc Code Generation	No VIEWs, inline JOINs in queries preferred

Decision

Adopt a 4-table normalized hierarchical schema aligned with BDD specification structure.

sql

-- Level 0: Document (per analysis)
CREATE TABLE spec_documents (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  analysis_id UUID NOT NULL REFERENCES analyses(id) ON DELETE CASCADE,
  content_hash BYTEA NOT NULL,
  language VARCHAR(10) NOT NULL DEFAULT 'en',
  executive_summary TEXT,
  model_id VARCHAR(100) NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  CONSTRAINT uq_spec_documents_hash_lang_model UNIQUE (content_hash, language, model_id)
);

-- Level 1: Domain (business classification)
CREATE TABLE spec_domains (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  document_id UUID NOT NULL REFERENCES spec_documents(id) ON DELETE CASCADE,
  name VARCHAR(255) NOT NULL,
  description TEXT,
  sort_order INTEGER NOT NULL DEFAULT 0,
  classification_confidence NUMERIC(3,2),
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Level 2: Feature (functional grouping)
CREATE TABLE spec_features (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  domain_id UUID NOT NULL REFERENCES spec_domains(id) ON DELETE CASCADE,
  name VARCHAR(255) NOT NULL,
  description TEXT,
  sort_order INTEGER NOT NULL DEFAULT 0,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Level 3: Behavior (leaf test specifications)
CREATE TABLE spec_behaviors (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  feature_id UUID NOT NULL REFERENCES spec_features(id) ON DELETE CASCADE,
  source_test_case_id UUID REFERENCES test_cases(id) ON DELETE SET NULL,
  original_name VARCHAR(2000) NOT NULL,
  converted_description TEXT NOT NULL,
  sort_order INTEGER NOT NULL DEFAULT 0,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

Key Design Decisions

Decision	Rationale
`content_hash + language + model_id` unique constraint	Cache key for deduplication; same content with different language/model produces different documents
`classification_confidence` at domain level only	AI assigns domains during Phase 1 classification; features are grouped deterministically
`source_test_case_id` with SET NULL on delete	Maintains traceability while allowing test_case cleanup without breaking spec documents
`sort_order` per level	Preserves AI-assigned ordering for consistent UI rendering
No VIEWs	sqlc generates type-safe Go code; inline JOINs preferred

Table Relationships

spec_documents (document level)
    │
    │ content_hash + language + model_id → unique
    │ analysis_id → FK to analyses (CASCADE delete)
    │
    └──► spec_domains (business domain classification)
            │
            │ document_id → FK to spec_documents (CASCADE delete)
            │ classification_confidence → AI confidence score
            │
            └──► spec_features (feature grouping)
                    │
                    │ domain_id → FK to spec_domains (CASCADE delete)
                    │
                    └──► spec_behaviors (individual test behaviors)
                            │
                            │ feature_id → FK to spec_features (CASCADE delete)
                            │ source_test_case_id → FK to test_cases (SET NULL)

Options Considered

Option A: Hierarchical 4-Table Normalized Structure (Selected)

Four tables with proper foreign key relationships representing Document → Domain → Feature → Behavior hierarchy.

Pros	Cons
Level-independent queries	More complex queries requiring JOINs
Proper FK constraints with cascade deletion	4 tables increase schema maintenance surface
BDD/Specification concept alignment	INSERT requires 4 sequential operations
Statistics via aggregate JOINs	Ordering requires sort_order column per level
Test case traceability via FK

Option B: Single Denormalized Table

All hierarchy levels in one table with nullable parent columns and item_type discriminator.

Pros	Cons
Simple schema (1 table)	Cannot enforce level-specific constraints
Easy writes (single INSERT)	No type safety for domain vs feature fields
	Cannot query "all domains" efficiently
	Recursive CTE required for hierarchy traversal

Rejected: Cannot represent hierarchy properly; no level-specific queries possible without complex filtering.

Option C: JSON Column Storage

Store entire document as JSON blob in single table.

Pros	Cons
Schema flexibility	Cannot query domains/features independently
Single row per document	No FK constraints to test_cases
Natural for document storage	Difficult aggregations (count by domain)
	JSON path queries less efficient than relational

Rejected: Eliminates relational query capabilities; no independent domain/feature statistics possible.

Option D: 2-Table Structure (Document + Behaviors)

Only top-level document and leaf behaviors, losing intermediate hierarchy.

Pros	Cons
Simpler than 4 tables	Loses domain/feature as first-class entities
Direct document-to-behavior relationship	Cannot fetch distinct domains efficiently
	Domain/feature counts require GROUP BY
	No domain-level metadata (confidence, description)

Rejected: Loses domain/feature classification context; full document scan required for statistics.

Consequences

Positive

Area	Benefit
Query Flexibility	Fetch domains for overview, expand to features on demand
BDD Alignment	Schema mirrors specification document mental model
Test Traceability	`source_test_case_id` FK enables "view source" navigation
Cascade Deletion	DELETE analysis → document → domains → features → behaviors
Cache Efficiency	(content_hash, language, model_id) prevents redundant AI calls
Statistics	COUNT/GROUP BY at each level without materialized views
Type Safety	sqlc generates distinct types per table

Negative

Area	Trade-off	Mitigation
Query Complexity	Multi-table JOINs for full document retrieval	sqlc named queries encapsulate JOINs
Write Complexity	4 INSERTs per document within transaction	Single transaction, batch INSERTs
Schema Surface	4 tables to maintain, migrate, index	Clear table responsibilities
Ordering	sort_order column at each level	AI pipeline assigns sort_order
Confidence Asymmetry	classification_confidence only at domain level	Feature confidence addable if needed

Indexes

Index	Purpose
`idx_spec_documents_analysis`	Fast lookup by analysis_id
`idx_spec_domains_document_sort`	Ordered domain retrieval
`idx_spec_features_domain_sort`	Ordered feature retrieval
`idx_spec_behaviors_feature_sort`	Ordered behavior retrieval
`idx_spec_behaviors_source`	Partial index for test case traceability

References

ADR-14: AI-Based Spec Document Generation Pipeline
ADR-13: Billing and Quota Architecture
Worker ADR-08: SpecView Worker Binary Separation
Commit 38a33ad - feat(db): replace spec_view_cache with hierarchical spec document schema

ADR-19: Hierarchical Spec Document Schema ​

Context ​

Requirements ​

Constraints ​

Decision ​

Key Design Decisions ​

Table Relationships ​

Options Considered ​

Option A: Hierarchical 4-Table Normalized Structure (Selected) ​

Option B: Single Denormalized Table ​

Option C: JSON Column Storage ​

Option D: 2-Table Structure (Document + Behaviors) ​

Consequences ​

Positive ​

Negative ​

Indexes ​

References ​