Skip to content

ADR-19: Hierarchical Spec Document Schema

🇰🇷 한국어 버전

DateAuthorRepos
2026-01-12@specvitalinfra, worker, web

Context

The existing spec_view_cache table was a flat key-value store designed for simple AI conversion result caching. The Document View feature requires:

  1. Business domain-based hierarchical organization aligned with BDD/Specification concepts
  2. Level-independent queries (fetch domains without loading all behaviors)
  3. Test case traceability back to source analysis results
  4. Content-hash-based caching with AI model version awareness

The flat cache structure cannot represent the natural hierarchy of specification documents: Domain → Feature → Behavior.

Requirements

RequirementDescription
Hierarchical RepresentationDomain → Feature → Behavior structure matching BDD concepts
Level-Independent QueriesFetch domains for overview without loading all behaviors
Cascade DeletionAnalysis deletion propagates through entire document tree
Test TraceabilityLink behaviors back to source test_cases for navigation
Cache EfficiencyPrevent redundant AI API calls via content-hash keying
Model VersioningDifferent AI model versions produce different documents

Constraints

ConstraintImpact
PostgreSQL BackendMust use relational schema patterns
Existing Analysis SchemaForeign key to analyses table required
sqlc Code GenerationNo VIEWs, inline JOINs in queries preferred

Decision

Adopt a 4-table normalized hierarchical schema aligned with BDD specification structure.

sql
-- Level 0: Document (per analysis)
CREATE TABLE spec_documents (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  analysis_id UUID NOT NULL REFERENCES analyses(id) ON DELETE CASCADE,
  content_hash BYTEA NOT NULL,
  language VARCHAR(10) NOT NULL DEFAULT 'en',
  executive_summary TEXT,
  model_id VARCHAR(100) NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  CONSTRAINT uq_spec_documents_hash_lang_model UNIQUE (content_hash, language, model_id)
);

-- Level 1: Domain (business classification)
CREATE TABLE spec_domains (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  document_id UUID NOT NULL REFERENCES spec_documents(id) ON DELETE CASCADE,
  name VARCHAR(255) NOT NULL,
  description TEXT,
  sort_order INTEGER NOT NULL DEFAULT 0,
  classification_confidence NUMERIC(3,2),
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Level 2: Feature (functional grouping)
CREATE TABLE spec_features (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  domain_id UUID NOT NULL REFERENCES spec_domains(id) ON DELETE CASCADE,
  name VARCHAR(255) NOT NULL,
  description TEXT,
  sort_order INTEGER NOT NULL DEFAULT 0,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Level 3: Behavior (leaf test specifications)
CREATE TABLE spec_behaviors (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  feature_id UUID NOT NULL REFERENCES spec_features(id) ON DELETE CASCADE,
  source_test_case_id UUID REFERENCES test_cases(id) ON DELETE SET NULL,
  original_name VARCHAR(2000) NOT NULL,
  converted_description TEXT NOT NULL,
  sort_order INTEGER NOT NULL DEFAULT 0,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

Key Design Decisions

DecisionRationale
content_hash + language + model_id unique constraintCache key for deduplication; same content with different language/model produces different documents
classification_confidence at domain level onlyAI assigns domains during Phase 1 classification; features are grouped deterministically
source_test_case_id with SET NULL on deleteMaintains traceability while allowing test_case cleanup without breaking spec documents
sort_order per levelPreserves AI-assigned ordering for consistent UI rendering
No VIEWssqlc generates type-safe Go code; inline JOINs preferred

Table Relationships

spec_documents (document level)

    │ content_hash + language + model_id → unique
    │ analysis_id → FK to analyses (CASCADE delete)

    └──► spec_domains (business domain classification)

            │ document_id → FK to spec_documents (CASCADE delete)
            │ classification_confidence → AI confidence score

            └──► spec_features (feature grouping)

                    │ domain_id → FK to spec_domains (CASCADE delete)

                    └──► spec_behaviors (individual test behaviors)

                            │ feature_id → FK to spec_features (CASCADE delete)
                            │ source_test_case_id → FK to test_cases (SET NULL)

Options Considered

Option A: Hierarchical 4-Table Normalized Structure (Selected)

Four tables with proper foreign key relationships representing Document → Domain → Feature → Behavior hierarchy.

ProsCons
Level-independent queriesMore complex queries requiring JOINs
Proper FK constraints with cascade deletion4 tables increase schema maintenance surface
BDD/Specification concept alignmentINSERT requires 4 sequential operations
Statistics via aggregate JOINsOrdering requires sort_order column per level
Test case traceability via FK

Option B: Single Denormalized Table

All hierarchy levels in one table with nullable parent columns and item_type discriminator.

ProsCons
Simple schema (1 table)Cannot enforce level-specific constraints
Easy writes (single INSERT)No type safety for domain vs feature fields
Cannot query "all domains" efficiently
Recursive CTE required for hierarchy traversal

Rejected: Cannot represent hierarchy properly; no level-specific queries possible without complex filtering.

Option C: JSON Column Storage

Store entire document as JSON blob in single table.

ProsCons
Schema flexibilityCannot query domains/features independently
Single row per documentNo FK constraints to test_cases
Natural for document storageDifficult aggregations (count by domain)
JSON path queries less efficient than relational

Rejected: Eliminates relational query capabilities; no independent domain/feature statistics possible.

Option D: 2-Table Structure (Document + Behaviors)

Only top-level document and leaf behaviors, losing intermediate hierarchy.

ProsCons
Simpler than 4 tablesLoses domain/feature as first-class entities
Direct document-to-behavior relationshipCannot fetch distinct domains efficiently
Domain/feature counts require GROUP BY
No domain-level metadata (confidence, description)

Rejected: Loses domain/feature classification context; full document scan required for statistics.

Consequences

Positive

AreaBenefit
Query FlexibilityFetch domains for overview, expand to features on demand
BDD AlignmentSchema mirrors specification document mental model
Test Traceabilitysource_test_case_id FK enables "view source" navigation
Cascade DeletionDELETE analysis → document → domains → features → behaviors
Cache Efficiency(content_hash, language, model_id) prevents redundant AI calls
StatisticsCOUNT/GROUP BY at each level without materialized views
Type Safetysqlc generates distinct types per table

Negative

AreaTrade-offMitigation
Query ComplexityMulti-table JOINs for full document retrievalsqlc named queries encapsulate JOINs
Write Complexity4 INSERTs per document within transactionSingle transaction, batch INSERTs
Schema Surface4 tables to maintain, migrate, indexClear table responsibilities
Orderingsort_order column at each levelAI pipeline assigns sort_order
Confidence Asymmetryclassification_confidence only at domain levelFeature confidence addable if needed

Indexes

IndexPurpose
idx_spec_documents_analysisFast lookup by analysis_id
idx_spec_domains_document_sortOrdered domain retrieval
idx_spec_features_domain_sortOrdered feature retrieval
idx_spec_behaviors_feature_sortOrdered behavior retrieval
idx_spec_behaviors_sourcePartial index for test case traceability

References

Open-source test coverage insights