
ADR-12: Phase 2 Behavior Cache

🇰🇷 Korean version

| Date | Author | Repos |
| --- | --- | --- |
| 2026-01-24 | @KubrickCode | worker, infra |

Context

Phase 2 Cost Problem

The AI-Based Spec Document Generation Pipeline (ADR-14) uses a two-phase approach:

| Phase | Model | Cost | Purpose |
| --- | --- | --- | --- |
| Phase 1 | gemini-2.5-flash | $0.30/1M | Test classification by domain |
| Phase 2 | gemini-2.5-flash-lite | $0.10/1M | Test name → behavior conversion |

While Phase 1 results are cached at the document level via content_hash, Phase 2 involves per-test AI calls that are expensive when:

  • Same test file is analyzed across multiple commits
  • Similar test names appear across different repositories
  • Re-analysis triggered by parser version updates or user requests

Caching Opportunity

Test behavior descriptions have high cache reusability:

| Scenario | Example |
| --- | --- |
| Same test, different commits | test_user_login in commit A and B produces identical behavior |
| Cross-repository similarity | testAuthentication behavior is language/framework agnostic |
| Re-analysis | Parser upgrade doesn't change behavior semantics |

Cache Key Design Challenge

Problem: What uniquely identifies a test behavior?

| Approach | Pros | Cons |
| --- | --- | --- |
| Test name only | Maximum reuse | Ignores context (same name, different test) |
| Test name + file path | Context-aware | Path changes invalidate cache |
| Content hash of test | Exact match | No reuse across similar tests |
| Semantic fingerprint | Captures intent | Complex, requires additional AI call |

Decision

Implement a PostgreSQL-backed behavior cache using composite key (test_name_hash, language, model_id) with TTL-based expiration.

Table Schema

```sql
CREATE TABLE behavior_caches (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    test_name_hash TEXT NOT NULL,         -- SHA-256 of normalized test name
    language VARCHAR(10) NOT NULL,        -- en, ko, etc.
    model_id VARCHAR(100) NOT NULL,       -- gemini-2.5-flash-lite
    behavior_description TEXT NOT NULL,   -- Cached AI output
    confidence DECIMAL(3,2) NOT NULL,     -- 0.00-1.00
    created_at TIMESTAMPTZ DEFAULT NOW(),
    expires_at TIMESTAMPTZ NOT NULL,      -- TTL expiration
    hit_count INTEGER DEFAULT 0,          -- Usage tracking

    CONSTRAINT behavior_caches_unique
        UNIQUE (test_name_hash, language, model_id)
);

-- Lookups on (test_name_hash, language, model_id) are served by the
-- index backing behavior_caches_unique. A partial index with
-- WHERE expires_at > NOW() is not possible (index predicates must be
-- immutable), so expiry is filtered in the lookup query instead.

-- Index for cleanup
CREATE INDEX idx_behavior_caches_expiry
    ON behavior_caches (expires_at);
```

Cache Key Strategy

test_name_hash = SHA256(normalize(test_name))

normalize(test_name):
  1. Lowercase conversion
  2. Remove common prefixes (test_, it_, describe_, should_)
  3. Strip special characters and numbers
  4. Collapse whitespace

Examples:

| Original Test Name | Normalized | Hash (truncated) |
| --- | --- | --- |
| test_user_can_login | user can login | a3f2... |
| TestUserCanLogin | user can login | a3f2... |
| it('should allow user to login') | allow user login | b7c1... |
| describe('User Login') | user login | c9e4... |
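
A minimal Go sketch of the normalization and hashing above. Function names are illustrative, and the example outputs in the table imply additional stop-word handling (e.g. dropping "to") that this sketch omits:

```go
package cache

import (
	"crypto/sha256"
	"encoding/hex"
	"regexp"
	"strings"
)

var (
	camelRe   = regexp.MustCompile(`([a-z0-9])([A-Z])`) // camelCase word boundaries
	nonWordRe = regexp.MustCompile(`[^a-z]+`)           // special chars, digits, underscores, whitespace
	prefixRe  = regexp.MustCompile(`^(test|it|describe|should) `)
)

// NormalizeTestName applies the steps above: lowercase, drop common test
// prefixes, strip special characters and numbers, collapse whitespace.
func NormalizeTestName(name string) string {
	s := camelRe.ReplaceAllString(name, "$1 $2") // TestUserCanLogin -> Test User Can Login
	s = strings.ToLower(s)
	s = nonWordRe.ReplaceAllString(s, " ") // every run of non-letters becomes one space
	s = strings.TrimSpace(s)
	return prefixRe.ReplaceAllString(s, "")
}

// TestNameHash is the SHA-256 hex digest of the normalized name, stored
// as test_name_hash in behavior_caches.
func TestNameHash(name string) string {
	sum := sha256.Sum256([]byte(NormalizeTestName(name)))
	return hex.EncodeToString(sum[:])
}
```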

Cache Lookup Flow

┌─────────────────────────────────────────────────────────────────┐
│                    Phase 2 Processing                            │
├─────────────────────────────────────────────────────────────────┤
│  For each test in feature:                                       │
│                                                                  │
│  1. Compute test_name_hash                                       │
│                                                                  │
│  2. Cache lookup:                                                │
│     SELECT behavior_description, confidence                      │
│     FROM behavior_caches                                         │
│     WHERE test_name_hash = ? AND language = ? AND model_id = ?   │
│       AND expires_at > NOW()                                     │
│                                                                  │
│  3. If HIT:                                                      │
│     - Increment hit_count                                        │
│     - Return cached behavior (skip AI call)                      │
│     - No quota consumption                                       │
│                                                                  │
│  4. If MISS:                                                     │
│     - Call Gemini API                                            │
│     - Store result with TTL                                      │
│     - Consume quota                                              │
└─────────────────────────────────────────────────────────────────┘
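
A hedged Go sketch of this flow against the schema above. Phase2Result is the struct shown under Integration with Quota System below; the callAI callback stands in for the Gemini client and is an assumption, not the worker's actual API:

```go
package cache

import (
	"context"
	"database/sql"
	"time"
)

// Phase2Result mirrors the struct shown in the quota integration section.
type Phase2Result struct {
	Behavior   string
	Confidence float64
	FromCache  bool
}

// GetOrGenerateBehavior looks up the composite cache key and only calls
// the model on a miss.
func GetOrGenerateBehavior(
	ctx context.Context,
	db *sql.DB,
	hash, language, modelID string,
	ttl time.Duration,
	callAI func(context.Context) (behavior string, confidence float64, err error),
) (Phase2Result, error) {
	var res Phase2Result

	// Steps 1-2: lookup on (test_name_hash, language, model_id), skipping expired rows.
	err := db.QueryRowContext(ctx, `
		SELECT behavior_description, confidence
		FROM behavior_caches
		WHERE test_name_hash = $1 AND language = $2 AND model_id = $3
		  AND expires_at > NOW()`,
		hash, language, modelID,
	).Scan(&res.Behavior, &res.Confidence)

	switch {
	case err == nil:
		// Step 3: HIT - track usage, skip the AI call, consume no quota.
		res.FromCache = true
		_, _ = db.ExecContext(ctx, `
			UPDATE behavior_caches SET hit_count = hit_count + 1
			WHERE test_name_hash = $1 AND language = $2 AND model_id = $3`,
			hash, language, modelID)
		return res, nil
	case err != sql.ErrNoRows:
		return res, err
	}

	// Step 4: MISS - call the model, store with TTL, let the caller record quota.
	behavior, confidence, err := callAI(ctx)
	if err != nil {
		return res, err
	}
	_, err = db.ExecContext(ctx, `
		INSERT INTO behavior_caches
			(test_name_hash, language, model_id, behavior_description, confidence, expires_at)
		VALUES ($1, $2, $3, $4, $5, $6)
		ON CONFLICT (test_name_hash, language, model_id) DO NOTHING`,
		hash, language, modelID, behavior, confidence, time.Now().Add(ttl))
	if err != nil {
		return res, err
	}
	res.Behavior, res.Confidence = behavior, confidence
	return res, nil
}
```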

TTL Configuration

| Tier | TTL | Rationale |
| --- | --- | --- |
| Free | 7 days | Limited storage, high churn |
| Pro | 30 days | Standard business retention |
| Pro Plus | 90 days | Extended caching value |
| Enterprise | 180 days | Maximum cost optimization |
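
One possible representation in the worker, with tier identifiers as illustrative assumptions and values mirroring the table above:

```go
import "time"

// behaviorCacheTTL maps subscription tier to cache row lifetime.
var behaviorCacheTTL = map[string]time.Duration{
	"free":       7 * 24 * time.Hour,
	"pro":        30 * 24 * time.Hour,
	"pro_plus":   90 * 24 * time.Hour,
	"enterprise": 180 * 24 * time.Hour,
}
```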

Integration with Quota System

Per ADR-13, cache hits do not consume quota:

```go
type Phase2Result struct {
    Behavior   string
    Confidence float64
    FromCache  bool  // If true, quota not consumed
}

// In usage tracking
if !result.FromCache {
    quotaService.RecordUsage(ctx, userID, QuotaTypeSpecView)
}
```

Options Considered

Option A: PostgreSQL Table (Selected)

Database-backed cache with TTL and hit tracking.

Pros:

  • Integrated with existing PostgreSQL infrastructure
  • Queryable for analytics (hit rates, popular tests)
  • Automatic cleanup via scheduled job
  • Transactional consistency with spec document writes

Cons:

  • Database load for high-volume lookups
  • Storage costs for large caches

Option B: Redis Cache

In-memory cache with automatic expiration.

Pros:

  • Sub-millisecond lookups
  • Native TTL support
  • Reduced database load

Cons:

  • Additional infrastructure (not currently in stack)
  • Cache loss on Redis restart
  • Memory cost scaling with cache size

Option C: Document-Level Caching Only

Rely on existing spec_documents.content_hash caching.

Pros:

  • No new infrastructure
  • Already implemented

Cons:

  • No reuse for similar tests across repositories
  • Full Phase 2 re-run on any test change
  • Misses cross-document optimization opportunity

Option D: No Additional Caching

Accept Phase 2 costs as operational expense.

Pros:

  • Simplest implementation
  • No cache invalidation complexity

Cons:

  • Higher API costs at scale
  • Slower response times for repeated tests
  • Poor cost efficiency for high-volume users

Consequences

Positive

| Area | Benefit |
| --- | --- |
| Cost Reduction | 40-60% Phase 2 API cost savings (estimated) |
| Response Time | Cache hits avoid 1-2s AI latency per test |
| Quota Fairness | Cache hits don't consume user quota |
| Analytics | Hit rate metrics enable cache tuning |
| Cross-Repo | Similar tests across repos share cached behavior |

Negative

| Area | Trade-off |
| --- | --- |
| Storage | Cache table grows with unique test diversity |
| Stale Data Risk | Cached behavior may not reflect model updates |
| Normalization Errors | Aggressive normalization may cause false matches |
| Cleanup Overhead | Scheduled job required for TTL enforcement |

Technical Notes

  • Cache warming: Not implemented; cache builds organically
  • Invalidation: Model version change invalidates via model_id in key
  • Conflict resolution: First writer wins; concurrent writes are rare
  • Monitoring: Log cache hit/miss ratio per analysis
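
A minimal sketch of the per-analysis hit/miss tracking mentioned in the monitoring note above; the type and method names are illustrative, not an existing worker API:

```go
// CacheStats accumulates lookups for a single analysis run.
type CacheStats struct {
	Hits, Misses int
}

// Record tallies one lookup based on Phase2Result.FromCache.
func (s *CacheStats) Record(fromCache bool) {
	if fromCache {
		s.Hits++
	} else {
		s.Misses++
	}
}

// HitRate is logged once per analysis to guide TTL and normalization tuning.
func (s *CacheStats) HitRate() float64 {
	total := s.Hits + s.Misses
	if total == 0 {
		return 0
	}
	return float64(s.Hits) / float64(total)
}
```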

Configuration

```
BEHAVIOR_CACHE_ENABLED=true
BEHAVIOR_CACHE_DEFAULT_TTL=30d
BEHAVIOR_CACHE_CLEANUP_SCHEDULE=0 4 * * *  # 4 AM UTC daily
BEHAVIOR_CACHE_CLEANUP_BATCH_SIZE=5000
```
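
A sketch of the batched cleanup that these settings imply, assuming a standard database/sql connection; the scheduler that runs it daily at 04:00 UTC is outside the snippet:

```go
package cache

import (
	"context"
	"database/sql"
)

// CleanupExpired deletes expired rows in batches of batchSize
// (BEHAVIOR_CACHE_CLEANUP_BATCH_SIZE) so the daily job avoids long locks,
// looping until a batch comes back short. The inner SELECT can use
// idx_behavior_caches_expiry.
func CleanupExpired(ctx context.Context, db *sql.DB, batchSize int) (int64, error) {
	var total int64
	for {
		res, err := db.ExecContext(ctx, `
			DELETE FROM behavior_caches
			WHERE id IN (
				SELECT id FROM behavior_caches
				WHERE expires_at <= NOW()
				LIMIT $1
			)`, batchSize)
		if err != nil {
			return total, err
		}
		n, err := res.RowsAffected()
		if err != nil {
			return total, err
		}
		total += n
		if n < int64(batchSize) {
			return total, nil
		}
	}
}
```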
