
ADR-09: Phase 1 V1 Quality Enhancement Architecture


| Date | Author | Repos |
| --- | --- | --- |
| 2026-02-04 | @KubrickCode | worker |

Context

Problem Statement

The AI-based SpecView generation pipeline (ADR-14) uses Gemini for test classification, but raw LLM output exhibits quality issues that degrade user experience:

| Quality Issue | Description | User Impact |
| --- | --- | --- |
| Domain Naming Variance | Same concept appears as "Auth", "Authentication", "AuthService" | Fragmented domain groupings |
| Orphaned Tests | Tests returned without domain assignment | Missing specifications |
| Abbreviation Inconsistency | "db", "DB", "Database" treated as different domains | Duplicate domains in output |
| Structural Errors | Occasional malformed JSON responses | Pipeline failures |
| Quality Blind Spots | No metrics on classification quality | Unable to measure improvements |

Experimental Alternatives

Two alternative architectures were explored before returning to enhanced V1:

| Architecture | Approach | Outcome |
| --- | --- | --- |
| V2 Two-Stage | Separate domain extraction followed by classification | Abandoned - complexity overhead without proportional quality gain |
| V3 Sequential Batch | Process batches sequentially with anchor propagation | Abandoned - latency increase outweighed consistency benefits |

Both were first disabled via feature flags (SPECVIEW_PHASE1_V2=false, SPECVIEW_PHASE1_V3=false), and their code has since been removed.

Requirements

| Requirement | Description |
| --- | --- |
| Backward Compatible | Must layer on existing ADR-14 pipeline |
| Low Latency Impact | Post-processing overhead < 100ms per chunk |
| Observable | Provide metrics for quality monitoring |
| Deterministic | Same input produces consistent domain assignment |

Decision

Enhance the V1 single-pass classification architecture with a post-processing pipeline that validates, normalizes, and recovers classification results.

Post-Processing Pipeline

               LLM Classification Output
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│                 Phase1PostProcessor                 │
├─────────────────────────────────────────────────────┤
│  1. JSON Validation        → Reject malformed       │
│  2. Domain Normalization   → Merge similar names    │
│  3. Abbreviation Expansion → auth → Authentication  │
│  4. Orphaned Detection     → Flag unclassified      │
│  5. Path-based Fallback    → Derive from file path  │
└─────────────────────────────────────────────────────┘
                           │
                           ▼
           Normalized Classification Result
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│              Quality Metrics Collector              │
├─────────────────────────────────────────────────────┤
│  - Orphaned test count                              │
│  - Normalization frequency                          │
│  - Fallback usage rate                              │
│  - Domain distribution                              │
└─────────────────────────────────────────────────────┘
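The four metrics in the collector box can be derived from a single pass over a processed batch. The following is a minimal sketch, not the actual implementation: the `ClassifiedTest` and `QualityMetrics` types, their fields, and the definition of fallback rate as fallback-assigned tests divided by all tests in the batch are assumptions made for illustration.

```go
package main

// ClassifiedTest is a hypothetical record for one test after post-processing.
type ClassifiedTest struct {
	Name       string
	Domain     string // empty means the test is still orphaned
	Normalized bool   // the domain name was rewritten by normalization
	Fallback   bool   // the domain was derived from the file path
}

// QualityMetrics mirrors the four metrics listed in the diagram.
type QualityMetrics struct {
	OrphanedCount      int
	NormalizationCount int
	FallbackRate       float64        // assumed: fallback-assigned tests / all tests
	DomainDistribution map[string]int // number of tests per domain
}

// CollectMetrics computes the metrics for one batch of tests.
func CollectMetrics(tests []ClassifiedTest) QualityMetrics {
	m := QualityMetrics{DomainDistribution: make(map[string]int)}
	fallbacks := 0
	for _, t := range tests {
		if t.Domain == "" {
			m.OrphanedCount++
			continue // orphaned tests do not contribute to the distribution
		}
		m.DomainDistribution[t.Domain]++
		if t.Normalized {
			m.NormalizationCount++
		}
		if t.Fallback {
			fallbacks++
		}
	}
	if len(tests) > 0 { // avoid dividing by zero on an empty batch
		m.FallbackRate = float64(fallbacks) / float64(len(tests))
	}
	return m
}
```

Keeping the collector a pure function over the batch makes it cheap to run after each batch and easy to cover with table-driven tests.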

Component Responsibilities

| Component | Responsibility | Trigger |
| --- | --- | --- |
| Phase1PostProcessor | Validate and normalize LLM classification output | Every classification response |
| Domain Normalization | Merge semantically equivalent domain names | On domain name extraction |
| Domain Abbreviation Expand | Expand common abbreviations to full names | On domain name extraction |
| Path-based Fallback | Derive domain from test file path for orphaned tests | When test has no domain |
| Orphaned Test Detection | Identify and flag tests that failed classification | After normalization |
| Quality Metrics Collector | Track classification quality metrics | After each batch |
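To illustrate the validation and orphan-detection responsibilities, the sketch below rejects malformed JSON and separates tests that came back without a domain. The response shape (a JSON array of objects with `test` and `domain` fields) and the names `Classification` and `ParseClassifications` are assumptions; the real Gemini response schema is not described in this ADR.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Classification is an assumed shape for one entry of the LLM response.
type Classification struct {
	Test   string `json:"test"`
	Domain string `json:"domain"`
}

// ParseClassifications rejects malformed JSON and splits valid entries into
// classified tests and orphaned tests (those with a blank or missing domain).
func ParseClassifications(raw []byte) ([]Classification, []Classification, error) {
	var entries []Classification
	if err := json.Unmarshal(raw, &entries); err != nil {
		return nil, nil, fmt.Errorf("malformed classification response: %w", err)
	}
	var classified, orphaned []Classification
	for _, e := range entries {
		if strings.TrimSpace(e.Domain) == "" {
			orphaned = append(orphaned, e) // flagged for path-based fallback
		} else {
			classified = append(classified, e)
		}
	}
	return classified, orphaned, nil
}
```

Returning the orphaned entries separately, rather than dropping them, lets a later step attempt recovery from the file path.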

Normalization Rules

Domain Name Merging:

| Variant | Normalized To |
| --- | --- |
| Auth, AuthService, AuthModule | Authentication |
| User, UserService, UserMgmt | UserManagement |
| Pay, Payments, PaymentService | Payment |
| DB, Database, DataAccess | Database |
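One way to apply the merge table is to keep the rules as data (canonical name mapped to its variants) and invert them into a lookup index. This is a sketch under assumptions: the names `mergeRules`, `buildIndex`, and `MergeDomain` are hypothetical, and case-insensitive matching is an assumption motivated by the "db" / "DB" issue described in the Problem Statement.

```go
package main

import "strings"

// mergeRules lists each canonical domain with the variants from the table.
var mergeRules = map[string][]string{
	"Authentication": {"Auth", "AuthService", "AuthModule"},
	"UserManagement": {"User", "UserService", "UserMgmt"},
	"Payment":        {"Pay", "Payments", "PaymentService"},
	"Database":       {"DB", "Database", "DataAccess"},
}

// buildIndex inverts the rules into a lower-cased variant -> canonical lookup.
func buildIndex(rules map[string][]string) map[string]string {
	index := make(map[string]string)
	for canonical, variants := range rules {
		index[strings.ToLower(canonical)] = canonical
		for _, v := range variants {
			index[strings.ToLower(v)] = canonical
		}
	}
	return index
}

var mergeIndex = buildIndex(mergeRules)

// MergeDomain returns the canonical domain for name and reports whether the
// name was changed. Unknown names are returned trimmed but otherwise untouched.
func MergeDomain(name string) (string, bool) {
	trimmed := strings.TrimSpace(name)
	if canonical, ok := mergeIndex[strings.ToLower(trimmed)]; ok {
		return canonical, canonical != trimmed
	}
	return trimmed, false
}
```

Because the lookup is a plain map with no overlapping variants, the same input always yields the same domain. The boolean result can feed a normalization-frequency metric.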

Abbreviation Expansion:

| Abbreviation | Expanded |
| --- | --- |
| auth | Authentication |
| db | Database |
| ui | UserInterface |
| api | API |
| http | HTTP |
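The expansion table can be sketched as a whole-name lookup. The function name `ExpandAbbreviation` and the choice to match case-insensitively on the entire domain name (rather than on tokens inside a longer name) are assumptions for illustration.

```go
package main

import "strings"

// abbreviations holds the expansion table, keyed by lower-cased abbreviation.
var abbreviations = map[string]string{
	"auth": "Authentication",
	"db":   "Database",
	"ui":   "UserInterface",
	"api":  "API",
	"http": "HTTP",
}

// ExpandAbbreviation returns the expanded form when the whole name is a known
// abbreviation, and the name unchanged otherwise.
func ExpandAbbreviation(name string) string {
	if full, ok := abbreviations[strings.ToLower(name)]; ok {
		return full
	}
	return name
}
```

With this table the expansion is idempotent: an already expanded name such as "API" or "Database" maps to itself, so running the step twice does not change the result.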

Path-based Fallback Strategy:

test/services/payment/checkout_test.go
        │
        ▼
Extract: "payment" from path
        │
        ▼
Expand:  "Payment" domain
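The fallback strategy above can be sketched as follows. The ADR only shows a single example path, so the rule used here (take the innermost directory that is not a generic container such as `test` or `services`, then capitalize it) and the `genericDirs` list are assumptions, as is the name `DomainFromPath`.

```go
package main

import (
	"path"
	"strings"
	"unicode"
)

// genericDirs are directory names assumed to carry no domain meaning.
var genericDirs = map[string]bool{
	"": true, ".": true, "..": true,
	"test": true, "tests": true, "src": true,
	"services": true, "internal": true, "pkg": true,
}

// DomainFromPath walks the directories of a test file from innermost to
// outermost and returns the first non-generic one with its first letter
// upper-cased. It returns "" when no usable directory is found.
func DomainFromPath(filePath string) string {
	normalized := strings.ReplaceAll(filePath, "\\", "/")
	dir := path.Dir(path.Clean(normalized))
	parts := strings.Split(dir, "/")
	for i := len(parts) - 1; i >= 0; i-- {
		if genericDirs[strings.ToLower(parts[i])] {
			continue
		}
		runes := []rune(parts[i])
		runes[0] = unicode.ToUpper(runes[0])
		return string(runes)
	}
	return ""
}
```

Returning an empty string when nothing usable is found keeps the test flagged as orphaned instead of assigning it a misleading domain, which matters because a file path may not reflect the business domain.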

Options Considered

Option A: V1 with Post-Processing Enhancement (Selected)

Add validation, normalization, and fallback layers after LLM response while maintaining the single-pass classification architecture from ADR-14.

| Aspect | Assessment |
| --- | --- |
| Architecture Impact | Additive - no changes to core LLM pipeline |
| Latency | +10-50ms per chunk (post-processing) |
| Quality Improvement | Moderate - addresses naming consistency |
| Implementation | 6 focused components with clear responsibilities |
| Rollback | Feature-flaggable per component |

Selection Rationale:

  • V2/V3 experiments demonstrated that additional LLM calls did not proportionally improve quality
  • Post-processing addresses observed quality issues without API cost increase
  • Maintains proven reliability characteristics of V1 pipeline
  • Enables incremental improvement without architectural risk

Option B: V2 Two-Stage Taxonomy Architecture (Abandoned)

Separate domain extraction into a dedicated LLM call before classification.

| Aspect | Assessment |
| --- | --- |
| API Calls | 2x (domain extraction + classification) |
| Complexity | High - two-phase coordination |
| Quality Gain | Minimal in testing |
| Cost Impact | 2x Gemini API costs for Phase 1 |
| Status | Abandoned - SPECVIEW_PHASE1_V2=false |

Rejection Rationale:

  • Empirical testing showed domain consistency did not improve proportionally
  • Doubled API costs without user-visible quality improvement
  • Added failure modes and debugging complexity

Option C: V3 Sequential Batch Architecture (Abandoned)

Process test batches sequentially with explicit context carryover between batches.

| Aspect | Assessment |
| --- | --- |
| Latency | Increased - sequential processing eliminates parallelism |
| Complexity | High - batch ordering and state management |
| Quality Gain | Marginal consistency improvement |
| Status | Abandoned - SPECVIEW_PHASE1_V3=false |

Rejection Rationale:

  • Sequential processing significantly increased total processing time
  • Anchor propagation in V1 already provides cross-chunk consistency
  • Complexity not justified by quality improvement

Consequences

Positive

| Area | Benefit |
| --- | --- |
| Domain Consistency | Normalization eliminates duplicate domains from naming variance |
| Test Coverage | Path-based fallback recovers orphaned tests |
| Observability | Metrics collector enables quality monitoring and regression detection |
| Latency | No additional API calls - post-processing is local computation |
| Maintainability | Each enhancement is isolated and independently testable |
| Architecture Simplicity | Avoided V2/V3 complexity by improving existing pipeline |

Negative

| Area | Trade-off |
| --- | --- |
| Quality Ceiling | Single-pass LLM classification limits maximum achievable quality |
| Normalization Rules | Manual curation required for domain synonym mappings |
| Path Fallback Accuracy | File path may not accurately reflect business domain |
| Metric Overhead | Quality metrics add minor processing and storage overhead |

Technical Implications

| Aspect | Implication |
| --- | --- |
| Pipeline Position | PostProcessor runs synchronously after LLM response parsing |
| Error Handling | Normalization failures log warning but do not fail pipeline |
| Metrics Storage | Quality metrics logged via structured logging (no database) |
| Configuration | Normalization rules configurable without code change |
| Testing | Golden snapshot tests for normalization behavior |
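The error-handling and metrics-storage rows can be illustrated together: a failing step logs a warning and is skipped, and the outcome is emitted as a structured log record rather than written to a database. This is a sketch only. The `Step` type, `ApplySteps`, `NewJSONLogger`, `ErrNoRule`, and the log field names are hypothetical, and the standard `log/slog` package (Go 1.21 or later) is an assumed choice; the ADR does not name the logging library in use.

```go
package main

import (
	"errors"
	"log/slog"
	"os"
)

// ErrNoRule is a hypothetical failure a normalization step might report.
var ErrNoRule = errors.New("no normalization rule loaded")

// Step is a hypothetical post-processing stage that rewrites a domain name.
type Step struct {
	Name string
	Run  func(domain string) (string, error)
}

// NewJSONLogger returns a structured logger that writes JSON lines to stderr.
func NewJSONLogger() *slog.Logger {
	return slog.New(slog.NewJSONHandler(os.Stderr, nil))
}

// ApplySteps runs the steps in order. A failing step logs a warning and leaves
// the domain as it was, so one bad rule never fails the whole pipeline.
// It returns the final domain and the number of failed steps.
func ApplySteps(logger *slog.Logger, domain string, steps []Step) (string, int) {
	failures := 0
	for _, s := range steps {
		next, err := s.Run(domain)
		if err != nil {
			failures++
			logger.Warn("post-processing step failed",
				"step", s.Name, "domain", domain, "error", err.Error())
			continue
		}
		domain = next
	}
	// Emit the outcome as a structured record instead of persisting it.
	logger.Info("phase1 post-processing finished",
		"domain", domain, "failed_steps", failures)
	return domain, failures
}
```

Because each step is a plain function from input to output, a golden snapshot test can feed a fixed list of domain names through the steps and compare the results against a stored file.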
