
ADR-11: Data Retention-Based Cleanup Service


| Date | Author | Repos |
| --- | --- | --- |
| 2026-02-02 | @KubrickCode | worker, infra |

Context

Data Growth Problem

The Specvital platform accumulates temporal data across multiple tables:

| Data Type | Table | Growth Pattern | Retention Need |
| --- | --- | --- | --- |
| Analysis History | user_analysis_history | Per-commit analysis | Tier-based retention |
| Spec Documents | spec_documents | Per-SpecView generation | Tier-based retention |
| Analyses | analyses | Shared analysis records | Orphan cleanup |
| Usage Events | usage_events | Per-operation (ADR-13) | Audit + cleanup |
| Quota Reservations | quota_reservations | Per-concurrent request | Short-lived (TTL) |

Without active cleanup, storage costs grow unbounded and query performance degrades.

Retention Policy Requirements

Subscription plans define tier-based retention periods:

| Tier | Retention Period | Rationale |
| --- | --- | --- |
| Free | 30 days | Cost control |
| Pro | 90 days | Standard business needs |
| Pro Plus | 180 days | Extended history |
| Enterprise | Unlimited (NULL) | Compliance requirements |

Downgrade Fairness Problem

Challenge: When users downgrade from Pro to Free, should their Pro-tier data (90-day retention) be deleted immediately?

Answer: No. Data created under a higher-tier plan should retain its original retention period. This requires capturing retention policy at record creation time, not at cleanup time.

Architectural Context

The Scheduler service removal (ADR-22) eliminated the centralized cron job runner. Cleanup tasks previously embedded in the Scheduler must now be deployed as Railway Cron jobs.

Decision

Implement data retention cleanup as an independent binary (cmd/retention-cleanup) triggered by Railway Cron at 3:00 AM UTC daily.

Creation-Time Retention Snapshot

Store retention_days_at_creation when records are created:

```sql
ALTER TABLE user_analysis_history ADD COLUMN retention_days_at_creation integer;
ALTER TABLE spec_documents ADD COLUMN retention_days_at_creation integer;

-- Constraint: positive or NULL (applied to both tables)
ALTER TABLE user_analysis_history
  ADD CHECK (retention_days_at_creation IS NULL OR retention_days_at_creation > 0);
ALTER TABLE spec_documents
  ADD CHECK (retention_days_at_creation IS NULL OR retention_days_at_creation > 0);

-- Partial index for cleanup queries
CREATE INDEX idx_uah_retention_cleanup ON user_analysis_history (created_at)
WHERE retention_days_at_creation IS NOT NULL;
```

Value semantics:

| Value | Meaning |
| --- | --- |
| NULL | Unlimited retention (enterprise/legacy) |
| Positive int | Days until eligible for deletion |

Two-Phase Cleanup Strategy

```
┌─────────────────────────────────────────────────────────────────┐
│                    Cleanup Execution Order                       │
├─────────────────────────────────────────────────────────────────┤
│  Phase 1a: Delete expired user_analysis_history                  │
│            WHERE created_at + retention_days < NOW()             │
│                                                                  │
│  Phase 1b: Delete expired spec_documents                         │
│            WHERE created_at + retention_days < NOW()             │
│                                                                  │
│  Phase 2:  Delete orphaned analyses                              │
│            WHERE no remaining user_analysis_history refs         │
│            AND created_at < NOW() - 1 day (grace period)         │
└─────────────────────────────────────────────────────────────────┘
```

Binary Architecture

```
cmd/retention-cleanup/
├── main.go                    # Entry point, run-to-completion
└── (bootstrap via internal/)

src/internal/
├── domain/retention/
│   ├── errors.go              # Domain errors
│   ├── policy.go              # Expiration calculation
│   └── repository.go          # CleanupRepository interface
├── usecase/retention/
│   └── cleanup.go             # CleanupUseCase orchestration
└── adapter/repository/postgres/
    └── retention.go           # PostgreSQL implementation
```

CleanupRepository Interface

```go
type CleanupRepository interface {
    DeleteExpiredUserAnalysisHistory(ctx context.Context, batchSize int) (DeleteResult, error)
    DeleteExpiredSpecDocuments(ctx context.Context, batchSize int) (DeleteResult, error)
    DeleteOrphanedAnalyses(ctx context.Context, batchSize int) (DeleteResult, error)
}

type DeleteResult struct {
    DeletedCount int64
    HasMore      bool
}
```

Railway Configuration

```json
{
  "$schema": "https://railway.com/railway.schema.json",
  "build": {
    "builder": "DOCKERFILE",
    "dockerfilePath": "infra/retention-cleanup/Dockerfile"
  },
  "deploy": {
    "cronSchedule": "0 3 * * *",
    "restartPolicyType": "NEVER"
  }
}
```

Options Considered

Option A: Individual Binary + Railway Cron (Selected)

Standalone Go binary executed daily via Railway Cron.

Pros:

  • Failure isolation from other workloads
  • Independent deployment and configuration
  • No 24/7 running cost (pay per execution)
  • Railway handles scheduling reliability

Cons:

  • Cold start latency (~10-30 seconds)
  • Railway platform dependency
  • Another binary to maintain

Option B: Embedded in Scheduler Service (Previous)

Cleanup logic as cron job within centralized Scheduler.

Pros:

  • Single service to monitor
  • Warm execution (no cold start)

Cons:

  • Scheduler removed per ADR-22
  • 24/7 running cost for daily job

Option C: PostgreSQL pg_cron Extension

Database-level scheduled job.

Pros:

  • No application binary needed
  • Native PostgreSQL solution

Cons:

  • pg_cron not available on Railway Postgres
  • Complex tier-aware logic in SQL
  • No application-level logging

Option D: Current-Plan Lookup at Cleanup Time

Query user's current plan during cleanup instead of storing at creation.

Pros:

  • No schema changes needed
  • Always reflects current tier

Cons:

  • Unfair on downgrade: Pro data deleted immediately when user downgrades to Free
  • User loses data they paid to retain

Consequences

Positive

| Area | Benefit |
| --- | --- |
| Storage Optimization | Prevents unbounded table growth |
| Query Performance | Smaller tables maintain index efficiency |
| Cost Control | Per-execution billing (no 24/7 process) |
| Downgrade Fairness | Existing data retains original retention |
| Compliance Readiness | Automated data lifecycle management |
| Audit Trail | Deletion counts logged per execution |

Negative

| Area | Trade-off |
| --- | --- |
| Scheduling Granularity | Daily minimum (Railway Cron limitation) |
| Cold Start | 10-30 second startup overhead per execution |
| Platform Dependency | Scheduling tied to Railway |
| Schema Addition | New column on high-write tables |

Technical Notes

  • Deletion order: Phase 1 before Phase 2 (foreign key awareness)
  • Batch processing: Configurable batch size with sleep intervals
  • Timeout: Railway execution timeout set to 30 minutes
  • Monitoring: Total records deleted per table logged for trending

Configuration

```
DATABASE_URL=postgres://...
RETENTION_TIMEOUT=30m
RETENTION_BATCH_SIZE=1000
RETENTION_BATCH_SLEEP=100ms
```
