ADR-22: Scheduler Removal and Railway Cron Migration

Date	Author	Repos
2026-02-02	@KubrickCode	worker, infra

Status

Accepted - Supersedes ADR-01: Scheduled Re-collection

Context

Original Architecture Problem

The Scheduler service (ADR-01) was designed to pre-analyze repositories for instant user responses. However, operational data revealed fundamental cost-benefit issues:

Metric	Expected	Actual
Analysis time	30+ seconds	~5 seconds
Pre-compute value	High	Low (5s wait is acceptable)
Data freshness	Maintainable	Impossible for active repos
Storage growth	Controlled	Rapid (unviewed results)
24/7 running cost	Justified	Exceeds actual utility

Why Pre-Compute Failed

Low value proposition: 5-second analysis time is acceptable for users
Freshness impossible: Active repositories have frequent commits, making pre-computed results immediately stale
Database bloat: Unviewed analysis results accumulated rapidly
Cost inefficiency: 24/7 scheduler running cost exceeded actual utility

Scheduler Architecture Overhead

The Scheduler service introduced significant complexity:

Distributed locking: PostgreSQL-based lock for single-instance guarantee
go-cron internal scheduling: In-process cron job management
24/7 running cost: Always-on service with minimal actual work
Failure coupling: Scheduler failure affected all scheduled jobs

Decision

Remove the Scheduler service entirely. Migrate to Railway Cron triggers with individual single-purpose binaries.

New Architecture

┌─────────────────────────────────────────────────────────────┐
│                    Before (Scheduler Service)                │
├─────────────────────────────────────────────────────────────┤
│  cmd/scheduler/    → 24/7 running, internal go-cron         │
│                      ├── Auto-refresh cron job              │
│                      ├── Distributed lock (PostgreSQL)      │
│                      └── Cleanup jobs (embedded)            │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│                    After (Railway Cron)                      │
├─────────────────────────────────────────────────────────────┤
│  cmd/analyzer/         → Queue consumer (River, ON_FAILURE) │
│  cmd/spec-generator/   → Queue consumer (River, ON_FAILURE) │
│  cmd/retention-cleanup/→ Cron binary (Railway, "0 3 * * *") │
│  cmd/enqueue/          → Manual utility                     │
└─────────────────────────────────────────────────────────────┘

Railway Cron Configuration

Each cron job is a separate Railway service with its own configuration:

retention-cleanup/railway.json:

{
  "$schema": "https://railway.com/railway.schema.json",
  "build": {
    "builder": "DOCKERFILE",
    "dockerfilePath": "infra/retention-cleanup/Dockerfile"
  },
  "deploy": {
    "cronSchedule": "0 3 * * *",
    "restartPolicyType": "NEVER"
  }
}

Key configuration points:

cronSchedule: Standard five-field cron expression ("0 3 * * *" runs once a day at 03:00)
restartPolicyType set to NEVER: The binary runs to completion and exits without being restarted
Separate Dockerfile per service for clean build isolation

Options Considered

Option A: Railway Cron + Individual Binaries (Selected)

How It Works:

Each periodic task becomes an independent Go binary
Railway triggers binary execution via cron expression
Binary runs to completion and exits (no 24/7 process)
No distributed locking needed (Railway handles single execution)

Pros:

No 24/7 running cost for cron scheduling
No distributed lock complexity (platform manages this)
Per-job cost visibility in Railway dashboard
Railway handles scheduling reliability, with retry behavior set per service through restartPolicyType
Simpler deployment (just a binary that exits)
Independent scaling and configuration per job

Cons:

Cold start latency for each execution
Railway platform dependency for scheduling
Requires Railway IaC for each cron job

Option B: Keep Scheduler with Reduced Scope

Description:

Maintain Scheduler but remove auto-refresh; keep only cleanup jobs.

Pros:

Minimal code changes
Existing monitoring and alerting preserved
Familiar operational model

Cons:

Still requires 24/7 process for infrequent jobs
Distributed lock complexity remains
Cost not proportional to actual work

Option C: External Cron Service (GitHub Actions, CloudWatch)

Description:

Use external cron triggers that invoke API endpoints or queue jobs.

Pros:

Free tier available (GitHub Actions)
Platform-agnostic approach
No Railway-specific configuration

Cons:

Additional security surface (exposed endpoints)
Cross-service coordination complexity
Rate limiting and retry logic needed
Monitoring fragmentation

Implementation

Removed Components

Files deleted from the worker repository:

src/cmd/scheduler/main.go
src/internal/app/bootstrap/scheduler.go
src/internal/app/container_scheduler.go
src/internal/domain/analysis/autorefresh.go
src/internal/domain/analysis/decay.go
src/internal/handler/scheduler/autorefresh.go
src/internal/infra/scheduler/cron.go
src/internal/infra/scheduler/lock.go
src/internal/usecase/autorefresh/refresh.go

New Binary Structure

cmd/
├── analyzer/           # Queue consumer (River)
├── spec-generator/     # Queue consumer (River)
├── retention-cleanup/  # Cron binary (Railway)
└── enqueue/            # Manual utility

Infrastructure Configuration

infra/
├── analyzer/
│   ├── Dockerfile
│   └── railway.json
├── spec-generator/
│   ├── Dockerfile
│   └── railway.json
└── retention-cleanup/
    ├── Dockerfile
    └── railway.json

Deployment Comparison

Aspect	Before (Scheduler)	After (Railway Cron)
Running cost	24/7 (even when idle)	Per-execution only
Distributed lock	Required (PostgreSQL)	Not needed (Railway manages)
Scaling	Fixed single instance	Per-job independent
Failure isolation	All jobs fail together	Per-job isolation
Configuration	Environment variables + code	Railway IaC per service
Monitoring	Single service metrics	Per-service metrics in Railway

Consequences

Positive

Cost Optimization:

Eliminates 24/7 running cost for infrequent cron jobs
Pay only for actual execution time
Per-job cost visibility enables optimization

Operational Simplicity:

No distributed lock to manage or debug
Each cron job is a simple run-to-completion binary
Railway handles scheduling, retries, and single execution

Deployment Independence:

Each cron job can be deployed independently
Changing a schedule doesn't require a code change
IaC-based configuration (infrastructure as code)

Failure Isolation:

One failing cron job doesn't affect others
Clear per-job logs and metrics
Independent retry policies

Negative

Platform Dependency:

Tied to Railway's cron implementation
Migrating off Railway requires reconfiguring every cron job
Railway-specific IaC format

Cold Start Latency:

Each execution starts a new container
Not suitable for sub-minute intervals
Initial connection setup overhead

Configuration Sprawl:

Multiple railway.json files to maintain
Dockerfile and railway.json must be kept in sync
More files in the infra repository

Superseded ADRs

ADR	Status	Notes
Worker ADR-01	Superseded	Auto-refresh scheduler removed
Worker ADR-05	Partially Superseded	Scheduler no longer exists; binary separation pattern still valid

Related Updates Required

Document	Update Needed
ADR-04	Add note about Railway Cron alternative
ADR-12	Remove Scheduler references

References

Commit c163239: Remove Scheduler service
Commit f3fae45: Separate worker binaries
Commit 6e03a7f: Add retention-cleanup bootstrap
Railway Cron Documentation
