ADR-12: Worker-Centric Analysis Lifecycle
| Date | Author | Repos |
|---|---|---|
| 2024-12-16 | @KubrickCode | web, worker |
Context
Existing Architecture
ADR-03 established the separation of the API (Web) and Worker services. ADR-04 introduced queue-based async processing. The initial implementation caused dual ownership issues:
Previous Flow:
User Request → Web creates "pending" record → Enqueue → Worker processes → Record update
Problem: Two services manipulating the same database record
Dual Ownership Issues
| Issue | Impact |
|---|---|
| Duplicate Records | Web creates pending record, Worker may create another |
| State Inconsistency | Web's DB state may not match actual queue state |
| Complex Error Recovery | Failure requires coordination between two services |
| Race Conditions | Retry requests may create multiple pending records |
| Unclear Responsibility | Ambiguous authoritative source for record state |
Root Cause
Core problem: No Single Source of Truth for analysis record lifecycle. Both Web and Worker have write access to the same record, causing synchronization complexity.
Decision
Adopt Worker-centric analysis lifecycle where Worker exclusively owns record creation, processing, and completion.
New Architecture
User Request → Web (enqueue only) → Queue → Worker (create → process → complete)
- Web: UUID generation, check status from queue
- Worker: Single ownership of record lifecycle, create record on job start, update on completion/failure
Core Principles:
- Web: Enqueue Only - Generate analysis UUID, enqueue job, no writes to analysis table
- Worker: Full Ownership - Create record on job start, update on completion or failure
- Queue as State Source - Web checks in-progress analysis from queue, not DB
- Single Writer - Only Worker writes to analysis records
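The principles above can be sketched as a minimal in-memory model. This is an illustration only, not the actual Web or Worker code: `AnalysisStore`, `web_enqueue`, `worker_run_next`, and the "running"/"completed"/"failed" status names are hypothetical, and a `deque` stands in for the real queue.

```python
import uuid
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AnalysisStore:
    """Hypothetical in-memory stand-in for the analysis table."""
    records: dict = field(default_factory=dict)


def web_enqueue(queue: deque, owner: str, repo: str, commit_sha: str, user_id: str) -> str:
    """Web side: generate the analysis UUID and enqueue. No analysis-table writes."""
    analysis_id = str(uuid.uuid4())
    queue.append({
        "owner": owner,
        "repo": repo,
        "commit_sha": commit_sha,
        "user_id": user_id,
        "analysis_id": analysis_id,
    })
    return analysis_id


def worker_run_next(queue: deque, store: AnalysisStore, analyze) -> None:
    """Worker side: the only writer. Create on job start, update on completion or failure."""
    job = queue.popleft()
    analysis_id = job["analysis_id"]
    # Record is created only when the Worker starts processing (status name assumed).
    store.records[analysis_id] = {"status": "running", "result": None}
    try:
        result = analyze(job)
    except Exception as exc:
        store.records[analysis_id] = {"status": "failed", "result": str(exc)}
        return
    store.records[analysis_id] = {"status": "completed", "result": result}
```

Note that `web_enqueue` never receives the store, so the single-writer rule is enforced by the function signatures rather than by convention.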
Options Considered
Option A: Worker-Centric Ownership (Selected)
Web enqueues the analysis request with a generated UUID. Worker creates the record when processing starts and updates it on completion or failure.
Pros:
- Single source of truth: One service owns entire lifecycle
- No duplicate records: Only Worker creates analysis entries
- Clear service boundaries: Web handles HTTP, Worker handles analysis
- Simple error handling: All failure states managed in one place
- Better transaction consistency: Create and update in same service context
- Independent scaling: Worker scales without affecting Web
Cons:
- Web depends on queue for status queries
- Queue system becomes critical infrastructure
- Slightly more complex status checking logic
Option B: Web-Centric Ownership
Web creates and manages all records. Worker only updates existing records.
Pros:
- Immediate DB record for tracking
- Simple status queries (always from DB)
Cons:
- Worker must handle "record not found" cases
- Timing issues if queue processes before DB commit
- More complex retry logic (must check record existence)
- Web becomes bottleneck for record creation
Option C: Dual Ownership (Existing)
Both services can create and modify records with coordination logic.
Pros:
- Implementation flexibility
Cons:
- Duplicate record risk
- Complex coordination required
- Similar complexity to distributed transactions
- Ambiguous source of truth
Implementation Details
Queue Payload
{
"owner": "github-org",
"repo": "repo-name",
"commit_sha": "abc123",
"user_id": "uuid",
"analysis_id": "uuid"
}
analysis_id: Pre-generated by Web
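As a rough sketch, the payload could be modeled as a typed structure that round-trips through JSON. The field names come from the payload above; the class name `AnalysisJobPayload` and the helper `new_payload` are hypothetical, and the real services may serialize differently.

```python
import json
import uuid
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AnalysisJobPayload:
    """Queue payload; field names mirror the JSON keys shown above."""
    owner: str
    repo: str
    commit_sha: str
    user_id: str
    analysis_id: str

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "AnalysisJobPayload":
        return cls(**json.loads(raw))


def new_payload(owner: str, repo: str, commit_sha: str, user_id: str) -> AnalysisJobPayload:
    """Web side: pre-generate analysis_id before enqueueing (hypothetical helper)."""
    return AnalysisJobPayload(owner, repo, commit_sha, user_id, analysis_id=str(uuid.uuid4()))
```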
Status Query Flow (Web)
- Check if active job matching owner/repo exists in queue
- Active job found → Return "pending" status
- No active job → Query DB for completed/failed analysis
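The flow above might look like the following sketch. `InMemoryQueue` and `InMemoryAnalysisReader` are hypothetical stand-ins for the queue lookup and the read-only DB query; they do not reflect River's API or the real schema.

```python
from typing import Dict, Optional, Set, Tuple

RepoKey = Tuple[str, str]  # (owner, repo)


class InMemoryQueue:
    """Hypothetical stand-in for the queue's active-job lookup."""

    def __init__(self) -> None:
        self.active: Set[RepoKey] = set()

    def has_active_job(self, owner: str, repo: str) -> bool:
        return (owner, repo) in self.active


class InMemoryAnalysisReader:
    """Hypothetical read-only view of the analysis table (Web never writes to it)."""

    def __init__(self) -> None:
        self.latest: Dict[RepoKey, str] = {}

    def latest_status(self, owner: str, repo: str) -> Optional[str]:
        return self.latest.get((owner, repo))


def analysis_status(queue: InMemoryQueue, reader: InMemoryAnalysisReader,
                    owner: str, repo: str) -> Optional[str]:
    # Step 1: the queue is the source of truth for in-progress work.
    if queue.has_active_job(owner, repo):
        return "pending"
    # Step 2: otherwise fall back to the DB for a completed or failed analysis.
    return reader.latest_status(owner, repo)
```

Checking the queue first means an older completed record does not mask a newer in-flight request for the same repository.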
Deduplication
CommitSHA-based uniqueness prevents duplicate analysis:
- Multiple requests for same repo+commit → Single queue job
- River's unique constraint on (owner, repo, commit_sha)
- Prevents unnecessary computation on concurrent requests
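The effect of the uniqueness rule can be illustrated with a small in-memory model. This does not use River's API; `DedupQueue` is a hypothetical class that only mimics the behavior of a unique constraint on (owner, repo, commit_sha). Releasing the key on completion is an assumption, not something this ADR specifies.

```python
from typing import Dict, Tuple

UniqueKey = Tuple[str, str, str]  # (owner, repo, commit_sha)


class DedupQueue:
    """Hypothetical queue that keeps at most one active job per unique key."""

    def __init__(self) -> None:
        self._active: Dict[UniqueKey, dict] = {}

    @staticmethod
    def _key(payload: dict) -> UniqueKey:
        return (payload["owner"], payload["repo"], payload["commit_sha"])

    def enqueue(self, payload: dict) -> Tuple[dict, bool]:
        """Return (job, created). A duplicate request returns the existing job."""
        key = self._key(payload)
        existing = self._active.get(key)
        if existing is not None:
            return existing, False
        self._active[key] = payload
        return payload, True

    def complete(self, payload: dict) -> None:
        """Assumption: finishing a job releases its key for later requests."""
        self._active.pop(self._key(payload), None)
```

A caller that receives `created == False` can reuse the existing job's `analysis_id` instead of the one it generated.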
Consequences
Positive
Data Integrity:
- No duplicate analysis records
- Consistent lifecycle state machine
- Cross-service write races eliminated through single-writer ownership
Operational Simplicity:
- Single service for debugging analysis issues (Worker)
- Clearer logs and traces
- Simpler monitoring (only one writer to track)
Scalability:
- Worker scales independently based on queue depth
- Web stays lightweight (no heavy DB writes for analysis)
- Queue naturally buffers load
Future Compatibility:
- Aligns with scheduled re-analysis (Worker ADR-01)
- Same lifecycle whether user-initiated or scheduled
- Consistent ownership model
Negative
Queue Dependency:
- Web must query queue for status
- Queue unavailability affects status queries
- Mitigation: The queue (River) uses the same PostgreSQL backend as the DB, so its availability matches the DB's
Delayed Visibility:
- Analysis doesn't appear in DB until Worker starts processing
- Short delay between enqueue and record creation
- Mitigation: Queue status provides immediate feedback
Complexity Shift:
- Status logic moves from simple DB query to queue inspection
- Mitigation: Encapsulate in repository/adapter layer
Technical Implications
| Aspect | Implication |
|---|---|
| Transaction Scope | Record creation + initial state in single Worker transaction |
| Failure Handling | All retries managed in Worker |
| Queue Schema | Must support owner/repo lookup for status |
| Monitoring | Queue metrics indicate analysis status |
| Scheduled Analysis | User-initiated and scheduled share same lifecycle |
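
The Transaction Scope row can be sketched as follows. SQLite is used here only so the example runs standalone (the real backend is PostgreSQL), and the `analyses` table, its columns, and the status values are hypothetical.

```python
import sqlite3

# Hypothetical schema; the real analysis table is not described in this ADR.
SCHEMA = """
CREATE TABLE analyses (
    id TEXT PRIMARY KEY,
    owner TEXT NOT NULL,
    repo TEXT NOT NULL,
    commit_sha TEXT NOT NULL,
    status TEXT NOT NULL
)
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def start_analysis(conn: sqlite3.Connection, job: dict) -> None:
    """Worker: create the record and its initial state in one transaction."""
    with conn:  # commits on success, rolls back if the block raises
        conn.execute(
            "INSERT INTO analyses (id, owner, repo, commit_sha, status) "
            "VALUES (?, ?, ?, ?, 'running')",
            (job["analysis_id"], job["owner"], job["repo"], job["commit_sha"]),
        )


def finish_analysis(conn: sqlite3.Connection, analysis_id: str, succeeded: bool) -> None:
    """Worker: record the terminal state on completion or failure."""
    with conn:
        conn.execute(
            "UPDATE analyses SET status = ? WHERE id = ?",
            ("completed" if succeeded else "failed", analysis_id),
        )
```

In this sketch a retried job would hit the primary key on `id`, so a real Worker would likely need an upsert or an existence check on retry.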
