Skip to content

ADR-18: GitHub API Cache Tables

🇰🇷 한국어 버전

DateAuthorRepos
2025-12-24@specvitalinfra, web

Context

GitHub API imposes strict rate limits: 5,000 requests per hour for authenticated users. This creates UX and reliability issues for the dashboard repository selection feature.

IssueImpact
Rate limit exhaustionDashboard unusable when limit reached
Repeated API callsEvery dashboard visit fetches repository list
Latency variability100-500ms GitHub API response times
Concurrent request wasteMultiple browser tabs trigger duplicate calls

UX Problems

  • Slow initial load (200-500ms wait for repository list)
  • Power users with many repositories consume rate limit quickly
  • No visibility into data freshness

This extends ADR-09: GitHub App Integration Strategy which established the 5,000/hr rate limit constraint.

Decision

Database-backed cache with hybrid normalization and user-controlled refresh.

Schema

sql
-- Shared organization metadata (global deduplication)
CREATE TABLE github_organizations (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  github_org_id BIGINT NOT NULL UNIQUE,
  login VARCHAR(255) NOT NULL,
  avatar_url TEXT,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- User-Org N:N relationship
CREATE TABLE user_github_org_memberships (
  user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  github_org_id UUID NOT NULL REFERENCES github_organizations(id) ON DELETE CASCADE,
  CONSTRAINT uq_user_org UNIQUE (user_id, github_org_id)
);

-- Unified repository cache per user
CREATE TABLE user_github_repositories (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  github_repo_id BIGINT NOT NULL,
  full_name VARCHAR(511) NOT NULL,
  source_type VARCHAR(50) NOT NULL, -- 'personal' | 'organization'
  source_org_id UUID REFERENCES github_organizations(id),
  CONSTRAINT uq_user_repo UNIQUE (user_id, github_repo_id)
);

Access Pattern

User Request → Check Cache → Hit? → Return cached data (instant)
                    ↓ No
              Singleflight Group

              GitHub API Call → Store → Return

Refresh Strategy

  • Cache-First: Check database before GitHub API
  • User-Controlled: ?refresh=true forces invalidation
  • No TTL: Cache persists until explicit refresh
  • Singleflight: Prevents duplicate concurrent requests

Options Considered

Option A: No Caching (Direct API)

ProsCons
Always freshRapid rate limit consumption
Simple200-500ms latency per request
Cascading failures on exhaustion

Rejected: Unacceptable UX and rate limit risk.

Option B: TTL-Based Cache

Cache with automatic time-based expiration (e.g., 15-minute TTL).

ProsCons
Automatic freshnessTTL tuning complexity
Predictable stalenessCache stampede at TTL boundary
Industry-standardBackground refresh jobs needed

Rejected: TTL boundaries create stampede risk; arbitrary values don't align with actual change frequency.

Option C: User-Controlled Refresh (Selected)

Cache persists until user explicitly refreshes.

ProsCons
Instant loads (0ms hit)Data may become stale
User agencyUsers must know to refresh
Minimal API consumptionNew repos invisible until refresh
No TTL complexityStorage grows with users

Selected: Optimal UX for primary use case (selecting existing repos).

Option D: Redis Cache

External Redis for repository data.

ProsCons
Sub-millisecond readsAdditional infrastructure
Built-in TTLData loss on restart
Shared across instancesOperational complexity

Rejected: Over-engineering; PostgreSQL already sufficient.

Consequences

Positive:

  • Repository list displays instantly from cache
  • API calls only on explicit refresh or first visit
  • User control over data freshness
  • Hybrid normalization shares org data across users
  • Singleflight prevents duplicate in-flight requests
  • Cache serves requests during GitHub outages

Negative:

  • New repos invisible until refresh (mitigated by refresh button)
  • Storage cost per user (pruned on user deletion)
  • Users must learn refresh pattern (onboarding tooltip)

Schema Design Rationale

Hybrid Normalization

TableRationale
Global github_organizationsOrgs are shared; 1000 users in same org don't duplicate metadata
Per-user user_github_repositoriesVisibility is user-specific; permissions vary
Junction tableClean N:N for "which orgs does user belong to" queries

UNIQUE (user_id, github_repo_id)

  • Prevents duplicate cache entries
  • Enables upsert (INSERT ... ON CONFLICT)
  • Maintains referential integrity

source_type Column

  • UI grouping: "Personal" vs "Organization" sections
  • Query filtering for org-only views
  • Future permission logic differentiation

No TTL Column

  • User-controlled refresh eliminates automatic expiration
  • Simpler queries without TTL checks
  • No background jobs for enforcement

References

Open-source test coverage insights