TidyShelves – Configuration System Design (50M+ Ready)
Overview
This document defines the final configuration system design for TidyShelves, built to scale safely from 0 → 50M+ users.
The system is designed to:
- Eliminate hardcoding across server and iOS
- Be offline-first
- Avoid refetch storms at scale
- Serve reads from cache, not the database
- Support per-device log escalation
- Use POST-only APIs
- Be incrementally adoptable (in-memory → Redis → Event-driven)
This document specifies the design only; it does not prescribe an implementation.
Design Principles
- Server-driven configuration (clients render only)
- Version-first, fetch-on-change
- Cache-first reads
- Sparse overrides (most devices have none)
- POST-only APIs
- Backward-compatible evolution
- Operational safety built-in
Non-Negotiable Scale Requirements (50M+)
- Global config changes must not cause device-wide refetch storms
- Per-device changes must affect only that device
- Database must not be on the hot read path
- Clients fetch config only when versions change
- App must function fully using cached config when offline
- Config response shape must be versioned
Data Model
1. Config Key Catalog
config_keys
- id (PK)
- key (unique)
- data_type (string | int | bool | enum | json)
- allowed_values (JSON array string, nullable)
- default_value (string; JSON-encoded for json)
- description
This table is small and stable (hundreds of rows at most).
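As a minimal sketch, a server could validate a stored value against its catalog row like this. It assumes all values are stored as strings and that booleans use the literal strings "true" and "false". The catalog above does not fix that encoding, so treat it as an assumption. ConfigKey and parse_value are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ConfigKey:
    """One row of config_keys (hypothetical in-memory shape)."""
    key: str
    data_type: str                 # "string" | "int" | "bool" | "enum" | "json"
    allowed_values: Optional[str]  # JSON array string, or None
    default_value: str             # stored as a string; JSON-encoded for "json"


def parse_value(entry: ConfigKey, raw: str) -> Any:
    """Convert a stored string to a typed value; raise ValueError if it is invalid."""
    t = entry.data_type
    if t in ("string", "enum"):
        value: Any = raw
    elif t == "int":
        value = int(raw)  # raises ValueError for non-integers
    elif t == "bool":
        # Assumed encoding: the literal strings "true" / "false".
        if raw not in ("true", "false"):
            raise ValueError(f"{entry.key}: expected 'true' or 'false', got {raw!r}")
        value = raw == "true"
    elif t == "json":
        value = json.loads(raw)  # json.JSONDecodeError subclasses ValueError
    else:
        raise ValueError(f"{entry.key}: unknown data_type {t!r}")

    if t == "enum" and entry.allowed_values is None:
        raise ValueError(f"{entry.key}: enum keys must declare allowed_values")
    if entry.allowed_values is not None:
        allowed = json.loads(entry.allowed_values)
        if value not in allowed:
            raise ValueError(f"{entry.key}: {value!r} not in {allowed}")
    return value
```

The same check can also run on default_value when the catalog is seeded. That is one way to get the "strict validation" listed in the implementation order.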
2. Global Overrides
global_config
- config_key_id (unique)
- value
- updated_at
Used for global behavior changes.
3. Device Overrides (Sparse)
device_config
- device_id
- config_key_id
- value
- updated_at
Constraints:
- UNIQUE(device_id, config_key_id)
- Indexed by device_id
Only a small percentage of devices will have overrides (e.g., log escalation).
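The design implies three layers: catalog defaults, global overrides, and sparse device overrides. It does not spell out how they combine. This minimal sketch assumes that device overrides win over global ones and that a null override means "no override". effective_config is a hypothetical name.

```python
from typing import Dict, Optional


def effective_config(
    defaults: Dict[str, str],
    global_overrides: Dict[str, Optional[str]],
    device_overrides: Dict[str, Optional[str]],
) -> Dict[str, str]:
    """Resolve each key as device override > global override > catalog default."""
    merged = dict(defaults)  # copy so the caller's defaults are never mutated
    for layer in (global_overrides, device_overrides):  # later layers win
        for key, value in layer.items():
            if key not in merged:
                continue  # ignore overrides for keys missing from the catalog
            if value is None:
                continue  # a null override falls through to the layer below
            merged[key] = value
    return merged
```

Most devices have no overrides, so the device layer is usually empty. For most devices, the per-device cost is just a copy of the global result.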
Versioning Strategy (Critical for Scale)
Global Version
conf_global_version
- Single row
- Bumped only when:
  - config_keys changes
  - global_config changes
Device Version (Sparse)
conf_device_version
- One row per device only when needed
- Bumped only when that device’s effective output changes
❗ Do not bump for internal bookkeeping (e.g., decrementing counters).
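This sketch makes "bump only when effective output changes" concrete. It compares the device's effective output before and after a change. DeviceState, set_device_overrides, and decrement_counter are hypothetical names. The base parameter stands for the already-merged global config.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DeviceState:
    overrides: Dict[str, str] = field(default_factory=dict)  # client-visible
    counters: Dict[str, int] = field(default_factory=dict)   # server-only bookkeeping
    version: int = 0                                         # conf_device_version


def set_device_overrides(state: DeviceState, base: Dict[str, str],
                         new_overrides: Dict[str, str]) -> bool:
    """Store new overrides; bump the version only if the effective output changed."""
    before = {**base, **state.overrides}
    after = {**base, **new_overrides}
    state.overrides = dict(new_overrides)
    if after != before:
        state.version += 1
        return True
    return False


def decrement_counter(state: DeviceState, name: str) -> None:
    """Internal bookkeeping: never touches state.version."""
    state.counters[name] = state.counters.get(name, 0) - 1
```

An override that merely restates the global value produces no bump, because the device would receive the same output either way.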
(Future) Tenant Version
conf_tenant_version
- Per-tenant versioning
- Enables tenant-scoped config without waking all devices
Caching Strategy
Layer 1: In-Process Memory Cache
Each service instance caches:
GlobalSnapshot = { globalVersion, mergedKeys }
TTL: 30–120 seconds (or event-driven invalidation).
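Here is a minimal sketch of the in-process layer, assuming a 60-second TTL taken from the range above. The loader is injected: it might read conf:global:snapshot from Redis or fall back to the database. The clock is injectable so the TTL can be tested. SnapshotCache is a hypothetical name, and invalidate() is where an event-driven invalidation would plug in.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class GlobalSnapshot:
    global_version: int
    merged_keys: Dict[str, str]


class SnapshotCache:
    """Per-process cache of the global snapshot with a fixed TTL."""

    def __init__(self, loader: Callable[[], GlobalSnapshot], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader  # e.g. reads from Redis or the database
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._snapshot: Optional[GlobalSnapshot] = None
        self._loaded_at = 0.0

    def get(self) -> GlobalSnapshot:
        with self._lock:  # at most one reload at a time per process
            now = self._clock()
            if self._snapshot is None or now - self._loaded_at >= self._ttl:
                self._snapshot = self._loader()
                self._loaded_at = now
            return self._snapshot

    def invalidate(self) -> None:
        """Hook for event-driven invalidation: the next get() reloads."""
        with self._lock:
            self._snapshot = None
```

Holding the lock during the reload briefly blocks other readers in the same process. In exchange, an expired TTL triggers one load instead of one per concurrent request.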
Layer 2: Distributed Cache (Redis)
Redis becomes the primary read source.
Recommended keys:
- conf:global:version
- conf:global:snapshot
- conf:device:version:<deviceId>
- conf:device:overrides:<deviceId>
- (future) conf:tenant:*
Database remains the system of record.
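Below is a minimal read-through sketch for device overrides. It assumes the overrides are stored as a JSON string under conf:device:overrides:<deviceId>. KeyValueStore is a hypothetical interface. A Redis client that returns str values (for example, redis-py created with decode_responses=True) offers get/set methods of this shape. load_from_db is a placeholder for the database query.

```python
import json
from typing import Callable, Dict, Optional, Protocol


class KeyValueStore(Protocol):
    """Hypothetical minimal interface over the distributed cache."""
    def get(self, key: str) -> Optional[str]: ...
    def set(self, key: str, value: str) -> object: ...


def device_version_key(device_id: str) -> str:
    return f"conf:device:version:{device_id}"


def device_overrides_key(device_id: str) -> str:
    return f"conf:device:overrides:{device_id}"


def get_device_overrides(store: KeyValueStore, device_id: str,
                         load_from_db: Callable[[str], Dict[str, str]]) -> Dict[str, str]:
    """Serve from cache; on a miss, load from the database and cache the result."""
    key = device_overrides_key(device_id)
    cached = store.get(key)
    if cached is not None:
        return json.loads(cached)
    overrides = load_from_db(device_id)  # sparse: usually {}
    # Cache empty results too, so repeated misses never reach the database.
    store.set(key, json.dumps(overrides))
    return overrides
```

Caching the empty result is the important design choice here. Most devices have no overrides, and without it every request from those devices would fall through to the database. The write path, which must update or delete this key when an override changes, is not shown.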
API Design (POST-Only)
1. Version Check
POST /conf/mobile/config/version
Returns:
- configSchemaVersion
- globalVersion
- deviceVersion
- tenantVersion
- shouldFetchEffectiveConfig
- useCachedConfigOnly (ops safety switch)
Used on:
- app launch
- foreground (debounced)
- after sign-in
- after error/fatal log upload
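Here is a minimal server-side sketch of the version check. It assumes the client posts its cached versions, or nothing on first launch. It also assumes that a device with no conf_device_version row reports deviceVersion 0. Versions and version_check are hypothetical names; the response field names follow the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Versions:
    config_schema_version: int
    global_version: int
    device_version: int                  # 0 when the device has no row (assumption)
    tenant_version: Optional[int] = None  # future


def version_check(client: Optional[Versions], server: Versions,
                  use_cached_config_only: bool) -> dict:
    """Build the version-check response; no config keys are read here."""
    # Fetch only when something differs and ops has not frozen clients.
    should_fetch = (not use_cached_config_only) and client != server
    return {
        "configSchemaVersion": server.config_schema_version,
        "globalVersion": server.global_version,
        "deviceVersion": server.device_version,
        "tenantVersion": server.tenant_version,
        "shouldFetchEffectiveConfig": should_fetch,
        "useCachedConfigOnly": use_cached_config_only,
    }
```

This endpoint needs only a handful of integers, which the caches above can serve. That keeps the common "nothing changed" path away from the database.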
2. Fetch Effective Config
POST /conf/mobile/config/effective
Returns:
- version values
- keys as a dictionary (map)
Supports optional:
- groups (e.g., brand, log, invite)
- requestId (idempotency)
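Here is one reading of the optional parameters, sketched under two assumptions. First, groups select keys by their dotted prefix (brand selects brand.*). Second, requestId lets the server replay the response it already built for a retried request. filter_by_groups and ReplayCache are hypothetical names. A single-process dictionary stands in for whatever shared store a real deployment would use.

```python
from collections import OrderedDict
from typing import Callable, Dict, Iterable, Optional


def filter_by_groups(keys: Dict[str, str], groups: Optional[Iterable[str]]) -> Dict[str, str]:
    """Keep only keys whose prefix matches a requested group; no groups means all keys."""
    if not groups:
        return dict(keys)
    prefixes = tuple(f"{g}." for g in groups)  # "brand" -> "brand."
    return {k: v for k, v in keys.items() if k.startswith(prefixes)}


class ReplayCache:
    """Remember recent responses by requestId so retries get the same answer."""

    def __init__(self, max_entries: int = 10_000) -> None:
        self._responses: "OrderedDict[str, dict]" = OrderedDict()
        self._max = max_entries

    def handle(self, request_id: Optional[str], build: Callable[[], dict]) -> dict:
        if request_id is not None and request_id in self._responses:
            return self._responses[request_id]
        response = build()
        if request_id is not None:
            self._responses[request_id] = response
            if len(self._responses) > self._max:
                self._responses.popitem(last=False)  # evict the oldest entry
        return response
```

Matching on "brand." rather than "brand" keeps a hypothetical branding.* key from leaking into a brand request.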
Client Behavior (Offline-First)
Local Storage
Clients store:
- cached config map
- version values
- schema version
- timestamp
Refresh Algorithm
- If offline → use cached config
- If online → POST version check
- If useCachedConfigOnly=true → stop
- If versions changed → fetch effective config
Most version checks are cheap and report no change, so no config fetch follows.
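This sketch writes the refresh algorithm in Python for brevity; the iOS client would express the same steps in Swift. The two POST calls are injected as functions. Network failures are assumed to surface as OSError and are treated like being offline. LocalConfig mirrors the local storage fields listed earlier. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class LocalConfig:
    keys: Dict[str, str]
    global_version: int
    device_version: int
    schema_version: int
    fetched_at: float  # timestamp of the last successful fetch


def refresh(
    cached: Optional[LocalConfig],
    is_online: bool,
    post_version_check: Callable[[Optional[LocalConfig]], dict],
    post_effective_config: Callable[[], dict],
    now: Callable[[], float],
) -> Optional[LocalConfig]:
    """Return the config to use; the caller persists it if it changed."""
    if not is_online:
        return cached  # offline: cached config
    try:
        check = post_version_check(cached)
        if check["useCachedConfigOnly"]:
            return cached  # ops safety switch
        server_versions = (check["globalVersion"], check["deviceVersion"],
                           check["configSchemaVersion"])
        if cached is not None and server_versions == (
                cached.global_version, cached.device_version, cached.schema_version):
            return cached  # common case: nothing changed
        fresh = post_effective_config()
    except OSError:
        return cached  # network failure: behave as offline
    return LocalConfig(keys=fresh["keys"], global_version=fresh["globalVersion"],
                       device_version=fresh["deviceVersion"],
                       schema_version=fresh["configSchemaVersion"], fetched_at=now())
```

tenantVersion is ignored here because tenant scoping is marked as future work.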
Device Log Escalation
Purpose
Temporarily increase logging for devices that encounter errors, without affecting others.
Config Keys
- log.level.global
- log.escalation.policy
- log.level.device_override
Lifecycle
- Device sends an error or fatal log
- Server applies the escalation policy:
  - sets log.level.device_override
  - bumps deviceVersion only
- Device fetches new config → debug enabled
- Auto fallback when:
  - max launches reached, or
  - time window expires
- Server clears the override and bumps deviceVersion again
Important Rules
- Decrement launch counters server-side
- Do not bump deviceVersion on each decrement
- Bump only when effective output changes
- APNs is optional and never required
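This is a minimal sketch of the escalation lifecycle and its rules. It assumes that the device's version-check request can signal an app launch through a hypothetical flag, so the server can count launches. It also assumes the escalated level is the string "debug". Device, Escalation, on_error_log, and on_launch are hypothetical names. Real max-launch and window values would come from log.escalation.policy.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

OVERRIDE_KEY = "log.level.device_override"
ESCALATED_LEVEL = "debug"  # assumed value; the real level set is not specified here


@dataclass
class Escalation:
    launches_left: int
    expires_at: float


@dataclass
class Device:
    overrides: Dict[str, str] = field(default_factory=dict)
    version: int = 0                         # deviceVersion
    escalation: Optional[Escalation] = None  # server-side bookkeeping only


def on_error_log(device: Device, max_launches: int, window_seconds: float, now: float) -> None:
    """Error/fatal log received: (re)start the escalation window."""
    device.escalation = Escalation(max_launches, now + window_seconds)
    if device.overrides.get(OVERRIDE_KEY) != ESCALATED_LEVEL:
        device.overrides[OVERRIDE_KEY] = ESCALATED_LEVEL
        device.version += 1  # effective output changed


def on_launch(device: Device, now: float) -> None:
    """App launch observed: decrement the counter and fall back when done."""
    esc = device.escalation
    if esc is None:
        return
    esc.launches_left -= 1  # bookkeeping only: no version bump
    if esc.launches_left <= 0 or now >= esc.expires_at:
        device.escalation = None
        if device.overrides.pop(OVERRIDE_KEY, None) is not None:
            device.version += 1  # override cleared: effective output changed
```

A repeated error while a device is already escalated only extends the window, so it does not wake the device. Expiry is checked at launch here. A device that never launches does not need its config refreshed anyway.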
Schema Versioning
configSchemaVersion is returned with every config response.
Purpose:
- Allows evolving response shape safely
- Prevents older clients from misinterpreting payloads
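On the client, one way to honor configSchemaVersion is to refuse any payload whose schema is newer than the client was built for. In that case the client keeps its cached config. This sketch assumes newer schema versions are higher integers and that a client can read every older shape. SUPPORTED_SCHEMA_VERSION and apply_payload are hypothetical names.

```python
from typing import Dict

SUPPORTED_SCHEMA_VERSION = 2  # hypothetical: newest response shape this build understands


def apply_payload(cached_keys: Dict[str, str], payload: dict,
                  supported: int = SUPPORTED_SCHEMA_VERSION) -> Dict[str, str]:
    """Adopt the payload's keys only if this client understands its schema."""
    version = payload.get("configSchemaVersion")
    if type(version) is int and version <= supported:  # rejects missing, bool, or str
        return payload["keys"]
    return cached_keys  # unknown shape: keep the last config this client could read
```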
Seed Keys (Summary)
Categories:
- Branding (brand.*)
- App & Links (app.*, web.*)
- Feature flags (feature.*)
- Invitation policy (invite.*)
- Mobile & Sync (mobile.*, sync.*)
- Logging (log.*)
All keys:
- Typed
- Validated
- Server-driven
- Nullable overrides
Implementation Order
- Seed config_keys with strict validation
- Add version tables (global + device)
- Add in-memory cache
- Add Redis cache
- Implement POST-only config APIs
- Refactor server services to consume config
- Implement log escalation
- Update iOS to consume config
- (Optional) Add event-driven invalidation
Azure Services Guidance
Redis
Used for:
- global snapshot
- device versions
- device overrides
Recommended once services scale horizontally.
Kafka (Azure Event Hubs – Kafka Endpoint)
Optional. Used only for:
- config change events
- cache invalidation
- audit streams
Not required for config reads.
Final Sanity Check (50M+)
This system scales because:
- Versions prevent storms
- Device overrides are isolated
- Cache handles reads
- DB is protected
- Offline clients remain functional
- Ops controls exist for emergencies
This design is production-ready, evolvable, and safe for very large scale.