System design (platform)
Platform interviews ≠ product LeetCode
You may be asked to design a CI platform, a multi-tenant Kubernetes cluster, an artifact registry, an internal developer portal, an observability pipeline, or multi-region failover.
Framework (45 min)
- Requirements — users, scale (QPS, regions), latency, compliance.
- Constraints — budget, existing cloud, team size, SLO.
- High-level diagram — clients, LB, compute, data stores, async queue.
- Deep dive — pick 2 areas (scaling, consistency, security).
- Ops — deploy, monitor, backup, incident playbooks.
Example: internal CI runners on Kubernetes
- Control plane — API + queue (Redis/SQS) for jobs.
- Workers — autoscaling node pool or K8s Jobs per pipeline.
- Isolation — namespaces, network policies, ephemeral runners.
- Secrets — vault + short-lived tokens per job.
- Observability — job duration metrics, log shipping, failed job alerts.
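To make the CI example concrete, here is a minimal sketch of the control-plane step that turns a queued job into an ephemeral runner. It assumes one namespace per tenant and a per-job Secret holding a short-lived token; the names `PipelineJob`, `runner_manifest`, `ci-token-<id>`, and `CI_JOB_TOKEN` are hypothetical, and the in-memory `deque` only stands in for Redis/SQS. The manifest fields themselves (`ttlSecondsAfterFinished`, `activeDeadlineSeconds`, `backoffLimit`, `secretKeyRef`) are standard Kubernetes `batch/v1` Job fields.

```python
import json
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineJob:
    job_id: str     # hypothetical ID assigned by the control-plane API
    tenant: str     # team or project that owns the pipeline
    image: str      # runner image to execute
    command: tuple  # command the runner executes


def runner_manifest(job: PipelineJob, timeout_s: int = 1800) -> dict:
    """Build a batch/v1 Job manifest for one ephemeral runner."""
    labels = {"app": "ci-runner", "tenant": job.tenant}
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": f"ci-{job.job_id}",
            # Assumption: one namespace per tenant, with network policies
            # applied at the namespace level.
            "namespace": f"ci-{job.tenant}",
            "labels": labels,
        },
        "spec": {
            "backoffLimit": 0,                 # the CI layer decides on reruns
            "activeDeadlineSeconds": timeout_s,  # kill runaway jobs
            "ttlSecondsAfterFinished": 300,    # clean up finished runners
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "restartPolicy": "Never",
                    "automountServiceAccountToken": False,
                    "containers": [{
                        "name": "runner",
                        "image": job.image,
                        "command": list(job.command),
                        # Assumption: a per-job Secret with a short-lived
                        # token is created before the Job is submitted.
                        "env": [{
                            "name": "CI_JOB_TOKEN",
                            "valueFrom": {"secretKeyRef": {
                                "name": f"ci-token-{job.job_id}",
                                "key": "token",
                            }},
                        }],
                    }],
                },
            },
        },
    }


# In-memory stand-in for the job queue (Redis/SQS in the design above).
queue = deque()
queue.append(PipelineJob("42", "payments", "python:3.12-slim", ("pytest", "-q")))
manifest = runner_manifest(queue.popleft())
print(json.dumps(manifest["metadata"], indent=2))
```

In an interview, the useful talking points are the fields rather than the code: a deadline bounds runaway jobs, a TTL keeps the cluster clean, and the token never appears in the manifest itself.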
Reliability patterns to name-drop (when relevant)
- Idempotency, retries with backoff, circuit breakers.
- Active/passive or active/active multi-AZ.
- Cache (CDN, Redis) and read replicas.
- Rate limiting and backpressure.
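The first pattern pair is easier to explain with a small example. The sketch below shows retries with exponential backoff and full jitter, combined with an idempotency key so that a retried request does not apply its side effect twice. `TransientError`, `retry_with_backoff`, `IdempotentStore`, and the `order-123` key are all hypothetical names chosen for illustration; a real system would keep the key-to-result map in a durable store.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error type for failures that are worth retrying."""


def retry_with_backoff(fn, max_attempts=5, base_delay=0.1, max_delay=5.0,
                       sleep=time.sleep):
    """Call fn, retrying on TransientError with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # full jitter spreads out retries


class IdempotentStore:
    """Applies each idempotency key at most once and replays the result."""

    def __init__(self):
        self._results = {}
        self.applied = 0  # number of times the side effect actually ran

    def apply(self, key, amount):
        if key in self._results:
            return self._results[key]  # duplicate request: replay, no new effect
        self.applied += 1
        self._results[key] = amount
        return amount


store = IdempotentStore()
calls = {"n": 0}


def flaky_charge():
    calls["n"] += 1
    result = store.apply("order-123", 50)  # side effect happens first
    if calls["n"] < 3:
        # Simulate a lost response: the caller cannot tell the call succeeded.
        raise TransientError("response lost")
    return result


delays = []  # record the backoff delays instead of actually sleeping
result = retry_with_backoff(flaky_charge, sleep=delays.append)
print(result, calls["n"], store.applied)
```

Here the call is attempted three times but the charge is applied once, which is the property worth stating out loud: retries are only safe when the operation is idempotent.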
Show you think like an SRE
- Define SLOs and how you’d test failover.
- Capacity planning — headroom for peak, cost controls.
- Game days — chaos or DR drills.
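It helps to be able to do the SLO and headroom arithmetic on the spot. The sketch below is one way to frame it; the function names, the 30-day window, the 30% headroom target, and the traffic numbers are illustrative assumptions, not recommendations.

```python
import math


def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over the window."""
    return (1.0 - slo) * window_days * 24 * 60


def instances_needed(peak_qps: float, qps_per_instance: float,
                     headroom: float = 0.3) -> int:
    """Instances so that peak load uses at most (1 - headroom) of capacity."""
    usable_qps = qps_per_instance * (1.0 - headroom)
    return math.ceil(peak_qps / usable_qps)


budget = error_budget_minutes(0.999)
fleet = instances_needed(peak_qps=12_000, qps_per_instance=500, headroom=0.3)
print(round(budget, 1), fleet)
```

A 99.9% SLO over 30 days leaves about 43.2 minutes of error budget, which bounds how long a failover may take and how often you can afford to exercise it. With the assumed numbers, serving 12,000 QPS at 30% headroom needs 35 instances rather than the 24 a zero-headroom estimate would give, and that gap is where the cost-control conversation starts.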
Practice
Sketch two designs on paper: a URL shortener (classic warmup) and globally distributed object storage (harder). Then sketch one platform design relevant to your resume.