Platform & cloud

System design (platform)

Platform interviews ≠ product LeetCode

You may design: CI platform, multi-tenant K8s, artifact registry, internal developer portal, observability pipeline, multi-region failover.

Framework (45 min)

  1. Requirements — users, scale (QPS, regions), latency, compliance.
  2. Constraints — budget, existing cloud, team size, SLO.
  3. High-level diagram — clients, LB, compute, data stores, async queue.
  4. Deep dive — pick 2 areas (scaling, consistency, security).
  5. Ops — deploy, monitor, backup, incident playbooks.
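
For step 1, it can help to turn a stated daily volume into a request rate before drawing anything. The sketch below is a back-of-envelope illustration only: the function name, the daily request count, and the 3x peak factor are made-up assumptions, not figures from any real system.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass(frozen=True)
class LoadEstimate:
    avg_qps: float
    peak_qps: float


def estimate_load(requests_per_day: int, peak_factor: float) -> LoadEstimate:
    """Convert a daily request count into average and peak requests per second."""
    if requests_per_day < 0 or peak_factor < 1:
        raise ValueError("requests_per_day must be >= 0 and peak_factor >= 1")
    avg = requests_per_day / SECONDS_PER_DAY
    return LoadEstimate(avg_qps=avg, peak_qps=avg * peak_factor)


# Hypothetical inputs: 8.64M API calls per day, peak traffic 3x the average.
estimate = estimate_load(requests_per_day=8_640_000, peak_factor=3.0)
print(f"avg={estimate.avg_qps:.0f} QPS, peak={estimate.peak_qps:.0f} QPS")
# prints "avg=100 QPS, peak=300 QPS"
```

Stating the peak factor out loud matters more than the exact number: it tells the interviewer which figure you are sizing the design against.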

Example: internal CI runners on Kubernetes

  • Control plane — API + queue (Redis/SQS) for jobs.
  • Workers — autoscaling node pool or K8s Jobs per pipeline.
  • Isolation — namespaces, network policies, ephemeral runners.
  • Secrets — vault + short-lived tokens per job.
  • Observability — job duration metrics, log shipping, failed job alerts.
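
One way to make the "K8s Jobs per pipeline" option concrete is to show the manifest the control plane might submit for each job. This is a minimal sketch, not a reference implementation: the function name, the `ci-runner` service account, the label keys, the resource sizes, and the registry URL are all hypothetical. The manifest fields themselves (`backoffLimit`, `activeDeadlineSeconds`, `ttlSecondsAfterFinished`, `restartPolicy`) are standard `batch/v1` Job fields. Secret injection is left out here.

```python
import json


def runner_job_manifest(job_id: str, image: str, tenant_namespace: str,
                        timeout_seconds: int = 3600) -> dict:
    """Build a Kubernetes batch/v1 Job manifest for one ephemeral CI runner."""
    labels = {"app": "ci-runner", "ci-job-id": job_id}  # hypothetical label keys
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": f"ci-runner-{job_id}",
            "namespace": tenant_namespace,  # one namespace per tenant for isolation
            "labels": labels,
        },
        "spec": {
            "backoffLimit": 0,  # let the CI control plane decide on retries
            "activeDeadlineSeconds": timeout_seconds,  # kill runaway jobs
            "ttlSecondsAfterFinished": 300,  # clean up finished runners
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "restartPolicy": "Never",
                    "serviceAccountName": "ci-runner",  # assumed to exist
                    "automountServiceAccountToken": False,
                    "containers": [{
                        "name": "runner",
                        "image": image,
                        "env": [{"name": "CI_JOB_ID", "value": job_id}],
                        "resources": {
                            "requests": {"cpu": "1", "memory": "2Gi"},
                            "limits": {"cpu": "2", "memory": "4Gi"},
                        },
                    }],
                },
            },
        },
    }


manifest = runner_job_manifest(
    job_id="1234",
    image="registry.example.com/ci/runner:latest",  # placeholder image
    tenant_namespace="team-payments",
)
print(json.dumps(manifest, indent=2))
```

In an interview, the fields worth calling out are the deadline and the TTL: together they bound how long a stuck or finished runner can hold cluster resources.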

Reliability patterns to name-drop (when relevant)

  • Idempotency, retries with backoff, circuit breakers.
  • Active/passive or active/active multi-AZ.
  • Cache (CDN, Redis) and read replicas.
  • Rate limiting and backpressure.
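
The first bullet can be sketched in a few lines: retries with capped exponential backoff and full jitter, reusing one idempotency key across attempts so the server can deduplicate repeated requests. `TransientError`, `retry_with_backoff`, and the `flaky_submit` stand-in are hypothetical names for illustration; a real client would map its own timeout and 5xx errors onto the retryable case.

```python
import random
import time
import uuid
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error type for failures that are safe to retry."""


def retry_with_backoff(operation: Callable[[str], T], max_attempts: int = 5,
                       base_delay: float = 0.1, max_delay: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call operation(idempotency_key) until it succeeds or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    # The same key is sent on every attempt so duplicates can be detected.
    idempotency_key = str(uuid.uuid4())
    for attempt in range(1, max_attempts + 1):
        try:
            return operation(idempotency_key)
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and surface the last failure
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))  # full jitter spreads out retries
    raise AssertionError("unreachable")


# Usage: a stand-in operation that fails twice, then succeeds.
seen_keys: List[str] = []
delays: List[float] = []


def flaky_submit(key: str) -> str:
    seen_keys.append(key)
    if len(seen_keys) < 3:
        raise TransientError("simulated timeout")
    return "ok"


result = retry_with_backoff(flaky_submit, sleep=delays.append)  # no real sleeping
print(result, len(seen_keys), len(set(seen_keys)))
```

Passing `sleep` as a parameter keeps the sketch testable without waiting. Only errors known to be transient are retried; anything else propagates immediately.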

Show you think like an SRE

  • Define SLOs and how you’d test failover.
  • Capacity planning — headroom for peak, cost controls.
  • Game days — chaos or DR drills.
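
Two of these points reduce to quick arithmetic that is worth doing out loud: the error budget implied by an SLO, and the instance count needed to serve peak load with headroom. The sketch below assumes a 30-day window and a 30% headroom default; the function names and the example inputs (300 QPS peak, 50 QPS per instance) are illustrative, not recommendations.

```python
import math


def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO over a window."""
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction between 0 and 1, e.g. 0.999")
    return (1 - slo) * window_days * 24 * 60


def instances_needed(peak_qps: float, qps_per_instance: float,
                     headroom: float = 0.3) -> int:
    """Instances required to serve peak load plus a fractional headroom."""
    if qps_per_instance <= 0:
        raise ValueError("qps_per_instance must be positive")
    return math.ceil(peak_qps * (1 + headroom) / qps_per_instance)


budget = error_budget_minutes(0.999)
fleet = instances_needed(peak_qps=300, qps_per_instance=50)
print(f"99.9% over 30 days allows {budget:.1f} minutes of downtime")
print(f"{fleet} instances cover 300 QPS with 30% headroom")
```

A 99.9% SLO over 30 days leaves about 43 minutes of budget, which is a useful yardstick when arguing whether a manual failover procedure is fast enough.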

Practice

Sketch two designs on paper: a URL shortener (classic warm-up) and globally distributed object storage (harder). Then sketch one platform design relevant to your resume.
