Kubernetes Interview Questions (Practical & Realistic)
Most Kubernetes interviews are not about memorizing YAML. They’re about understanding how workloads behave in production and how you debug failures under pressure.
Architecture & Core Concepts
1. Walk me through what happens when you run:
kubectl apply -f app.yaml
A strong answer usually includes:
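- API server authenticates, authorizes, and validates the request
- object stored in etcd
- controllers create the Pods if the object is a Deployment or similar workload
- scheduler picks a node
- kubelet on that node creates the pod
- container runtime starts containers
- status flows back to API server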
This question quickly reveals whether someone understands Kubernetes beyond YAML syntax.
2. What are the responsibilities of:
- kube-apiserver
- etcd
- scheduler
- controller-manager
- kubelet
- kube-proxy
Interviewers care less about textbook definitions and more about whether you understand how these components interact.
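One way to make the interaction concrete is the reconcile pattern that controllers follow: compare desired state with observed state, then act to close the gap. The sketch below is a toy model, not Kubernetes code; the `reconcile` function and its pod names are hypothetical.

```python
def reconcile(desired_replicas: int, running_pods: list[str]) -> list[str]:
    """Return the actions a toy controller would take to converge.

    Hypothetical model: real controllers watch the API server and
    create or delete Pod objects instead of returning strings.
    """
    actions = []
    diff = desired_replicas - len(running_pods)
    if diff > 0:
        # Too few pods: request new ones (names are made up for the sketch).
        actions += [f"create pod-{i}"
                    for i in range(len(running_pods), desired_replicas)]
    elif diff < 0:
        # Too many pods: remove the surplus from the end of the list.
        actions += [f"delete {name}" for name in running_pods[desired_replicas:]]
    return actions
```

Running the same function repeatedly against fresh observations is what makes the loop level-triggered: a missed event does not matter, because the next pass sees the same gap.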
3. What happens if etcd goes down?
Expected ideas:
- cluster state becomes unavailable
- scheduling/API operations fail
- existing containers keep running, but pods lost to node failure are not replaced
- disaster recovery importance
Bonus: Explain backup/restore strategy.
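It also helps to know how many members can fail before etcd stops accepting writes. etcd uses Raft, which needs a majority of members; the helper names below are hypothetical, and the arithmetic is only a sketch.

```python
def etcd_quorum(members: int) -> int:
    """Votes needed for the cluster to accept writes (a Raft majority)."""
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """Members that can be lost while a majority still remains."""
    return members - etcd_quorum(members)
```

A 4-member cluster tolerates one failure, the same as a 3-member cluster, which is why odd sizes such as 3 or 5 are the usual choice.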
4. What does kubelet actually do on a node?
Good practical answer:
- watches pod specs
- talks to container runtime
- reports node/pod status
- handles probes and mounts
Pods, Deployments & Workloads
5. Why should you avoid running a single standalone Pod directly?
Expected direction:
- no self-healing
- no rollout strategy
- no scaling
- use Deployment/StatefulSet/DaemonSet instead
6. Explain the difference between:
- Deployment
- StatefulSet
- DaemonSet
- Job
- CronJob
Practical examples matter more than definitions.
Example:
- StatefulSet → databases
- DaemonSet → node exporters/log agents
- Job → one-time migration task
7. What’s the difference between Deployment and StatefulSet?
This gets asked constantly.
Expected topics:
- stable identity
- ordered startup/shutdown
- persistent storage
- predictable pod names
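The naming conventions behind "stable identity" can be sketched as follows. The function name is hypothetical, and the sketch assumes the default `cluster.local` cluster domain, a headless Service governing the StatefulSet, and a single volume claim template.

```python
def statefulset_identities(name: str, replicas: int, service: str,
                           namespace: str = "default",
                           claim_template: str = "data") -> list[dict]:
    """List the pod name, DNS name, and PVC name for each ordinal."""
    identities = []
    for ordinal in range(replicas):
        pod = f"{name}-{ordinal}"  # pods are numbered from 0
        identities.append({
            "pod": pod,
            # Per-pod DNS record served through the headless Service.
            "dns": f"{pod}.{service}.{namespace}.svc.cluster.local",
            # PVC name: <claim template name>-<pod name>.
            "pvc": f"{claim_template}-{pod}",
        })
    return identities
```

Because the PVC name is derived from the pod name, a rescheduled `web-0` reattaches to the same claim instead of getting a fresh volume.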
8. How do rolling updates work in Kubernetes?
Mention:
- ReplicaSets
- gradual replacement
- maxSurge
- maxUnavailable
- rollback
Bonus: How zero-downtime deployments can still fail if readiness probes are wrong.
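The effect of maxSurge and maxUnavailable is easier to explain with numbers. This is a minimal sketch of the documented rounding (maxSurge percentages round up, maxUnavailable percentages round down); the function names are hypothetical.

```python
def resolve(value, replicas: int, round_up: bool) -> int:
    """Turn an absolute count or a percentage string such as "25%" into pods."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = replicas * int(value[:-1])
        # Integer ceiling or floor of scaled / 100, avoiding float rounding.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(replicas: int, max_surge="25%", max_unavailable="25%") -> dict:
    """Upper bound on total pods and lower bound on available pods."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    return {"max_total_pods": replicas + surge,
            "min_available_pods": replicas - unavailable}
```

With 10 replicas and the default 25% for both settings, a rollout may run up to 13 pods at once and must keep at least 8 available.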
9. What happens if a container inside a Pod crashes?
Good answer:
- restart policy
- kubelet restarts container
- CrashLoopBackOff behavior
- logs investigation
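The back-off behind CrashLoopBackOff follows the documented pattern of 10s, 20s, 40s and so on, capped at five minutes. The sketch below only reproduces that sequence; the function name is hypothetical, and the actual timing is a kubelet detail that may differ between versions.

```python
def crashloop_delays(restarts: int, base: int = 10, cap: int = 300) -> list[int]:
    """Delay in seconds before each of the first `restarts` restarts."""
    return [min(base * 2 ** attempt, cap) for attempt in range(restarts)]
```

This is why a pod that has been crashing for a while appears to "do nothing" for minutes at a time: it is waiting out the capped delay.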
Networking & Services
10. Explain the difference between:
- ClusterIP
- NodePort
- LoadBalancer
A practical explanation matters more than theory here.
11. How does a Service find Pods?
Expected:
- labels
- selectors
- endpoints
Bonus: What happens if labels don’t match?
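Selector matching is a subset check, which a few lines can illustrate. This sketch assumes a non-empty, equality-based Service selector; `selects` and `matching_pods` are hypothetical names.

```python
def selects(selector: dict[str, str], pod_labels: dict[str, str]) -> bool:
    """True when every selector key/value pair is present on the pod."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


def matching_pods(selector: dict[str, str],
                  pods: dict[str, dict[str, str]]) -> list[str]:
    """Names of pods (name -> labels) that the selector picks up."""
    return sorted(name for name, labels in pods.items()
                  if selects(selector, labels))
```

A single typo in the selector yields an empty list, which is the situation where the Service exists but has no endpoints behind it.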
12. What is an Ingress?
Good practical answer:
- HTTP/HTTPS routing
- path/host-based routing
- TLS termination
- requires ingress controller
Follow-up: Difference between Ingress and LoadBalancer Service.
13. How would you debug a Service that isn’t routing traffic?
Natural troubleshooting flow:
kubectl get svc
kubectl get endpoints
kubectl describe svc myservice
kubectl get pods --show-labels
Common causes:
- selector mismatch
- pod not ready
- wrong targetPort
- NetworkPolicy
14. What is a NetworkPolicy?
Good answer:
- controls pod-to-pod traffic
- ingress/egress rules
- pods accept all traffic by default until a policy selects them
Bonus: Explain why policies do nothing unless a network plugin supports them.
Probes & Health Checks
15. Difference between liveness and readiness probes?
This is one of the highest-frequency interview questions.
Expected:
- readiness → controls traffic
- liveness → controls restarts
Strong candidates explain production impact.
Example:
A failing readiness probe removes the pod from Service endpoints but doesn’t restart it.
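The difference in outcome can be modeled in a few lines. This is a simplified sketch that only looks at the trailing run of failures and assumes the default failureThreshold of 3; `probe_outcome` is a hypothetical name, not a Kubernetes API.

```python
def probe_outcome(kind: str, results: list[bool], failure_threshold: int = 3) -> str:
    """Describe the effect of a probe history (True = probe succeeded)."""
    if kind not in ("liveness", "readiness"):
        raise ValueError("kind must be 'liveness' or 'readiness'")
    consecutive_failures = 0
    for ok in results:
        # Any success resets the failure counter.
        consecutive_failures = 0 if ok else consecutive_failures + 1
    if consecutive_failures < failure_threshold:
        return "no action"
    return "restart container" if kind == "liveness" else "remove from endpoints"
```

The same failing history produces a restart for a liveness probe but only a traffic cut for a readiness probe, which is the distinction interviewers look for.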
16. What can happen if probes are configured badly?
Real-world issues:
- restart storms
- cascading failures
- pods marked healthy too early
- traffic sent before app initialization completes
Storage & Persistence
17. Explain:
- PersistentVolume (PV)
- PersistentVolumeClaim (PVC)
- StorageClass
Good practical answer:
- PVC requests storage
- StorageClass defines how storage is dynamically provisioned
- PV is the actual volume resource
18. What are the PersistentVolume access modes?
Expected:
- ReadWriteOnce
- ReadOnlyMany
- ReadWriteMany
- ReadWriteOncePod
Follow-up: Why RWX storage is harder in cloud environments.
19. Why are StatefulSets commonly paired with PVCs?
Expected:
- stable storage identity
- pod rescheduling without data loss
Config & Secrets
20. Difference between ConfigMap and Secret?
Good practical answer:
- both inject config
- Secret values are base64-encoded, not encrypted by default
- avoid committing secrets to Git
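A quick demonstration shows why base64 offers no protection: anyone who can read the manifest can reverse it. The helper names are hypothetical, and the value is an illustrative placeholder.

```python
import base64


def encode_secret_value(plaintext: str) -> str:
    """Encode a value the way it appears under `data:` in a Secret manifest."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Reverse the encoding; no key or password is needed."""
    return base64.b64decode(encoded).decode("utf-8")
```

For example, `encode_secret_value("password")` gives `cGFzc3dvcmQ=`, and `decode_secret_value` turns it straight back into the original string.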
21. How can applications consume ConfigMaps or Secrets?
Expected:
- environment variables
- mounted files/volumes
Bonus: Discuss secret rotation challenges.
Scheduling & Scaling
22. Why would a Pod remain in Pending state?
Very common troubleshooting question.
Expected causes:
- insufficient resources
- taints/tolerations
- node selectors
- PVC issues
- image pull delays
Commands:
kubectl describe pod mypod
kubectl get events
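The first two causes can be illustrated with a toy version of the scheduler's filtering step. This is a sketch under simplifying assumptions: the data shapes and the `unschedulable_reasons` name are hypothetical, and "free" stands for allocatable capacity minus the requests of pods already on the node (the scheduler works from requests, not live usage).

```python
def unschedulable_reasons(pod: dict, nodes: dict) -> dict:
    """Map node name -> reasons the pod does not fit (empty list = fits).

    Units are illustrative: CPU in millicores, memory in MiB.
    """
    reasons = {}
    for name, node in nodes.items():
        why = []
        for resource, wanted in pod["resources"].items():
            if node["free"].get(resource, 0) < wanted:
                why.append(f"insufficient {resource}")
        for key, value in pod.get("node_selector", {}).items():
            if node["labels"].get(key) != value:
                why.append(f"node selector {key}={value} not matched")
        reasons[name] = why
    return reasons
```

The pod stays Pending when every node has at least one reason, and those reasons are roughly what the events shown by `kubectl describe pod` summarize.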
23. What is the HPA (Horizontal Pod Autoscaler)?
Good practical answer:
- scales replicas
- based on CPU/memory/custom metrics
Follow-up: Difference between HPA and Cluster Autoscaler.
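The core HPA calculation is a single ratio, sketched below from the documented formula: desired replicas equal the ceiling of current replicas times current metric over target metric, with no change when the ratio is within the default 10% tolerance. The function name is hypothetical, and the sketch leaves out min/max replica bounds and stabilization windows.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.1) -> int:
    """Replica count the autoscaler would aim for, before min/max clamping."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no scaling
    return math.ceil(current_replicas * ratio)
```

With 4 replicas averaging 200m CPU against a 100m target, the result is 8 replicas.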
24. What are taints and tolerations?
Practical example:
- dedicate GPU nodes
- isolate workloads
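The matching rules between a toleration and a taint can be sketched as below. This is a simplified model with hypothetical function names; it ignores `tolerationSeconds` and treats `PreferNoSchedule` as non-blocking because it is only a soft preference.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """True if one toleration matches one taint."""
    # An empty effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # Exists with an empty key tolerates every taint.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    return (toleration.get("key") == taint["key"]
            and toleration.get("value", "") == taint.get("value", ""))


def can_schedule(tolerations: list[dict], taints: list[dict]) -> bool:
    """The pod may land on the node if every hard taint is tolerated."""
    blocking = [t for t in taints if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in tolerations)
               for taint in blocking)
```

A toleration only permits placement on a tainted node. To actually steer GPU workloads onto GPU nodes, it is usually combined with a node selector or node affinity.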
25. What is node affinity?
Expected:
- scheduling preference/requirements
- workload placement control
Cluster Operations & Reliability
26. How would you safely upgrade a Kubernetes cluster?
Strong answers mention:
- version skew compatibility
- upgrade control plane first
- cordon/drain nodes
- rolling node replacement
- workload validation
27. What does kubectl drain do?
Expected:
- marks node unschedulable (cordon)
- evicts workloads safely
- respects PodDisruptionBudgets
28. What is a PodDisruptionBudget (PDB)?
Practical understanding:
- prevents too many replicas from going down simultaneously
- protects availability during maintenance
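The budget comes down to simple arithmetic on healthy pods. This sketch handles absolute numbers only (percentages are left out), and the function name is hypothetical.

```python
from typing import Optional


def disruptions_allowed(healthy: int, expected: int,
                        min_available: Optional[int] = None,
                        max_unavailable: Optional[int] = None) -> int:
    """Voluntary evictions that may proceed right now under a PDB."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = min_available
    else:
        desired_healthy = expected - max_unavailable
    return max(0, healthy - desired_healthy)
```

When the result is 0, an eviction request is refused, which is why `kubectl drain` can appear to hang on a node whose pods are already short of their budget.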
29. What would happen if all control plane nodes fail?
Good discussion areas:
- workload survival
- inability to schedule/manage
- HA control plane design
30. Why is RBAC important?
Expected:
- least privilege
- service account security
- namespace isolation
Good follow-up: Difference between Role and ClusterRole.
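Least privilege is easier to reason about once the evaluation model is clear: RBAC rules are purely additive, so a request is allowed if any rule matches and there is no deny rule. The sketch below uses hypothetical function names and ignores `resourceNames`, subresources, and non-resource URLs.

```python
def rule_allows(rule: dict, api_group: str, resource: str, verb: str) -> bool:
    """True if a single Role or ClusterRole rule covers the request."""
    def covers(allowed: list[str], wanted: str) -> bool:
        return "*" in allowed or wanted in allowed

    return (covers(rule["apiGroups"], api_group)
            and covers(rule["resources"], resource)
            and covers(rule["verbs"], verb))


def is_allowed(rules: list[dict], api_group: str, resource: str, verb: str) -> bool:
    """Any matching rule grants access; the absence of a match means denial."""
    return any(rule_allows(rule, api_group, resource, verb) for rule in rules)
```

A rule with `"*"` in every field grants everything, which is the kind of over-broad grant a least-privilege review should catch.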
Troubleshooting Scenarios (Most Important)
These are the questions closest to real Kubernetes work.
31. A pod is stuck in CrashLoopBackOff. How do you debug it?
Good flow:
kubectl logs <pod>
kubectl logs <pod> --previous
kubectl describe pod <pod>
kubectl get events
Common causes:
- bad config
- failed dependency
- probe failures
- missing env vars
32. A pod is in ImagePullBackOff. What do you check?
Expected:
- image tag
- registry auth
- image existence
- network access
33. Pods are healthy, but users still get 503 errors. Why?
Excellent practical question.
Possible answers:
- readiness probe failure (pods Running but not Ready)
- Service selector mismatch
- ingress routing issue
- TLS/backend issue
34. One node keeps failing workloads. How do you investigate?
Strong areas:
- node pressure
- disk full
- kubelet health
- network issues
- taints
- container runtime errors
Commands:
kubectl describe node <node>
journalctl -u kubelet
35. DNS resolution inside pods is failing. What would you check?
Expected:
- CoreDNS
- kube-dns service
- /etc/resolv.conf
- network policies
- upstream DNS
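When reading /etc/resolv.conf in a pod, it helps to know how the search list and `ndots` change what actually gets queried. The sketch assumes the typical pod configuration (three cluster search domains, `ndots:5`, `cluster.local` as cluster domain) and glibc-style resolver ordering; the function name is hypothetical, and real pods may carry extra search domains from the node.

```python
def lookup_candidates(name: str, namespace: str = "default",
                      cluster_domain: str = "cluster.local",
                      ndots: int = 5) -> list[str]:
    """Fully qualified names a resolver would try, in order."""
    if name.endswith("."):
        return [name]  # already fully qualified: the search list is skipped
    search = [f"{namespace}.svc.{cluster_domain}",
              f"svc.{cluster_domain}",
              cluster_domain]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = f"{name}."
    # Names with fewer than `ndots` dots go through the search list first.
    if name.count(".") < ndots:
        return expanded + [absolute]
    return [absolute] + expanded
```

An external name such as `example.com` is therefore tried against three cluster domains before the real lookup, which explains both extra DNS load and slow external resolution when CoreDNS is struggling.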
Practical Commands You Should Be Comfortable With
kubectl get pods -A
kubectl describe pod mypod
kubectl logs mypod --previous
kubectl exec -it mypod -- sh
kubectl top pod
kubectl get events --sort-by='.lastTimestamp'
kubectl rollout status deployment/myapp
kubectl rollout undo deployment/myapp
Kubernetes interviews reward operational thinking more than memorization.
Strong candidates:
- understand how components interact
- debug methodically
- know how workloads fail in production
- can explain tradeoffs clearly
Most interviewers care less about perfect YAML and more about whether you can keep systems reliable during incidents.