Kubernetes Interview Questions (Practical & Realistic)

Most Kubernetes interviews are not about memorizing YAML. They’re about understanding how workloads behave in production and how you debug failures under pressure.


Architecture & Core Concepts

1. Walk me through what happens when you run:

kubectl apply -f app.yaml

A strong answer usually includes:

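  • API server authenticates, authorizes, and validates the request
  • object is persisted in etcd
  • controllers create the ReplicaSet and Pods (for a Deployment)
  • scheduler assigns each Pod to a node
  • kubelet on that node picks up the Pod
  • container runtime pulls images and starts containers
  • kubelet reports status back to the API server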

This question quickly reveals whether someone understands Kubernetes beyond YAML syntax.


2. What are the responsibilities of:

  • kube-apiserver
  • etcd
  • scheduler
  • controller-manager
  • kubelet
  • kube-proxy

Interviewers care less about textbook definitions and more about whether you understand how these components interact.


3. What happens if etcd goes down?

Expected ideas:

  • API server can no longer read or persist cluster state
  • API operations and scheduling fail
  • existing containers keep running, but nothing is rescheduled or healed
  • why backups and disaster recovery matter

Bonus: Explain backup/restore strategy.
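
Whether etcd "goes down" when members are lost depends on quorum. A minimal sketch of the majority arithmetic, assuming a Raft-style cluster of voting members (the function names are illustrative and not part of any etcd or Kubernetes API):

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    if members < 1:
        raise ValueError("cluster needs at least one member")
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - quorum(members)


# Print size, quorum, and tolerated failures for a few cluster sizes.
for size in (1, 3, 4, 5):
    print(size, quorum(size), fault_tolerance(size))
```

A 4-member cluster tolerates no more failures than a 3-member one, which is why odd member counts are the usual recommendation.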


4. What does kubelet actually do on a node?

Good practical answer:

  • watches pod specs
  • talks to container runtime
  • reports node/pod status
  • handles probes and mounts

Pods, Deployments & Workloads

5. Why should you avoid running a single standalone Pod directly?

Expected direction:

  • no self-healing
  • no rollout strategy
  • no scaling
  • use Deployment/StatefulSet/DaemonSet instead

6. Explain the difference between:

  • Deployment
  • StatefulSet
  • DaemonSet
  • Job
  • CronJob

Practical examples matter more than definitions.

Example:

  • StatefulSet → databases
  • DaemonSet → node exporters/log agents
  • Job → one-time migration task

7. What’s the difference between Deployment and StatefulSet?

This gets asked constantly.

Expected topics:

  • stable identity
  • ordered startup/shutdown
  • persistent storage
  • predictable pod names

8. How do rolling updates work in Kubernetes?

Mention:

  • ReplicaSets
  • gradual replacement
  • maxSurge
  • maxUnavailable
  • rollback

Bonus: Explain how a zero-downtime rollout can still drop traffic if readiness probes are wrong.
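
To make maxSurge and maxUnavailable concrete, here is a rough sketch of how they bound the pod count during a rollout. It assumes the documented rounding (a percentage maxSurge rounds up, a percentage maxUnavailable rounds down); the helper names are hypothetical, not a Kubernetes API:

```python
def resolve(value, replicas: int, round_up: bool) -> int:
    """Turn an absolute int or a percentage string like '25%' into a pod count."""
    if isinstance(value, int):
        return value
    scaled = int(value.rstrip("%")) * replicas
    # Integer arithmetic avoids float rounding surprises.
    return -(-scaled // 100) if round_up else scaled // 100


def rollout_bounds(replicas: int, max_surge="25%", max_unavailable="25%"):
    """Return (max total pods, min available pods) during a rolling update."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    return replicas + surge, replicas - unavailable


print(rollout_bounds(10))        # (13, 8)
print(rollout_bounds(3))         # (4, 3)
print(rollout_bounds(10, 1, 0))  # (11, 10)
```

With 3 replicas and the 25% defaults, no pod may be unavailable, so the rollout only proceeds as fast as new pods pass their readiness probes.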


9. What happens if a container inside a Pod crashes?

Good answer:

  • restart policy
  • kubelet restarts container
  • CrashLoopBackOff behavior
  • logs investigation
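
CrashLoopBackOff is the kubelet waiting longer and longer between restarts. A small sketch of that delay sequence, assuming the commonly documented defaults (10s initial delay, doubling, capped at 5 minutes); actual values can differ by version and configuration:

```python
def backoff_delays(restarts: int, base: int = 10, cap: int = 300) -> list[int]:
    """Seconds waited before each of the first `restarts` restarts."""
    delays = []
    delay = base
    for _ in range(restarts):
        delays.append(min(delay, cap))  # never wait longer than the cap
        delay *= 2                      # exponential growth between restarts
    return delays


print(backoff_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```

This is why a pod that has been crashing for a while appears to "do nothing" for minutes at a time between attempts.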

Networking & Services

10. Explain the difference between:

  • ClusterIP
  • NodePort
  • LoadBalancer

Practical explanation > theory.


11. How does a Service find Pods?

Expected:

  • labels
  • selectors
  • endpoints

Bonus: What happens if labels don’t match?
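
The selector logic can be sketched as a toy model. It assumes equality-based selectors only, and the pod data is made up for illustration:

```python
def selector_matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """A pod matches when it carries every key/value pair in the selector."""
    return all(labels.get(key) == value for key, value in selector.items())


# Hypothetical pods; only name, labels, and readiness matter here.
pods = [
    {"name": "web-1", "labels": {"app": "web", "tier": "frontend"}, "ready": True},
    {"name": "web-2", "labels": {"app": "web"}, "ready": False},
    {"name": "api-1", "labels": {"app": "api"}, "ready": True},
]


def ready_endpoints(selector: dict[str, str], pods: list[dict]) -> list[str]:
    """Names of pods that match the selector and are ready to receive traffic."""
    return [
        pod["name"]
        for pod in pods
        if selector_matches(selector, pod["labels"]) and pod["ready"]
    ]


print(ready_endpoints({"app": "web"}, pods))  # ['web-1']
print(ready_endpoints({"app": "wbe"}, pods))  # [] (typo in selector: no endpoints)
```

A mismatched selector is not an error: the Service exists, it just has no endpoints, so traffic goes nowhere.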


12. What is an Ingress?

Good practical answer:

  • HTTP/HTTPS routing
  • path/host-based routing
  • TLS termination
  • requires ingress controller

Follow-up: Difference between Ingress and LoadBalancer Service.


13. How would you debug a Service that isn’t routing traffic?

Natural troubleshooting flow:

kubectl get svc
kubectl get endpoints
kubectl describe svc myservice
kubectl get pods --show-labels

Common causes:

  • selector mismatch
  • pod not ready
  • wrong targetPort
  • NetworkPolicy

14. What is a NetworkPolicy?

Good answer:

  • controls pod-to-pod traffic
  • ingress/egress rules
  • pods accept all traffic by default until a policy selects them

Bonus: Explain why policies do nothing unless a network plugin supports them.


Probes & Health Checks

15. Difference between liveness and readiness probes?

This is one of the highest-frequency interview questions.

Expected:

  • readiness → controls traffic
  • liveness → controls restarts

Strong candidates explain production impact.

Example:

A failing readiness probe removes the pod from Service endpoints but doesn’t restart it.


16. What can happen if probes are configured badly?

Real-world issues:

  • restart storms
  • cascading failures
  • pods marked healthy too early
  • traffic sent before app initialization completes
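
Much of this comes down to thresholds. As a simplified sketch (assuming only the failureThreshold setting matters and ignoring timing fields such as periodSeconds), a probe is treated as failed only after that many consecutive failures:

```python
def first_failure_index(results: list[bool], failure_threshold: int = 3):
    """Index of the check at which the probe counts as failed, or None.

    `results` is a hypothetical sequence of individual probe outcomes.
    """
    consecutive = 0
    for index, ok in enumerate(results):
        consecutive = 0 if ok else consecutive + 1  # any success resets the count
        if consecutive >= failure_threshold:
            return index
    return None


flaky = [False, True, False, True, False, True]
down = [True, False, False, True, False, False, False]

print(first_failure_index(flaky))  # None
print(first_failure_index(down))   # 6
```

Setting the threshold to 1 on a liveness probe means a single slow response restarts the container, which is how restart storms start.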

Storage & Persistence

17. Explain:

  • PersistentVolume (PV)
  • PersistentVolumeClaim (PVC)
  • StorageClass

Good practical answer:

  • PVC requests storage
  • StorageClass defines how storage is dynamically provisioned
  • PV is the actual volume resource

18. What are Kubernetes access modes?

Expected:

  • ReadWriteOnce (RWO)
  • ReadOnlyMany (ROX)
  • ReadWriteMany (RWX)
  • ReadWriteOncePod (RWOP)

Follow-up: Why RWX storage is harder in cloud environments.


19. Why are StatefulSets commonly paired with PVCs?

Expected:

  • stable storage identity
  • pod rescheduling without data loss

Config & Secrets

20. Difference between ConfigMap and Secret?

Good practical answer:

  • both inject config
  • Secret is base64 encoded (not encrypted by default)
  • avoid committing secrets to Git
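
A quick way to show that base64 is an encoding and not encryption. The secret value here is made up:

```python
import base64

# Hypothetical secret value, encoded the way it would appear in a Secret manifest.
encoded = base64.b64encode(b"s3cr3t-password").decode("ascii")

# Anyone who can read the manifest can reverse it; no key is involved.
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)
print(decoded)
```

Anyone with read access to the Secret object or its manifest can recover the value, so access control and encryption at rest still matter.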

21. How can applications consume ConfigMaps or Secrets?

Expected:

  • environment variables
  • mounted files/volumes

Bonus: Discuss secret rotation challenges.


Scheduling & Scaling

22. Why would a Pod remain in Pending state?

Very common troubleshooting question.

Expected causes:

  • insufficient resources
  • taints/tolerations
  • node selectors
  • PVC issues
  • image pull delays

Commands:

kubectl describe pod mypod
kubectl get events
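
For the "insufficient resources" case, a rough sketch of the fit check may help. It is a toy model that assumes the scheduler compares summed requests against node allocatable capacity; the node data and function names are invented for illustration:

```python
def fits(pod_requests: dict, allocatable: dict, requested: dict) -> bool:
    """True if the node has room for every resource the pod requests."""
    return all(
        requested.get(resource, 0) + amount <= allocatable.get(resource, 0)
        for resource, amount in pod_requests.items()
    )


# Hypothetical nodes: CPU in millicores, memory in MiB.
nodes = {
    "node-a": {"allocatable": {"cpu": 2000, "memory": 4096},
               "requested": {"cpu": 1800, "memory": 1024}},
    "node-b": {"allocatable": {"cpu": 4000, "memory": 8192},
               "requested": {"cpu": 1000, "memory": 7936}},
}


def feasible_nodes(pod_requests: dict, nodes: dict) -> list[str]:
    """Names of nodes whose remaining capacity covers the pod's requests."""
    return [
        name for name, node in nodes.items()
        if fits(pod_requests, node["allocatable"], node["requested"])
    ]


print(feasible_nodes({"cpu": 500, "memory": 512}, nodes))  # [] -> pod stays Pending
print(feasible_nodes({"cpu": 100, "memory": 128}, nodes))  # ['node-a', 'node-b']
```

Note that the check uses requests, not actual usage: a node can look idle in monitoring and still be "full" from the scheduler's point of view.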

23. What is the HPA (Horizontal Pod Autoscaler)?

Good practical answer:

  • scales replicas
  • based on CPU/memory/custom metrics

Follow-up: Difference between HPA and Cluster Autoscaler.
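
The core of the HPA can be sketched with the documented scaling formula, desired = ceil(current × currentMetric / targetMetric). This is a simplified sketch that assumes a single metric and a 10% tolerance, and leaves out stabilization windows and scaling policies:

```python
import math


def desired_replicas(current: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Replica count the autoscaler would aim for, clamped to the min/max bounds."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to target: no scaling
    return max(min_replicas, min(max_replicas, math.ceil(current * ratio)))


print(desired_replicas(4, 90, 60))   # 6
print(desired_replicas(4, 63, 60))   # 4 (within tolerance)
print(desired_replicas(8, 200, 50))  # 10 (capped at max_replicas)
```

The HPA changes replica counts only; if the new pods do not fit anywhere, it is the Cluster Autoscaler's job to add nodes.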


24. What are taints and tolerations?

Practical example:

  • dedicate GPU nodes
  • isolate workloads
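
A simplified sketch of how tolerations are matched against taints. It models only the key, operator, value, and effect fields and treats PreferNoSchedule as non-blocking; the function names are illustrative:

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """True if this toleration matches this taint."""
    effect = toleration.get("effect")
    if effect and effect != taint["effect"]:
        return False  # an empty effect on the toleration matches any effect
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key")
    if not key:
        return operator == "Exists"  # empty key + Exists tolerates every taint
    if key != taint["key"]:
        return False
    if operator == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def can_schedule(pod_tolerations: list[dict], node_taints: list[dict]) -> bool:
    """A pod can land on the node if every hard taint is tolerated."""
    hard = [t for t in node_taints if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in pod_tolerations) for taint in hard)


# Hypothetical GPU node taint and a matching toleration.
gpu_taint = {"key": "gpu", "value": "true", "effect": "NoSchedule"}
gpu_toleration = {"key": "gpu", "operator": "Equal", "value": "true", "effect": "NoSchedule"}

print(can_schedule([], [gpu_taint]))                # False
print(can_schedule([gpu_toleration], [gpu_taint]))  # True
```

A toleration only permits scheduling onto the tainted node; to steer GPU workloads there, it is usually combined with a node selector or node affinity.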

25. What is node affinity?

Expected:

  • required (hard) vs preferred (soft) scheduling rules
  • workload placement control

Cluster Operations & Reliability

26. How would you safely upgrade a Kubernetes cluster?

Strong answers mention:

  • version skew compatibility
  • upgrade control plane first
  • cordon/drain nodes
  • rolling node replacement
  • workload validation

27. What does kubectl drain do?

Expected:

  • marks node unschedulable (cordon)
  • evicts pods gracefully
  • respects PodDisruptionBudgets

28. What is a PodDisruptionBudget (PDB)?

Practical understanding:

  • prevents too many replicas from going down simultaneously
  • protects availability during maintenance

29. What would happen if all control plane nodes fail?

Good discussion areas:

  • existing workloads keep running
  • no scheduling, scaling, or self-healing until the control plane recovers
  • HA control plane design

30. Why is RBAC important?

Expected:

  • least privilege
  • service account security
  • namespace isolation

Good follow-up: Difference between Role and ClusterRole.


Troubleshooting Scenarios (Most Important)

These are the questions closest to real Kubernetes work.


31. A pod is stuck in CrashLoopBackOff. How do you debug it?

Good flow:

kubectl logs <pod>
kubectl logs <pod> --previous
kubectl describe pod <pod>
kubectl get events

Common causes:

  • bad config
  • failed dependency
  • probe failures
  • missing env vars

32. A pod is in ImagePullBackOff. What do you check?

Expected:

  • image tag
  • registry auth
  • image existence
  • network access

33. Pods are healthy, but users still get 503 errors. Why?

Excellent practical question.

Possible answers:

  • readiness probe failure
  • Service selector mismatch
  • ingress routing issue
  • TLS/backend issue

34. One node keeps failing workloads. How do you investigate?

Strong areas:

  • node pressure
  • disk full
  • kubelet health
  • network issues
  • taints
  • container runtime errors

Commands:

kubectl describe node <node>
journalctl -u kubelet

35. DNS resolution inside pods is failing. What would you check?

Expected:

  • CoreDNS
  • kube-dns service
  • /etc/resolv.conf
  • network policies
  • upstream DNS
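
When reading /etc/resolv.conf, it helps to know how the search list and ndots option expand a name. A rough sketch of the usual resolver behaviour, assuming the common in-pod defaults (ndots:5 and a search list ending in cluster.local); the function and the domains are illustrative:

```python
def candidate_queries(name: str, search: list[str], ndots: int = 5) -> list[str]:
    """Fully qualified names a resolver would try, in order."""
    if name.endswith("."):
        return [name]  # trailing dot: already absolute, search list is skipped
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        return [absolute] + expanded  # enough dots: try the name as-is first
    return expanded + [absolute]      # otherwise walk the search list first


# Hypothetical search list for a pod in the "default" namespace.
search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

print(candidate_queries("redis", search)[0])   # redis.default.svc.cluster.local.
print(len(candidate_queries("example.com", search)))  # 4
print(candidate_queries("example.com.", search))      # ['example.com.']
```

External names with few dots therefore trigger several failed cluster-local lookups before the real one, which shows up as latency when CoreDNS is under load.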

Practical Commands You Should Be Comfortable With

kubectl get pods -A
kubectl describe pod mypod
kubectl logs mypod --previous
kubectl exec -it mypod -- sh
kubectl top pod
kubectl get events --sort-by='.lastTimestamp'
kubectl rollout status deployment/myapp
kubectl rollout undo deployment/myapp

Kubernetes interviews reward operational thinking more than memorization.

Strong candidates:

  • understand how components interact
  • debug methodically
  • know how workloads fail in production
  • can explain tradeoffs clearly

Most interviewers care less about perfect YAML and more about whether you can keep systems reliable during incidents.
