CI/CD & GitOps Interview Questions (Practical & Production-Focused)

CI/CD interviews are usually less about tools and more about delivery reliability.

Interviewers want to know:

  • Can you ship safely?
  • Can you debug broken pipelines?
  • Can you reduce deployment risk?
  • Do you understand automation beyond “Jenkins runs tests”?

CI/CD Fundamentals

1. What’s the difference between CI and CD?

Expected understanding:

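  • CI (Continuous Integration) → merging code frequently, with automated builds and tests validating every change
  • CD (Continuous Delivery) → every change is kept releasable; the production release is typically a manual approval
  • CD (Continuous Deployment) → every change that passes the pipeline is released to production automatically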

Good practical answer:

CI helps catch issues early. CD ensures changes can be shipped safely and repeatedly.


2. What stages would you typically include in a CI/CD pipeline?

A realistic pipeline might include:

  1. Linting
  2. Unit tests
  3. Security scanning
  4. Build artifact/container image
  5. Integration tests
  6. Push to registry
  7. Deploy to staging
  8. Smoke tests
  9. Production deployment

Bonus: Discuss parallelization and caching.
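
To make the ordering and parallelization point concrete, here is a minimal sketch of a stage runner. It assumes each step is a function returning True on success; the stage names and the `run_pipeline` helper are hypothetical and do not correspond to any particular CI system.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Step = Callable[[], bool]  # a step returns True on success


def run_pipeline(stages: list[tuple[str, list[Step]]]) -> tuple[bool, list[str]]:
    """Run stages in order; run the steps inside one stage in parallel.

    Stops at the first stage with a failing step (fail fast).
    Returns (success, names of the stages that were started).
    """
    started: list[str] = []
    for name, steps in stages:
        started.append(name)
        with ThreadPoolExecutor(max_workers=len(steps)) as pool:
            results = list(pool.map(lambda step: step(), steps))
        if not all(results):
            return False, started
    return True, started


# Hypothetical steps standing in for real lint/test/scan commands.
def passing_step() -> bool:
    return True


def failing_step() -> bool:
    return False


pipeline = [
    ("validate", [passing_step, passing_step, passing_step]),  # lint, unit tests, scan
    ("build", [passing_step]),
    ("integration", [failing_step]),     # simulated failure
    ("deploy-staging", [passing_step]),  # never reached
]

success, started = run_pipeline(pipeline)
print(success, started)  # False ['validate', 'build', 'integration']
```

Independent checks share a stage so they run side by side, while anything that depends on an earlier result waits for it. Failing fast keeps a broken change from ever reaching the deploy stages.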


3. Why should builds be immutable?

Strong practical answer:

  • same artifact across environments
  • avoids “works in staging but not prod”
  • easier rollback
  • reproducibility

Example:

Build Docker image once, promote the same image tag everywhere.
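
The "build once, promote everywhere" idea can be sketched with a content digest. This is an illustration only: the `digest` and `promote` helpers and the environment names are assumptions, and a real registry computes image digests for you.

```python
import hashlib


def digest(artifact: bytes) -> str:
    """Content-addressed identifier: the same bytes always give the same digest."""
    return "sha256:" + hashlib.sha256(artifact).hexdigest()


def promote(deployed: dict[str, str], source_env: str, target_env: str) -> dict[str, str]:
    """Copy the digest running in source_env to target_env without rebuilding."""
    updated = dict(deployed)
    updated[target_env] = deployed[source_env]
    return updated


image = b"example image contents"       # stand-in for a built image
deployed = {"staging": digest(image)}   # built once, deployed to staging
deployed = promote(deployed, "staging", "prod")

print(deployed["staging"] == deployed["prod"])  # True
```

Because promotion copies an identifier instead of triggering a rebuild, production runs exactly the bytes that were tested in staging.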


4. What causes flaky pipelines?

Very realistic interview topic.

Common causes:

  • timing/race conditions
  • shared test environments
  • network dependencies
  • external APIs
  • order-dependent tests

Good engineers talk about reliability, not just speed.
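
One practical way to separate flaky tests from real failures is to rerun them against the same commit and look at the mix of outcomes. The sketch below assumes you already have per-test pass/fail results; the test names and the `classify` helper are hypothetical.

```python
def classify(outcomes: list[bool]) -> str:
    """Classify repeated runs of one test against the same commit."""
    if all(outcomes):
        return "stable-pass"
    if not any(outcomes):
        return "stable-fail"
    return "flaky"


# Hypothetical history: test name -> pass/fail per rerun on one commit.
history = {
    "test_login": [True, True, True, True],
    "test_checkout": [True, False, True, True],   # mixed results
    "test_export": [False, False, False, False],  # a consistent, real failure
}

report = {name: classify(runs) for name, runs in history.items()}
print(report)
```

A test that both passes and fails on identical code points at the environment or the test itself, which is where the causes listed above come in.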


Git & Branching

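5. Explain the difference between:

  • merge → joins two histories with a merge commit; existing commits stay unchanged
  • rebase → replays your commits on top of a new base, giving linear history but rewriting commit hashes
  • squash merge → collapses a branch into a single commit on the target branch

Interviewers care about collaboration tradeoffs, such as why rebasing a branch that others have already pulled causes problems.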


6. What’s your preferred Git branching strategy?

Common answers:

  • trunk-based development
  • GitFlow
  • short-lived feature branches

Good practical insight:

Long-lived branches usually increase merge conflicts and deployment risk.


7. Why are small pull requests preferred?

Expected:

  • easier review
  • faster feedback
  • lower rollback risk
  • simpler debugging

8. How would you handle a bad deployment already merged to main?

Strong answers:

  • revert commit
  • rollback deployment
  • feature flag disable
  • hotfix branch if needed

Deployment Strategies

9. Explain rolling deployments.

Expected:

  • gradual pod replacement
  • reduced downtime
  • rollback capability

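Follow-up: What happens if readiness probes fail? (The new pods never receive traffic, the rollout stalls while the remaining old pods keep serving, and Kubernetes eventually reports that the rollout failed to progress.)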
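
The interaction between gradual replacement and readiness can be sketched as a small simulation. It is a simplification that assumes pods are replaced one at a time and that an old pod is removed only after its replacement is ready; `rolling_update` is a hypothetical helper, not the Kubernetes controller logic.

```python
def rolling_update(replicas: int, new_pod_ready: list[bool]) -> tuple[int, int, bool]:
    """Replace old pods one at a time, waiting for each new pod to become ready.

    new_pod_ready[i] says whether the i-th new pod passes its readiness check.
    Returns (old pods still serving, new pods serving, rollout completed).
    """
    old, new = replicas, 0
    for ready in new_pod_ready[:replicas]:
        if not ready:
            # The unready pod gets no traffic; old pods keep serving.
            return old, new, False
        new += 1   # the new pod starts receiving traffic
        old -= 1   # only now is one old pod removed
    return old, new, old == 0


print(rolling_update(3, [True, True, True]))   # (0, 3, True)
print(rolling_update(3, [True, False, True]))  # (2, 1, False)
```

In the second call the rollout stalls with capacity intact, which is the behavior readiness probes are meant to provide.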


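10. What is the difference between:

  • rolling deployment → replaces instances in batches within one environment
  • blue/green → runs two full environments and switches all traffic at once
  • canary → sends a small share of traffic to the new version first

Practical examples matter more than textbook definitions.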


11. What is a canary deployment?

Good answer:

  • small percentage of traffic first
  • validate metrics/errors
  • gradual rollout

Bonus: Mention automated rollback based on SLOs.
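
An automated rollback gate can be as simple as comparing the canary's error rate with an SLO threshold. The sketch below assumes a 1% error-rate SLO and a minimum sample size of 100 requests; both values and the `canary_decision` helper are illustrative assumptions.

```python
def canary_decision(errors: int, requests: int,
                    slo_error_rate: float = 0.01,
                    min_requests: int = 100) -> str:
    """Return 'wait', 'promote', or 'rollback' for the current canary step."""
    if requests == 0 or requests < min_requests:
        return "wait"        # not enough traffic to judge yet
    if errors / requests > slo_error_rate:
        return "rollback"    # the canary violates the error-rate SLO
    return "promote"         # safe to shift more traffic


print(canary_decision(errors=0, requests=40))     # wait
print(canary_decision(errors=2, requests=1000))   # promote
print(canary_decision(errors=30, requests=1000))  # rollback
```

The minimum sample size matters: deciding on a handful of requests would make the gate itself flaky.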


12. How do feature flags help deployments?

Expected:

  • decouple deploy from release
  • reduce rollback need
  • safer experimentation
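
Decoupling deploy from release usually means the code ships dark and a flag decides who sees it. A minimal percentage-rollout sketch follows; the flag name, user IDs, and the `is_enabled` helper are hypothetical, and real flag services add targeting rules and remote configuration.

```python
import hashlib


def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """Deterministically place a user in one of 100 buckets for this flag."""
    key = f"{flag}:{user_id}".encode()
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100
    return bucket < rollout_percent


users = [f"user-{n}" for n in range(1000)]
enabled = sum(is_enabled("new-checkout", user, 10) for user in users)
print(f"{enabled} of {len(users)} users see the new code path")

# Setting the percentage to 0 turns the feature off without a redeploy.
print(any(is_enabled("new-checkout", user, 0) for user in users))  # False
```

Hashing the flag together with the user ID keeps each user's experience stable across requests, and raising the percentage only ever adds users.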

GitOps Concepts

13. What is GitOps?

One of the most common modern DevOps interview questions.

Strong answer:

Git is the source of truth for infrastructure and deployments. Changes are applied automatically from version-controlled manifests.

Key concepts:

  • declarative configs
  • reconciliation loop
  • auditability
  • drift detection

14. What problem does GitOps solve?

Good practical answers:

  • config drift
  • manual deployment mistakes
  • lack of traceability
  • inconsistent environments

15. Explain pull-based vs push-based deployments.

Very important GitOps question.

Expected:

  • Push → CI system deploys directly
  • Pull → agent/operator inside cluster syncs desired state

Why pull-based is popular:

  • better security (no cluster credentials stored in the CI system)
  • cluster-local reconciliation
  • easier RBAC boundaries

16. What is reconciliation in GitOps?

Expected:

  • continuously comparing desired vs actual state
  • self-healing behavior

Example:

If someone manually changes a Deployment, ArgoCD (with automated sync and self-heal enabled) or Flux can revert it on the next reconciliation.
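
The compare-and-correct loop can be sketched in a few lines. This is a toy model that treats the cluster and the Git repository as plain dictionaries; the `reconcile` function and resource names are assumptions for illustration, not how ArgoCD or Flux are implemented.

```python
def reconcile(desired: dict[str, dict], actual: dict[str, dict]) -> list[str]:
    """Mutate `actual` to match `desired`; return the actions taken."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actual[name] = dict(spec)
            actions.append(f"create {name}")
        elif actual[name] != spec:
            actual[name] = dict(spec)
            actions.append(f"update {name}")
    for name in list(actual):
        if name not in desired:
            del actual[name]  # prune resources that are not in Git
            actions.append(f"delete {name}")
    return actions


desired = {"web": {"image": "web:1.4.2", "replicas": 3}}  # state declared in Git
actual = {
    "web": {"image": "web:1.4.2", "replicas": 5},  # manual edit (drift)
    "debug-pod": {"image": "busybox"},             # created by hand
}

print(reconcile(desired, actual))  # ['update web', 'delete debug-pod']
print(reconcile(desired, actual))  # [] (already converged)
```

Running the function repeatedly is safe because a converged state produces no actions, which is what makes a continuous loop practical.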


17. Why is Git considered the “single source of truth”?

Expected:

  • audit trail
  • rollback history
  • peer review
  • reproducibility

ArgoCD / Flux Practical Questions

18. What happens when someone manually edits a Deployment in the cluster?

Good GitOps answer:

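  • drift detected (the application shows as out of sync)
  • reconciler resets the resource, if automated sync/self-healing is enabled
  • cluster returns to the desired state defined in Git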

19. How would you organize Git repositories for GitOps?

Common approaches:

  • app repo + infra repo
  • environment overlays
  • mono repo vs multi repo

Bonus: Mention Kustomize or Helm.


20. How do you promote changes from dev → staging → prod?

Expected ideas:

  • PR-based promotion
  • image tag updates
  • environment-specific overlays
  • approval gates
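
Promotion by image tag can be modeled as copying a value from one environment's configuration to the next, with a gate in front of production. The sketch assumes three environments and a boolean approval flag; `promote_tag` and the version numbers are hypothetical, and in practice the change would be a pull request against the environment overlay.

```python
ORDER = ["dev", "staging", "prod"]


def promote_tag(tags: dict[str, str], target: str, approved: bool = False) -> dict[str, str]:
    """Return new per-environment tags with `target` set to the previous env's tag."""
    index = ORDER.index(target)
    if index == 0:
        raise ValueError("dev is updated by CI, not by promotion")
    if target == "prod" and not approved:
        raise PermissionError("prod promotion needs an approval")
    source = ORDER[index - 1]
    updated = dict(tags)
    updated[target] = tags[source]
    return updated


tags = {"dev": "1.5.0", "staging": "1.4.2", "prod": "1.4.2"}
tags = promote_tag(tags, "staging")              # staging now runs 1.5.0
tags = promote_tag(tags, "prod", approved=True)  # prod follows after approval
print(tags)  # {'dev': '1.5.0', 'staging': '1.5.0', 'prod': '1.5.0'}
```

A tag can only reach an environment by passing through the one before it, so nothing lands in production that staging has not run.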

Security & Reliability

21. How should secrets be handled in CI/CD?

Expected:

  • never commit secrets
  • use secret managers
  • short-lived credentials
  • masked pipeline variables

Bonus: Mention Sealed Secrets / External Secrets.
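
Masked pipeline variables work roughly like a filter over log output. The following is a minimal sketch of that idea, assuming the CI system knows the secret values at runtime; `mask_secrets` is a hypothetical helper and real runners handle encodings and multi-line values as well.

```python
def mask_secrets(line: str, secrets: list[str]) -> str:
    """Replace every known secret value in a log line with '***'."""
    # Longest first, so a secret that contains another one is masked whole.
    for secret in sorted(secrets, key=len, reverse=True):
        if secret:  # skip empty values, which would match everywhere
            line = line.replace(secret, "***")
    return line


known = ["abc123", "abc123456"]
print(mask_secrets("deploy token=abc123456 ok", known))  # deploy token=*** ok
print(mask_secrets("nothing sensitive here", known))     # nothing sensitive here
```

Masking is a last line of defense for logs only; it does not replace keeping secrets out of the repository and out of long-lived credentials.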


22. What are common CI/CD security risks?

Good answers:

  • leaked credentials
  • overly privileged runners
  • untrusted pull requests
  • supply-chain attacks
  • malicious dependencies

23. Why should CI runners be isolated?

Expected:

  • prevent lateral movement
  • avoid credential leakage
  • reduce blast radius

Kubernetes + CI/CD

24. How would you deploy a Kubernetes application safely?

Strong answer:

  • rolling updates
  • readiness probes
  • health checks
  • progressive rollout
  • rollback plan

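25. How do you roll back a Kubernetes deployment?

Commands:

# List recorded revisions to find a known-good one
kubectl rollout history deployment/myapp
# Roll back to the previous revision
kubectl rollout undo deployment/myapp

Bonus: Explain why image immutability matters for rollbacks (a mutable tag such as latest may no longer point to the image that originally ran).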


26. Your deployment succeeded, but traffic fails. What do you check?

Good troubleshooting areas:

  • readiness probes
  • Service selectors
  • ingress config
  • env vars
  • secrets/configmaps
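
A Service selector that does not match the pod labels is a common reason for "deployed but no traffic". The check can be sketched as follows, assuming equality-based selectors only; the pod names, labels, and helper functions are hypothetical.

```python
def selector_matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """True if every key/value in the selector is present in the pod labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def matching_pods(selector: dict[str, str], pods: dict[str, dict[str, str]]) -> list[str]:
    """Names of pods the Service would route to."""
    if not selector:
        return []  # a Service without a selector gets no automatic endpoints
    return [name for name, labels in pods.items() if selector_matches(selector, labels)]


pods = {
    "myapp-7d9f": {"app": "myapp", "version": "v2"},
    "myapp-b41c": {"app": "myapp", "version": "v2"},
}

print(matching_pods({"app": "myapp"}, pods))   # ['myapp-7d9f', 'myapp-b41c']
print(matching_pods({"app": "my-app"}, pods))  # [] (typo in the selector)
```

An empty result corresponds to a Service with no endpoints: the pods are healthy, but nothing routes to them.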

Pipeline Troubleshooting Questions

27. A pipeline suddenly becomes very slow. How do you investigate?

Expected thinking:

  • dependency download time
  • cache misses
  • runner resource issues
  • parallelism bottlenecks
  • flaky retries
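
A quick first step is to compare per-stage durations against a known-good run to see where the time went. The sketch below assumes you can export stage timings in seconds; the stage names, numbers, and the `slow_stages` helper are made up for illustration.

```python
def slow_stages(baseline: dict[str, float], current: dict[str, float],
                factor: float = 1.5) -> list[tuple[str, float]]:
    """Stages whose duration grew by more than `factor`, worst first."""
    regressions = []
    for stage, seconds in current.items():
        before = baseline.get(stage)
        if before and seconds > before * factor:
            regressions.append((stage, seconds / before))
    return sorted(regressions, key=lambda item: item[1], reverse=True)


baseline = {"install": 40.0, "test": 300.0, "build": 120.0}
current = {"install": 200.0, "test": 310.0, "build": 240.0}

print(slow_stages(baseline, current))  # [('install', 5.0), ('build', 2.0)]
```

Here the dependency install step is five times slower, which would point toward a cache miss or a slow package mirror before anything else.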

28. A build works locally but fails in CI. Why?

Classic interview question.

Possible causes:

  • missing env vars
  • different OS/runtime
  • race conditions
  • dependency versions
  • permissions

29. Production deployment failed halfway through. What now?

Strong answers prioritize:

  1. limit the blast radius (pause or halt the rollout)
  2. rollback/mitigate
  3. investigate root cause
  4. verify recovery

30. CI pipelines are passing, but production still breaks frequently. Why?

Excellent senior-level question.

Potential discussion:

  • weak test coverage
  • missing integration tests
  • environment drift
  • lack of observability
  • poor release strategy

Real-World Scenario Questions

31. How would you design a deployment pipeline for a microservices platform?

Expected areas:

  • parallel builds
  • artifact registry
  • environment promotion
  • rollback strategy
  • observability
  • GitOps integration

32. How would you reduce deployment risk for a critical service?

Great answers mention:

  • canary releases
  • feature flags
  • smoke tests
  • progressive delivery
  • automated rollback

33. A GitOps sync keeps reverting manual fixes. What should engineers do?

Expected:

  • fix desired state in Git
  • avoid manual cluster changes
  • use an emergency bypass (for example, pausing automated sync) only temporarily, then commit the fix to Git

34. What metrics would you monitor for deployment health?

Strong answers:

  • error rate
  • latency
  • saturation
  • restart count
  • deployment success rate
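
Two of these metrics, error rate and tail latency, are easy to compute from raw request samples. The sketch below counts HTTP 5xx responses as errors and uses a nearest-rank percentile; both choices and the sample values are assumptions, and production systems usually get these numbers from a metrics backend.

```python
import math


def error_rate(status_codes: list[int]) -> float:
    """Share of responses that are server errors (HTTP 5xx)."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if code >= 500)
    return errors / len(status_codes)


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


statuses = [200, 200, 500, 503]
latencies_ms = [120.0, 80.0, 100.0, 900.0]

print(error_rate(statuses))          # 0.5
print(percentile(latencies_ms, 50))  # 100.0
print(percentile(latencies_ms, 95))  # 900.0
```

The gap between the median and the 95th percentile is why averages alone can hide a bad deployment.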

35. How would you debug a failed ArgoCD sync?

Expected:

  • compare desired vs live state
  • RBAC permissions
  • invalid manifests
  • CRD availability
  • namespace/resource conflicts

Commands You Should Be Comfortable With

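git log --oneline                        # compact commit history
git rebase main                          # replay the current branch on top of main
git revert <commit>                      # add a new commit that undoes <commit>

kubectl rollout status deployment/myapp  # watch a rollout until it completes or fails
kubectl rollout undo deployment/myapp    # roll back to the previous revision

argocd app sync myapp                    # apply the desired state from Git to the cluster
argocd app diff myapp                    # show differences between Git and the live state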

Closing Thoughts

CI/CD interviews are rarely about memorizing Jenkins syntax or Git commands.

Interviewers are evaluating whether you understand:

  • safe delivery practices
  • rollback strategies
  • deployment reliability
  • automation design
  • operational risk reduction

Strong candidates think in terms of:

  • blast radius
  • reproducibility
  • observability
  • recovery speed
  • developer experience

The best answers usually come from explaining how you’d handle failures in production, not from reciting definitions.
