Linux & shell
Instead of trivia-heavy questions, focus on scenarios that actually happen in production. These are the kinds of Linux questions platform, DevOps, SRE, and backend interviews often revolve around.
Process & system debugging
1. A server suddenly becomes slow. What do you check first?
What interviewers want:
- Structured debugging
- Understanding of CPU, memory, disk, and load
Good areas to mention:
- top/htop
- vmstat
- iostat
- load average vs CPU usage
- OOM kills
- disk saturation
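A first pass over the areas above can be scripted. This is a minimal sketch, assuming a Linux host with /proc mounted; the function names load_per_core and triage are made up for illustration.

```bash
# load_per_core LOAD CORES: 1-minute load average divided by core count.
# A value well above 1.00 means more runnable or blocked tasks than cores.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# triage: print a one-screen summary of load, memory, and disk (Linux only).
triage() {
    read -r load1 _rest < /proc/loadavg
    cores=$(getconf _NPROCESSORS_ONLN)
    echo "load per core: $(load_per_core "$load1" "$cores")"
    # MemAvailable is the kernel's estimate of memory usable without swapping.
    grep -E '^(MemTotal|MemAvailable|SwapFree):' /proc/meminfo
    # The three fullest filesystems, sorted by the use% column.
    df -P | sort -k5 -n | tail -n 3
}
```

Running triage gives a quick hint about which of CPU, memory, or disk deserves a closer look with top, vmstat, or iostat.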
2. What’s the difference between kill, kill -15, and kill -9?
Expected understanding:
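- kill with no signal sends SIGTERM (15), so kill and kill -15 are equivalent
- Graceful shutdown (SIGTERM) vs forceful termination (SIGKILL, which cannot be caught or ignored)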
- Cleanup handlers
- Zombie/orphan processes
Bonus: Explain why SIGKILL can leave corrupted temp files or broken locks.
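The difference is easy to demonstrate. The sketch below starts a throwaway worker that creates a lock file and removes it in a SIGTERM handler; worker and run_and_signal are hypothetical names used only for this demo.

```bash
# worker LOCKFILE: create a lock file and remove it when asked to stop.
# SIGTERM can be trapped, so cleanup runs; SIGKILL cannot, so it never does.
worker() {
    lockfile=$1
    trap 'rm -f "$lockfile"; exit 0' TERM
    : > "$lockfile"
    while :; do sleep 0.1; done
}

# run_and_signal SIGNAL: start a worker, send SIGNAL, report what is left.
run_and_signal() {
    dir=$(mktemp -d)
    lock=$dir/lock
    worker "$lock" > /dev/null 2>&1 &
    pid=$!
    # The lock file is created after the trap is set, so its existence
    # means the handler is armed.
    until [ -e "$lock" ]; do sleep 0.05; done
    kill -s "$1" "$pid"
    wait "$pid" 2> /dev/null
    if [ -e "$lock" ]; then
        echo "$1: lock file left behind"
    else
        echo "$1: lock file cleaned up"
    fi
    rm -rf "$dir"
}

run_and_signal TERM   # TERM: lock file cleaned up
run_and_signal KILL   # KILL: lock file left behind
```

The stale lock file in the second run is the same mechanism that leaves half-written temp files behind in production.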
3. A process keeps restarting automatically after you kill it. Why?
Natural follow-ups:
- systemd
- Kubernetes restart policy
- supervisor
- cron watchdogs
Commands:
systemctl status myapp
ps auxf
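Whatever restarts the process is usually its parent, or its parent's parent. As a rough sketch, assuming Linux /proc, the parent chain can be walked like this (ancestors is an illustrative helper, not a standard tool):

```bash
# ancestors PID: print "pid name" for PID and each parent up the tree.
ancestors() (
    pid=$1
    while [ -r "/proc/$pid/status" ]; do
        name=$(awk '/^Name:/ { print $2; exit }' "/proc/$pid/status")
        echo "$pid $name"
        [ "$pid" -le 1 ] && break
        pid=$(awk '/^PPid:/ { print $2; exit }' "/proc/$pid/status")
    done
)

# Example: ancestors "$(pgrep -o myapp)"
```

A chain that goes straight to PID 1 (systemd) points at a unit with a Restart= setting; a container shim or a supervisor process in the chain points elsewhere.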
4. How would you identify which process is consuming the most memory?
Useful commands:
top
ps aux --sort=-%mem | head
smem
Follow-up: What’s the difference between RSS, VSZ, and cached memory?
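One way to make the follow-up concrete is to read the numbers straight from /proc. A minimal sketch, assuming Linux; mem_of and page_cache_kb are made-up helper names:

```bash
# mem_of PID: print resident size (RSS) and virtual size (VSZ) in kB.
# RSS is what is actually in RAM; VSZ is everything the process has mapped,
# including memory it has reserved but never touched.
mem_of() {
    awk '/^VmRSS:/  { rss = $2 }
         /^VmSize:/ { vsz = $2 }
         END { print rss, vsz }' "/proc/$1/status"
}

# page_cache_kb: memory the kernel uses to cache file contents.
# It is reclaimable under pressure, which is why "used" memory can look
# high on a perfectly healthy host.
page_cache_kb() {
    awk '/^Cached:/ { print $2 }' /proc/meminfo
}

# Example: mem_of $$
```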
Networking & connectivity
5. Your app cannot reach another service. How do you debug it?
A practical answer usually includes:
- DNS resolution
- Network connectivity
- Listening ports
- Firewall/security groups
- TLS/authentication
Commands:
dig api.example.com
curl -v https://api.example.com
ss -tlnp
ping
traceroute
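The layered approach above can be sketched as a function that stops at the first failing layer. This assumes getent, timeout, and bash are installed; diagnose is a hypothetical name, and TLS/authentication checks are left to curl -v.

```bash
# diagnose HOST PORT: report the first layer that fails: dns, tcp, or ok.
diagnose() {
    # Layer 1: name resolution through the system resolver.
    getent ahosts "$1" > /dev/null 2>&1 || { echo dns; return 1; }
    # Layer 2: can a TCP connection be opened? /dev/tcp is a bash feature,
    # so bash is invoked explicitly; host and port arrive as $0 and $1.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2> /dev/null \
        || { echo tcp; return 1; }
    echo ok
}

# Example: diagnose api.example.com 443
```

A "tcp" result with working DNS usually shifts attention to listening ports, firewalls, and security groups.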
6. How do you check which process is listening on port 8080?
ss -tlnp | grep 8080
Alternative:
lsof -i :8080
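Note that grep 8080 also matches port 18080 or a process whose PID happens to be 8080. A stricter filter, sketched here with a made-up helper name, compares only the port in the local-address column (the fourth column of ss -tln output):

```bash
# listeners_on_port PORT: keep only lines whose local port is exactly PORT.
# Reads "ss -Htlnp" style lines on stdin (-H suppresses the header row).
listeners_on_port() {
    awk -v port="$1" '{ n = split($4, part, ":"); if (part[n] == port) print }'
}

# Example: ss -Htlnp | listeners_on_port 8080
# ss can also filter by itself: ss -tlnp 'sport = :8080'
```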
7. What happens during an SSH login?
A strong practical answer:
- TCP connection
- Key exchange
- Host verification
- Public/private key auth
- authorized_keys
- session creation
Bonus: Mention why agent forwarding can be risky.
Filesystem & storage
8. Disk usage suddenly jumped to 100%. How do you investigate?
Expected flow:
df -h
du -xh / | sort -h | tail
find /var -type f -size +1G
Good real-world causes:
- logs
- core dumps
- runaway temp files
- Docker images
- deleted-but-open files
Bonus command:
lsof | grep deleted
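When the du output for / is too noisy, drilling down one level at a time is often faster. A small sketch, with biggest_dirs as an illustrative helper name:

```bash
# biggest_dirs DIR [N]: the N largest immediate subdirectories of DIR, in kB.
# -x stays on one filesystem, -d 1 limits the report to one level.
biggest_dirs() {
    du -x -k -d 1 "$1" 2> /dev/null \
        | awk -F '\t' -v top="$1" '$2 != top' \
        | sort -rn \
        | head -n "${2:-10}"
}

# Example: biggest_dirs /var 5
```

Repeating the call on the largest result narrows the search to the offending directory in a few steps.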
9. What’s the difference between a hard link and a soft link?
Practical explanation:
- Hard link → same inode
- Soft link → pointer/path reference
Important behavior:
- Soft links break if target moves
- Hard links survive original filename deletion
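Both behaviors can be reproduced in a scratch directory. A minimal sketch (link_demo is a made-up name; the files live in a temporary directory that is removed at the end):

```bash
link_demo() (
    dir=$(mktemp -d)
    cd "$dir" || exit 1
    echo "payload" > original
    ln original hard        # a second name for the same inode
    ln -s original soft     # a tiny file that stores the path "original"
    # ls -i prints the inode number in the first column.
    if [ "$(ls -i original | awk '{ print $1 }')" = "$(ls -i hard | awk '{ print $1 }')" ]; then
        echo "hard link: same inode as original"
    fi
    rm original
    echo "hard link after rm: $(cat hard)"
    # -L: the link itself exists; -e follows it and finds nothing.
    [ -L soft ] && [ ! -e soft ] && echo "soft link after rm: dangling"
    cd / && rm -rf "$dir"
)

link_demo
```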
10. Why might rm not immediately free disk space?
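Expected concept: rm only removes the directory entry. If a process still holds an open file descriptor to the file, the kernel keeps its blocks allocated until that descriptor is closed.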
Debugging:
lsof | grep deleted
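The effect can be shown without any daemon. In this sketch the shell itself plays the process that keeps the file open (deleted_but_open_demo is an illustrative name):

```bash
deleted_but_open_demo() (
    file=$(mktemp)
    echo "still readable" > "$file"
    exec 9< "$file"     # keep a read descriptor open, as a daemon would
    rm -f "$file"
    [ -e "$file" ] || echo "path removed"
    cat <&9             # the inode and its blocks still exist
    exec 9<&-           # closing the last descriptor is what frees the space
)

deleted_but_open_demo
```

On a real host the usual fixes are restarting or signalling the owning process so it reopens its log. lsof +L1 lists open files whose link count has dropped to zero.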
Permissions & security
11. Explain Linux file permissions using 755.
Expected:
- owner/group/others
- read/write/execute
Bonus: Difference between directory execute bit and file execute bit.
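Each octal digit is a sum of read (4), write (2), and execute (1), one digit each for owner, group, and others. A small sketch that spells this out (octal_to_rwx is a made-up helper):

```bash
# octal_to_rwx MODE: expand each octal digit into its rwx triplet.
# The replacements contain no digits, so the substitutions cannot collide.
octal_to_rwx() {
    printf '%s\n' "$1" | sed \
        -e 's/0/---/g' -e 's/1/--x/g' -e 's/2/-w-/g' -e 's/3/-wx/g' \
        -e 's/4/r--/g' -e 's/5/r-x/g' -e 's/6/rw-/g' -e 's/7/rwx/g'
}

octal_to_rwx 755   # rwxr-xr-x
octal_to_rwx 640   # rw-r-----
```

For the bonus: on a file, x allows running it as a program; on a directory, x allows entering it and reaching the entries inside, while r only allows listing the names.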
12. When would you use chmod vs chown?
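Simple but surprisingly common. chmod changes the permission bits (who may read, write, or execute); chown changes the owning user and group.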
13. A script works with sudo but fails without it. How do you debug?
Possible areas:
- permissions
- environment variables
- PATH differences
- SELinux/AppArmor
- file ownership
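PATH differences are a frequent culprit and are quick to check. A hedged sketch: path_only_in is a hypothetical helper that lists the entries present in one PATH-style string but missing from another.

```bash
# path_only_in A B: entries of colon-separated list A that are missing from B.
path_only_in() (
    set -f      # no globbing while the list is split
    IFS=:
    for entry in $1; do
        case ":$2:" in
            *":$entry:"*) ;;            # present in both lists
            *) echo "$entry" ;;         # only in A
        esac
    done
)

# Example: directories root sees but the current user does not
# path_only_in "$(sudo sh -c 'echo "$PATH"')" "$PATH"
```

The same comparison works for whole environments by diffing the sorted output of env with and without sudo.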
Logs & observability
14. How do you continuously monitor logs in Linux?
tail -f app.log
journalctl -u myapp -f
Follow-up: Difference between application logs and systemd journal logs.
15. An application crashes randomly. What logs would you inspect?
Good answer:
- app logs
- kernel logs
- OOM messages
- systemd logs
Commands:
dmesg
journalctl -xe
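Kernel logs are long, so it helps to filter for out-of-memory events first. A minimal sketch, assuming the usual kernel wording; oom_lines is an illustrative name and the pattern may need adjusting per kernel version:

```bash
# oom_lines: keep only lines that look like OOM-killer activity.
oom_lines() {
    grep -Ei 'out of memory|oom-kill|oom_reaper'
}

# Examples:
#   dmesg -T | oom_lines
#   journalctl -k | oom_lines
```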
Systemd & services
16. How do you start a service automatically on boot?
systemctl enable nginx
Good follow-up — explain:
Restart=always
After=network.target
WantedBy=multi-user.target
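To see where those three directives live, here is a sketch that writes a small unit file. The binary path and description are hypothetical, and write_example_unit is a made-up helper; only the three directives come from the list above.

```bash
# write_example_unit FILE: write a minimal example unit to FILE.
write_example_unit() {
    cat > "$1" <<'EOF'
[Unit]
Description=Example app (hypothetical)
# Ordering only: start after the network target has been reached.
After=network.target

[Service]
# Hypothetical binary path; replace with the real command.
ExecStart=/usr/local/bin/myapp
# Restart the process whenever it exits, cleanly or not.
Restart=always
RestartSec=2

[Install]
# "systemctl enable" hooks the unit into the normal multi-user boot.
WantedBy=multi-user.target
EOF
}

# Example: write_example_unit /tmp/myapp.service
```

After copying a unit into /etc/systemd/system, systemctl daemon-reload makes systemd pick it up.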
17. A systemd service is failing repeatedly. How do you debug it?
Useful commands:
systemctl status myapp
journalctl -u myapp
Bonus: Discuss restart loops and exit codes.
Bash & scripting
18. What does set -euo pipefail do in bash?
This is asked a lot in DevOps interviews.
Expected:
- fail fast
- undefined variables
- pipeline failure handling
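Each of the three options can be shown in isolation. The sketch below runs tiny scripts in child bash processes so a failure does not end the calling shell; strict_demo is an illustrative name.

```bash
strict_demo() {
    # A pipeline normally reports only the last command's status.
    bash -c 'false | true' \
        && echo "default: failure inside the pipeline is hidden"
    # pipefail makes the pipeline fail if any stage fails.
    bash -c 'set -o pipefail; false | true' \
        || echo "pipefail: failure inside the pipeline is reported"
    # -e stops the script at the first failing command.
    bash -c 'set -e; false; echo "not reached"' \
        || echo "-e: stopped at the first failure"
    # -u turns a reference to an unset variable into an error.
    bash -c 'set -u; echo "value=$not_set"' 2> /dev/null \
        || echo "-u: unset variable rejected"
}

strict_demo
```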
19. What does $? represent?
The exit code of the most recently executed command or pipeline.
Bonus convention:
- 0 = success
- non-zero = failure
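One more convention worth knowing: shells report 128 + N when a command was terminated by signal N, so 137 means SIGKILL and 143 means SIGTERM. A small sketch that decodes this (explain_exit is a made-up helper):

```bash
# explain_exit CODE: describe a shell exit status.
explain_exit() {
    if [ "$1" -eq 0 ]; then
        echo "success"
    elif [ "$1" -gt 128 ]; then
        # kill -l maps an exit status above 128 back to the signal name.
        echo "killed by signal $(( $1 - 128 )) ($(kill -l "$1"))"
    else
        echo "failed with status $1"
    fi
}

explain_exit 137   # killed by signal 9 (KILL)
```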
20. Write a small bash loop to process files in a directory.
Example:
for file in *.log; do
    [ -e "$file" ] || continue   # no match leaves the literal "*.log"
    echo "Processing $file"
done
Interviewers usually care more about readability than cleverness.
Scenario-based questions (most valuable)
These feel closest to actual production work.
21. One Kubernetes pod is failing, but others are healthy. How would you investigate?
Expected thought process:
- logs
- resource limits
- node issue
- config drift
- dependency connectivity
22. CPU usage is low, but load average is very high. Why?
Strong candidates mention:
- blocked I/O
- uninterruptible sleep
- disk/network bottlenecks
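On Linux the load average counts tasks in uninterruptible sleep (state D) as well as runnable ones (state R), which is how load can be high while CPUs sit idle. A minimal sketch for counting processes per state, assuming /proc is mounted; state_summary is an illustrative name:

```bash
# state_summary: count processes by scheduler state (R, S, D, Z, ...).
# -h drops file names, -s ignores processes that exit mid-scan.
state_summary() {
    grep -hs '^State:' /proc/[0-9]*/status | awk '{ print $2 }' | sort | uniq -c
}

state_summary
```

A persistent pile of D-state processes usually points at slow disks or a hung network filesystem, and iostat is a reasonable next step.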
23. A deployment happened 10 minutes ago and now users see 5xx errors. What do you do?
Interviewers want:
- rollback thinking
- blast radius assessment
- metrics/logs
- mitigation before deep debugging
24. How would you safely restart a production service?
Good practical answer:
- health checks
- draining traffic
- rolling restart
- verify metrics after restart
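The "restart, then verify" step can be sketched as a retry loop around any health check command. wait_healthy is a hypothetical helper, and the health URL in the example is an assumption about the service, not a standard endpoint.

```bash
# wait_healthy ATTEMPTS DELAY COMMAND...: retry COMMAND until it succeeds.
# Returns 0 on the first success, 1 if every attempt fails.
wait_healthy() {
    attempts=$1
    delay=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        "$@" && return 0
        attempts=$(( attempts - 1 ))
        sleep "$delay"
    done
    return 1
}

# Example (hypothetical endpoint):
#   systemctl restart myapp \
#       && wait_healthy 10 3 curl -fsS http://localhost:8080/healthz
```

In a rolling restart, the next node is touched only after this check passes on the current one.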
25. You can SSH into a server, but outbound internet access doesn’t work. What could be wrong?
Potential causes:
- DNS
- routing
- NAT gateway
- firewall rules
- proxy configuration
What interviewers are really testing
Most Linux interviews are not about memorizing commands. They are about demonstrating how you think during failures:
- Can you narrow down scope?
- Can you gather evidence quickly?
- Do you understand how systems behave under stress?
- Can you debug calmly and methodically?
Strong candidates explain why they run a command, not just the command itself.