Terraform Best Practices for Production Infrastructure
Terraform is deceptively simple to start and surprisingly complex to run well at scale. These patterns come from real production experience managing infrastructure across AWS, GCP, and Azure.
State Management
Remote State (Non-Negotiable)
Never use local state in a team environment:
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "prod/networking/terraform.tfstate"
    region         = "ap-south-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}
Key decisions:
- One state file per logical component (networking, compute, database)
- State locking to prevent concurrent modifications: a DynamoDB table with the S3 backend (AWS), or the GCS backend's built-in locking (GCP)
- Encryption at rest always enabled
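The backend block above assumes the bucket and lock table already exist. A minimal sketch of bootstrapping them, usually from a separate one-off configuration, might look like the following. The resource labels (state, terraform_locks) are illustrative, the bucket and table names are reused from the backend block, the KMS choice is an assumption, and the split bucket resources assume AWS provider v4 or later.

```hcl
# Bootstrap resources for the S3 backend (sketch; apply once from a separate configuration).
resource "aws_s3_bucket" "state" {
  bucket = "mycompany-terraform-state"
}

# Versioning makes it possible to roll back to an earlier state file.
resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Default server-side encryption for every object in the bucket.
resource "aws_s3_bucket_server_side_encryption_configuration" "state" {
  bucket = aws_s3_bucket.state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms" # assumption: AES256 would also work
    }
  }
}

# Lock table: the S3 backend expects a string partition key named "LockID".
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

Keeping these resources in their own small configuration avoids the chicken-and-egg problem of a backend that stores the state of its own bucket.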
State File Organization
Structure state by environment and component:
states/
├── prod/
│   ├── networking/
│   ├── compute/
│   ├── database/
│   └── monitoring/
├── staging/
│   └── ...
└── shared/
    ├── dns/
    └── iam/
Small state files are faster to plan, safer to modify, and easier to recover.
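Splitting state means components have to share values across state boundaries. One common way to do that is the terraform_remote_state data source, sketched below for a compute component reading from the networking state. This assumes the networking configuration exposes private_subnet_ids as a root-level output; the ami_id variable and the app resource are hypothetical.

```hcl
# Read the outputs of the networking state (read-only access to that state is required).
data "terraform_remote_state" "networking" {
  backend = "s3"

  config = {
    bucket = "mycompany-terraform-state"
    key    = "prod/networking/terraform.tfstate"
    region = "ap-south-1"
  }
}

# Hypothetical input, declared here so the sketch is self-contained.
variable "ami_id" {
  type        = string
  description = "AMI to launch (hypothetical input)"
}

# Place an instance in the first private subnet published by the networking component.
resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = "t3.medium"
  subnet_id     = data.terraform_remote_state.networking.outputs.private_subnet_ids[0]
}
```

Only root-level outputs are visible this way, so the networking configuration decides exactly what other components may depend on.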
Module Design
Keep Modules Focused
One module = one logical thing:
modules/
├── vpc/ # networking only
├── eks-cluster/ # cluster + node groups
├── rds/ # database instance + security group
└── monitoring/ # dashboards + alerts
Input Validation
variable "environment" {
  type = string

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}
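The same pattern extends to inputs whose format matters rather than their membership in a list. As a sketch, a hypothetical vpc_cidr variable could be checked by wrapping cidrhost in can, which returns false instead of raising an error when the string is not a valid CIDR block:

```hcl
# Hypothetical input for a VPC module; the name and description are illustrative.
variable "vpc_cidr" {
  type        = string
  description = "CIDR block for the VPC"

  validation {
    # cidrhost fails on malformed input; can() turns that failure into false.
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "The vpc_cidr value must be a valid CIDR block, such as 10.0.0.0/16."
  }
}
```

Validation runs during plan, so a bad value is rejected before any API call is made.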
Output What Consumers Need
output "vpc_id" {
  value       = aws_vpc.main.id
  description = "VPC ID for use by dependent modules"
}

output "private_subnet_ids" {
  value       = aws_subnet.private[*].id
  description = "Private subnet IDs for workload placement"
}
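On the consuming side, these outputs become the wiring between modules. A rough sketch of a root module, assuming the module layout shown earlier and hypothetical input names (environment, vpc_id, subnet_ids) on each module:

```hcl
# Root module wiring (sketch). Input names on both modules are assumptions.
module "vpc" {
  source      = "./modules/vpc"
  environment = "prod"
}

module "eks_cluster" {
  source = "./modules/eks-cluster"

  # Outputs of one module feed the inputs of the next; Terraform infers the ordering.
  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnet_ids
}
```

Because the dependency is expressed through references, no explicit depends_on is needed here.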
CI/CD Integration
Pipeline Structure
# GitHub Actions example (outline of the jobs, not literal workflow syntax)
plan:
  - terraform init
  - terraform validate
  - terraform plan -out=plan.tfplan
  - Post plan output as PR comment
apply:
  - Only on merge to main
  - terraform apply plan.tfplan
  - Notify on success/failure
Safety Rules
- plan runs on every PR, so reviewers see what changes
- apply only runs after merge (never on branch push)
- Require manual approval for production applies
- Store the plan file as an artifact and apply exactly what was reviewed
Common Pitfalls
1. Ignoring Drift
Terraform only knows about resources it manages. Manual changes create drift:
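# Check regularly (for example, from a scheduled CI job)
terraform plan -detailed-exitcode
# Exit code 0 = no changes, 1 = error, 2 = changes pending (drift, if the code has not changed)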
2. Hardcoding Values
Bad:
resource "aws_instance" "web" {
  ami           = "ami-0abcdef1234567890"
  instance_type = "t3.medium"
}
Better:
# Look up the newest matching Ubuntu image instead of pinning an AMI ID.
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical's AWS account ID

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-*-amd64-server-*"]
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = var.instance_type
}
3. Giant Monolithic State
As a rule of thumb, if terraform plan takes more than 30 seconds, your state is probably too large: every plan refreshes each resource the state tracks. Split it by component.
4. Not Using moved Blocks
When refactoring, use moved blocks to avoid destroy/recreate:
moved {
  from = aws_instance.web
  to   = module.compute.aws_instance.web
}
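The same mechanism covers another common refactor: converting a single resource to for_each. A sketch, assuming hypothetical instance keys and a hypothetical ami_id variable, in which the existing instance is kept as the "primary" entry rather than destroyed:

```hcl
# Hypothetical input, declared so the sketch is self-contained.
variable "ami_id" {
  type = string
}

# The resource now uses for_each; the keys are illustrative.
resource "aws_instance" "web" {
  for_each      = toset(["primary", "secondary"])
  ami           = var.ami_id
  instance_type = "t3.medium"
}

# Map the old single-instance address onto one of the new keyed addresses.
moved {
  from = aws_instance.web
  to   = aws_instance.web["primary"]
}
```

With this in place, the plan should show the existing instance as moved and only the "secondary" instance as a new resource.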
5. Secrets in State
Terraform state contains sensitive values in plaintext. Always:
- Encrypt state at rest
- Restrict state bucket access
- Never commit state files to Git
- Use sensitive = true on outputs (this redacts values in CLI output only; they are still written to state)
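As a minimal sketch of the last point, with hypothetical variable and output names: marking the input as sensitive redacts it in plan and apply output, and a root-module output derived from it has to be marked sensitive as well, or Terraform rejects the configuration.

```hcl
# Hypothetical inputs; names and the connection string format are illustrative.
variable "db_password" {
  type      = string
  sensitive = true # redacted in plan/apply output, still stored in state
}

variable "db_host" {
  type = string
}

# An output built from a sensitive value must itself be marked sensitive.
output "db_connection_string" {
  value     = "postgres://app:${var.db_password}@${var.db_host}:5432/app"
  sensitive = true
}
```

None of this changes what is stored in the state file, which is why the bucket encryption and access restrictions above still matter.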
Workspace vs Directory Strategy
| Approach | Best For |
|---|---|
| Workspaces | Same infra, different scale (dev/staging/prod with identical structure) |
| Directories | Different infra per environment (prod has additional security layers) |
Most mature teams use directories — environments rarely stay identical.
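For the workspace approach, differences in scale are usually driven off terraform.workspace. A small sketch, with made-up instance counts and a fallback of 1 for any workspace not listed:

```hcl
locals {
  # Per-workspace sizing; the numbers are illustrative.
  instance_counts = {
    dev     = 1
    staging = 2
    prod    = 6
  }

  # Fall back to 1 for workspaces that are not in the map (for example, "default").
  instance_count = lookup(local.instance_counts, terraform.workspace, 1)
}

output "instance_count" {
  value = local.instance_count
}
```

Once environments need structural differences rather than different numbers, lookups like this multiply quickly, which is one reason teams move to directories.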
Testing
Validate
terraform validate # syntax check
terraform fmt -check # formatting
Plan Analysis
terraform plan -out=plan.tfplan
terraform show -json plan.tfplan | jq '.resource_changes[] | select(.change.actions | contains(["delete"]))'
Integration Tests
Tools like Terratest or terraform test (native, added in 1.6) verify actual infrastructure behavior.
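As a rough sketch of the native framework, a test file for a module that declares the environment variable with the validation rule shown earlier might look like this. The file path and run names are hypothetical; both runs use plan, so nothing is created.

```hcl
# tests/environment.tftest.hcl (hypothetical path)

# A valid value should plan without errors.
run "accepts_known_environment" {
  command = plan

  variables {
    environment = "staging"
  }
}

# An unknown value should fail the variable's validation rule.
run "rejects_unknown_environment" {
  command = plan

  variables {
    environment = "qa"
  }

  # The run passes only if validation of var.environment fails.
  expect_failures = [var.environment]
}
```

Running terraform test from the module directory picks up files ending in .tftest.hcl, including those under tests/.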
Import Existing Resources
When inheriting unmanaged infrastructure:
terraform import aws_instance.legacy i-1234567890abcdef0
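The import command needs a resource block for aws_instance.legacy to exist in the configuration first (an empty one is enough). Afterwards, fill in the HCL to match the real resource, and use terraform plan to verify a zero diff before making further changes.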
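Terraform 1.5 and later also support declarative import blocks, which put the import through the normal plan and review flow. A sketch reusing the address and instance ID from the command above:

```hcl
# Declarative import (Terraform 1.5+): the import is planned and applied like any other change.
import {
  to = aws_instance.legacy
  id = "i-1234567890abcdef0"
}
```

If no resource block exists yet, running terraform plan -generate-config-out=generated.tf (the file name is arbitrary) asks Terraform to draft one. Treat the generated HCL as a starting point to review and clean up, not as finished code.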
Terraform rewards discipline. Small state files, focused modules, CI/CD gates, and regular drift checks keep production infrastructure predictable.