# How we structure Terraform modules for multi-environment deployments
At some point every team hits the same wall. You start with one Terraform directory for dev. It works. Then you need staging, so you duplicate the folder and change a few values. Then production comes along and you do it again. Six months later, dev has a module that staging doesn't, production has a hardcoded CIDR that nobody remembers changing, and every "quick fix" involves editing three directories and hoping you didn't miss one.
We've inherited enough of these setups to know that copy-paste Terraform is the default, not the exception. The fix isn't complicated, but it does require discipline upfront. Here's the module structure we use on every client engagement, and why it works.
## The problem with duplicated root configs
The temptation to copy a working Terraform directory is strong because it feels safe. You have a known-good config, and you just want another one "like it but slightly different." The issue is that drift between environments is invisible until it causes an incident.
We've seen production RDS instances running without Multi-AZ because someone enabled it in the staging copy but never propagated the change forward. We've seen security groups in dev that allow 0.0.0.0/0 on port 22 because the original was locked down but the copy was "temporary." These aren't hypotheticals -- they're Tuesday.
The core principle: environments should differ only in their variable values, never in their resource definitions. If dev and prod need different instance sizes, that's a variable. If they need fundamentally different architectures, that's a conversation about whether they're really the same system.
## The directory layout
Here's the structure we start every project with:
```
infra/
  modules/
    vpc/
      main.tf
      variables.tf
      outputs.tf
    eks/
      main.tf
      variables.tf
      outputs.tf
    rds/
      main.tf
      variables.tf
      outputs.tf
  environments/
    dev/
      main.tf
      backend.tf
      terraform.tfvars
    staging/
      main.tf
      backend.tf
      terraform.tfvars
    prod/
      main.tf
      backend.tf
      terraform.tfvars
```
Two top-level directories. That's it. modules/ contains the reusable infrastructure definitions -- one directory per logical component (VPC, EKS cluster, RDS instance, etc.). environments/ contains the thin wrapper configs that call those modules with environment-specific values.
Each module directory follows the standard three-file convention: main.tf for resources, variables.tf for inputs, outputs.tf for anything downstream modules or environments need to reference. We sometimes add a data.tf for data sources and a locals.tf if the logic warrants it, but the core three are non-negotiable.
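As a minimal sketch of that convention, here is what the VPC module's `variables.tf` might look like (the variable names match the module calls shown below; descriptions are illustrative):

```hcl
# modules/vpc/variables.tf -- illustrative sketch, not a complete module

variable "project" {
  description = "Project slug used in resource names and tags"
  type        = string
}

variable "environment" {
  description = "Deployment environment (dev, staging, prod)"
  type        = string
}

variable "vpc_cidr" {
  description = "CIDR block for the VPC"
  type        = string
}

variable "azs" {
  description = "Availability zones to spread subnets across"
  type        = list(string)
}
```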
The key insight is that the main.tf in each environment directory should be nearly identical. It's just a list of module calls:
```hcl
module "vpc" {
  source = "../../modules/vpc"

  project     = var.project
  environment = var.environment
  vpc_cidr    = var.vpc_cidr
  azs         = var.availability_zones
}

module "eks" {
  source = "../../modules/eks"

  project            = var.project
  environment        = var.environment
  vpc_id             = module.vpc.vpc_id
  private_subnet_ids = module.vpc.private_subnet_ids
  node_instance_type = var.eks_node_instance_type
  node_desired_size  = var.eks_node_desired_size
  node_max_size      = var.eks_node_max_size
}
```
If you look at the environment's main.tf and see resource blocks instead of module calls, something has gone wrong. Resources live in modules. Environments compose modules. That boundary is sacred.
## Variable files keep environments in sync
The terraform.tfvars file in each environment directory is where the real differentiation happens. This is where dev gets a t3.medium and prod gets an m7g.xlarge. Where dev uses a single NAT gateway and prod uses one per AZ. Where staging gets a db.r6g.large and prod gets a db.r6g.2xlarge with Multi-AZ enabled.
```hcl
# environments/dev/terraform.tfvars
project            = "acme"
environment        = "dev"
region             = "us-east-2"
vpc_cidr           = "10.0.0.0/16"
availability_zones = ["us-east-2a", "us-east-2b"]

eks_node_instance_type = "t3.medium"
eks_node_desired_size  = 2
eks_node_max_size      = 4

rds_instance_class = "db.t4g.medium"
rds_multi_az       = false
```

```hcl
# environments/prod/terraform.tfvars
project            = "acme"
environment        = "prod"
region             = "us-east-1"
vpc_cidr           = "10.2.0.0/16"
availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]

eks_node_instance_type = "m7g.xlarge"
eks_node_desired_size  = 3
eks_node_max_size      = 12

rds_instance_class = "db.r6g.xlarge"
rds_multi_az       = true
```
When a new variable gets added to a module, every environment must provide a value for it. Terraform will tell you if you forget -- that's the point. No silent drift. If someone adds a feature to the VPC module, all three environments either adopt it explicitly or the plan fails. This is a feature, not a limitation.
We also keep a variables.tf in each environment directory that mirrors the tfvars keys. Some teams skip this and rely on auto-detection, but explicit variable declarations give you descriptions, type constraints, and validation rules. Worth the few extra lines.
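Those validation rules are worth a concrete example. A sketch of what an environment's `variables.tf` can enforce (the specific rules here are illustrative, not from a real engagement):

```hcl
# environments/prod/variables.tf -- illustrative sketch

variable "environment" {
  description = "Deployment environment name"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "environment must be one of: dev, staging, prod."
  }
}

variable "eks_node_desired_size" {
  description = "Desired number of EKS worker nodes"
  type        = number

  validation {
    condition     = var.eks_node_desired_size >= 1
    error_message = "eks_node_desired_size must be at least 1."
  }
}
```

A typo like `environment = "pord"` in a tfvars file then fails at plan time instead of producing misnamed resources.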
## State management: one file per environment, always remote
Each environment gets its own state file in a remote backend. We use S3 with DynamoDB locking. The backend.tf in each environment directory configures this:
```hcl
terraform {
  backend "s3" {
    bucket         = "acme-terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
```
Three rules we never break:
- One state file per environment. Never share state between dev and prod. A bad `terraform destroy` in dev should never be able to touch production resources.
- DynamoDB locking is mandatory. Without it, two engineers running `terraform apply` simultaneously can corrupt the state. It costs pennies a month and prevents a genuinely terrible day.
- Encryption at rest. State files contain sensitive data -- RDS passwords, private IPs, IAM role ARNs. The `encrypt = true` flag uses SSE-S3 by default, but we typically configure a dedicated KMS key for production accounts.
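Switching the production backend from SSE-S3 to a customer-managed KMS key is a one-line change via the S3 backend's `kms_key_id` argument (the key ARN below is a placeholder):

```hcl
terraform {
  backend "s3" {
    bucket         = "acme-terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
    # Use a dedicated customer-managed key instead of SSE-S3.
    # ARN is a placeholder -- substitute your own key.
    kms_key_id     = "arn:aws:kms:us-east-1:123456789012:key/REPLACE-ME"
  }
}
```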
A common question: why not use workspaces instead of separate directories? Workspaces work, but they hide the environment context. With separate directories, you can cd into environments/prod and see exactly what's deployed there -- the backend config, the variables, the module versions. There's no workspace switching, no mental overhead of "which workspace am I in right now?" For teams larger than one person, explicitness wins.
## There's more to it
This structure is the foundation, not the full picture. Once you have clean module boundaries and per-environment state, the next questions are usually:
- CI/CD integration -- how do you run `plan` on pull requests and `apply` on merge, with proper approval gates for production?
- Policy-as-code -- how do you enforce that nobody provisions a publicly accessible RDS instance or an S3 bucket without encryption, regardless of what the tfvars say?
- Drift detection -- how do you catch when someone clicks around in the AWS console and creates resources that Terraform doesn't know about?
- Module versioning -- when do you pin module sources to git tags versus using relative paths?
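To give that last point some shape: once modules move to their own repository, a git-pinned source looks like this (repository URL and tag are placeholders, not a real repo):

```hcl
module "vpc" {
  # Pinned to an immutable tag instead of a relative path.
  # Repo URL and tag are placeholders for illustration.
  source = "git::https://github.com/acme/terraform-modules.git//vpc?ref=v1.4.0"

  project     = var.project
  environment = var.environment
  vpc_cidr    = var.vpc_cidr
  azs         = var.availability_zones
}
```

The trade-off: relative paths mean every environment picks up module changes immediately, while tags let prod lag behind dev deliberately.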
Each of those deserves its own post. But none of them matter if the underlying module structure is a mess. Get the directory layout right first. Everything else builds on top of it.
If your Terraform repo has more copy-paste than you'd like to admit, or you're starting a new multi-environment setup and want to get the structure right from day one, we're happy to talk through it.
Vishwaraja Pathi
Cloud & DevOps specialist with 13+ years of experience. Founder of Adiyogi Technologies. Previously at Roku, Rocket Lawyer, and BetterPlace.