Managing your Terraform across multiple environments

January 1st, 2021 - Dev, Infra, Devops, Software-Development & Tech

You’re managing your cloud infrastructure using Terraform. You’ve got your first environment up and running and you’re already reaping the benefits of a codified infrastructure. Changes are easy. But now you need to set up a second environment (staging, prod, whatever) and you’re finding that managing this is not straightforward. There’s a bunch of arguments to remember every time you switch between environments, and you’re switching a lot because you want to keep them in sync. Because this is hard you tend to lean on auto-complete, but then sometimes you forget to change something and accidentally apply prod’s config to staging. Well, as on many occasions, a Makefile can probably help you there.

I think I’m with you, but do I really need this?

Terraform is a per-environment tool, i.e. you benefit from Terraform by knowing that you’ve applied exactly the same configuration to each of your environments, so you can always keep them in sync. This is benefit numero uno, but it means it’s down to you to figure out how to handle the parts that differ between environments. Terraform helps at least a little by providing three ways to pass in variables: environment variables, command line arguments or tfvars files.
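To make those three routes concrete, here is a minimal sketch written as Makefile targets. The variable `instance_count`, the file `dev.tfvars` and the target names are made up for illustration; the `TF_VAR_` prefix, `-var` and `-var-file` are the mechanisms Terraform itself provides.

```makefile
# Three ways of handing the same (hypothetical) variable to Terraform.
# Recipe lines must be indented with a tab.

# 1. Environment variable: Terraform reads any TF_VAR_<name> it finds.
plan_with_env:
	TF_VAR_instance_count=2 terraform plan

# 2. Command line argument.
plan_with_arg:
	terraform plan -var="instance_count=2"

# 3. A tfvars file per environment, e.g. dev.tfvars containing: instance_count = 2
plan_with_file:
	terraform plan -var-file="dev.tfvars"
```

All three end up setting the same input variable. The rest of this post is really about deciding which values travel by which route.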

Ok, so why not just use a `tfvars` file for each environment?

Well of course environment-based tfvars files are a good idea, but the problem is that not all things can (or should) go in them.

Some things that can’t:

- state-file backend information
- “dynamic” output from preceding non-terraform code
- “standard” CI environment variables that need some name-coercing.

Things that shouldn’t: !1!1!anything secret!1!1!.
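As a rough illustration of the “name-coercing” case, a Makefile can re-export a CI variable under the name Terraform expects. This sketch assumes GitLab CI (where `CI_COMMIT_SHORT_SHA` is predefined) and a hypothetical Terraform variable called `git_sha`; swap in whatever your CI and configuration actually use.

```makefile
# CI_COMMIT_SHORT_SHA comes from the CI environment (GitLab, in this assumption).
# git_sha is a hypothetical Terraform input variable.
# Exporting it with the TF_VAR_ prefix lets Terraform pick it up by itself.
export TF_VAR_git_sha = $(CI_COMMIT_SHORT_SHA)

plan:
	terraform plan -var-file="$(stage).tfvars"
```

For secrets you may not need any coercion at all: if the protected CI variable is itself named with the `TF_VAR_` prefix, Terraform reads it straight from the environment and it never has to touch a tfvars file.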

Let’s talk through an example of why it might be beneficial to wrap up our terraform commands in a Makefile to ensure our environments are kept both safe and distinct:

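include ../envs/$(stage).env

# The included file defines make variables, so we reference them with $(...).
# ($${...} would ask the shell, which only sees them if they are exported.)
init:
	terraform init -reconfigure \
		-backend-config="key=$(TERRAFORM_STATE_FILE)" \
		-backend-config="bucket=$(TERRAFORM_STATE_BUCKET_NAME)" \
		-backend-config="region=$(TERRAFORM_STATE_REGION)"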

and our env file looks something like:

TERRAFORM_STATE_FILE=project-name.dev.tfstate
TERRAFORM_STATE_BUCKET_NAME=project-name-state
TERRAFORM_STATE_REGION=eu-west-1

Here we have an initialisation wrapper. This sets our backend-config based on per-environment values stored in an env file. If we have a different state bucket for our dev, staging & production environments, we can easily switch between each environment with far less chance of getting anything wrong. Anything secret has to go into protected variables in our CI pipeline.
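One thing the wrapper can’t protect you from on its own is forgetting `stage` altogether, in which case the include would look for `../envs/.env`. A small guard at the top of the Makefile is one way to handle that. This is only a sketch, and the list of stage names is an assumption.

```makefile
# Fail fast when no stage is given, instead of trying to include ../envs/.env.
ifndef stage
  $(error No stage given. Usage: make stage=dev init)
endif

# Optionally restrict the stage to a known set (these names are assumptions).
known_stages := dev staging prod
ifeq ($(filter $(stage),$(known_stages)),)
  $(error Unknown stage '$(stage)'. Expected one of: $(known_stages))
endif

include ../envs/$(stage).env
```

The conditional lines are indented with spaces on purpose: a tab-indented line can be mistaken for part of a recipe.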

Why not just put all of this out into one big state file?

Well, you can’t really, otherwise you’re not applying the same Terraform resources to different environments. You could put multiple state files in the same bucket, which avoids the bucket and region arguments above, but then you have to figure out how to grant access to that bucket from each environment. If you are (probably wisely) running your production environments in a different AWS account, then you end up having to give some runners cross-account access. Don’t even get me started on which way this access should flow… Nah, just put them in their own buckets in their own environments and save yourself all of that headache. Dev runners access the dev bucket, prod runners access the prod bucket, etc.

If I’m wrapping init like this, should I do it for other commands too?

Up to you, but probably yes. Here’s how I’d set it up:

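tf_plan_args = -out=.terraform.tfplan -var-file="$(stage).tfvars"
ifdef special_flag
  tf_plan_args += -var="special_flag=$(special_flag)"
endif

.PHONY: plan apply clear_plan

plan: clear_plan
	terraform plan $(tf_plan_args)

# Apply the saved plan, then remove it once it has been applied.
# (Depending on clear_plan here would delete the plan before it could be applied.)
apply:
	terraform apply .terraform.tfplan
	rm -f .terraform.tfplan

clear_plan:
	rm -f .terraform.tfplan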

This ends up with a few handy features:

- We clear the plan file on each plan/apply, meaning there’s no way to accidentally apply a stale plan.
- We get much nicer CLI syntax, like this:
  - make stage=dev init
  - make stage=dev special_flag=73 plan
  - make stage=dev apply

A slight extension

One pattern I’ve come across a few times now is where you need data in your terraform plan that isn’t managed by terraform. This could be:

- Some environmental/system data.
- Output from legacy systems.
- Even something that simply can’t be terraform-controlled.

In this kind of scenario most things output as JSON, and jq is a great tool to parse this output in your Makefile to pass it into terraform. This is another great reason to codify your terraform commands: you can make sure that this stitching always works the same in all of the places. Here’s an example of how I’d change the Makefile to handle this:

ifndef required_parameter
  $(error You must specify the required parameter.)
endif

tf_plan_args = -out=.terraform.tfplan -var-file="$(stage).tfvars" -var="required_parameter=$(required_parameter)"
ifdef special_flag
  tf_plan_args += -var="special_flag=$(special_flag)"
endif

# Generate the extra variables as a normal recipe step, so it runs after `clear`.
plan: clear
	./go-get-extra-vars-as-json.sh > .extra_vars.json
	terraform plan $(tf_plan_args) -var-file=".extra_vars.json"

clear:
	rm -f .terraform.tfplan .extra_vars.json

Caveat: depending on your CI setup, the .extra_vars.json file may simply arrive as an artifact from a previous step, in which case it’s down to anyone running things manually to provide this data themselves. This obviously depends on the use case, so take this as just one approach out of many.
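If you do want one Makefile to cope with both situations, one option is to make the JSON file a target in its own right. The sketch below is an assumption-laden variation rather than the setup above: it reuses the script name from the example, only clears the plan file, and adds a jq sanity check that isn’t part of the original approach.

```makefile
.PHONY: plan clear_plan

# If generating the file fails, delete the half-written result.
.DELETE_ON_ERROR:

# Remove only the saved plan, so an incoming .extra_vars.json artifact survives.
clear_plan:
	rm -f .terraform.tfplan

# A file target with no prerequisites is only built when the file is missing,
# so a CI artifact is used as-is and local runs generate the file on demand.
.extra_vars.json:
	./go-get-extra-vars-as-json.sh > $@

# The jq line fails the build if the file is empty, malformed or not a JSON object.
plan: clear_plan .extra_vars.json
	jq -e 'type == "object"' .extra_vars.json > /dev/null
	terraform plan -out=.terraform.tfplan \
		-var-file="$(stage).tfvars" \
		-var-file=".extra_vars.json"
```

The trade-off is that a leftover file from an earlier local run gets reused too, so delete .extra_vars.json by hand whenever you want fresh data.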

Final thoughts

There are probably other ways to achieve the above. I’ve heard that Terragrunt is a thing, and whilst I haven’t dug into it I assume it’s going to solve these kinds of problems (I’ve avoided it because I dislike dependencies, and favour flexibility). To be honest, how you decide to solve this issue is totally up to you; this is just one way of doing it. However, in my opinion there are a few important things that your solution should get right:

- Primarily: make sure that you do solve this issue, and that you solve it well. I mean, you have to, but don’t end up picking a poor approach that loses out on the main reason to codify your infrastructure: ensuring it’s the same in all the places.
- Make sure that however you solve this, your solution lives in version control. This means that the way you interact with your environments evolves alongside your codebase and that all developers on the project run it in the same way.
- Ideally this should be simple to write and simple to run everywhere. I chose Makefiles because Make is a pretty ubiquitous technology, and it’s cleaner and clearer than writing pure bash. The next step up would be a scripting language like Python, or even a full tool like Terragrunt - these both have a higher barrier to entry (in dependencies and new-things-to-learn) and in Terragrunt’s case less flexibility.

Some hints & tips

- Use the Makefile commands in your CI too. This way they always stay up to date, and you avoid the “it works on my machine” kind of scenarios.
- If you have new commands, even one-off ones, add them to your Makefile and run them through CI. This has a few benefits: it’ll run (and maybe break) in your dev environment first, and you have a version-controlled history of what ran when.
- Having to repeat arguments like stage=dev is annoying. I flip-flop between it being useful to remind you of what env you’re working on, and it being super tedious. You could make it go away with an .init tempfile that gets created during the init command. Just be sure to wipe it whenever you re-init (and be very sure which environment you’ve initialised before you run anything).
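For what it’s worth, here is a rough sketch of that last idea. It assumes a marker file called `.stage` (a hypothetical name, standing in for the .init tempfile) and adds a `check_stage` target that isn’t in the Makefile above; treat it as one possible shape rather than a tested recipe.

```makefile
# .stage is a hypothetical marker file; keep it out of version control.
# Use the remembered stage only when none was given on the command line.
ifndef stage
  stage := $(shell cat .stage 2>/dev/null)
endif
ifeq ($(strip $(stage)),)
  $(error No stage given and none initialised. Try: make stage=dev init)
endif

include ../envs/$(stage).env

.PHONY: init check_stage plan

# Wipe the marker first, so a failed init never leaves a stale one behind.
init:
	rm -f .stage
	terraform init -reconfigure \
		-backend-config="key=$(TERRAFORM_STATE_FILE)" \
		-backend-config="bucket=$(TERRAFORM_STATE_BUCKET_NAME)" \
		-backend-config="region=$(TERRAFORM_STATE_REGION)"
	echo "$(stage)" > .stage

# Refuse to continue if the requested stage is not the initialised one.
check_stage:
	@test "$(stage)" = "$$(cat .stage 2>/dev/null)" || \
		{ echo "Initialised stage does not match '$(stage)'; re-run init." >&2; exit 1; }

plan: check_stage
	terraform plan -out=.terraform.tfplan -var-file="$(stage).tfvars"
```

With this in place, `make stage=dev init` followed by a plain `make plan` works, while `make stage=prod plan` stops with an error until you re-init for prod.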
