A few years ago, my colleague Gregory Tomei and others at Cloud City built a platform for Vital Strategies that empowered developing nations to improve their pandemic readiness. When it was acquired by the World Health Organization, we were asked to migrate the project into the WHO’s Microsoft Azure environment.
Among the most significant challenges were the permission limitations inherent in Azure and the WHO’s strict security requirements. We often lacked sufficient privileges in our consultant account even to figure out which resource we needed the WHO admins to create for us.
Given the global nature of the WHO, requests for resources to be created, tweaked or (oops) restored after being deleted would take at least a day due to time differences.
This slow feedback loop made Gregory quickly realize we needed to run our own Azure environment. With admin permissions, we could create whatever we needed and then work out how to reproduce it in the production environment.
Navigating strict permissions
Our strategy was to build the Terraform configuration and deploy it to our private Cloud City Azure sandbox first, so that by the time we requested permissions or resources in the WHO Azure, we knew exactly what would work. This reduced the slow back-and-forth that comes with global remote coordination.
There was one catch: Terraform doesn’t have a built-in mechanism for running the same config between two accounts with two remote backends. Unlike workspaces that can easily change state within the same account, toggling between remote backends in Terraform is not particularly easy or elegant.
Our initial approach was storing a backup of the .terraform directory and related files. This works a few times if you need to manually switch, but when this became a part of our development workflow, we needed a better solution.
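The manual switch looked roughly like this (a sketch; the backup paths and account names are illustrative, not our actual layout):

```shell
# Stash the state-tracking files for the current account (cloudcity),
# then restore the previously stashed files for the other account (clientname).
mkdir -p ../backups/cloudcity
cp -r .terraform .terraform.lock.hcl ../backups/cloudcity/
rm -rf .terraform .terraform.lock.hcl
cp -r ../backups/clientname/.terraform ../backups/clientname/.terraform.lock.hcl .
terraform init   # re-initialize against the restored backend
```

It only takes one forgotten cp, or one apply against the wrong stash, to end up planning changes against the wrong account’s state.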
A better solution
Terraform only wants to use a single backend within any given directory. You can tell this is the case because Terraform drops a few files into the local directory where you run terraform init: .terraform/ and .terraform.lock.hcl. Terraform uses these files to detect changes to where the state is stored and to handle backend state migrations.
This is a good thing for many reasons. Account boundaries are a good way to contain the “blast radius” of changes. We also want Terraform to detect these changes and help us out.
However, our error-prone solution of copying the .terraform directory to a backup each time we wanted to switch accounts was not reliable. We needed to make our multiple-account process a first-class part of our Terraform configuration.
In this example, we authenticate using the Azure CLI (this would be almost the same in AWS, except AWS uses the word “profile”).
When first running Terraform with the new account, Terraform can no longer access the remote backend on the old account. The remote backend is configured in main.tf and, much to my and many other developers’ chagrin, the backend configuration does not allow variables (GitHub thread). This means you cannot use environment variables to change the backend.
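To make the restriction concrete, a backend block like this hypothetical sketch fails immediately at terraform init:

```hcl
terraform {
  backend "s3" {
    # Terraform rejects this with "Error: Variables may not be used here."
    bucket = var.state_bucket
    key    = "path/to/my/key"
    region = var.state_region
  }
}
```

Every value in the backend block must be a literal known before Terraform evaluates any variables.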
Reading that thread, you’ll find many suggestions that range from “you’re doing it wrong” to “use Terragrunt” (an external tool built to solve some of the problems in Terraform). There are (incorrect for our needs) suggestions to use workspaces and a (viable but more complicated) solution that passes all the remote backend information on the command line.
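That command-line solution relies on Terraform’s partial backend configuration: leave the backend block mostly empty and pass the values to terraform init, roughly like this (the bucket name here is illustrative):

```shell
terraform init \
  -backend-config="bucket=clientname-tfstate" \
  -backend-config="key=path/to/my/key" \
  -backend-config="region=us-east-1"
```

It works, but the backend configuration now lives in shell history or wrapper scripts rather than in the repository.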
The most elegant solution I found for us is known as the “main module pattern.” It has the benefit of declaratively containing all the complexity of our multiple-account configuration within the repository.
An added benefit is that we can continue to use “vanilla” Terraform, without any additional tools or complicated scripts that would require extra documentation.
First-class multi-account Terraform directory structure
A simple Terraform directory might look like this:
basic
├── .terraform/
├── .terraform.lock.hcl
├── app.tf
├── db.tf
├── main.tf
├── network.tf
├── outputs.tf
└── variables.tf
Our challenge is to apply this same configuration to another account.
The “main module pattern”
With the main module pattern, you take the main directory where you usually run Terraform (the one that contains .terraform) and turn it into a module called “main”. Then you call that module from two or more separate working directories, each of which declares its own backend before loading the main module.
You can call the main module with whatever variables differ between accounts.
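In our example, main/variables.tf would declare the inputs that differ per account — a minimal sketch using the two variables from the example below (the descriptions are assumptions):

```hcl
variable "app_name" {
  type        = string
  description = "Application name, shared across accounts."
}

variable "prefix" {
  type        = string
  description = "Per-account prefix to avoid name collisions on global resources."
}
```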
Similar to how workspaces allow multiple runs within the same backend, using multiple directories is the simplest solution I’ve found for applying identical infrastructure on multiple accounts with different remote backends.
Once we change our Terraform configuration to use the main module pattern, the directory would look more like this:
main_module_pattern_example
├── accounts
│ ├── clientname
│ │ ├── .env
│ │ ├── .terraform/
│ │ ├── .terraform.lock.hcl
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ └── variables.tf
│ └── cloudcity
│ ├── .env
│ ├── .terraform/
│ ├── .terraform.lock.hcl
│ ├── main.tf
│ ├── outputs.tf
│ └── variables.tf
└── main
├── app.tf
├── db.tf
├── main.tf
├── network.tf
├── outputs.tf
└── variables.tf
An example of accounts/clientname/main.tf that includes the main module:
terraform {
  backend "s3" {
    bucket = "mybucket"
    key    = "path/to/my/key"
    region = "us-east-1"
  }
}

module "main" {
  source   = "../../main"
  app_name = var.app_name
  prefix   = "clientname" # Used to avoid name collisions on global resources.
}
The file accounts/cloudcity/main.tf would look very similar but set a different backend and a unique prefix.
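For instance, the sandbox version might look like this (the backend values are placeholders for wherever the sandbox account stores its state):

```hcl
terraform {
  backend "s3" {
    bucket = "cloudcity-tfstate" # Placeholder bucket for the sandbox account.
    key    = "path/to/my/key"
    region = "us-east-1"
  }
}

module "main" {
  source   = "../../main"
  app_name = var.app_name
  prefix   = "cloudcity" # Different prefix so global resource names don't collide.
}
```

The module call is identical apart from the prefix; only the backend block points at a different account’s state storage.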
Each account directory provides a place to plan and apply the Terraform run for the respective account. We used a .env file in each directory to automatically set the secret environment variables used by Terraform, so in our example you might run terraform apply like so:
cd accounts/cloudcity
source .env
terraform apply
Once the changes are working in our private Azure, we apply them to the client in the same way.
cd accounts/clientname
source .env
terraform apply
Since the directories are separate, Terraform can continue to use its local directory approach to managing state. By working with, instead of against, the way Terraform uses the working directory, we can use modules and standard Terraform commands to manage multiple accounts.
This pattern also allowed us a further benefit. We could automate the configuration of some of the “handmade” resources provided by our client. The Cloud City main.tf can create other resources before calling the main module, such as resource groups or networking basics. This allowed us to stand up our “blank” Azure environment to mimic the client’s environment.
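For example, the sandbox’s accounts/cloudcity/main.tf could create a resource group that, on the client’s side, the admins provision by hand (the names and location here are assumptions):

```hcl
# Mimic the "handmade" resources the client's admins create for us.
resource "azurerm_resource_group" "app" {
  name     = "cloudcity-app-rg"
  location = "eastus"
}

module "main" {
  source   = "../../main"
  app_name = var.app_name
  prefix   = "cloudcity"
}
```

The main module stays identical; each account directory layers on whatever its environment does not already provide.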
If you add modules to this pattern, I suggest using a separate modules directory to keep most of the Terraform configuration organized as expected. This highlights the unique role of the main directory.
pattern_with_modules
├── accounts
│ ├── clientname
│ │ ├── .env
│ │ ├── .terraform
│ │ ├── .terraform.lock.hcl
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ └── variables.tf
│ └── cloudcity
│ ├── .env
│ ├── .terraform
│ ├── .terraform.lock.hcl
│ ├── main.tf
│ ├── outputs.tf
│ └── variables.tf
├── main
│ ├── main.tf
│ ├── network.tf
│ ├── outputs.tf
│ └── variables.tf
└── modules
├── app
│ ├── main.tf
│ ├── outputs.tf
│ └── variables.tf
└── db
├── main.tf
├── outputs.tf
└── variables.tf
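With that layout, main/main.tf composes the submodules, roughly like this (the module inputs are illustrative):

```hcl
module "db" {
  source = "../modules/db"
  prefix = var.prefix
}

module "app" {
  source   = "../modules/app"
  app_name = var.app_name
  prefix   = var.prefix
}
```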
The main module pattern helps reduce the risk of mistakes, uses plain-old Terraform, and is more easily understood than solutions whose complexity is hidden in separate scripts, external tools or hidden configurations.
Creating abstractions that are readily visible in the repository and that directly expose the multiple-account structure ensures that new engineers can quickly understand the configuration and apply it safely and successfully.