Hackfoofery

Alson Kemp

Organizing Terraform Projects

At Teckst, we use Terraform for all configuration and management of infrastructure.  The tidal boundary at the intersection of infrastructure and application configuration is largely determined by which kinds of applications will be deployed on which kinds of infrastructure.  Standing up a bunch of customized EC2 instances (e.g. Cassandra)?  That is probably something Ansible or Chef is better suited to, as they’re built to step into a running instance and create and/or update its configuration (though, certainly, this can be done via Terraform and EC2 User Data scripts).

Teckst uses 100% AWS services and 100% Lambda for compute, so we have a much more limited need.  We need Lambda Functions, API Gateways, SQS Queues, S3 Buckets, IAM Users, etc. to be created and wired together; thereafter, our Lambda Functions are uploaded by our CI system and run over the configured AWS resources.  In this case, Terraform is perfect for us as it walks our infrastructure right up to the line at which our Lambda Functions take over.

Terraform’s documentation provides little in the way of guidance on structuring larger Terraform projects.  The docs do talk about modules and outputs, but no fleshed-out examples are provided for how you should structure your project.  That said, many guides are available on the web ([1][2][3] are the top three Google results as of this writing).

Terraform Modules

Terraform Modules allow you to create modules of infrastructure which accept/require specific Variables and yield specific Outputs.  Besides being a great way to utilize third-party scripts (e.g. a script you find on GitHub to build a fully configured EC2 instance with Nginx fronting a Django application), Modules allow a clean, logical separation between environments (e.g. Production and Staging).  A good example of organizing a project using Terraform Modules is given in this blog post.  Initially, we approached organizing scripts similarly:

/prod/main.tf
/prod/vpc.tf - production configs for VPC module 
/staging/main.tf
/staging/vpc.tf - staging configs for VPC module
/modules/vpc/main.tf - contains the generic VPC module definition
/modules/vpc/variables.tf
/modules/vpc/outputs.tf

Now all of our prod configuration values are separate from our staging configuration values.  The prod and staging scripts can both reference our generic vpc Module.  Initially, this seemed like a huge win.  Read on to find out why it might not be a win for in-house-defined infrastructure.
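As a rough sketch of what this layout implies, /prod/vpc.tf might contain little more than a module block that points at the shared definition and passes in production values.  The variable names (environment, cidr_block) and the CIDR value are hypothetical, not taken from our real scripts, and the syntax assumes Terraform 0.12 or later:

```hcl
# /prod/vpc.tf (sketch): production values handed to the shared vpc Module.
module "vpc" {
  source = "../modules/vpc"

  # Hypothetical variable names; each one must match a declaration
  # in /modules/vpc/variables.tf exactly.
  environment = "prod"
  cidr_block  = "10.0.0.0/16"
}
```

Under the same assumptions, /staging/vpc.tf would be the same block with staging values.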

Terraform Modules and First-Party Scripts

While Terraform Modules are a great way to use third-party Terraform scripts to build your infrastructure, they felt awkward when using only first-party scripts/definitions.  Other ways of organizing the scripts seemed to rely on Copy-and-Paste, and that’s a Bad Thing, right?  But what if Copy-and-Paste is unavoidable?

Each Terraform Module defines a set of Variables it accepts and a list of Outputs it exposes.  In order to utilize Modules in first-party circumstances, you will need to determine a naming scheme for the Variables and Outputs (when using a third-party module, you must adopt their naming scheme).  Any mismatches between the names of Variables or Outputs will yield baffling error messages.  So a lot of time is spent threading Variables up into sub-Modules only to grab and thread Outputs from those sub-Modules down and then up into other sub-Modules via other Variables.  The best example of this is VPC information: nearly every Module will need to have the VPC.id threaded up into it.  The safest way to make sure your Variable declarations line up with the Module’s Variable requirements is to Copy-and-Paste the names to and from the Modules.  Consider your VPC and Route53 scripts:

/main.tf
/terraform.tfvars
/vpc/main.tf
/vpc/outputs.tf
/vpc/variables.tf
/route53/main.tf
/route53/outputs.tf
/route53/variables.tf

We’ll need to:

  1. In main.tf: reference the vpc Module, passing in appropriate Variables.
  2. In /vpc/variables.tf: define Variables this Module will accept.
  3. In /vpc/outputs.tf: map the Module’s configuration values to the Module’s Outputs.
  4. In /vpc/main.tf: create the VPC using the Variables specified in /vpc/variables.tf.
  5. In /main.tf: reference the route53 Module, passing in as Variables the Outputs retrieved from the vpc Module.
  6. Then do Steps 2-5 for the /route53 Module.
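The steps above can be sketched roughly as follows, with all the files shown in one listing for brevity.  The variable and output names (cidr_block, vpc_id, private_domain) and the CIDR value are assumptions made for illustration; the syntax assumes Terraform 0.12 or later and an AWS provider recent enough to support the vpc block on aws_route53_zone:

```hcl
# /vpc/variables.tf: inputs the vpc Module accepts (step 2)
variable "cidr_block" {
  type = string
}

# /vpc/main.tf: create the VPC from those inputs (step 4)
resource "aws_vpc" "main" {
  cidr_block = var.cidr_block
}

# /vpc/outputs.tf: expose values to the caller (step 3)
output "vpc_id" {
  value = aws_vpc.main.id
}

# /route53/variables.tf: the route53 Module re-declares everything it needs
variable "vpc_id" {
  type = string
}

variable "private_domain" {
  type = string
}

# /route53/main.tf: a private zone attached to the VPC passed in
resource "aws_route53_zone" "private" {
  name = var.private_domain

  vpc {
    vpc_id = var.vpc_id
  }
}

# /main.tf: reference both Modules and thread the vpc Output
# into the route53 Variable (steps 1 and 5)
module "vpc" {
  source     = "./vpc"
  cidr_block = "10.0.0.0/16"
}

module "route53" {
  source         = "./route53"
  vpc_id         = module.vpc.vpc_id
  private_domain = "not-real.internal"
}
```

Note that the name vpc_id appears in three places (the Output, the Variable declaration, and the module argument), and every occurrence has to agree.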

The above has three significant issues:

  1. It’s tedious.  That’s something like 11 steps to create a VPC and Route53 domain.  That’s about 8 lines in a single Terraform script…  Here we’ve got 2-3 times that scattered across 7 files.
  2. It’s error-prone.  Didn’t get your naming convention right for one of the Outputs of some Module?  That’s going to happen.  And the Terraform errors aren’t too helpful in finding the issue.
  3. It’s based on Copy-and-Paste!  The root of all evil!  Sure, you don’t need to Copy-and-Paste but you’re going to do so as, for the hundredth time, you write into a Variable declaration the name of a Module’s Output…  And as you misspell the Output’s name…

Less-Bad Terraform Organization

So I’m not sure we can avoid Copy-and-Paste.  If we can’t avoid it, can we organize our Terraform scripts in such a way that we minimize likely harm?  We approached it as follows, attempting to:

  • Strictly limit the surface area of the production and staging configurations so that it was obvious which was which.
  • Eliminate all differences between production and staging outside of those configuration areas.
  • Make clear how to migrate staging infrastructure alterations forward to production infrastructure.
  • Make clear where Copy-and-Paste is allowed.
  • Simplify our usage of Terraform.

We wound up with the following (very simplified) organization:

/prod/vpc.tf
/prod/route53.tf
/prod/variables.tf
/staging/vpc.tf
/staging/route53.tf
/staging/variables.tf

That’s it.  What does this get us?

  • Very simple Terraform usage.  No need to trace Variables and Outputs up/down through Modules.  route53.tf can directly reference aws_vpc.main.id.
  • Very limited surface area for configuration.  All environment specific configuration values go in variables.tf.  It’s easy to compare the prod and staging variables.tf to see new or changed variables.
  • Simple migration to prod.  Just meld (or diff) staging and production.  We know the variables.tf will differ, so, with suitable review, the content of staging can be quickly and easily pulled into production.
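A minimal sketch of the flat layout, assuming Terraform 0.12 or later syntax and a recent AWS provider: root_private_domain is one of the variables in our variables.tf, while vpc_cidr_block and its value are hypothetical and included only so the sketch is complete.

```hcl
# /staging/variables.tf (excerpt): vpc_cidr_block is a hypothetical variable.
variable "vpc_cidr_block" {
  default = "10.1.0.0/16"
}

variable "root_private_domain" {
  default = "not-real.internal"
}

# /staging/vpc.tf
resource "aws_vpc" "main" {
  cidr_block = var.vpc_cidr_block
}

# /staging/route53.tf: references the VPC directly, with no Outputs
# or module Variables in between.
resource "aws_route53_zone" "private" {
  name = var.root_private_domain

  vpc {
    vpc_id = aws_vpc.main.id
  }
}
```

Because every .tf file in a directory is loaded as one configuration, route53.tf can see aws_vpc.main without any wiring, and only variables.tf needs to differ between /staging and /prod.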

An example of the content of /staging/variables.tf:

########################
# ACCOUNT SETTINGS
########################
variable "environment"    {default = "staging"}
variable "account_id"     {default = "acct-staging"}
variable "account_number" {default = 123456789 }
variable "aws_region"     {default = "us-east-1"}

########################
# DOMAIN SETTINGS
########################
variable "root_domain"         {default = "not-real.com"}
variable "root_private_domain" {default = "not-real.internal"}

Additional Considerations/Practices

Is it that simple?  Not quite, but it’s close.  We have some development resources in our staging environment that don’t exist in our production environment (looking at you, crazy MS SQL Server instance) but it’s pretty obvious to anyone involved in the infrastructure that those resources are staging-only.

Currently, one or two people do all of the infrastructure development, so we haven’t really wrestled with conflicts in the terraform.tfstate but that will be an issue regardless of the organization of Terraform files.

Written by alson

February 14th, 2018 at 11:26 am

Posted in Turbinado
