The Ultimate Comparison: Terraform vs. CloudFormation

If you, like me, have spent countless hours online searching for a clear winner between CloudFormation and Terraform as your next infrastructure-as-code (IaC) tool, you’re not alone. I was in the same boat for quite a while. Having now gained substantial experience with both, I feel equipped to make an informed comparison and to choose the right tool for different situations.

TL;DR

For your IaC project on AWS, choose CloudFormation, because:

  1. CloudFormation makes a distinction between code (i.e., templates) and instantiations of the code (i.e., stacks). In Terraform, there is no such distinction. More on this in the next section.
  2. Terraform doesn't handle basic dependency management very well. More on that in a later section.

Comparing Code and Instantiations

One key distinction between CloudFormation and Terraform lies in how they handle the relationship between code and its actual implementation.

In CloudFormation, a stack is the realized infrastructure defined by a template. A single template can be used repeatedly to create identically structured stacks: by the same user, across multiple accounts, or by different users entirely, with template parameters covering the differences between them.
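To make the template/stack distinction concrete, here is a toy Python model. It is not the CloudFormation API; the Template and Stack classes are hypothetical. A template is a reusable definition that takes parameters, and each stack is a named instantiation of it with its own values.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Template:
    """Reusable definition: resource names are format strings filled per stack."""
    resources: tuple[str, ...]

    def render(self, params: dict[str, str]) -> list[str]:
        return [r.format(**params) for r in self.resources]


@dataclass
class Stack:
    """One named instantiation of a template with its own parameter values."""
    name: str
    template: Template
    params: dict[str, str]
    resources: list[str] = field(init=False)

    def __post_init__(self) -> None:
        # The stack name is always available to the template, like AWS::StackName.
        self.resources = self.template.render({"stack": self.name, **self.params})


web = Template(resources=("{stack}-alb", "{stack}-asg-{env}"))

# One template, any number of independent stacks.
staging = Stack("web-staging", web, {"env": "staging"})
prod = Stack("web-prod", web, {"env": "prod"})
print(staging.resources)  # ['web-staging-alb', 'web-staging-asg-staging']
print(prod.resources)     # ['web-prod-alb', 'web-prod-asg-prod']
```

The template is written once; creating another environment means creating another stack, not another copy of the code.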

Terraform lacks this concept, instead requiring a one-to-one mapping between code and its instantiation. Imagine having to duplicate your web server’s source code for every server instance you want to run or copying your entire application code every time you need to execute it instead of just running a compiled binary.

This difference might seem trivial initially, but it can quickly become a significant bottleneck as your infrastructure scales. Terraform requires you to duplicate code every time you want to create a new stack from existing definitions. This copy-pasting of script files introduces a high risk of errors, potentially impacting resources you didn’t intend to modify.

Unlike CloudFormation, Terraform has no stack concept, which reflects its inherent one-to-one relationship between code and managed resources. The introduction of environments (since renamed “workspaces”) offered a partial solution, but their implementation makes accidental deployments to the wrong environment all too easy: you must explicitly select the target workspace with terraform workspace select before deploying, and if you forget, Terraform uses whichever workspace was selected last, which may not be your intended target.
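One way to reduce this risk is a guard in your deployment script that refuses to proceed unless the expected workspace is selected. The Python sketch below assumes, based on Terraform’s on-disk layout (an implementation detail, not a public API), that the selected workspace name is stored in .terraform/environment and that a non-empty TF_WORKSPACE environment variable overrides it. Running terraform workspace show is the supported way to ask Terraform directly; the function names here are hypothetical.

```python
from __future__ import annotations

import os
from pathlib import Path


def current_workspace(project_dir: str | Path) -> str:
    """Best-effort guess at the workspace Terraform would use in project_dir.

    Assumptions: a non-empty TF_WORKSPACE overrides the selection; otherwise
    the selected name is stored in .terraform/environment, and a missing
    file means the "default" workspace.
    """
    override = os.environ.get("TF_WORKSPACE")
    if override:
        return override
    marker = Path(project_dir) / ".terraform" / "environment"
    if marker.is_file():
        return marker.read_text().strip() or "default"
    return "default"


def require_workspace(project_dir: str | Path, expected: str) -> None:
    """Abort a deployment script unless the expected workspace is selected."""
    actual = current_workspace(project_dir)
    if actual != expected:
        raise RuntimeError(f"refusing to deploy: workspace is {actual!r}, expected {expected!r}")
```

A script would call require_workspace(".", "prod") immediately before invoking terraform apply.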

While Terraform modules help alleviate this issue to some extent, they still require a considerable amount of boilerplate code. This problem was so significant that a wrapper tool, Terragrunt, was created to address it specifically.

State Management and Permissions

Another crucial difference lies in how CloudFormation and Terraform handle state management and permissions.

CloudFormation manages stack states automatically, offering no configuration options. While this may seem restrictive, my experience with CloudFormation stack states has been consistently reliable. Furthermore, CloudFormation allows users with limited permissions to manage stacks without needing all the permissions required by the stack’s resources. This is possible because CloudFormation can leverage the permissions granted to a service role associated with the stack, rather than relying solely on the user’s permissions.
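The service-role arrangement can be pictured with a simplified Python model. It is not how IAM evaluation actually works (it ignores resource ARNs, conditions, and explicit denies), and the Principal class and can_update_stack function are hypothetical; it only captures whose permissions matter. The one real requirement it encodes is that handing a role to CloudFormation requires iam:PassRole.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    name: str
    actions: frozenset[str]


def can_update_stack(user: Principal, required: set[str],
                     service_role: Principal | None = None) -> bool:
    """Toy model of who needs which permissions during a stack update.

    Without a service role, the user's own permissions must cover every
    action the stack's resources require. With a service role, the user
    needs only the stack action plus iam:PassRole; the role covers the rest.
    """
    if "cloudformation:UpdateStack" not in user.actions:
        return False
    if service_role is None:
        return required <= user.actions
    return "iam:PassRole" in user.actions and required <= service_role.actions


deployer = Principal("deployer", frozenset({"cloudformation:UpdateStack", "iam:PassRole"}))
role = Principal("stack-role", frozenset({"ec2:RunInstances", "rds:CreateDBInstance"}))
needed = {"ec2:RunInstances", "rds:CreateDBInstance"}
print(can_update_stack(deployer, needed))        # False
print(can_update_stack(deployer, needed, role))  # True
```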

Terraform, on the other hand, requires you to configure a backend for state management. The default option, a local file, is far from ideal due to its limitations:

  1. The integrity of your state file is directly tied to the reliability of the machine storing it.
  2. Collaboration becomes extremely difficult.

Therefore, a robust and shared state mechanism is necessary. On AWS, this is typically achieved using an S3 bucket for storing the state file and a DynamoDB table for handling concurrency.

This means you need to manually create and configure an S3 bucket and DynamoDB table for each stack you instantiate. You also need to manage permissions for these resources to restrict access for less privileged users. While manageable for a handful of stacks, this process becomes increasingly cumbersome as the number of stacks grows.

Furthermore, Terraform workspaces do not support per-workspace DynamoDB tables: all workspaces of a configuration share the same lock table. Unless you set up fine-grained access control with the dynamodb:LeadingKeys condition key, granting an IAM user the minimal permissions needed for deployments also lets them interfere with locks across all workspaces.
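The sharing is easy to see in how the S3 backend lays out state per workspace. This Python sketch assumes the backend’s documented default workspace_key_prefix of env:, and further assumes that each lock item is keyed by the bucket name plus the state object’s path; the bucket and key names are hypothetical. Every workspace lands in the same bucket and the same lock table, and only the item key differs.

```python
def state_object_key(key: str, workspace: str = "default",
                     workspace_key_prefix: str = "env:") -> str:
    """S3 object key holding a workspace's state (assumed S3 backend layout)."""
    if workspace == "default":
        return key
    return f"{workspace_key_prefix}/{workspace}/{key}"


def lock_id(bucket: str, key: str, workspace: str = "default") -> str:
    """Assumed key of the lock item in the backend's single DynamoDB table."""
    return f"{bucket}/{state_object_key(key, workspace)}"


for ws in ("default", "staging", "prod"):
    print(lock_id("acme-tf-state", "network/terraform.tfstate", ws))
# acme-tf-state/network/terraform.tfstate
# acme-tf-state/env:/staging/network/terraform.tfstate
# acme-tf-state/env:/prod/network/terraform.tfstate
```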

Dependency Management

Both CloudFormation and Terraform can present challenges when it comes to managing dependencies. Changing a resource’s logical ID (essentially its name) in either tool will usually result in the old resource being destroyed and a new one created. Therefore, it’s generally inadvisable to change logical IDs, especially for nested stacks in CloudFormation.

As mentioned earlier, Terraform struggles with basic dependency management. Unfortunately, the Terraform developers haven’t prioritized this long-standing issue, even though good workarounds are scarce.

Considering that robust dependency management is paramount for any IaC tool, such shortcomings in Terraform raise concerns about its suitability for critical operations, particularly deployments to production environments. CloudFormation conveys a more professional and dependable experience, reflecting AWS’s commitment to providing production-ready tools. Throughout my years of using CloudFormation, I haven’t encountered any issues related to dependency management.

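CloudFormation allows a stack to export specific output values, which other stacks can then import. However, this functionality is somewhat limited: export names must be unique within an account and region, and there are no namespaces for them, so a template with hard-coded export names can be instantiated only once per region. The usual workaround is to build the stack name into each export name (for example, with Fn::Sub and the AWS::StackName pseudo parameter), which in turn means every consuming stack must be told the producer’s stack name.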
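The collision can be illustrated with a toy registry in Python. ExportRegistry and its methods are hypothetical, not an AWS API; the model encodes only the rule that an export name may be owned by one stack per account and region.

```python
from __future__ import annotations


class ExportRegistry:
    """Toy model of CloudFormation exports: one flat namespace per account and region."""

    def __init__(self) -> None:
        # (region, export name) -> (exporting stack, value)
        self._exports: dict[tuple[str, str], tuple[str, str]] = {}

    def export(self, region: str, name: str, value: str, stack: str) -> None:
        owner = self._exports.get((region, name))
        if owner is not None and owner[0] != stack:
            raise ValueError(f"export {name!r} already exported by stack {owner[0]!r} in {region}")
        self._exports[(region, name)] = (stack, value)

    def import_value(self, region: str, name: str) -> str:
        # Like Fn::ImportValue, the lookup uses the export name alone.
        return self._exports[(region, name)][1]


registry = ExportRegistry()
# Both stacks come from a template that hard-codes the export name "VpcId".
registry.export("us-east-1", "VpcId", "vpc-aaa", stack="network-a")
try:
    registry.export("us-east-1", "VpcId", "vpc-bbb", stack="network-b")
except ValueError as err:
    print(err)  # the second stack in the same region is rejected
registry.export("eu-west-1", "VpcId", "vpc-ccc", stack="network-b")  # another region is fine
```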

Terraform lacks such features, leaving you with less desirable workarounds. You can read another stack’s state (for example, with the terraform_remote_state data source), but doing so requires access to the entire state, including any sensitive secrets stored in it. Alternatively, a stack can export variables through a JSON file stored in an S3 bucket. This approach introduces additional complexity, requiring you to choose an S3 bucket, configure appropriate permissions, and write custom code to handle the export and import processes.
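The custom export and import code for the S3 approach can be fairly small. In this Python sketch, FakeBucket is a hypothetical in-memory stand-in for S3 so the example runs offline, and the exports/<stack>.json key layout is an assumption rather than any Terraform convention. What it illustrates is that the producing stack decides exactly which values to publish, unlike reading its full state.

```python
from __future__ import annotations

import json


class FakeBucket:
    """In-memory stand-in for an S3 bucket; real code would use an AWS SDK."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, key: str, body: bytes) -> None:
        self.objects[key] = body

    def get(self, key: str) -> bytes:
        return self.objects[key]


def publish_outputs(bucket: FakeBucket, stack: str, outputs: dict[str, str],
                    public: set[str]) -> None:
    """Producer side: write only explicitly public outputs, never the whole state."""
    selected = {name: value for name, value in outputs.items() if name in public}
    bucket.put(f"exports/{stack}.json", json.dumps(selected, sort_keys=True).encode())


def read_outputs(bucket: FakeBucket, stack: str) -> dict[str, str]:
    """Consumer side: read another stack's published outputs."""
    return json.loads(bucket.get(f"exports/{stack}.json"))


bucket = FakeBucket()
publish_outputs(bucket, "network", {"vpc_id": "vpc-123", "db_password": "hunter2"},
                public={"vpc_id"})
print(read_outputs(bucket, "network"))  # {'vpc_id': 'vpc-123'}
```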

One advantage Terraform offers is its support for data sources. This allows Terraform to query resources not under its direct management. However, in practice, this has limited relevance when aiming for generic templates, as you cannot make assumptions about the target account. CloudFormation’s equivalent is to introduce more template parameters, potentially increasing repetition and the possibility of errors. However, in my experience, this has never presented a significant problem.

Returning to Terraform’s dependency management issues, another example is encountering an error when updating a load balancer’s configuration:

```
Error: Error deleting Target Group: ResourceInUse: Target group 'arn:aws:elasticloadbalancing:us-east-1:723207552760:targetgroup/strategy-api-default-us-east-1/14a4277881e84797' is currently in use by a listener or a rule

    status code: 400, request id: 833d8475-f702-4e01-aa3a-d6fa0a141905
```

Ideally, Terraform should recognize that the target group is a dependency of another resource that isn’t being deleted and avoid attempting to delete it, rather than failing with an error.
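Why ordering matters here can be shown with a toy Python simulation; it is not Terraform’s planner, and the action names are hypothetical. Deleting a target group fails while a listener still forwards to it, whereas a dependency-aware plan that, if the group must be replaced, creates the replacement, repoints the listener, and only then deletes the old group succeeds.

```python
from __future__ import annotations


class ResourceInUse(Exception):
    """Raised when deleting a target group that a listener still uses."""


def run(plan: list[tuple[str, str]]) -> tuple[str, list[str]]:
    """Execute (action, target) steps against a tiny model of a load balancer.

    The listener starts out forwarding to "tg-old". Deleting a target group
    the listener still references fails, mirroring the AWS error above.
    Returns the listener's final target and the remaining target groups.
    """
    groups = {"tg-old"}
    listener_target = "tg-old"
    for action, target in plan:
        if action == "create":
            groups.add(target)
        elif action == "point_listener":
            if target not in groups:
                raise KeyError(f"no such target group: {target}")
            listener_target = target
        elif action == "delete":
            if listener_target == target:
                raise ResourceInUse(f"{target} is currently in use by a listener")
            groups.discard(target)
        else:
            raise ValueError(f"unknown action: {action}")
    return listener_target, sorted(groups)


naive = [("delete", "tg-old"), ("create", "tg-new"), ("point_listener", "tg-new")]
ordered = [("create", "tg-new"), ("point_listener", "tg-new"), ("delete", "tg-old")]

try:
    run(naive)
except ResourceInUse as err:
    print("naive plan failed:", err)
print(run(ordered))  # ('tg-new', ['tg-new'])
```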

Operations

Although Terraform is a command-line tool, its interactive prompts suggest it was designed primarily for human operators. Running it in batch mode (e.g., from a script) is possible but requires additional command-line arguments, such as -input=false and -auto-approve for terraform apply. This design choice is perplexing for an IaC tool meant for automation.
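For reference, a batch invocation from a Python deployment script might look like the sketch below. The -input=false and -auto-approve flags and the TF_IN_AUTOMATION environment variable are real Terraform options; the wrapper functions are hypothetical.

```python
from __future__ import annotations

import os
import subprocess


def terraform_apply_argv(plan_file: str | None = None) -> list[str]:
    """Build a non-interactive `terraform apply` command line."""
    argv = ["terraform", "apply", "-input=false"]  # never prompt for variables
    if plan_file is None:
        argv.append("-auto-approve")  # skip the interactive "yes" confirmation
    else:
        argv.append(plan_file)  # applying a saved plan does not ask for approval
    return argv


def run_apply(workdir: str, plan_file: str | None = None) -> None:
    # TF_IN_AUTOMATION makes Terraform drop hints about which command to run next.
    env = dict(os.environ, TF_IN_AUTOMATION="1")
    subprocess.run(terraform_apply_argv(plan_file), cwd=workdir, env=env, check=True)
```

Applying a saved plan (from terraform plan -out=...) is the safer batch pattern, because what gets applied is exactly what was reviewed.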

Terraform can be difficult to debug. Error messages are often quite basic and lack sufficient information for troubleshooting. In such cases, you’re forced to run Terraform with TF_LOG=debug, generating a massive amount of output to sift through. To complicate matters further, Terraform occasionally makes API calls to AWS that fail due to external factors unrelated to Terraform itself. In contrast, CloudFormation provides relatively clear error messages with enough detail to pinpoint the root cause.
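When you do have to sift through TF_LOG output, a small filter helps. This Python sketch assumes each log line carries a bracketed level such as [DEBUG] or [WARN], which matches typical Terraform output but is not a documented format; the default keywords are merely examples.

```python
from __future__ import annotations

import re

# Assumed log shape: each line carries a bracketed level such as [DEBUG].
LEVEL = re.compile(r"\[(TRACE|DEBUG|INFO|WARN|ERROR)\]")
RANK = {"TRACE": 0, "DEBUG": 1, "INFO": 2, "WARN": 3, "ERROR": 4}


def interesting_lines(log_text: str, min_level: str = "WARN",
                      keywords: tuple[str, ...] = ("status code", "AccessDenied")) -> list[str]:
    """Keep lines at or above min_level, plus lines mentioning any keyword."""
    kept = []
    for line in log_text.splitlines():
        match = LEVEL.search(line)
        at_level = match is not None and RANK[match.group(1)] >= RANK[min_level]
        if at_level or any(keyword in line for keyword in keywords):
            kept.append(line)
    return kept
```

For example, set TF_LOG=DEBUG and TF_LOG_PATH=terraform.log when running Terraform, then pass the file’s contents to interesting_lines.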

Consider this example of a Terraform error message:

```
Error: error reading S3 bucket Public Access Block: NoSuchBucket: The specified bucket does not exist

    status code: 404, request id: 19AAE641F0B4AC7F, host id: rZkgloKqxP2/a2F6BYrrkcJthba/FQM/DaZnj8EQq/5FactUctdREq8L3Xb6DgJmyKcpImipv4s=
```

Although this message reports that the bucket does not exist, it doesn’t reflect the underlying problem, which in this case was a permissions issue.

This error also shows how Terraform can get stuck. Suppose you create an S3 bucket and an aws_s3_bucket_public_access_block resource for that bucket, then modify your Terraform code in a way that forces the bucket to be deleted and recreated (e.g., due to the “change implies delete and create” behavior mentioned earlier). Terraform will then repeatedly try, and fail, to read the aws_s3_bucket_public_access_block resource, when the expected behavior would be to update or delete that resource accordingly.

Lastly, you cannot leverage CloudFormation helper scripts with Terraform. This might be an inconvenience, especially if you intend to use cfn-signal, which informs CloudFormation when an EC2 instance has finished initializing and is ready to handle requests.

Syntax, Community, and Rollbacks

One notable advantage of Terraform’s syntax over CloudFormation is its support for loops. However, my experience suggests that this feature can introduce risks. Loops are typically used to create multiple identical resources; however, updating the stack with a different loop count might require linking old and new resources. For example, you might need to use zipmap() to combine values from two arrays of different sizes (one with the old loop size and the other with the new one). While this issue can occur without loops, their presence tends to obfuscate the problem.
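The risk is easiest to see with a simplified model of a count-style loop, in which each instance is identified only by its position (as with Terraform’s count meta-argument). The function below is a hypothetical planner, not Terraform’s: it compares old and new lists index by index, showing how removing one element silently changes every instance after it.

```python
from __future__ import annotations


def plan_count_change(old: list[str], new: list[str]) -> dict[str, list[int]]:
    """Compare index-addressed instances, as with a count-based loop.

    Instance i is identified only by its index, so it is kept if its value
    is unchanged, changed if the value at that index differs, and created
    or destroyed when the list grows or shrinks.
    """
    plan: dict[str, list[int]] = {"keep": [], "change": [], "create": [], "destroy": []}
    for i in range(max(len(old), len(new))):
        if i >= len(old):
            plan["create"].append(i)
        elif i >= len(new):
            plan["destroy"].append(i)
        elif old[i] == new[i]:
            plan["keep"].append(i)
        else:
            plan["change"].append(i)
    return plan


# Removing the first element shifts every later one down by one index.
print(plan_count_change(["a", "b", "c"], ["b", "c"]))
# {'keep': [], 'change': [0, 1], 'create': [], 'destroy': [2]}
```

Nothing about "c" changed, yet in this model its instance is destroyed and the others are modified, which is exactly the kind of side effect a loop makes hard to spot.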

The preference between Terraform’s and CloudFormation’s syntax is subjective. While CloudFormation initially only supported JSON, which can be difficult to read, it now also supports the more readable YAML format with comments. However, CloudFormation’s syntax can be quite verbose.

Terraform uses HCL, a JSON-like configuration language with its own idiosyncrasies. It offers a wider range of functions compared to CloudFormation, and they are generally easier to understand. Therefore, one could argue that Terraform has a slight edge in this regard.

Another advantage for Terraform is its extensive collection of community-maintained modules, simplifying template creation. However, these modules might not always meet the security requirements of certain organizations. Therefore, reviewing these modules (including future updates) might be necessary for security-conscious organizations.

Generally speaking, Terraform modules offer greater flexibility than CloudFormation nested stacks. Nested stacks tend to obscure their internal details. When performing an update operation from the parent stack, CloudFormation will indicate that the nested stack requires an update but won’t provide specifics about the changes within the nested stack.

Finally, it’s worth noting that CloudFormation attempts to roll back deployments that encounter errors. While this is a valuable feature, it can be time-consuming. For instance, CloudFormation might take up to three hours to determine that a deployment to Elastic Container Service has failed. In contrast, Terraform simply halts execution upon encountering an error. Whether a rollback mechanism is beneficial is debatable; however, I’ve come to appreciate the effort to maintain a working stack whenever possible, even if it means a longer wait time.
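The two failure-handling strategies can be compared with a toy Python model. Each step carries its own do and undo action; with rollback enabled, completed steps are undone in reverse order, loosely like CloudFormation rolling back a new stack, while without it execution simply stops, loosely like Terraform. Real rollbacks are far more involved (an update rolls back to the previous template, for instance), so treat this only as an illustration of the trade-off.

```python
from typing import Callable, List, Tuple

# (name, do, undo): each step knows how to apply and revert itself.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def deploy(steps: List[Step], rollback: bool) -> List[str]:
    """Apply steps in order and return the names of steps left in place.

    On failure, rollback=True reverts completed steps in reverse order;
    rollback=False simply stops, leaving earlier steps applied.
    """
    done: List[Step] = []
    for step in steps:
        _name, do, _undo = step
        try:
            do()
        except Exception:
            if rollback:
                for _n, _d, undo in reversed(done):
                    undo()
                return []
            return [n for n, _, _ in done]
        done.append(step)
    return [n for n, _, _ in done]


def noop() -> None:
    pass


def fail() -> None:
    raise RuntimeError("ECS service did not stabilize")


steps = [("vpc", noop, noop), ("database", noop, noop), ("ecs-service", fail, noop)]
print(deploy(steps, rollback=True))   # []
print(deploy(steps, rollback=False))  # ['vpc', 'database']
```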

Arguments for Terraform over CloudFormation

Terraform does offer some advantages over CloudFormation. A key benefit, in my opinion, is its comprehensive display of all proposed changes before applying an update, including drilling down into utilized modules. In contrast, CloudFormation, when using nested stacks, only indicates that a nested stack needs updating without providing detailed insights into the specific changes within. This lack of transparency can be frustrating, as such information is crucial before initiating a deployment.

Both CloudFormation and Terraform support extensions. CloudFormation allows managing “custom resources” using custom AWS Lambda functions. However, Terraform extensions are much simpler to develop and integrate directly into the code, giving Terraform an advantage in this area.
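For a sense of what a CloudFormation custom resource involves, here is a minimal Lambda handler sketch in Python. It follows the documented custom-resource contract: CloudFormation sends a Create, Update, or Delete request and waits for a JSON response PUT to a pre-signed ResponseURL. The do_work function is a hypothetical placeholder for the resource’s real logic, and in practice AWS’s own cfn-response helper module often replaces the hand-written PUT.

```python
from __future__ import annotations

import json
import urllib.request


def build_response(event: dict, status: str, reason: str = "",
                   data: dict | None = None) -> dict:
    """Response body CloudFormation expects from a custom resource."""
    return {
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason or "See the function's logs for details",
        # Reuse the existing ID on Update/Delete so CloudFormation does not
        # treat the resource as replaced. Falling back to the logical ID on
        # Create is a simplification; a real resource should return its own ID.
        "PhysicalResourceId": event.get("PhysicalResourceId", event["LogicalResourceId"]),
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    }


def do_work(request_type: str, properties: dict) -> dict:
    """Hypothetical placeholder for the resource's real Create/Update/Delete logic."""
    return {"Echo": str(properties.get("Message", ""))} if request_type != "Delete" else {}


def handler(event: dict, context: object) -> None:
    try:
        data = do_work(event["RequestType"], event.get("ResourceProperties", {}))
        body = build_response(event, "SUCCESS", data=data)
    except Exception as exc:  # report failures instead of letting the stack wait
        body = build_response(event, "FAILED", reason=str(exc))
    # CloudFormation waits for this PUT to the pre-signed ResponseURL.
    request = urllib.request.Request(
        event["ResponseURL"],
        data=json.dumps(body).encode(),
        method="PUT",
        headers={"Content-Type": ""},  # empty content type, as cfn-response sends
    )
    urllib.request.urlopen(request)
```

Even this minimal version needs its own deployment, IAM role, and error handling, which is part of why Terraform extensions can feel simpler by comparison.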

Terraform’s support for multiple cloud vendors allows it to unify deployments across different cloud platforms. For instance, if you have a single workload spanning both AWS and Google Cloud Platform (GCP), you would typically manage the AWS portion using CloudFormation and the GCP portion using GCP’s Cloud Deployment Manager. With Terraform, you can use a single script to manage both stacks, simplifying deployments and management.

Invalid Arguments in the Terraform vs. CloudFormation Debate

Several misconceptions persist around the internet regarding the Terraform vs. CloudFormation debate. One common argument is that Terraform’s multi-cloud nature makes it a single tool for all deployments, regardless of the cloud platform. While technically true, this argument loses its weight when dealing with typical single-cloud projects. The reality is that there’s often a near one-to-one mapping between resources defined in CloudFormation and their equivalents in Terraform. Since you need to understand cloud-specific resources either way, the primary difference boils down to syntax, which isn’t the most significant challenge in infrastructure management.

Another claim is that Terraform helps avoid vendor lock-in. This argument is flawed because using Terraform locks you into HashiCorp (Terraform’s creator), just as using CloudFormation ties you to AWS, and so on for other platforms.

While Terraform modules are generally easier to use, I believe this is a minor factor. First, the absence of a comparable CloudFormation repository is likely deliberate: AWS probably avoids hosting community-contributed templates to mitigate the risk of security vulnerabilities and compliance issues arising from user-generated content.

Second, while I recognize the value of libraries in software development, where they often comprise tens of thousands of lines of code, IaC codebases tend to be significantly smaller, with modules typically consisting of a few dozen lines. In this context, copy-pasting code snippets is an acceptable practice: it allows for easy customization and avoids compatibility issues and reliance on third-party security.

While generally discouraged in software development, copy-pasting code snippets in IaC allows for tailoring code to your specific needs without needing to create generic libraries, which can be time-consuming to maintain. The maintenance overhead for these snippets is usually minimal unless your code is replicated across numerous templates (e.g., a dozen or more). In such cases, refactoring the code into nested stacks is a better approach, and the benefits of reducing code duplication outweigh the inconvenience of limited visibility into nested stack updates during deployments.

CloudFormation vs. Terraform: Concluding Thoughts

With CloudFormation, AWS aims to provide its customers with a robust and reliable IaC tool that consistently performs as expected. Terraform’s team strives for the same; however, it seems that a critical aspect, dependency management, hasn’t received the attention it deserves.

Terraform might be a valuable asset for projects with multi-cloud architectures, as it can unify resource management across different cloud providers. Even then, you could mitigate Terraform’s downsides by using it solely to orchestrate stacks already implemented using their respective cloud-specific IaC tools.

Overall, when comparing CloudFormation vs. Terraform, CloudFormation emerges as the more professional and dependable choice despite its imperfections. I highly recommend it for any project that isn’t explicitly multi-cloud.

Licensed under CC BY-NC-SA 4.0