AWS Cost Optimization for Terraform Users: Catch Waste Before It Deploys

Most AWS cost problems are baked into infrastructure before anyone notices. A developer picks m5.xlarge because it was in the last project’s template. Another leaves gp2 as the default volume type because that’s what the Terraform docs showed three years ago. A staging environment gets the same instance sizes as production because nobody questioned the defaults.

By the time these choices show up in your AWS bill, they’ve been running for weeks. The fix is straightforward: catch cost waste at the Terraform layer, before terraform apply ever runs.

This guide covers practical techniques for embedding cost awareness into your Terraform workflow, from HCL patterns to CI/CD integration to policy enforcement.

Why Cost Problems Start in Terraform

Terraform codifies infrastructure decisions. That’s the whole point. But it also codifies cost decisions, and most teams don’t treat them that way.

Here are the patterns that quietly inflate AWS bills:

Copy-paste instance sizing. Someone provisions an m5.2xlarge for a service that peaks at 15% CPU. The next team copies that module. Now you have four services running on oversized instances.

Stale defaults. Terraform’s aws_ebs_volume resource defaults to gp2 if you don’t specify a type. GP2 costs $0.10/GB-month. GP3 costs $0.08/GB-month with better baseline performance (3,000 IOPS and 125 MB/s included). That’s a 20% markup for worse performance, applied to every volume that uses the old default.

No cost visibility in reviews. Pull requests show HCL diffs. They don’t show that changing instance_type = "t3.medium" to instance_type = "r5.4xlarge" adds $750/month. Without cost context, reviewers can’t make informed decisions.

Previous-generation instances. Teams using m5, c5, or t3 families are paying 20-40% more per unit of compute than current-generation Graviton alternatives. An m7g.large costs $0.0816/hr and delivers better performance than an m5.large at $0.096/hr.

Infracost for Pre-Deploy Cost Estimation

Infracost parses terraform plan output, maps resources to AWS pricing, and generates a cost breakdown. It supports over 1,100 Terraform resources across AWS, Azure, and Google Cloud.

Basic usage

# Generate a Terraform plan
terraform plan -out=tfplan.binary
terraform show -json tfplan.binary > plan.json

# Get cost breakdown
infracost breakdown --path plan.json

This produces output showing monthly cost estimates per resource. More useful is the diff command, which compares costs between your current state and proposed changes:

infracost diff --path plan.json

The diff output shows exactly which resources cost more or less, with dollar amounts. This is what should appear in every pull request.

What Infracost catches

A real example. Say you have this change in a PR:

resource "aws_instance" "api" {
-  instance_type = "t3.medium"
+  instance_type = "r5.4xlarge"
   ami           = data.aws_ami.ubuntu.id
}

Infracost will flag this as a jump from roughly $30/month to $730/month. Without Infracost, that change looks like a one-line diff. With it, the cost impact is impossible to miss.

Infracost with usage estimates

For resources with usage-based pricing (S3 requests, Lambda invocations, data transfer), Infracost supports a usage.yml file:

version: 0.1
resource_usage:
  aws_lambda_function.api:
    monthly_requests: 10000000
    request_duration_ms: 200
  aws_s3_bucket.logs:
    standard:
      storage_gb: 500                 # S3 usage is nested by storage class
      monthly_tier_1_requests: 1000000  # PUT, POST, LIST requests

This gives you realistic estimates instead of just the fixed infrastructure costs.

Terraform Cost-Aware Patterns

Some cost wins can be encoded directly in your HCL. These patterns should be standard in every Terraform module.

GP3 over GP2

Every EBS volume should use gp3 unless you have a specific reason not to. The savings are automatic: 20% cheaper storage, better baseline IOPS, and configurable throughput.

resource "aws_ebs_volume" "data" {
  availability_zone = "us-east-1a"
  size              = 100
  type              = "gp3"
  iops              = 3000    # included free, up to 16,000
  throughput        = 125     # included free, up to 1,000 MB/s

  tags = {
    Name        = "app-data"
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}

For volumes that need high IOPS, GP3 saves even more. Provisioning 10,000 IOPS on GP3 costs $35/month. Getting the same IOPS on GP2 requires a 3,334 GB volume at $333/month. That’s a 90% reduction.

Graviton instances

Graviton-based instances (M7g, C7g, R7g, T4g families) deliver 20-40% better price-performance than x86 equivalents. If your workload runs on Linux and doesn’t depend on x86-specific binaries, Graviton should be the default.

variable "instance_type" {
  description = "EC2 instance type. Prefer Graviton (g suffix) for cost efficiency."
  type        = string
  default     = "m7g.large"  # $0.0816/hr vs m5.large at $0.096/hr

  validation {
    condition     = can(regex("^(t4g|m7g|c7g|r7g|m6g|c6g|r6g)", var.instance_type))
    error_message = "Use Graviton instance types for better price-performance."
  }
}

That validation block is aggressive on purpose. It forces a conversation if someone tries to use x86 instances. Remove it if you have legitimate x86 requirements, but make the Graviton path the default.

Spot instances for fault-tolerant workloads

Spot instances offer up to 90% savings over on-demand pricing. They work well for batch jobs, CI/CD runners, stateless API workers, and dev/test environments.

resource "aws_autoscaling_group" "workers" {
  desired_capacity = 4
  max_size         = 8
  min_size         = 2

  mixed_instances_policy {
    instances_distribution {
      on_demand_base_capacity                  = 1
      on_demand_percentage_above_base_capacity = 25
      spot_allocation_strategy                 = "capacity-optimized"
    }

    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.worker.id
        version            = "$Latest"
      }

      override {
        instance_type = "m7g.large"
      }
      override {
        instance_type = "m6g.large"
      }
      override {
        instance_type = "c7g.large"
      }
    }
  }
}

This configuration keeps one on-demand instance as a baseline and runs 75% of additional capacity on Spot. Multiple instance type overrides improve Spot availability.
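
The ASG above references a launch template it doesn't define. A minimal sketch, assuming an arm64 AMI data source named `data.aws_ami.al2023_arm64` (the overrides are all Graviton types, so the AMI must be arm64):

```hcl
resource "aws_launch_template" "worker" {
  name_prefix   = "worker-"
  image_id      = data.aws_ami.al2023_arm64.id  # must be arm64 for Graviton types
  instance_type = "m7g.large"                   # superseded by the ASG overrides

  tag_specifications {
    resource_type = "instance"
    tags = {
      Name      = "spot-worker"
      ManagedBy = "terraform"
    }
  }
}
```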

S3 lifecycle rules

S3 Standard is the most expensive storage tier. If your bucket holds logs, backups, or any data with declining access frequency, lifecycle rules cut costs significantly.

resource "aws_s3_bucket_lifecycle_configuration" "logs" {
  bucket = aws_s3_bucket.logs.id

  rule {
    id     = "archive-old-logs"
    status = "Enabled"

    transition {
      days          = 30
      storage_class = "STANDARD_IA"   # ~45% cheaper
    }

    transition {
      days          = 90
      storage_class = "GLACIER"        # ~80% cheaper
    }

    expiration {
      days = 365
    }
  }
}

Using terraform plan to Catch Expensive Changes

terraform plan is a cost review tool if you treat it as one. Before applying any change, inspect the plan output for these patterns:

Instance type changes. Any change to instance_type should trigger a cost check. Going from t3.medium to t3.xlarge doubles your compute cost.

Volume size increases. EBS volumes only grow, never shrink. A change from 100 GB to 500 GB is permanent and adds $32/month (GP3).

New NAT Gateways. Each NAT Gateway costs $32.40/month plus $0.045/GB for data processing. A plan that adds NAT Gateways across three AZs adds $97/month in fixed costs alone.

RDS Multi-AZ changes. Enabling Multi-AZ doubles your RDS cost. On a db.r6g.xlarge, that’s an extra $540/month.

# Save plan output for review
terraform plan -out=tfplan

# Show the plan in human-readable format
terraform show tfplan

# For automated parsing
terraform show -json tfplan | jq '.resource_changes[]
  | select(.change.actions | any(. == "create" or . == "update"))
  | {address: .address, actions: .change.actions}'

Pair this with Infracost’s diff command to get dollar amounts alongside the resource changes.
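
One of the patterns above, accidental Multi-AZ, can be prevented outright rather than caught in review: gate the flag on the environment variable. A sketch, with the identifier and credential variables as illustrative placeholders:

```hcl
resource "aws_db_instance" "main" {
  identifier        = "app-db"           # illustrative name
  engine            = "postgres"
  instance_class    = "db.r6g.xlarge"
  allocated_storage = 100
  username          = var.db_username    # hypothetical variables
  password          = var.db_password

  multi_az            = var.environment == "prod"  # never doubled outside prod
  skip_final_snapshot = var.environment != "prod"
}
```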

Policy-as-Code for Cost Guardrails

Cost awareness in reviews helps. Automated enforcement prevents mistakes from reaching production.

Open Policy Agent (OPA)

OPA is an open-source, CNCF-graduated policy engine. Combined with Infracost, you can enforce cost limits on every Terraform change.

# policy/cost.rego
package terraform.cost

import rego.v1

# Infracost emits cost fields as JSON strings, so convert before comparing.
deny contains msg if {
  to_number(input.totalMonthlyCost) > 5000
  msg := sprintf("Total monthly cost $%.2f exceeds $5,000 limit", [to_number(input.totalMonthlyCost)])
}

deny contains msg if {
  r := input.projects[_].breakdown.resources[_]
  to_number(r.monthlyCost) > 1000
  msg := sprintf("Resource %s costs $%.2f/month, exceeds $1,000 single-resource limit",
    [r.name, to_number(r.monthlyCost)])
}

deny contains msg if {
  past := to_number(input.pastTotalMonthlyCost)
  past > 0
  to_number(input.diffTotalMonthlyCost) / past > 0.15
  msg := "Cost increase exceeds 15% of current baseline. Requires team lead approval."
}

Run with:

infracost breakdown --path plan.json --format json > cost.json
opa eval --data policy/cost.rego --input cost.json "data.terraform.cost.deny"

tflint for resource-level rules

tflint catches specific HCL anti-patterns. Custom rules can enforce cost hygiene:

# .tflint.hcl
plugin "aws" {
  enabled = true
  version = "0.34.0"
  source  = "github.com/terraform-linters/tflint-ruleset-aws"
}

rule "aws_instance_previous_type" {
  enabled = true
}

rule "aws_db_instance_previous_type" {
  enabled = true
}

These rules flag previous-generation instance types (m4, c4, t2, etc.) that cost more for less performance.

HashiCorp Sentinel

For teams using HCP Terraform (formerly Terraform Cloud), Sentinel provides native policy enforcement:

# sentinel/restrict-instance-size.sentinel
import "tfplan/v2" as tfplan

ec2_instances = filter tfplan.resource_changes as _, rc {
  rc.type is "aws_instance" and
  "create" in rc.change.actions
}

instance_size_check = rule {
  all ec2_instances as _, instance {
    instance.change.after.instance_type not in [
      "m5.4xlarge", "m5.8xlarge", "m5.12xlarge",
      "r5.4xlarge", "r5.8xlarge", "r5.12xlarge",
    ]
  }
}

main = rule { instance_size_check }

Right-Sizing Resources in HCL

The best cost optimization happens at the module level. Build sensible defaults into your Terraform modules so teams start with right-sized resources.

# modules/api-service/variables.tf

variable "environment" {
  type = string
  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}

variable "instance_type" {
  type    = string
  default = null  # Computed from environment
}

locals {
  # Environment-appropriate defaults
  instance_type_map = {
    dev     = "t4g.small"     # $0.0168/hr
    staging = "t4g.medium"    # $0.0336/hr
    prod    = "m7g.large"     # $0.0816/hr
  }

  effective_instance_type = coalesce(
    var.instance_type,
    local.instance_type_map[var.environment]
  )
}

This pattern prevents developers from accidentally deploying production-sized instances in dev. The t4g.small for dev costs $12/month. If someone were to use the production default m7g.large, that’s $60/month per instance. Across 20 dev instances, that’s a $960/month difference.

The coalesce function lets teams override the default when they genuinely need more capacity, while keeping the cost-efficient option as the default path.
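
Calling the module then looks like this. A hypothetical example; the module path and names are illustrative:

```hcl
module "api_dev" {
  source      = "./modules/api-service"
  environment = "dev"
  # instance_type omitted: coalesce falls back to t4g.small
}

module "api_prod" {
  source        = "./modules/api-service"
  environment   = "prod"
  instance_type = "m7g.xlarge"  # explicit override for a heavier workload
}
```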

Tagging Strategy for Cost Allocation

Tags are the bridge between Terraform resources and your AWS bill. Without consistent tags, you can’t attribute costs to teams, services, or environments. Cost allocation reports become useless.

Enforce tags at the Terraform level with a default_tags block:

provider "aws" {
  region = var.region

  default_tags {
    tags = {
      Environment = var.environment
      Team        = var.team
      Service     = var.service_name
      ManagedBy   = "terraform"
      CostCenter  = var.cost_center
    }
  }
}

Then enforce their presence with a validation:

variable "cost_center" {
  type        = string
  description = "Cost center code for billing allocation. Required."

  validation {
    condition     = length(var.cost_center) > 0
    error_message = "cost_center is required for cost allocation."
  }
}

For tflint, add a rule that requires specific tags on expensive resources:

# .tflint.hcl
rule "aws_resource_missing_tags" {
  enabled = true
  tags    = ["Environment", "Team", "CostCenter"]
}

Consistent tagging combined with AWS Cost Explorer’s tag-based filtering lets you answer questions like “how much does the payments team spend on RDS in staging?” Without tags, you’re guessing.

CI/CD Pipeline Integration

The highest-leverage move is making cost checks automatic. Here’s a GitHub Actions workflow that runs Infracost on every pull request:

# .github/workflows/terraform-cost.yml
name: Terraform Cost Check

on:
  pull_request:
    paths:
      - 'infra/**'

jobs:
  cost-check:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write

    steps:
      - uses: actions/checkout@v4

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.9"  # quoted so YAML doesn't parse it as a float

      - name: Terraform Init
        run: terraform init
        working-directory: infra/

      - name: Terraform Plan
        run: terraform plan -out=tfplan.binary
        working-directory: infra/
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}

      - name: Convert Plan to JSON
        run: terraform show -json tfplan.binary > plan.json
        working-directory: infra/

      - name: Setup Infracost
        uses: infracost/actions/setup@v3
        with:
          api-key: ${{ secrets.INFRACOST_API_KEY }}

      - name: Generate Cost Diff
        run: |
          infracost diff \
            --path=infra/plan.json \
            --format=json \
            --out-file=/tmp/infracost.json

      - name: Post Cost Comment
        uses: infracost/actions/comment@v3
        with:
          path: /tmp/infracost.json
          behavior: update

This posts a cost breakdown directly in the pull request. Reviewers see the dollar impact of every infrastructure change without leaving GitHub.

Adding a cost gate

To block merges that exceed a cost threshold, add an OPA evaluation step after the Infracost diff:

- name: Cost Policy Check
  run: |
    opa eval \
      --data policy/cost.rego \
      --input /tmp/infracost.json \
      --format json \
      "data.terraform.cost.deny" > /tmp/policy-result.json

    # deny is a set of messages; any entry means a violation
    if [ "$(jq '.result[0].expressions[0].value | length' /tmp/policy-result.json)" -gt 0 ]; then
      jq -r '.result[0].expressions[0].value[]' /tmp/policy-result.json
      echo "Cost policy violation detected. See Infracost comment for details."
      exit 1
    fi

This turns cost limits from suggestions into hard gates. Changes that exceed your thresholds can’t merge without explicit override.

Drift Detection

Terraform manages the desired state of your infrastructure. But the actual state can diverge. Someone changes an instance type in the console. An Auto Scaling event adds instances that aren’t in state. A manual security group rule opens an expensive data transfer path.

Drift has direct cost implications. An RDS instance manually upgraded from db.r6g.xlarge to db.r6g.4xlarge in the console adds roughly $1,600/month, and Terraform doesn’t know about it.

Detecting drift

# Compare actual infrastructure against Terraform state
terraform plan -detailed-exitcode

# Exit code 2 means drift detected
# Exit code 0 means no changes
# Exit code 1 means error

Run this on a schedule in CI:

# .github/workflows/drift-check.yml
name: Drift Detection

on:
  schedule:
    - cron: '0 6 * * 1' # Every Monday at 6 AM

jobs:
  drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3

      - name: Check for Drift
        id: drift
        run: |
          terraform init
          terraform plan -detailed-exitcode 2>&1 | tee drift-output.txt
          # $? here would report tee's status; PIPESTATUS[0] holds terraform's exit code
          echo "exit_code=${PIPESTATUS[0]}" >> "$GITHUB_OUTPUT"
        continue-on-error: true
        working-directory: infra/

      - name: Alert on Drift
        if: steps.drift.outputs.exit_code == '2'
        run: |
          curl -X POST "${{ secrets.SLACK_WEBHOOK }}" \
            -H 'Content-Type: application/json' \
            -d '{"text":"Infrastructure drift detected. Review drift-output.txt in the latest workflow run."}'

Drift detection catches the cost waste that happens between deployments. But it only tells you about resources Terraform manages. For a full picture of running costs, including resources created outside Terraform, you need runtime visibility.

CostPatrol scans your live AWS environment and catches waste regardless of how resources were created. It flags previous-generation instances, oversized EBS volumes, idle resources, and other cost anomalies across your entire account, not just what’s in your Terraform state.

Terraform Cost Hygiene Checklist

Run through this list before every infrastructure PR:

  • All EBS volumes use gp3 unless io2 is specifically required
  • Instance types use current-generation families (M7g, C7g, R7g, T4g)
  • Graviton (arm64) instances are used where workloads support it
  • Dev and staging environments use smaller instance types than production
  • S3 buckets have lifecycle rules for infrequently accessed data
  • NAT Gateway usage is minimized (VPC endpoints for S3, DynamoDB)
  • Spot instances are used for fault-tolerant workloads
  • All resources have Environment, Team, and CostCenter tags
  • Infracost diff shows no unexpected cost increases
  • terraform plan output reviewed for resource count and type changes
  • No previous-generation instance types (m4, c4, r4, t2, m3)
  • RDS Multi-AZ is only enabled in production
  • CloudWatch log retention is set (not infinite)
  • Unused Elastic IPs, load balancers, and snapshots are cleaned up
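
Two of these items can be encoded directly in HCL. A gateway VPC endpoint is free and keeps S3 traffic off the NAT Gateway, and a finite retention period stops CloudWatch Logs from accumulating forever. A sketch, with the VPC, route table, and log group names as illustrative placeholders:

```hcl
# Gateway endpoints for S3 (and DynamoDB) carry no hourly or per-GB charge.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]
}

resource "aws_cloudwatch_log_group" "app" {
  name              = "/app/api"
  retention_in_days = 30  # the default is never-expire, which accumulates cost
}
```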

CostPatrol: Runtime Complement to Terraform

Terraform controls what gets deployed. But cost optimization doesn’t end at deploy time.

Resources drift. Usage patterns change. A service that needed an r6g.2xlarge at launch might average 8% memory utilization three months later. Savings Plans eligibility shifts as your workload mix evolves. New instance families launch with better price-performance ratios.

This is where runtime cost scanning fills the gap. CostPatrol continuously monitors your AWS environment and surfaces optimization opportunities that Terraform can’t see at plan time:

Utilization-based right-sizing. CostPatrol analyzes CloudWatch metrics to find instances where actual CPU and memory usage is consistently below capacity. A Terraform module can set sensible defaults, but only runtime data tells you if those defaults are still right.

Cross-resource anomaly detection. A sudden spike in NAT Gateway data processing costs or an unexpected jump in DynamoDB read capacity doesn’t show up in terraform plan. CostPatrol detects these anomalies and alerts before they compound.

Full-account coverage. Not everything lives in Terraform. Manually created resources, legacy infrastructure, and resources from other IaC tools all contribute to your bill. CostPatrol scans everything in the account.

The workflow looks like this: Terraform and Infracost prevent cost waste at deploy time. CostPatrol catches the waste that emerges after deployment. Use both, and cost surprises get rare.

CostPatrol runs a free scan that covers the most common optimization checks across EC2, RDS, EBS, Lambda, S3, DynamoDB, CloudWatch Logs, and NAT Gateway. If you’re already using Terraform for cost-aware deployments, adding runtime scanning closes the last gap in your cost optimization pipeline.

See what CostPatrol finds in your AWS account

Free scan shows your total savings. Upgrade to Pro for full findings, fix commands, and daily Slack alerts.