GCP Instance Scheduler using Terraform

Managing cloud resources efficiently is crucial for optimizing costs and ensuring that resources are only utilized when needed. One common requirement is to automatically shut down virtual machines (VMs) during non-working hours to save costs. In this blog post, we will walk through how to set up a Google Cloud Platform (GCP) Instance Scheduler using Terraform, which will automatically turn off labeled VMs at a specified time every day.

Introduction

Google Cloud Scheduler, Pub/Sub, and Cloud Functions can be orchestrated together to create an automated instance scheduler. With Terraform, this setup becomes manageable and reproducible. Here, we will define Terraform configurations to:

  1. Create a Pub/Sub topic.

  2. Set up a Cloud Scheduler job.

  3. Deploy a Cloud Function to stop VMs based on labels.

  4. Manage IAM roles and permissions.

Terraform Configuration

Let's break down the Terraform files used in this setup:

1. Variables Definition (variables.tf)

variable "gcp_project" {
  default = "your gcp project_id here"
}

variable "cron_pattern" {
  default = "59 23 * * *" # set to every day, at 23:59
}

variable "scheduler_function_bucket" {
  default = "your bucket name here"
}

variable "label_key" {
  default = "instance-scheduler"
}

variable "label_value" {
  default = "enabled"
}

In the variables.tf file, we define variables to hold values for the GCP project ID, the cron pattern for scheduling, the name of the storage bucket, and the labels used to identify the VMs to be managed.

2. Provider Configuration (provider.tf)

provider "google" {
  project = var.gcp_project
  region  = "us-central1"
}

This configures the Google provider with the specified project ID and region.

3. Main Terraform Configuration (main.tf)

Pub/Sub Topic

resource "google_pubsub_topic" "topic" {
  name = "instance-scheduler-topic"
}

Creates a Pub/Sub topic which the Cloud Scheduler job will publish messages to.

Cloud Scheduler Job

resource "google_cloud_scheduler_job" "cr_job" {
  name        = "instance-scheduler"
  description = "Cloud Scheduler to turn off labeled VMs."
  schedule    = var.cron_pattern

  pubsub_target {
    topic_name = google_pubsub_topic.topic.id
    data       = base64encode("foo, bar..")
  }
}

Sets up a Cloud Scheduler job that fires according to the cron pattern and publishes a message to the Pub/Sub topic. Because no time_zone is set on the job, the schedule is interpreted in UTC, so 23:59 here means 23:59 UTC.
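To sanity-check what the default cron pattern means in practice, here is a minimal sketch that computes the next firing time for a pattern of the form "M H * * *". It assumes the schedule is evaluated in UTC and only handles a fixed minute and hour; next_daily_run is a hypothetical helper for illustration, not part of the Terraform setup.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(cron_pattern, now):
    """Return the next run time for a daily 'M H * * *' cron pattern.

    Only a fixed minute and hour with wildcard day, month and weekday
    are supported; anything else raises ValueError.
    """
    fields = cron_pattern.split()
    if len(fields) != 5 or fields[2:] != ["*", "*", "*"]:
        raise ValueError(f"unsupported cron pattern: {cron_pattern!r}")
    minute, hour = int(fields[0]), int(fields[1])
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # If today's slot has already passed, the job fires tomorrow.
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(next_daily_run("59 23 * * *", now))  # 2024-01-01 23:59:00+00:00
```

If your working hours are defined in a local time zone, convert the shutdown time to the zone the job actually runs in before settling on a pattern.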

Storage Bucket for Cloud Function

resource "google_storage_bucket" "bucket" {
  name     = var.scheduler_function_bucket
  location = "US" # required by recent Google provider versions; adjust to your region
}

resource "google_storage_bucket_object" "archive" {
  name   = "function.zip"
  bucket = google_storage_bucket.bucket.name
  source = "gcp_function/function.zip"
}

Defines a Google Cloud Storage bucket and uploads the Cloud Function code as a ZIP file.
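The ZIP file referenced above has to exist before terraform apply runs. As a rough sketch, it could be built with Python's zipfile module. The helper name build_function_zip and the file list (main.py plus a requirements.txt listing the function's dependencies) are assumptions for illustration; adjust them to match your gcp_function directory.

```python
import zipfile
from pathlib import Path


def build_function_zip(source_dir, zip_path, filenames=("main.py", "requirements.txt")):
    """Write the given files from source_dir into zip_path at the archive root."""
    source_dir = Path(source_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for filename in filenames:
            # arcname keeps each file at the top level of the archive,
            # which is where Cloud Functions expects to find main.py.
            archive.write(source_dir / filename, arcname=filename)
    return zip_path
```

Calling build_function_zip("gcp_function", "gcp_function/function.zip") would produce the archive at the path the google_storage_bucket_object resource reads from.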

Service Account and IAM Roles

resource "google_service_account" "svc_acc" {
  account_id   = "instance-scheduler-svc-acc"
  display_name = "instance-scheduler-svc-acc"
}

resource "google_project_iam_custom_role" "svc_acc_custom_role" {
  role_id     = "instance.scheduler"
  title       = "Instance Scheduler Role"
  description = "Ability to turn off instances with a specific label."
  permissions = [
    "compute.instances.list",
    "compute.instances.stop",
    "compute.zones.list",
  ]
}

resource "google_project_iam_member" "svc_acc_iam_member" {
  project = var.gcp_project
  role    = "projects/${var.gcp_project}/roles/${google_project_iam_custom_role.svc_acc_custom_role.role_id}"
  member  = "serviceAccount:${google_service_account.svc_acc.email}"

  depends_on = [
    google_service_account.svc_acc
  ]
}

Creates a service account and assigns it a custom role with permissions to list and stop instances, and list zones.

Cloud Function

resource "google_cloudfunctions_function" "instance_scheduler_function" {
  name                  = "instance-scheduler-function"
  available_memory_mb   = 128
  source_archive_bucket = google_storage_bucket.bucket.name
  source_archive_object = google_storage_bucket_object.archive.name
  runtime               = "python38"
  description           = "Cloud function to do the instance scheduling."

  event_trigger {
    event_type = "google.pubsub.topic.publish"
    resource   = google_pubsub_topic.topic.name
    failure_policy {
      retry = false
    }
  }

  timeout               = 180
  entry_point           = "instance_scheduler_start"
  service_account_email = google_service_account.svc_acc.email

  environment_variables = {
    PROJECT     = var.gcp_project
    LABEL_KEY   = var.label_key
    LABEL_VALUE = var.label_value
  }

  depends_on = [
    google_service_account.svc_acc
  ]
}

Deploys a Cloud Function that is triggered by messages published to the Pub/Sub topic. It uses the service account and stops instances based on the specified labels.
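The function in this post ignores the body of the Pub/Sub message and treats it purely as a trigger. If you ever want to pass instructions through the scheduler's data field, the payload arrives base64-encoded under the event's "data" key. The sketch below shows one way to read it; decode_pubsub_message is a hypothetical helper, and the sample event is hand-built to mimic what the scheduler job would publish.

```python
import base64


def decode_pubsub_message(event):
    """Return the Pub/Sub message payload as text, or '' if there is none."""
    data = event.get("data")
    if not data:
        return ""
    return base64.b64decode(data).decode("utf-8")


# Simulate the event the function would receive from the scheduler job.
sample_event = {"data": base64.b64encode(b"foo, bar..").decode("ascii")}
print(decode_pubsub_message(sample_event))  # foo, bar..
```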

Python Cloud Function Code (main.py)

The Cloud Function is implemented in Python to authenticate with the GCP API, list instances, and stop those that match the specified labels.

import base64
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials
import os

API_VERSION = 'v1'
RESOURCE_TYPE = 'compute'

def authenticate():
    # Build a Compute Engine API client from the application default credentials.
    # Errors propagate to the caller so that a failed run is visible in the logs.
    credentials = GoogleCredentials.get_application_default()
    return discovery.build(RESOURCE_TYPE, API_VERSION, credentials=credentials, cache_discovery=False)

def gather_zones(project, service):
    # Return the names of all zones visible to the project.
    zones = service.zones().list(project=project).execute()
    return [zone['name'] for zone in zones.get('items', [])]

def turn_instance_off(project, service, instance, zone):
    # Stop a single VM; log a failure and carry on so the remaining VMs are still processed.
    try:
        service.instances().stop(project=project, zone=zone, instance=instance).execute()
        print(f"Successfully turned off VM {instance} in project {project}, zone {zone}.")
    except Exception as error:
        print(f"Failed to turn off VM {instance} in zone {zone}: {error}")

def locate_instances(project, service, zones, label_key, label_value):
    # Stop every RUNNING instance in each zone that carries the given label.
    for zone in zones:
        instances = service.instances().list(project=project, zone=zone, filter=f"labels.{label_key}={label_value}").execute()
        for instance in instances.get('items', []):
            if instance['status'] == "RUNNING":
                turn_instance_off(project, service, instance['name'], zone)

def instance_scheduler_start(event, context):

    project = os.environ.get('PROJECT')
    label_key = os.environ.get('LABEL_KEY')
    label_value = os.environ.get('LABEL_VALUE')

    service = authenticate()

    zones = gather_zones(project, service)

    locate_instances(project, service, zones, label_key, label_value)

Explanation of the Python Cloud Function Code

  1. Authentication: The authenticate function sets up authentication using default application credentials.

  2. Gather Zones: The gather_zones function retrieves a list of zones in the project.

  3. Turn Instance Off: The turn_instance_off function stops a VM instance.

  4. Locate Instances: The locate_instances function lists the instances in each zone that carry the specified label and stops those whose status is RUNNING.

  5. Entry Point: The instance_scheduler_start function is the entry point for the Cloud Function, retrieving environment variables and coordinating the process.
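Before pointing the function at a real project, it can help to exercise the list-and-stop flow against a stand-in for the Compute API client. The sketch below is an assumption-laden dry run: FakeService, FakeInstances, and FakeRequest are hypothetical test doubles that mimic only the calls used in this post, and stop_running_instances mirrors the logic of locate_instances rather than importing it.

```python
class FakeRequest:
    """Mimics an API request object whose execute() returns the response."""

    def __init__(self, response):
        self._response = response

    def execute(self):
        return self._response


class FakeInstances:
    def __init__(self, instances_by_zone):
        self._instances_by_zone = instances_by_zone
        self.stopped = []

    def list(self, project, zone, filter):
        # Emulate the server-side filter of the form labels.<key>=<value>.
        key, _, value = filter[len("labels."):].partition("=")
        items = [
            vm for vm in self._instances_by_zone.get(zone, [])
            if vm.get("labels", {}).get(key) == value
        ]
        # Like the real API, omit 'items' when nothing matches.
        return FakeRequest({"items": items} if items else {})

    def stop(self, project, zone, instance):
        self.stopped.append((zone, instance))
        return FakeRequest({})


class FakeService:
    def __init__(self, instances_by_zone):
        self._instances = FakeInstances(instances_by_zone)

    def instances(self):
        return self._instances


def stop_running_instances(project, service, zones, label_key, label_value):
    """Same flow as locate_instances: list per zone, stop RUNNING matches."""
    label_filter = f"labels.{label_key}={label_value}"
    for zone in zones:
        response = service.instances().list(
            project=project, zone=zone, filter=label_filter
        ).execute()
        for instance in response.get("items", []):
            if instance["status"] == "RUNNING":
                service.instances().stop(
                    project=project, zone=zone, instance=instance["name"]
                ).execute()


labeled = {"instance-scheduler": "enabled"}
service = FakeService({
    "us-central1-a": [
        {"name": "vm-1", "status": "RUNNING", "labels": labeled},
        {"name": "vm-2", "status": "TERMINATED", "labels": labeled},
        {"name": "vm-3", "status": "RUNNING"},
    ],
})
stop_running_instances(
    "demo-project", service, ["us-central1-a", "us-central1-b"],
    "instance-scheduler", "enabled",
)
print(service.instances().stopped)  # [('us-central1-a', 'vm-1')]
```

Only vm-1 is stopped: vm-2 is already terminated and vm-3 lacks the label, which is the behavior the scheduler relies on.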

Conclusion

This Terraform setup, combined with the Cloud Function, creates a robust solution to automatically manage VM instances in GCP. By scheduling shutdowns of labeled instances, you can optimize your resource usage and reduce costs. The use of Terraform ensures that the configuration is version-controlled and easily reproducible.

For the full code and detailed implementation, you can visit the GCP Instance Scheduler using Terraform repository on GitHub. This repository contains all the Terraform configuration files and the Python Cloud Function code discussed in this blog post. You can clone the repository and customize it according to your needs to automate the scheduling of your GCP instances.
