Substrate Terraform Provider

The Substrate Terraform provider lets you manage GPU compute infrastructure as code. Define instances, networks, and clusters in HCL and apply changes with standard Terraform workflows.

Provider configuration

Add the Substrate provider to your Terraform configuration. The API key can also be set via the SUBSTRATE_API_KEY environment variable, as shown below the provider block.

terraform {
  required_providers {
    substrate = {
      source  = "onsubstrate/substrate"
      version = "~> 1.0"
    }
  }
}

provider "substrate" {
  api_key = var.substrate_api_key
  region  = "us-east-1"
}
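
If you prefer to keep the key out of your HCL, set the SUBSTRATE_API_KEY environment variable instead. A minimal sketch, assuming the provider falls back to the environment variable whenever api_key is omitted:

export SUBSTRATE_API_KEY="sk_live_..."

provider "substrate" {
  # api_key omitted; read from SUBSTRATE_API_KEY
  region = "us-east-1"
}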

Resource: substrate_compute

Manages a GPU compute instance. Substrate composes hardware to match the exact resource configuration you define.

Attributes

Attribute     Type     Required   Description
gpu_cores     number   Yes        Number of GPU cores
vram_gb       number   Yes        GPU memory in GB
ram_gb        number   Yes        System memory in GB
storage_gb    number   Yes        NVMe storage in GB
name          string   No         Instance display name
region        string   No         Deployment region (default: provider region)

Read-only attributes

Attribute   Description
id          Instance ID (e.g. inst_abc123)
status      Current status (provisioning, running, stopped)
endpoint    SSH/API endpoint hostname

Example

resource "substrate_compute" "training" {
  name       = "training-node-1"
  gpu_cores  = 4
  vram_gb    = 24
  ram_gb     = 64
  storage_gb = 100
  region     = "us-east-1"
}
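
The read-only attributes can be referenced like any other Terraform attribute once the instance exists. For example, to expose the endpoint and status of the instance defined above as outputs:

output "training_endpoint" {
  value = substrate_compute.training.endpoint
}

output "training_status" {
  value = substrate_compute.training.status
}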

Resource: substrate_network

Manages a virtual network for connecting compute instances. Provides private networking between instances within the same region.

Attributes

Attribute           Type     Required   Description
vpc_id              string   Yes        VPC identifier for the network
subnet_cidr         string   Yes        CIDR block for the subnet (e.g. 10.0.0.0/24)
enable_public_ip    bool     No         Assign public IPs to instances (default: false)

Example

resource "substrate_network" "training_vpc" {
  vpc_id           = "vpc-training-cluster"
  subnet_cidr      = "10.0.1.0/24"
  enable_public_ip = true
}

Data sources

substrate_regions

Lists all available deployment regions with their identifiers and display names.

data "substrate_regions" "available" {}

output "regions" {
  value = data.substrate_regions.available.regions
}

# Returns:
# [
#   { id = "us-east-1", name = "US East (Virginia)" },
#   { id = "us-west-2", name = "US West (Oregon)" },
#   { id = "eu-west-1", name = "EU West (Ireland)" },
#   { id = "ap-southeast-1", name = "Asia Pacific (Singapore)" },
# ]
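
The returned list can also drive other resources. As a sketch using standard Terraform (the instance sizes here are illustrative, not provider requirements), the snippet below launches one small probe instance per available region, keyed by region id:

# One probe instance per available region (illustrative sizes)
resource "substrate_compute" "probe" {
  for_each = { for r in data.substrate_regions.available.regions : r.id => r }

  name       = "probe-${each.key}"
  gpu_cores  = 1
  vram_gb    = 8
  ram_gb     = 16
  storage_gb = 50
  region     = each.key
}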

substrate_gpu_availability

Queries real-time GPU availability by region and configuration. Use this to check capacity before provisioning.

data "substrate_gpu_availability" "us_east" {
  region    = "us-east-1"
  gpu_cores = 4
  vram_gb   = 24
}

output "available" {
  value = data.substrate_gpu_availability.us_east.available
}

# Returns:
# {
#   available      = true
#   available_units = 12
#   region          = "us-east-1"
#   gpu_cores       = 4
#   vram_gb         = 24
# }
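
One way to act on the result is to gate resource creation on it. The sketch below uses a standard Terraform count conditional so the instance is only created when the data source reports matching capacity (sizes mirror the query above):

# Only create the instance if matching capacity is reported
resource "substrate_compute" "batch" {
  count = data.substrate_gpu_availability.us_east.available ? 1 : 0

  name       = "batch-node"
  gpu_cores  = 4
  vram_gb    = 24
  ram_gb     = 64
  storage_gb = 100
  region     = "us-east-1"
}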

Complete example: Multi-instance training cluster

This example provisions a distributed training cluster: a private network for inter-node communication and three identically configured GPU compute nodes (the count is set by the cluster_size variable) for data-parallel training.

terraform {
  required_providers {
    substrate = {
      source  = "onsubstrate/substrate"
      version = "~> 1.0"
    }
  }
}

variable "substrate_api_key" {
  type      = string
  sensitive = true
}

variable "cluster_size" {
  type    = number
  default = 3
}

provider "substrate" {
  api_key = var.substrate_api_key
  region  = "us-east-1"
}

# Check GPU availability before provisioning
data "substrate_gpu_availability" "check" {
  region    = "us-east-1"
  gpu_cores = 8
  vram_gb   = 48
}

# Private network for inter-node communication
resource "substrate_network" "cluster_network" {
  vpc_id           = "vpc-training-cluster"
  subnet_cidr      = "10.0.1.0/24"
  enable_public_ip = false
}

# Training nodes
resource "substrate_compute" "worker" {
  count = var.cluster_size

  name       = "training-worker-${count.index}"
  gpu_cores  = 8
  vram_gb    = 48
  ram_gb     = 128
  storage_gb = 500
  region     = "us-east-1"
}

output "worker_endpoints" {
  value = [for w in substrate_compute.worker : w.endpoint]
}

output "worker_ids" {
  value = [for w in substrate_compute.worker : w.id]
}

output "network_id" {
  value = substrate_network.cluster_network.id
}
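
As written, the availability check is declared but nothing enforces it. One way to wire it in, shown here as a sketch, is a lifecycle precondition on the worker resource (requires Terraform 1.2 or later) so the plan fails when reported capacity is below the requested cluster size:

resource "substrate_compute" "worker" {
  count = var.cluster_size

  name       = "training-worker-${count.index}"
  gpu_cores  = 8
  vram_gb    = 48
  ram_gb     = 128
  storage_gb = 500
  region     = "us-east-1"

  lifecycle {
    precondition {
      # Assumes available_units reflects how many instances of this shape can be provisioned
      condition     = data.substrate_gpu_availability.check.available_units >= var.cluster_size
      error_message = "Not enough 8-core / 48 GB GPU capacity in us-east-1 for the requested cluster size."
    }
  }
}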

Deploy the cluster with:

terraform init
terraform plan -var="substrate_api_key=sk_live_..."
terraform apply -var="substrate_api_key=sk_live_..."
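
To keep the key out of your shell history, Terraform's standard TF_VAR_ environment variable mechanism works in place of the -var flags:

export TF_VAR_substrate_api_key="sk_live_..."
terraform init
terraform plan
terraform apply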

To tear down the entire cluster and clean up all resources:

terraform destroy -var="substrate_api_key=sk_live_..."