Substrate Terraform Provider

The Substrate Terraform provider lets you manage GPU compute infrastructure as code. Define instances, networks, and clusters in HCL and apply changes with standard Terraform workflows.

Provider configuration

Add the Substrate provider to your Terraform configuration. The API key can also be set via the SUBSTRATE_API_KEY environment variable, as shown below the provider block.

terraform {
  required_providers {
    substrate = {
      source  = "onsubstrate/substrate"
      version = "~> 1.0"
    }
  }
}

provider "substrate" {
  api_key = var.substrate_api_key
  region  = "us-east-1"
}
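
If you prefer to keep the key out of your HCL, set the SUBSTRATE_API_KEY environment variable instead. A minimal sketch, assuming the provider falls back to the environment variable whenever api_key is omitted:

export SUBSTRATE_API_KEY="sk_live_..."

provider "substrate" {
  # api_key omitted; read from SUBSTRATE_API_KEY
  region = "us-east-1"
}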

Resource: substrate_compute

Manages a GPU compute instance. Substrate composes hardware to match the exact resource configuration you define.

Attributes

Attribute     Type     Required   Description
gpu_cores     number   Yes        Number of GPU cores
vram_gb       number   Yes        GPU memory in GB
ram_gb        number   Yes        System memory in GB
storage_gb    number   Yes        NVMe storage in GB
name          string   No         Instance display name
region        string   No         Deployment region (default: provider region)

Read-only attributes

Attribute   Description
id          Instance ID (e.g. inst_abc123)
status      Current status (provisioning, running, stopped)
endpoint    SSH/API endpoint hostname

Example

resource "substrate_compute" "training" {
  name       = "training-node-1"
  gpu_cores  = 4
  vram_gb    = 24
  ram_gb     = 64
  storage_gb = 100
  region     = "us-east-1"
}
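
The read-only attributes can be referenced like any other Terraform attribute once the instance exists. For example, to expose the endpoint and status of the instance defined above as outputs:

output "training_endpoint" {
  value = substrate_compute.training.endpoint
}

output "training_status" {
  value = substrate_compute.training.status
}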

Resource: substrate_network

Manages a virtual network for connecting compute instances. Provides private networking between instances within the same region.

Attributes

Attribute           Type     Required   Description
vpc_id              string   Yes        VPC identifier for the network
subnet_cidr         string   Yes        CIDR block for the subnet (e.g. 10.0.0.0/24)
enable_public_ip    bool     No         Assign public IPs to instances (default: false)

Example

resource "substrate_network" "training_vpc" {
  vpc_id           = "vpc-training-cluster"
  subnet_cidr      = "10.0.1.0/24"
  enable_public_ip = true
}

Data sources

substrate_regions

Lists all available deployment regions with their identifiers and display names.

data "substrate_regions" "available" {}

output "regions" {
  value = data.substrate_regions.available.regions
}

# Returns:
# [
#   { id = "us-east-1", name = "US East (Virginia)" },
#   { id = "us-west-2", name = "US West (Oregon)" },
#   { id = "eu-west-1", name = "EU West (Ireland)" },
#   { id = "ap-southeast-1", name = "Asia Pacific (Singapore)" },
# ]
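
The returned list can also drive other resources. As a sketch using standard Terraform (the instance sizes here are illustrative, not provider requirements), the snippet below launches one small probe instance per available region, keyed by region id:

# One probe instance per available region (illustrative sizes)
resource "substrate_compute" "probe" {
  for_each = { for r in data.substrate_regions.available.regions : r.id => r }

  name       = "probe-${each.key}"
  gpu_cores  = 1
  vram_gb    = 8
  ram_gb     = 16
  storage_gb = 50
  region     = each.key
}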

substrate_gpu_availability

Queries real-time GPU availability by region and configuration. Use this to check capacity before provisioning.

data "substrate_gpu_availability" "us_east" {
  region    = "us-east-1"
  gpu_cores = 4
  vram_gb   = 24
}

output "available" {
  value = data.substrate_gpu_availability.us_east.available
}

# Returns:
# {
#   available      = true
#   available_units = 12
#   region          = "us-east-1"
#   gpu_cores       = 4
#   vram_gb         = 24
# }
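
One way to act on the result is to gate resource creation on it. The sketch below uses a standard Terraform count conditional so the instance is only created when the data source reports matching capacity (sizes mirror the query above):

# Only create the instance if matching capacity is reported
resource "substrate_compute" "batch" {
  count = data.substrate_gpu_availability.us_east.available ? 1 : 0

  name       = "batch-node"
  gpu_cores  = 4
  vram_gb    = 24
  ram_gb     = 64
  storage_gb = 100
  region     = "us-east-1"
}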

Complete example: Multi-instance training cluster

This example provisions a distributed training cluster: a private network for inter-node communication and three identically configured GPU compute nodes (the count is set by the cluster_size variable) for data-parallel training.

terraform {
  required_providers {
    substrate = {
      source  = "onsubstrate/substrate"
      version = "~> 1.0"
    }
  }
}

variable "substrate_api_key" {
  type      = string
  sensitive = true
}

variable "cluster_size" {
  type    = number
  default = 3
}

provider "substrate" {
  api_key = var.substrate_api_key
  region  = "us-east-1"
}

# Check GPU availability before provisioning
data "substrate_gpu_availability" "check" {
  region    = "us-east-1"
  gpu_cores = 8
  vram_gb   = 48
}

# Private network for inter-node communication
resource "substrate_network" "cluster_network" {
  vpc_id           = "vpc-training-cluster"
  subnet_cidr      = "10.0.1.0/24"
  enable_public_ip = false
}

# Training nodes
resource "substrate_compute" "worker" {
  count = var.cluster_size

  name       = "training-worker-${count.index}"
  gpu_cores  = 8
  vram_gb    = 48
  ram_gb     = 128
  storage_gb = 500
  region     = "us-east-1"
}

output "worker_endpoints" {
  value = [for w in substrate_compute.worker : w.endpoint]
}

output "worker_ids" {
  value = [for w in substrate_compute.worker : w.id]
}

output "network_id" {
  value = substrate_network.cluster_network.id
}
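
As written, the availability check is declared but nothing enforces it. One way to wire it in, shown here as a sketch, is a lifecycle precondition on the worker resource (requires Terraform 1.2 or later) so the plan fails when reported capacity is below the requested cluster size:

resource "substrate_compute" "worker" {
  count = var.cluster_size

  name       = "training-worker-${count.index}"
  gpu_cores  = 8
  vram_gb    = 48
  ram_gb     = 128
  storage_gb = 500
  region     = "us-east-1"

  lifecycle {
    precondition {
      # Assumes available_units reflects how many instances of this shape can be provisioned
      condition     = data.substrate_gpu_availability.check.available_units >= var.cluster_size
      error_message = "Not enough 8-core / 48 GB GPU capacity in us-east-1 for the requested cluster size."
    }
  }
}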

Deploy the cluster with:

terraform init
terraform plan -var="substrate_api_key=sk_live_..."
terraform apply -var="substrate_api_key=sk_live_..."
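
To keep the key out of your shell history, Terraform's standard TF_VAR_ environment variable mechanism works in place of the -var flags:

export TF_VAR_substrate_api_key="sk_live_..."
terraform init
terraform plan
terraform apply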

To tear down the entire cluster and clean up all resources:

terraform destroy -var="substrate_api_key=sk_live_..."