Terraform Provider
The Substrate Terraform provider lets you manage GPU compute infrastructure as code. Define instances, networks, and clusters in HCL and apply changes with standard Terraform workflows.
Provider configuration
Add the Substrate provider to your Terraform configuration. The API key can also be set via the SUBSTRATE_API_KEY environment variable.
terraform {
  required_providers {
    substrate = {
      source  = "onsubstrate/substrate"
      version = "~> 1.0"
    }
  }
}
provider "substrate" {
api_key = var.substrate_api_key
region = "us-east-1"
}
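If you would rather keep the key out of configuration entirely, the api_key argument can presumably be dropped when SUBSTRATE_API_KEY is exported in your shell; a minimal sketch assuming that fallback:

provider "substrate" {
  # api_key omitted; assumes the provider reads SUBSTRATE_API_KEY
  # from the environment when the argument is not set
  region = "us-east-1"
}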
Resource: substrate_compute
Manages a GPU compute instance. Substrate composes hardware to match the exact resource configuration you define.
Attributes
| Attribute | Type | Required | Description |
|---|---|---|---|
| gpu_cores | number | Yes | Number of GPU cores |
| vram_gb | number | Yes | GPU memory in GB |
| ram_gb | number | Yes | System memory in GB |
| storage_gb | number | Yes | NVMe storage in GB |
| name | string | No | Instance display name |
| region | string | No | Deployment region (default: provider region) |
Read-only attributes
| Attribute | Description |
|---|---|
| id | Instance ID (e.g. inst_abc123) |
| status | Current status (provisioning, running, stopped) |
| endpoint | SSH/API endpoint hostname |
Example
resource "substrate_compute" "training" {
name = "training-node-1"
gpu_cores = 4
vram_gb = 24
ram_gb = 64
storage_gb = 100
region = "us-east-1"
}
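Once the instance exists, the read-only attributes can be referenced like any other attribute, for example to surface the endpoint and status as outputs (output names here are illustrative):

output "training_endpoint" {
  value = substrate_compute.training.endpoint
}

output "training_status" {
  value = substrate_compute.training.status
}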
Resource: substrate_network
Manages a virtual network for connecting compute instances. Provides private networking between instances within the same region.
Attributes
| Attribute | Type | Required | Description |
|---|---|---|---|
| vpc_id | string | Yes | VPC identifier for the network |
| subnet_cidr | string | Yes | CIDR block for the subnet (e.g. 10.0.0.0/24) |
| enable_public_ip | bool | No | Assign public IPs to instances (default: false) |
Example
resource "substrate_network" "training_vpc" {
vpc_id = "vpc-training-cluster"
subnet_cidr = "10.0.1.0/24"
enable_public_ip = true
}
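The network's generated ID can be exported for reference elsewhere, as the complete example at the end of this page does (the output name here is illustrative):

output "training_vpc_id" {
  value = substrate_network.training_vpc.id
}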
Data sources

substrate_regions
Lists all available deployment regions with their identifiers and display names.
data "substrate_regions" "available" {}
output "regions" {
value = data.substrate_regions.available.regions
}
# Returns:
# [
# { id = "us-east-1", name = "US East (Virginia)" },
# { id = "us-west-2", name = "US West (Oregon)" },
# { id = "eu-west-1", name = "EU West (Ireland)" },
# { id = "ap-southeast-1", name = "Asia Pacific (Singapore)" },
# ]
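The returned list can feed other configuration. A sketch that pins an instance to the first returned region, assuming the id/name structure shown above (the resource name and sizes are illustrative):

resource "substrate_compute" "regional" {
  name       = "regional-node"
  gpu_cores  = 4
  vram_gb    = 24
  ram_gb     = 64
  storage_gb = 100
  region     = data.substrate_regions.available.regions[0].id
}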
substrate_gpu_availability
Queries real-time GPU availability by region and configuration. Use this to check capacity before provisioning.
data "substrate_gpu_availability" "us_east" {
region = "us-east-1"
gpu_cores = 4
vram_gb = 24
}
output "available" {
value = data.substrate_gpu_availability.us_east.available
}
# Returns:
# {
# available = true
# available_units = 12
# region = "us-east-1"
# gpu_cores = 4
# vram_gb = 24
# }
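One way to act on the result is a lifecycle precondition (Terraform 1.2 or later) that fails the plan when no matching capacity is reported; a sketch, with the resource name and sizes illustrative:

resource "substrate_compute" "gated" {
  name       = "gated-node"
  gpu_cores  = 4
  vram_gb    = 24
  ram_gb     = 64
  storage_gb = 100

  lifecycle {
    precondition {
      condition     = data.substrate_gpu_availability.us_east.available
      error_message = "No matching GPU capacity in us-east-1 right now."
    }
  }
}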
Complete example: Multi-instance training cluster
This example provisions a distributed training cluster with three GPU compute nodes connected over a private network. Each node is configured with identical resources for data-parallel training.
terraform {
  required_providers {
    substrate = {
      source  = "onsubstrate/substrate"
      version = "~> 1.0"
    }
  }
}
variable "substrate_api_key" {
type = string
sensitive = true
}
variable "cluster_size" {
type = number
default = 3
}
provider "substrate" {
api_key = var.substrate_api_key
region = "us-east-1"
}
# Check GPU availability before provisioning
data "substrate_gpu_availability" "check" {
  region    = "us-east-1"
  gpu_cores = 8
  vram_gb   = 48
}

# Private network for inter-node communication
resource "substrate_network" "cluster_network" {
  vpc_id           = "vpc-training-cluster"
  subnet_cidr      = "10.0.1.0/24"
  enable_public_ip = false
}
# Training nodes
resource "substrate_compute" "worker" {
  count      = var.cluster_size
  name       = "training-worker-${count.index}"
  gpu_cores  = 8
  vram_gb    = 48
  ram_gb     = 128
  storage_gb = 500
  region     = "us-east-1"
}
output "worker_endpoints" {
value = [for w in substrate_compute.worker : w.endpoint]
}
output "worker_ids" {
value = [for w in substrate_compute.worker : w.id]
}
output "network_id" {
value = substrate_network.cluster_network.id
}

Deploy the cluster with:
terraform init
terraform plan -var="substrate_api_key=sk_live_..."
terraform apply -var="substrate_api_key=sk_live_..."
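The cluster defaults to three workers (cluster_size = 3). To scale it, override the variable at apply time with -var, or in a tfvars file; a sketch, with the value illustrative:

# terraform.tfvars
cluster_size = 5

Re-running terraform apply then adds or removes workers to match the new count.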
To tear down the entire cluster and clean up all resources:
terraform destroy -var="substrate_api_key=sk_live_..."