Provisioning Proxmox VMs and Bare-Metal Nodes with Terraform
The compute layer
The homelab overview covered the reasoning behind running everything as code and the overall layer layout. This post drops into one of those layers, 02-compute, and traces a single machine from its Terraform declaration to an active, configured deployment.
The standing rule applies throughout: if a resource is not defined in the repository, it does not exist. Every VM, container, and node here is declared, cloned, and provisioned from git.
The manual prerequisite: the golden image
Before Terraform can clone a VM, a base template must exist on the host. Building this template is a manual step by design.
I download a standard Debian cloud image (which includes cloud-init and the QEMU guest agent), import it as a disk, attach a cloud-init drive, and convert the resource to a template with a fixed ID (9000). This process happens once and changes rarely; automating it via Terraform adds complexity for minimal return.
Instead, the exact qm CLI commands required to recreate the template are documented in a runbook stored within the repository, directly alongside the Terraform configuration. If the template requires recreation or migration, the process is a direct copy-paste operation rather than a reverse-engineering task.
Everything downstream of this template is managed via code.
Provider configuration
I use the bpg/proxmox provider. The configuration requires two distinct access channels to the Proxmox host:
provider "proxmox" {
  endpoint  = local.identity.proxmox_api_host
  api_token = local.identity.proxmox_api_token
  insecure  = true

  ssh {
    agent       = true
    username    = "root"
    private_key = local.identity.proxmox_ssh_pk

    node {
      address = local.identity.proxmox_gateway_ip
      name    = "pve"
    }
  }
}
The api_token handles standard API operations: provisioning VMs, setting CPU and memory allocations, and cloning disks. Some operations, such as uploading custom cloud-init snippet files to node storage, are not available through the Proxmox API, so the provider performs them over SSH using the ssh block. Without the SSH configuration, the snippet upload fails and custom cloud-init provisioning fails with it.
All credentials (local.identity.*) are injected from the foundational 00-creds layer via remote state, keeping this file safe for version control.
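A minimal sketch of how that wiring might look. The backend type and the relative state path are assumptions (the post does not show them); the output names are inferred from the local.identity.* references above.

```hcl
terraform {
  required_providers {
    proxmox = {
      source = "bpg/proxmox"
    }
  }
}

# Assumption: the 00-creds layer keeps its state on local disk at this
# hypothetical path. Any other backend would need its own config block.
data "terraform_remote_state" "creds" {
  backend = "local"

  config = {
    path = "../00-creds/terraform.tfstate"
  }
}

locals {
  # Expose every output of the creds layer as local.identity.<name>,
  # e.g. local.identity.proxmox_api_token.
  identity = data.terraform_remote_state.creds.outputs
}
```

For this to resolve, the 00-creds layer has to declare an output for each name used here. Keeping secrets out of the .tf files does not keep them out of state, so the state files need the same protection as the credentials themselves.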
Declaring a VM
The block below represents the standard VM structural pattern used across the cluster:
resource "proxmox_virtual_environment_vm" "core" {
  name      = "core"
  node_name = "pve"
  vm_id     = 100
  tags      = ["infra", "docker", "proxy"]

  clone {
    vm_id = 9000 # The golden-image template ID
  }

  agent {
    enabled = true # Enables communication via the QEMU guest agent
  }

  cpu {
    cores = 2
    type  = "host"
  }

  memory {
    dedicated = 4096
  }

  network_device {
    bridge = "vmbr0"
  }

  startup {
    order      = 1
    up_delay   = 30
    down_delay = 30
  }

  initialization {
    user_data_file_id = proxmox_virtual_environment_file.core_user_data.id

    ip_config {
      ipv4 { address = "dhcp" }
    }
  }

  started = true
}
Two specific configurations here prevent operational issues:
The startup block dictates boot sequencing on the host. The core VM uses order = 1 because downstream infrastructure depends on its reverse proxy and DNS services. It boots first, and subsequent nodes queue behind it with defined delays, preventing dependency failures during a cold boot.
The initialization block integrates cloud-init. The network configuration delegates addressing to DHCP, while user_data_file_id targets a locally generated snippet.
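To make the boot sequencing concrete, here is a sketch of how a dependent VM might be declared. The resource name, VM name, and ID are hypothetical and do not come from the actual cluster; only the startup block matters for the illustration.

```hcl
# Hypothetical downstream VM: Proxmox starts it after "core" (order = 1).
resource "proxmox_virtual_environment_vm" "apps" {
  name      = "apps" # assumed name, for illustration only
  node_name = "pve"
  vm_id     = 101 # assumed ID

  clone {
    vm_id = 9000 # same golden-image template
  }

  startup {
    order      = 2  # boots after every VM with order = 1
    up_delay   = 30 # seconds to wait before the next VM in the sequence starts
    down_delay = 30 # seconds to wait before the next VM is shut down
  }
}
```

With up_delay = 30 on core, the host waits 30 seconds after starting it before moving on to order = 2, which gives the proxy and DNS services time to come up.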
Cloud-init generation and transfer
Cloud-init handles host localization: assigning the hostname, creating the primary user, injecting public SSH keys, and disabling password authentication. The configuration is rendered inline and transferred to Proxmox:
locals {
  core_cloud_init = <<-EOT
    #cloud-config
    hostname: core
    manage_etc_hosts: true
    ssh_pwauth: false
    package_update: true
    package_upgrade: true
    users:
      - name: pushkar
        groups: [sudo]
        shell: /bin/bash
        sudo: ["ALL=(ALL) NOPASSWD:ALL"]
        ssh_authorized_keys:
          - ${local.identity.ssh_public_key}
  EOT
}

resource "local_file" "core_user_data" {
  filename = "${path.module}/snippets/core.yaml"
  content  = local.core_cloud_init
}

resource "proxmox_virtual_environment_file" "core_user_data" {
  content_type = "snippets"
  datastore_id = "local"
  node_name    = "pve"

  source_file {
    path      = local_file.core_user_data.filename
    file_name = "core.yaml"
  }
}
The snippet upload is the step that requires the provider to have direct SSH access to the host, as configured in the provider block; rendering the file locally first just gives the upload a source path. Setting ssh_pwauth: false disables SSH password authentication from the VM's first boot, so no instance ever runs with password logins enabled.
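As an alternative to the local_file step, the provider's file resource also accepts inline content through a source_raw block. The sketch below is not the configuration used in this repository; it is a variant that would replace the local_file and source_file pair, and the resource name is hypothetical.

```hcl
# Alternative sketch: upload the rendered cloud-init directly,
# without writing it to the local filesystem first.
resource "proxmox_virtual_environment_file" "core_user_data_raw" {
  content_type = "snippets"
  datastore_id = "local"
  node_name    = "pve"

  source_raw {
    data      = local.core_cloud_init # the heredoc defined above
    file_name = "core.yaml"
  }
}
```

Either way, the upload still goes over SSH, and the target datastore needs the snippets content type enabled on the Proxmox side.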
Resolving dynamic IP addresses
Because the VM obtains its IP via DHCP, Terraform cannot predict the network address during the planning phase. However, subsequent provisioning steps require an SSH target address.
The QEMU guest agent resolves this constraint. By setting agent { enabled = true } and ensuring the agent binary is baked into the base template, the booted VM reports its runtime network state back to the Proxmox host. The provider exposes this data, allowing the specific IPv4 address to be filtered:
locals {
  core_ipv4_address = try(
    [
      for ip in distinct(flatten(proxmox_virtual_environment_vm.core.ipv4_addresses)) :
      ip if ip != "127.0.0.1"
    ][0],
    null
  )
}
The provider returns a nested list: one list of addresses per network interface. This local flattens the structure, drops the loopback address, and takes the first remaining entry. If the filtered list is empty, for example because the guest agent has not reported any addresses yet, indexing [0] raises an error, and try converts that error into null instead of failing the run. The resulting local.core_ipv4_address value is the SSH target for post-boot configuration.
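The expression can be tried in isolation with made-up data. The addresses below are hypothetical and only mimic the shape of the attribute (one inner list per interface); the local names are illustrative.

```hcl
locals {
  # Hypothetical agent report: loopback interface first, then the NIC.
  sample_addresses = [["127.0.0.1"], ["192.168.1.50"]]

  # Same filter as above, applied to the sample data.
  sample_ipv4 = try(
    [
      for ip in distinct(flatten(local.sample_addresses)) :
      ip if ip != "127.0.0.1"
    ][0],
    null
  )

  # With nothing reported, the index fails and try falls back to null.
  empty_ipv4 = try(
    [for ip in distinct(flatten([])) : ip if ip != "127.0.0.1"][0],
    null
  )
}

output "sample_ipv4" {
  value = local.sample_ipv4 # "192.168.1.50"
}
```

Evaluating local.sample_ipv4 in terraform console is a quick way to check the filter before pointing it at a real VM.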
Post-boot provisioning
Tasks not covered by base virtualization or cloud-init (injecting repository deploy keys, writing internal SSH configurations, and installing the Docker runtime) are executed via a targeted null_resource using SSH connections.
resource "null_resource" "core_config" {
  triggers = {
    vm_id   = proxmox_virtual_environment_vm.core.id
    content = local.identity.github_deploy_key_private_key
  }

  connection {
    type        = "ssh"
    user        = "pushkar"
    private_key = local.identity.ssh_private_key
    host        = local.core_ipv4_address
  }

  provisioner "file" {
    content     = local.identity.github_deploy_key_private_key
    destination = "/home/pushkar/.ssh/github_deploy_key"
  }

  provisioner "remote-exec" {
    inline = [
      "chmod 600 /home/pushkar/.ssh/github_deploy_key",
      "ssh-keyscan github.com >> /home/pushkar/.ssh/known_hosts",
    ]
  }
}
Terraform provisioners break the declarative model by executing imperative steps in sequence. HashiCorp's documentation advises treating them as a last resort, but at this scale the alternatives are either manual intervention on each node or a full configuration management framework for a handful of post-boot tasks.
The triggers block tracks the inputs that should cause a re-run. If the VM ID changes or the GitHub deploy key rotates, Terraform sees that the stored trigger values no longer match, replaces the null_resource, and runs its provisioners again. If the values match the state, Terraform skips the resource entirely, which keeps applies predictable.
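One possible refinement, shown here as a sketch and not as what the repository does: trigger on a hash of the key instead of the key itself. The resource name and the key_hash trigger name are hypothetical.

```hcl
resource "null_resource" "core_config_hashed" {
  triggers = {
    vm_id = proxmox_virtual_environment_vm.core.id
    # A rotated key produces a different digest, so the resource is still
    # replaced, but the triggers map no longer holds the key material.
    key_hash = sha256(local.identity.github_deploy_key_private_key)
  }

  # connection and provisioner blocks unchanged from core_config above
}
```

The replacement behaviour is the same as before; the only difference is what ends up in the triggers attribute and in plan diffs.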
flowchart TD
    Tmpl["Template 9000"] -->|clone| VM["VM created"]
    VM -->|cloud-init| Init["Identity and hardening"]
    Init -->|DHCP| Agent["Guest agent reports IP"]
    Agent -->|null_resource over SSH| Prov["Deploy keys and runtime"]
    Prov --> Ready["Managed host ready"]
Integrating bare-metal nodes
Part of the compute fleet runs on bare-metal hardware, specifically Raspberry Pi nodes handling lighter, decentralized services. Terraform cannot instantiate physical boards, but it manages their lifecycle post-initialization.
The initial bootstrap uses the Raspberry Pi Imager to flash the OS and pre-inject the administration SSH keys. Each node gets a fixed IP address through a DHCP reservation on the network router. Once a node is online, Terraform adopts it using null_resource definitions:
resource "null_resource" "pi_docker" {
  triggers = {
    script_hash = filesha256("${path.module}/scripts/install-docker.sh")
  }

  connection {
    type        = "ssh"
    user        = "pushkar"
    host        = local.pi_ip
    private_key = local.identity.ssh_private_key
  }

  provisioner "file" {
    source      = "${path.module}/scripts/install-docker.sh"
    destination = "/tmp/install-docker.sh"
  }

  provisioner "remote-exec" {
    inline = [
      "chmod +x /tmp/install-docker.sh",
      "sudo /tmp/install-docker.sh",
      "sudo usermod -aG docker pushkar",
    ]
  }
}
The trigger tracks the SHA256 hash of the script itself. Modifying the installation script forces execution during the next apply sequence; otherwise, the host state remains untouched. A secondary resource installs the agent required to join the node to the automated GitOps deployment pipeline.
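With more than one Pi, the same pattern could be driven from a map instead of being copied per node. This is a sketch under assumptions: the node names, addresses, and resource name are hypothetical, and the post itself uses a single local.pi_ip.

```hcl
locals {
  # Hypothetical inventory: node name => reserved IP address.
  pi_nodes = {
    "pi-01" = "192.168.1.21"
    "pi-02" = "192.168.1.22"
  }
}

resource "null_resource" "pi_docker_fleet" {
  for_each = local.pi_nodes

  triggers = {
    # Re-run on a node when the script changes or its address changes.
    script_hash = filesha256("${path.module}/scripts/install-docker.sh")
    host        = each.value
  }

  connection {
    type        = "ssh"
    user        = "pushkar"
    host        = each.value
    private_key = local.identity.ssh_private_key
  }

  provisioner "file" {
    source      = "${path.module}/scripts/install-docker.sh"
    destination = "/tmp/install-docker.sh"
  }

  provisioner "remote-exec" {
    inline = [
      "chmod +x /tmp/install-docker.sh",
      "sudo /tmp/install-docker.sh",
      "sudo usermod -aG docker pushkar",
    ]
  }
}
```

Adding a node then means adding one line to the map, and each instance is tracked separately in state under its key.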
This establishes two entry points into the infrastructure fleet: cloning a VM from the Proxmox base template, or adopting a bare-metal Pi via an SSH handshake. Both methods lead to an identical result: an automated, version-controlled node ready to receive containerized workloads.
Trade-offs and operational results
Mixing declarative infrastructure state with imperative shell scripts introduces risk: provisioners have no native rollback, and an interrupted run can leave a VM in an inconsistent state. Writing idempotent shell scripts and scoping triggers to explicit inputs mitigates this, because a failed step can simply be re-run.
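As an illustration of what idempotent means for these steps, here is a hedged variant of the remote-exec commands from the core VM. The resource name is hypothetical, and the guard around ssh-keyscan is an addition that the original script does not have.

```hcl
resource "null_resource" "core_config_idempotent" {
  triggers = {
    vm_id = proxmox_virtual_environment_vm.core.id
  }

  connection {
    type        = "ssh"
    user        = "pushkar"
    private_key = local.identity.ssh_private_key
    host        = local.core_ipv4_address
  }

  provisioner "remote-exec" {
    inline = [
      # Safe to repeat: succeeds whether or not the directory exists.
      "mkdir -p /home/pushkar/.ssh",
      # Append GitHub's host keys only if no entry is present yet,
      # so repeated runs do not grow known_hosts.
      "ssh-keygen -F github.com -f /home/pushkar/.ssh/known_hosts >/dev/null 2>&1 || ssh-keyscan github.com >> /home/pushkar/.ssh/known_hosts",
    ]
  }
}
```

Each command leaves the host in the same state no matter how many times it runs, which is what makes re-running after an interrupted apply safe.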
The trade-off is justified by the recovery speed. If a VM is lost, running terraform apply recreates an identical node with all keys and runtimes configured. Scaling the cluster requires duplicating an existing resource block and altering the host parameters. Every configuration setting, boot order dependency, and template generation step is preserved in version control rather than in anyone's memory.
Next steps
The compute layer provides the execution environment for the infrastructure, but these nodes rely on shared core services. The most critical component is internal DNS resolution, which currently introduces a single point of failure. The next post in this series will cover how this layout handles highly available local DNS without creating circular infrastructure dependencies during a cold bootstrap.