HAHA — Home Assistant High Availability. The acronym was unintentional but too fitting to change. Three Proxmox nodes running Home Assistant, Mosquitto, Zigbee2MQTT, ESPHome, and Node-RED in a Pacemaker cluster, because a smart home that dies when one machine fails isn’t really smart.

Overview

Aspect      Details
Nodes       3 Proxmox hosts
Clustering  Pacemaker + Corosync
Storage     3.6TB DRBD dual-primary + OCFS2
Network     2x10Gbps LACP per node
Services    Home Assistant, Mosquitto, Zigbee2MQTT, ESPHome, Node-RED

Technology Stack

Clustering

  • Pacemaker + Corosync with STONITH fencing (3-node cluster)
  • Floating virtual IP for seamless failover
  • Quorum-based decision making
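
Why three nodes rather than two comes down to majority arithmetic. The sketch below is a minimal illustration in Python, assuming one vote per node and a plain strict-majority rule with no special quorum options; the function names are mine, not part of Corosync or Pacemaker.

```python
def quorum_threshold(total_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return total_votes // 2 + 1


def has_quorum(votes_present: int, total_votes: int) -> bool:
    """True if the surviving partition may keep running resources."""
    return votes_present >= quorum_threshold(total_votes)


def tolerated_failures(total_votes: int) -> int:
    """How many voters can be lost while the rest still hold a majority."""
    return total_votes - quorum_threshold(total_votes)


if __name__ == "__main__":
    for nodes in (2, 3):
        print(nodes, "nodes: need", quorum_threshold(nodes),
              "votes, tolerate", tolerated_failures(nodes), "failure(s)")
```

Under these assumptions a two-node cluster tolerates no failures, while a three-node cluster keeps quorum with any single node down.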

Storage

  • 3.6TB DRBD dual-primary replication
  • OCFS2 cluster filesystem shared via NFS
  • KINGSTON SFYRD4000G NVMe drives with PCI passthrough

Networking

  • LACP etherchannel (2x10Gbps per node)
  • Cisco WS-C3850-12X48U switch
  • Network-attached IoT peripherals (no USB dependencies)

Dockerized Services

  • Home Assistant (core automation)
  • Mosquitto (MQTT broker)
  • Zigbee2MQTT (Zigbee gateway)
  • ESPHome (ESP device management)
  • Node-RED (flow-based automation)

IoT Peripherals

  • Ethernet Zigbee: TubesZB CC2652P7
  • Ethernet Bluetooth: Olimex ESP32-POE-ISO
  • USB-free design for clean failover

AI Integration

  • Local LLM (Ollama) on RTX 3090 Ti
  • Private voice control and inference
  • No cloud dependency

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    Floating VIP (Active)                         │
└───────────────────────────┬─────────────────────────────────────┘
    ┌───────────────────────┼───────────────────────┐
    │                       │                       │
    ▼                       ▼                       ▼
┌─────────┐           ┌─────────┐           ┌─────────┐
│ pve01   │           │ pve02   │           │ pve03   │
│ Active  │◄─────────►│ Standby │◄─────────►│Arbitrator│
│         │  DRBD     │         │  DRBD     │         │
└────┬────┘  Sync     └────┬────┘  Sync     └────┬────┘
     │                     │                     │
     │    Corosync Ring    │                     │
     └─────────────────────┴─────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                    DRBD + OCFS2 + NFS                           │
│              (3.6TB Replicated Storage)                         │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│                    Docker Services                               │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐            │
│  │Home Assistant│ │  Mosquitto   │ │ Zigbee2MQTT  │            │
│  └──────────────┘ └──────────────┘ └──────────────┘            │
│  ┌──────────────┐ ┌──────────────┐                              │
│  │   ESPHome    │ │   Node-RED   │                              │
│  └──────────────┘ └──────────────┘                              │
└─────────────────────────────────────────────────────────────────┘
         ┌──────────────────┼──────────────────┐
         ▼                  ▼                  ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Ethernet Zigbee │ │Ethernet Bluetooth│ │   Local LLM    │
│ TubesZB CC2652P7│ │ Olimex ESP32-POE │ │  RTX 3090 Ti   │
└─────────────────┘ └─────────────────┘ └─────────────────┘

What I Got Right (and Wrong)

The USB-to-Ethernet migration was the best decision in this project. Zigbee coordinator → TubesZB CC2652P7 (Ethernet), Bluetooth → Olimex ESP32-POE-ISO (Ethernet). USB devices can’t fail over between physical hosts; network devices don’t care which host they’re talking to. Re-pairing dozens of Zigbee devices was painful, but the result is a setup where the smart home stack moves between nodes without caring about peripheral hardware.

Pacemaker tuning took longer than expected. Quorum settings, failure detection timeouts, resource ordering — the defaults are never right for a specific workload, and the documentation explains what the options do but rarely why you’d choose one value over another. DRBD dual-primary with OCFS2 had its own lock contention issues under I/O load that took time to stabilize.
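
To make "resource ordering" concrete, here is a small Python sketch that derives a start sequence from "starts after" constraints using a topological sort. The resource names and the dependencies between them are hypothetical and chosen for illustration; they are not the constraints used in this cluster, and this is not how Pacemaker is configured.

```python
from graphlib import TopologicalSorter

# Hypothetical constraints: each resource maps to the set of resources
# that must already be running before it starts.
STARTS_AFTER = {
    "shared-storage": set(),
    "floating-vip": set(),
    "mosquitto": {"shared-storage", "floating-vip"},
    "zigbee2mqtt": {"mosquitto"},
    "esphome": {"shared-storage"},
    "home-assistant": {"mosquitto"},
    "node-red": {"home-assistant"},
}


def start_order(constraints):
    """Return one valid start sequence; raises on circular constraints."""
    return list(TopologicalSorter(constraints).static_order())


def stop_order(constraints):
    """Stop in the reverse of the start sequence."""
    return list(reversed(start_order(constraints)))


if __name__ == "__main__":
    print("start:", " -> ".join(start_order(STARTS_AFTER)))
    print("stop: ", " -> ".join(stop_order(STARTS_AFTER)))
```

A circular constraint makes the sort raise an error, which is a cheap way to catch an impossible ordering before it ever reaches the cluster.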

Performance Benchmarks

For IoT workloads, the targets are at least 50k IOPS, at least 200 MiB/s bandwidth, and at most 5000 µs latency.

HAHA delivers:

  • Random Read: 58.2k IOPS, 227 MiB/s, 4392.34 µs
  • Sequential Write: 6.8k IOPS, 852 MiB/s, 9317.05 µs

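Random read meets all three targets despite the OCFS2/DRBD/NFS network overhead. Sequential write clears the bandwidth target by a wide margin, but its 6.8k IOPS and 9317.05 µs latency fall short of the IOPS and latency targets.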
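
The comparison against the targets is simple arithmetic, sketched here in Python using only the figures quoted above. The two derived values (implied block size and implied number of I/Os in flight) are my own inferences from bandwidth/IOPS and from Little's law, assuming the reported latency is a mean completion latency; they are not measured settings from the benchmark.

```python
from dataclasses import dataclass

TARGET_IOPS = 50_000        # minimum
TARGET_BW_MIB_S = 200       # minimum
TARGET_LATENCY_US = 5_000   # maximum


@dataclass(frozen=True)
class Result:
    name: str
    iops: float
    bandwidth_mib_s: float
    latency_us: float


RANDOM_READ = Result("Random Read", 58_200, 227, 4392.34)
SEQ_WRITE = Result("Sequential Write", 6_800, 852, 9317.05)


def meets_targets(r: Result) -> dict:
    """Per-metric pass/fail against the stated targets."""
    return {
        "iops": r.iops >= TARGET_IOPS,
        "bandwidth": r.bandwidth_mib_s >= TARGET_BW_MIB_S,
        "latency": r.latency_us <= TARGET_LATENCY_US,
    }


def implied_block_kib(r: Result) -> float:
    """Average request size implied by bandwidth / IOPS, in KiB."""
    return r.bandwidth_mib_s * 1024 / r.iops


def implied_in_flight(r: Result) -> float:
    """Little's law: concurrency = throughput * latency."""
    return r.iops * r.latency_us / 1_000_000


if __name__ == "__main__":
    for r in (RANDOM_READ, SEQ_WRITE):
        print(r.name, meets_targets(r),
              f"~{implied_block_kib(r):.1f} KiB/request,",
              f"~{implied_in_flight(r):.0f} I/Os in flight")
```

Checked per test, random read clears all three thresholds while sequential write clears only bandwidth. The implied request sizes come out near 4 KiB and 128 KiB, which would explain why the sequential test trades IOPS for throughput, though the actual benchmark parameters are not stated here.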

Is It Working?

I’ve deliberately failed nodes, pulled network cables, and forced unclean shutdowns. The lights respond, the automations run, the voice assistant answers. Whether a three-node Pacemaker cluster for a smart home is proportionate is a question I prefer not to examine too closely.
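
One way to put a number on tests like these is to poll a service behind the floating VIP while a node is being killed, then report the longest gap between a failed probe and the next successful one. The Python sketch below is a minimal version of that idea, not the tooling used for this project; the address in the usage comment is a documentation placeholder and the port is Home Assistant's default web port.

```python
import socket
import time
from typing import Iterable, Iterator, Tuple


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def poll(host: str, port: int, duration_s: float,
         interval_s: float = 0.5) -> Iterator[Tuple[float, bool]]:
    """Yield (monotonic timestamp, probe result) pairs for duration_s seconds."""
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        yield time.monotonic(), tcp_probe(host, port)
        time.sleep(interval_s)


def longest_outage(samples: Iterable[Tuple[float, bool]]) -> float:
    """Longest span, in seconds, from a first failed probe to the next success.

    An outage still open when the samples end is not counted.
    """
    longest = 0.0
    down_since = None
    for ts, ok in samples:
        if not ok and down_since is None:
            down_since = ts
        elif ok and down_since is not None:
            longest = max(longest, ts - down_since)
            down_since = None
    return longest


# Example (placeholder address): poll for two minutes while failing a node.
# print(longest_outage(poll("192.0.2.10", 8123, duration_s=120)))
```

The resolution is limited by the polling interval and the probe timeout, so the result is an upper-bound estimate of the failover gap rather than an exact figure.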


Read the full build story in the blog post.