HAHA IoT Cluster Topology

Home Assistant High Availability (HAHA)

HAHA: Home Assistant High Availability. The acronym was unintentional but too fitting to change. Three Proxmox nodes running Home Assistant, Mosquitto, Zigbee2MQTT, ESPHome, and Node-RED in a Pacemaker cluster, because a smart home that dies when one machine fails isn’t really smart.

Overview

Aspect      Details
Nodes       3 Proxmox hosts
Clustering  Pacemaker + Corosync
Storage     3.6TB DRBD dual-primary + OCFS2
Network     2x10Gbps LACP per node
Services    Home Assistant, Mosquitto, Zigbee2MQTT, ESPHome, Node-RED

Technology Stack

Clustering
- Pacemaker + Corosync with STONITH fencing (3-node cluster)
- Floating virtual IP for seamless failover
- Quorum-based decision making

Storage
- 3.6TB DRBD dual-primary replication
- OCFS2 cluster filesystem shared via NFS
- KINGSTON SFYRD4000G NVMe drives with PCI passthrough

Networking
- LACP etherchannel (2x10Gbps per node)
- Cisco WS-C3850-12X48U switch
- Network-attached IoT peripherals (no USB dependencies)

Dockerized Services
- Home Assistant (core automation)
- Mosquitto (MQTT broker)
- Zigbee2MQTT (Zigbee gateway)
- ESPHome (ESP device management)
- Node-RED (flow-based automation)

IoT Peripherals
- Ethernet Zigbee: TubesZB CC2652P7
- Ethernet Bluetooth: Olimex ESP32-POE-ISO
- USB-free design for clean failover

AI Integration
- Local LLM (Ollama) on RTX 3090 Ti
- Private voice control and inference
- No cloud dependency

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                      Floating VIP (Active)                      │
└────────────────────────────────┬────────────────────────────────┘
                                 │
         ┌───────────────────────┼───────────────────────┐
         │                       │                       │
         ▼                       ▼                       ▼
   ┌───────────┐           ┌───────────┐           ┌───────────┐
   │   pve01   │           │   pve02   │           │   pve03   │
   │  Active   │◄─────────►│  Standby  │◄─────────►│ Arbitrator│
   │           │   DRBD    │           │   DRBD    │           │
   └─────┬─────┘   Sync    └─────┬─────┘   Sync    └─────┬─────┘
         │                       │                       │
         │     Corosync Ring     │                       │
         └───────────────────────┴───────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│                       DRBD + OCFS2 + NFS                        │
│                   (3.6TB Replicated Storage)                    │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                         Docker Services                         │
│    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐     │
│    │Home Assistant│    │  Mosquitto   │    │ Zigbee2MQTT  │     │
│    └──────────────┘    └──────────────┘    └──────────────┘     │
│    ┌──────────────┐    ┌──────────────┐                         │
│    │   ESPHome    │    │   Node-RED   │                         │
│    └──────────────┘    └──────────────┘                         │
└─────────────────────────────────────────────────────────────────┘
                                 │
          ┌──────────────────────┼──────────────────────┐
          ▼                      ▼                      ▼
┌───────────────────┐  ┌───────────────────┐  ┌───────────────────┐
│  Ethernet Zigbee  │  │Ethernet Bluetooth │  │     Local LLM     │
│ TubesZB CC2652P7  │  │ Olimex ESP32-POE  │  │    RTX 3090 Ti    │
└───────────────────┘  └───────────────────┘  └───────────────────┘

What I Got Right (and Wrong)

The USB-to-Ethernet migration was the best decision in this project. Zigbee coordinator → TubesZB CC2652P7 (Ethernet), Bluetooth → Olimex ESP32-POE-ISO (Ethernet). USB devices can’t fail over between physical hosts; network devices don’t care which host they’re talking to. Re-pairing dozens of Zigbee devices was painful, but the result is a setup where the smart home stack moves between nodes without caring about peripheral hardware. ...
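The quorum-based decision making listed under Clustering is what makes the third (arbitrator) node matter. A minimal sketch of the arithmetic, assuming a simple majority rule with one vote per node; the function names are hypothetical and this is not the cluster's actual configuration:

```python
def has_quorum(total_votes: int, live_votes: int) -> bool:
    """Return True when the live nodes form a strict majority of all votes.

    Assumes one vote per node and a plain majority rule (an illustrative
    simplification, not a copy of any real cluster setting).
    """
    if not 0 <= live_votes <= total_votes:
        raise ValueError("live_votes must be between 0 and total_votes")
    return live_votes > total_votes // 2


def tolerated_failures(total_votes: int) -> int:
    """Largest number of nodes that can fail while a majority survives."""
    return (total_votes - 1) // 2


if __name__ == "__main__":
    for nodes in (2, 3):
        print(nodes, "nodes tolerate", tolerated_failures(nodes), "failure(s)")
```

Under this rule a two-node cluster tolerates no failures, while three nodes tolerate one, which is consistent with running an arbitrator alongside the active and standby hosts.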

April 1, 2025 · 3 min · Kyriakos Papadopoulos
Nextcloud HA Architecture

Nextcloud High Availability (NCHA)

A self-hosted Nextcloud cluster spread across multiple Proxmox nodes, built so that losing a node doesn’t take down file storage, Talk, Collabora, or Whiteboard. Same clustering approach as the HAHA project (Pacemaker, DRBD, the works), applied to a different problem.

Overview

Aspect          Details
Load Balancing  HAProxy (Layer 7) + DNS Round Robin
Database        Galera-MariaDB cluster
Cache           Redis Sentinel
Storage         DRBD + OCFS2 shared volumes
Authentication  FreeIPA LDAP
Services        Nextcloud, Talk, Collabora Code, Whiteboard

Technology Stack

Orchestration & HA
- Corosync + Pacemaker for VIP management
- HAProxy for Layer 7 load balancing
- DNS Round Robin for geographic distribution

Storage Backend
- DRBD + OCFS2 shared volumes
- Clustered Galera-MariaDB for database HA
- Redis Sentinel for distributed caching

Networking & Security
- FreeIPA-based LDAP authentication
- SSL offloading at HAProxy
- Health checks for automatic failover

Nextcloud Services
- Frontend nodes (web servers)
- Backend services: Talk, Collabora Code, Imaginary, Whiteboard
- WebDAV for file access

Infrastructure
- Two NVMe-equipped nodes with PCI passthrough
- Third arbitrator node for quorum

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         DNS Round Robin                         │
│                     (nextcloud.domain.tld)                      │
└────────────────────────────────┬────────────────────────────────┘
                                 │
               ┌─────────────────┴─────────────────┐
               ▼                                   ▼
      ┌─────────────────┐                 ┌─────────────────┐
      │    HAProxy 1    │                 │    HAProxy 2    │
      │    (Active)     │◄───────────────►│    (Standby)    │
      │  Floating VIP   │    Pacemaker    │                 │
      └────────┬────────┘                 └────────┬────────┘
               │                                   │
               └─────────────────┬─────────────────┘
                                 │ Health Checks
           ┌─────────────────────┼─────────────────────┐
           ▼                     ▼                     ▼
    ┌─────────────┐       ┌─────────────┐       ┌─────────────┐
    │ Nextcloud 1 │       │ Nextcloud 2 │       │ Nextcloud 3 │
    │  Frontend   │       │  Frontend   │       │  (Backup)   │
    └──────┬──────┘       └──────┬──────┘       └──────┬──────┘
           │                     │                     │
           └─────────────────────┼─────────────────────┘
                                 │
     ┌───────────────────────────┴───────────────────────────┐
     │                   Backend Services                    │
     │    ┌───────────┐    ┌───────────┐    ┌───────────┐    │
     │    │   Talk    │    │ Collabora │    │ Whiteboard│    │
     │    └───────────┘    └───────────┘    └───────────┘    │
     └───────────────────────────────────────────────────────┘
                                 │
     ┌───────────────────────────┴───────────────────────────┐
     │                      Data Layer                       │
     │      ┌─────────────────┐     ┌─────────────────┐      │
     │      │ Galera-MariaDB  │     │ Redis Sentinel  │      │
     │      │     Cluster     │     │    (3 nodes)    │      │
     │      └─────────────────┘     └─────────────────┘      │
     │      ┌─────────────────────────────────────────┐      │
     │      │        DRBD + OCFS2 (NFS Export)        │      │
     │      └─────────────────────────────────────────┘      │
     └───────────────────────────────────────────────────────┘
                                 │
                                 ▼
     ┌───────────────────────────────────────────────────────┐
     │                        FreeIPA                        │
     │                 (LDAP Authentication)                 │
     └───────────────────────────────────────────────────────┘

What Made This Tricky

The challenge wasn’t any single component; it was getting Galera, Redis Sentinel, DRBD, OCFS2, and HAProxy to all agree on what “healthy” means at the same time. Each layer has its own idea of quorum, its own failure detection, and its own timeout semantics. ...
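The point about layers disagreeing on “healthy” can be made concrete with a small model. This is only a sketch under stated assumptions: the `LayerStatus` type, the helper names, and every timeout value below are hypothetical illustrations, not settings taken from this cluster or from any of the listed tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerStatus:
    name: str              # layer label, e.g. "Galera"
    healthy: bool          # what this layer currently reports
    detect_seconds: float  # hypothetical time this layer needs to notice a failure


def stack_is_healthy(layers: list[LayerStatus]) -> bool:
    """The stack counts as healthy only when every layer reports healthy."""
    return all(layer.healthy for layer in layers)


def unhealthy_layers(layers: list[LayerStatus]) -> list[str]:
    """Names of the layers currently reporting a problem, in input order."""
    return [layer.name for layer in layers if not layer.healthy]


def disagreement_window(layers: list[LayerStatus]) -> float:
    """Seconds between the fastest and slowest layer noticing the same failure."""
    times = [layer.detect_seconds for layer in layers]
    return max(times) - min(times)


# Illustrative values only; real detection times depend on each tool's settings.
layers = [
    LayerStatus("HAProxy", healthy=True, detect_seconds=2.0),
    LayerStatus("Galera", healthy=True, detect_seconds=5.0),
    LayerStatus("Redis Sentinel", healthy=False, detect_seconds=30.0),
]

if __name__ == "__main__":
    print("healthy:", stack_is_healthy(layers))
    print("unhealthy:", unhealthy_layers(layers))
    print("disagreement window (s):", disagreement_window(layers))
```

During the disagreement window some layers have already reacted to a failure while others still treat the node as fine, which is one way to picture why aligning timeouts across layers takes effort.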

December 1, 2024 · 3 min · Kyriakos Papadopoulos