HAHA IoT Cluster Topology

Home Assistant High Availability (HAHA)

HAHA is a homelab project to build a fault-tolerant, high-performance smart home stack. Hosted on three Proxmox nodes, it ensures uninterrupted operation of key IoT services: Home Assistant, Mosquitto, Zigbee2MQTT, ESPHome, and Node-RED — all running in a high-availability cluster. Overview Aspect Details Nodes 3 Proxmox hosts Clustering Pacemaker + Corosync Storage 3.6TB DRBD dual-primary + OCFS2 Network 2x10Gbps LACP per node Services Home Assistant, Mosquitto, Zigbee2MQTT, ESPHome, Node-RED Technology Stack Clustering Pacemaker + Corosync with STONITH fencing (3-node cluster) Floating virtual IP for seamless failover Quorum-based decision making Storage 3.6TB DRBD dual-primary replication OCFS2 cluster filesystem shared via NFS KINGSTON SFYRD4000G NVMe drives with PCI passthrough Networking LACP etherchannel (2x10Gbps per node) Cisco WS-C3850-12X48U switch Network-attached IoT peripherals (no USB dependencies) Dockerized Services Home Assistant (core automation) Mosquitto (MQTT broker) Zigbee2MQTT (Zigbee gateway) ESPHome (ESP device management) Node-RED (flow-based automation) IoT Peripherals Ethernet Zigbee: TubesZB CC2652P7 Ethernet Bluetooth: Olimex ESP32-POE-ISO USB-free design for clean failover AI Integration Local LLM (Ollama) on RTX 3090 Ti Private voice control and inference No cloud dependency Architecture ┌─────────────────────────────────────────────────────────────────┐ │ Floating VIP (Active) │ └───────────────────────────┬─────────────────────────────────────┘ │ ┌───────────────────────┼───────────────────────┐ │ │ │ ▼ ▼ ▼ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ pve01 │ │ pve02 │ │ pve03 │ │ Active │◄─────────►│ Standby │◄─────────►│Arbitrator│ │ │ DRBD │ │ DRBD │ │ └────┬────┘ Sync └────┬────┘ Sync └────┬────┘ │ │ │ │ Corosync Ring │ │ └─────────────────────┴─────────────────────┘ ┌─────────────────────────────────────────────────────────────────┐ │ DRBD + OCFS2 + NFS │ │ (3.6TB Replicated Storage) │ └─────────────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────────┐ │ Docker Services │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │ │Home Assistant│ │ Mosquitto │ │ Zigbee2MQTT │ │ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │ ┌──────────────┐ ┌──────────────┐ │ │ │ ESPHome │ │ Node-RED │ │ │ └──────────────┘ └──────────────┘ │ └─────────────────────────────────────────────────────────────────┘ │ ┌──────────────────┼──────────────────┐ ▼ ▼ ▼ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ Ethernet Zigbee │ │Ethernet Bluetooth│ │ Local LLM │ │ TubesZB CC2652P7│ │ Olimex ESP32-POE │ │ RTX 3090 Ti │ └─────────────────┘ └─────────────────┘ └─────────────────┘ Key Features Seamless failover via Pacemaker/Corosync High-speed, redundant storage with DRBD + NVMe passthrough USB-free Zigbee/Bluetooth to support clean failover Fully local voice assistant, no cloud dependency Challenges Overcome Fine-tuned Pacemaker quorum and timeouts to avoid split-brain Stable dual-primary DRBD sync with OCFS2 lock handling Migrated from USB to Ethernet peripherals for failover stability Balanced low-latency storage and high throughput using LACP + passthrough NVMe Performance Benchmarks For IoT workloads, targets are: 50k IOPS, 200 MiB/s bandwidth, 5000 µs latency. ...

April 1, 2025 · 3 min · Kyriakos Papadopoulos
Nextcloud HA Architecture

Nextcloud High Availability (NCHA)

Designed and deployed a fully self-hosted, fault-tolerant Nextcloud cluster across multiple Proxmox nodes. This setup ensures continuous access to file storage and collaboration services—including Talk, Collabora, and Whiteboard—via high-availability mechanisms at both the compute and storage layers. Overview Aspect Details Load Balancing HAProxy (Layer 7) + DNS Round Robin Database Galera-MariaDB cluster Cache Redis Sentinel Storage DRBD + OCFS2 shared volumes Authentication FreeIPA LDAP Services Nextcloud, Talk, Collabora Code, Whiteboard Technology Stack Orchestration & HA Corosync + Pacemaker for VIP management HAProxy for Layer 7 load balancing DNS Round Robin for geographic distribution Storage Backend DRBD + OCFS2 shared volumes Clustered Galera-MariaDB for database HA Redis Sentinel for distributed caching Networking & Security FreeIPA-based LDAP authentication SSL offloading at HAProxy Health checks for automatic failover Nextcloud Services Frontend nodes (web servers) Backend services: Talk, Collabora Code, Imaginary, Whiteboard WebDAV for file access Infrastructure Two NVMe-equipped nodes with PCI passthrough Third arbitrator node for quorum Architecture ┌─────────────────────────────────────────────────────────────────┐ │ DNS Round Robin │ │ (nextcloud.domain.tld) │ └───────────────────────────┬─────────────────────────────────────┘ │ ┌──────────────────┴──────────────────┐ ▼ ▼ ┌─────────────────┐ ┌─────────────────┐ │ HAProxy 1 │ │ HAProxy 2 │ │ (Active) │◄───────────────►│ (Standby) │ │ Floating VIP │ Pacemaker │ │ └────────┬────────┘ └────────┬────────┘ │ │ └──────────────────┬────────────────┘ │ Health Checks ┌──────────────────┼──────────────────┐ ▼ ▼ ▼ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ Nextcloud 1 │ │ Nextcloud 2 │ │ Nextcloud 3 │ │ Frontend │ │ Frontend │ │ (Backup) │ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │ │ │ └────────────────────┼────────────────────┘ │ ┌───────────────────────────┴───────────────────────────┐ │ Backend Services │ │ ┌───────────┐ ┌───────────┐ ┌───────────┐ │ │ │ Talk │ │ Collabora │ │ Whiteboard│ │ │ └───────────┘ └───────────┘ └───────────┘ │ └───────────────────────────────────────────────────────┘ │ ┌───────────────────────────┴───────────────────────────┐ │ Data Layer │ │ ┌─────────────────┐ ┌─────────────────┐ │ │ │ Galera-MariaDB │ │ Redis Sentinel │ │ │ │ Cluster │ │ (3 nodes) │ │ │ └─────────────────┘ └─────────────────┘ │ │ ┌─────────────────────────────────────────┐ │ │ │ DRBD + OCFS2 (NFS Export) │ │ │ └─────────────────────────────────────────┘ │ └───────────────────────────────────────────────────────┘ │ ▼ ┌───────────────────────────────────────────────────────┐ │ FreeIPA │ │ (LDAP Authentication) │ └───────────────────────────────────────────────────────┘ Key Features Active-active frontend nodes with load-balanced HTTPS access Shared and resilient storage using DRBD-backed OCFS2 volumes Highly available clustered database and in-memory cache layers Integrated LDAP for centralized authentication Modular auxiliary service deployment with failover support Challenges Solved Implemented cluster-aware database and NFS storage with automated failover Tuned HAProxy with backend health checks for traffic failover and load distribution Optimized service orchestration across multiple LXCs and VMs Avoided SPOFs across authentication, web, and application layers Results Achieved a robust private cloud platform with continuous uptime during simulated node failures. Cluster gracefully tolerates network disruptions and node restarts with no data loss or service interruption. ...

December 1, 2024 · 3 min · Kyriakos Papadopoulos
Home Assistant HA architecture diagram

Home Assistant High Availability (HAHA)

The problem with smart homes There is a certain irony in building a smart home that becomes useless the moment a single Raspberry Pi decides to fail. I had been running Home Assistant on a standalone VM for years, and while it worked, I found myself increasingly aware of just how fragile the setup was. A failed disk, a corrupted SD card, an unfortunate kernel panic during a firmware update — any of these would leave me fumbling for light switches like it was 1995. ...

April 15, 2025 · 5 min · Kyriakos Papadopoulos