A self-hosted Nextcloud cluster spread across multiple Proxmox nodes, built so that losing a node doesn’t take down file storage, Talk, Collabora, or Whiteboard. Same clustering approach as the HAHA project — Pacemaker, DRBD, the works — applied to a different problem.
## Overview

| Aspect          | Details                                    |
|-----------------|--------------------------------------------|
| Load Balancing  | HAProxy (Layer 7) + DNS Round Robin        |
| Database        | Galera-MariaDB cluster                     |
| Cache           | Redis Sentinel                             |
| Storage         | DRBD + OCFS2 shared volumes                |
| Authentication  | FreeIPA LDAP                               |
| Services        | Nextcloud, Talk, Collabora Code, Whiteboard |

## Technology Stack

### Orchestration & HA
- Corosync + Pacemaker for VIP management
- HAProxy for Layer 7 load balancing
- DNS Round Robin for geographic distribution

### Storage Backend
- DRBD + OCFS2 shared volumes
- Clustered Galera-MariaDB for database HA
- Redis Sentinel for distributed caching

### Networking & Security
- FreeIPA-based LDAP authentication
- SSL offloading at HAProxy
- Health checks for automatic failover

### Nextcloud Services
- Frontend nodes (web servers)
- Backend services: Talk, Collabora Code, Imaginary, Whiteboard
- WebDAV for file access

### Infrastructure
- Two NVMe-equipped nodes with PCI passthrough
- Third arbitrator node for quorum

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                        DNS Round Robin                          │
│                     (nextcloud.domain.tld)                      │
└───────────────────────────┬─────────────────────────────────────┘
                            │
         ┌──────────────────┴──────────────────┐
         ▼                                     ▼
┌─────────────────┐                  ┌─────────────────┐
│    HAProxy 1    │                  │    HAProxy 2    │
│    (Active)     │◄────────────────►│    (Standby)    │
│  Floating VIP   │    Pacemaker     │                 │
└────────┬────────┘                  └────────┬────────┘
         │                                    │
         └─────────────────┬──────────────────┘
                           │ Health Checks
        ┌──────────────────┼──────────────────┐
        ▼                  ▼                  ▼
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│ Nextcloud 1 │    │ Nextcloud 2 │    │ Nextcloud 3 │
│  Frontend   │    │  Frontend   │    │  (Backup)   │
└──────┬──────┘    └──────┬──────┘    └──────┬──────┘
       │                  │                  │
       └──────────────────┼──────────────────┘
                          │
┌─────────────────────────┴─────────────────────────────┐
│                   Backend Services                    │
│   ┌───────────┐   ┌───────────┐   ┌───────────┐       │
│   │   Talk    │   │ Collabora │   │ Whiteboard│       │
│   └───────────┘   └───────────┘   └───────────┘       │
└─────────────────────────┬─────────────────────────────┘
                          │
┌─────────────────────────┴─────────────────────────────┐
│                      Data Layer                       │
│  ┌─────────────────┐       ┌─────────────────┐        │
│  │ Galera-MariaDB  │       │ Redis Sentinel  │        │
│  │     Cluster     │       │    (3 nodes)    │        │
│  └─────────────────┘       └─────────────────┘        │
│  ┌─────────────────────────────────────────┐          │
│  │       DRBD + OCFS2 (NFS Export)         │          │
│  └─────────────────────────────────────────┘          │
└─────────────────────────┬─────────────────────────────┘
                          │
                          ▼
┌───────────────────────────────────────────────────────┐
│                       FreeIPA                         │
│                (LDAP Authentication)                  │
└───────────────────────────────────────────────────────┘
```

## What Made This Tricky

The challenge wasn’t any single component — it was getting Galera, Redis Sentinel, DRBD, OCFS2, and HAProxy to all agree on what “healthy” means at the same time. Each layer has its own idea of quorum, its own failure detection, and its own timeout semantics.
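At the load-balancer layer, the health checks and SSL offloading described above might look like the following `haproxy.cfg` fragment. This is a hedged sketch, not the deployed config: the server names, addresses, certificate path, and timing values are assumptions, though `/status.php` is Nextcloud's real status endpoint.

```
# Illustrative haproxy.cfg fragment: TLS terminated at HAProxy,
# HTTP health checks against Nextcloud's status endpoint.
frontend nextcloud_https
    bind *:443 ssl crt /etc/haproxy/certs/nextcloud.pem
    default_backend nextcloud_web

backend nextcloud_web
    balance roundrobin
    option httpchk GET /status.php
    http-check expect status 200
    # nc3 is marked "backup": it only receives traffic when
    # both primary frontends fail their checks.
    server nc1 10.0.0.11:80 check inter 2s fall 3 rise 2
    server nc2 10.0.0.12:80 check inter 2s fall 3 rise 2
    server nc3 10.0.0.13:80 check inter 2s fall 3 rise 2 backup
```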
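The floating VIP between the two HAProxy instances is the piece Pacemaker manages. As a rough sketch of that wiring (the VIP address and resource names here are hypothetical, not taken from the project):

```shell
# Illustrative pcs commands: a floating VIP colocated with HAProxy,
# so the address always follows the active load balancer.
pcs resource create nextcloud_vip ocf:heartbeat:IPaddr2 \
    ip=10.0.0.100 cidr_netmask=24 op monitor interval=10s
pcs resource create haproxy systemd:haproxy op monitor interval=10s
pcs constraint colocation add nextcloud_vip with haproxy INFINITY
pcs constraint order haproxy then nextcloud_vip
```

The colocation and ordering constraints are what make failover atomic: if HAProxy dies on the active node, Pacemaker moves both the service and the VIP together rather than leaving the address pointing at a dead proxy.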
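The "everyone must agree" idea can be made concrete as a composite health check. This is a minimal sketch, not the project's actual code: the probe functions and the status values they inspect are stand-ins based on the usual Galera (`wsrep_*` status variables), Redis Sentinel (role), and DRBD (connection/disk state) conventions.

```python
# Sketch of a composite health check: a node only counts as healthy
# when every layer agrees. The thresholds mirror common conventions
# for each system; the function names here are illustrative.

def galera_ok(status: dict) -> bool:
    # Galera is usable only when the node is synced and in the
    # primary component of the cluster.
    return (status.get("wsrep_local_state_comment") == "Synced"
            and status.get("wsrep_cluster_status") == "Primary")

def sentinel_ok(role: str) -> bool:
    # Writes are safe only against the master Sentinel has elected.
    return role == "master"

def drbd_ok(cstate: str, dstate: str) -> bool:
    # DRBD must be connected with up-to-date data locally.
    return cstate == "Connected" and dstate.startswith("UpToDate")

def node_healthy(galera: dict, redis_role: str, drbd: tuple) -> bool:
    # A frontend joins the load-balancer pool only when all
    # layers report healthy at the same time.
    return galera_ok(galera) and sentinel_ok(redis_role) and drbd_ok(*drbd)
```

The point of the AND across layers: a node whose database is synced but whose DRBD peer is disconnected should still be pulled from rotation, because writes landing there may not survive a failover.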
...