Designed and deployed a fully self-hosted, fault-tolerant Nextcloud cluster across multiple Proxmox nodes. The setup keeps file storage and collaboration services (Talk, Collabora, and Whiteboard) continuously available through high-availability mechanisms at both the compute and storage layers.

Overview

Aspect            Details
Load Balancing    HAProxy (Layer 7) + DNS Round Robin
Database          Galera-MariaDB cluster
Cache             Redis Sentinel
Storage           DRBD + OCFS2 shared volumes
Authentication    FreeIPA LDAP
Services          Nextcloud, Talk, Collabora CODE, Whiteboard

Technology Stack

Orchestration & HA

  • Corosync + Pacemaker for VIP management
  • HAProxy for Layer 7 load balancing
  • DNS Round Robin for geographic distribution

Storage Backend

  • DRBD + OCFS2 shared volumes
  • Clustered Galera-MariaDB for database HA
  • Redis with Sentinel for highly available caching
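
As a rough illustration of how Sentinel decides on a failover for the cache layer, the sketch below models two separate thresholds: a configurable quorum of Sentinels must agree that the master is down, and a majority of all Sentinels must be reachable to authorize the promotion. The function name and the quorum value of 2 are assumptions chosen for a three-Sentinel layout, not settings taken from this deployment.

```python
def sentinel_can_fail_over(total: int, agreeing_down: int, reachable: int,
                           quorum: int = 2) -> bool:
    """Simplified model of the Sentinel failover decision.

    total         -- number of Sentinel processes in the deployment
    agreeing_down -- Sentinels that currently see the master as unreachable
    reachable     -- Sentinels that can talk to each other (including self)
    quorum        -- assumed value; the real one is set in the Sentinel config
    """
    # Failure detection: enough Sentinels agree the master is down.
    objectively_down = agreeing_down >= quorum
    # Authorization: a strict majority of all Sentinels must take part.
    has_majority = reachable >= total // 2 + 1
    return objectively_down and has_majority


if __name__ == "__main__":
    # Three Sentinels, one lost together with the master: failover proceeds.
    print(sentinel_can_fail_over(total=3, agreeing_down=2, reachable=2))
```

The split between the two thresholds is why a low quorum alone does not make failover unsafe: an isolated Sentinel may detect the failure but cannot promote a replica on its own.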

Networking & Security

  • FreeIPA-based LDAP authentication
  • SSL offloading at HAProxy
  • Health checks for automatic failover
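
The health-check-driven failover listed above can be sketched as a small state machine. This is a simplified model, not the HAProxy implementation: a backend is marked down after `fall` consecutive failed checks and up again after `rise` consecutive successes (3 and 2 are the HAProxy defaults), and a backup server only receives traffic once every regular server is down. The class, function, and node names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    backup: bool = False   # backup servers are used only as a last resort
    up: bool = True
    streak: int = 0        # consecutive checks contradicting the current state

    def record_check(self, ok: bool, rise: int = 2, fall: int = 3) -> None:
        """Apply one health-check result using rise/fall thresholds."""
        if ok == self.up:
            self.streak = 0            # result confirms the current state
            return
        self.streak += 1
        threshold = rise if ok else fall
        if self.streak >= threshold:   # enough evidence to flip the state
            self.up = ok
            self.streak = 0


def eligible(backends: list[Backend]) -> list[Backend]:
    """Return servers that may receive traffic; backups only if no primary is up."""
    primaries = [b for b in backends if b.up and not b.backup]
    if primaries:
        return primaries
    return [b for b in backends if b.up and b.backup]


if __name__ == "__main__":
    nodes = [Backend("nextcloud1"), Backend("nextcloud2"),
             Backend("nextcloud3", backup=True)]
    for _ in range(3):
        nodes[0].record_check(False)   # nextcloud1 fails three checks in a row
    print([b.name for b in eligible(nodes)])
```

Requiring several consecutive results before changing state keeps a single slow response from ejecting a healthy node, at the cost of a slightly longer detection time.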

Nextcloud Services

  • Frontend nodes (web servers)
  • Backend services: Talk, Collabora CODE, Imaginary, Whiteboard
  • WebDAV for file access

Infrastructure

  • Two NVMe-equipped nodes with PCI passthrough
  • Third arbitrator node for quorum
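
The reason for the third node can be shown with the usual majority rule, sketched below under the assumption that each node carries one vote. With only two nodes, a broken link leaves each side with one vote and neither can safely continue; with the arbitrator, the side that still reaches it holds two of three votes and stays active. The function name is illustrative.

```python
def has_quorum(votes_present: int, expected_votes: int) -> bool:
    """A partition may keep running only with a strict majority of all votes."""
    return votes_present >= expected_votes // 2 + 1


if __name__ == "__main__":
    # Two storage nodes plus one arbitrator, one vote each (assumed).
    print(has_quorum(votes_present=2, expected_votes=3))  # node + arbitrator
    print(has_quorum(votes_present=1, expected_votes=3))  # isolated node
    print(has_quorum(votes_present=1, expected_votes=2))  # two-node cluster, split
```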

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                      DNS Round Robin                            │
│                   (nextcloud.domain.tld)                        │
└───────────────────────────┬─────────────────────────────────────┘
         ┌──────────────────┴──────────────────┐
         ▼                                     ▼
┌─────────────────┐                 ┌─────────────────┐
│    HAProxy 1    │                 │    HAProxy 2    │
│   (Active)      │◄───────────────►│   (Standby)     │
│   Floating VIP  │   Pacemaker     │                 │
└────────┬────────┘                 └────────┬────────┘
         │                                   │
         └──────────────────┬────────────────┘
                            │ Health Checks
         ┌──────────────────┼──────────────────┐
         ▼                  ▼                  ▼
┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│ Nextcloud 1 │      │ Nextcloud 2 │      │ Nextcloud 3 │
│  Frontend   │      │  Frontend   │      │  (Backup)   │
└──────┬──────┘      └──────┬──────┘      └──────┬──────┘
       │                    │                    │
       └────────────────────┼────────────────────┘
┌───────────────────────────┴───────────────────────────┐
│                    Backend Services                   │
│  ┌───────────┐ ┌───────────┐ ┌───────────┐            │
│  │   Talk    │ │ Collabora │ │ Whiteboard│            │
│  └───────────┘ └───────────┘ └───────────┘            │
└───────────────────────────┬───────────────────────────┘
┌───────────────────────────┴───────────────────────────┐
│                    Data Layer                         │
│  ┌─────────────────┐  ┌─────────────────┐             │
│  │ Galera-MariaDB  │  │  Redis Sentinel │             │
│  │    Cluster      │  │    (3 nodes)    │             │
│  └─────────────────┘  └─────────────────┘             │
│  ┌─────────────────────────────────────────┐          │
│  │        DRBD + OCFS2 (NFS Export)        │          │
│  └─────────────────────────────────────────┘          │
└───────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────┐
│                      FreeIPA                          │
│              (LDAP Authentication)                    │
└───────────────────────────────────────────────────────┘

Key Features

  • Active-active frontend nodes with load-balanced HTTPS access
  • Shared and resilient storage using DRBD-backed OCFS2 volumes
  • Highly available clustered database and in-memory cache layers
  • Integrated LDAP for centralized authentication
  • Modular auxiliary service deployment with failover support

Challenges Solved

  • Implemented cluster-aware database and NFS storage with automated failover
  • Tuned HAProxy with backend health checks for traffic failover and load distribution
  • Optimized service orchestration across multiple LXC containers and VMs
  • Avoided single points of failure (SPOFs) across the authentication, web, and application layers
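
For the database side of automated failover, a health check usually needs more than "the port answers". The sketch below evaluates the wsrep status variables that Galera exposes through SHOW STATUS and treats a node as usable only when it is ready, part of the primary component, and fully synced. The function name and the `min_cluster_size` threshold are assumptions for illustration; the status rows are passed in as a plain dictionary so the example runs without a database.

```python
def galera_node_healthy(status: dict[str, str], min_cluster_size: int = 2) -> bool:
    """Decide whether a Galera node should receive traffic.

    `status` maps wsrep status variable names to their string values,
    e.g. as collected from SHOW GLOBAL STATUS LIKE 'wsrep_%'.
    """
    try:
        cluster_size = int(status.get("wsrep_cluster_size", "0"))
    except ValueError:
        return False
    return (
        status.get("wsrep_ready") == "ON"
        and status.get("wsrep_cluster_status") == "Primary"
        and status.get("wsrep_local_state_comment") == "Synced"
        and cluster_size >= min_cluster_size  # assumed threshold
    )


if __name__ == "__main__":
    sample = {
        "wsrep_ready": "ON",
        "wsrep_cluster_status": "Primary",
        "wsrep_local_state_comment": "Synced",
        "wsrep_cluster_size": "3",
    }
    print(galera_node_healthy(sample))
```

A check like this could back an HTTP endpoint that the load balancer polls, so a node that is donating state or has dropped out of the primary component is taken out of rotation.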

Results

Achieved a robust private cloud platform that stayed available during simulated node failures. The cluster tolerates network disruptions and node restarts with no data loss or service interruption.
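
One way such failure tests could be measured is to poll Nextcloud's status.php endpoint while a node is taken down and record the longest window of failed probes. The following is a minimal sketch under that assumption, not the procedure actually used here; the hostname is the placeholder from the architecture diagram, and the function names, interval, and duration are illustrative.

```python
import json
import time
import urllib.error
import urllib.request


def is_serving(status_body: str) -> bool:
    """True if a status.php payload reports an installed, non-maintenance instance."""
    try:
        status = json.loads(status_body)
    except json.JSONDecodeError:
        return False
    if not isinstance(status, dict):
        return False
    return bool(status.get("installed")) and not status.get("maintenance", False)


def probe(url: str, timeout: float = 2.0) -> bool:
    """Fetch the status URL once; any network or HTTP error counts as a failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return resp.status == 200 and is_serving(body)
    except (urllib.error.URLError, OSError):
        return False


def poll(url: str, duration_s: float = 60.0, interval_s: float = 1.0):
    """Collect (seconds_since_start, ok) samples for the given duration."""
    samples = []
    start = time.monotonic()
    while (now := time.monotonic()) - start < duration_s:
        samples.append((now - start, probe(url)))
        time.sleep(interval_s)
    return samples


def longest_outage(samples) -> float:
    """Longest span from the first failed sample to the next success, in seconds."""
    longest, down_since, last_t = 0.0, None, None
    for t, ok in samples:
        if not ok and down_since is None:
            down_since = t                      # outage begins
        elif ok and down_since is not None:
            longest = max(longest, t - down_since)
            down_since = None                   # outage ends
        last_t = t
    if down_since is not None and last_t is not None:
        longest = max(longest, last_t - down_since)  # still down at the end
    return longest


if __name__ == "__main__":
    # Placeholder hostname; run this while stopping a node to observe failover.
    results = poll("https://nextcloud.domain.tld/status.php")
    print(f"longest outage: {longest_outage(results):.1f}s")
```

The resolution of the measurement is bounded by the polling interval, so an outage shorter than one interval may go unnoticed.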

Skills Demonstrated

  • Galera cluster administration
  • Pacemaker/Corosync HA
  • HAProxy load balancing
  • DRBD replication
  • OCFS2 cluster filesystem
  • Redis Sentinel
  • MariaDB administration
  • LDAP/FreeIPA integration
  • SSL certificate management
  • Proxmox virtualization
  • Nextcloud administration

Inspired by Matthias Wobben’s “Guide to Nextcloud Cluster Design” for architectures supporting up to 500k users.