Designed and deployed a fully self-hosted, fault-tolerant Nextcloud cluster across multiple Proxmox nodes. The setup keeps file storage and collaboration services (Talk, Collabora, and Whiteboard) available through high-availability mechanisms at both the compute and storage layers.
Overview
| Aspect | Details |
|---|---|
| Load Balancing | HAProxy (Layer 7) + DNS Round Robin |
| Database | Galera-MariaDB cluster |
| Cache | Redis Sentinel |
| Storage | DRBD + OCFS2 shared volumes |
| Authentication | FreeIPA LDAP |
| Services | Nextcloud, Talk, Collabora CODE, Whiteboard |
Technology Stack
Orchestration & HA
- Corosync + Pacemaker for VIP management
- HAProxy for Layer 7 load balancing
- DNS Round Robin for distributing clients across load balancer entry points
Storage Backend
- DRBD + OCFS2 shared volumes
- Clustered Galera-MariaDB for database HA
- Redis Sentinel for highly available caching
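One way to decide whether a Galera node should receive database traffic is to inspect its `wsrep_*` status variables. The sketch below is illustrative rather than the check used in this deployment: it assumes the values have already been fetched (for example with `SHOW GLOBAL STATUS LIKE 'wsrep_%'`) into a dictionary, and the function names and the expected cluster size of 3 (two data nodes plus the arbitrator) are assumptions.

```python
from typing import Mapping


def galera_node_ready(status: Mapping[str, str]) -> bool:
    """Return True if the node is in the primary component, ready, and synced."""
    return (
        status.get("wsrep_cluster_status") == "Primary"
        and status.get("wsrep_ready") == "ON"
        and status.get("wsrep_local_state_comment") == "Synced"
    )


def galera_cluster_degraded(status: Mapping[str, str], expected_size: int = 3) -> bool:
    """Return True if fewer members than expected are visible (expected_size is an assumption)."""
    try:
        return int(status.get("wsrep_cluster_size", "0")) < expected_size
    except ValueError:
        # An unparsable value is treated as a degraded cluster.
        return True


if __name__ == "__main__":
    sample = {
        "wsrep_cluster_status": "Primary",
        "wsrep_ready": "ON",
        "wsrep_local_state_comment": "Synced",
        "wsrep_cluster_size": "3",
    }
    print(galera_node_ready(sample), galera_cluster_degraded(sample))
```

Keeping readiness and degradation separate lets a monitor keep routing traffic to a healthy node while still flagging that a member has dropped out.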
Networking & Security
- FreeIPA-based LDAP authentication
- SSL offloading at HAProxy
- Health checks for automatic failover
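A health check for a Nextcloud frontend can look at more than the HTTP status code. As a rough sketch, and assuming the check targets Nextcloud's `status.php` endpoint (the write-up does not say which URL is probed), the function below decides whether a node should receive traffic based on the JSON body. The function name `frontend_is_healthy` is hypothetical.

```python
import json


def frontend_is_healthy(status_body: str) -> bool:
    """Return True if a status.php response body describes a node ready for traffic."""
    try:
        status = json.loads(status_body)
    except json.JSONDecodeError:
        # An error page or truncated response is treated as unhealthy.
        return False
    if not isinstance(status, dict):
        return False
    return (
        status.get("installed") is True
        and status.get("maintenance") is False
        and status.get("needsDbUpgrade") is False
    )


if __name__ == "__main__":
    body = '{"installed": true, "maintenance": false, "needsDbUpgrade": false}'
    print(frontend_is_healthy(body))
```

Treating maintenance mode as unhealthy means a node being upgraded drops out of rotation without anyone having to edit the load balancer configuration.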
Nextcloud Services
- Frontend nodes (web servers)
- Backend services: Talk, Collabora CODE, Imaginary, Whiteboard
- WebDAV for file access
Infrastructure
- Two NVMe-equipped nodes with PCI passthrough
- Third arbitrator node for quorum
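The reason for the third node comes down to majority arithmetic: with only two voters, losing either one leaves the survivor at exactly half, which is not a majority. The snippet below is a simplified sketch with equal vote weights; real quorum implementations (Corosync, Galera) support weighting and other refinements, and the function name is hypothetical.

```python
def has_quorum(total_votes: int, reachable_votes: int) -> bool:
    """Majority rule: a partition stays active only if it sees more than half of all votes."""
    return reachable_votes * 2 > total_votes


if __name__ == "__main__":
    # Two data nodes alone: one failure leaves 1 of 2 votes.
    print("2 nodes, 1 reachable:", has_quorum(2, 1))
    # Two data nodes plus an arbitrator: one failure leaves 2 of 3 votes.
    print("3 nodes, 2 reachable:", has_quorum(3, 2))
```

The arbitrator holds no data. It only contributes a vote so that the surviving side of a split can tell a peer failure from its own isolation.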
Architecture
┌─────────────────────────────────────────────────────────────────┐
│                         DNS Round Robin                         │
│                     (nextcloud.domain.tld)                      │
└───────────────────────────┬─────────────────────────────────────┘
                            │
         ┌──────────────────┴────────────────┐
         ▼                                   ▼
┌─────────────────┐                 ┌─────────────────┐
│    HAProxy 1    │                 │    HAProxy 2    │
│    (Active)     │◄───────────────►│    (Standby)    │
│  Floating VIP   │    Pacemaker    │                 │
└────────┬────────┘                 └────────┬────────┘
         │                                   │
         └──────────────────┬────────────────┘
                            │ Health Checks
         ┌──────────────────┼──────────────────┐
         ▼                  ▼                  ▼
  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
  │ Nextcloud 1 │    │ Nextcloud 2 │    │ Nextcloud 3 │
  │  Frontend   │    │  Frontend   │    │  (Backup)   │
  └──────┬──────┘    └──────┬──────┘    └──────┬──────┘
         │                  │                  │
         └──────────────────┼──────────────────┘
                            │
┌───────────────────────────┴───────────────────────────┐
│                   Backend Services                    │
│    ┌───────────┐    ┌───────────┐    ┌───────────┐    │
│    │   Talk    │    │ Collabora │    │ Whiteboard│    │
│    └───────────┘    └───────────┘    └───────────┘    │
└───────────────────────────────────────────────────────┘
                            │
┌───────────────────────────┴───────────────────────────┐
│                      Data Layer                       │
│      ┌─────────────────┐     ┌─────────────────┐      │
│      │ Galera-MariaDB  │     │ Redis Sentinel  │      │
│      │     Cluster     │     │    (3 nodes)    │      │
│      └─────────────────┘     └─────────────────┘      │
│      ┌─────────────────────────────────────────┐      │
│      │        DRBD + OCFS2 (NFS Export)        │      │
│      └─────────────────────────────────────────┘      │
└───────────────────────────────────────────────────────┘
                            │
                            ▼
┌───────────────────────────────────────────────────────┐
│                        FreeIPA                        │
│                 (LDAP Authentication)                 │
└───────────────────────────────────────────────────────┘
Key Features
- Active-active frontend nodes with load-balanced HTTPS access
- Shared and resilient storage using DRBD-backed OCFS2 volumes
- Highly available clustered database and in-memory cache layers
- Integrated LDAP for centralized authentication
- Modular auxiliary service deployment with failover support
Challenges Solved
- Implemented cluster-aware database and NFS storage with automated failover
- Tuned HAProxy with backend health checks for traffic failover and load distribution
- Optimized service orchestration across multiple LXC containers and VMs
- Avoided single points of failure (SPOFs) across the authentication, web, and application layers
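The traffic failover behavior described above can be modeled in a few lines. This is a toy simulation, not HAProxy itself: it assumes round-robin balancing over healthy primary frontends, with a backup server used only when no primary is healthy. The class names and server names are hypothetical, and it ignores details such as requiring several consecutive failed checks before marking a server down.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Backend:
    name: str
    healthy: bool = True
    backup: bool = False  # backup servers only receive traffic when no primary is healthy


class RoundRobinPool:
    """Rotate requests over healthy primaries, falling back to healthy backups."""

    def __init__(self, backends: Iterable[Backend]) -> None:
        self.backends = list(backends)
        self._next = 0

    def pick(self) -> Optional[Backend]:
        primaries = [b for b in self.backends if b.healthy and not b.backup]
        candidates = primaries or [b for b in self.backends if b.healthy and b.backup]
        if not candidates:
            return None  # nothing can serve the request
        chosen = candidates[self._next % len(candidates)]
        self._next += 1
        return chosen


if __name__ == "__main__":
    pool = RoundRobinPool([
        Backend("nextcloud1"),
        Backend("nextcloud2"),
        Backend("nextcloud3", backup=True),
    ])
    print([pool.pick().name for _ in range(4)])
    pool.backends[0].healthy = False  # simulate a failed health check
    print(pool.pick().name)
```

Recomputing the candidate list on every request keeps the model simple: a server that fails its check stops receiving traffic on the very next pick and rejoins the rotation as soon as it is marked healthy again.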
Results
Achieved a robust private cloud platform that stayed available throughout simulated node failures. In those tests, the cluster tolerated network disruptions and node restarts with no data loss or service interruption.
Skills Demonstrated
- Galera cluster administration
- Pacemaker/Corosync HA
- HAProxy load balancing
- DRBD replication
- OCFS2 cluster filesystem
- Redis Sentinel
- MariaDB administration
- LDAP/FreeIPA integration
- SSL certificate management
- Proxmox virtualization
- Nextcloud administration
Inspired by Matthias Wobben’s “Guide to Nextcloud Cluster Design” for architectures supporting up to 500k users.
