Building a Self-Hosted Platform: Architecture Overview
In the previous post I explained why I’m moving my family off Google Workspace and onto self-hosted infrastructure. This post is about how it’s built.
The stack
The platform runs on Exoscale, a Swiss cloud provider with zones across Europe; I use Frankfurt and Zurich. I picked them for jurisdiction: Swiss and EU privacy law, no CLOUD Act, no Five Eyes entanglement. Their compute and storage are comparable to what you’d get from a mid-size cloud provider, and the object storage is S3-compatible, so existing tooling mostly just works.
Infrastructure provisioning is Terraform. Configuration management is Ansible. Services run as Docker containers. Secrets are encrypted with SOPS using GPG keys and committed to the repository. The mail server is Stalwart, which handles SMTP, IMAP, JMAP, CalDAV, and CardDAV in a single binary. SSO uses Authelia backed by lldap, and the mesh network is Headscale with Tailscale clients.
For managed services, Exoscale provides DBaaS: PostgreSQL for data, Valkey (the Redis fork) for caching, and OpenSearch for full-text search. The blog you’re reading is built with Hugo and served as static files by nginx behind Traefik.
Nothing exotic. I wanted boring, well-understood tools that I can debug at 2 AM without having to re-read documentation.
Infrastructure design
Terraform is split into numbered layers, each with its own state file:
terraform/layers/
├── 00-bootstrap/ # State bucket (run once, local state)
├── 01-foundation/ # SSH keys, base security groups
├── 02-data/ # PostgreSQL, Valkey, OpenSearch, SOS buckets
├── 10-email/ # Stalwart compute instances (mx1, mx2)
├── 20-sso/ # SSO + VPN compute (sso): Authelia, lldap, Headscale
└── 30-apps/ # General app server (apps1): blog, Forgejo, Vaultwarden, Homepage, Tududi
The numbering is the dependency order. Layer 00 creates the S3-compatible bucket that stores Terraform state for everything else. Layer 01 sets up SSH keys and firewall rules. Layer 02 provisions the databases and object storage. Layer 10 creates the mail server instances. Layer 20 hosts the identity stack and the mesh VPN control plane. Layer 30 is the general-purpose app server for the blog and the application services (Forgejo, Vaultwarden, Homepage, Tududi).
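To make the "numbering is the dependency order" idea concrete, here is a minimal sketch of a helper that derives the apply order from the directory names alone. The function name `apply_order` and the idea of driving Terraform from a script are assumptions for illustration, not how the repository is actually wired; only the layer names and the `terraform/layers/` path come from the tree above.

```python
import re

# Layer directory names from the tree above, deliberately shuffled.
LAYERS = [
    "30-apps", "00-bootstrap", "10-email",
    "02-data", "20-sso", "01-foundation",
]


def apply_order(layer_names):
    """Return layer directory names sorted by their numeric prefix."""
    def prefix(name):
        match = re.match(r"(\d+)-", name)
        if match is None:
            raise ValueError(f"layer name has no numeric prefix: {name!r}")
        return int(match.group(1))

    return sorted(layer_names, key=prefix)


if __name__ == "__main__":
    # Print the commands rather than running them; this is only a sketch.
    for layer in apply_order(LAYERS):
        print(f"terraform -chdir=terraform/layers/{layer} apply")
```

Sorting on the integer prefix rather than the raw string means the order still holds if a layer is ever named without zero padding.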
Layers don’t reference each other directly. If layer 10 needs a security group from layer 01, it uses a Terraform data source to look it up by name. This keeps state files independent and means I can apply layers separately without cascading changes.
On the Ansible side, each service gets its own role. A condensed view:
ansible/roles/
├── common/ # Base OS setup, packages, UFW, fail2ban, sysctl
├── docker/ # Docker engine on the encrypted volume
├── luks/ # LUKS2 disk encryption
├── traefik/ # Reverse proxy + Let's Encrypt TLS
├── traefik_config/ # Routing config in Valkey KV
├── tailscale/ # Mesh client (per host)
├── tailscale_sidecar/ # Per-container sidecar pattern
├── headscale/ # VPN control server
├── headplane/ # Headscale web UI
├── lldap/ # LDAP directory
├── authelia/ # OIDC SSO + 2FA
├── stalwart/ # Mail server
├── roundcube/ # Webmail
├── fetchmail/ # Provider-mail import
├── blog/ # nginx for static Hugo output
├── homepage/ # Dashboard at start.cumps.eu
├── vaultwarden/ # Password manager
├── forgejo/ # Git forge
└── tududi/ # Task manager
Deploy is per-playbook rather than a single site-wide entrypoint. Numbered playbooks set ordering: 00-bootstrap.yml hardens SSH, 10-base.yml lays down LUKS + Docker, 20-traefik.yml brings up the proxy, 25-headscale.yml/26-tailscale.yml build the mesh, 30-lldap.yml/31-authelia.yml start SSO, then 40-stalwart.yml and onwards. Application services (50-blog, 51-homepage, 52-vaultwarden, 53-forgejo, 54-tududi) sit on top of that stack. Tags give additional granularity within roles.
Networking uses Tailscale as a mesh overlay. Every compute instance runs a Tailscale sidecar container, and internal traffic goes over the mesh. Traefik terminates TLS on the edge and routes to backend services over Tailscale. The blog container doesn’t need to be directly internet-exposed; Traefik handles that part.
Security in depth
Security is layered. The general idea is that each layer assumes the one above it might be compromised.
Exoscale security groups restrict inbound traffic to specific ports (25, 80, 443, 465, 993 for email; 80, 443 for web). UFW on each host adds a second firewall. fail2ban watches for brute-force attempts.
Every data volume is LUKS-encrypted. The OS volume boots normally and runs Docker, but the data volume at /mnt/data requires a passphrase to unlock. Services won’t start until the volume is mounted, which means a reboot requires manual intervention. The LUKS passphrase is stored in SOPS. I’ve thought about automating the unlock, but for now I prefer the manual step: it forces me to notice reboots.
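The "services won’t start until the volume is mounted" rule can be expressed as a small pre-start guard. This is a sketch under assumptions: the post doesn’t say how the dependency is actually enforced, and the names `volume_is_unlocked` and `main` are made up for illustration. Only the `/mnt/data` path comes from the text.

```python
import os
import sys

DATA_MOUNT = "/mnt/data"  # mount point named in the post


def volume_is_unlocked(path=DATA_MOUNT, ismount=os.path.ismount):
    """True once a filesystem is mounted at path.

    The ismount parameter exists only so the check can be tested
    without a real encrypted volume.
    """
    return ismount(path)


def main():
    """Return a process exit code: 0 if services may start, 1 otherwise."""
    if not volume_is_unlocked():
        print(f"{DATA_MOUNT} is not mounted; refusing to start services",
              file=sys.stderr)
        return 1
    print(f"{DATA_MOUNT} is mounted; safe to start services")
    return 0
```

Calling `sys.exit(main())` from a wrapper script would make any service start fail loudly on a freshly rebooted host until the passphrase has been entered.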
Stalwart adds another layer on top of the disk encryption: OpenPGP encryption for email at rest. Even if you get database access, the mail content is encrypted with per-user keys. Two separate encryption layers, each independently useful.
Everything in transit uses TLS. Let’s Encrypt certificates are managed by Traefik for web services and by Stalwart’s built-in ACME client for mail. Traefik is configured with explicit cipher suites; no TLS 1.0 or 1.1.
SOPS handles secrets. GPG encrypts them before they’re committed to git, Ansible decrypts at runtime using the community.sops lookup plugin. The private key never touches a server. It stays on my workstation.
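For readers who haven’t used SOPS, the decrypt step looks roughly like this when done by hand from a script instead of through the Ansible lookup. It is a sketch, assuming the `sops` binary is on the workstation’s PATH and the GPG private key is in the local keyring. The function name `decrypt_secrets` and the file name in the usage note are hypothetical.

```python
import json
import subprocess


def decrypt_secrets(path, run=subprocess.run):
    """Decrypt a SOPS-encrypted file and return its contents as a dict.

    The run parameter is injectable so the function can be exercised
    without sops or a GPG key present.
    """
    result = run(
        ["sops", "--decrypt", "--output-type", "json", path],
        check=True,           # raise CalledProcessError if sops fails
        capture_output=True,  # keep plaintext out of the terminal
        text=True,
    )
    return json.loads(result.stdout)
```

Something like `decrypt_secrets("secrets.sops.yaml")` would return the plaintext values in memory only; nothing decrypted is written to disk, which matches the idea that the key and the plaintext stay on the workstation.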
High availability
Email can’t go down. My family won’t tolerate “the mail server is being updated, try again later.” So Stalwart runs active-active across two compute instances: mx1 in Exoscale’s DE-FRA-1 zone and mx2 in CH-DK-2.
DNS round-robin distributes connections across both nodes. If one goes down, clients retry on the other. Both nodes share the same PostgreSQL database (managed DBaaS with built-in replication), the same Valkey cache, and the same SOS blob storage. No split-brain risk because the shared backends are the single source of truth.
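The client-side half of round-robin failover is easy to sketch: resolve every address behind the name, then try them in order until one answers. This is an illustration of the retry behaviour, not code from the platform. The function names are assumptions, and real mail clients implement their own version of this.

```python
import socket


def resolve_all(host, port):
    """Return every distinct IP address DNS has for host, in resolver order."""
    addresses = []
    for _family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def connect_with_failover(addresses, port,
                          connect=socket.create_connection, timeout=5.0):
    """Try each address in turn; return (address, connection) for the first
    one that accepts the connection."""
    last_error = None
    for address in addresses:
        try:
            return address, connect((address, port), timeout)
        except OSError as exc:  # refused, timed out, unreachable
            last_error = exc
    raise ConnectionError(f"no address reachable: {last_error}")
```

With two records behind the mail hostname, a node that is down costs the client one failed attempt (at worst one timeout) before it lands on the other node.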
I’ve tested this by deliberately shutting down one node at a time. Email continues to flow. When I get monitoring deployed (Uptime Kuma, it’s next on the list), I’ll get alerted when a node goes offline. For now I check manually, but the service stays up regardless.
The blog and other non-critical services don’t have HA. They run on a single instance. If apps1 goes down, the blog is offline until I fix it. For a personal blog, that’s fine.
Tradeoffs
I went with managed DBaaS over self-hosted PostgreSQL. Exoscale handles backups, patching, failover, connection pooling. I pay more per month and give up some control. For a platform where I’m the only operator, reduced maintenance wins. I can migrate to self-hosted later if I want to; the application layer doesn’t care where the database lives.
Hugo over Ghost or WriteFreely for the blog. No database to back up, no runtime to patch, no admin panel to secure. Publishing requires a local build and deploy step, but since I’m the only author and I live in a terminal anyway, that’s fine.
The mail server gets dedicated instances because email is critical. Everything else shares apps1. If something on apps1 starts consuming too many resources, I can move it to its own instance by adding a Terraform layer and changing the Ansible inventory target. The roles are already structured for that.
A real problem: the healthcheck that wasn’t
During the blog deployment, I ran into an issue with Tailscale sidecar containers on Alpine Linux. The healthcheck was configured to ping localhost. On Alpine, it didn’t work. The container reported unhealthy, Traefik refused to route traffic to it, and the blog was unreachable.
The fix: change localhost to 127.0.0.1. Alpine’s minimal /etc/hosts doesn’t always resolve localhost the way Debian-based images do. It works fine on Ubuntu, works fine in local testing, and then breaks in production with no useful error message.
I lost about an hour before I thought to check actual DNS resolution inside the container. If a healthcheck fails, check what it’s resolving, not just whether the service is listening.
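Checking what a name resolves to takes a few lines. The sketch below is the kind of check that would have saved that hour, assuming Python is available inside the container (it often isn’t on minimal images, in which case the same question has to be asked with whatever tools are there). The function names are made up for illustration.

```python
import socket


def resolved_addresses(name, resolver=socket.getaddrinfo):
    """Return the distinct addresses a name resolves to, in resolver order."""
    addresses = []
    for _family, _type, _proto, _canon, sockaddr in resolver(name, None):
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def diagnose(name="localhost", expected="127.0.0.1",
             resolver=socket.getaddrinfo):
    """Describe how name resolves relative to the address the service uses."""
    try:
        addresses = resolved_addresses(name, resolver)
    except socket.gaierror as exc:
        return f"{name} does not resolve: {exc}"
    if expected not in addresses:
        return f"{name} resolves to {addresses}, not {expected}"
    if addresses[0] != expected:
        return f"{name} resolves to {addresses[0]} first; {expected} is only a fallback"
    return f"{name} resolves to {expected}"
```

Running `print(diagnose())` inside the container and on the host shows immediately whether the two environments agree on what `localhost` means.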
And one more: Exoscale’s S3-compatible storage uses a different lock file name for Terraform state locking than what the Terraform docs describe. The docs say .terraform.lock.info. Exoscale SOS actually uses terraform.tfstate.tflock. I found this by reading the SOS API logs when state locking appeared to silently do nothing. If you’re setting up Terraform state on Exoscale and locking seems broken, check the actual object names in your bucket.
What’s next
The mail server is running. The blog is deployed. The security layers are in place. Still ahead:
- Monitoring with Uptime Kuma, because manually checking services is not a long-term strategy
- Automated backups with restic, both to Exoscale SOS and an offsite Hetzner Storage Box
- Migrating fifteen years of data out of Google Workspace
- Self-hosted Git with Forgejo
- Password management with Vaultwarden
I’ll write about these as I get to them. If you’re building something similar and have questions, my email runs on the infrastructure described above.