The Problem
We had a Kubernetes cluster running in the cloud, but in a completely private network (VPC). Only the firewall nodes expose public IPs to the internet. The cluster needed a stateful front-end that could:
- Provide HA — a single firewall failure must not take everything down
- Forward traffic to Kubernetes NodePorts without cloud magic
- Survive failover without dropping established connections
- Load-balance across multiple worker nodes
Cloud load balancers (AWS ALB, NLB, etc.) felt like overkill — we’d be paying for managed services when we wanted full control over traffic rules and failover behavior. We looked at HAProxy in HA mode with Keepalived, but Keepalived (VRRP) only moves the virtual IP; it doesn’t replicate connection state, so every established connection would drop on failover.
We settled on OpenBSD running CARP (Common Address Redundancy Protocol) with pfsync state synchronization. It’s been in production for years and remains the pragmatic choice for stateful, sub-second failover.
Status: Still running, proven stable, no plans to change.
Why OpenBSD + CARP
OpenBSD’s CARP is not a new protocol, but it solves a specific problem well: stateful, synchronized failover between two firewalls.
Here’s what makes it different from Keepalived:
- CARP — handles virtual IP failover (master/backup), with the backup taking over automatically
- pfsync — synchronizes firewall state (all connections, NAT tables, etc.) in real-time
- Together — when master fails, backup inherits all active connections without dropping a single packet
The alternative (HAProxy + Keepalived) would require application-level reconnect logic or connection pooling. With stateful firewall sync, TCP connections just keep working.
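On OpenBSD, both pieces are configured as network interfaces. A minimal sketch of the master's hostname.if files — the interface names (em0, em2), vhid, and password are placeholders, and pf.conf additionally needs a `pass on em2 proto pfsync` rule:

```
# /etc/hostname.carp0 — the shared public VIP on the external NIC
inet 72.X.X.100 255.255.255.0 NONE vhid 1 carpdev em0 pass mysecret advskew 0

# /etc/hostname.pfsync0 — replicate the state table over the dedicated sync link
syncdev em2
up
```

The backup runs the same carp0 line with a higher advskew, so it only wins the master election when the primary stops advertising.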
Other options we considered and rejected:
- Cisco/Juniper — overkill, expensive, requires vendor support
- pfSense — also runs pf, but it’s built on FreeBSD, and the commercial appliances and support are pricey; OpenBSD itself was free
- Keepalived + HAProxy — stateless failover, requires app-level reconnect handling
- Cloud load balancers — not an option for on-prem
OpenBSD was the only thing that gave us stateful sync without paying enterprise fees.
The Architecture
Two OpenBSD VMs in separate availability zones (for fault isolation).
            Internet
                ↓
    [Firewall-Primary] ← CARP Master - public IP
    [Firewall-Backup]  ← CARP Backup - public IP
                ↓  (both share the CARP VIP - public)
      Private Network (VPC)
                ↓
      [Kubernetes Cluster]
                ↓
  [Traefik Ingress Controller]
Network Design:
- External interface — public IP, handles internet traffic
- Internal interface — private network to K8s cluster
- Dedicated sync network — high-speed link between firewalls for pfsync (critical for low-latency state sync)
- CARP VIP (external) — 72.X.X.100, load balancing incoming traffic
- CARP VIP (internal) — 10.X.X.1, cluster access
The sync network was crucial. If pfsync packets got queued behind regular traffic, state sync would lag, and failover wouldn’t be clean.
pf Configuration (simplified):
# CARP VIP for external HTTP/HTTPS traffic
pass in on egress proto tcp to <vip-public> port { 80, 443 } \
	rdr-to <worker-pool> round-robin
# NAT: all outbound traffic from cluster to public IP
pass out on egress from <cluster-net> nat-to <vip-public>
The firewall does simple port forwarding: traffic on public IPs (72.X.X.100) gets redirected to private IPs in the cluster. Inside the cluster, Traefik (Kubernetes ingress controller) handles routing. This separation of concerns is clean:
- pf (firewall): Public ↔ Private IP translation, HA failover, NAT
- Traefik (cluster): HTTP routing, TLS termination, rate limiting, path-based routing
No load balancing logic at the firewall level. The rdr-to pool distributes new connections, but Traefik is what actually routes requests to backend services.
Real-Time Failover Demo
Here’s what happens when the master firewall is shut down mid-connection. Notice: zero packet loss.
The demo shows:
- Before failover: pings going through master firewall (RTT ~5ms)
- Failover happens: master goes down, backup takes over in <100ms
- After failover: pings continue, new RTT shows traffic now goes through backup
- No failures: not a single ping was dropped
This works because pfsync keeps the backup’s state table synchronized. When the backup becomes master, all existing connections are already in its state table.
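A rough sketch of how to reproduce the demo by hand (the VIP, interface, and group names follow the placeholders used above):

```sh
# client: keep a continuous ping running against the public VIP
ping 72.X.X.100

# master firewall: demote all CARP groups so the backup takes over
ifconfig -g carp carpdemote 50

# restore the master's eligibility afterwards
ifconfig -g carp -carpdemote 50

# either node: check who is master right now
ifconfig carp0 | grep 'carp:'
```

Demoting via the carp interface group is gentler than pulling the plug, but the observable result is the same: the ping never stops.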
Port Forwarding Strategy
We forward public ports to private cluster ports:
- 80 → cluster port 80 (HTTP)
- 443 → cluster port 443 (HTTPS)
The firewall does simple 1:1 forwarding. Traefik inside the cluster routes requests to backend services.
This separation is clean: the firewall doesn’t care what service is running. It just translates public IPs to private IPs. Traefik handles everything else.
Load Balancing at the Firewall Level
The pf rdr-to rule points at a pool of backend worker IPs. When the pool is a table, pf hands new connections out round-robin; listing the addresses inline instead allows the source-hash pool option, which pins clients to a worker by hashing the source address.
rdr-to <worker-pool> port 80 round-robin
This spreads traffic across workers, but Traefik does the actual routing. pf just ensures the connection gets to one worker; Traefik then handles:
- Layer-7 routing (by hostname, path, headers)
- TLS termination
- Service discovery
Traefik Redundancy and Failover
Traefik itself runs redundantly across multiple worker nodes. We deploy Traefik with multiple replicas (typically 3+), so if one Traefik pod crashes or a node fails, other Traefik instances immediately take over routing.
This creates a complete HA stack:
- Firewall layer: CARP handles public IP failover (OpenBSD VMs)
- Ingress layer: Traefik replicas handle traffic distribution (Kubernetes)
- Application layer: App pods run with multiple replicas
If a single component fails at any layer, the others absorb the traffic. The firewall doesn’t know (or care) which Traefik pod handles a request—it just forwards to any worker running Traefik.
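A sketch of the ingress-layer redundancy (names, namespace, and image tag are illustrative; in practice this is usually installed via the Traefik Helm chart):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: traefik
  namespace: ingress
spec:
  replicas: 3                  # survive a pod or node failure
  selector:
    matchLabels:
      app: traefik
  template:
    metadata:
      labels:
        app: traefik
    spec:
      # spread replicas across nodes so a single node failure
      # cannot take out all ingress capacity at once
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: traefik
                topologyKey: kubernetes.io/hostname
      containers:
        - name: traefik
          image: traefik:v2.10
          ports:
            - containerPort: 80
            - containerPort: 443
```

The anti-affinity hint matters: three replicas on the same worker would still be a single point of failure from the firewall's perspective.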
Sticky Sessions with Traefik
For stateful applications that need session affinity, Traefik uses sticky session cookies. Services configured in Kubernetes can enable sticky cookies via annotations:
apiVersion: v1
kind: Service
metadata:
  annotations:
    traefik.ingress.kubernetes.io/service.sticky.cookie: "true"
    traefik.ingress.kubernetes.io/service.sticky.cookie.httponly: "true"
    traefik.ingress.kubernetes.io/service.sticky.cookie.samesite: "none"
    traefik.ingress.kubernetes.io/service.sticky.cookie.secure: "true"
  name: my-service
spec:
  ports:
    - port: 80
  selector:
    app: my-service
Traefik sets a cookie with a session ID. Subsequent requests from the same client automatically route to the same backend pod. No application changes needed.
So: pf distributes to workers, Traefik distributes to pods (with session stickiness if needed).
Updating OpenBSD VMs: Trivial
One of the best parts: updates are dead simple.
OpenBSD’s syspatch command patches the OS in ~30 seconds. Reboot takes ~1 minute. When the primary firewall reboots:
- CARP automatically fails over to the backup
- pfsync keeps all connection state in sync
- Existing connections continue without interruption
- New connections route through the backup
Zero downtime. No coordination needed. Just reboot and move on.
Then update the backup. Done.
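The whole cycle, sketched as commands (run on the primary first, then repeated on the backup once CARP has settled):

```sh
syspatch        # fetch and apply binary patches to -stable (~30 s)
reboot          # CARP fails over to the backup for the duration
# once this node is back in the CARP pair, run the same two
# commands on the other firewall
```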
This simplicity is a huge win over cloud load balancers, which often require scheduled maintenance windows or blue-green deployments for updates.
What We Learned
1. State synchronization is non-negotiable for failover
Stateless failover (Keepalived) is fine if your application handles reconnects. Most of ours didn’t. pfsync solved this by keeping state synchronized, so the backup could take over transparently.
2. Dedicated sync networks matter
If pfsync shares bandwidth with data traffic, state sync gets queued and lags. A fast, direct link between firewalls is worth the extra networking.
3. OpenBSD/pf is simple but powerful
pf syntax is clearer than iptables. Configuration lives in one file. Rules are easier to audit and modify. We didn’t need a GUI — pfctl and a text editor were enough.
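In practice that means a handful of pfctl invocations against the default ruleset path:

```sh
pfctl -nf /etc/pf.conf    # syntax-check the ruleset without loading it
pfctl -f /etc/pf.conf     # load the ruleset
pfctl -s rules            # show the active rules
pfctl -s states           # show the state table (what pfsync replicates)
pfctl -vsi                # verbose filter statistics and counters
```

The `-n` dry run is the habit worth building: it catches typos before they become an outage.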
4. CARP works, but requires planning
CARP VIPs work great, but you need to think about:
- Which interface is the VIP on?
- What’s the priority (master vs backup)?
- How fast do you want failover? (advskew tuning)
- Is your sync network fast enough?
Getting this wrong means either failover doesn’t happen, or the master and backup both think they’re the master (split-brain).
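For reference, the knobs that answer those questions, sketched for the backup node (addresses, interface names, and the password are placeholders):

```
# backup: /etc/hostname.carp0 — higher advskew means less preferred,
# so this node only advertises as master when the primary goes quiet
inet 72.X.X.100 255.255.255.0 NONE vhid 1 carpdev em0 pass mysecret advskew 100

# both nodes, /etc/sysctl.conf: a recovered master reclaims the VIP,
# and all CARP interfaces fail over together as a group
net.inet.carp.preempt=1
```

Without preempt enabled, a node that loses only its external interface can keep mastership of the internal VIP — exactly the half-failed state you want to avoid.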
5. Load balancing at the firewall has limits
Round-robin distribution across NodePorts worked fine for HTTP/HTTPS, but it’s not session-aware. Sticky sessions had to happen at the application layer (cookies, JSESSIONID, etc.) or via Kubernetes ingress rules.
The Trade-off
What we gave up:
- Automatic scaling (firewall is fixed capacity)
- Cloud flexibility (bound to on-prem hardware)
- SLA guarantees (no vendor support — we own the operational risk)
- Automatic updates (OpenBSD stable releases, manual patching)
What we gained:
- Full control over traffic rules
- No cloud egress charges
- Stateful failover without dropped connections
- Simplicity (a few small files: pf.conf plus the hostname.if CARP/pfsync configs)
- Cost (free OS, standard hardware)
For a private cluster serving internal users, this trade-off made sense. For a public SaaS platform, cloud load balancers might be worth the cost.
Would We Do It Again?
Absolutely. This setup has been running for years in production. Zero regrets.
CARP + pfsync is the right tool for stateful HA failover when you control the infrastructure. The setup is straightforward, the failover is sub-second, and the operational overhead is minimal. Updating firewalls is easier than updating most managed cloud services.
The trade-offs are favorable:
- ✅ Full control over traffic rules
- ✅ No cloud egress charges
- ✅ Sub-second stateful failover (unbeatable for connection continuity)
- ✅ Simple to operate and update
- ❌ You own it (no vendor support)
- ❌ Bound to your infrastructure provider’s network
For a private cluster where you already own the infrastructure, OpenBSD CARP is the pragmatic choice. It works. It keeps working. Updates are trivial. Failover is transparent.
If you’re considering this, you need comfort with:
- Network architecture (VIPs, multicast, ARP)
- Command-line firewall management (pf syntax is learnable)
- Monitoring for split-brain scenarios (rare but possible)
- Testing failover regularly
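The split-brain check from that list is a one-liner per node (carp0 is the placeholder interface name used throughout): exactly one firewall should report MASTER per vhid.

```sh
ifconfig carp0 | grep 'carp:'
# healthy pair: "carp: MASTER ..." on one node, "carp: BACKUP ..." on the other
# both reporting MASTER usually means CARP advertisements are being
# blocked between the two firewalls
```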
If you have those skills and control your own infrastructure, stop paying for cloud load balancers. CARP is simpler and better for your use case.