Routes and policy: giving the mesh the keys, one team at a time
An authenticated peer that can't reach anything is useless. This part turns the gateway into a routing peer, advertises the internal network, pushes internal DNS, and replaces 'connected = full access' with default-deny, group-based policy.
Real work, anonymized — generic subnets/domains throughout.
After Part 3 a remote user can install the client, authenticate through our SSO with MFA, and join the mesh. And then… reach nothing internal, because their tunnel only connects them to other peers, not to the office network. This part fixes that — and uses the opportunity to fix something the old VPN got wrong: treating “connected” as “allowed everywhere.”
The mental model: a routing peer is a deliberate on-ramp
In a mesh, peers talk to peers. To reach a plain server that isn’t running the client (a database, an internal web app), some peer has to volunteer to forward traffic onto the LAN on everyone’s behalf. That’s a routing peer:
Why one peer advertises the whole subnet
Most internal hosts will never run a mesh client — you don’t install agents on every
appliance and database. So you designate a peer that lives on the internal network (the
gateway VM) as a routing peer and have it advertise the internal subnets. Now any
authenticated peer can reach 10.0.0.0/24 through the gateway, exactly the way OpenVPN
pushed a route to the whole network — except now it’s one explicit, revocable on-ramp you
can reason about, not a property of merely holding a cert.
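To make "the routing peer covers this destination" concrete, here is a minimal bash sketch of the membership check involved: is a destination IP inside one of the subnets the gateway advertises? It assumes IPv4 only. The `ADVERTISED_ROUTES` array and the function names are hypothetical, and only 10.0.0.0/24 comes from this setup. NetBird does this matching internally; this illustrates the idea and is not its code.

```shell
#!/usr/bin/env bash
# Sketch: does a route advertised by the routing peer cover a destination IP?
# Hypothetical route list; only 10.0.0.0/24 comes from this post.
ADVERTISED_ROUTES=("10.0.0.0/24")

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Return 0 if IP $1 falls inside CIDR $2.
in_cidr() {
  local ip net bits mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  # Build the netmask; bash arithmetic is 64-bit, so trim back to 32 bits.
  mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( (ip & mask) == (net & mask) ))
}

# Print the first advertised route covering IP $1 and return 0; else return 1.
covered_by_route() {
  local cidr
  for cidr in "${ADVERTISED_ROUTES[@]}"; do
    if in_cidr "$1" "$cidr"; then
      echo "$cidr"
      return 0
    fi
  done
  return 1
}
```

For example, `covered_by_route 10.0.0.20` prints `10.0.0.0/24` and succeeds. `covered_by_route 192.168.1.5` prints nothing and returns 1, so that traffic is never sent to the routing peer at all.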
With masquerade left on (the default), the routing peer rewrites the source address of forwarded traffic to its own LAN address. Internal hosts therefore just reply to the gateway and need no route back toward mesh clients, which is one less thing the edge firewall has to know about.
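The masquerade point is easier to see as a toy model. The bash sketch below (bash 4+ for the associative array) is not how NetBird implements NAT. It only mimics what source NAT does: rewrite the mesh client's address to the gateway's LAN address on the way in, remember the mapping, and reverse it on the reply. The addresses 100.64.0.10 and 10.0.0.2 and all names are hypothetical.

```shell
#!/usr/bin/env bash
# Toy model of masquerade (source NAT) on the routing peer.
# Hypothetical addresses: mesh client 100.64.0.10, gateway LAN IP 10.0.0.2.
GATEWAY_LAN_IP="10.0.0.2"
declare -A NAT_TABLE   # "gateway-ip:port" -> original mesh source address
SEEN_SRC=""            # the source address the internal host observes

# Forward a connection from mesh source $1 (source port $2) onto the LAN.
# $3 is "on" or "off". With masquerade on, the source is rewritten to the
# gateway's LAN IP and the mapping is remembered. (A real connection-tracking
# table may also rewrite the port; this model keeps it unchanged.)
# Sets a global instead of echoing so the table update is not lost in a subshell.
forward() {
  local src=$1 sport=$2 masquerade=$3
  if [[ $masquerade == on ]]; then
    NAT_TABLE["$GATEWAY_LAN_IP:$sport"]=$src
    SEEN_SRC=$GATEWAY_LAN_IP
  else
    SEEN_SRC=$src
  fi
}

# Describe where the internal host's reply to $1:$2 ends up.
reply_path() {
  local dst=$1 dport=$2
  local orig=${NAT_TABLE["$dst:$dport"]:-}
  if [[ -n $orig ]]; then
    echo "to $dst (on-link), un-NAT'd back to $orig over the tunnel"
  else
    echo "to $dst: the internal network needs a route back to the mesh range"
  fi
}
```

Calling `forward 100.64.0.10 40000 on` sets `SEEN_SRC` to `10.0.0.2`, an address on the internal host's own subnet. With `off`, the host sees `100.64.0.10`, and `reply_path` reports that the LAN would need a route back to the mesh range. That is exactly the configuration masquerade lets you skip.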
Advertise the routes, then push internal DNS
In the dashboard, add the gateway as the routing peer and advertise the internal subnets it
fronts (10.0.0.0/24, plus any others). Then replicate the other thing OpenVPN used to do for
free — internal name resolution:
Why peers need a nameserver group
OpenVPN pushed an internal DNS server so clients could resolve *.int.example.com. The mesh
needs the same: a nameserver group that says “for the int.example.com domain, use the
internal resolver at 10.0.0.20.” Without it, a peer can reach 10.0.0.x by IP but can’t
resolve app.int.example.com — and every internal link and bookmark breaks. Routing and
resolution are two separate jobs; you have to wire both.
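Conceptually, a match domain works as a suffix rule. The domain itself and anything under it go to the internal resolver, and every other name keeps using the peer's normal DNS. Below is a minimal bash sketch of that decision, assuming bash 4+ for lowercase conversion. `resolver_for` and the `system-default` placeholder are hypothetical; the domain and 10.0.0.20 come from above.

```shell
#!/usr/bin/env bash
# Sketch of nameserver-group matching (split DNS). Requires bash 4+.
MATCH_DOMAIN="int.example.com"
INTERNAL_NS="10.0.0.20"

# Print which resolver should answer for name $1.
resolver_for() {
  local name=${1%.}   # drop a trailing root dot, e.g. "app.int.example.com."
  name=${name,,}      # DNS names are case-insensitive
  case $name in
    # The domain itself, or anything ending in ".int.example.com"
    # (the dot boundary keeps "notint.example.com" out).
    "$MATCH_DOMAIN" | *."$MATCH_DOMAIN") echo "$INTERNAL_NS" ;;
    *) echo "system-default" ;;   # hypothetical placeholder for normal DNS
  esac
}
```

`resolver_for app.int.example.com` prints `10.0.0.20`, while `resolver_for notint.example.com` prints `system-default`. If a name fails to resolve on a peer, query the resolver directly with `dig +short @10.0.0.20 app.int.example.com`. That separates a resolver problem from a nameserver group that never reached the peer.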
# from a remote test client, after routes + DNS are pushed:
ssh user@host.int.example.com # reaches an internal host through the routing peer
dig +short app.int.example.com # resolves via the internal nameserver
Both worked from a remote client: internal hosts were reachable through the routing peer,
and *.int.example.com names resolved via the pushed nameserver — the same reach OpenVPN
gave, now arriving over WireGuard.
Default-deny, group-based policy: the actual upgrade
The old VPN’s implicit rule was “if you connected, you can reach everything.” That’s the part I specifically don’t want to recreate. The directory groups we imported into Keycloak (Part 2) flow as claims into NetBird groups, and policy is written against those:
Why default-deny with per-team policies
“Connected = full access” means one compromised laptop sees the whole network. Instead, start from default-deny and open only what each team needs: engineering reaches the engineering subnet and dev hosts; ops reaches the management plane; staff reach the handful of apps they actually use. Because the groups come from the directory via SSO, access is automatically correct as people join, change teams, or leave — the policy follows identity, not a file. This is the practical, unglamorous version of “zero trust”: stop trusting the network, start trusting the authenticated, group-scoped identity.
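Default-deny means evaluation starts from "no," and only an explicit rule can change the answer. The bash sketch below models this with a flat rule list. The group names (`engineering`, `ops`, `staff`), destination names, and ports are hypothetical stand-ins, and real NetBird policies carry more fields than this simplified model.

```shell
#!/usr/bin/env bash
# Sketch of default-deny, group-based policy evaluation.
# Hypothetical rules, each "<source group> <destination group> <port>".
POLICIES=(
  "engineering eng-hosts 22"
  "engineering dev-hosts 443"
  "ops mgmt-plane 22"
  "staff staff-apps 443"
)

# is_allowed "<space-separated peer groups>" <destination group> <port>
# Returns 0 only if some rule matches; with no match, the answer is deny.
is_allowed() {
  local peer_groups=$1 dst=$2 port=$3 rule g src rdst rport
  for rule in "${POLICIES[@]}"; do
    read -r src rdst rport <<< "$rule"
    [[ $rdst == "$dst" && $rport == "$port" ]] || continue
    for g in $peer_groups; do   # word-splitting on the group list is intended
      [[ $g == "$src" ]] && return 0
    done
  done
  return 1   # default deny
}
```

`is_allowed "engineering" eng-hosts 22` succeeds, but `is_allowed "staff" eng-hosts 22` falls through every rule and returns 1. Removing someone from a directory group changes the first argument, not the rule list. That is the "policy follows identity" property in miniature.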
The acceptance bar for this part: a remote peer has the same internal reach OpenVPN gave them — but scoped to their team, and a peer in another group is provably denied the things that aren’t theirs.
Both halves checked out: a peer in the engineering group reached its team’s hosts, and a peer outside that group was cleanly denied the resources that weren’t theirs — default-deny doing its job.
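One way to make the acceptance bar repeatable is a small matrix of expected allow and deny results, run from a test peer in each group. This is a sketch under assumptions. The hostnames and ports are hypothetical, and `probe` assumes an `nc` that supports `-z` and `-w` (OpenBSD netcat does). Swap in any TCP check you trust.

```shell
#!/usr/bin/env bash
# Acceptance-check sketch, run from a test peer. Each stdin row says whether a
# TCP connection from *this* peer should succeed: "allow|deny host port".
# probe() assumes an nc with -z (connect only) and -w (timeout in seconds).
probe() { nc -z -w 3 "$1" "$2" >/dev/null 2>&1; }

# Compare each row's expectation with reality; return non-zero on any mismatch.
check_matrix() {
  local failures=0 expected host port actual
  while read -r expected host port; do
    [[ -z $expected || $expected == \#* ]] && continue   # skip blanks and comments
    if probe "$host" "$port"; then actual=allow; else actual=deny; fi
    if [[ $actual == "$expected" ]]; then
      echo "ok    $expected $host:$port"
    else
      echo "FAIL  expected $expected, got $actual: $host:$port"
      failures=$((failures + 1))
    fi
  done
  (( failures == 0 ))
}
```

Feed it rows on stdin, for example `printf 'allow eng-host.int.example.com 22\ndeny mgmt.int.example.com 22\n' | check_matrix`. Read the results with one caution: a `deny` proves something about policy only if the same host and port answer from a peer in the allowed group. Otherwise, a host that is simply down passes as "denied."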
Honest limits
- The routing peer is a chokepoint. All peer-to-LAN traffic for non-mesh hosts flows through the gateway. That’s fine at our scale, but it’s a capacity and availability consideration I’m noting, not hand-waving. (Traffic between two mesh peers never touches the gateway.)
- Policy is only as good as the group data. This leans entirely on the directory groups being accurate. Part of the payoff of the identity spine is that there’s now one place to get that right.
What’s next
Routes work, DNS works, and access is scoped by team. Everything is in place — but not a single real user has moved yet, and the old VPN is still carrying everyone. Part 5 is the part that takes nerve: cutting over in waves, with OpenVPN live as the safety net the whole time, and the backups-and-rollback discipline that makes “build it alongside” actually true.