DNS Anycast: Using BGP for DNS High-Availability
DNS has a number of mechanisms for redundancy and high availability. More often than not, clients will have a primary and secondary nameserver to talk to. However, if the primary nameserver fails for whatever reason, then queries to the primary usually need to time out before the client attempts queries to the secondary.
Also the speed of general web browsing can often be dictated by how long it takes to receive a valid DNS response to the query. If you are going to multiple sites one after the other, then you are likely to need to wait briefly while DNS does its thing.
To get around this, there is a mechanism known as Anycast. This allows multiple servers to use the same IP, and then routing takes care of which server to go to. This has a couple of notable benefits: -
- Requests to an Anycast IP are not dependent on the availability of a single server
- Requests can be forwarded to the “closest” server with the Anycast IP
The term “closest” means shortest in terms of routing. You might find that the “closest” in terms of how a packet is routed is not physically the closest server with said IP.
Typically though, providers who serve DNS requests (e.g. Google’s 8.8.8.8, CloudFlare’s 1.1.1.1) will have enough presence internationally to place DNS servers close to the users.
BGP
The routing protocol most often used for Anycast (and for routing on the Internet generally) is the Border Gateway Protocol (or BGP). For those who do not know, a routing protocol is used to dynamically advertise and receive routes between neighbouring devices. BGP is one such protocol.
I won’t go into an in-depth discussion about BGP, but if you would like to know more about it, I would refer you to the Beginner’s Guide to Understanding BGP.
Anycast IP?
The Wikipedia definition of Anycast is as such: -
Anycast is a network addressing and routing methodology in which a single destination address has multiple routing paths to two or more endpoint destinations.
Routers will select the desired path on the basis of number of hops, distance, lowest cost, latency measurements or based on the least congested route.
An Anycast IP is no different from any other IP address. They are not allocated from a specific range like multicast (224.0.0.0/4).
What makes an IP anycast is it being configured on multiple servers and using a routing protocol to advertise it. Technically you could also do this with static routes (rather than a routing protocol), but I wouldn’t advise it!
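As a quick illustration of that point, Python's standard ipaddress module can tell whether an address falls inside the multicast block, but there is no equivalent test for Anycast, because nothing in the address itself marks it as such. This is only a sketch; the sample addresses are the ones mentioned in this post, plus 224.0.0.1 as an assumed example from the multicast range.

```python
import ipaddress

# The reserved multicast block mentioned above.
MULTICAST_RANGE = ipaddress.ip_network("224.0.0.0/4")

# Addresses mentioned in this post, plus one multicast example for contrast.
ADDRESSES = ["8.8.8.8", "1.1.1.1", "169.254.0.1", "224.0.0.1"]


def in_multicast_range(address: str) -> bool:
    """Return True if the address sits inside the reserved multicast block."""
    return ipaddress.ip_address(address) in MULTICAST_RANGE


for address in ADDRESSES:
    # Range membership is all that can be tested here. Whether an address is
    # anycast depends on how it is routed, not on its value.
    label = "multicast" if in_multicast_range(address) else "not multicast"
    print(f"{address}: {label}")
```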
How does it work?
To demonstrate Anycast, I’m going to go through a lab with: -
- Two nameservers, one running BIND9, the other running Unbound
- Two client machines, configured to use the Anycast IP for DNS requests
- Two VyOS routers acting as a gateway to the client machines, and BGP peers to the nameservers
One of the main points to note is that to provide Anycast services, you need to run a routing protocol on the nameservers directly, not just on the routers. Without this, you are reliant on BGP timeouts or interfaces going down to see if a server has gone down.
The diagram below shows the setup: -
Nameserver Preparation
I chose to use BIND9 and Unbound, partly to show that the DNS software running doesn’t matter, but also because I had never used Unbound before. Both servers are running Debian Buster.
Install DNS Software
To install BIND9 in Debian, run sudo apt-get install bind9. After this is done, BIND9 should be running already: -
I also tend to install dnsutils to give access to dig and other useful tools.
I have configured the following options for BIND, to ensure it responds to DNS requests for hosts not on its local subnet. This is configured in /etc/bind/named.conf.options: -
To check if this works, run dig yetiops.net @127.0.0.1: -
To install Unbound instead, run sudo apt-get install unbound. Again, it should start straight away once installed: -
The configuration for Unbound, using multiple forwarders, looks like the below: -
This configuration was taken from this Unbound DNS Tutorial.
Again, testing should give a similar result: -
The reason for doing the tests to 169.254.0.1 is that Unbound appears to respond on the physical interface IP, rather than the interface the query was received upon. I shall do a follow up on Unbound when I have used it more, but for now this serves the purpose that we need.
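The dig checks in this section can also be scripted. The following is a minimal sketch, using only the Python standard library, of what such a check sends on the wire: it builds an A-record query by hand and reports whether a server replies without an error code. The function names, the fixed query ID and the timeout are assumptions for illustration; dig remains the more practical tool.

```python
import socket
import struct

QUERY_ID = 0x1234  # arbitrary ID, only used to match the reply to the query


def build_query(name: str, query_id: int = QUERY_ID) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label prefixed by its length, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 (A record), QCLASS 1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def resolver_answers(server: str, name: str, timeout: float = 2.0) -> bool:
    """Return True if the server sends a matching reply with RCODE 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Covers timeouts and "port unreachable" style errors.
            return False
    if len(reply) < 12:
        return False
    reply_id, flags = struct.unpack("!HH", reply[:4])
    is_response = bool(flags & 0x8000)
    rcode = flags & 0x000F
    return reply_id == QUERY_ID and is_response and rcode == 0


# Example (needs a reachable resolver, so it is left commented out):
# print(resolver_answers("127.0.0.1", "yetiops.net"))
```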
Network interface configuration
The network interface configuration on Debian will require a “loopback” interface. Rather than applying the Anycast IP directly to a physical interface, it is applied to a logical interface instead (the loopback).
This has benefits: you can use multiple physical interfaces as links to multiple routers while advertising the same Anycast IP (rather than tying it to one physical interface). It also means that you only need a host route (i.e. a /32 IP address), which cuts down on your IP address usage. If you are using private address space, this probably isn’t much of a concern, but public IPv4 addresses are scarce (IPv6 is another matter entirely, but most clients still talk IPv4).
To apply this configuration on a Debian machine, you will need to add it into /etc/network/interfaces like so: -
The above is the configuration on ns-01. The configuration on ns-02 will be the same, except that the IP address of eth2 would be 10.21.2.3/31.
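To make the addressing concrete, here is a small sketch using Python's ipaddress module. It shows that a /32 holds exactly one address, and works out the other end of each /31 point-to-point link. The ns-01 address (10.21.2.1/31) is taken from the routing output discussed later in this post; that the router sits on the remaining address of each /31 is an assumption, and the helper name is made up for illustration.

```python
import ipaddress


def p2p_peer(interface_cidr: str) -> str:
    """Return the other address on a /31 point-to-point link."""
    iface = ipaddress.ip_interface(interface_cidr)
    if iface.network.prefixlen != 31:
        raise ValueError("expected a /31 interface address")
    # A /31 holds exactly two addresses; the peer is the one we are not using.
    for addr in iface.network:
        if addr != iface.ip:
            return str(addr)
    raise ValueError("no peer address found")


# The Anycast IP is a host route: one address, nothing else in the prefix.
anycast = ipaddress.ip_network("169.254.0.1/32")
print(anycast, "holds", anycast.num_addresses, "address")

# eth2 on each nameserver, and the presumed router address on the same link.
for name, cidr in [("ns-01", "10.21.2.1/31"), ("ns-02", "10.21.2.3/31")]:
    print(name, cidr, "peer:", p2p_peer(cidr))
```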
FRR
FRR, or Free Range Routing, is a notable fork of Quagga that provides a number of routing protocols (and other useful network protocols, like VRRP and LDP) on Linux. It also has the vtysh shell package, which allows you to configure, verify and monitor using very Cisco-like syntax.
To install on Debian or Ubuntu (or other Debian-like distributions), go to the FRR Debian Repository page. For other systems, please see the FRR documentation.
Once installed, the only changes I make are to enable the BGP daemon, and to add my user to the frr and frrvty groups. This allows me to administer FRR without requiring escalated privileges.
To enable the BGP daemon, open up /etc/frr/daemons, find the line which says bgpd=no, and change it to bgpd=yes. After reloading (systemctl reload frr), the BGP daemon should be available: -
A couple of error messages appear, but this is because BGP is not already running when the reload is performed. After this, future reloads shouldn’t show the same.
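If you would rather script the edit to /etc/frr/daemons than make it by hand, the change amounts to a single line substitution. The sketch below only transforms text; the function name and the shortened sample content are assumptions, and writing the result back to the real file (which needs root) is left as a comment.

```python
import re


def enable_daemon(daemons_text: str, daemon: str = "bgpd") -> str:
    """Return the file content with '<daemon>=no' switched to '<daemon>=yes'."""
    # Match the whole line only, so similarly named daemons are left untouched.
    pattern = re.compile(rf"^{re.escape(daemon)}=no[ \t]*$", flags=re.MULTILINE)
    return pattern.sub(f"{daemon}=yes", daemons_text)


# Shortened sample of the file, for illustration only.
sample = "bgpd=no\nospfd=no\n"
print(enable_daemon(sample))

# Applying it to the real file would look something like this (run as root):
# from pathlib import Path
# path = Path("/etc/frr/daemons")
# path.write_text(enable_daemon(path.read_text()))
```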
After running sudo usermod -aG frr $MY-USER and sudo usermod -aG frrvty $MY-USER, I should now be able to access the vtysh shell and start BGP: -
No BGP neighbours were found, but none have been configured, so this is expected behaviour.
Nameserver Routing Protocol Configuration
To set up BGP between the Nameservers and the VyOS routers, you’ll need to choose some Autonomous System numbers (ASNs). The private ranges (i.e. those that anyone can use, and should never be seen on the public internet) are 64512-65534 (for 2-byte ASNs) and 4200000000-4294967294 (for 4-byte ASNs). I’m going to use both, to show that none of this is dependent on the type used.
- ns-01 - BGP ASN 64520
- ns-02 - BGP ASN 64530
- VyOS Routers - BGP ASN 4290001234
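As a sanity check on the numbers above, the sketch below tests each chosen ASN against the two private ranges and reports whether it needs the 4-byte encoding. The function names are made up for illustration.

```python
# Private-use ASN ranges quoted above: (first, last), both inclusive.
PRIVATE_ASN_RANGES = (
    (64512, 65534),            # 2-byte private range
    (4200000000, 4294967294),  # 4-byte private range
)

LAB_ASNS = {"ns-01": 64520, "ns-02": 64530, "VyOS routers": 4290001234}


def is_private_asn(asn: int) -> bool:
    """Return True if the ASN falls inside one of the private ranges."""
    return any(first <= asn <= last for first, last in PRIVATE_ASN_RANGES)


def needs_four_bytes(asn: int) -> bool:
    """Return True if the ASN does not fit into 16 bits."""
    return asn > 65535


for device, asn in LAB_ASNS.items():
    size = "4-byte" if needs_four_bytes(asn) else "2-byte"
    print(f"{device}: {asn} ({size}, private: {is_private_asn(asn)})")
```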
FRR
The following configuration will be applied via vtysh: -
ns-01
ns-02
For anyone who has configured a Cisco router, switch or similar, the syntax should be very familiar.
The main thing to notice is the network 169.254.0.1/32 statement. The same statement is configured on both Nameservers, because they are going to advertise the same IP (the Anycast IP). The network statement imports the route into BGP, and allows it to be advertised out to its peers.
VyOS BGP Configuration
VyOS configuration looks like a mixture of Juniper’s JunOS and Cisco’s IOS. It can look a little odd if you are heavily in either of the Cisco or Juniper camps, but it doesn’t take too long to get used to.
vyos-01
vyos-02
The configuration does not apply until you commit it (like JunOS and Cisco IOS-XR), and also if you do not save it, it will not be there on reboot.
The network statements are to ensure that the Nameservers know about the IP ranges of the clients.
Verification
Check Routing
After this, we should be able to see the Anycast IP appear in the routing tables of both VyOS routers: -
vyos-01
vyos-02
The last line on each route shows where it was received from. For vyos-01, this was received from 10.21.2.1 (the physical IP of ns-01). For vyos-02, this was received from 10.21.2.3 (the physical IP of ns-02).
This is the basis of Anycast: the same IP being advertised from multiple origins.
Test a DNS query
Testing DNS from the clients should show responses: -
client-01
client-02
Interestingly, we get different responses based upon whether we are hitting BIND (ns-01) or Unbound (ns-02). However, they are using different forwarders, which would explain it.
How to prove that traffic is going to ns-01 or ns-02? tcpdump of course!
ns-01
ns-02
So as we can see, client-01 (which is in the 192.168.2.10 subnet) is getting a response from ns-01, whereas client-02 is getting a response from ns-02. The destination address of the requests is 169.254.0.1, but vyos-01 and vyos-02 have different routes for the IP address, therefore they arrive on different servers.
What if one server goes away?
We have already seen that DNS queries are being routed to the closest nameserver. In our scenario, this means that queries travel from the Client, to its connected router, and then to the nameserver connected to the same router.
What happens if, say, the BGP peering to ns-01 failed, or the server itself failed? Let’s see!
ns-01
Now let’s check the routing tables on vyos-01: -
vyos-01
Now vyos-01 thinks that 169.254.0.1 is available via vyos-02. Let’s run another packet capture on ns-02, and see if DNS queries from client-01 and client-02 reach it: -
ns-02
Success! We no longer have to wait for DNS queries to time out against the first nameserver the client attempts; instead, they are routed to the next closest server.
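The behaviour seen on vyos-01 can be modelled with a heavily simplified version of BGP best-path selection. The sketch below compares only two of the real decision steps (shorter AS path first, then eBGP-learned over iBGP-learned) and ignores everything else, such as Local Preference and MED. It assumes the two VyOS routers peer with each other over iBGP, which follows from them sharing ASN 4290001234; the class and function names are made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Path:
    learned_from: str         # neighbour that advertised the route
    as_path: Tuple[int, ...]  # ASNs the advertisement has passed through
    ebgp: bool                # True if learned over eBGP, False for iBGP


def best_path(paths: Iterable[Path]) -> Optional[Path]:
    """Pick a path: shortest AS path wins, then eBGP beats iBGP."""
    candidates = list(paths)
    if not candidates:
        return None  # the prefix is unreachable
    return min(candidates, key=lambda p: (len(p.as_path), 0 if p.ebgp else 1))


# What vyos-01 knows about 169.254.0.1/32 while both nameservers are up.
via_ns01 = Path("ns-01", (64520,), ebgp=True)
via_vyos02 = Path("vyos-02", (64530,), ebgp=False)

# Both AS paths have length 1, so the eBGP path from ns-01 wins the tie.
print(best_path([via_ns01, via_vyos02]).learned_from)  # ns-01

# Once the peering to ns-01 drops, only the path via vyos-02 remains.
print(best_path([via_vyos02]).learned_from)  # vyos-02
```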
What happens if the DNS software stops working?
Rather than shutting down the server, this time we will just take down BIND on ns-01: -
ns-01
Let’s test from client-01 and client-02: -
client-01
Well that isn’t good.
client-02
client-02 still works though. Why is this?
FRR is a routing daemon, and is used to provide routing updates from servers (or Linux-based network hardware). It does not track the state of the applications running on the server, or whether they are healthy. This isn’t a limitation of FRR, but merely what FRR is designed to do (or where you would typically use it).
If you are using FRR to provide connectivity to a machine over several Layer 3 links (rather than using LACP/bonded interfaces), FRR would shine here. It also can be used to provide unnumbered neighbour relationships, but this is a topic for another day.
How do we track the DNS software?
One of the best examples of a routing daemon that can also react to the application state is ExaBGP, written by Exa Networks.
ExaBGP periodically runs a script and checks its output. This script could be a BASH one-liner, or it could be a full application that checks an API for responses, or anything in between.
It has an inbuilt healthcheck tool (useful for BASH one-liners) or you can check the results of STDOUT on running some form of script.
ExaBGP is written in Python, and can be installed using pip: -
First, I create a script to check the DNS response from the local server: -
We are checking the exit status of the command, and if it is anything other than 0, then we withdraw the route. If the command succeeds (i.e. an exit status of 0), then we announce the route.
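For comparison, the same logic can be sketched in Python. This is not the script used in this lab, only an illustration of the idea: run dig against the local server, look at its exit status, and print an announce or withdraw line for ExaBGP to pick up from STDOUT. The check interval, the dig options, the use of next-hop self and the --loop flag are all assumptions.

```python
import subprocess
import sys
import time

ANYCAST_ROUTE = "169.254.0.1/32"  # the Anycast IP used in this lab
CHECK_NAME = "yetiops.net"        # name used for the dig tests earlier
CHECK_SERVER = "127.0.0.1"        # assumption: query the local DNS server
INTERVAL_SECONDS = 5              # assumption: how often to re-check


def dns_is_healthy() -> bool:
    """Run dig once and report whether it exited with status 0."""
    try:
        result = subprocess.run(
            ["dig", "+time=1", "+tries=1", CHECK_NAME, f"@{CHECK_SERVER}"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # dig is not installed or could not be started.
        return False
    return result.returncode == 0


def exabgp_command(healthy: bool) -> str:
    """Build the line ExaBGP reads from the script's STDOUT."""
    action = "announce" if healthy else "withdraw"
    return f"{action} route {ANYCAST_ROUTE} next-hop self"


def main() -> None:
    """Check DNS forever, telling ExaBGP what to do after each check."""
    while True:
        sys.stdout.write(exabgp_command(dns_is_healthy()) + "\n")
        sys.stdout.flush()  # ExaBGP only sees the line once it is flushed
        time.sleep(INTERVAL_SECONDS)


# The endless loop only starts when asked for with a (hypothetical) flag.
if __name__ == "__main__" and "--loop" in sys.argv[1:]:
    main()
```

Re-sending the same announce or withdraw line on every check keeps the script stateless, at the cost of some repeated messages to ExaBGP.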
The ExaBGP configuration looks like the below: -
So we are running a BGP peering session to 10.21.2.1 (i.e. vyos-01), and then running a process. The process in question is the script created previously; ExaBGP takes the results from it and turns them into BGP messages.
In this case, we are doing simple route announcement and withdrawal (with a next-hop set). However, you could also add other parameters like Local Preference or MED (Multi-Exit Discriminator), extend the AS-Path, or apply BGP Communities. All of this is beyond the scope of this article (I’ll probably do a bit of a BGP deep dive in a future post).
To ensure ExaBGP runs as a service, the following systemd unit file was created: -
So now let’s follow the same process as before.
Verification
Check routing
vyos-01
vyos-02
Packet captures
ns-01
ns-02
Taking down BIND
Let’s take down BIND, and see if the routing changes at all: -
ns-01
vyos-01
Oh! It changed. Let’s see what ExaBGP had to say: -
ns-01
And let’s prove it with a packet capture: -
ns-02
There we go, both clients made it!
Summary
There is a lot to process here, especially if you are new to BGP and Anycast. The main things to take away from it though are: -
- Anycast is just an IP that exists in multiple places
- It is not from a reserved range or anything similar
- Using a routing daemon (e.g. FRR) directly on a server is preferable to make it work
- Failover at a basic level can be achieved quite easily (i.e. server failure)
- To track application state, you need to look at something like ExaBGP
Hopefully this will help in understanding, and getting people to play with Anycast more. It can be used for just about anything you want to make highly available. UDP applications work best (due to their connectionless nature), but it is quite possible to use this for TCP. I have seen ExaBGP used to make a RabbitMQ cluster anycast, rather than using DNS or other forms of service discovery.