Prometheus: Discover services with DNS
In a previous post I covered how to use Consul for service discovery, allowing Prometheus to automatically discover what services to monitor.
There are some cases where either setting up Consul (or similar) is not viable, or adds complexity that is not required. If you are already running your own DNS nameservers, you could make use of DNS SRV records.
Common DNS record types
The most common DNS records are A, AAAA and PTR. An A record is a simple “name to IPv4” mapping, e.g. one.one.one.one would become 1.1.1.1. An AAAA record is the same, except for IPv6.
$ dig A one.one.one.one
; <<>> DiG 9.14.7 <<>> A one.one.one.one
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34576
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4000
; COOKIE: 19b10b10c5d45d10 (echoed)
;; QUESTION SECTION:
;one.one.one.one. IN A
;; ANSWER SECTION:
one.one.one.one. 176 IN A 1.1.1.1
one.one.one.one. 176 IN A 1.0.0.1
;; Query time: 8 msec
$ dig AAAA one.one.one.one
; <<>> DiG 9.14.7 <<>> AAAA one.one.one.one
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12686
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4000
; COOKIE: a91fb8e973aa5e78 (echoed)
;; QUESTION SECTION:
;one.one.one.one. IN AAAA
;; ANSWER SECTION:
one.one.one.one. 299 IN AAAA 2606:4700:4700::1111
one.one.one.one. 299 IN AAAA 2606:4700:4700::1001
;; Query time: 24 msec
A PTR record, or Pointer, is what provides reverse DNS. When you see IPs translated to a hostname (for example, in a traceroute), it is PTR records that are providing this. Some tools, like host, automatically translate the IP address into the correct format for PTR records: -
$ host 1.1.1.1
1.1.1.1.in-addr.arpa domain name pointer one.one.one.one.
However, other tools do not when given a bare IP address: -
$ dig PTR 1.1.1.1
; <<>> DiG 9.14.7 <<>> PTR 1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 40153
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4000
; COOKIE: 4ed0f8ba650f2734 (echoed)
;; QUESTION SECTION:
;1.1.1.1. IN PTR
;; AUTHORITY SECTION:
. 773 IN SOA a.root-servers.net. nstld.verisign-grs.com. 2019121000 1800 900 604800 86400
;; Query time: 1 msec
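To use dig to check a PTR record, you need to supply the IP address with its octets in reverse order, followed by .in-addr.arpa (1.1.1.1 happens to read the same in both directions). Alternatively, dig -x 1.1.1.1 builds this name for you. The format is as follows: -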
1.1.1.1 -> 1.1.1.1.in-addr.arpa
$ dig PTR 1.1.1.1.in-addr.arpa
; <<>> DiG 9.14.7 <<>> PTR 1.1.1.1.in-addr.arpa
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 19361
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4000
; COOKIE: 2d9b763cb41a84ed (echoed)
;; QUESTION SECTION:
;1.1.1.1.in-addr.arpa. IN PTR
;; ANSWER SECTION:
1.1.1.1.in-addr.arpa. 248 IN PTR one.one.one.one.
;; Query time: 1 msec
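The same is true for IPv6 records, except the format is much longer. The address is written out in full, and every hexadecimal digit is listed in reverse order, separated by dots, under ip6.arpa: -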
2606:4700:4700::1111 -> 1.1.1.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.7.4.0.0.7.4.6.0.6.2.ip6.arpa
$ dig PTR 1.1.1.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.7.4.0.0.7.4.6.0.6.2.ip6.arpa
; <<>> DiG 9.14.7 <<>> PTR 1.1.1.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.7.4.0.0.7.4.6.0.6.2.ip6.arpa
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2723
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4000
; COOKIE: 5762756121316ea0 (echoed)
;; QUESTION SECTION:
;1.1.1.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.7.4.0.0.7.4.6.0.6.2.ip6.arpa. IN PTR
;; ANSWER SECTION:
1.1.1.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.7.4.0.0.7.4.6.0.6.2.ip6.arpa. 165 IN PTR one.one.one.one.
;; Query time: 1 msec
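If you would rather not build these names by hand, Python's standard ipaddress module can generate them. This is only a small sketch to illustrate the format; the helper name ptr_name is made up for this example: -

```python
import ipaddress


def ptr_name(ip: str) -> str:
    """Return the reverse-DNS (PTR) query name for an IPv4 or IPv6 address."""
    # reverse_pointer reverses the octets (IPv4) or hex digits (IPv6) and
    # appends in-addr.arpa or ip6.arpa respectively.
    return ipaddress.ip_address(ip).reverse_pointer


print(ptr_name("1.1.1.1"))
print(ptr_name("2606:4700:4700::1111"))
```

The resulting names can be passed straight to dig PTR.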
What is an SRV record?
Rather than just being a mapping from a hostname to an IP (e.g. A or AAAA), or the reverse (PTR), an SRV record describes where a service can be reached. The record name encodes the service and the protocol (TCP/UDP), and the record data holds a priority, a weight, a port and a target hostname. Common uses include SIP and Active Directory Domain Controller discovery.
If you try to join a Windows Domain with just the domain name (e.g. example.com), the Domain Controllers are found through a DNS SRV record under example.com that lists them: -
$ dig SRV _ldap._tcp.dc._msdcs.example.com
; <<>> DiG 9.14.7 <<>> SRV _ldap._tcp.dc._msdcs.example.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 54352
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 6
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4000
; COOKIE: 00d2c28406648fbf (echoed)
;; QUESTION SECTION:
;_ldap._tcp.dc._msdcs.example.com. IN SRV
;; ANSWER SECTION:
_ldap._tcp.dc._msdcs.example.com. 600 IN SRV 0 100 389 dc-01.example.com.
_ldap._tcp.dc._msdcs.example.com. 600 IN SRV 0 100 389 dc-02.example.com.
_ldap._tcp.dc._msdcs.example.com. 600 IN SRV 0 100 389 dc-03.example.com.
;; ADDITIONAL SECTION:
dc-01.example.com. 3600 IN A 192.168.20.1
dc-02.example.com. 3600 IN A 192.168.20.2
dc-03.example.com. 3600 IN A 192.168.20.3
It is worth noting that SRV records point to A/AAAA records (see the ADDITIONAL SECTION), so they must be set up too.
What the above gives you is the protocol, the port and the hostname to reach Active Directory.
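To make the fields explicit: the four values after SRV in each answer line are the priority, weight, port and target, in that order. The sketch below splits one line of dig output into those fields. It is illustrative only, assumes the line layout shown above, and the SrvRecord and parse_srv_line names are invented for this example: -

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    name: str      # e.g. _ldap._tcp.dc._msdcs.example.com.
    ttl: int
    priority: int  # lower values are tried first
    weight: int    # relative share among records of equal priority
    port: int
    target: str    # hostname that must resolve via A/AAAA


def parse_srv_line(line: str) -> SrvRecord:
    """Parse one ANSWER SECTION line of `dig SRV` output."""
    name, ttl, rr_class, rr_type, priority, weight, port, target = line.split()
    if rr_class != "IN" or rr_type != "SRV":
        raise ValueError(f"not an SRV record: {line!r}")
    return SrvRecord(name, int(ttl), int(priority), int(weight), int(port), target)


record = parse_srv_line(
    "_ldap._tcp.dc._msdcs.example.com. 600 IN SRV 0 100 389 dc-01.example.com."
)
print(record.target, record.port)
```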
How can Prometheus use this?
To monitor a host, Prometheus requires the IP/hostname, port and protocol. This is exactly what an SRV record exposes, so it can be leveraged for service discovery. The exact implementation is documented in the dns_sd_config section of the Prometheus configuration documentation.
Example: ETCD
ETCD (a distributed key/value store) can discover the other members of its cluster using DNS SRV records (documentation here: https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/clustering.md). We can use the same SRV records to monitor the ETCD instances too.
An example of an ETCD SRV record is below: -
$ dig SRV _etcd-client-ssl._tcp.staging.example.com.
; <<>> DiG 9.14.7 <<>> SRV _etcd-client-ssl._tcp.staging.example.com.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20086
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 2
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4000
; COOKIE: db9a9582f9740d45 (echoed)
;; QUESTION SECTION:
;_etcd-client-ssl._tcp.staging.example.com. IN SRV
;; ANSWER SECTION:
_etcd-client-ssl._tcp.staging.example.com. 204 IN SRV 10 50 2379 etcd-10-11-99-42.staging.example.com.
_etcd-client-ssl._tcp.staging.example.com. 204 IN SRV 10 50 2379 etcd-10-11-160-216.staging.example.com.
_etcd-client-ssl._tcp.staging.example.com. 204 IN SRV 10 50 2379 etcd-10-11-164-63.staging.example.com.
_etcd-client-ssl._tcp.staging.example.com. 204 IN SRV 10 50 2379 etcd-10-11-46-92.staging.example.com.
_etcd-client-ssl._tcp.staging.example.com. 204 IN SRV 10 50 2379 etcd-10-11-97-104.staging.example.com.
;; ADDITIONAL SECTION:
etcd-10-11-99-42.staging.example.com. 4 IN A 10.11.99.42
;; Query time: 26 msec
To make use of this within Prometheus, you need to format the scrape configuration like so: -
- job_name: 'etcd-scrape'
  scheme: https
  dns_sd_configs:
    - names:
        - '_etcd-client-ssl._tcp.staging.example.com.'
  tls_config:
    ca_file: etcd-certs/ca.pem
    cert_file: etcd-certs/client.pem
    key_file: etcd-certs/client-key.pem
The only part you need for DNS discovery is the dns_sd_configs section. The rest allows you to speak HTTPS to the ETCD API. The discovered hosts will then appear as targets in Prometheus.
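Conceptually, each SRV answer becomes one target whose address is the record's target hostname joined with its port. The sketch below mimics that mapping for two of the records above. It is an approximation of the behaviour rather than Prometheus's actual code, and srv_to_address is a hypothetical helper: -

```python
def srv_to_address(target: str, port: int) -> str:
    """Build a host:port scrape address from an SRV target and port."""
    # Drop the trailing dot of the fully qualified name for readability.
    return f"{target.rstrip('.')}:{port}"


answers = [
    ("etcd-10-11-99-42.staging.example.com.", 2379),
    ("etcd-10-11-160-216.staging.example.com.", 2379),
]

for target, port in answers:
    # With scheme: https and the default /metrics path, the scrape URL is:
    print(f"https://{srv_to_address(target, port)}/metrics")
```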
How to update the SRV record?
It all depends on your use case. In some cases, this may be a manual process. It can also be done by the systems themselves (Active Directory being a good example). Alternatively, use whatever automation method you feel is appropriate.
For example, I built a small Golang utility for ETCD that will scrape AWS tags and, for instances that have the correct etcd-cluster tag, update the SRV record for that cluster. This has the advantage that all cluster nodes can run the utility, rather than relying on one node to make the updates.
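The core of such a utility is small. The sketch below is not the Golang utility itself, only a Python illustration of the idea: given a list of instances and their tags (hard-coded here; a real tool would fetch them from the AWS API and push the result to the DNS provider), it picks those with a matching etcd-cluster tag and produces the SRV record values for that cluster. The data layout, the function name and the priority/weight/port defaults are assumptions, chosen to match the record shown earlier: -

```python
from typing import Dict, List


def srv_values_for_cluster(
    instances: List[Dict],
    cluster: str,
    priority: int = 10,
    weight: int = 50,
    port: int = 2379,
) -> List[str]:
    """Return SRV record values ("priority weight port target") for one cluster."""
    values = []
    for instance in instances:
        # Only instances tagged for this cluster belong in the record.
        if instance.get("tags", {}).get("etcd-cluster") == cluster:
            values.append(f"{priority} {weight} {port} {instance['hostname']}.")
    # Sort so that every node running the utility computes the same record set.
    return sorted(values)


instances = [
    {"hostname": "etcd-10-11-99-42.staging.example.com", "tags": {"etcd-cluster": "staging"}},
    {"hostname": "web-10-11-1-5.staging.example.com", "tags": {}},
]
print(srv_values_for_cluster(instances, "staging"))
```

Because the output is deterministic for a given set of tagged instances, it does not matter which node writes the record last.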
Summary
My personal preference for service discovery is definitely Consul. However, if you already have DNS records that are being created (e.g. by ETCD or Active Directory), or the additional complexity of Consul will not provide enough benefit, DNS service discovery could be the way to go.
2019-12-10 12:13 +0000