Ansible for Networking - Part 2: The Lab environment
This is the second part in my ongoing series on using Ansible for Networking, showing how to use Ansible to configure and manage equipment from multiple networking vendors.
You can view the other posts in the series below: -
- Part 1 - Start of the series
- Part 3 - Cisco IOS
- Part 4 - Juniper JunOS
- Part 5 - Arista EOS
- Part 6 - MikroTik RouterOS
- Part 7 - VyOS
In the “Start of the series” post, I mentioned that the lab would consist of: -
- The KVM hypervisor running on Linux
- A virtual machine, running CentOS 8, that will run: -
- FRR - Acting as a route server
- Syslog
- Tacplus (for TACACS+ integration)
- Two routers/virtual machines of each vendor, one running as an “edge” router, one running as an “internal” router
- A control machine that Ansible will run from, over a management network to all machines
This post goes through the Hypervisor, setting up the CentOS 8 virtual machine, and the control machine.
The Hypervisor
The Hypervisor in this scenario is KVM, running on my Manjaro-based laptop. Rather than trying to run this on the KVM machines in my home network, using my laptop allowed me to make changes to the environment without impacting the services on the network in my house. The reason for Manjaro is simply that I like their i3wm implementation.
As KVM is baked into the Linux kernel, just about every distribution of Linux can support it.
Networking
For networking, I run three bridge interfaces: -
- virbr0 - The default NAT bridge that is installed as part of KVM (which allows access out to the internet)
- virbr1 - An isolated network (i.e. one which allows traffic between VMs) that serves as a management network (named network)
- virbr2 - Another isolated network, over which VLANs will be passed between VMs (named vlan-bridge)
Rather than creating separate bridges for each separate network/subnet in use, I decided that a common bridge with VLANs tagged across it would be far easier to manage. Also, some of the virtual machine images are limited in how many “physical” interfaces (i.e. virtual NICs) they can support.
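For reference, an isolated network like vlan-bridge can be defined in libvirt with a small piece of XML. The below is a minimal sketch (names match my setup); the omission of a <forward> element is what makes the network isolated: -

<!-- Minimal sketch: no <forward> element means no NAT/routing to the outside -->
<network>
  <name>vlan-bridge</name>
  <bridge name='virbr2' stp='on' delay='0'/>
</network>

This can then be loaded with virsh net-define vlan-bridge.xml, started with virsh net-start vlan-bridge, and set to start on boot with virsh net-autostart vlan-bridge.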
Anything else?
Other than the two extra network bridges, the KVM setup is largely default. I tend towards using virtio drivers where the virtual machine will support them (some networking OSs recommend the E1000 Intel NIC emulation instead), and all hard disk images are stored in /var/lib/libvirt/images (as per the default KVM setup).
Virtual Machine Configuration
All the virtual machines have at least two network interfaces. Each machine has an interface connected to the Management network, and also an interface connected to the VLAN bridge. VLANs are carried across the vlan-bridge using 802.1q-based VLAN tagging. To check that your kernel supports this, run modprobe 8021q. If no errors are returned, you can pass VLANs without an issue.
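As an aside, once the 8021q module is loaded, a tagged subinterface can be created by hand with iproute2. The NIC name below is hypothetical; the VLAN and address come from the Cisco ranges used later in this post: -

# Create VLAN 101 on top of eth0 (hypothetical interface name)
ip link add link eth0 name eth0.101 type vlan id 101
# Address it from the Cisco "edge" range and bring it up
ip addr add 10.100.101.254/24 dev eth0.101
ip link set dev eth0.101 up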
Using IDs for the lab
To make the lab easy to work with and troubleshoot, I am using an “ID” for each vendor. This ID will be used to form the VLANs, IP addressing and Autonomous System. This is different from the virtual machine IDs, which are generated by the host operating system.
This means that when I need to look at any issues in the lab (say, not seeing certain routes), I know which virtual machine to look at.
The ID system looks something like the below: -
| Vendor        | ID  | Edge VLAN | Internal VLAN |
|---------------|-----|-----------|---------------|
| Cisco IOS     | 01  | 101       | 201           |
| Juniper JunOS | 02  | 102       | 202           |
| Arista EOS    | 03  | 103       | 203           |
| etc           | etc | etc       | etc           |
Further to this, the IP addressing and Autonomous System numbers would be: -
- Vendor: Cisco
- ID: 01
- IPv4 Subnet on VLAN101: 10.100.101.0/24
- IPv4 Subnet on VLAN201: 10.100.201.0/24
- IPv6 Subnet on VLAN101: 2001:db8:101::/64
- IPv6 Subnet on VLAN201: 2001:db8:201::/64
- IPv4 Loopback Addresses: 192.0.2.101/32 and 192.0.2.201/32
- IPv6 Loopback Addresses: 2001:db8:901:beef::1/128 and 2001:db8:901:beef::2/128
- BGP Autonomous System Number: AS65101
Each is explained a bit further below, but using this system does make verification and troubleshooting much easier.
VLAN scheme
The VLAN scheme is defined as follows: -
- 1$ID (e.g. 101, 102, 103) - Connectivity from the edge router to the CentOS Virtual Machine
- 2$ID (e.g. 201, 202, 203) - Connectivity between the edge router and the internal router
IP addressing scheme
IPv4 addressing is defined as follows: -
- 10.100.1$ID.0/24 (e.g. 10.100.101.0/24, 10.100.102.0/24) - Connectivity from the edge router to the CentOS Virtual Machine
- 10.100.2$ID.0/24 (e.g. 10.100.201.0/24, 10.100.202.0/24) - Connectivity between the edge router and the internal router
- 10.15.30.0/24 - The management network; each machine gets an IP in this network
- 192.0.2.0/24 - The loopback range, which will be used for Router IDs and iBGP connectivity
IPv6 addressing is defined as follows: -
- 2001:db8:1$ID::0/64 (e.g. 2001:db8:101::1/64, 2001:db8:101::10/64) - Connectivity from the edge router to the CentOS Virtual Machine
- 2001:db8:2$ID::0/64 (e.g. 2001:db8:201::1/64, 2001:db8:201::10/64) - Connectivity between the edge router and the internal router
- 2001:db8:9NN:beef::/64 - The loopback range, which will be used for Router IDs and iBGP connectivity
No management range has been assigned for IPv6 in this lab.
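To show how mechanical the scheme is, the below (purely illustrative, not part of the lab tooling) derives everything from a vendor ID with simple string concatenation: -

ID=01                                        # Cisco, per the table above
echo "Edge VLAN:     1${ID}"                 # 101
echo "Internal VLAN: 2${ID}"                 # 201
echo "Edge IPv4:     10.100.1${ID}.0/24"     # 10.100.101.0/24
echo "Internal IPv4: 10.100.2${ID}.0/24"     # 10.100.201.0/24
echo "Edge IPv6:     2001:db8:1${ID}::/64"   # 2001:db8:101::/64
echo "BGP ASN:       AS651${ID}"             # AS65101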
The CentOS 8 Virtual Machine
In my current role, nearly all of our Linux estate runs on Debian (apart from some Amazon EC2s that run Amazon Linux). Previously, most of my professional Linux experience has been with RHEL and/or CentOS.
Since starting my current role, CentOS 8 has been released. I decided to use this series to familiarise myself with the changes from CentOS 7 to CentOS 8.
The CentOS 8 Virtual Machine, which from now on will be referred to as netsvr-01 or netsvr, is (almost) entirely managed by Ansible. This includes the configuration for FRR (for routing), tac_plus (for TACACS+ integration), syslog-ng (for logging purposes), as well as managing firewalld and any package dependencies.
Install and user configuration
Installing the operating system was done manually, rather than using something like PXE, Vagrant, or Packer. I currently do not run a PXE server at home, and I have not used Vagrant or Packer with KVM previously. This is something I’ll look at in a future post, but for the purposes of this series, it doesn’t add any benefits.
The initial user setup (i.e. adding the Ansible user) was also not automated. This is because I wanted to avoid the cyclical dependency of needing a user with sufficient privileges that Ansible would use, to allow Ansible to create users. There are ways around this (potentially using something like cloud-init, or other methods), but for now adding the user myself was sufficient.
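For completeness, the manual bootstrap amounts to something like the below (the username is an example, and the IP is this lab’s netsvr management address, not a prescription): -

# On the freshly-installed VM, as root: create the user Ansible will connect as
useradd ansible
# Allow passwordless sudo, so Ansible can escalate with become
echo 'ansible ALL=(ALL) NOPASSWD: ALL' > /etc/sudoers.d/ansible
chmod 0440 /etc/sudoers.d/ansible

# From the control machine: install your public key for passwordless SSH
ssh-copy-id ansible@10.15.30.252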
Tooling
Rather than templating configuration files (and making liberal usage of the Ansible copy task), I decided to try and make use of the native CentOS 8 tooling where possible. This includes dnf for package management, firewalld for firewall management, and NetworkManager for network interface management.
Many of these tools also have associated Ansible modules, with excellent documentation.
However in using this approach, I came across some interesting issues and caveats.
What caveats?
Ansible and NetworkManager
Ansible previously used the networkmanager-glib library for interacting with NetworkManager. However this library has been deprecated, and is not included in CentOS 8. Instead, the recommended library is networkmanager-libnm.
As of writing this post, Ansible (v2.9.5) will not interact with NetworkManager unless networkmanager-glib is installed. This dependency issue (and compatibility with networkmanager-libnm) is due to be fixed, and has been merged into the Ansible master branch, but it is currently scheduled for version 2.10.
For now, all network additions and changes on the netsvr machine will be done manually using nmcli. This avoids spending time creating network configuration templates (in Jinja2) that will not be required soon anyway.
In the meantime, I have commented out the NetworkManager-specific sections of my playbooks, and will re-enable them when the support is available.
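For reference, a rough nmcli equivalent of the commented-out task (exact property names can vary slightly between NetworkManager versions) would be: -

# Create the loopback bridge by hand, mirroring the commented-out Ansible task
nmcli connection add type bridge con-name bridge-loopback ifname bridge-lo0 \
  ipv4.method manual ipv4.addresses 192.0.2.1/32
nmcli connection up bridge-loopback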
FRR packages and dependencies
The latest RPM packages for FRR (at the time of writing) have dependencies on libraries that are not present in CentOS 8. This is not entirely surprising, as the latest release was packaged for CentOS 7, rather than CentOS 8. As CentOS 7 is still the most common version of CentOS, and still supported, I expect this is a problem across many other applications too.
Where libraries and dependencies still exist in CentOS 8, I have been able to install CentOS 7 packages without issue (for example, with tac_plus). However with FRR I am relying on the version that is in the CentOS 8 repositories, which is a couple of versions behind the current one.
This is not a major issue, as all the features I require are in this version. If I really do need them, FRR do provide a guide for compiling FRR on CentOS 8.
I believe as time goes on, CentOS 8 (or at least RHEL8-based systems) will become the “standard” version to target (for all CentOS/RHEL-based RPMs/releases), and problems like this will go away.
Anything else?
For the most part, I have not found any other issues in utilising CentOS 8, rather than say Debian Buster (my usual choice) or CentOS 7 (the version with “better” application support currently). For example, dnf is very similar to yum in everyday usage, so managing packages with Ansible doesn’t require big changes conceptually.
FRR Configuration
FRR, or Free Range Routing, is a notable fork of Quagga that provides a number of routing protocols (and other useful network protocols, like VRRP and LDP) on Linux. It also has the vtysh shell package, which allows you to configure, verify and monitor using very Cisco-like syntax. It can be used to turn just about any Linux device into a router, or to allow a server to use dynamic routing.
In this lab, FRR is configured as a route server, and will be set up to allow peering with all the “edge” routers from each vendor.
Now please refer to the above where I said: -
Rather than templating configuration files (and making liberal usage of the Ansible copy task), I decided to try and make use of the native […] tooling where possible

So how am I managing the FRR configuration? With templated configuration files, making use of the template task (i.e. the copy task, but with templated variables)…
Why?!
The Ansible module for configuring BGP in FRR (frr_bgp) covers most common use cases. If you’re setting up standard BGP (IPv4 or IPv6) peering, route reflectors, and all the usual configuration options (e.g. route-map, prefix-list etc.), then all of these use cases are covered. Currently though, it does not support dynamic BGP neighbours.
Traditionally, BGP requires that you configure your peers statically, with the IP address of the specific neighbour (e.g. neighbor 192.168.1.1 remote-as 65001). You can use techniques like route reflection or confederation to reduce the amount of configuration required, but it still requires a known (and therefore static) set of peers to configure.
Recently, a number of vendors have added a feature called dynamic BGP peering. This means that one side can listen for peers, and those that meet certain requirements can form a BGP peering session with it.
Dynamic BGP peering originated because of the recent trend for using BGP in the data centre. Allowing peers to dynamically form means less static configuration. It also allows common configuration across multiple devices, as opposed to different peering configuration based upon where it is installed in the network. Devices can be pre-provisioned with the same configuration, and added to the network with ease, regardless of where they are physically connected.
To configure dynamic peers, you configure either a “prefix” (i.e. a subnet/range of IP addresses that peers could be coming from), an interface (e.g. eth0), or both, as part of a BGP peer group (essentially a set of configuration parameters common across peers). If a device attempts to peer with the “listening” BGP process and comes from either the “prefix” or “interface” specified, then as long as it meets the other configuration parameters, a BGP peering session will be formed.
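In FRR syntax, a dynamic peering configuration looks something like the below (values taken from this lab’s addressing scheme, with an assumed listen limit): -

router bgp 65430
 neighbor cisco peer-group
 neighbor cisco remote-as 65101
 bgp listen range 10.100.101.0/24 peer-group cisco
 bgp listen limit 100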
Admittedly in this lab, only the “edge” router from each vendor will peer with the netsvr machine. FRR will only ever see one device from the prefix range attempt to peer with it. However it does make it easier to add a secondary device (say, to test failover), as the FRR configuration would not require any changes.
Ansible Role
I am using Ansible Roles to configure FRR, with a directory structure as follows: -
$ tree frr
frr
├── defaults
│ └── main.yml
├── files
├── handlers
│ └── main.yml
├── meta
│ └── main.yml
├── README.md
├── tasks
│ └── main.yml
├── templates
│ └── bgpd.conf.j2
├── tests
│ ├── inventory
│ └── test.yml
└── vars
└── main.yml
To create this Ansible role, I used ansible-galaxy init frr. This automatically creates the directory structure, as well as all of the YAML files and the test inventory file.
Tasks
The tasks/main.yml looks like the below: -
# tasks file for frr
- name: NetworkManager libnm
  dnf:
    name: NetworkManager-libnm
    state: present

########################################
# Commenting out until NMCLI is fixed  #
########################################
#- name: Create loopback interface
#  nmcli:
#    type: bridge
#    autoconnect: yes
#    conn_name: bridge-loopback
#    ifname: bridge-lo0
#    ip4: "{{ loopback.ip4 }}"
#    state: present

- name: Install FRR
  dnf:
    name:
      - frr
    state: present

- name: Enable BGP
  lineinfile:
    path: /etc/frr/daemons
    regexp: 'bgpd=no'
    line: 'bgpd=yes'
  register: frr_bgp_daemon

- name: Enable zebra
  lineinfile:
    path: /etc/frr/daemons
    regexp: 'zebra=no'
    line: 'zebra=yes'
  register: frr_zebra_daemon

- name: BGP Config
  template:
    src: bgpd.conf.j2
    dest: /etc/frr/bgpd.conf
    owner: frr
    group: frr
  register: frr_bgp_config

- name: Allow BGP through FirewallD
  firewalld:
    port: 179/tcp
    permanent: yes
    state: enabled
    zone: public

- name: Run FRR
  service:
    name: frr
    state: restarted
    enabled: true
  when: frr_bgp_daemon.changed or frr_zebra_daemon.changed or frr_bgp_config.changed
As noted previously, until the Ansible NetworkManager module works correctly, the nmcli task is commented out. To summarise what is done here: -

- Install NetworkManager-libnm and frr (using dnf)
- Update /etc/frr/daemons to enable the BGP and Zebra daemons
- Generate the /etc/frr/bgpd.conf file for BGP configuration
- Allow BGP through the firewall (using firewalld)
- Restart FRR, if the configuration of either /etc/frr/daemons or /etc/frr/bgpd.conf has changed
  - In a production scenario, reloading would be preferable, but restarting on changes is fine in a lab
The register option creates a variable that is updated with the status of the task. If the task has a status of changed (i.e. the configuration files have been updated), then $variable.changed (e.g. frr_bgp_daemon.changed) evaluates to True.
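As an aside, the more idiomatic Ansible pattern for “restart only on change” is notify plus a handler, which removes the need for the three register variables. A minimal sketch of what that could look like (not how the role is currently written): -

# tasks/main.yml (sketch) - notify the handler whenever the template changes
- name: BGP Config
  template:
    src: bgpd.conf.j2
    dest: /etc/frr/bgpd.conf
  notify: Restart FRR

# handlers/main.yml (sketch) - runs once, at the end of the play, only if notified
- name: Restart FRR
  service:
    name: frr
    state: restarted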
Template
The template that is used to generate the bgpd.conf configuration file looks like the below: -
frr version 7.0
frr defaults traditional
!
hostname netsvr-01
!
!
!
router bgp {{ frr['asn'] }}
bgp log-neighbor-changes
no bgp default ipv4-unicast
{% for group in frr['bgp'] %}
neighbor {{ group }} peer-group
neighbor {{ group }} remote-as {{ frr['bgp'][group]['asn'] }}
{% if 'ipv4' in frr['bgp'][group]['listen_range'] %}
bgp listen range {{ frr['bgp'][group]['listen_range']['ipv4'] }} peer-group {{ group }}
{% endif %}
{% if 'ipv6' in frr['bgp'][group]['listen_range'] %}
bgp listen range {{ frr['bgp'][group]['listen_range']['ipv6'] }} peer-group {{ group }}
{% endif %}
{% endfor %}
address-family ipv4 unicast
{% for group in frr['bgp'] %}
{% if 'ipv4' in frr['bgp'][group]['address_family'] %}
{% if 'unicast' in frr['bgp'][group]['address_family']['ipv4']['safi'] %}
neighbor {{ group }} activate
{% if 'networks' in frr['bgp'][group]['address_family']['ipv4'] %}
{% for network in frr['bgp'][group]['address_family']['ipv4']['networks'] %}
network {{ network }}
{% endfor %}
{% endif %}
{% endif %}
{% endif %}
address-family ipv6 unicast
{% if 'ipv6' in frr['bgp'][group]['address_family'] %}
{% if 'unicast' in frr['bgp'][group]['address_family']['ipv6']['safi'] %}
neighbor {{ group }} activate
{% if 'networks' in frr['bgp'][group]['address_family']['ipv6'] %}
{% for network in frr['bgp'][group]['address_family']['ipv6']['networks'] %}
network {{ network }}
{% endfor %}
{% endif %}
{% endif %}
{% endif %}
{% endfor %}
!
!
line vty
!
For those who haven’t used Jinja2 (or Python, with which Jinja2 shares some syntax) before, this can look a bit opaque, so to summarise each section: -
router bgp {{ frr['asn'] }}
bgp log-neighbor-changes
no bgp default ipv4-unicast
- Start BGP, using the Autonomous System number provided by the frr['asn'] variable
- Log any changes in neighbour states (e.g. neighbour up, neighbour down)
- For any neighbours configured, do not automatically enable IPv4 BGP peering
  - You can activate it on a per-peer or per-group basis instead
Disabling bgp default ipv4-unicast is useful when you run different address families (e.g. l2vpn or evpn), as it stops FRR automatically configuring a standard IPv4 BGP session to every peer (or peer-group) defined.
{% for group in frr['bgp'] %}
neighbor {{ group }} peer-group
neighbor {{ group }} remote-as {{ frr['bgp'][group]['asn'] }}
For all groups specified in the frr['bgp'] variable, create: -

- The peer-group, named after the group (which in this lab would be cisco, juniper or mikrotik, for example)
- The remote-as (i.e. the peer’s autonomous system) for the group
  - This is derived from the frr['bgp'][$THIS-SPECIFIC-GROUP]['asn'] variable (each group will have a different ASN)
{% if 'ipv4' in frr['bgp'][group]['listen_range'] %}
bgp listen range {{ frr['bgp'][group]['listen_range']['ipv4'] }} peer-group {{ group }}
{% endif %}
- If there is an IPv4 section in the group, create a dynamic listening range
- The listening range will be an IPv4 prefix/subnet
{% if 'ipv6' in frr['bgp'][group]['listen_range'] %}
bgp listen range {{ frr['bgp'][group]['listen_range']['ipv6'] }} peer-group {{ group }}
{% endif %}
- As per the above, but for IPv6 (a listening range, but using an IPv6 prefix)
address-family ipv4 unicast
- Enable the IPv4 unicast address family (i.e. standard IPv4 BGP peering)
{% for group in frr['bgp'] %}
{% if 'ipv4' in frr['bgp'][group]['address_family'] %}
{% if 'unicast' in frr['bgp'][group]['address_family']['ipv4']['safi'] %}
neighbor {{ group }} activate
There are three nested levels here (an if statement, inside an if statement, inside a for loop): -
- For loop - for all groups in the frr['bgp'] variable, then…
- First if statement - if IPv4 is defined as part of the group’s address_family variable, then…
- Second if statement - if unicast exists in the safi variable, then…
- Activate the peer group
{% if 'networks' in frr['bgp'][group]['address_family']['ipv4'] %}
{% for network in frr['bgp'][group]['address_family']['ipv4']['networks'] %}
network {{ network }}
{% endfor %}
{% endif %}
{% endif %}
{% endif %}
More nested if statements! The above will only be evaluated if the IPv4 unicast peer group is set to be activated, as otherwise any associated networks would never be advertised.

- If the above evaluates as true, then…
- For all networks listed in the networks variable, create a network statement (i.e. advertise a subnet)
address-family ipv6 unicast
{% if 'ipv6' in frr['bgp'][group]['address_family'] %}
{% if 'unicast' in frr['bgp'][group]['address_family']['ipv6']['safi'] %}
neighbor {{ group }} activate
{% if 'networks' in frr['bgp'][group]['address_family']['ipv6'] %}
{% for network in frr['bgp'][group]['address_family']['ipv6']['networks'] %}
network {{ network }}
{% endfor %}
{% endif %}
{% endif %}
{% endif %}
{% endfor %}
This does the same for IPv6 as the previous statements did for IPv4.
If you are not familiar with Jinja2 syntax, this may look daunting. I would recommend starting with the official Jinja2 documentation and Ansible’s templating guides, and then soon all of the above will start to make sense.
Variables
I referenced the use of multiple variables in the template above, but where do these variables come from? In this case, I am using Ansible host_vars
, which are host specific variables. They can be fined in an INI-style format, or YAML. I prefer YAML for this, as while you have to be careful with spaces and indentation, they are grouped together in a way which makes sense to me.
The variables I have used for the FRR configuration are as follows: -
frr:
  asn: 65430
  bgp:
    cisco:
      asn: 65101
      listen_range:
        ipv4: 10.100.101.0/24
        ipv6: "2001:db8:101::0/64"
      address_family:
        ipv4:
          safi: unicast
          networks:
            - 192.0.2.1/32
        ipv6:
          safi: unicast
          networks:
            - "2001:db8:999:beef::1/128"
In the template above, each section of the variable (i.e. each set of square brackets) refers to the next “level” down in the YAML variables defined above. For example, frr['bgp'][group]['address_family']['ipv4']['networks'] would refer to: -
frr:
  bgp:
    cisco:
      address_family:
        ipv4:
          networks:
            - 192.0.2.1/32
The reason for group not having single quotation marks is that it is derived from the for loop, rather than being a hardcoded string. This allows you to loop through each group, rather than having to add sections of the template that are specific to each vendor/group.
When the BGP configuration template is generated, using the variables provided above, the output looks like so: -
router bgp 65430
bgp log-neighbor-changes
no bgp default ipv4-unicast
neighbor cisco peer-group
neighbor cisco remote-as 65101
bgp listen range 10.100.101.0/24 peer-group cisco
bgp listen range 2001:db8:101::0/64 peer-group cisco
address-family ipv4 unicast
neighbor cisco activate
network 192.0.2.1/32
address-family ipv6 unicast
neighbor cisco activate
network 2001:db8:999:beef::1/128
The indentation could be tidied up, but the above is a fully functioning FRR BGP configuration, as seen below: -
netsvr-01# show running-config
Building configuration...
Current configuration:
!
frr version 7.0
frr defaults traditional
hostname netsvr-01
no ip forwarding
no ipv6 forwarding
!
router bgp 65430
bgp log-neighbor-changes
no bgp default ipv4-unicast
neighbor cisco peer-group
neighbor cisco remote-as 65101
bgp listen range 10.100.101.0/24 peer-group cisco
bgp listen range 2001:db8:101::/64 peer-group cisco
!
address-family ipv4 unicast
network 192.0.2.1/32
neighbor cisco activate
exit-address-family
!
address-family ipv6 unicast
network 2001:db8:999:beef::1/128
neighbor cisco activate
exit-address-family
!
line vty
netsvr-01# show bgp ipv4 summary
IPv4 Unicast Summary:
BGP router identifier 192.168.122.81, local AS number 65430 vrf-id 0
BGP table version 1
RIB entries 1, using 160 bytes of memory
Peers 1, using 21 KiB of memory
Peer groups 1, using 64 bytes of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
*10.100.101.253 4 65101 4 4 0 0 0 00:01:03 0
Total number of neighbors 1
* - dynamic neighbor
1 dynamic neighbor(s), limit 100
syslog-ng Configuration
syslog-ng is a syslog daemon that, in this scenario, will be used for storing logs from each network device. This means you can look at logs from across the network in one place, rather than retrieving them from each device manually.
Ansible role
The role is created with ansible-galaxy init syslog. The directory structure is as follows: -
$ tree syslog
syslog
├── defaults
│ └── main.yml
├── files
│ └── syslog-remote.conf
├── handlers
│ └── main.yml
├── meta
│ └── main.yml
├── README.md
├── tasks
│ └── main.yml
├── templates
├── tests
│ ├── inventory
│ └── test.yml
└── vars
└── main.yml
Tasks
The tasks/main.yml file looks like the below: -
# tasks file for syslog
- name: Remove rsyslog
  dnf:
    name:
      - rsyslog
    state: absent

- name: Install syslog-ng
  dnf:
    name:
      - syslog-ng
    state: present

- name: Remote Syslog
  copy:
    src: syslog-remote.conf
    dest: /etc/syslog-ng/conf.d/syslog-remote.conf
  register: syslog_conf

- name: Remote Syslog directory
  file:
    state: directory
    path: /var/log/remote
    owner: root
    group: root
    mode: 0755

- name: Reload syslog-ng
  service:
    name: syslog-ng
    state: restarted
    enabled: yes
  when: syslog_conf.changed

- name: Allow syslog through FirewallD
  firewalld:
    service: syslog
    permanent: yes
    state: enabled
    zone: public
Steps: -

- Remove rsyslog with dnf (as it conflicts with syslog-ng)
- Install syslog-ng with dnf
- Add the syslog-remote.conf file
- Add the /var/log/remote directory, to store logs from the network devices
- Reload syslog-ng, only if the configuration has changed
- Allow syslog-ng through the firewall with firewalld
We’re not using any templating or providing any extra variables, because the configuration required is static.
Syslog-ng remote configuration
The configuration required to enable remote logging within syslog-ng looks like the below: -
source net { udp(); };
destination remote { file("/var/log/remote/${FULLHOST}" template("${ISODATE} ${HOST}: ${MSGHDR}${MESSAGE}\n") ); };
log { source(net); destination(remote); };
The files will be created as $HOSTNAME or $IP in /var/log/remote, in the format ISODATE HOSTNAME: %SYSLOG-PROGRAM Syslog message.
An example of the output can be seen below: -
$ pwd
/var/log/remote
$ ls
10.100.101.253
$ cat 10.100.101.253
2020-02-23T14:23:21-05:00 10.100.101.253: %CRYPTO-6-ISAKMP_ON_OFF: ISAKMP is OFF
2020-02-23T14:23:21-05:00 10.100.101.253: %CRYPTO-6-GDOI_ON_OFF: GDOI is OFF
2020-02-23T14:23:21-05:00 10.100.101.253: %SYS-6-LOGGINGHOST_STARTSTOP: Logging to host 10.100.101.254 port 514 started - CLI initiated
2020-02-23T14:23:37-05:00 10.100.101.253: %BGP-5-ADJCHANGE: neighbor 10.100.101.254 Up
You could then find out if, for example, multiple BGP peers had dropped, by running grep -i bgp /var/log/remote/* | grep -i down. This would return matches from all the files (which are named based upon the devices) that contain BGP drops.
With tools like the Elastic Stack, Graylog or Splunk, it is now possible to index logs (making them quicker to search, based upon the type of queries used), create dashboards and alerts based upon them, and much more. Even so, running syslog-ng (or another syslog daemon) can still help you gather huge insights into where you may be having issues in your network.
tac_plus Configuration
tac_plus is a daemon that can be used for TACACS+-based authentication and authorization (as an alternative to RADIUS). This allows you to manage your users centrally on a server (such as this one), so that you can log in to any device in the network with your username and password. It can also assign privileges to the user, based upon “privilege” levels. These levels are configured in groups, which users can be added to.
Ansible role
The role is created with ansible-galaxy init tacplus. The directory structure is as follows: -
$ tree tacplus
tacplus
├── defaults
│ └── main.yml
├── files
├── handlers
│ └── main.yml
├── meta
│ └── main.yml
├── README.md
├── tasks
│ └── main.yml
├── templates
│ └── tac_plus.conf.j2
├── tests
│ ├── inventory
│ └── test.yml
└── vars
└── main.yml
Tasks
The tasks/main.yml file looks like the below: -
# tasks file for tacplus
- name: Nux Repo
  yum_repository:
    name: nux-misc
    description: nux-misc
    baseurl: http://li.nux.ro/download/nux/misc/el7/x86_64/
    enabled: 0
    gpgcheck: 1
    gpgkey: http://li.nux.ro/download/nux/RPM-GPG-KEY-nux.ro

- name: Install tcp-wrappers (not in CentOS 8)
  dnf:
    name: 'http://mirror.centos.org/centos/7/os/x86_64/Packages/tcp_wrappers-libs-7.6-77.el7.x86_64.rpm'
    state: present

- name: Install tac_plus
  dnf:
    name: tac_plus
    enablerepo: nux-misc
    state: present

- name: Generate configuration
  template:
    src: tac_plus.conf.j2
    dest: /etc/tac_plus.conf
  register: tac_conf

- name: Restart tac_plus
  service:
    name: tac_plus
    state: restarted
    enabled: yes
  when: tac_conf.changed

- name: Allow tacacs through FirewallD
  firewalld:
    port: 49/tcp
    permanent: yes
    state: enabled
    zone: public
Steps: -
- Add the nux-misc repository to yum (which dnf makes use of)
  - Disabled by default, only used when it is specifically called for
- Install tcp-wrappers (deprecated in CentOS 8), a tac_plus dependency, directly from an RPM file
- Install tac_plus using dnf, enabling the nux-misc repository to do so
- Generate the tac_plus configuration
- Restart tac_plus, if the configuration is changed
- Allow the tacacs port through the firewall, using firewalld
Configuration template
The tac_plus.conf.j2 configuration template looks like the below: -
# Created by Henry-Nicolas Tourneur([email protected])
# See man(5) tac_plus.conf for more details

# Define where to log accounting data, this is the default.
accounting file = /var/log/tac_plus.acct

# This is the key that clients have to use to access Tacacs+
key = {{ tacacs_key }}

group = netwrite {
    default service = permit
    service = exec {
        priv-lvl = 15
    }
}

{% for user in netusers %}
user = {{ user }} {
    member = netwrite
    login = des {{ netusers[user]['tacpwd'] }}
}
{% endfor %}
Compared to the FRR bgpd.conf.j2 file, there are far fewer parts to generate. We supply the tacacs_key, which is used to encrypt the messages between the network device and the tac_plus server. We also supply a list of users, with passwords, to generate this file.
The netwrite group has priv-lvl 15, which is analogous to full admin access on each network device. It is possible to create groups with read-only permissions, or with additional permissions. You could create a group so that certain write commands are allowed (for example, reloading a BGP neighbour or clearing statistics on an interface), but all other commands are restricted.
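As an untested sketch of that idea, a read-only group in tac_plus.conf could look something like the below, denying everything by default and explicitly permitting show commands: -

group = netread {
    # Deny any service/command not explicitly permitted below
    default service = deny
    service = exec {
        priv-lvl = 1
    }
    # Permit all variants of "show ..."
    cmd = show {
        permit .*
    }
}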
The actual list of users is defined in host_vars.
Host Variables
The host_vars specific to tac_plus are: -
tacacs_key: supersecret
netusers:
  yetiops:
    tacpwd: !vault |
      $ANSIBLE_VAULT;1.1;AES256
      66336131323637326166316232623161663630373739613137366266633937306662323363333039
      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxREDACTEDxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxREDACTEDxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxREDACTEDxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      6165
  davethechicken:
    tacpwd: !vault |
      $ANSIBLE_VAULT;1.1;AES256
      19784782343345848123148123094812389452340958230495809234846666642381109434123412
      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxREDACTEDxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxREDACTEDxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxREDACTEDxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      6162
The passwords that tac_plus expects are DES-hashed (yes, not even 3DES!). The easiest way to generate them is by using tac_pwd: -
$ tac_pwd
Password to be encrypted: test
CCVwN31H4K74A
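If tac_pwd is not to hand, openssl passwd -crypt should produce the same style of crypt(3) DES hash (note that the -crypt option has been removed in newer OpenSSL releases): -

# Uses a random salt, so the output differs on each run
openssl passwd -crypt 'test'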
The user passwords are then encrypted in this file using Ansible Vault, which allows you to store sensitive data in version control, as they require an encryption key to unlock.
To encrypt a string, you would use the following: -
$ ansible-vault encrypt_string 's3cr3tp8ss' --name 'pass'
New Vault password:
Confirm New Vault password:
pass: !vault |
$ANSIBLE_VAULT;1.1;AES256
64373235663534646635306363626365376537343137393136623863626332303235386264393237
3435313266336633633430646462393138353331633734340a356265336366663030313338393965
31643738383461616465626435376265333739663031366636353865373938663236653262396366
3833346263653436380a333936633363303038646333613832313564316566313534373537396433
3366
Encryption successful
You can then copy and paste this secret into your host_vars.
You can store your encryption keys in local files (and reference them with ansible-playbook --vault-password-file /path/to/vault-keys), so that Ansible does not need to ask for them when you run your playbooks.
Alternatively, you can make Ansible ask you for the encryption key, meaning you can then store the encryption key in whatever password management system you choose. To do this, see below: -
$ ansible-playbook centos.yaml --ask-vault-pass
Vault password:
PLAY [centos] *************************************************
TASK [Gathering Facts] ****************************************
ok: [10.15.30.252]
[...]
Without this, your Playbook run will fail, as it will not be able to decrypt your keys.
Generated configuration file
The configuration file for tac_plus, when generated from the template, looks like the below: -
# Created by Henry-Nicolas Tourneur([email protected])
# See man(5) tac_plus.conf for more details

# Define where to log accounting data, this is the default.
accounting file = /var/log/tac_plus.acct

# This is the key that clients have to use to access Tacacs+
key = supersecret

group = netwrite {
    default service = permit
    service = exec {
        priv-lvl = 15
    }
}

user = yetiops {
    member = netwrite
    login = des ###REDACTED###
}

user = davethechicken {
    member = netwrite
    login = des ###REDACTED###
}
With this, you can then configure TACACS+-based authentication on your network, and then login to your network devices with the users defined in this file.
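As a taster of the device side (covered properly later in the series), the Cisco IOS equivalent would be something like the below; the server address here assumes the netsvr management IP, and the key must match the tacacs_key above: -

aaa new-model
! Define the TACACS+ server (10.15.30.252 assumed as netsvr's management IP)
tacacs server netsvr
 address ipv4 10.15.30.252
 key supersecret
! Authenticate and authorize against TACACS+, falling back to local users
aaa authentication login default group tacacs+ local
aaa authorization exec default group tacacs+ local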
Top-level playbook
The playbook that includes all of the roles, as well as defining what hosts it will run on, is in the directory level below the roles: -
$ tree -L 1
.
├── ansible.cfg
├── ansible.log
├── centos.yaml
├── epel <- Role directory
├── frr <- Role directory
├── host_vars
├── inventory
├── README.md
├── syslog <- Role directory
└── tacplus <- Role directory
5 directories, 5 files
The contents of the centos.yaml playbook are: -
- hosts: centos
  become: true
  become_method: sudo
  tasks:
    - import_role:
        name: epel
    - import_role:
        name: frr
    - import_role:
        name: syslog
    - import_role:
        name: tacplus
There is an additional role here (epel), but all this does is install the epel-release package (Extra Packages for Enterprise Linux).
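As a sketch, the entire tasks/main.yml for the epel role could be as small as: -

# tasks file for epel (sketch, assuming nothing beyond the release package)
- name: Install EPEL release package
  dnf:
    name: epel-release
    state: present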
When you run this playbook, each role will be imported and run in order (so epel, frr, syslog, then tacplus). It will also, by default, pick up the host_vars/$IP_ADDRESS.yaml file for host-specific variables (ensure that $IP_ADDRESS is replaced with the IP or hostname you have defined in your Ansible inventory).
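For reference, the inventory file itself can be as simple as the below (the IP matches the netsvr host seen in the playbook run earlier): -

[centos]
10.15.30.252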
Other files
I also have a few settings in the ansible.cfg file: -
[defaults]
inventory = ./inventory
timeout = 5
log_path = ./ansible.log
The above specifies my inventory file as ./inventory, adds a timeout (more useful for network devices, but I’m keeping it for consistency), and creates a log file of every playbook run. This makes it easier to debug, or to go back and look at where changes were made that potentially broke a playbook run.
Ansible control machine
As Ansible can run from just about anywhere, the choice of how you invoke your playbooks is down to personal preference.
In a production scenario, you would usually have a machine (or machines) with access to your devices, and run your playbooks from there. This means that a team of people can make changes and run them from the same place (rather than playbooks going out of sync on people’s workstations). Alternatively, you can run something like Ansible Tower or AWX (the upstream project that Ansible Tower builds upon) to manage your infrastructure.
In this scenario, as it is a lab environment, I am running all of the playbooks from the same laptop that is running the lab. I use passwordless SSH where it is supported (not every network vendor does support this), and I maintain all my playbooks in a Git repository (that I will make public during the series).
Summary
My approach to setting up the lab environment has been to make use of the native tools available (either on my laptop, or the virtual machines themselves), while also trying to keep things as simple as possible. Thanks to taking this approach, and because it is all managed using Ansible Playbooks, I can easily recreate this setup on other machines.
The next few posts in the series will get into managing actual network devices themselves. I also have a few bonus posts to make during this series, thanks to the generosity and help from the readers of this site.
Hopefully this will help get you up and running with your own lab, so you can test these kinds of scenarios yourself!