Continuous Network Integration using Jenkins


Introduction

The DevOps revolution brought many interesting tools to software engineers, and one such tool is Jenkins. Jenkins is a Continuous Integration/Continuous Delivery (CI/CD) build system. CI/CD basically means release often, release small: push small changes frequently so that if something breaks, it is easy to roll back. While software engineers have been using CI/CD for a while, not much has changed for network engineers, who still make changes to network devices manually. When a change is required, such as the OSPF cost on a set of devices, we make it by hand or through a script.

In this article, I will demonstrate how Jenkins can be used to deploy changes to an entire Clos network in a datacenter.

Workflow

What does this workflow look like? At a high level, it has six stages:

1. The configs for the entire Clos network are maintained in a single YAML file.

2. This YAML file is version controlled in a Source Control Management (SCM) system such as GitHub or SVN.

3. Every commit to the SCM system triggers Jenkins to clone the repo into its workspace and start a build. In the software world a build means compilation, but here it means generating configs.

4. Once the configs are generated, run some tests (pre-checks) and ensure they all pass.

5. Deploy the configs to all devices.

6. Run post-checks.
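To make the ordering concrete, here is a minimal Python sketch of stages 3 to 6 as a sequence that stops at the first failing stage. That behavior is what keeps a failed pre-check from ever reaching the deploy stage. The stage names and the always-successful lambdas are placeholders, not code from this demo.

```python
from typing import Callable, List, Tuple

# Each stage is a (name, function) pair; the function returns True on success.
Stage = Tuple[str, Callable[[], bool]]

def run_pipeline(stages: List[Stage]) -> str:
    """Run stages in order and stop at the first failure, as a CI job would."""
    for name, step in stages:
        if not step():
            return f"FAILURE at {name}"  # later stages (e.g. deploy) never run
    return "SUCCESS"

# Placeholder stages standing in for the real build steps.
stages = [
    ("generate-configs", lambda: True),
    ("pre-checks", lambda: True),
    ("deploy", lambda: True),
    ("post-checks", lambda: True),
]
print(run_pipeline(stages))  # SUCCESS
```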

Here's something more visual.

This is quite high-level; I will go through each stage later. But first, let's look at our sample Clos network and a few things worth mentioning about it.

I am following the Zen of Python here: "Simple is better than complex." That said, by changing a couple of attributes in the YAML file, this setup can scale to a large number of devices in the Clos network.

The Clos network is built using OSPF (as the IGP to learn loopbacks) and full-mesh iBGP between all devices. Within the Clos network, port allocation is always static. For example, interface Gi1/0 on a leaf device is always used to connect to Spine1. Southbound interfaces will vary.
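Static port allocation means a leaf's uplink interfaces can be computed instead of stored. The sketch below assumes the pattern continues sequentially (Gi1/1 to Spine2, and so on). Only Gi1/0 to Spine1 comes from the text, and the function names are hypothetical.

```python
from typing import Dict

def leaf_uplink(spine_number: int) -> str:
    """Return the leaf interface that faces Spine<spine_number>.

    Gi1/0 -> Spine1 is from the article; numbering the remaining spines
    sequentially (Gi1/1 -> Spine2, ...) is an assumption for illustration.
    """
    if spine_number < 1:
        raise ValueError("spine numbers start at 1")
    return f"GigabitEthernet1/{spine_number - 1}"

def leaf_uplinks(num_spines: int) -> Dict[str, str]:
    """Map every spine name to the fixed leaf interface connected to it."""
    return {f"Spine{n}": leaf_uplink(n) for n in range(1, num_spines + 1)}

print(leaf_uplinks(2))
# {'Spine1': 'GigabitEthernet1/0', 'Spine2': 'GigabitEthernet1/1'}
```

Because every leaf uses the same mapping, adding a spine only changes `num_spines`, not any per-device data.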

Configs in YAML file

As you can see, the YAML file is extremely simple. It defines the parameters needed to build the configs for this Clos network. The "globals" section contains parameters common to all devices. The "pods" section has a single pod here, but multiple pods can exist, each with its own set of leafs and spines.
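Once loaded, a file shaped like this is just nested dictionaries. Below is a hypothetical example of what `yaml.safe_load()` might return for it, along with a small function that expands the pods into one record per device. The key names, ASN, counts and hostname format are assumptions for illustration, not the demo's actual schema.

```python
# Hypothetical result of yaml.safe_load() on a file with "globals" and "pods".
site = {
    "globals": {"ospf_area": 0, "bgp_asn": 65000, "domain": "dc1.example.net"},
    "pods": {
        "pod1": {"spines": 2, "leafs": 2},
    },
}

def expand_devices(site: dict) -> list:
    """Flatten the site definition into one record per device."""
    devices = []
    for pod_name, pod in site["pods"].items():
        for role, count in (("spine", pod["spines"]), ("leaf", pod["leafs"])):
            for i in range(1, count + 1):
                devices.append({
                    "hostname": f"{pod_name}-{role}{i}",
                    "role": role,
                    "pod": pod_name,
                    **site["globals"],  # every device inherits the global parameters
                })
    return devices

for device in expand_devices(site):
    print(device["hostname"], device["role"])
```

Scaling out then means editing counts or adding another pod entry, which matches the claim that a couple of attribute changes grow the network.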

All these parameters are rendered through a template, and a config is generated for each device in this "site". The template is written in Jinja2 and rendered by a generation script. As I am using (virtual) Cisco devices, the template contains Cisco IOS configuration.
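As a rough illustration of the rendering step, the sketch below fills a tiny IOS snippet with per-device values. It deliberately swaps Jinja2 for Python's standard-library `string.Template` so it runs with no third-party packages. The variable names and the snippet itself are hypothetical.

```python
from string import Template

# Stand-in for the Jinja2 template: flat $variable substitution only.
IOS_TEMPLATE = Template(
    "hostname $hostname\n"
    "interface Loopback0\n"
    " ip address $loopback 255.255.255.255\n"
    "router ospf 1\n"
    " router-id $loopback\n"
    " network $loopback 0.0.0.0 area $ospf_area\n"
)

def render(device: dict) -> str:
    """Render one device's config; substitute() raises KeyError if a value is missing."""
    return IOS_TEMPLATE.substitute(device)

print(render({"hostname": "pod1-leaf1", "loopback": "10.0.0.3", "ospf_area": 0}))
```

`string.Template` only does flat substitution. A real Clos config needs loops over interfaces and BGP neighbors, which is the reason to use Jinja2 (`{% for %}` blocks) for the actual template.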

GitHub

In this case, I am using GitHub as the repository to version control the YAML file, so any changes committed to this repo are tracked by Jenkins. Jenkins polls the repo for changes every minute (a configurable value), but you can instead use webhooks so that GitHub notifies Jenkins whenever the repo changes.
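Polling boils down to comparing the branch head with the commit that was last built. This minimal sketch assumes a hypothetical `fetch_head_sha` callable; in practice it might wrap `git ls-remote <repo-url> <branch>`. A webhook inverts the flow, with GitHub calling Jenkins instead of Jenkins asking GitHub.

```python
from typing import Callable, Optional

def poll_once(fetch_head_sha: Callable[[], str], last_built: Optional[str]) -> Optional[str]:
    """Return the new commit SHA if the branch head moved since the last build, else None."""
    head = fetch_head_sha()
    return head if head != last_built else None

last_built = None
for head in ["a1b2c3", "a1b2c3", "d4e5f6"]:  # simulated branch heads seen on three polls
    new = poll_once(lambda: head, last_built)
    if new:
        print(f"change detected at {new}: trigger build")
        last_built = new
    else:
        print("no changes")
```

In Jenkins itself, this is the "Poll SCM" build trigger, whose schedule field uses cron syntax: `* * * * *` polls every minute, and `0 2 * * *` polls once a day at 02:00.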

Let's add a top-of-rack switch (TOR1), with uplinks to the Leaf1 and Leaf2 switches, to the Clos network. This is how the change looks in the YAML file when committed to the repo.

Under BGP, I have specified the "name" of the neighbor, the leaf nodes it connects to, the interfaces on each leaf node, the LAG ID to use on the leaf nodes and the BGP ASN of the neighbor. This is enough information to build the leaf node configs for this neighbor.
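To show why those fields are sufficient, here is a sketch that turns such an entry into leaf-side IOS lines: a port-channel, its member interfaces and an eBGP neighbor statement. The dictionary layout, interface names, ASNs and the neighbor address (from the 192.0.2.0/24 documentation range) are all hypothetical. The Port-channel IP addressing is left out because it comes from the allocation step.

```python
# Hypothetical shape of the TOR1 entry; the demo's actual field names may differ.
tor1 = {
    "name": "TOR1",
    "asn": 65101,
    "lag_id": 1,
    "leafs": {
        "Leaf1": ["GigabitEthernet0/5", "GigabitEthernet0/6"],
        "Leaf2": ["GigabitEthernet0/5", "GigabitEthernet0/6"],
    },
}

def leaf_config_for_neighbor(neighbor: dict, leaf: str, neighbor_ip: str, local_asn: int) -> list:
    """Return the IOS lines one leaf needs for a southbound LAG plus eBGP neighbor."""
    lag = neighbor["lag_id"]
    lines = [f"interface Port-channel{lag}", f" description {neighbor['name']}"]
    for intf in neighbor["leafs"][leaf]:
        # Bundle each member link into the LAG using LACP.
        lines += [f"interface {intf}", f" channel-group {lag} mode active"]
    lines += [
        f"router bgp {local_asn}",
        f" neighbor {neighbor_ip} remote-as {neighbor['asn']}",
        f" neighbor {neighbor_ip} description {neighbor['name']}",
    ]
    return lines

print("\n".join(leaf_config_for_neighbor(tor1, "Leaf1", "192.0.2.1", 65000)))
```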

Jenkins detects changes

As I mentioned earlier, Jenkins polls this GitHub repo to detect any commits or merges. I have set the polling interval to 1 minute, but polling can be scheduled at any interval, or even once a day at a specific time.

This is how the polling log looks in the Jenkins UI. Notice that I have highlighted where Jenkins detected the changes.

This triggered a build, build #64 of the Jenkins job Clos in this case. First, Jenkins clones the remote repository into its workspace.

Generate configs

For this job, the first build step generates configs for the entire Clos network. I added it as an "Execute shell" build step that runs the Python script to generate the configs. The console output in Jenkins looks like this.
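An "Execute shell" step fails when its command exits with a non-zero status. That means the generation script only needs to translate errors into an exit code for Jenkins to stop the job. A minimal sketch of that wrapper follows; `run_build_step` and `generate_all` are hypothetical names, not the demo's script.

```python
import sys
from typing import Callable

def run_build_step(generate: Callable[[], int]) -> int:
    """Run config generation and turn the outcome into a shell exit status.

    A non-zero status fails the "Execute shell" step, which fails the build
    before the pre-check and deploy steps run.
    """
    try:
        count = generate()
    except Exception as exc:  # bad YAML, missing key, template error, ...
        print(f"config generation failed: {exc!r}", file=sys.stderr)
        return 1
    print(f"generated configs for {count} devices")
    return 0

def generate_all() -> int:
    """Hypothetical placeholder: load YAML, allocate IPs, render templates, write files."""
    return 0

# In the real script the entry point would end with:
#     sys.exit(run_build_step(generate_all))
```

The shell step itself then only has to invoke the script with Python; everything about success or failure is carried by the exit status.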

This step allocates IP addresses and stores them in MongoDB, then assigns them to the Loopback, OOB and interconnect interfaces. If there are southbound neighbors, their IP addressing is done here too.
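The allocation logic can be sketched with Python's `ipaddress` module. It hands out /32 loopbacks from one pool and /31 point-to-point subnets from another, keyed so that asking twice returns the same answer. A plain dict stands in for MongoDB here, OOB addressing is omitted, and the pool ranges and class name are assumptions. Only a persistent store like MongoDB would keep allocations stable across builds; this in-memory version does not.

```python
import ipaddress
from typing import Dict

class Allocator:
    """Allocate loopback /32s and point-to-point /31s from fixed pools."""

    def __init__(self, loopback_pool: str, p2p_pool: str) -> None:
        self._loopbacks = ipaddress.ip_network(loopback_pool).hosts()
        self._links = ipaddress.ip_network(p2p_pool).subnets(new_prefix=31)
        self.db: dict = {}  # in-memory stand-in for the MongoDB collection

    def loopback(self, hostname: str) -> str:
        key = ("loopback", hostname)
        if key not in self.db:  # asking twice returns the same address
            self.db[key] = f"{next(self._loopbacks)}/32"
        return self.db[key]

    def link(self, a: str, b: str) -> Dict[str, str]:
        """Return {device: address} for the /31 between devices a and b."""
        ends = sorted((a, b))  # order-independent key for the link
        key = ("link", *ends)
        if key not in self.db:
            low, high = list(next(self._links))  # a /31 holds exactly two addresses
            self.db[key] = {ends[0]: f"{low}/31", ends[1]: f"{high}/31"}
        return self.db[key]

alloc = Allocator("10.0.0.0/24", "10.1.0.0/24")
print(alloc.loopback("Spine1"), alloc.link("Spine1", "Leaf1"))
```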

Pre-checks

Once the configs are generated, a few checks are made to ensure the network is healthy: for example, OSPF adjacencies are in FULL state, BGP neighbors are up and the config was generated successfully. Many more checks could be added, but these are enough for now.
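Checks like these usually come down to parsing CLI output. The sketch below works on hand-written sample output in the usual IOS layout of `show ip ospf neighbor` and `show ip bgp summary`. The sample text and function names are illustrative, and collecting the output from the devices is left out. The BGP check relies on the last column of `show ip bgp summary`: it shows a prefix count for an established session and a state name such as Idle or Active otherwise.

```python
import ipaddress

# Hand-written sample output in the usual IOS layout (illustrative only).
OSPF_SAMPLE = """\
Neighbor ID     Pri   State           Dead Time   Address         Interface
10.0.0.1          0   FULL/  -        00:00:33    10.1.0.0        GigabitEthernet1/0
10.0.0.2          0   INIT/  -        00:00:38    10.1.0.2        GigabitEthernet1/1
"""

BGP_SAMPLE = """\
BGP router identifier 10.0.0.3, local AS number 65000
Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.0.0.1        4        65000      25      24        5    0    0 00:20:11        3
10.0.0.2        4        65000       0       0        1    0    0 never    Active
"""

def neighbor_rows(output: str):
    """Yield the fields of each line whose first column is an IPv4 address."""
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        try:
            ipaddress.IPv4Address(fields[0])
        except ValueError:
            continue  # header or summary line
        yield fields

def ospf_all_full(output: str) -> bool:
    rows = list(neighbor_rows(output))
    # No neighbors at all also counts as a failure.
    return bool(rows) and all(f[2].startswith("FULL") for f in rows)

def bgp_all_established(output: str) -> bool:
    rows = list(neighbor_rows(output))
    return bool(rows) and all(f[-1].isdigit() for f in rows)

print("OSPF all FULL:", ospf_all_full(OSPF_SAMPLE))
print("BGP all established:", bgp_all_established(BGP_SAMPLE))
```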

The console output in Jenkins looks like this. Jenkins also has a very nice report for these tests, which I will show later. Notice how it shows the number of tests run in this build step.

Deploy Configs

This is itself a multi-stage build step:

1. Drain traffic from the device before replacing its running-config.

2. Get a diff between the running-config and the newly generated config. Since I am using IOS's config archive feature, I can run the "show archive config differences <running-config> <generated-config>" command to get a contextual diff from the device.

Notice that the changes I made in the YAML file are reflected in the diff: LAG 1, interfaces 5 and 6, and the BGP neighbor.

3. Replace the running-config using the "configure replace <generated-config> force revert trigger error" command. This replaces the running-config with the generated config, applying the changes seen in the diff. If the command encounters an error, it rolls back automatically.

Not much to report here: the script run by Jenkins just logs into the device and executes the command.

4. Restore traffic to the device.

5. Repeat steps 1-4 for every device in the Clos network.
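The per-device loop above can be sketched as follows. The two IOS commands are the ones quoted in the steps, but everything else is an assumption. `send` is a hypothetical helper that runs a command on a device and returns its output. `drain` and `restore` are left abstract because the text does not say how traffic is moved. `cfg_url` assumes the generated config has already been copied to the device.

```python
from typing import Callable, Iterable, List

def deploy(devices: Iterable[str],
           send: Callable[[str, str], str],
           drain: Callable[[str], None],
           restore: Callable[[str], None],
           cfg_url: str = "flash:generated.cfg") -> List[str]:
    """Roll the generated config out one device at a time (steps 1-4, repeated)."""
    done = []
    for device in devices:
        drain(device)  # step 1: move traffic away (mechanism not specified)
        try:
            # Step 2: contextual diff between running and generated config.
            diff = send(device, f"show archive config differences system:running-config {cfg_url}")
            print(f"--- {device} ---\n{diff}")
            # Step 3: replace, rolling back automatically if a command errors.
            send(device, f"configure replace {cfg_url} force revert trigger error")
        finally:
            restore(device)  # step 4: runs even if a command above raised
        done.append(device)
    return done
```

Restoring in a `finally` block puts traffic back on the device even if a command fails. The exception still propagates, though, so the rollout stops before the next device is touched.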

Post-checks

Once all the devices are updated, I run a similar set of checks to ensure the network is in a good state.

If all the steps execute successfully, Jenkins marks the build as SUCCESS.

Test report

Jenkins also provides a nice report of all the tests executed for this job. It gives a good view of how the job is running and what state the network is in.
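The text does not say how the report is produced. One common route is to write results as JUnit-style XML and point Jenkins's "Publish JUnit test result report" post-build action at the file. pytest's `--junitxml` option produces the same format. The sketch below builds such a report by hand; the suite and test names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

def junit_xml(suite: str, results: Dict[str, Optional[str]]) -> str:
    """Build a JUnit-style report; results maps test name -> failure message, or None if it passed."""
    failures = sum(1 for msg in results.values() if msg is not None)
    root = ET.Element("testsuite", name=suite, tests=str(len(results)), failures=str(failures))
    for name, msg in results.items():
        case = ET.SubElement(root, "testcase", classname=suite, name=name)
        if msg is not None:
            ET.SubElement(case, "failure", message=msg)
    return ET.tostring(root, encoding="unicode")

report = junit_xml("pre-checks", {
    "ospf_adjacencies_full": None,
    "bgp_neighbors_up": "Leaf2: 10.0.0.2 is Active",
    "configs_generated": None,
})
print(report)
```

Writing `report` to a file in the workspace and naming that file in the post-build action is enough for Jenkins to track pass and fail counts per build.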

Summary

So there you have it: a true network automation task using Jenkins. The only manual step was updating the YAML file; everything else was automated. Of course, I skipped important checks, such as verifying whether a device is actually taking traffic, but that complexity is not needed for this test.

Similar work by other network engineers:

1. http://networkop.github.io/blog/2016/02/19/network-ci-intro/

2. https://keepingitclassless.net/2015/01/continuous-integration-pipeline-network/

3. http://youtu.be/OWLTBYgPp0A?list=PLO8DR5ZGla8hhQXL_9_IRcw4HA9yu3JEC

All the code used for this demo is available here. If you have any comments, please send them to me here.

Thanks for reading.