Switchdev on Fedora

Switchdev on the Mellanox platform

Switchdev is a Linux kernel project that supports networking ASICs directly in a standard Linux environment.

It provides in-kernel support for the ASIC, so existing tools like ifconfig, ethtool and ip link just work and are used to configure the forwarding hardware.

Mellanox provides a switchdev driver and ships it in a Fedora Remix ISO that can easily be installed.

We will use this platform to take a look at what can be done in this new model.

As in several other examples, we will use ONIE to install the switch OS.

It is important to mention that besides the OS, the forwarding ASIC also runs its own switch firmware, which must be at the correct, current version.

If this is not the case, Mellanox provides images and tools to update the firmware. As this is not native Linux functionality, the tools for this must be provided by the hardware vendor.

The ONIE install is a bit more complex for this system; some additional steps are necessary:

1. We need to extract the ISO image onto the webroot of our installation server.

2. The kickstart file needs to be in a directory named ks and must point to the installation server.
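The two preparation steps can be sketched as a small script. This is a minimal sketch under stated assumptions, not the procedure from the Mellanox wiki: it assumes the ISO is already mounted or unpacked, that the web server serves the given webroot, and that "point to the installation server" means the kickstart `url --url=` install-source command. The function name, paths and address are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the two preparation steps for the install server.
# Assumptions (hypothetical): the ISO is already mounted or unpacked
# (e.g. mount -o loop <image>.iso /mnt/iso), the web server serves $webroot,
# and "point to the installation server" means the kickstart "url" command.
set -euo pipefail

prepare_install_tree() {
    local iso_dir=$1 webroot=$2 ks_template=$3 server_url=$4
    local ks_out ks_tmp

    [ -r "$ks_template" ] || { echo "missing kickstart: $ks_template" >&2; return 1; }

    # Step 1: copy the extracted ISO tree into the web root.
    mkdir -p "$webroot"
    cp -a "$iso_dir/." "$webroot/"

    # Step 2: the kickstart file goes into a directory named "ks" ...
    mkdir -p "$webroot/ks"
    ks_out="$webroot/ks/$(basename "$ks_template")"

    # ... with its install source pointing at this server: drop any existing
    # "url" line and put ours first, in the kickstart command section.
    ks_tmp=$(mktemp)
    {
        printf 'url --url="%s"\n' "$server_url"
        grep -v '^url[[:space:]]' "$ks_template" || true
    } > "$ks_tmp"
    chmod 644 "$ks_tmp"
    mv "$ks_tmp" "$ks_out"
}

# Example (hypothetical paths and address):
# prepare_install_tree /mnt/iso /var/www/html /root/ks.cfg http://192.0.2.10/
```

Writing to a temporary file first keeps the script safe even if the template already lives in the target ks directory.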

After this preparation, the installation ran through without any problems, and we were able to log into the device.


=======================================

[ OK ] Reached target Network.

Starting OpenSSH server daemon...

[ OK ] Reached target Network is Online.

Starting Notify NFS peers of a restart...

Starting Permit User Sessions...

[ OK ] Started Notify NFS peers of a restart.

[ OK ] Started Permit User Sessions.

[ OK ] Started Command Scheduler.

[ OK ] Started Job spooling tools.

Starting Hold until boot process finishes up...

Starting Terminate Plymouth Boot Screen...

[ OK ] Started OpenSSH server daemon.

Generic release 25 (Generic)

Kernel 4.8.15-300.fc25.x86_64 on an x86_64 (ttyS0)


localhost login: root

Password:

Welcome to the Mellanox Development System.

Please refer to https://github.com/Mellanox/mlxsw/wiki for user manual.

This is a Fedora Remix. Unmodified Fedora is available at getfedora.org

[root@localhost ~]#

===========================

Running ifconfig only shows us the management ports at the moment. We need to update the kernel so that its version matches our firmware. Firmware in this case refers to the microcode on the linecard / ASIC itself. From the OS perspective, the management ports are "standard" Ethernet ports running with the standard Linux drivers. The dataplane ports are special Ethernet interfaces that need the "advanced" switchdev driver to work. The switchdev driver talks directly to the hardware-accelerated ports using an API exposed by the linecard firmware. Because this API is called by the kernel, the kernel version and the firmware need to be in sync for the dataplane ports to work.

Here is the state after installing; we see only the management interfaces:

[root@localhost ~]# ifconfig
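To see which ports belong to which side, we can ask sysfs which kernel driver each interface is bound to. A minimal sketch, assuming the dataplane ports report the `mlxsw_spectrum` driver (as `ethtool -i` shows later in this post) and that virtual interfaces have no `device` link; the function names and the `SYSFS_NET` override are hypothetical and only exist to make testing easy.

```shell
#!/usr/bin/env bash
# Sketch: show which kernel driver each network interface is bound to, so the
# management ports can be told apart from the switchdev dataplane ports.
# SYSFS_NET is only an override for testing; it defaults to /sys/class/net.
set -euo pipefail

list_port_drivers() {
    local net=${SYSFS_NET:-/sys/class/net} dev drv
    for dev in "$net"/*; do
        [ -e "$dev" ] || continue
        if [ -e "$dev/device/driver" ]; then
            # device/driver is a symlink whose last path component is the driver name
            drv=$(basename "$(readlink -f "$dev/device/driver")")
        else
            drv=virtual   # lo, bridges and similar have no backing device
        fi
        printf '%s %s\n' "$(basename "$dev")" "$drv"
    done
}

# Dataplane ports are the ones handled by the Spectrum switchdev driver.
dataplane_ports() {
    list_port_drivers | awk '$2 == "mlxsw_spectrum" { print $1 }'
}
```

Before the kernel update, `dataplane_ports` would be expected to print nothing, since only the management interfaces exist.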

To get the updates, we need to configure networking so that the device can reach the Fedora RPM repositories.

If this is not done automatically through DHCP, you need to set the IP address, gateway and DNS servers manually, just as on any other installation of the distribution. The distribution provided for the Mellanox switch is based on Fedora.
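A manual, non-persistent setup with the standard `ip` tool might look like the sketch below. The interface name and all addresses are hypothetical placeholders (from the 192.0.2.0/24 documentation range), and the `DRY_RUN` switch is an addition so the commands can be reviewed before they are executed.

```shell
#!/usr/bin/env bash
# Sketch: manual (non-persistent) management-port setup when DHCP is not used.
# Interface name and addresses are hypothetical placeholders; with DRY_RUN
# set, every step is printed instead of executed.
set -euo pipefail

run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

configure_mgmt() {
    local ifname=$1 cidr=$2 gateway=$3 dns=$4
    local resolv=${RESOLV_CONF:-/etc/resolv.conf}
    run ip link set dev "$ifname" up
    run ip addr add "$cidr" dev "$ifname"
    run ip route replace default via "$gateway" dev "$ifname"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "nameserver $dns -> $resolv"
    else
        printf 'nameserver %s\n' "$dns" > "$resolv"
    fi
}

# Example (hypothetical values):
# configure_mgmt mgmt0 192.0.2.20/24 192.0.2.1 192.0.2.53
```

These settings are lost on reboot, and if NetworkManager manages the interface it may rewrite `/etc/resolv.conf`; for a permanent setup, configure the connection through the distribution's usual network configuration instead.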

To update the kernel (remember, switchdev is a kernel driver), we use dnf, the newer replacement for yum, which works much the same way:


[root@localhost ~]# dnf update kernel
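dnf installs the new kernel next to the running one, so the updated switchdev driver only becomes active once the system boots into it. A small check can tell whether that reboot is still pending. This sketch assumes every entry under `/lib/modules` is an installed kernel version and compares it with `uname -r`; `MODULES_DIR` and `RUNNING_KERNEL` are hypothetical overrides for testing.

```shell
#!/usr/bin/env bash
# Sketch: after "dnf update kernel" the new kernel (and the new mlxsw driver)
# is installed but not running. Compare the running kernel with the newest
# module tree to see whether a reboot is still pending.
# MODULES_DIR and RUNNING_KERNEL are overrides for testing only.
set -euo pipefail

reboot_pending() {
    local modules=${MODULES_DIR:-/lib/modules}
    local running=${RUNNING_KERNEL:-$(uname -r)}
    local newest
    # sort -V orders kernel versions numerically (4.10 after 4.9)
    newest=$(ls -1 "$modules" | sort -V | tail -n 1)
    if [ "$newest" != "$running" ]; then
        echo "running $running, newest installed $newest: reboot needed"
        return 0
    fi
    echo "running the newest installed kernel ($running)"
    return 1
}

# Example: if reboot_pending; then reboot; fi
```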

After a reboot, the new driver is active and we can see all the interfaces. If this were not the case, we would need to update the linecard firmware too. As this is not standard Linux functionality, we would need to use a vendor-specific tool and download the most current firmware from the vendor site. The whole process is described in detail in the Mellanox switchdev wiki.

Here we see the output of ifconfig after things have been updated, and the dataplane interfaces are active:

[root@localhost ~]# ifconfig -a

We can now use standard Linux commands and tools to configure the 100G dataplane ports just like the standard management interfaces. First, let's look at how to get hardware-related information such as temperatures or fan speeds:

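Hardware monitoring (the sensors are exposed through the standard hwmon sysfs interface; in the listing below we are in the hwmon1 directory under /sys/class/hwmon):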

[root@localhost hwmon1]# ls

device fan3_input fan6_input name subsystem temp1_reset_history

fan1_input fan4_input fan7_input power temp1_highest uevent

fan2_input fan5_input fan8_input pwm1 temp1_input

Find the RPM of a fan:

[root@localhost hwmon1]# cat fan1_input

12562

Find the current ASIC temperature (hwmon reports temperatures in millidegrees Celsius, so 34000 means 34.0 °C):

[root@localhost hwmon1]# cat temp1_input

34000
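Reading the files one by one gets tedious, so here is a small sketch that walks all hwmon devices and prints the ASIC readings with units. It assumes the chip's `name` file contains `mlxsw` (lm_sensors builds the `mlxsw-pci-0300` label below from that name), fan inputs in RPM and temperatures in millidegrees Celsius; `HWMON_ROOT` is a hypothetical override for testing.

```shell
#!/usr/bin/env bash
# Sketch: print fan speeds and temperatures of the Mellanox ASIC from hwmon
# sysfs. Fan inputs are in RPM, temperatures in millidegrees Celsius.
# Assumes the chip's "name" file reads "mlxsw"; HWMON_ROOT is a test override.
set -euo pipefail

show_mlxsw_sensors() {
    local root=${HWMON_ROOT:-/sys/class/hwmon} dir f
    for dir in "$root"/hwmon*; do
        [ -r "$dir/name" ] || continue
        [ "$(cat "$dir/name")" = mlxsw ] || continue
        for f in "$dir"/fan*_input; do
            [ -r "$f" ] || continue
            printf '%s: %s RPM\n' "$(basename "$f" _input)" "$(cat "$f")"
        done
        for f in "$dir"/temp*_input; do
            [ -r "$f" ] || continue
            # convert millidegrees to degrees Celsius with one decimal
            awk -v n="$(basename "$f" _input)" '{ printf "%s: %.1f C\n", n, $1 / 1000 }' "$f"
        done
    done
}
```

With the values above, this would print `fan1: 12562 RPM` and `temp1: 34.0 C` among its output.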

Another way is to use the sensors command from the lm_sensors package. As so often in Linux, there are several ways to access the same information:

[root@localhost hwmon1]# sensors

coretemp-isa-0000

Adapter: ISA adapter

Physical id 0: +34.0°C (high = +87.0°C, crit = +105.0°C)

Core 0: +26.0°C (high = +87.0°C, crit = +105.0°C)

Core 1: +34.0°C (high = +87.0°C, crit = +105.0°C)

acpitz-virtual-0

Adapter: Virtual device

temp1: +27.8°C (crit = +106.0°C)

temp2: +29.8°C (crit = +106.0°C)

mlxsw-pci-0300

Adapter: PCI adapter

fan1: 12798 RPM

fan2: 10861 RPM

fan3: 12448 RPM

fan4: 10861 RPM

fan5: 12562 RPM

fan6: 10861 RPM

fan7: 12798 RPM

fan8: 10861 RPM

temp1: +34.0°C (highest = +35.0°C)

We can use ethtool to get port and driver information; again, ethtool works just the way we are used to:

[root@localhost hwmon1]# ethtool -i eth0

driver: mlxsw_spectrum

version: 1.0

firmware-version: 13.1220.130

expansion-rom-version:

bus-info: 0000:03:00.0

supports-statistics: yes

supports-test: no

supports-eeprom-access: no

supports-register-dump: no

supports-priv-flags: no

[root@localhost hwmon1]#
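The `firmware-version` field is the one that matters for the kernel/firmware sync discussed earlier. A minimal sketch for pulling single fields out of `ethtool -i` output and comparing dotted versions with GNU `sort -V`; the function names are hypothetical, and the minimum firmware version has to come from the Mellanox switchdev wiki, so the value in the example comment is only a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: extract one field from "ethtool -i" output (read on stdin) and
# check the firmware against a required minimum version.
set -euo pipefail

# ethtool_field KEY: print the value of "KEY: value" lines
ethtool_field() {
    awk -F': ' -v key="$1" '$1 == key { print $2 }'
}

# version_ge A B: succeed if dotted version A >= B (uses GNU sort -V)
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example (MIN_FW is a placeholder, take the real value from the wiki):
# fw=$(ethtool -i eth0 | ethtool_field firmware-version)
# version_ge "$fw" "$MIN_FW" || echo "firmware update needed"
```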

We can use ip link to change port states and get link information:

[root@localhost hwmon1]# ip link show dev eth0

4: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000

link/ether 7c:fe:90:ee:45:81 brd ff:ff:ff:ff:ff:ff

[root@localhost hwmon1]#
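eth0 above is administratively up but reports NO-CARRIER, i.e. no link partner yet. When scripting port bring-up, it can help to wait until the kernel reports the operational state as `up`. A sketch assuming the standard `/sys/class/net/<if>/operstate` file; the function name, the 10-second default timeout and the `SYSFS_NET` override are hypothetical choices.

```shell
#!/usr/bin/env bash
# Sketch: wait until a port's operational state becomes "up".
# SYSFS_NET is a test override; the 10 s default timeout is arbitrary.
set -euo pipefail

wait_for_link() {
    local ifname=$1 timeout=${2:-10} net=${SYSFS_NET:-/sys/class/net} i
    [ -r "$net/$ifname/operstate" ] || return 1
    for ((i = 0; i <= timeout; i++)); do
        if [ "$(cat "$net/$ifname/operstate")" = up ]; then
            return 0
        fi
        if [ "$i" -lt "$timeout" ]; then
            sleep 1
        fi
    done
    return 1
}

# Example: ip link set dev eth0 up && wait_for_link eth0 30
```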

Now let's configure some basic networking functionality on the dataplane ports. In this example, we use ip link to configure a VLAN-aware L2 bridge:

[root@localhost hwmon1]# ip link add name br0 type bridge vlan_filtering 1

[root@localhost hwmon1]# ip link show br0

37: br0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000

link/ether 5a:54:a5:32:5d:4a brd ff:ff:ff:ff:ff:ff

[root@localhost hwmon1]# ip link set dev eth0 master br0

[ 2714.793068] br0: port 1(eth0) entered blocking state

[ 2714.798180] br0: port 1(eth0) entered disabled state

[ 2714.804300] device eth0 entered promiscuous mode

[root@localhost hwmon1]# ip link set dev eth1 master br0

[ 2722.929924] br0: port 2(eth1) entered blocking state

[ 2722.935035] br0: port 2(eth1) entered disabled state

[ 2722.941283] device eth1 entered promiscuous mode

[root@localhost hwmon1]# ip link show br0

37: br0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000

link/ether 7c:fe:90:ee:45:81 brd ff:ff:ff:ff:ff:ff

[root@localhost hwmon1]# bridge vlan show dev eth0

port vlan ids

eth0 1 PVID Egress Untagged

eth0 1 PVID Egress Untagged
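Note that br0 still shows state DOWN after the ports were added, because the bridge itself was never brought up. The steps above can be collected into one script that also brings the bridge and ports up and places the ports into a VLAN. This is a sketch: the port names come from the example, VLAN 100 is a hypothetical choice, and `DRY_RUN` prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch: VLAN-aware bridge setup using the same ip/bridge commands as above.
# VLAN 100 is a hypothetical choice; with DRY_RUN set, commands are printed.
set -euo pipefail

run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

make_vlan_bridge() {
    local br=$1 vid=$2 port
    shift 2
    run ip link add name "$br" type bridge vlan_filtering 1
    for port in "$@"; do
        run ip link set dev "$port" master "$br"
        # Make $vid the port's PVID and egress-untagged VLAN;
        # VLAN 1 stays a member unless removed with "bridge vlan del".
        run bridge vlan add dev "$port" vid "$vid" pvid untagged
        run ip link set dev "$port" up
    done
    run ip link set dev "$br" up
}

# Example: make_vlan_bridge br0 100 eth0 eth1
```

Afterwards, `bridge vlan show dev eth0` should list VLAN 100 as PVID, and with switchdev the forwarding for these ports is handled by the ASIC.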


These are just some examples to demonstrate how we can now configure standard networking through Linux tools. This approach has a certain learning curve, since you need to learn how to use Linux commands to configure networking behavior, but that is no different from learning a new CLI when switching to a new networking vendor. The nice thing is that the Linux commands work the same way on devices from different vendors that support switchdev, so there is no other CLI to learn.

In addition, we can also install standard Linux software on the switch this way.

As an example, we will use this platform to install perfSONAR tools directly on the device. You will find information about the perfSONAR installation in a later blog post.