OS Tuning for high performance Market Data trading on physical host (2023)

Background

operating system: CentOS 7.9

hardware: Dell PowerEdge R730, 24 cores (Intel(R) Xeon(R) CPU E5-2687W v4 @ 3.00GHz), 250GB RAM

kernel bypass: OpenOnload on Solarflare NIC model SFN8522-R2 8000 Series 10G Adapter


This article lists tuning options that can be applied to a Red Hat-family server to increase network performance, reduce latency and system jitter, and make market data trading applications more performant.


resources used:


BIOS

disable Hyperthreading in BIOS (also called Logical Processor)

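with HT, two logical CPUs share one physical core's execution units and L1/L2 caches, so a sibling thread can evict cache lines and add jitter to a latency-critical thread; it can help throughput-oriented or desktop workloads, but is usually disabled on low-latency production servers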

check Hyper Threading state using Dell RACADM

racadm get BIOS.ProcSettings.LogicalProc

check using kernel (disabled = 0)

cat /sys/devices/system/cpu/smt/active
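The raw 0/1 value is easy to misread in a checklist, so the check above can be wrapped in a small helper. This is only a sketch: the function name `smt_state` and its optional file argument are hypothetical, added here so the logic can be exercised against a test file.

```shell
# smt_state [FILE]: print "enabled", "disabled" or "unknown".
# FILE defaults to the kernel's SMT status file; the argument is a
# hypothetical convenience for testing, not part of any tool.
smt_state() {
    smt_file="${1:-/sys/devices/system/cpu/smt/active}"
    if [ ! -r "$smt_file" ]; then
        echo "unknown"
        return 0
    fi
    case "$(cat "$smt_file")" in
        0) echo "disabled" ;;
        1) echo "enabled" ;;
        *) echo "unknown" ;;
    esac
}

smt_state
```

On kernels that do not expose this file the helper prints "unknown"; in that case `lscpu` and its "Thread(s) per core" field is another place to look.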

disable Hardware Prefetcher

detects patterns in memory access and loads the predicted data into the L2 cache; good for simple loops over arrays (contiguous memory blocks); disabling it can reduce cache pollution, and therefore cache misses, when memory access is random

racadm get BIOS.ProcSettings.ProcHwPrefetcher

CPU


tuned performance profile - set the profile to network-latency (increases power usage, sets vm.swappiness to 10)

systemctl start tuned

tuned-adm profile network-latency

check which profile is currently running

tuned-adm active
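Besides checking the profile name, it can be worth confirming that a setting the profile is expected to apply is actually in effect. The sketch below checks vm.swappiness against the value 10 mentioned above; the function name `check_swappiness` and its optional file argument are hypothetical.

```shell
# check_swappiness EXPECTED [FILE]: compare the running vm.swappiness
# value with EXPECTED and print a one-line status.
# FILE defaults to the procfs entry; the argument exists only for testing.
check_swappiness() {
    want="$1"
    swap_file="${2:-/proc/sys/vm/swappiness}"
    if [ ! -r "$swap_file" ]; then
        echo "unknown: cannot read $swap_file"
        return 0
    fi
    have=$(cat "$swap_file")
    if [ "$have" = "$want" ]; then
        echo "ok: vm.swappiness=$have"
    else
        echo "mismatch: vm.swappiness=$have, expected $want"
    fi
}

check_swappiness 10
```

A mismatch usually means the profile was not applied or another sysctl setting overrides it.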

check what profiles are available

tuned-adm list
Available profiles:
- balanced - General non-specialized tuned profile
- cpu-partitioning - Optimize for CPU partitioning
- desktop - Optimize for the desktop use-case
- hpc-compute - Optimize for HPC compute workloads
- latency-performance - Optimize for deterministic performance at the cost of increased power consumption
- network-latency - Optimize for deterministic performance at the cost of increased power consumption, focused on low latency network performance
- network-throughput - Optimize for streaming network throughput, generally only necessary on older CPUs or 40G+ networks
- powersave - Optimize for low power consumption
- throughput-performance - Broadly applicable tuning that provides excellent performance across a variety of common server workloads
- virtual-guest - Optimize for running inside a virtual guest
- virtual-host - Optimize for running KVM guests
Current active profile: network-latency

Diagnostic Tools

perf

iostat

turbostat

irqbalance

numastat

check numa_hit and numa_miss values for each node
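The same counters that numastat reports are also exposed per node under sysfs, which makes them easy to script. A minimal sketch, assuming the usual `/sys/devices/system/node/nodeN/numastat` layout; the function name `numa_report` and its optional base-directory argument are hypothetical.

```shell
# numa_report [BASE_DIR]: print numa_hit and numa_miss for every NUMA node.
# BASE_DIR defaults to the sysfs node directory; the argument exists only
# so the parsing can be tested against a fake directory tree.
numa_report() {
    base="${1:-/sys/devices/system/node}"
    for f in "$base"/node*/numastat; do
        [ -r "$f" ] || continue          # no NUMA nodes exposed: print nothing
        node=$(basename "$(dirname "$f")")
        awk -v node="$node" '
            $1 == "numa_hit"  { hit = $2 }
            $1 == "numa_miss" { miss = $2 }
            END { printf "%s numa_hit=%s numa_miss=%s\n", node, hit, miss }
        ' "$f"
    done
}

numa_report
```

A numa_miss count that keeps growing while the application runs suggests that memory is being allocated on a node other than the one the process prefers.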


systemtap


tuna

show IRQs

tuna --show_irqs

   # users            affinity
  68 0000:00:11.4     0x555555
  69 p1p1-0           0x555555  sfc
  70 p1p1-1           0x555555  sfc
  71 p1p1-2           0x555555  sfc
  72 p1p1-3           0x555555  sfc
  73 p1p1-4           0x555555  sfc
  74 p1p1-5           0x555555  sfc
  75 p1p1-6           0x555555  sfc
  76 p1p1-7           0x555555  sfc
  77 p1p1-8           0x555555  sfc
  78 p1p1-9           0x555555  sfc
  79 p1p1-10          0x555555  sfc
  80 p1p1-11          0x555555  sfc
  81 p1p1-12          0x555555  sfc
  82 p1p1-13          0x555555  sfc
  83 p1p1-14          0x555555  sfc
  84 p1p1-15          0x555555  sfc
  85 p1p1-16          0x555555  sfc
  86 p1p1-17          0x555555  sfc
  87 p1p1-18          0x555555  sfc
  88 p1p1-19          0x555555  sfc
  89 p1p1-20          0x555555  sfc
  90 p1p1-21          0x555555  sfc
  91 p1p1-22          0x555555  sfc
  92 p1p1-23          0x555555  sfc
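The affinity column above is a hexadecimal CPU bitmask, where bit N set means the IRQ may be serviced on CPU N. A small sketch to expand such a mask into a CPU list; the function name `mask_to_cpus` is hypothetical.

```shell
# mask_to_cpus MASK: expand a hex CPU bitmask (e.g. 0x555555) into a
# comma-separated list of CPU numbers, lowest bit = CPU 0.
mask_to_cpus() {
    mask=$(( $1 ))      # shell arithmetic accepts 0x-prefixed hex
    cpu=0
    out=""
    while [ "$mask" -gt 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            out="${out:+$out,}$cpu"
        fi
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    echo "$out"
}

mask_to_cpus 0x555555   # prints 0,2,4,6,8,10,12,14,16,18,20,22
```

So in the listing above every sfc interrupt is allowed on the even-numbered CPUs only.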




Solarflare




MEM