OS Tuning for high performance Market Data trading on physical host (2023)
Background
operating system: CentOS 7.9
hardware: Dell PowerEdge R730, 24 cores (Intel(R) Xeon(R) CPU E5-2687W v4 @ 3.00GHz), 250GB RAM
kernel bypass: OpenOnload on Solarflare NIC model SFN8522-R2 8000 Series 10G Adapter
This article lists tuning options that can be applied to a Red Hat-family server to increase network performance, reduce latency and system jitter, and make market trading applications more performant.
resources used:
BIOS
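disable Hyper-Threading in BIOS (Dell calls it Logical Processor)
HT makes two logical CPUs share one physical core's execution units and L1/L2 caches, so a sibling thread can evict cache lines and add jitter; it helps throughput-oriented workloads such as desktops, but not latency-sensitive production servers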
check Hyper-Threading state using Dell RACADM
racadm get BIOS.ProcSettings.LogicalProc
check using kernel (disabled = 0)
cat /sys/devices/system/cpu/smt/active
disable Hardware Prefetcher
detects patterns in memory access and preloads the predicted data into the L2 cache; good for simple loops over arrays (contiguous memory blocks), while disabling it avoids useless prefetches that evict useful cache lines on random access
racadm get BIOS.ProcSettings.ProcHwPrefetcher
CPU
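The kernel-side Hyper-Threading check from the BIOS section can be scripted so that it runs on every host after a reboot. The following is a minimal sketch, assuming the kernel exposes the smt sysfs interface; smt_state and has_sibling are hypothetical helper names, not part of any tool mentioned here.

```shell
#!/usr/bin/env bash
# Report whether SMT (Hyper-Threading) is active, using sysfs only.

# Interpret the content of /sys/devices/system/cpu/smt/active.
# Prints "enabled" for 1, "disabled" for 0, "unknown" otherwise.
smt_state() {
    case "$1" in
        1) echo "enabled" ;;
        0) echo "disabled" ;;
        *) echo "unknown" ;;
    esac
}

# A core with more than one hardware thread lists several CPU ids in
# thread_siblings_list (for example "0,24" or "0-1"); a single id
# means the CPU has no sibling thread.
has_sibling() {
    case "$1" in
        *,*|*-*) return 0 ;;
        *) return 1 ;;
    esac
}

smt_file=/sys/devices/system/cpu/smt/active
if [ -r "$smt_file" ]; then
    echo "SMT is $(smt_state "$(cat "$smt_file")")"
else
    echo "SMT state file not found; this kernel may lack the smt sysfs interface"
fi

# Cross-check: list any CPU that still reports a sibling thread.
for f in /sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list; do
    [ -r "$f" ] || continue
    if has_sibling "$(cat "$f")"; then
        echo "sibling threads present: $f -> $(cat "$f")"
    fi
done
```

With Hyper-Threading disabled in BIOS, the first line should report "disabled" and the loop should print nothing.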
TUNED Performance Profile - set the tuned profile to network-latency (increases power usage, sets vm.swappiness to 10)
systemctl start tuned
tuned-adm profile network-latency
check which profile is currently running
tuned-adm active
check what profiles are available
tuned-adm list
Available profiles:
- balanced - General non-specialized tuned profile
- cpu-partitioning - Optimize for CPU partitioning
- desktop - Optimize for the desktop use-case
- hpc-compute - Optimize for HPC compute workloads
- latency-performance - Optimize for deterministic performance at the cost of increased power consumption
- network-latency - Optimize for deterministic performance at the cost of increased power consumption, focused on low latency network performance
- network-throughput - Optimize for streaming network throughput, generally only necessary on older CPUs or 40G+ networks
- powersave - Optimize for low power consumption
- throughput-performance - Broadly applicable tuning that provides excellent performance across a variety of common server workloads
- virtual-guest - Optimize for running inside a virtual guest
- virtual-host - Optimize for running KVM guests
Current active profile: network-latency
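The profile check can be turned into a small verification step, for example for a post-boot health check. This is a sketch under the assumption that tuned-adm active prints a "Current active profile: NAME" line as in the listing above; active_profile and the expected variable are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Verify that the expected tuned profile is active and show swappiness.

# Read tuned-adm output on stdin and print only the profile name
# from the "Current active profile: NAME" line.
active_profile() {
    sed -n 's/^Current active profile: *//p'
}

expected=network-latency   # assumption: the profile chosen in this article

if command -v tuned-adm >/dev/null 2>&1; then
    current=$(tuned-adm active 2>/dev/null | active_profile)
    if [ "$current" = "$expected" ]; then
        echo "OK: tuned profile is $current"
    else
        echo "WARN: tuned profile is '${current:-none}', expected $expected"
    fi
else
    echo "tuned-adm not found; is the tuned package installed?"
fi

# The profile is expected to lower swappiness; print the live value.
if [ -r /proc/sys/vm/swappiness ]; then
    echo "vm.swappiness = $(cat /proc/sys/vm/swappiness)"
fi
```

Parsing the tool's own output keeps the check independent of where tuned stores its state on disk.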
Diagnostic Tools
perf
iostat
turbostat
irqbalance
numastat
check numa_hit and numa_miss values for each node
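The same counters that numastat prints are also available per node in sysfs, which makes the comparison easy to script. The sketch below assumes the usual /sys/devices/system/node/nodeN/numastat layout of one "name value" pair per line; numa_miss_pct is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print numa_miss as a percentage of numa_hit + numa_miss for each node.

# Read "name value" counter lines on stdin and print the miss percentage
# with two decimals; prints 0.00 when both counters are zero.
numa_miss_pct() {
    awk '$1 == "numa_hit"  { hit = $2 }
         $1 == "numa_miss" { miss = $2 }
         END {
             total = hit + miss
             if (total > 0) printf "%.2f\n", 100 * miss / total
             else print "0.00"
         }'
}

for node in /sys/devices/system/node/node[0-9]*; do
    [ -r "$node/numastat" ] || continue
    echo "$(basename "$node"): numa_miss is $(numa_miss_pct < "$node/numastat")% of numa_hit + numa_miss"
done
```

A percentage that keeps growing while the trading application runs suggests that memory is being allocated on a node other than the intended one.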
systemtap
tuna
show IRQs
tuna --show_irqs
# users affinity
68 0000:00:11.4 0x555555
69 p1p1-0 0x555555 sfc
70 p1p1-1 0x555555 sfc
71 p1p1-2 0x555555 sfc
72 p1p1-3 0x555555 sfc
73 p1p1-4 0x555555 sfc
74 p1p1-5 0x555555 sfc
75 p1p1-6 0x555555 sfc
76 p1p1-7 0x555555 sfc
77 p1p1-8 0x555555 sfc
78 p1p1-9 0x555555 sfc
79 p1p1-10 0x555555 sfc
80 p1p1-11 0x555555 sfc
81 p1p1-12 0x555555 sfc
82 p1p1-13 0x555555 sfc
83 p1p1-14 0x555555 sfc
84 p1p1-15 0x555555 sfc
85 p1p1-16 0x555555 sfc
86 p1p1-17 0x555555 sfc
87 p1p1-18 0x555555 sfc
88 p1p1-19 0x555555 sfc
89 p1p1-20 0x555555 sfc
90 p1p1-21 0x555555 sfc
91 p1p1-22 0x555555 sfc
92 p1p1-23 0x555555 sfc
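Every IRQ in the listing above carries the affinity mask 0x555555. In such a mask, bit N set means the IRQ may be delivered to CPU N. The following sketch decodes a mask into a CPU list; mask_to_cpus is a hypothetical helper name, and the sketch assumes a single hex value that fits in a signed 64-bit shell integer (fewer than 63 CPUs).

```shell
#!/usr/bin/env bash
# Decode an IRQ affinity bitmask into a comma-separated CPU list.

mask_to_cpus() {
    local mask=$(( $1 ))   # shell arithmetic accepts a 0x-prefixed hex value
    local cpu=0 out=""
    while [ "$mask" -gt 0 ]; do
        # The lowest bit tells whether the current CPU is in the mask.
        if [ $(( mask & 1 )) -eq 1 ]; then
            out="${out:+$out,}$cpu"
        fi
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    echo "$out"
}

mask_to_cpus 0x555555
# prints 0,2,4,6,8,10,12,14,16,18,20,22
```

So on this 24-core host the sfc interrupts are currently allowed on the 12 even-numbered CPUs. Which NUMA node those CPUs belong to depends on the host's CPU numbering and can be checked with lscpu.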