UDP packet loss troubleshooting

Analysis of UDP packet loss problems on Linux systems

Last Update:2018-01-15 Source: Internet Author: User

While recently troubleshooting UDP packet loss in a server application, I went through a lot of material and summarized it in this article as a reference for others.

Before we get started, let's use a diagram to explain how a Linux system receives network messages.

While a UDP message is being received, any step in the diagram may drop it, either actively or passively. Packet loss can therefore occur in the network card and its driver, in the system, or in the application.

The sending path is not analyzed here because it mirrors the receiving path in the opposite direction, and messages are much less likely to be lost when sending: that only happens when the application sends faster than the kernel and the network card can process.

This article assumes the machine has a single interface named eth0. If there are multiple interfaces, or the interface is not named eth0, adapt the commands to your actual setup.

Note: In this article, RX (receive) refers to received messages and TX (transmit) refers to sent messages.

Confirm that UDP packet loss has occurred

To see whether the network card has dropped packets, run ethtool -S eth0 and look for fields containing bad or drop in the output. Under normal circumstances these counters should be 0. If a counter keeps growing, the NIC is dropping packets.

Another command that shows packet drop data is ifconfig, whose output includes statistics for RX (received messages) and TX (transmitted messages):

~# ifconfig eth0
...
        RX packets 3553389376  bytes 2599862532475 (2.3 TiB)
        RX errors 0  dropped 1353  overruns 0  frame 0
        TX packets 3479495131  bytes 3205366800850 (2.9 TiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
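
Linux also exposes these per-interface counters as plain files under /sys/class/net/<interface>/statistics/, which makes it easy to check whether a drop counter is growing. The following is a minimal sketch rather than a monitoring tool; the helper name read_counter is made up for illustration, and the path in the comment assumes the interface is called eth0.

```c
#include <assert.h>
#include <stdio.h>

/* Read one decimal counter from a statistics file such as
 * /sys/class/net/eth0/statistics/rx_dropped.
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
int read_counter(const char *path, unsigned long long *value)
{
    assert(path != NULL && value != NULL);

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    int ok = (fscanf(fp, "%llu", value) == 1);
    fclose(fp);
    return ok ? 0 : -1;
}
```

Calling read_counter twice a few seconds apart and comparing the two values shows whether drops are still happening or whether the counter only reflects an old incident.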

In addition, Linux provides packet drop statistics for each network protocol, which can be viewed with netstat -s. Adding --udp (or -u) limits the output to UDP-related data:

# netstat -s -u
IcmpMsg:
    InType0: 3
    InType3: 1719356
    InType8: 13
    InType11: 59
    OutType0: 13
    OutType3: 1737641
    OutType8: 10
    OutType11: 263
Udp:
    517488890 packets received
    2487375 packets to unknown port received.
    47533568 packet receive errors
    147264581 packets sent
    12851135 receive buffer errors
    0 send buffer errors
UdpLite:
IpExt:
    OutMcastPkts: 696
    InBcastPkts: 2373968
    InOctets: 4954097451540
    OutOctets: 5538322535160
    OutMcastOctets: 79632
    InBcastOctets: 934783053
    InNoECTPkts: 5584838675

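In the output above, pay attention to the following Udp fields when looking for UDP packet loss: "packet receive errors" that is non-zero and keeps growing indicates that UDP packets are being dropped; "packets to unknown port received" counts packets whose destination port had no listener; "receive buffer errors" counts packets dropped because the UDP receive buffer was too small.

Note: A non-zero drop count is not necessarily a problem. For UDP, a small amount of loss is likely expected behavior, for example a loss rate (packets dropped / packets received) of one in 10,000 or lower.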
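
netstat -s takes its UDP numbers from the two "Udp:" rows of /proc/net/snmp (a header row of field names followed by a row of values), so the loss rate can also be computed programmatically. The sketch below is illustrative only: the struct and function names are invented here, it reads just the first five fields (InDatagrams, NoPorts, InErrors, OutDatagrams, RcvbufErrors), and it uses InErrors / InDatagrams as the loss rate described in the note.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct udp_counters {
    unsigned long long in_datagrams;   /* "packets received" */
    unsigned long long no_ports;       /* "packets to unknown port received" */
    unsigned long long in_errors;      /* "packet receive errors" */
    unsigned long long out_datagrams;  /* "packets sent" */
    unsigned long long rcvbuf_errors;  /* "receive buffer errors" */
};

/* Parse the numeric "Udp:" row from a stream in /proc/net/snmp format.
 * Returns 0 on success, -1 if no such row is found. */
int read_udp_counters(FILE *fp, struct udp_counters *c)
{
    assert(fp != NULL && c != NULL);

    char line[512];
    while (fgets(line, sizeof line, fp)) {
        if (strncmp(line, "Udp: ", 5) != 0)
            continue;
        /* The header row holds names, so it yields 0 conversions and is skipped. */
        if (sscanf(line + 5, "%llu %llu %llu %llu %llu",
                   &c->in_datagrams, &c->no_ports, &c->in_errors,
                   &c->out_datagrams, &c->rcvbuf_errors) == 5)
            return 0;
    }
    return -1;
}

/* Loss rate as receive errors divided by packets received. */
double udp_loss_rate(const struct udp_counters *c)
{
    assert(c != NULL);
    if (c->in_datagrams == 0)
        return 0.0;
    return (double)c->in_errors / (double)c->in_datagrams;
}
```

Because the counters are cumulative since boot, it is usually more informative to read them twice and compute the rate over the difference between the two samples.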

Network card or driver packet loss

As mentioned earlier, if the output of ethtool -S eth0 contains non-zero rx_***_errors counters, there is most likely a problem with the network card that causes packets to be dropped, and you need to contact the server or network card vendor.

# ethtool -S eth0 | grep rx_ | grep errors
     rx_crc_errors: 0
     rx_missed_errors: 0
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     rx_align_errors: 0
     rx_errors: 0
     rx_length_errors: 0
     rx_over_errors: 0
     rx_frame_errors: 0
     rx_fifo_errors: 0

netstat -i also shows the packets received and dropped on each network card; normally the error and drop columns in its output should be 0.

If neither the hardware nor the driver is at fault, the network card usually drops packets because its buffer (ring buffer) is too small. You can use the ethtool command to view and set the ring buffer size.

ethtool -g shows the ring buffer settings of a network card, as in the following example:

# ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096
Current hardware settings:
RX: 256
RX Mini: 0
RX Jumbo: 0
TX: 256

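Pre-set maximums shows the largest ring buffer sizes the NIC supports, and Current hardware settings shows the sizes in use. The RX ring can be enlarged with ethtool -G eth0 rx 4096; the requested value cannot exceed the pre-set maximum (4096 in this example).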

Linux System Packet Loss

There are many reasons why the Linux system itself drops packets. The common ones are UDP message errors, the firewall, an insufficient UDP buffer size, and excessive system load. Each is analyzed below.

UDP Message Error

If a UDP message is modified in transit, it may arrive with a checksum error or a length error. Linux verifies this when it receives a UDP message and discards the message as soon as an error is detected.

If you want messages with checksum errors to still be delivered to the application, you can disable UDP checksum checking with a socket option. Be aware that on recent Linux kernels SO_NO_CHECK mainly affects checksum generation for outgoing datagrams, so test its effect on your kernel before relying on it:

int disable = 1;

/* Set SO_NO_CHECK on the UDP socket */
setsockopt(sock_fd, SOL_SOCKET, SO_NO_CHECK, (void*)&disable, sizeof(disable));


Firewall

If the system firewall is dropping packets, the typical symptom is that no UDP messages are received at all, although the firewall may also be dropping only some of them.

If you see a very high drop rate, check the firewall rules first and make sure the firewall is not actively dropping UDP messages.

UDP buffer size is insufficient

After receiving a message, Linux stores it in a buffer. Because the buffer size is limited, Linux drops packets outright once the buffer is full, which can happen when a UDP message is too large (larger than the buffer or the MTU) or when messages arrive too fast.

At the system level, Linux defines the maximum size that a receive buffer can be configured to. It can be viewed in /proc/sys/net/core/rmem_max (the file behind the net.core.rmem_max parameter); Linux usually sets an initial value at boot based on the amount of memory.

However, these initial values are not designed for high-traffic UDP workloads. If the application receives and sends a very large number of UDP messages, the value needs to be increased. You can use the sysctl command to make the change take effect immediately:

sysctl -w net.core.rmem_max=26214400 # Set to 25M

You can also set the corresponding parameter in /etc/sysctl.conf so that it stays in effect after the next reboot.

If messages are too large, the sender can split the data so that each message fits within the MTU.
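
A sender-side split might look like the sketch below. It is an illustration under stated assumptions, not a complete protocol: send_chunked is a made-up helper, the socket is assumed to be connected already, and a typical payload limit of 1472 bytes assumes a 1500-byte Ethernet MTU minus a 20-byte IPv4 header and an 8-byte UDP header. Real applications normally also add a sequence number to each piece so the receiver can reassemble the data and detect missing pieces, which is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Send `len` bytes over a connected datagram socket in pieces of at most
 * `max_payload` bytes each.
 * Returns the number of datagrams sent, or -1 on a send error. */
int send_chunked(int fd, const char *data, size_t len, size_t max_payload)
{
    assert(data != NULL && max_payload > 0);

    int sent = 0;
    size_t off = 0;
    while (off < len) {
        size_t remaining = len - off;
        size_t n = remaining < max_payload ? remaining : max_payload;

        if (send(fd, data + off, n, 0) != (ssize_t)n)
            return -1;

        off += n;
        sent++;
    }
    return sent;
}
```

With a 4000-byte buffer and a 1472-byte limit, this produces three datagrams, the last one carrying the remaining 1056 bytes.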

Another configurable parameter is netdev_max_backlog, the number of messages the Linux kernel can queue after reading them from the NIC driver. The default is 1000, and it can be raised, for example to 2000:

sudo sysctl -w net.core.netdev_max_backlog=2000
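
Whether this backlog queue is actually overflowing can be checked in /proc/net/softnet_stat, which has one row per CPU with hexadecimal columns; the second column counts packets dropped because the backlog was full. The following sketch sums that column. The function name softnet_dropped is invented for illustration, and only the first two columns are interpreted.

```c
#include <assert.h>
#include <stdio.h>

/* Sum the "dropped" column (second hexadecimal field) over all CPU rows
 * of a stream in /proc/net/softnet_stat format. */
unsigned long long softnet_dropped(FILE *fp)
{
    assert(fp != NULL);

    char line[512];
    unsigned int processed, dropped;
    unsigned long long total = 0;

    while (fgets(line, sizeof line, fp)) {
        /* Column 1: packets processed, column 2: packets dropped. */
        if (sscanf(line, "%x %x", &processed, &dropped) == 2)
            total += dropped;
    }
    return total;
}
```

If the total keeps increasing while the application is under load, raising netdev_max_backlog as shown above is a reasonable next step; if it stays at zero, the drops are happening elsewhere.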

System load is too high

Excessive CPU, memory, or IO load can all cause network packet loss. If CPU load is too high, the system has no time to compute checksums, copy memory, and so on, so the network card or socket buffer overflows. If memory load is too high, the application processes messages too slowly. If IO load is too high, the CPU spends its time waiting on IO and has none left to process the UDP messages in the buffer.

A Linux system is an interconnected whole, and a problem in one component can affect the normal operation of the others. Excessive load is caused either by a problem in the application or by insufficient system resources. The former needs to be found, debugged, and fixed promptly; the latter needs to be detected in time so that capacity can be expanded.

Application Packet Loss

The system-level UDP buffer size was discussed above, but the sysctl parameter adjusted there is only the maximum the system allows. Each application still needs to set its own socket buffer size when it creates the socket.

Linux places received messages in the socket buffer, and the application continuously reads messages from that buffer. Two application-related factors therefore affect whether packets are dropped: the size of the socket buffer and the speed at which the application reads messages.

For the first factor, the application can set the size of the socket receive buffer when it initializes the socket. For example, the following code sets the socket buffer to 20 MB:

int receive_buf_size = 20*1024*1024; // 20 MB; SO_RCVBUF takes an int
setsockopt(socket_fd, SOL_SOCKET, SO_RCVBUF, &receive_buf_size, sizeof(receive_buf_size));
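
A request made through SO_RCVBUF is not always honored as written: the kernel caps it at net.core.rmem_max and, as documented in socket(7), doubles the stored value to leave room for bookkeeping overhead. It is therefore worth reading the value back. The helper below is a small sketch with a made-up name, set_rcvbuf, showing one way to do that.

```c
#include <assert.h>
#include <sys/socket.h>

/* Request a receive buffer of `requested` bytes and return the size the
 * kernel reports afterwards, or -1 if either socket call fails. */
int set_rcvbuf(int fd, int requested)
{
    assert(requested > 0);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   &requested, sizeof requested) < 0)
        return -1;

    int actual = 0;
    socklen_t len = sizeof actual;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) < 0)
        return -1;

    return actual;
}
```

If the returned size is much smaller than expected for a 20 MB request, the system-wide rmem_max limit is probably too low and needs to be raised first.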

If you did not write and do not maintain the program, modifying its code is usually not an option. Many applications provide a configuration parameter for this value; refer to the corresponding official documentation. If no such parameter exists, all you can do is file an issue with the program's developers.

Increasing the application's receive buffer clearly reduces the likelihood of packet loss, but it also makes the application use more memory, so use it with caution.

The other factor is how fast the application reads messages from the buffer. The application should process messages asynchronously, so that reading from the socket is not held up by processing.
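
One possible way to keep the read path fast is to have the receiving code do nothing except copy datagrams out of the socket buffer into an in-memory queue, and leave the actual processing to separate code (for example a worker thread, not shown). The sketch below illustrates only the draining step; the queue layout, the sizes, and the name drain_socket are assumptions made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

#define SLOT_SIZE  2048   /* assumed upper bound on one datagram */
#define SLOT_COUNT 64     /* assumed queue capacity */

struct packet_queue {
    char   data[SLOT_COUNT][SLOT_SIZE];
    size_t len[SLOT_COUNT];
    size_t count;
};

/* Move every datagram already waiting on `fd` into `q` without blocking.
 * Stops when the socket buffer is empty or the queue is full.
 * Returns the number of datagrams stored, or -1 on a socket error. */
int drain_socket(int fd, struct packet_queue *q)
{
    assert(q != NULL);

    int stored = 0;
    while (q->count < SLOT_COUNT) {
        ssize_t n = recv(fd, q->data[q->count], SLOT_SIZE, MSG_DONTWAIT);
        if (n < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break;      /* nothing left in the socket buffer */
            if (errno == EINTR)
                continue;   /* interrupted by a signal, retry */
            return -1;
        }
        q->len[q->count] = (size_t)n;
        q->count++;
        stored++;
    }
    return stored;
}
```

The design choice here is that the time between two reads of the socket no longer depends on how long a message takes to process, so the socket buffer only has to absorb short bursts.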

Where Were the Packets Dropped?

To find out in which function the Linux kernel drops packets, you can use the dropwatch tool. It monitors the system for packet drops and prints the address of the function where each drop occurred:

# dropwatch -l kas
Initalizing kallsyms db
dropwatch> start
Enabling monitoring...
Kernel monitoring activated.
Issue Ctrl-C to stop monitoring
1 drops at tcp_v4_do_rcv+cd (0xffffffff81799bad)
10 drops at tcp_v4_rcv+80 (0xffffffff8179a620)
1 drops at sk_stream_kill_queues+57 (0xffffffff81729ca7)
4 drops at unix_release_sock+20e (0xffffffff817dc94e)
1 drops at igmp_rcv+e1 (0xffffffff817b4c41)
1 drops at igmp_rcv+e1 (0xffffffff817b4c41)

With this information, you can look up the corresponding kernel code to learn at which step the kernel dropped the message and roughly why.

In addition, you can use the Linux perf tool to trace kfree_skb events; this function is called when a network message is discarded:

sudo perf record -g -a -e skb:kfree_skb
sudo perf script

There are many articles online about how to use the perf command and interpret its output.

Summary