Sunday, December 27, 2015

IP Fragmentation: How to Avoid It?

Most Ethernet LANs use an MTU of 1500 bytes (modern LANs can use Jumbo frames, allowing for an MTU up to 9000 bytes); however, encapsulation protocols such as PPPoE reduce this. For example, PPPoE adds an 8-byte header, which leaves an MTU of 1492 bytes.

Jumbo Frames can improve performance when they are enabled on 10 Gigabit+ networks, especially for Ethernet-based storage access, vMotion, or other high-throughput applications such as Oracle RAC interconnects.[1]

Enabling Jumbo Frames may also aggravate IP fragmentation, because MTU mismatches along the transmission path become more frequent. In this article, we will review IP fragmentation overhead and ways to avoid it.

MSS vs MTU


To begin with, you can think of these terms as measuring two different sizes:
  • MSS is related to receiving buffer(s)
    • Is announced as an option in the TCP header[22,24]
      • When a TCP client initiates a connection to a server, it includes its MSS as an option in the first (SYN) packet
      • When a TCP server receives a connection from a client, it includes its MSS as an option in the SYN/ACK packet
  • MTU is related to transmission link(s)
    • Can be configured on network interface(s)[15,19,20]
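To make the MSS header option concrete, here is a minimal Python sketch that encodes the option and looks it up in a TCP options field. It relies only on the standard option layout (kind 2, length 4, then a 16-bit MSS value in network byte order) and on the one-byte End of Option List (0) and No-Operation (1) options. The function names are hypothetical, and the sketch does not build or parse a full TCP header.

```python
import struct
from typing import Optional

MSS_KIND = 2  # TCP option kind for Maximum Segment Size
MSS_LEN = 4   # kind (1 byte) + length (1 byte) + MSS value (2 bytes)


def encode_mss_option(mss: int) -> bytes:
    """Encode the MSS option as it would appear in a SYN or SYN/ACK."""
    return struct.pack("!BBH", MSS_KIND, MSS_LEN, mss)


def find_mss_option(options: bytes) -> Optional[int]:
    """Walk a TCP options field and return the announced MSS, if any."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:              # End of Option List
            break
        if kind == 1:              # No-Operation (padding), one byte
            i += 1
            continue
        if i + 1 >= len(options):  # truncated option: no length byte
            break
        length = options[i + 1]
        if length < 2:             # malformed length would loop forever
            break
        if kind == MSS_KIND and length == MSS_LEN and i + 4 <= len(options):
            return struct.unpack("!H", options[i + 2:i + 4])[0]
        i += length                # skip any other option
    return None


if __name__ == "__main__":
    opts = b"\x01\x01" + encode_mss_option(1460)  # two NOPs, then MSS
    print(opts.hex(), find_mss_option(opts))
```

Because each side only announces its own value in its SYN or SYN/ACK, the two directions of a connection can end up using different MSS values.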
IP was designed for use on a wide variety of transmission links. Although the maximum length of an IP datagram is 65,535 bytes (64 KB), most transmission links enforce a smaller maximum packet length, called the MTU (Maximum Transmission Unit).

Originally, MSS meant how big a buffer was allocated on a receiving station to store the TCP data. In other words, the TCP Maximum Segment Size (MSS) defines the maximum amount of data that a host is willing to accept in a single TCP segment. MSS and MTU are related as follows: MSS is only the TCP data size, which excludes both the IP header and the TCP header:[22]
  • MSS = MTU - sizeof(TCPHDR) - sizeof(IPHDR)
    • For example, assuming both TCPHDR and IPHDR have the default size of 20 bytes
      • MSS = 8960 if MTU= 9000
      • MSS = 1460 if MTU= 1500
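The formula above is simple enough to check in a few lines of Python. This sketch assumes, as the examples do, IPv4 and TCP headers without options (20 bytes each); `mss_for_mtu` is a hypothetical helper name.

```python
IPV4_HEADER_BYTES = 20  # minimum IPv4 header, no options (assumed)
TCP_HEADER_BYTES = 20   # minimum TCP header, no options (assumed)


def mss_for_mtu(mtu: int,
                ip_header: int = IPV4_HEADER_BYTES,
                tcp_header: int = TCP_HEADER_BYTES) -> int:
    """Return MSS = MTU - sizeof(TCPHDR) - sizeof(IPHDR)."""
    mss = mtu - tcp_header - ip_header
    if mss <= 0:
        raise ValueError(f"MTU {mtu} is too small to carry any TCP data")
    return mss


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu} -> MSS {mss_for_mtu(mtu)}")
```

If TCP options such as timestamps are in use, the real payload per segment is smaller than this value, so the result is an upper bound.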

IP Fragmentation Overhead


The design of IP accommodates MTU differences by allowing routers to fragment IP datagrams as necessary. The receiving station is responsible for reassembling the fragments back into the original full-size IP datagram. Read [2] for a good illustration of IP Fragmentation and Reassembly.
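As a rough illustration of what a router does here, the following Python sketch computes the fragments produced when a datagram is forwarded onto a link with a smaller MTU. It assumes IPv4 with a 20-byte header (no options) that is repeated in every fragment, and it applies the rule that every fragment except the last carries a multiple of 8 data bytes, because the Fragment Offset field counts in 8-byte units. `fragment` is a hypothetical helper, not a real API.

```python
from typing import List, Tuple

IPV4_HEADER_BYTES = 20  # assumed: no IP options


def fragment(total_length: int, mtu: int,
             header: int = IPV4_HEADER_BYTES) -> List[Tuple[int, int, bool]]:
    """Return (offset_in_8_byte_units, data_bytes, more_fragments) for each
    fragment of an IPv4 datagram that is total_length bytes long."""
    data = total_length - header
    if total_length <= mtu:
        return [(0, data, False)]  # fits as-is: no fragmentation needed
    # Largest data chunk that fits in the MTU and is a multiple of 8 bytes.
    max_chunk = (mtu - header) // 8 * 8
    if max_chunk <= 0:
        raise ValueError("MTU too small to carry any fragment data")
    fragments = []
    offset = 0
    while offset < data:
        size = min(max_chunk, data - offset)
        more = offset + size < data  # MF flag set on all but the last
        fragments.append((offset // 8, size, more))
        offset += size
    return fragments


if __name__ == "__main__":
    for off, size, more in fragment(9000, 1500):
        print(f"offset={off:5d} data={size:5d} MF={int(more)}")
```

For a 9000-byte datagram from a Jumbo Frame host crossing a 1500-byte link, this yields seven fragments: six carrying 1480 data bytes and a final one carrying 100 bytes. Each fragment repeats the IP header, and the receiver has to hold all seven before it can rebuild the original datagram.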
There are several issues that make IP fragmentation undesirable:
  • Overhead for Sender (or Router)
    • There is a small increase in CPU and memory overhead to fragment an IP datagram
  • Overhead for Receiver
    • The receiver must allocate memory for the arriving fragments and coalesce them back into one datagram after all of the fragments are received.
  • Overhead for handling dropped fragments
    • Fragments could be dropped because of a congested link
      • If this happens, the complete original datagram will have to be retransmitted
  • Firewalls may have trouble enforcing their policies
    • If the IP fragments arrive out of order, a firewall may block the non-initial fragments because they do not carry the transport-layer (TCP/UDP port) information that would match the packet filter.
      • Read [2] for more explanation
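The firewall problem comes from where the port numbers live: only the fragment at offset 0 begins with the TCP or UDP header. Here is a minimal Python sketch that reads the flags/Fragment Offset field (bytes 6 and 7 of a raw IPv4 header) to tell initial from non-initial fragments. The function names are hypothetical, and the sketch assumes it is handed a well-formed IPv4 header.

```python
import struct
from typing import Tuple

MF_FLAG = 0x2000      # "More Fragments" bit in the flags/offset field
OFFSET_MASK = 0x1FFF  # low 13 bits: Fragment Offset in 8-byte units


def fragment_info(ipv4_header: bytes) -> Tuple[bool, int]:
    """Return (more_fragments, fragment_offset_units) from a raw IPv4 header."""
    if len(ipv4_header) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes")
    (flags_offset,) = struct.unpack("!H", ipv4_header[6:8])
    return bool(flags_offset & MF_FLAG), flags_offset & OFFSET_MASK


def is_fragment(ipv4_header: bytes) -> bool:
    """A packet is a fragment if MF is set or its offset is non-zero."""
    more, offset = fragment_info(ipv4_header)
    return more or offset != 0


def carries_transport_header(ipv4_header: bytes) -> bool:
    """Only the offset-0 fragment starts with the TCP/UDP header (ports),
    so a port-based filter can only be evaluated on that fragment."""
    _, offset = fragment_info(ipv4_header)
    return offset == 0


if __name__ == "__main__":
    # Hypothetical non-initial fragment: MF set, offset 185 (1480 bytes).
    hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 1500, 1,
                      MF_FLAG | 185, 64, 6, 0, bytes(4), bytes(4))
    print(fragment_info(hdr), carries_transport_header(hdr))
```

A firewall that sees such a non-initial fragment before the initial one has no port information to match, which is why it may simply drop it.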

How to Avoid Fragmentation


Contrary to popular belief, the MSS value is not negotiated between hosts.[22,24] However, the sender can take an additional step to avoid fragmentation on the local and remote wires. Scenario 2 in [2] illustrates this additional step:
Each host first compares its outgoing interface MTU with its own MSS buffer and chooses the lower value as the MSS to send. The hosts then compare the MSS received against their own interface MTU and again choose the lower of the two values.
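The two comparisons above can be sketched in Python as follows. This assumes the interface MTU is turned into a comparable MSS by subtracting 40 bytes (20-byte IPv4 header plus 20-byte TCP header, no options); the function names and the host values in the example are hypothetical.

```python
HEADERS = 40  # assumed: 20-byte IPv4 header + 20-byte TCP header


def mss_to_announce(interface_mtu: int, receive_buffer: int) -> int:
    """Step 1: announce the lower of the buffer size and the MTU-derived MSS."""
    return min(interface_mtu - HEADERS, receive_buffer)


def mss_to_use(received_mss: int, interface_mtu: int) -> int:
    """Step 2: send segments no larger than the peer's announced MSS
    or what our own outgoing interface MTU allows."""
    return min(received_mss, interface_mtu - HEADERS)


if __name__ == "__main__":
    # Hypothetical hosts: A on a 1500-byte link, B on a 9000-byte link.
    a_mtu, a_buf = 1500, 16384
    b_mtu, b_buf = 9000, 8192
    a_announces = mss_to_announce(a_mtu, a_buf)  # 1460
    b_announces = mss_to_announce(b_mtu, b_buf)  # 8192
    print("A sends segments of", mss_to_use(b_announces, a_mtu))
    print("B sends segments of", mss_to_use(a_announces, b_mtu))
```

In this example both directions end up at 1460 bytes, so the Jumbo Frame host never sends segments that would have to be fragmented on the way to the 1500-byte host. Note that this only protects the two end links; a smaller MTU somewhere in the middle of the path is not visible to either host.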
Path MTU Discovery (PMTUD) is another technique that can help you avoid IP fragmentation. It was originally intended for routers in IPv4; however, all modern operating systems use it on endpoints. PMTUD is especially useful when intermediate links have smaller MTUs than the end links. It dynamically determines the lowest MTU along the path from a packet's source to its destination by sending packets with the Don't Fragment (DF) bit set and lowering the packet size whenever a router returns an ICMP "Fragmentation Needed" message. However, as discussed in [2], there are three things that can break PMTUD; the most common is a router or firewall that blocks those ICMP messages.
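To show the PMTUD feedback loop, here is a toy Python simulation. It models the path as a hypothetical list of link MTUs and assumes that every router which cannot forward a Don't Fragment packet drops it and returns an ICMP "Fragmentation Needed" message carrying its next-hop MTU, and that nothing on the path blocks those messages. It is an illustration of the idea, not an implementation of any operating system's PMTUD.

```python
from typing import List, Optional, Tuple


def send_with_df(size: int, link_mtus: List[int]) -> Optional[int]:
    """Send a packet with DF set along the path. Return the next-hop MTU
    reported via ICMP by the first link it does not fit on, or None if
    the packet reached the destination."""
    for mtu in link_mtus:
        if size > mtu:
            return mtu  # router drops the packet and reports this MTU
    return None


def discover_path_mtu(first_hop_mtu: int,
                      link_mtus: List[int]) -> Tuple[int, int]:
    """Return (path_mtu, probes_sent), starting from the local link MTU."""
    size, probes = first_hop_mtu, 0
    while True:
        probes += 1
        reported = send_with_df(size, link_mtus)
        if reported is None:
            return size, probes
        size = reported  # reported MTU is always smaller, so this terminates


if __name__ == "__main__":
    # Hypothetical path: Jumbo Frame LAN, 1500-byte WAN, GRE tunnel, Jumbo LAN.
    print(discover_path_mtu(9000, [9000, 1500, 1476, 9000]))
```

For this path the sender needs three probes to settle on 1476 bytes. If the ICMP messages were blocked, the sender would never learn the smaller MTU and its large DF packets would simply disappear, which is the common way PMTUD breaks.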

Unfortunately, IP fragmentation issues could be more widespread than you think due to:
  • Dynamic Routing
    • The MTUs on the intermediate links may vary on different routes
  • IP tunnels
    • Tunnels cause more fragmentation because the tunnel encapsulation adds overhead to the size of a packet.
      • For example, Generic Routing Encapsulation (GRE) adds 24 bytes to a packet, and after this increase the packet may need to be fragmented because it is larger than the outbound MTU.
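The effect of stacked encapsulations on the usable MTU and MSS is plain subtraction, sketched below in Python. The overhead values are assumptions for IPv4 without options: 8 bytes for PPPoE (6-byte PPPoE header plus 2-byte PPP protocol field) and 24 bytes for GRE (20-byte outer IPv4 header plus 4-byte basic GRE header). The dictionary and function names are hypothetical.

```python
from typing import Sequence

# Assumed per-encapsulation overhead in bytes (IPv4, no options).
OVERHEAD_BYTES = {"pppoe": 8, "gre": 24}
TCP_IP_HEADERS = 40  # inner IPv4 (20) + TCP (20), no options


def inner_mtu(link_mtu: int, encapsulations: Sequence[str]) -> int:
    """MTU left for the inner IP packet after each encapsulation is applied."""
    mtu = link_mtu
    for name in encapsulations:
        mtu -= OVERHEAD_BYTES[name]
    return mtu


def inner_mss(link_mtu: int, encapsulations: Sequence[str]) -> int:
    """TCP MSS that keeps the encapsulated packet within the link MTU."""
    return inner_mtu(link_mtu, encapsulations) - TCP_IP_HEADERS


if __name__ == "__main__":
    for stack in ([], ["pppoe"], ["gre"], ["pppoe", "gre"]):
        print(stack or "none", inner_mtu(1500, stack), inner_mss(1500, stack))
```

On a 1500-byte link, a GRE tunnel leaves 1476 bytes for the inner packet (MSS 1436), and running GRE over PPPoE leaves 1468 bytes (MSS 1428). A 1500-byte packet sent into the GRE tunnel becomes 1524 bytes, which is exactly the case that forces fragmentation.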

[3] lists sites that use PMTUD or alternatives to resolve IP fragmentation, as well as sites that simply allow their packets to be fragmented rather than using PMTUD. This demonstrates the complexity of fragmentation avoidance.

References

  1. The Great Jumbo Frames Debate
  2. Resolve IP Fragmentation, MTU, MSS, and PMTUD Issues with GRE and IPSEC
  3. Sites with Broken/Working PMTUD
  4. Path MTU Discovery
  5. TCP headers
  6. bad TCP checksums
  7. MSS performance consideration
  8. Understanding Routing Table
  9. route (Linux man page)
  10. Docker should set host-side veth MTU #4378
  11. Add MTU to lxc conf to make host and container MTU match
  12. Xen Networking
  13. TCP parameter settings (/proc/sys/net/ipv4)
    • tcp_base_mss, tcp_mtu_probing, etc.
  14. Change the MTU of a network interface
  15. MTU manipulation
    • Ethernet MTU vs IP MTU
      • The default Ethernet MTU is 1500 bytes and can be raised on Cisco IOS with the system mtu command in global configuration mode.
      • As with Ethernet frames, the MTU can be adjusted for IP packets. However, the IP MTU is configured per interface rather than system-wide, with the ip mtu command.
  16. Jumbo Frames, the gotcha's you need to know! (good)
  17. Understand container communication (Docker)
  18. calicoctl should allow configuration of veth MTU #488
  19. Linux MTU Change Size
  20. Changing the MTU size in Windows Vista, 7 or 8
  21. Linux Configure Jumbo Frames to Boost Network Performance
  22. The TCP Maximum Segment Size and Related Topics (RFC 879)
    • The MSS can be used completely independently in each direction of data flow.
  23. Changing TCP MSS under LINUX - Networking
  24. TCP negotiations
