Wednesday, December 30, 2015

Jumbo Frames—Design Considerations for Efficient Network

Each network has some maximum packet size, or maximum transmission unit (MTU). Ultimately there is some limit imposed by the technology, but often the limit is an engineering choice or even an administrative choice.[1]

Many Gigabit Ethernet switches and Gigabit Ethernet network interface cards can support jumbo frames.[2] There are performance benefits to enable Jumbo Frames (MTU: 9000). However, existing transmission links may still impose smaller MTU (e.g., 1500). This could exhibit issues along transit paths, which is referred to here as MTU Mismatch.

In this article, we will examine issues manifested by MTU mismatch and their design considerations.

How to Accommodate MTU Differences


When a host on the Internet wants to send some data, it must know how to divide the data into packets. And in particular, it needs to know the maximum size of packet.

Jumbo frames are Ethernet frames with more than 1500 bytes of payload.[3] Conventionally, jumbo frames can carry up to 9000 bytes of payload, but variations exist and some care must be taken using the term. In this article, we will use MTU: 9000 and MTU: 1500 as our examples to discuss MTU-mismatch issues.

Issues

MTU is a maximum—you tell a network device NOT to drop frames unless they are larger than the maximum. A device with an MTU of 1500 can still communicate with a device with an MTU of 9000. However, when large-size packets are sent from MTU 9000 device to MTU-1500 device, the following happens:
  • If DF (Don't Fragment) flag is set
    • Packets will be dropped and a router is required to return an ICMP Destination Unreachable message to the source of the datagram, with the Code indicating "fragmentation needed and DF set"
  • If DF flag is not set
    • Packets will be fragmented to accommodate MTU differences, which will beget a cost[4]

How to Test Potential MTU Mismatch


Either ping, tracepath, or traceroute (with --mtu option) command can be used to test potential MTU-mismatches.

For example, you can verify that the path between two end nodes has at least the expected MTU using the ping command:
ping -M do -c 4 -s 8972
The -M do option causes the DF flag to be set.
The -c option sets the number of pings.
The -s option specifies the number of bytes of padding that should be added to the echo request. In addition to this number there will be 20 bytes for the internet protocol header, and 8 bytes for the ICMP header and timestamp. The amount of padding should therefore be 28 bytes less than the network-layer MTU that you are trying to test (9000 − 28 = 8972).

If the test is unsuccessful, then you should see an error in response to each echo request:
$ ping -M do -c 4 -s 8972 10.252.136.96
PING 10.252.136.96 (10.252.136.96) 8972(9000) bytes of data.
From 10.249.184.27 icmp_seq=1 Frag needed and DF set (mtu = 1500)
From 10.249.184.27 icmp_seq=1 Frag needed and DF set (mtu = 1500)
From 10.249.184.27 icmp_seq=1 Frag needed and DF set (mtu = 1500)
From 10.249.184.27 icmp_seq=1 Frag needed and DF set (mtu = 1500)


--- 10.252.136.96 ping statistics ---
0 packets transmitted, 0 received, +4 errors
Similarly, you can use tracepath command to test:
$ tracepath -n -l 9000
The -n option specifies not looking up host names (i.e, only print IP addresses numerically).
The -l option sets the initial packet length to pktlen instead of 65536 for tracepath or 128000 for tracepath6.
In the tracepath output, the last line summarizes information about all the path to the destination:
The last line shows detected Path MTU, amount of hops to the destination and our guess about amount of hops from the destination to us, which can be different when the path is asymmetric.
/* a packet of length 9000 cannot reach its destination */
$ tracepath -n -l 9000 10.249.184.27
1: 10.241.71.129 0.630ms
2: 10.241.152.60 0.577ms
3: 10.241.152.0 0.848ms
4: 10.246.1.49 1.007ms
5: 10.246.1.106 0.783ms
6: no reply
...
31: no reply
Too many hops: pmtu 9000
Resume: pmtu 9000
/* a packet of length 1500 reached its destination */
$ tracepath -n -l 1500 10.249.184.27
1: 10.241.71.129 0.502ms
2: 10.241.152.62 0.419ms
3: 10.241.152.4 0.543ms
4: 10.246.1.49 0.886ms
5: 10.246.1.106 0.439ms
6: 10.249.184.27 0.292ms reached
Resume: pmtu 1500 hops 6 back 59

When to Enable Jumbo Frames?


Enabling jumbo frame mode (for example, on Gigabit Ethernet network interface cards) can offer the following benefits:
  • Less consumption of bandwidth by non-data protocol overhead
    • Hence increase network throughput
  • Reduction of the packet rate
    • Hence reduce server overhead
      • The use of large MTU sizes allows the operating system to send fewer packets of a larger size to reach the same network throughput.
      • For example, you will see the decrease in CPU usage when transferring larger file
The above factors are especially important in speeding up NFS or iSCSI traffic, which normally has larger payload size.

Design Considerations


When jumbo frame mode is enabled, the trade-offs include:
  • Bigger I/O buffer
    • Required for both end nodes and intermediate transit nodes
  • MTU mismatch
    • May beget IP fragmentation or even loss of data
Therefore, some design considerations are required. For example, you can:
  • Avoid situations where you have jumbo frame enabled host NIC's talking to non-jumbo frame enabled host NIC's.
      • One design trick is to let your NFS or ISCSI traffic be sent via a dedicated NIC and your normal host traffic be sent via a non-jumbo-MTU enabled interface
        • If your workload only include small messages, then the larger MTU size will not help
      • Be sure to use commands with the Don't fragment bit set to ensure that your hosts which are configured for jumbo frames are able to successfully communicate with each other via jumbo frames.
  • Enable Path MTU Discovery (PMTUD)[18]
    • When possible, use the largest MTU size that the adapter and network support, but constrained by Path MTU
    • Make sure the packet filter on your firewall process ICMP packets correctly
      • RFC 4821, Packetization Layer Path MTU Discovery, describes a Path MTU Discovery technique which responds more robustly to ICMP filtering.
  • Be aware of extra non-data protocol overhead if you configure encapsulation such as GRE tunneling or IPsec encryption.

References

  1. The TCP Maximum Segment Size and Related Topics
  2. Jumbo/Giant Frame Support on Catalyst Switches Configuration Example
  3. Ethernet Jumbo Frames\
  4. IP Fragmentation: How to Avoid It? (Xml and More)
  5. The Great Jumbo Frames Debate
  6. Resolve IP Fragmentation, MTU, MSS, and PMTUD Issues with GRE and IPSEC
  7. Sites with Broken/Working PMTUD
  8. Path MTU Discovery
  9. TCP headers
  10. bad TCP checksums
  11. MSS performance consideration
  12. Understanding Routing Table
  13. route (Linux man page)
  14. Docker should set host-side veth MTU #4378
  15. Add MTU to lxc conf to make host and container MTU match
  16. Xen Networking
  17. TCP parameter settings (/proc/sys/net/ipv4)
  18. Change the MTU of a network interface
    • To enable PMTUD on Linux, type:
      • echo 1 > /proc/sys/net/ipv4/tcp_mtu_probing
      • echo 1024 > /proc/sys/net/ipv4/tcp_base_mss
  19. MTU manipulation
  20. Jumbo Frames, the gotcha's you need to know! (good)
  21. Understand container communication (Docker)
  22. calicoctl should allow configuration of veth MTU #488 - GitHub
  23. Linux MTU Change Size
  24. Changing the MTU size in Windows Vista, 7 or 8
  25. Linux Configure Jumbo Frames to Boost Network Performance
  26. Path MTU discovery in practice
  27. Odd tracepath and ping behavior when using a 9000 byte MTU
  28. How to Read a Traceroute

No comments: