Wednesday, December 30, 2015

Jumbo Frames—Design Considerations for Efficient Networks

Each network has some maximum packet size, or maximum transmission unit (MTU). Ultimately there is some limit imposed by the technology, but often the limit is an engineering choice or even an administrative choice.[1]

Many Gigabit Ethernet switches and Gigabit Ethernet network interface cards can support jumbo frames.[2] There are performance benefits to enabling jumbo frames (MTU 9000). However, existing transmission links may still impose a smaller MTU (e.g., 1500), which can cause issues along transit paths, a condition referred to here as MTU mismatch.

In this article, we will examine the issues caused by MTU mismatch and the related design considerations.

How to Accommodate MTU Differences


When a host on the Internet wants to send some data, it must know how to divide the data into packets. In particular, it needs to know the maximum packet size.

Jumbo frames are Ethernet frames with more than 1500 bytes of payload.[3] Conventionally, jumbo frames can carry up to 9000 bytes of payload, but variations exist and some care must be taken when using the term. In this article, we will use MTU 9000 and MTU 1500 as our examples to discuss MTU-mismatch issues.

Issues

MTU is a maximum—a network device should NOT drop frames unless they are larger than that maximum. A device with an MTU of 1500 can still communicate with a device with an MTU of 9000. However, when large packets are sent from the MTU-9000 device to the MTU-1500 device, the following happens:
  • If the DF (Don't Fragment) flag is set
    • Packets will be dropped, and a router is required to return an ICMP Destination Unreachable message to the source of the datagram, with the Code indicating "fragmentation needed and DF set"
  • If the DF flag is not set
    • Packets will be fragmented to accommodate the MTU difference, which incurs a cost[4]
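The fragmentation cost can be quantified. Below is a minimal Python sketch (fragment_count is a hypothetical helper of mine, not part of any tool discussed here) estimating how many fragments a jumbo datagram becomes on a standard Ethernet link:

```python
def fragment_count(datagram_len, link_mtu, ip_header=20):
    """Estimate how many IP fragments one datagram becomes on a link.

    Every fragment except the last must carry a payload that is a
    multiple of 8 bytes, because fragment offsets are expressed in
    8-byte units.
    """
    payload = datagram_len - ip_header               # data in the original datagram
    per_fragment = (link_mtu - ip_header) // 8 * 8   # 1480 bytes for MTU 1500
    return -(-payload // per_fragment)               # ceiling division

# A 9000-byte datagram crossing an MTU-1500 link becomes 7 fragments:
# six full 1480-byte payloads plus a final fragment carrying 100 bytes.
print(fragment_count(9000, 1500))
```

Note that each fragment repeats the 20-byte IP header on the wire, on top of the per-packet processing cost at the router and the receiver.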

How to Test Potential MTU Mismatch


The ping, tracepath, or traceroute (with the --mtu option) commands can be used to test for potential MTU mismatches.

For example, you can verify that the path between two end nodes has at least the expected MTU using the ping command:
ping -M do -c 4 -s 8972 <destination>
The -M do option causes the DF flag to be set.
The -c option sets the number of pings.
The -s option specifies the number of bytes of padding that should be added to the echo request. In addition to this number there will be 20 bytes for the IP header and 8 bytes for the ICMP header and timestamp. The amount of padding should therefore be 28 bytes less than the network-layer MTU that you are trying to test (9000 − 28 = 8972).
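The padding arithmetic above can be captured in a couple of lines of Python (a sketch for illustration; the constant and function names are mine):

```python
IP_HEADER = 20    # IPv4 header without options
ICMP_HEADER = 8   # ICMP echo header

def ping_padding(target_mtu):
    """Value to pass to `ping -s` when probing a path for `target_mtu`."""
    return target_mtu - IP_HEADER - ICMP_HEADER

print(ping_padding(9000))  # 8972, for probing a jumbo-frame path
print(ping_padding(1500))  # 1472, for probing a standard Ethernet path
```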

If the test is unsuccessful, then you should see an error in response to each echo request:
$ ping -M do -c 4 -s 8972 10.252.136.96
PING 10.252.136.96 (10.252.136.96) 8972(9000) bytes of data.
From 10.249.184.27 icmp_seq=1 Frag needed and DF set (mtu = 1500)
From 10.249.184.27 icmp_seq=1 Frag needed and DF set (mtu = 1500)
From 10.249.184.27 icmp_seq=1 Frag needed and DF set (mtu = 1500)
From 10.249.184.27 icmp_seq=1 Frag needed and DF set (mtu = 1500)


--- 10.252.136.96 ping statistics ---
0 packets transmitted, 0 received, +4 errors
Similarly, you can use the tracepath command to test:
$ tracepath -n -l 9000
The -n option specifies not looking up host names (i.e., only print IP addresses numerically).
The -l option sets the initial packet length to pktlen instead of 65536 for tracepath or 128000 for tracepath6.
In the tracepath output, the last line summarizes information about the whole path to the destination:
The last line shows the detected Path MTU, the number of hops to the destination, and our guess about the number of hops from the destination back to us, which can be different when the path is asymmetric.
/* a packet of length 9000 cannot reach its destination */
$ tracepath -n -l 9000 10.249.184.27
1: 10.241.71.129 0.630ms
2: 10.241.152.60 0.577ms
3: 10.241.152.0 0.848ms
4: 10.246.1.49 1.007ms
5: 10.246.1.106 0.783ms
6: no reply
...
31: no reply
Too many hops: pmtu 9000
Resume: pmtu 9000
/* a packet of length 1500 reached its destination */
$ tracepath -n -l 1500 10.249.184.27
1: 10.241.71.129 0.502ms
2: 10.241.152.62 0.419ms
3: 10.241.152.4 0.543ms
4: 10.246.1.49 0.886ms
5: 10.246.1.106 0.439ms
6: 10.249.184.27 0.292ms reached
Resume: pmtu 1500 hops 6 back 59

When to Enable Jumbo Frames?


Enabling jumbo frame mode (for example, on Gigabit Ethernet network interface cards) can offer the following benefits:
  • Less bandwidth consumed by non-data protocol overhead
    • Hence increased network throughput
  • Reduction of the packet rate
    • Hence reduced server overhead
      • The use of large MTU sizes allows the operating system to send fewer packets of a larger size to reach the same network throughput.
      • For example, you will see a decrease in CPU usage when transferring larger files
The above factors are especially important in speeding up NFS or iSCSI traffic, which normally carries larger payloads.

Design Considerations


When jumbo frame mode is enabled, the trade-offs include:
  • Bigger I/O buffer
    • Required for both end nodes and intermediate transit nodes
  • MTU mismatch
    • May beget IP fragmentation or even loss of data
Therefore, some design considerations are required. For example, you can:
  • Avoid situations where jumbo-frame-enabled host NICs talk to non-jumbo-frame-enabled host NICs.
      • One design trick is to send your NFS or iSCSI traffic via a dedicated jumbo-MTU NIC and your normal host traffic via a standard-MTU interface
        • If your workload only includes small messages, then the larger MTU size will not help
      • Be sure to test with the Don't Fragment bit set to ensure that your hosts which are configured for jumbo frames are able to successfully communicate with each other via jumbo frames.
  • Enable Path MTU Discovery (PMTUD)[18]
    • When possible, use the largest MTU size that the adapter and network support, but constrained by Path MTU
    • Make sure the packet filter on your firewall processes ICMP packets correctly
      • RFC 4821, Packetization Layer Path MTU Discovery, describes a Path MTU Discovery technique which responds more robustly to ICMP filtering.
  • Be aware of extra non-data protocol overhead if you configure encapsulation such as GRE tunneling or IPsec encryption.

References

  1. The TCP Maximum Segment Size and Related Topics
  2. Jumbo/Giant Frame Support on Catalyst Switches Configuration Example
  3. Ethernet Jumbo Frames
  4. IP Fragmentation: How to Avoid It? (Xml and More)
  5. The Great Jumbo Frames Debate
  6. Resolve IP Fragmentation, MTU, MSS, and PMTUD Issues with GRE and IPSEC
  7. Sites with Broken/Working PMTUD
  8. Path MTU Discovery
  9. TCP headers
  10. bad TCP checksums
  11. MSS performance consideration
  12. Understanding Routing Table
  13. route (Linux man page)
  14. Docker should set host-side veth MTU #4378
  15. Add MTU to lxc conf to make host and container MTU match
  16. Xen Networking
  17. TCP parameter settings (/proc/sys/net/ipv4)
  18. Change the MTU of a network interface
    • To enable PMTUD on Linux, type:
      • echo 1 > /proc/sys/net/ipv4/tcp_mtu_probing
      • echo 1024 > /proc/sys/net/ipv4/tcp_base_mss
  19. MTU manipulation
  20. Jumbo Frames, the gotcha's you need to know! (good)
  21. Understand container communication (Docker)
  22. calicoctl should allow configuration of veth MTU #488 - GitHub
  23. Linux MTU Change Size
  24. Changing the MTU size in Windows Vista, 7 or 8
  25. Linux Configure Jumbo Frames to Boost Network Performance
  26. Path MTU discovery in practice
  27. Odd tracepath and ping behavior when using a 9000 byte MTU
  28. How to Read a Traceroute

Sunday, December 27, 2015

IP Fragmentation: How to Avoid It?

Most Ethernet LANs use an MTU of 1500 bytes (modern LANs can use Jumbo frames, allowing for an MTU up to 9000 bytes); however, border protocols like PPPoE will reduce this.

Jumbo frames have performance benefits if enabled on 10 Gigabit+ networks, especially when using Ethernet-based storage access, vMotion, or other high-throughput applications such as Oracle RAC Interconnects.[1]

Enabling jumbo frames may also aggravate IP fragmentation due to more frequent MTU mismatches along the transmission path. In this article, we will review IP fragmentation overhead and ways to avoid it.

MSS vs MTU


To begin with, you can think of these terms as measuring two different sizes:
  • MSS is related to receiving buffer(s)
    • Can be announced via TCP Header(s)[22,24]
      • When a TCP client initiates a connection to a server, it includes its MSS as an option in the first (SYN) packet
      • When a TCP server receives a connection from a client, it includes its MSS as an option in the SYN/ACK packet
  • MTU is related to transmission link(s)
    • Can be configured on network interface(s)[15,19,20]
The IP protocol was designed for use on a wide variety of transmission links. Although the maximum length of an IP datagram is 64K, most transmission links enforce a smaller maximum packet length, called an MTU (Maximum Transmission Unit).

Originally, MSS meant how big a buffer was allocated on a receiving station to be able to store the TCP data. In other words, the TCP Maximum Segment Size (MSS) defines the maximum amount of data that a host is willing to accept in a single TCP/IP datagram. MSS and MTU are related in this way—MSS is just the TCP data size, which does not include the IP header and the TCP header:[22]
  • MSS = MTU - sizeof(TCPHDR) - sizeof(IPHDR)
    • For example, assuming both TCPHDR and IPHDR have the default size of 20 bytes
      • MSS = 8960 if MTU = 9000
      • MSS = 1460 if MTU = 1500
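As a quick sanity check, the same arithmetic in Python (a sketch; the header sizes assume no IP or TCP options, and the function name is mine):

```python
TCP_HEADER = 20  # TCP header without options
IP_HEADER = 20   # IPv4 header without options

def mss_for_mtu(mtu):
    """MSS implied by an interface MTU: MSS = MTU - TCPHDR - IPHDR."""
    return mtu - TCP_HEADER - IP_HEADER

print(mss_for_mtu(9000))  # 8960
print(mss_for_mtu(1500))  # 1460
```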

IP Fragmentation Overhead


The design of IP accommodates MTU differences by allowing routers to fragment IP datagrams as necessary. The receiving station is responsible for reassembling the fragments back into the original full size IP datagram. Read [2] for a good illustration of IP Fragmentation and Reassembly.
There are several issues that make IP fragmentation undesirable:
  • Overhead for Sender (or Router)
    • There is a small increase in CPU and memory overhead to fragment an IP datagram
  • Overhead for Receiver
    • The receiver must allocate memory for the arriving fragments and coalesce them back into one datagram after all of the fragments are received.
  • Overhead for handling dropped frames
    • Fragments could be dropped because of a congested link
      • If this happens, the complete original datagram will have to be retransmitted
  • Firewalls may have trouble enforcing their policies
    • If the IP fragments are out of order, a firewall may block the non-initial fragments because they do not carry the information that would match the packet filter.
      • Read [2] for more explanation

How to Avoid Fragmentation


Contrary to popular belief, the MSS value is not negotiated between hosts.[22,24] However, an additional step can be taken by the sender to avoid fragmentation on the local and remote wires. Scenario 2 of [2] illustrates this additional step:
The way MSS now works is that each host will first compare its outgoing interface MTU with its own MSS buffer and choose the lowest value as the MSS to send. The hosts will then compare the MSS size received against their own interface MTU and again choose the lower of the two values.
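That two-step comparison can be sketched in Python (the function names are illustrative, not part of any TCP API; header sizes assume no options):

```python
TCP_IP_HEADERS = 40  # 20-byte TCP header + 20-byte IP header, no options

def mss_to_announce(own_buffer_mss, outgoing_mtu):
    """Step 1: a host announces the lower of its buffer MSS and what
    its outgoing interface can carry without fragmenting."""
    return min(own_buffer_mss, outgoing_mtu - TCP_IP_HEADERS)

def mss_to_use(received_mss, outgoing_mtu):
    """Step 2: for sending, each host takes the lower of the peer's
    announced MSS and its own interface limit."""
    return min(received_mss, outgoing_mtu - TCP_IP_HEADERS)

# A jumbo-frame host (MTU 9000) talking to a standard host (MTU 1500):
standard_side = mss_to_announce(65495, 1500)  # announces 1460
jumbo_side = mss_to_use(standard_side, 9000)  # sends 1460-byte segments
print(jumbo_side)
```

Note how the jumbo-frame side ends up sending 1460-byte segments: the smaller interface on either end caps the segment size for the whole connection, without any negotiation.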
Path MTU Discovery (PMTUD) is another technique that can help you avoid IP fragmentation. It was originally intended for routers in IPv4. However, all modern operating systems use it on endpoints. PMTUD is especially useful in network situations where intermediate links have smaller MTUs than the MTU of the end links. To avoid IP fragmentation, it dynamically determines the lowest MTU along the path from a packet's source to its destination. However, as discussed here, there are three things that can break PMTUD.

Unfortunately, IP fragmentation issues could be more widespread than you might think, due to:
  • Dynamic Routing
    • The MTUs on the intermediate links may vary on different routes
  • IP tunnels
    • The reason that tunnels cause more fragmentation is that the tunnel encapsulation adds overhead to the size of a packet.
      • For example, Generic Routing Encapsulation (GRE) adds 24 bytes to a packet, and after this increase the packet may need to be fragmented because it is larger than the outbound MTU.
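A quick sketch of the encapsulation arithmetic (GRE_OVERHEAD matches the 24-byte figure above; the helper name is mine):

```python
GRE_OVERHEAD = 24  # extra IP + GRE headers added to each encapsulated packet

def max_inner_packet(outbound_mtu, tunnel_overhead=GRE_OVERHEAD):
    """Largest original packet that fits through the tunnel unfragmented."""
    return outbound_mtu - tunnel_overhead

# With a 1500-byte outbound MTU, any inner packet over 1476 bytes must be
# fragmented: a full 1500-byte packet grows to 1524 bytes after encapsulation.
print(max_inner_packet(1500))  # 1476
```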

Reference [3] lists sites that use PMTUD or alternatives to resolve IP fragmentation. It also lists sites that simply allow their packets to be fragmented rather than using PMTUD. This demonstrates the complexity of fragmentation avoidance.

References

  1. The Great Jumbo Frames Debate
  2. Resolve IP Fragmentation, MTU, MSS, and PMTUD Issues with GRE and IPSEC
  3. Sites with Broken/Working PMTUD
  4. Path MTU Discovery
  5. TCP headers
  6. bad TCP checksums
  7. MSS performance consideration
  8. Understanding Routing Table
  9. route (Linux man page)
  10. Docker should set host-side veth MTU #4378
  11. Add MTU to lxc conf to make host and container MTU match
  12. Xen Networking
  13. TCP parameter settings (/proc/sys/net/ipv4)
  14. Change the MTU of a network interface
    • tcp_base_mss, tcp_mtu_probing, etc
  15. MTU manipulation
    • Ethernet MTU vs IP MTU
      • The default Ethernet MTU is 1500 bytes and can be raised on Cisco IOS with the system mtu command under global configuration.
      • As with Ethernet frames, the MTU can be adjusted for IP packets. However, the IP MTU is configured per interface rather than system-wide, with the ip mtu command.
  16. Jumbo Frames, the gotcha's you need to know! (good)
  17. Understand container communication (Docker)
  18. calicoctl should allow configuration of veth MTU #488 - GitHub
  19. Linux MTU Change Size
  20. Changing the MTU size in Windows Vista, 7 or 8
  21. Linux Configure Jumbo Frames to Boost Network Performance
  22. The TCP Maximum Segment Size and Related Topics (RFC 879)
    • The MSS can be used completely independently in each direction of data flow.
  23. Changing TCP MSS under LINUX - Networking
  24. TCP negotiations

Sunday, December 13, 2015

Routing Table: All Things Considered

Routing is the decision over which interface a packet is to be sent. This decision has to be made for locally created packets, too. Routing tables contain network addresses and the associated interface or next hop.

In this article, we will review all aspects of the routing table (i.e., ip route and ip rule) on a Linux server.

Routing Table


A routing table is a set of rules, often viewed in table format, that is used to determine where data packets traveling over an Internet Protocol (IP) network will be directed. All IP-enabled devices, including routers and switches, use routing tables.

Routing tables can be maintained in two ways:[1,11]
  • Manually
    • Tables for static network devices do not change unless a network administrator manually changes them
      • Useful when only a few routes (or just one) exist
      • Can be an administrative burden
      • Frequently used for the default route[12]
    • Static routes can be added via the
      • "route add" command
        • To persist it, you can also add the route command to rc.local
  • Dynamically
    • Devices build and maintain their routing tables automatically by using routing protocols[8] to exchange information about the surrounding network topology. 
    • Dynamic routing tables allow devices to "listen" to the network and respond to occurrences like:
      • Device failures 
      • Network congestion.

Kernel IP Routing Table


Beyond the two commonly used routing tables (the local and main routing tables), the Linux kernel supports up to 252 additional routing tables.

The ip route and ip rule commands have built in support for the special tables main and local. Any other routing tables can be referred to by number or an administratively maintained mapping file, /etc/iproute2/rt_tables.

Typical content of /etc/iproute2/rt_tables[13,14]

#
# reserved values
#
255     local      [1]
254     main       [2]
253     default    [3]
0       unspec     [4]
#
# local
#
1       inr.ruhep  [5]

[1] The local table is a special routing table maintained by the kernel. Users can remove entries from the local routing table at their own risk. Users cannot add entries to the local routing table. The file /etc/iproute2/rt_tables need not exist, as the iproute2 tools have a hard-coded entry for the local table.
[2] The main routing table is the table operated upon by route and, when not otherwise specified, by ip route. The file /etc/iproute2/rt_tables need not exist, as the iproute2 tools have a hard-coded entry for the main table.
[3] The default routing table is another special routing table.
[4] Operating on the unspec routing table appears to operate on all routing tables simultaneously.
[5] This is an example indicating that table 1 is known by the name inr.ruhep. Any references to table inr.ruhep in an ip rule or ip route command will substitute the value 1 for the word inr.ruhep.

Format of Routing Table


Rules in the routing table usually consist of the following entities:
  • Network Destination
    • The one which is outside your subnet
      • Basically, it has a different subnet mask compared to the local one
  • NetMask[9]
    • Makes the decision easier for the router (i.e., a layer-3 device, which isolates two subnets)
      • This is used to identify which subnet the packet must go to
    • aka GenMask
      •  shows the “generality” of the route, i.e., the network mask for this route
  • Gateway
    • There could be more than one gateway within a network, so to reach the destination we configure which could be the best possible gateway
    • A gateway is an essential feature of most routers, although other devices (such as any PC or server) can function as a gateway.
  • Interface
    • You could have multiple interfaces (Ethernet interfaces eth0, eth1, eth2, ...) on your device, each of which is assigned an IP address
    • This provides instruction on how to reach the gateway and through which interface the packet must be pushed.
  • Metrics (Cost)[10]
    • Provides the path cost, basically for static routing the value would be 1 (default, but we can change it) and for dynamic routing (RIP, IGRP, OSPF) it varies.
  • MSS
    • Maximum Segment Size for TCP connections over this route
      • Usually has the value of 0, meaning “no changes”
    • MTU vs MSS
      • MTU = MSS + Header (40 bytes or more)
      • For example,
        • MTU = 576 -> MSS = 536
        • MTU = 1500 -> MSS = 1460 
    • Fragmentation of data segment 
      • If the data segment size is too large for any of the routers through which the data passes, the oversize segment(s) are fragmented. 
      • This slows down the connection speed as seen by the computer user. In some cases the slowdown is dramatic.
      • The likelihood of such fragmentation can be minimized by keeping the MSS as small as reasonably possible. 
  • Window
    • Default window size, which indicates how many TCP packets can be sent before at least one of them has to be ACKnowledged. 
    • Like the MSS, this field is usually 0, meaning “no changes”
  • irtt (Initial Round Trip Time)
    • May be used by the kernel to guess about the best TCP parameters without waiting for slow replies. 
    • In practice, it’s not used much, so you’ll probably never see anything else than 0 here.

How do I get there from here?


A device uses its routing table to decide whether a packet stays in the current subnet or is pushed outside the subnet. Here are the rules used, given a destination address in the packet:[2]
  • The position of a route in the table has no significance.
  • When more than one route matches a destination address
    • The route with the longest subnet mask (most consecutive 1-bits) is used
  • When multiple routes match the destination and have subnet masks of the same length
    • The route with the lowest metric is used
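The two tie-breaking rules above can be sketched with Python's ipaddress module (select_route and the sample table are illustrative, not how the kernel actually implements lookup):

```python
import ipaddress

def select_route(routes, destination):
    """Longest prefix wins; among equal prefix lengths, lowest metric wins.

    `routes` is a list of (network_cidr, gateway, metric) tuples.
    """
    dest = ipaddress.ip_address(destination)
    matches = [(ipaddress.ip_network(net), gw, metric)
               for net, gw, metric in routes
               if dest in ipaddress.ip_network(net)]
    if not matches:
        return None  # no match, not even a default route
    best = max(matches, key=lambda m: (m[0].prefixlen, -m[2]))
    return best[1]

routes = [
    ("0.0.0.0/0",     "10.244.0.1", 0),  # default route
    ("10.244.0.0/21", "0.0.0.0",    0),  # directly connected (on-link)
    ("10.244.0.0/21", "10.244.0.2", 5),  # same prefix, higher metric
]
print(select_route(routes, "10.244.3.87"))  # "0.0.0.0": /21 beats /0; metric 0 beats 5
print(select_route(routes, "8.8.8.8"))      # "10.244.0.1": only the default matches
```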

Scenario 1

Assume a packet with destination IP address w.x.y.z arrives at a router, which checks its routing table. If the router finds that w.x.y.0/24 is present in the table, it will try to reach the corresponding gateway by pushing the packet through the respective interface.

If there are two such entries with different metrics, the router will choose the one with the lower value. If the router does not find any matching entry in the routing table, it will use the default gateway.

Scenario 2

Now assume another scenario: a packet with the destination IP address k.l.m.n arrives at the router, and k.l.m.0/24 is the router's own network. This implies that the packet is destined for the same network, so the router will not push the packet to a peer subnet.

Linux Commands


The following Linux commands can be used to print the routing table on the server. In the discussions below, we will use a server which has the following entry in its /etc/sysconfig/network configuration file:
  • Default Gateway: 10.244.0.1

and whose /sbin/ifconfig output shows the interface:

eth0      Link encap:Ethernet  HWaddr 00:ll:mm:nn:aa:bb
          inet addr:10.244.3.87  Bcast:10.244.7.255  Mask:255.255.248.0


route Command


Output

  • Ref
    • Number of references to this route (not used in the Linux kernel)
  • Use
    • Count of lookups for the route. Depending on the use of -F and -C this will be either route cache misses (-F) or hits (-C).
  • Flags
    • U (route is up)
    • H (target is a host)
    • G (use gateway)
    • R (reinstate route for dynamic routing)
    • D (dynamically installed by daemon or redirect)
    • M (modified from routing daemon or redirect)
    • A (installed by addrconf)
    • C (cache entry)
    • ! (reject route)

Option

  • -F
    • Operate on the kernel's FIB (Forwarding Information Base) routing table. This is the default.
  • -C
    • Operate on the kernel's routing cache.
  • -n
    • Show numerical addresses instead of trying to determine symbolic host names. This is useful if you are trying to determine why the route to your name server has vanished.


$ /sbin/route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.244.0.1      0.0.0.0         UG    0      0        0 eth0
10.244.0.0      0.0.0.0         255.255.248.0   U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     1002   0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     1010   0        0 usb0
169.254.182.0   0.0.0.0         255.255.255.0   U     0      0        0 usb0


netstat Command


netstat options:
  • -r
    • Display the kernel routing tables
  • -n
    • Show numerical addresses instead of trying to determine symbolic host, port or user names

$ netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         10.244.0.1      0.0.0.0         UG        0 0          0 eth0
10.244.0.0      0.0.0.0         255.255.248.0   U         0 0          0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 usb0
169.254.182.0   0.0.0.0         255.255.255.0   U         0 0          0 usb0

In the context of servers, 0.0.0.0 means "all IPv4 addresses on the local machine". If a host has two IP addresses, 192.168.1.1 and 10.1.2.1, and a server running on the host listens on 0.0.0.0, it will be reachable at both IPs.

ip route Command


Output

  • protocol
    • redirect - the route was installed due to an ICMP redirect
    • kernel - the route was installed by the kernel during autoconfiguration
    • boot - the route was installed during the bootup sequence. If a routing daemon starts, it will purge all of them.
    • static - the route was installed by the administrator to override dynamic routing. A routing daemon will respect such routes and, probably, even advertise them to its peers.
    • ra - the route was installed by the Router Discovery protocol
  • scope
    • global - the address is globally valid
    • site - (IPv6 only) the address is site-local, i.e. it is valid inside this site
    • link - the address is link-local, i.e. it is valid only on this device
    • host - the address is valid only inside this host
  • src
    • The source address to prefer when sending to the destinations covered by the route prefix
    • Most commonly used on multi-homed hosts, although almost every machine out there uses this hint for connections to localhost

Option
  • ip route show - list routes
    • the command displays the contents of the routing tables or the route(s) selected by some criteria
    • table (sub-command)
      • show the routes from the provided table(s). The default setting is to show table main. TABLEID may either be the ID of a real table or one of the special values:
        • all - list all of the tables.
        • cache - dump the routing cache.


$ /sbin/ip route
default via 10.244.0.1 dev eth0
10.244.0.0/21    dev eth0  proto kernel  scope link  src 10.244.3.87
169.254.0.0/16   dev eth0  scope link  metric 1002
169.254.0.0/16   dev usb0  scope link  metric 1010
169.254.182.0/24 dev usb0  proto kernel  scope link  src 169.254.182.77


# Viewing the local routing table with ip route show table local

$ /sbin/ip route show table local
broadcast 10.244.0.0 dev eth0  proto kernel  scope link  src 10.244.3.87
local     10.244.3.87 dev eth0  proto kernel  scope host  src 10.244.3.87
broadcast 10.244.7.255 dev eth0  proto kernel  scope link  src 10.244.3.87
broadcast 127.0.0.0 dev lo  proto kernel  scope link  src 127.0.0.1
local     127.0.0.0/8 dev lo  proto kernel  scope host  src 127.0.0.1
local     127.0.0.1 dev lo  proto kernel  scope host  src 127.0.0.1
broadcast 127.255.255.255 dev lo  proto kernel  scope link  src 127.0.0.1
broadcast 169.254.182.0 dev usb0  proto kernel  scope link  src 169.254.182.77
local     169.254.182.77 dev usb0  proto kernel  scope host  src 169.254.182.77
broadcast 169.254.182.255 dev usb0  proto kernel  scope link  src 169.254.182.7

References

  1. Routing Table Definition
  2. How do I read/interpret a (netstat) routing table?
  3. Static Routes and the Default Gateway (Redhat)
  4. How Routing Algorithms Work
    • Hierarchical routing -- When the network size grows, the number of routers in the network increases. Consequently, the size of routing tables increases as well, and routers can't handle network traffic as efficiently. We use hierarchical routing to overcome this problem.
    • Internet -> Clusters -> Regions -> Nodes
  5. Routing Table
  6. route
  7. ip route
  8. Introduction to routing protocols (good)
  9. IP Address Mask Formats—the router will display different mask formats at different times:
    • bitcount — 172.16.31.6/24
    • hexadecimal — 172.16.31.6 0xFFFFFF00
    • decimal — 172.16.31.6 255.255.255.0
  10. Metrics (Cost). Different protocols use different metrics:
    • RIP/RIPv2 is hop count and ticks (IPX)
      • Ticks are used to determine server timeout
    • OSPF/ISIS is interface cost (bandwidth)
    • (E)IGRP is compound
    • BGP can be complicated
  11. 3 ways of building the forwarding table in a router:
    • Directly connected
      • Routes that the router is attached to
    • Static
      • Routes are manually defined
    • Dynamic
      • Routes are learned from a routing protocol
  12. Default route
    • Route used if no match is found in the routing table
    • Special network number: 0.0.0.0 (IP)
  13. Multiple Routing Table
  14. IPROUTE2 Utility Suite Howto
  15. Difference between routing and forwarding table
  16. Loopback address
    • A special IP number (127.0.0.1) that is designated for the software loopback interface of a machine.
      • The loopback interface has no hardware associated with it, and it is not physically connected to a network.
  17. Netstat: network analysis and troubleshooting, explained

Tuesday, December 1, 2015

Cloud: Find Out More about Your VM

If you can access a VM in a cloud and would like to find out more about it, here is what you can do on a Linux system.

lscpu


lscpu is a useful Linux command to uncover CPU architecture information. It can print out the following VM information:
  • Hypervisor vendor[1]
  • Virtualization type
  • CPU virtualization extensions[2]
Using two different servers (see below) as examples, we have found that:
  • Server #1
    • This is a Xen guest, fully virtualized (HVM).
  • Server #2
    • This is a physical server. However, it does have the virtualization extensions in hardware.
      • Another way to verify this is to type:[3]
        • cat /proc/cpuinfo | grep vmx
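The same check can be scripted. Below is a small Python sketch (the function name is mine) that parses /proc/cpuinfo-style text, which is what the grep command above inspects:

```python
def virtualization_extensions(cpuinfo_text):
    """Return which hardware virtualization flags appear in a
    /proc/cpuinfo dump: 'vmx' (Intel VT-x) and/or 'svm' (AMD-V)."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # The flags line looks like: "flags : fpu vme de ... vmx ..."
            flags.update(line.split(":", 1)[1].split())
    return sorted(f for f in ("vmx", "svm") if f in flags)

# A trimmed sample of what an Intel machine with VT-x might report:
sample = "processor : 0\nflags : fpu vme de pse tsc msr vmx\n"
print(virtualization_extensions(sample))  # ['vmx']
```

On a real system you would pass in the contents of /proc/cpuinfo, e.g. open("/proc/cpuinfo").read().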
Server #1

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    4
Core(s) per socket:    1
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Stepping:              2
CPU MHz:               2294.924
BogoMIPS:              4589.84
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              46080K
NUMA node0 CPU(s):     0-3

Server #2


$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    2
Core(s) per socket:    4
CPU socket(s):         2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 26
Stepping:              5
CPU MHz:               1600.000
BogoMIPS:              4521.27
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
NUMA node0 CPU(s):     0-3,8-11
NUMA node1 CPU(s):     4-7,12-15

virt-what


virt-what is another useful Linux command to detect if we are running in a virtual machine. For example, Server #1 is a Xen guest fully virtualized, as shown below:[4]

$ sudo virt-what
xen
xen-hvm

If virt-what is not installed on the system, you can install it using yum (an interactive, rpm-based package manager).

To find out which package "virt-what" is in, type:

$ yum whatprovides "*/virt-what"
Loaded plugins: aliases, changelog, downloadonly, kabi, presto, refresh-
              : packagekit, security, tmprepo, verify, versionlock
Loading support for kernel ABI
virt-what-1.11-1.1.el6.x86_64 : Detect if we are running in a virtual machine
Repo        : installed
Matched from:
Filename    : /usr/sbin/virt-what

To install the matched package, type (note that "-1.11-1.1.el6" in the middle of the full name has been removed):

# yum install virt-what.x86_64
Loaded plugins: aliases, changelog, downloadonly, kabi, presto, refresh-
              : packagekit, security, tmprepo, verify, versionlock
Loading support for kernel ABI
Setting up Install Process
Nothing to do


Finally, if nothing is printed and virt-what exits with code 0 (no error), as on Server #2, then it can mean either that the program is running on bare metal, or that it is running inside a type of virtual machine which we don't know about or cannot detect.

References

  1. How to check which hypervisor is used from my VM?
  2. Linux: Find Out If CPU Support Intel VT and AMD-V Virtualization Support
    • Hardware virtualization support:
      • vmx — Intel VT-x, virtualization support enabled in BIOS.
      • svm — AMD SVM, virtualization enabled in BIOS.
  3. Enabling Intel VT and AMD-V virtualization hardware extensions in BIOS
  4. Hardware-assisted virtualization (HVM)
    • HVM support requires special CPU extensions - VT-x for Intel processors and AMD-V for AMD-based machines.
  5. Oracle Process Cloud Service 16.2.1 Release
  6. All Cloud-related articles on Xml and More