Xml and More: February 2016

Docker container networks have to achieve two seemingly conflicting goals:

Provide complete isolation for containers on a host
Expose the service running inside a container, not only to other co-located containers, but also to remote hosts

In this article, we will review how Docker container networks achieve these goals.

Docker Container Networks

To provide a service that runs inside a container in a secure manner, it is important to have control over the networks your applications run on. To see how container networks achieve that, we will examine them from the following perspectives:

Network modes

Default networks vs user-defined networks

Packet Forwarding and Filtering (Netfilter)

Port mappings

Bridge (veth Interface)
DNS Configuration

To enable a service consumer to communicate with the service providing containers, Docker needs to configure the following entities:

IP address

Providing ways to configure any of the container's network interfaces to support services on different containers

Port

Providing ways to expose and publish a port on the container (also, mapping it to a port on the host)

Rules

Controlling access to a container's service via rules associated with the host's netfilter framework, in both the NAT and filter tables (see diagram 1, 2, 3).

Network Modes

When you install Docker, it creates three networks automatically:^[34,37]

bridge

Represents the docker0 (a virtual ethernet bridge) network present in all Docker installations.

Each container's network interface is attached to the bridge, and network address translation (NAT) is used when containers need to make themselves visible to the Docker host and beyond.

Unless you specify otherwise with the docker run --net=<NETWORK> option, the Docker daemon connects containers to this network by default.
Docker does not support automatic service discovery on the default bridge network.
Supports the use of port mapping and docker run --link to allow communications between containers in the docker0 network.

host

Adds a container on the host's network stack. You’ll find that the network configuration inside the container is identical to the host's.

Because containers deployed in host mode share the host's network stack, and therefore its IP addresses and port space, two containers on the same host cannot bind the same service to the same IP address and port.
In this mode, you don't get port mapping anymore.

none

Tells Docker to put the container in its own network stack but not to configure any of the container's network interfaces.
This allows you to create a custom network configuration.

All these network modes are applied at the container level, so you can have a mix of different network modes on the same Docker host.
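
As a rough illustration of the three modes, the sketch below starts one container per mode. The image name nginx and the 8080:80 port mapping are assumptions chosen for the example, and the run helper is a hypothetical dry-run wrapper: with DRY_RUN=1 (the default here) it only prints each docker command, so the script can be read and tried without a Docker daemon.

```shell
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}
run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Start a container from image $2 in network mode $1.
start_in_mode() {
  case "$1" in
    bridge) run docker run -d --net=bridge -p 8080:80 "$2" ;;  # attached to docker0, port published
    host)   run docker run -d --net=host "$2" ;;                # shares the host's stack, no -p
    none)   run docker run -d --net=none "$2" ;;                # own stack, nothing configured
    *)      echo "unknown network mode: $1" >&2; return 1 ;;
  esac
}

for mode in bridge host none; do
  start_in_mode "$mode" nginx
done
```

Setting DRY_RUN=0 before running the script executes the commands for real, which requires a running Docker daemon.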

Default Networks vs User-Defined Networks

Besides the default networks, you can create your own user-defined networks that better isolate containers. Docker provides some default network drivers for creating these networks. The easiest user-defined network to create is a bridge network, which is similar to the historical, default docker0 network. After you create the network, you can launch containers on it using the docker run --net=<NETWORK> option. Within a user-defined bridge network, linking is not supported. You can, however, expose and publish container ports on containers in this network, which is useful if you want to make a portion of the bridge network available to an outside network.

You can read [34, 37] for more details.
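
A user-defined bridge network could be created and used roughly as follows. The network name my-net and the image nginx are placeholders invented for this sketch, and the run helper is a hypothetical dry-run wrapper that prints the commands unless DRY_RUN=0 is set.

```shell
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}
run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Create bridge network $1, start a container from image $2 on it,
# then show the network's configuration and attached containers.
create_and_join() {
  run docker network create --driver bridge "$1"
  run docker run -d --net="$1" "$2"
  run docker network inspect "$1"
}

create_and_join my-net nginx
```

The docker network subcommands assume a Docker release that supports user-defined networks.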

Packet Forwarding and Filtering

Whether a container can talk to the world is governed by two factors.

Whether the host machine is forwarding its IP packets

In order for a remote host to consume a container's service, the Docker host must act like a router, forwarding traffic to the network associated with the ethernet bridge.
IP packet forwarding is governed by the kernel's ip_forward system parameter (net.ipv4.ip_forward) on the Docker host

Many using Docker will want ip_forward to be on, to at least make communication possible between containers and the wider world.^[39]

Whether the host's iptables rules allow this particular connection^[45]

Docker will never make changes to your host's iptables rules if you set --iptables=false when the daemon starts. Otherwise the Docker server will append forwarding rules to the DOCKER filter chain.

Access to a container's service is controlled by rules associated with the host's netfilter framework, in both the NAT and filter tables. A Docker host makes significant use of netfilter rules to aid NAT and to control access to the containers it hosts.^[44]

Netfilter offers various functions and operations for packet filtering, network address translation, and port translation. Together they provide the functionality required for directing packets through a network, as well as the ability to prevent packets from reaching sensitive locations within a computer network.
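
To check the first factor on a given host, you can read the ip_forward parameter from procfs. The helper below is a minimal sketch: the function name forwarding_state is made up for this example, and the optional file argument exists only so the logic can be tried against a test file.

```shell
# Report whether the kernel forwards IPv4 packets.
# $1 is the file to read; it defaults to the standard procfs location.
forwarding_state() {
  file=${1:-/proc/sys/net/ipv4/ip_forward}
  if [ "$(cat "$file")" = "1" ]; then
    echo enabled
  else
    echo disabled
  fi
}

# Only query the live system when the procfs entry is readable.
if [ -r /proc/sys/net/ipv4/ip_forward ]; then
  echo "IPv4 forwarding is $(forwarding_state)"
fi
```

For the second factor, running iptables -t nat -L -n and iptables -L -n as root lists the NAT and filter tables, including any chains that Docker has added.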

Bridge (veth Interface)

The default network mode in Docker is bridge. To create a virtual subnet shared between the host machine and every container in bridge mode, Docker binds every veth* interface to the docker0 bridge.

To show information on the bridge and its attached ports (or interfaces), run:

# brctl show
bridge name     bridge id               STP enabled     interfaces
docker0         8000.56847afe9799       no              veth33957e0
                                                        veth6cee79b

To show the veth interfaces on a host, run:

# ip link list
3: docker0: mtu 9000 qdisc noqueue state UP link/ether 56:84:7a:fe:97:99 brd ff:ff:ff:ff:ff:ff
11: veth33957e0: mtu 9000 qdisc noqueue master docker0 state UP link/ether 3e:01:d1:0f:24:b8 brd ff:ff:ff:ff:ff:ff
13: veth6cee79b: mtu 9000 qdisc noqueue master docker0 state UP link/ether fa:aa:84:15:82:5a brd ff:ff:ff:ff:ff:ff

Note that there are two containers on the host, hence two veth interfaces are shown. These virtual interfaces work in pairs:

eth0 in the container

Will have an IPv4 address
For all purposes, it looks like a normal interface.

veth interface in the host

Won't have an IPv4 address

The two interfaces of a pair are connected together: any packet sent on one interface appears as received on the other. You can imagine that they are connected by a cross-over cable, if that helps.
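
One way to work out which host-side veth interface belongs to a given container is to compare interface indexes. Assuming a kernel that reports the peer's index in /sys/class/net/eth0/iflink inside the container (ethtool -S eth0 also prints a peer_ifindex value for veth devices), that number can be looked up in the host's ip link list output. The function name ifname_for_index is hypothetical, and the sample lines are shortened from the listing above.

```shell
# Print the name of the interface whose index is $1.
# Reads `ip link list` output on stdin.
ifname_for_index() {
  awk -F': ' -v idx="$1" '$1 == idx { sub(/@.*/, "", $2); print $2 }'
}

# Sample host output, shortened from the listing above.
sample='3: docker0: mtu 9000 qdisc noqueue state UP
11: veth33957e0: mtu 9000 qdisc noqueue master docker0 state UP
13: veth6cee79b: mtu 9000 qdisc noqueue master docker0 state UP'

# Suppose the container reported peer index 11.
printf '%s\n' "$sample" | ifname_for_index 11
```

On a live host, the same lookup would be ip link list | ifname_for_index N, where N is the index read inside the container.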

DNS Configuration

How can Docker supply each container with a hostname and DNS configuration, without having to build a custom image with the hostname written inside? Its trick is to overlay three crucial /etc files inside the container with virtual files where it can write fresh information. You can see this by running mount inside a container:^[29]

# mount
/dev/mapper/vg--docker-dockerVolume on /etc/resolv.conf type btrfs ...
/dev/mapper/vg--docker-dockerVolume on /etc/hostname type btrfs ...
/dev/mapper/vg--docker-dockerVolume on /etc/hosts type btrfs ...

This arrangement allows Docker to do clever things like keep resolv.conf up to date across all containers when the host machine receives new configuration over DHCP later.
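
To confirm what Docker actually wrote into a container's resolv.conf, a small filter such as the one below can list the configured name servers. The function name list_nameservers and the sample addresses are assumptions made for this sketch.

```shell
# Print the DNS servers listed in resolv.conf content read from stdin.
list_nameservers() {
  awk '$1 == "nameserver" { print $2 }'
}

# Sample resolv.conf content (illustrative values only).
printf 'search example.com\nnameserver 8.8.8.8\nnameserver 8.8.4.4\n' | list_nameservers
```

Against a running container, the input could come from docker exec CONTAINER cat /etc/resolv.conf. The docker run options --hostname and --dns set the values that Docker writes into these overlaid files.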

With DHCP, computers request IP addresses and networking parameters automatically from a DHCP server, reducing the need for a network administrator or a user to configure these settings manually. For resource-constrained routers and firewalls, dnsmasq is often used for its small footprint. Dnsmasq provides network infrastructure for small networks: DNS, DHCP, router advertisement, and network boot.

References

tcp_base_mss, tcp_mtu_probing, etc

Inside a Docker container, regular Linux commands such as the following do not report the container's memory size correctly:

top
free

In this article, we will review how to discover runtime metrics inside a container and focus specifically on memory statistics. Note that the docker version used in this discussion is 1.6.1.

Misleading Metrics

Both the "top" and "free" commands report a memory size of 7 GiB instead of 2 GiB (the correct answer) for our container. These commands are not aware of the container and hence report the memory metrics of its host only.

Docker Stats API

One way to find out the correct memory statistics is to use the docker sub-command:

stats

For example, you can type:

# docker ps
CONTAINER ID
66f4084c6a36

# docker stats 66f4084c6a36
CONTAINER       CPU %   MEM USAGE/LIMIT     MEM %    NET I/O
66f4084c6a36    0.05%   257.1 MiB/2 GiB     12.55%   198.5 KiB/2.008 MiB

From the above, we can see that the maximum memory size of the container is 2 GiB. Note that the information returned by the "top" or "free" command is retrieved from /proc/meminfo:

# cat /proc/meminfo
MemTotal: 7397060 kB
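
The 7 GiB figure follows directly from this MemTotal value. As a quick check, the sketch below converts the kB value to GiB; the function name memtotal_gib is made up for the example.

```shell
# Convert the MemTotal line of /proc/meminfo (in kB) to GiB.
# Reads meminfo content on stdin.
memtotal_gib() {
  awk '$1 == "MemTotal:" { printf "%.2f\n", $2 / 1024 / 1024 }'
}

# Value taken from the /proc/meminfo output above.
echo 'MemTotal: 7397060 kB' | memtotal_gib
```

Running memtotal_gib < /proc/meminfo inside a container gives the host's total, which is why it disagrees with the limit shown by docker stats.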

cgroups (or Control Groups)

As described in [3], Docker containers are built on top of cgroups. For cgroups, runtime metrics are exposed through:

Newer builds

Control groups are exposed through a pseudo-filesystem mounted at /sys/fs/cgroup.^[4] Memory metrics for Docker containers are found under:

/sys/fs/cgroup/memory/docker/

Older builds

The control groups might be mounted on /cgroup, without distinct hierarchies.
To figure out where your control groups are mounted, you can run:

$ grep cgroup /proc/mounts

/cgroup/memory/docker/

With either newer or older builds, you first need to fetch the long-form container ID by typing:

# docker ps --no-trunc
CONTAINER ID
66f4084c6a3683cc2f41242e4d58a6381072ba64f41ce2e94b75c82099acd732

In our system, we can find memory runtime metrics under the folder:

/cgroup/memory/docker/66f4084c6a3683cc2f41242e4d58a6381072ba64f41ce2e94b75c82099acd732
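
Because the mount point differs between builds, the folder can also be derived from /proc/mounts rather than hard-coded. The following is a sketch that assumes the cgroup layout described above, with a separate memory hierarchy and a docker/ subdirectory per container; both function names are hypothetical.

```shell
# Print the mount point of the memory cgroup hierarchy.
# Reads /proc/mounts content on stdin.
memory_cgroup_mount() {
  awk '$3 == "cgroup" && $4 ~ /(^|,)memory(,|$)/ { print $2; exit }'
}

# Build the metrics directory from a mount point ($1) and a long container ID ($2).
container_memory_dir() {
  printf '%s/docker/%s\n' "$1" "$2"
}

# Sample /proc/mounts entry matching the older layout used in this article.
mount_point=$(echo 'cgroup /cgroup/memory cgroup rw,relatime,memory 0 0' | memory_cgroup_mount)
container_memory_dir "$mount_point" 66f4084c6a3683cc2f41242e4d58a6381072ba64f41ce2e94b75c82099acd732
```

On a live host, the first function would be fed with memory_cgroup_mount < /proc/mounts.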

For example, relevant memory runtime metrics can be found as follows:
# cat memory.stat
cache 84217856
rss 186290176
mapped_file 14630912
swap 0
pgpgin 83557
pgpgout 17515
pgfault 78655
pgmajfault 43
inactive_anon 0
active_anon 186290176
inactive_file 68890624
active_file 15327232
unevictable 0
hierarchical_memory_limit 2147483648
hierarchical_memsw_limit 4294967296
total_cache 84217856
total_rss 186290176
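
The values in memory.stat are in bytes, so the 2 GiB limit appears as hierarchical_memory_limit 2147483648. A small sketch for pulling out individual counters and converting them to MiB follows; the helper names stat_value and bytes_to_mib are assumptions, and the sample is an excerpt of the output above.

```shell
# Print the value of counter $1 from memory.stat content read on stdin.
stat_value() {
  awk -v key="$1" '$1 == key { print $2 }'
}

# Convert a byte count to whole MiB (integer division).
bytes_to_mib() {
  echo $(( $1 / 1024 / 1024 ))
}

# Excerpt of the memory.stat output above.
sample='cache 84217856
rss 186290176
hierarchical_memory_limit 2147483648'

limit=$(printf '%s\n' "$sample" | stat_value hierarchical_memory_limit)
rss=$(printf '%s\n' "$sample" | stat_value rss)
echo "limit: $(bytes_to_mib "$limit") MiB, rss: $(bytes_to_mib "$rss") MiB"
```

On a live host, stat_value rss < memory.stat reads the counter directly from the container's cgroup folder.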

In summary, cgroups allow Docker to

Group processes and manage their aggregate resource consumption
Share available hardware resources among containers
Limit the memory and CPU consumption of containers

A container can be resized by simply changing the limits of its corresponding cgroup.
You can gather information about resource usage in the container by examining Linux control groups in /sys/fs/cgroup.

Provide a reliable way of terminating all processes inside a container.

To learn more about cgroups, you can read [5].

Wednesday, February 17, 2016

Docker Container Networks: All Things Considered


Monday, February 15, 2016

Docker Container: How to Check Memory Size
