
Tuesday, April 19, 2016

Docker: Btrfs Storage in Practice

One of the ways Docker makes containerization so easy is by managing an overlay-style filesystem, allowing containers and images to incrementally change the filesystem layout of the image without requiring full copies of multiple images to be kept around. This is a copy-on-write approach: parent layers are held read-only, and changes are reflected in the working layer.

Docker supports several image/container layer storage drivers:[7]
  • aufs
  • btrfs
  • devicemapper
  • overlay
  • vfs
  • zfs 
Your choice of storage driver can affect the performance of your containerized applications. So it’s important to understand the different storage driver options available and select the right one for your application.

In this article, we will focus only on btrfs (B-tree file system) storage.

Images, Layers, and Storage Drivers


To begin with, let's study different storage-related entities that form a container:
  • Images
    • Is a tagged hierarchy of read-only layers plus some metadata
    • docker images command can be used to list all images and report their virtual sizes
  • Image layers
    • Each successive layer (with a UUID tag) builds on top of the layer below it
    • Reusing layers can improve image build time[8]
      • Each Dockerfile instruction generates a new layer
      • You should put instructions least likely to change at the top of your Dockerfile to reuse layers as much as possible and try to make changes only at the bottom of your Dockerfile.
    • Image layers can be shared among images
    • Docker limits the number of layers to 127
      • Layers don’t come for free; depending on the storage driver used, there are some penalties to pay
        • For example, in AUFS, each layer can introduce latency to container write performance on the first write to each file existing in the image layers stack, especially if the file is big and exists below many image layers.
    • docker history command can be used to list all layers of an image
  • Storage drivers
    • Docker has a pluggable storage driver architecture. 
      • This gives you the flexibility to “plug in” the storage driver that is best for your environment and use-case.
    • Each Docker storage driver is based on a Linux filesystem or volume manager
    • The Docker daemon can only run one storage driver, and all containers created by that daemon instance use the same storage driver.
    • Each storage driver is free to implement the management of image layers and the container layer in its own unique way. 
      • This means some storage drivers perform better than others in different circumstances.
      • See [7] to learn more on which storage driver you should choose
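The layer-ordering advice above can be sketched in a small Dockerfile; the base image, package, and paths here are only illustrative:

```dockerfile
# Rarely-changing instructions first: the base image and OS packages
# stay cached across rebuilds.
FROM oraclelinux:7
RUN yum install -y java-1.8.0-openjdk && yum clean all

# Frequently-changing instructions last: only these layers are rebuilt
# when the application code changes.
COPY app.jar /u01/app/
CMD ["java", "-jar", "/u01/app/app.jar"]
```

Each instruction produces one layer; editing app.jar invalidates only the final COPY and CMD layers, while the cached base and package layers are reused.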

Storage Driver and Backing Filesystem


Which storage driver you use, in part, depends on the backing filesystem you plan to use for your Docker host’s local storage area. Some storage drivers can operate on top of different backing filesystems. However, other storage drivers require the backing filesystem to be the same as the storage driver. For example, the btrfs storage driver requires a btrfs backing filesystem. 

The following table lists each storage driver and whether it must match the host’s backing file system or not:

|Storage driver |Must match backing filesystem |
|---------------|------------------------------|
|overlay        |No                            |
|aufs           |No                            |
|btrfs          |Yes                           |
|devicemapper   |No                            |
|vfs*           |No                            |
|zfs            |Yes                           |

The btrfs Backend


The backing filesystem refers to the filesystem that was used to create the Docker host’s local storage area under /var/lib/docker.  The btrfs (B-tree file system) backend requires /var/lib/docker to be on a btrfs filesystem and uses filesystem-level snapshotting to implement layers.

# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdb4       11G  7.1G  2.5G  75% /var/lib/docker
<snipped>

# mount -l
/dev/xvdb4 on /var/lib/docker type btrfs (rw)
<snipped>


You can find the layers of the images in the folder /var/lib/docker/btrfs/subvolumes.  Each layer is stored as a btrfs subvolume inside that folder and starts out as a snapshot of the parent subvolume (if any).

The btrfs driver is very fast for docker build, but, like devicemapper, it does not share executable memory between devices. Mounting /var/lib/docker on a different filesystem than the rest of your system is recommended in order to limit the impact of filesystem corruption.
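You can also inspect the per-layer subvolumes directly with the btrfs tool (this requires root and, of course, a btrfs backing filesystem):

```
# btrfs subvolume list /var/lib/docker
```

Each line of the output corresponds to one image or container layer under /var/lib/docker/btrfs/subvolumes.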

You can set the storage driver by passing the --storage-driver= option to the docker command line, or by setting the option on the DOCKER_OPTS line in the /etc/default/docker file.  For example, to set the btrfs storage driver, do:
# docker -d -s btrfs -g /mnt/btrfs_partition ...

  -s, --storage-driver=""  Storage driver to use
  -g, --graph=""           Path to use as the root of the Docker runtime.  
                             Default is /var/lib/docker.
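Alternatively, the same options can be placed on the DOCKER_OPTS line in /etc/default/docker so they persist across daemon restarts; the mount point below is illustrative:

```
# /etc/default/docker
DOCKER_OPTS="--storage-driver=btrfs --graph=/mnt/btrfs_partition"
```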

To verify that the btrfs storage driver is in use on your Docker host, run:
# docker info
Containers: 1
Images: 19
Storage Driver: btrfs
...

References

  1. Btrfs
  2. LVM dangers and caveats
  3. 20 Linux Server Hardening Security Tips
  4. ZFS Vs. BTRFS
  5. How to Use Different Docker Filesystem Backends
  6. Daemon storage-driver option
  7. Select a storage driver
  8. Optimizing Docker images for image size and build time
  9. Docker Filesystems: Understanding the btrfs Backend (Xml and More)
  10. Oracle WebLogic Server on Docker Containers (white paper)
  11. WebLogic on Docker (GitHub)
    • Sample Docker configurations to facilitate installation, configuration, and environment setup for DevOps users. This project includes quick start dockerfiles and samples for both WebLogic 12.1.3 and 12.2.1 based on Oracle Linux and Oracle JDK 8 (Server).

Wednesday, February 17, 2016

Docker Container Networks: All Things Considered

Docker container networks have to achieve two seemingly conflicting goals:
  1. Provide complete isolation for containers on a host
  2. Expose the service running inside the container, not only to other co-located containers, but also to remote hosts.
In this article, we will review how Docker container networks achieve these goals.

Docker Container Networks


To provide the service running inside a container in a secure manner, it is important to have control over the networks your applications run on. To see how container networks achieve that, we will examine them from the following perspectives:
  • Network modes
    • Default networks vs user-defined networks
  • Packet Forwarding and Filtering (Netfilter)
    • Port mappings
  • Bridge (veth Interface)
  • DNS Configuration
To enable a service consumer to communicate with the service providing containers, Docker needs to configure the following entities:
  • IP address
    • Providing ways to configure any of the containers network interfaces to support services on different containers
  • Port
    • Providing ways to expose and publish a port on the container (also, mapping it to a port on the host)
  • Rules
    • Controlling access to a container's service via rules associated with the host's netfilter framework, in both the NAT and filter tables (see diagram 123).
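As a quick sketch of the port entity above: publishing a container port maps it to a host port and inserts a corresponding DNAT rule in the host's netfilter tables. The image name and port numbers here are only examples:

```
# docker run -d -p 8080:80 --name web nginx
# iptables -t nat -nL DOCKER
```

The first command exposes container port 80 as host port 8080; the second lists the NAT rules Docker created for it.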

    Network Modes


    When you install Docker, it creates three networks automatically:[34,37]
    1. bridge
      • Represents the docker0 (a virtual ethernet bridge) network present in all Docker installations.
        • Each container's network interface is attached to the bridge, and network address translation (NAT) is used when containers need to make themselves visible to the Docker host and beyond.
      • Unless you specify otherwise with the docker run --net= option, the Docker daemon connects containers to this network by default.
      • Docker does not support automatic service discovery on the default bridge network.
      • Supports the use of port mapping and docker run --link to allow communications between containers in the docker0 network.
    2. host
      • Adds a container on the host's network stack. You’ll find that the network configuration inside the container is identical to the host's.
        • Because containers deployed in host mode share the same host network stack, you can’t use the same IP address for the same service on different containers on the same host.
        • In this mode, you don't get port mapping anymore.
    3. none
      • Tells Docker to put the container in its own network stack but not to configure any of the container's network interfaces.
      • This allows you to create a custom network configuration
    All these network modes are applied at the container level, so you can certainly have a mix of different network modes on the same Docker host.

    Default Networks vs User-Defined Networks

    Besides default networks, you can create your own user-defined networks that better isolate containers. Docker provides some default network drivers for creating these networks. The easiest user-defined network to create is a bridge network. This network is similar to the historical, default docker0 network. After you create the network, you can launch containers on it using the docker run --net= option. Within a user-defined bridge network, linking is not supported. You can expose and publish container ports on containers in this network. This is useful if you want to make a portion of the bridge network available to an outside network.
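For example, creating and using a user-defined bridge network looks like this (the network and image names are hypothetical):

```
# docker network create --driver bridge my_bridge
# docker run -d --net=my_bridge --name db postgres
# docker network inspect my_bridge
```

Containers attached to my_bridge can reach each other, while containers on the default bridge cannot reach them without explicit port publishing.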

    You can read [34, 37] for more details.

    Packet Forwarding and Filtering


    Whether a container can talk to the world is governed by two factors.
    1. Whether the host machine is forwarding its IP packets
      • In order for a remote host to consume a container's service, the Docker host must act like a router, forwarding traffic to the network associated with the ethernet bridge.
      • IP packet forwarding is governed by the ip_forward system parameter on the Docker host
        • Many using Docker will want ip_forward to be on, to at least make communication possible between containers and the wider world.[39]
    2. Whether the host's iptables allow these particular connections[45]
      • Docker will never make changes to your host's iptables rules if you set --iptables=false when the daemon starts. Otherwise the Docker server will append forwarding rules to the DOCKER filter chain.
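The ip_forward check in factor 1 can be done from a shell on the Docker host; this is a minimal sketch (enabling forwarding requires root):

```shell
# Read the host's IP forwarding flag: 1 = forwarding on, 0 = off.
ipf=$(cat /proc/sys/net/ipv4/ip_forward)
if [ "$ipf" -eq 1 ]; then
  echo "ip_forward is on: traffic can be routed between containers and the outside"
else
  echo "ip_forward is off: enable it with 'sysctl -w net.ipv4.ip_forward=1'"
fi
```

Note that the Docker daemon sets this parameter itself unless started with --ip-forward=false.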
    Access to a container's service is controlled with rules associated with the host's netfilter framework, in both the NAT and filter tables. A Docker host makes significant use of netfilter rules to aid NAT, and to control access to the containers it hosts.[44]

    Netfilter offers various functions and operations for packet filtering, network address translation, and port translation, which provide the functionality required for directing packets through a network, as well as for providing ability to prohibit packets from reaching sensitive locations within a computer network.

    Bridge (veth Interface)


    The default network mode in Docker is bridge. To create a virtual subnet shared between the host machine and every container in bridge mode, Docker binds every veth* interface to the docker0 bridge.

    To show information on the bridge and its attached ports (or interfaces), you do:

    # brctl show
    bridge name bridge id         STP enabled interfaces
    docker0     8000.56847afe9799 no          veth33957e0
                                              veth6cee79b


    To show veth interfaces on a host, you do:

    # ip link list
    3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP
        link/ether 56:84:7a:fe:97:99 brd ff:ff:ff:ff:ff:ff
    11: veth33957e0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue master docker0 state UP
        link/ether 3e:01:d1:0f:24:b8 brd ff:ff:ff:ff:ff:ff
    13: veth6cee79b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue master docker0 state UP
        link/ether fa:aa:84:15:82:5a brd ff:ff:ff:ff:ff:ff


    Note that there are two containers on the host, hence two veth interfaces were shown. Those virtual interfaces work in pairs:
    • eth0 in the container 
      • Will have an IPv4 address 
      • For all purposes, it looks like a normal interface. 
    • veth interface in the host 
      • Won't have an IPv4 address
    Those two interfaces are connected together: any packet sent on an interface will appear as being received by the other. You can imagine that they are connected by a cross-over cable, if that helps.

    DNS Configuration


    How can Docker supply each container with a hostname and DNS configuration, without having to build a custom image with the hostname written inside? Its trick is to overlay three crucial /etc files inside the container with virtual files where it can write fresh information. You can see this by running mount inside a container:[29]

    # mount
    /dev/mapper/vg--docker-dockerVolume on /etc/resolv.conf type btrfs ...
    /dev/mapper/vg--docker-dockerVolume on /etc/hostname type btrfs ...
    /dev/mapper/vg--docker-dockerVolume on /etc/hosts type btrfs ...

    This arrangement allows Docker to do clever things like keep resolv.conf up to date across all containers when the host machine receives new configuration over DHCP later.

    With DHCP, computers request IP addresses and networking parameters automatically from a DHCP server, reducing the need for a network administrator or a user to configure these settings manually. For resource-constrained routers and firewalls, dnsmasq is often used for its small footprint. Dnsmasq provides network infrastructure for small networks: DNS, DHCP, router advertisement and network boot.
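A minimal dnsmasq configuration illustrating the combined DNS-plus-DHCP role described above (the addresses and ranges here are made up for illustration):

```
# /etc/dnsmasq.conf (illustrative values)
domain-needed          # never forward plain names without a domain part
bogus-priv             # never forward reverse lookups for private ranges
server=8.8.8.8         # upstream DNS server
dhcp-range=192.168.1.50,192.168.1.150,12h   # DHCP pool and lease time
```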

    References

    1. The TCP Maximum Segment Size and Related Topics
    2. Jumbo/Giant Frame Support on Catalyst Switches Configuration Example
    3. Ethernet Jumbo Frames
    4. IP Fragmentation: How to Avoid It? (Xml and More)
    5. The Great Jumbo Frames Debate
    6. Resolve IP Fragmentation, MTU, MSS, and PMTUD Issues with GRE and IPSEC
    7. Sites with Broken/Working PMTUD
    8. Path MTU Discovery
    9. TCP headers
    10. bad TCP checksums
    11. MSS performance consideration
    12. Understanding Routing Table
    13. route (Linux man page)
    14. Docker should set host-side veth MTU #4378
    15. Add MTU to lxc conf to make host and container MTU match
    16. Xen Networking
    17. TCP parameter settings (/proc/sys/net/ipv4)
    18. Change the MTU of a network interface
      • tcp_base_mss, tcp_mtu_probing, etc
    19. MTU manipulation
    20. Jumbo Frames, the gotcha's you need to know! (good)
    21. Understand container communication (Docker)
    22. calicoctl should allow configuration of veth MTU #488 - GitHub
    23. Linux MTU Change Size
    24. Changing the MTU size in Windows Vista, 7 or 8
    25. Linux Configure Jumbo Frames to Boost Network Performance
    26. Path MTU discovery in practice
    27. 10 iptables rules to help secure your Linux box
    28. An Updated Performance Comparison of Virtual Machines and Linux Containers
    29. Network Configuration (Docker)
    30. Storage Concepts in Docker: Persistent Storage
    31. Xen org
    32. Cloud Architectures, Networks, Services, and Management
    33. Cloud Networking
    34. Docker Networking 101 – Host mode
    35. Configuring DNS (Docker)
    36. Configuring dnsmasq to serve my own domain name zone
    37. Understand Docker container networks (Docker)
    38. dnsmasq - A lightweight DHCP and caching DNS server.
    39. Understand Container Communication
    40. Linux: Check if in Same Network
    41. Packet flow in Netfilter and General Networking (diagram)
    42. How to Enable IP Forwarding in Linux
    43. Exposing a port on a live docker container
    44. The docker-proxy
    45. Linux Firewall Tutorial: IPTables Tables, Chains, Rules Fundamentals
    46. iptables (ipset.netfilter.org)
    47. How to find out capacity for network interfaces?
    48. Security Considerations: Enabling/Disabling Ping /Traceroute for Your Network (Xml and More)
    49. How to Read a Traceroute (good)

    Monday, February 15, 2016

    Docker Container: How to Check Memory Size

    Inside a Docker container, the correct way to check its memory size is not to use regular Linux commands such as top or free.

    In this article, we will review how to discover runtime metrics inside a container and focus specifically on memory statistics.  Note that the docker version used in this discussion is 1.6.1.

    Misleading Metrics


    Using either the "top" or "free" command will report a memory size of 7 GiB instead of 2 GiB (the correct answer) for our container.  These commands are unaware of containers and hence report the memory metrics of the host only.

    Docker Stats API


    One way to find out the correct memory statistics is to use the docker stats sub-command.
    For example, you can type:

    # docker ps
    CONTAINER ID  
    66f4084c6a36  

    # docker stats 66f4084c6a36
    CONTAINER         CPU %       MEM USAGE/LIMIT    MEM %       NET I/O
    66f4084c6a36      0.05%       257.1 MiB/2 GiB    12.55%      198.5 KiB/2.008 MiB


    From the above, we can see that the memory limit of the container is 2 GiB.  Note that the information returned by the "top" or "free" command is retrieved from /proc/meminfo:

    # cat /proc/meminfo
    MemTotal:        7397060 kB
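The MemTotal value above is reported in kB, so it works out to roughly 7 GiB for the host; the conversion can be reproduced from a shell:

```shell
# MemTotal in /proc/meminfo is reported in kB; convert to (integer) GiB.
kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
echo "MemTotal: ${kb} kB = $((kb / 1024 / 1024)) GiB (rounded down)"
```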

    cgroups (or Control Groups)


    As described in [3], Docker containers are built on top of cgroups.  For cgroups, runtime metrics are exposed through:
    • Newer builds
      • Control groups are exposed through a pseudo-filesystem named[4]
        • /sys/fs/cgroup
          • /sys/fs/cgroup/memory/docker/
    • Older builds
      • The control groups might be mounted on /cgroup, without distinct hierarchies.
      • To figure out where your control groups are mounted, you can run:
        • $ grep cgroup /proc/mounts
          • /cgroup/memory/docker/
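For either kind of build, the mount table shows which cgroup layout your host actually uses:

```shell
# List cgroup-related mounts; on newer hosts you will see entries under
# /sys/fs/cgroup, on older ones possibly under /cgroup.
grep cgroup /proc/mounts
```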
    In either newer or older builds, you need to first fetch the long-form container ID by typing:

    # docker ps --no-trunc
    CONTAINER ID  
    66f4084c6a3683cc2f41242e4d58a6381072ba64f41ce2e94b75c82099acd732

    In our system, we can find memory runtime metrics under the folder:

    • /cgroup/memory/docker/66f4084c6a3683cc2f41242e4d58a6381072ba64f41ce2e94b75c82099acd732

    For example, relevant memory runtime metrics can be found as follows:
    # cat memory.stat
    cache 84217856
    rss 186290176
    mapped_file 14630912
    swap 0
    pgpgin 83557
    pgpgout 17515
    pgfault 78655
    pgmajfault 43
    inactive_anon 0
    active_anon 186290176
    inactive_file 68890624
    active_file 15327232
    unevictable 0
    hierarchical_memory_limit 2147483648
    hierarchical_memsw_limit 4294967296
    total_cache 84217856
    total_rss 186290176
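The hierarchical_memory_limit field above is in bytes; a quick conversion confirms that it matches the 2 GiB limit reported by docker stats:

```shell
# hierarchical_memory_limit from memory.stat is in bytes.
limit=2147483648
echo "$((limit / 1024 / 1024 / 1024)) GiB"   # prints: 2 GiB
```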

    In summary, cgroups allow Docker to
    • Group processes and manage their aggregate resource consumption 
    • Share available hardware resources among containers 
    • Limit the memory and CPU consumption of containers 
      • A container can be resized by simply changing the limits of its corresponding cgroup
      • You can gather information about resource usage in the container by examining Linux control groups in /sys/fs/cgroup
    • Provide a reliable way of terminating all processes inside a container.

    Monday, January 11, 2016

    Docker Filesystems: Understanding the btrfs Backend

    The basis of filesystem use in Docker is the storage backend abstraction.[14] A storage backend allows you to store a set of layers, each addressed by a unique name.

    Various storage backends are supported in Docker filesystems:[1]
    • vfs backend
    • devicemapper backend[21]
    • btrfs backend
    • aufs backend
    In this article, we will discuss Docker filesystems in general and the btrfs backend in particular.

    Images and Containers


    A core part of the Docker model is the efficient use of layered images and containers:
    • Images
      • Each Docker image on the system is stored as a layer, with the parent being the layer of the parent image. 
        • To create such an image a new layer is created (based on the right parent) and then the changes in that image are applied to the newly mounted filesystem.
      • Docker images have intermediate layers that increase reusability, decrease disk usage, and speed up docker build by allowing each step to be cached. These intermediate layers are not shown by default in the "docker images" command.
        • Each layer is a filesystem tree that can be mounted[2] when needed and modified. New layers can be started from scratch, but they can also be created with a specified parent.
    • Containers
      • Docker containers are isolated mini Linux environments built from Docker images, base images with zero or more filesystem layers on top of them.
    As shown below, there is 1 container and there are 118 images in this Docker installation.  In its storage backend, btrfs is the configured storage driver,[13] which will be the focus of this article:

    # docker info
    Containers: 1
    Images: 118
    Storage Driver: btrfs

    To retrieve low-level information on a container or image, you can use "docker inspect" command which takes a required ID argument (either container's or image's).  You can use "docker ps" to find the ID of a specific container or use "docker images" to list the IDs of all images.

    Base Image


    Base images are typically minimal operating system images and the layers on top of them are added by developers to create convenience images (such as an image which already has Java SE installed and configured) for direct use or for use as building blocks.

    Each container is related to a top image which is built up from layers of images starting from a base image.

    To find the top image associated with a container, type:

    # docker inspect --format "{{ .Image }}" ce483e532466
    eca6affff525415c7e2199f1e8b2222ffce31d4bcf4a0cd05a48807d2c1f7647


    To find the layers of images that a container is built up from, type:

    # docker history eca6affff525415c7e2199f1e8b2222ffce31d4bcf4a0cd05a48807d2c1f7647
    IMAGE               CREATED             CREATED BY                                      SIZE
    eca6affff525        4 days ago          /bin/sh -c #(nop) WORKDIR /u01/app              0 B
    ccf8bd04df89        4 days ago          /bin/sh -c #(nop) ENV APP_HOME=/u01/app/        0 B
    c91b83e8c828        4 days ago          /bin/sh -c #(nop) USER [apaas]                  0 B
    ab06ea65ece3        4 days ago          /bin/sh -c chown -R apaas:apaas /u01/           9.146 MB
    2354b0ad9541        4 days ago          /bin/sh -c #(nop) ADD dir:ff4334d8629caee02b1   9.144 MB
    246fb66aa39e        4 days ago          /bin/sh -c chmod -R +x /u01/scripts/            1.383 kB
    21c5ddd9b74c        4 days ago          /bin/sh -c #(nop) COPY dir:17f42381efa361f6c6   1.383 kB
    c347b96af5be        4 days ago          /bin/sh -c mkdir -p /u01/scripts /u01/logs      0 B
    00c1fc450430        4 days ago          /bin/sh -c #(nop) USER [root]                   0 B
    f52b843cf97e        7 weeks ago         /bin/sh -c mv java java.orig && chmod +x ./ja   7.718 kB
    7c6d6279239c        7 weeks ago         /bin/sh -c #(nop) USER [apaas]                  0 B
    5c6ad3a0ad33        7 weeks ago         /bin/sh -c mkdir -p /u01/logs && chown -R apa   306.5 MB
    0100a4922bfb        7 weeks ago         /bin/sh -c #(nop) WORKDIR /u01/jdk/jdk1.7.0_9   0 B
    4983e8502db6        7 weeks ago         /bin/sh -c #(nop) ADD file:3511bd6019a189ef28   226 B
    266e209d77d3        7 weeks ago         /bin/sh -c #(nop) ENV PATH=/u01/jdk/jdk1.7.0_   0 B
    c00eef371809        7 weeks ago         /bin/sh -c #(nop) ENV JAVA_HOME=/u01/jdk/jdk1   0 B
    db5d61324db8        7 weeks ago         /bin/sh -c #(nop) ADD file:babe1a2cf183ba22e4   306.5 MB
    10287b34527b        5 months ago        /bin/sh -c groupadd apaas && useradd -g apaas   296.1 kB
    035a8c863461        5 months ago        /bin/sh -c mkdir -p /u01/jdk/ && mkdir -p /u0   0 B
    a555d44630e2        10 months ago       /bin/sh -c #(nop) CMD [/bin/bash]               0 B
    23a9eb33093d        10 months ago       /bin/sh -c #(nop) ADD file:33b9447cdbd58ef81b   195.1 MB
    7258693d533e        10 months ago       /bin/sh -c #(nop) MAINTAINER Oracle Linux Pro   0 B
    


    Note that "eca6affff525" is the top image which is built on top of "ccf8bd04df89" and so on.  The base image is "7258693d533e", which doesn't have a parent.  For example, if you display the parent of the base image, it displays nothing (i.e., no parent):

    # docker inspect --format "{{ .Parent }}" 7258693d533e
    <blank>

    The btrfs Backend


    The btrfs backend requires /var/lib/docker to be on a btrfs filesystem and uses filesystem-level snapshotting to implement layers.

    # df -h
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/xvdb4       11G  7.1G  2.5G  75% /var/lib/docker
    # mount -l
    /dev/xvdb4 on /var/lib/docker type btrfs (rw)
    <snipped>

    You can find the layers of the images in the folder /var/lib/docker/btrfs/subvolumes.  Each layer is stored as a btrfs subvolume inside that folder and starts out as a snapshot of the parent subvolume (if any).

    This backend is pretty fast. Mounting /var/lib/docker on a different filesystem than the rest of your system is recommended in order to limit the impact of filesystem corruption.

    Image Cleanup


    One of the purposes of learning about Docker filesystems and storage backends is to ensure that you know what you are doing before cleaning up unwanted images.[15,16]

    For example, before removing an image, make sure that no containers (running or stopped) are using it. After you've verified that, the following commands can clean up untagged images (see also filtering) or all images.

    # batch cleanup untagged images 
    docker rmi $(docker images -q -f "dangling=true")

    # remove all images by id 
    docker rmi $(docker images -aq)

    References

    1. Supported Filesystems (Docker)
    2. Concept of Mounting
      • The concept of mounting allows programs to be agnostic about where your data is structured
      • From an application (or user) point of view, the file system is one tree. Under the hood, the file system structure can be on a single partition, but also on a dozen partitions, network storage, removable media and more.
    3. Displaying Physical Volumes (Redhat)
    4. Docker - How to analyze a container's disk usage? (good)
    5. Finding all storage devices attached to a Linux machine
    6. /dev/dm-1 (block device)
      • /dev/dm-1 is for "device mapper n.1". Basically, it is a logical unit carved out using the kernel's embedded device-mapper layer. From a userspace application's point of view, it is a RAW block device.
    7. Linux file system
    8. Docker images command
    9. Docker cp command
      • You can copy to or from either a running or stopped container.
      • Behavior is similar to the common Unix utility cp -a in that 
        • directories are copied recursively with permissions preserved if possible. 
        • Ownership is set to the user and primary group on the receiving end of the transfer.  For example, 
          • Files copied to a container will be created with UID:GID of the root user. 
          • Files copied to the local machine will be created with the UID:GID of the user which invoked the docker cp command.
        • It is not possible to copy certain system files such as resources under /proc,/sys, /dev, and mounts created by the user in the container.
    10. Understanding Volumes in Docker (good) 
    11. Docker Volume Manager
    12. Docker Quicksheet 
    13. Storage Driver (Docker)
      • A storage driver is how docker implements a particular union file system. 
      • Keeping with the “batteries included, but replaceable” philosophy, Docker supports a number of different union file systems. 
        • For instance, Ubuntu’s default storage driver is AUFS, while for Red Hat and CentOS it’s Device Mapper.
    14. Docker Images
      • Docker images are stored as series of read-only layers. 
      • When we start a container, Docker takes the read-only image and adds a read-write layer on top. 
      • If the running container modifies an existing file, the file is copied out of the underlying read-only layer and into the top-most read-write layer where the changes are applied. 
        • The version in the read-write layer hides the underlying file, but does not destroy it — it still exists in the underlying image. 
      • When a Docker container is deleted, relaunching the image will start a fresh container without any of the changes made in the previously running container — those changes are lost. 
      • Docker calls this combination of read-only layers with a read-write layer on top a Union File System.
    15. Why is docker image eating up my disk space that is not used by docker
    16. Docker error : no space left on device
    17. docker ps -s
      • -s, --size=false Display total file sizes
    18. Advanced Docker Volumes
    19. Resizing Docker containers with the Device Mapper plugin
    20. Question on Resource Limits? (Docker)
    21. devicemapper - a storage backend based on Device Mapper
    22. Docker: Btrfs Storage in Practice (Xml and More)

    Saturday, November 7, 2015

    Security and Isolation Implementation in Docker Containers

    Multitenancy is regarded as an important feature of cloud computing. If we consider the applications running on a container a tenant, the goal of good security-and-isolation design is to ensure that tenants running on a host only use resources visible to them.

    As container technology evolves, its implementation of security, isolation and resource control has been continually improved.  In this article, we will review how Docker container achieves its security and isolation utilizing native container features of Linux such as namespaces, cgroups, capabilities, etc.

    Virtualization and Isolation


    Operating system-level virtualization, containers, zones, or even "chroot with steroids" are names that describe the same concept of user-space isolation. Products such as Docker make use of user-space isolation on top of OS-level virtualization facilities to provide extra security.

    Since version 0.9, Docker includes the libcontainer library as its own way to directly use virtualization facilities provided by the Linux kernel, in addition to using abstracted virtualization interfaces via LXC,[1] systemd-nspawn,[2] and libvirt.[3]

    These virtualization libraries all utilize native container features of Linux (see Diagram above):
    • namespaces
    • cgroups
    • capabilities
    and more. Docker combines these components into a wrapper which it calls a container format.

    libcontainer


    The default container format is called libcontainer. Docker also supports traditional Linux containers using LXC. In the future, Docker may support other container formats, for example, by integrating with BSD Jails or Solaris Zones.

    An execution driver is the implementation of a specific container format and is used for running Docker containers. In the latest release, libcontainer
    • Is the default execution driver for running docker containers
    • Is shipped alongside the LXC driver
    • Is a pure Go library which is developed to access the kernel’s container APIs directly, without any other dependencies
      • Docker out of the box can now manipulate namespaces, control groups, capabilities, apparmor profiles, network interfaces and firewalling rules – all in a consistent and predictable way, and without depending on LXC or any other userland package.[6]
      • You provide a root filesystem and a configuration on how libcontainer is supposed to execute a container and it does the rest.
      • It allows spawning new containers or attaching to an existing container.
      • In fact, libcontainer delivered such much-needed stability that the team decided to make it the default.
        • As of Docker 0.9, LXC is now optional
          • Note that LXC driver will continue to be supported going forward.
        • To switch back to the LXC driver, simply restart the Docker daemon with
          • docker -d -e lxc

    namespaces


    Docker isn't virtualization as such; instead, it's an abstraction on top of the kernel's support for namespaces, which provides the isolated workspace (or container). When you run a container, Docker creates a set of namespaces for that container.

    Some of the namespaces that Docker uses on Linux are:
    • pid namespace
      • Used for process isolation (PID: Process ID).
      • Processes running inside the container appear to be running on a normal Linux system although they are sharing the underlying kernel with processes located in other namespaces.
    • net namespace
      • Used for managing network interfaces (NET: Networking).
      • DNAT allows you to configure your guest's networking independently of your host's and have a convenient interface for forwarding only the ports you want between them.
        • However, you can replace this with a bridge to a physical interface.
    • ipc namespace
      • Used for managing access to IPC resources (IPC: InterProcess Communication).
    • mnt namespace
      • Used for managing mount-points (MNT: Mount).
    • uts namespace
      • Used for isolating kernel and version identifiers. (UTS: Unix Timesharing System).
    These isolation benefits naturally come with costs. Based on your network access patterns and memory constraints, you can choose how to configure namespaces for your containers with Docker.
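    To see these namespaces in action, you can list the namespace links the kernel exposes for any process under /proc. This is generic Linux inspection (not Docker-specific), and the exact set of entries shown depends on your kernel version:

```shell
# Each entry under /proc/<pid>/ns is a symbolic link that names a
# namespace type and its inode number; two processes that show the
# same inode for a given type are in the same namespace.
ls -l /proc/self/ns

# The pid namespace of the current shell; inside a Docker container
# this value differs from the host's, which is why a container only
# sees its own processes.
readlink /proc/self/ns/pid
```

    Comparing the inode printed for a process inside a container with the one on the host shows that they live in different pid namespaces.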

    cgroups (or Control Groups)


    Docker on Linux makes use of another technology called cgroups. Because each container is just a process, all normal Linux resource management facilities such as scheduling and cgroups apply to containers. Furthermore, there is only one level of resource allocation and scheduling, because a containerized Linux system has only one kernel and the kernel has full visibility into the containers.

    In summary, cgroups allow Docker to
    • Group processes and manage their aggregate resource consumption
    • Share available hardware resources among containers
    • Limit the memory and CPU consumption of containers
      • A container can be resized by simply changing the limits of its corresponding cgroup.
      • You can gather information about resource usage in the container by examining Linux control groups in /sys/fs/cgroup.
    • Provide a reliable way of terminating all processes inside a container.
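    The control-group membership of any process can be read straight out of /proc, and the limits themselves live under the cgroup mount. The paths below assume a typical modern layout, and the docker run flags in the final comment are the standard resource-limit options (shown with a sample image name):

```shell
# Where the control groups are mounted on this system
# (see also reference 22 below).
grep cgroup /proc/mounts

# Which cgroups the current process belongs to; each line has the
# form hierarchy-id:controller-list:cgroup-path.
cat /proc/self/cgroup

# A container's limits are set when it is created, e.g.:
#   docker run -m 512m --cpu-shares 512 ubuntu sleep 60
```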

    Capabilities[20]


    "POSIX capabilities" is what Linux uses.[9] These capabilities are a partitioning of the all powerful root privilege into a set of distinct privileges. You can see a full list of available capabilities in Linux manpages. Docker drops all capabilities except those needed, a whitelist instead of a blacklist approach.

    Your average server (bare metal or virtual machine) needs to run a bunch of processes as root. Those typically include:
    • SSH 
    • cron 
    • syslogd 
    • Hardware management tools (e.g., load modules) 
    • Network configuration tools (e.g., to handle DHCP, WPA, or VPNs),
    and much more.

    A container is very different, because almost all of those tasks are handled by the infrastructure around the container. By default, Docker starts containers with a restricted set of capabilities. In most cases, containers do not need "real" root privileges at all. For example, processes (like web servers) that just need to bind to a port below 1024 do not have to run as root: they can simply be granted the CAP_NET_BIND_SERVICE capability instead. Containers can therefore run with a reduced capability set, meaning that "root" within a container has far fewer privileges than the real "root".
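    You can inspect the capability sets of any process from /proc; the CapEff line is the effective set as a hexadecimal bitmask (CAP_NET_BIND_SERVICE, for example, is capability number 10, i.e. bit 0x400). The docker run flags in the comment are the standard way to adjust the whitelist; the image name is just an example:

```shell
# Capability bitmasks of the current process:
# CapPrm = permitted, CapEff = effective, CapBnd = bounding set.
grep ^Cap /proc/self/status

# To start a container with everything dropped except one capability:
#   docker run --cap-drop ALL --cap-add NET_BIND_SERVICE nginx
```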

    Capabilities are just one of the many security features provided by modern Linux kernels. To harden a Docker host, you can also leverage other existing, well-known systems such as AppArmor and SELinux.
    If your distribution comes with security model templates for Docker containers, you can use them out of the box. For instance, Docker ships a template that works with AppArmor, and Red Hat provides SELinux policies for Docker.



    References

    1. LXC—Linux containers.
    2. Control Centre: The systemd Linux init system
    3. The virtualization API: libvirt
    4. Solomon Hykes and others. What is Docker?
    5. How is Docker different from a normal virtual machine? (Stackoverflow)
    6. Docker 0.9: introducing execution drivers and libcontainer
      • Uses layered filesystems (AUFS).
    7. Is there a formula for calculating the overhead of a Docker container?
    8. An Updated Performance Comparison of Virtual Machines and Linux Containers
    9. capabilities(7) - Linux man page
    10. Netlink (Wikipedia)
    11. The lost packages of docker
    12. ebtables/iptables interaction on a Linux-based bridge
    13. Comparison Between AppArmor and SELinux
    14. The docker-proxy (netfilter)
    15. Hardware isolation
    16. Understand the architecture (docker)
    17. Linux kernel capabilities FAQ
    18. Docker: Differences between Container and Full VM (Xml and More)
    19. Docker vs VMs
      • There is one key metric where Docker containers are weaker than virtual machines, and that's "isolation". Intel's VT-d and VT-x technologies provide virtual machines with ring -1 hardware isolation, of which they take full advantage. It keeps virtual machines from breaking down and interfering with each other.
    20. Docker Security
    21. Introduction to Control Groups (Cgroups)
    22. Docker Runtime Metrics
      • Control groups are exposed through a pseudo-filesystem. In recent distros, you should find this filesystem under /sys/fs/cgroup
      • On older systems, the control groups might be mounted on /cgroup, without distinct hierarchies.
        • To figure out where your control groups are mounted, you can run:
          • $ grep cgroup /proc/mounts
    23. Oracle WebLogic Server on Docker Containers (white paper)
    24. WebLogic on Docker (GitHub)
      • Sample Docker configurations to facilitate installation, configuration, and environment setup for DevOps users. This project includes quick start dockerfiles and samples for both WebLogic 12.1.3 and 12.2.1 based on Oracle Linux and Oracle JDK 8 (Server).

    Sunday, November 1, 2015

    Docker: Differences between Container and Full VM

    A virtual machine (VM) is an emulation of a particular computer system. Virtual machines operate based on the computer architecture and functions of a real or hypothetical computer, and their implementations may involve specialized hardware, software, or a combination of both.

    In this article, we will examine the differences between a Docker Container and a Full VM  (see Note 1).

    Docker Container


    Docker is a facility for creating encapsulated computer environments; each encapsulated computer environment is called a container.[2,7]

    Starting up a Docker container is lightning fast because:
    • Each container shares the host computer's copy of the kernel
      • However, each has its own running copy of the Linux userland (its own root filesystem)
    • This means there's no hypervisor and no extended bootup

    In contrast, virtual machine implementations such as KVM, VirtualBox, or VMware work differently.


    Terminology

    • Host OS vs Guest OS
      • Host OS
        • is the original OS installed on a computer
      • Guest OS
        • is installed in a virtual machine or disk partition in addition to the host or main OS
          • In virtualization, a guest OS can be different from the host OS
          • In disk partitioning, a guest OS must be the same as the host OS
    • Hypervisor (or virtual machine monitor)
      • is a piece of computer software, firmware or hardware that creates and runs virtual machines.
      • A computer on which a hypervisor is running one or more virtual machines is defined as a host machine
      • Each virtual machine is called a guest machine.
    • Docker Container
      • An encapsulated computer environment created by Docker
      • Docker on Linux platforms
        • Builds on top of facilities provided by the Linux kernel (primarily cgroups and namespaces)
        • Unlike a virtual machine, does not require or include a separate operating system
      • Docker on non-Linux platforms
        • Uses a Linux virtual machine to run the containers (see Note 1)
    • Docker daemon
      • is the persistent process that manages containers. 
        • Docker uses the same binary for both the daemon and client.
      • uses Linux-specific kernel features


    Container vs Full VM


    A fully virtualized system gets its own set of resources allocated to it and does minimal sharing. You get more isolation, but it is much heavier (requires more resources). With Docker containers you get less isolation, but they are more lightweight and require fewer resources. So you could easily run thousands of containers on a host, and it doesn't even blink.[1]

    Basically, a Docker container (see Note 1) and a full VM have different fundamental goals:
    • VM is to fully emulate a foreign environment
      • The hypervisor in a full VM implementation is required to translate commands between the guest OS and the host OS
      • Each VM requires a full copy of the OS, the application being run and any supporting libraries
      • If you need to simultaneously run different operating systems (like Windows, OS X, or BSD), or run programs compiled for other operating systems, you need a full virtual machine implementation.
        • In contrast, the container OS (or, more accurately, the kernel) must be the same as the host OS and is shared between container and host (see Note 1).
    • Container is to make applications portable and self-contained
      • Each container shares the host computer's copy of the kernel. 
        • This means there's no hypervisor and no extended bootup.
      • The container engine is responsible for starting and stopping containers in a similar way to the hypervisor on a VM. 
        • However, processes running inside containers are equivalent to native processes on the host and do not incur the overheads associated with hypervisor execution.
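    The shared-kernel point is easy to verify: a container reports exactly the same kernel release as its host. The docker command below is shown as a comment because it needs a running daemon, and 'ubuntu' is just an example image:

```shell
# Kernel release on the host:
uname -r

# The same command inside a container prints an identical value,
# since there is no guest kernel:
#   docker run --rm ubuntu uname -r
```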

    Notes

    1. In this article, we focus only on Docker implementations on Linux platforms. In other words, our discussions here exclude non-Linux platforms (e.g., Windows, Mac OS X, etc.).[2]
      • Because the Docker daemon uses Linux-specific kernel features, you can’t run Docker natively in either Windows or Mac OS X.
      • Docker on non-Linux platforms uses a Linux virtual machine to run the containers.


    References

    1. How is Docker different from a normal virtual machine? (Stackoverflow)
    2. Newbie's Overview of Docker
    3. Supported Installation (Docker)
    4. EXTERIOR: Using Dual-VM Based External Shell for Guest-OS Introspection, Configuration, and Recovery
    5. Comparing Virtual Machines and Linux Containers Performance
    6. An Updated Performance Comparison of Virtual Machines and Linux Containers
    7. Security and Isolation Implementation in Docker Containers