
Tuesday, April 19, 2016

Docker: Btrfs Storage in Practice

One of the ways Docker makes containerization so easy is by managing an overlay-style filesystem, allowing containers and images to incrementally change the filesystem layout of the image without requiring full copies of multiple images to be kept around. This is a copy-on-write approach: parent layers are held read-only, and changes are reflected in the working layer.

Docker supports several image/container layer storage drivers:[7]
  • aufs
  • btrfs
  • devicemapper
  • overlay
  • vfs
  • zfs 
Your choice of storage driver can affect the performance of your containerized applications. So it’s important to understand the different storage driver options available and select the right one for your application.

In this article, we will focus only on btrfs (B-tree file system) storage.

Images, Layers, and Storage Drivers


To begin with, let's study different storage-related entities that form a container:
  • Images
    • Is a tagged hierarchy of read-only layers plus some metadata
    • docker images command can be used to list all images and report their virtual sizes
  • Image layers
    • Each successive layer (with a UUID tag) builds on top of the layer below it
    • Reusing layers can improve image build time[8]
      • Each Dockerfile instruction generates a new layer
      • You should put instructions least likely to change at the top of your Dockerfile to reuse layers as much as possible and try to make changes only at the bottom of your Dockerfile.
    • Image layers can be shared among images
    • Docker limits the number of layers to 127
      • Layers don’t come for free; depending on the storage driver used, there are some penalties to pay
        • For example, in AUFS, each layer can introduce latency to container write performance on the first write to each file existing in the image layers stack, especially if the file is big and exists below many image layers.
    • docker history command can be used to list all layers of an image
  • Storage drivers
    • Docker has a pluggable storage driver architecture. 
      • This gives you the flexibility to “plug in” the storage driver that is best for your environment and use-case.
    • Each Docker storage driver is based on a Linux filesystem or volume manager
    • The Docker daemon can only run one storage driver, and all containers created by that daemon instance use the same storage driver.
    • Each storage driver is free to implement the management of image layers and the container layer in its own unique way. 
      • This means some storage drivers perform better than others in different circumstances.
      • See [7] to learn more on which storage driver you should choose
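The layer-ordering advice above can be sketched in a small Dockerfile; the base image, package, and paths here are only illustrative:

```dockerfile
# Rarely-changing instructions first: the base image and OS packages
# stay cached across rebuilds.
FROM oraclelinux:7
RUN yum install -y java-1.8.0-openjdk && yum clean all

# Frequently-changing instructions last: only these layers are rebuilt
# when the application code changes.
COPY app.jar /u01/app/
CMD ["java", "-jar", "/u01/app/app.jar"]
```

Each instruction produces one layer; editing app.jar invalidates only the final COPY and CMD layers, while the cached base and package layers are reused.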

Storage Driver and Backing Filesystem


Which storage driver you use, in part, depends on the backing filesystem you plan to use for your Docker host’s local storage area. Some storage drivers can operate on top of different backing filesystems. However, other storage drivers require the backing filesystem to be the same as the storage driver. For example, the btrfs storage driver requires a btrfs backing filesystem. 

The following table lists each storage driver and whether it must match the host’s backing file system or not:

|Storage driver |Must match backing filesystem |
|---------------|------------------------------|
|overlay        |No                            |
|aufs           |No                            |
|btrfs          |Yes                           |
|devicemapper   |No                            |
|vfs*           |No                            |
|zfs            |Yes                           |

The btrfs Backend


The backing filesystem refers to the filesystem that was used to create the Docker host’s local storage area under /var/lib/docker.  The btrfs (B-tree file system) backend requires /var/lib/docker to be on a btrfs filesystem and uses filesystem-level snapshotting to implement layers.

# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdb4       11G  7.1G  2.5G  75% /var/lib/docker
<snipped>

# mount -l
/dev/xvdb4 on /var/lib/docker type btrfs (rw)
<snipped>


You can find the layers of the images in the folder /var/lib/docker/btrfs/subvolumes.  Each layer is stored as a btrfs subvolume inside that folder and starts out as a snapshot of the parent subvolume (if any).

The btrfs driver is very fast for docker build, but, like devicemapper, it does not share executable memory between devices. Mounting /var/lib/docker on a different filesystem than the rest of your system is recommended in order to limit the impact of filesystem corruption.
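You can also inspect the per-layer subvolumes directly with the btrfs tool (this requires root and, of course, a btrfs backing filesystem):

```
# btrfs subvolume list /var/lib/docker
```

Each line of the output corresponds to one image or container layer under /var/lib/docker/btrfs/subvolumes.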

You can set the storage driver by passing the --storage-driver= option to the docker command line, or by setting the option on the DOCKER_OPTS line in the /etc/default/docker file.  For example, to set the btrfs storage driver, do:
# docker -d -s btrfs -g /mnt/btrfs_partition ...

  -s, --storage-driver=""  Storage driver to use
  -g, --graph=""           Path to use as the root of the Docker runtime.  
                             Default is /var/lib/docker.
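Alternatively, the same options can be placed on the DOCKER_OPTS line in /etc/default/docker so they persist across daemon restarts; the mount point below is illustrative:

```
# /etc/default/docker
DOCKER_OPTS="--storage-driver=btrfs --graph=/mnt/btrfs_partition"
```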

To verify that the btrfs storage driver is in use on your Docker host, run:
# docker info
Containers: 1
Images: 19
Storage Driver: btrfs
...

References

  1. Btrfs
  2. LVM dangers and caveats
  3. 20 Linux Server Hardening Security Tips
  4. ZFS Vs. BTRFS
  5. How to Use Different Docker Filesystem Backends
  6. Daemon storage-driver option
  7. Select a storage driver
  8. Optimizing Docker images for image size and build time
  9. Docker Filesystems: Understanding the btrfs Backend (Xml and More)
  10. Oracle WebLogic Server on Docker Containers (white paper)
  11. WebLogic on Docker (GitHub)
    • Sample Docker configurations to facilitate installation, configuration, and environment setup for DevOps users. This project includes quick start dockerfiles and samples for both WebLogic 12.1.3 and 12.2.1 based on Oracle Linux and Oracle JDK 8 (Server).

Wednesday, February 17, 2016

Docker Container Networks: All Things Considered

Docker container networks have to achieve two seemingly conflicting goals:
  1. Provide complete isolation for containers on a host
  2. Expose the service running inside the container, not only to other co-located containers, but also to remote hosts.
In this article, we will review how Docker container networks achieve these goals.

Docker Container Networks


To provide the service running inside a container in a secure manner, it is important to have control over the networks your applications run on. To see how container networks achieve that, we will examine them from the following perspectives:
  • Network modes
    • Default networks vs user-defined networks
  • Packet Forwarding and Filtering (Netfilter)
    • Port mappings
  • Bridge (veth Interface)
  • DNS Configuration
To enable a service consumer to communicate with the service providing containers, Docker needs to configure the following entities:
  • IP address
    • Providing ways to configure any of the containers network interfaces to support services on different containers
  • Port
    • Providing ways to expose and publish a port on the container (also, mapping it to a port on the host)
  • Rules
    • Controlling access to a container's service via rules associated with the host's netfilter framework, in both the NAT and filter tables (see diagram 123).
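As a quick sketch of the port entity above: publishing a container port maps it to a host port and inserts a corresponding DNAT rule in the host's netfilter tables. The image name and port numbers here are only examples:

```
# docker run -d -p 8080:80 --name web nginx
# iptables -t nat -nL DOCKER
```

The first command exposes container port 80 as host port 8080; the second lists the NAT rules Docker created for it.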

    Network Modes


    When you install Docker, it creates three networks automatically:[34,37]
    1. bridge
      • Represents the docker0 (a virtual ethernet bridge) network present in all Docker installations.
        • Each container's network interface is attached to the bridge, and network address translation (NAT) is used when containers need to make themselves visible to the Docker host and beyond.
      • Unless you specify otherwise with the docker run --net= option, the Docker daemon connects containers to this network by default.
      • Docker does not support automatic service discovery on the default bridge network.
      • Supports the use of port mapping and docker run --link to allow communications between containers in the docker0 network.
    2. host
      • Adds a container on the host's network stack. You’ll find that the network configuration inside the container is identical to the host's.
        • Because containers deployed in host mode share the same host network stack, you can’t use the same IP address for the same service on different containers on the same host.
        • In this mode, you don't get port mapping anymore.
    3. none
      • Tells Docker to put the container in its own network stack but not to configure any of the container's network interfaces.
      • This allows you to create a custom network configuration
    All these network modes are applied at the container level, so you can certainly have a mix of different network modes on the same Docker host.

    Default Networks vs User-Defined Networks

    Besides default networks, you can create your own user-defined networks that better isolate containers. Docker provides some default network drivers for creating these networks. The easiest user-defined network to create is a bridge network. This network is similar to the historical, default docker0 network. After you create the network, you can launch containers on it using the docker run --net= option. Within a user-defined bridge network, linking is not supported. You can expose and publish container ports on containers in this network. This is useful if you want to make a portion of the bridge network available to an outside network.
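For example, creating and using a user-defined bridge network looks like this (the network and image names are hypothetical):

```
# docker network create --driver bridge my_bridge
# docker run -d --net=my_bridge --name db postgres
# docker network inspect my_bridge
```

Containers attached to my_bridge can reach each other, while containers on the default bridge cannot reach them without explicit port publishing.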

    You can read [34, 37] for more details.

    Packet Forwarding and Filtering


    Whether a container can talk to the world is governed by two factors.
    1. Whether the host machine is forwarding its IP packets
      • In order for a remote host to consume a container's service, the Docker host must act like a router, forwarding traffic to the network associated with the ethernet bridge.
      • IP packet forwarding is governed by the ip_forward system parameter on the Docker host
        • Many using Docker will want ip_forward to be on, to at least make communication possible between containers and the wider world.[39]
    2. Whether the host's iptables allow these particular connections[45]
      • Docker will never make changes to your host's iptables rules if you set --iptables=false when the daemon starts. Otherwise the Docker server will append forwarding rules to the DOCKER filter chain.
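The ip_forward check in factor 1 can be done from a shell on the Docker host; this is a minimal sketch (enabling forwarding requires root):

```shell
# Read the host's IP forwarding flag: 1 = forwarding on, 0 = off.
ipf=$(cat /proc/sys/net/ipv4/ip_forward)
if [ "$ipf" -eq 1 ]; then
  echo "ip_forward is on: traffic can be routed between containers and the outside"
else
  echo "ip_forward is off: enable it with 'sysctl -w net.ipv4.ip_forward=1'"
fi
```

Note that the Docker daemon sets this parameter itself unless started with --ip-forward=false.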
    Access to a container's service is controlled with rules associated with the host's netfilter framework, in both the NAT and filter tables. A Docker host makes significant use of netfilter rules to aid NAT, and to control access to the containers it hosts.[44]

    Netfilter offers various functions and operations for packet filtering, network address translation, and port translation, which provide the functionality required for directing packets through a network, as well as for providing ability to prohibit packets from reaching sensitive locations within a computer network.

    Bridge (veth Interface)


    The default network mode in Docker is bridge. To create a virtual subnet shared between the host machine and every container in bridge mode, Docker binds every veth* interface to the docker0 bridge.

    To show information on the bridge and its attached ports (or interfaces), you do:

    # brctl show
    bridge name bridge id         STP enabled interfaces
    docker0     8000.56847afe9799 no          veth33957e0
                                              veth6cee79b


    To show veth interfaces on a host, you do:

    # ip link list
    3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP
        link/ether 56:84:7a:fe:97:99 brd ff:ff:ff:ff:ff:ff
    11: veth33957e0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue master docker0 state UP
        link/ether 3e:01:d1:0f:24:b8 brd ff:ff:ff:ff:ff:ff
    13: veth6cee79b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue master docker0 state UP
        link/ether fa:aa:84:15:82:5a brd ff:ff:ff:ff:ff:ff


    Note that there are two containers on the host, hence two veth interfaces were shown. Those virtual interfaces work in pairs:
    • eth0 in the container 
      • Will have an IPv4 address 
      • For all purposes, it looks like a normal interface. 
    • veth interface in the host 
      • Won't have an IPv4 address
    Those two interfaces are connected together: any packet sent on an interface will appear as being received by the other. You can imagine that they are connected by a cross-over cable, if that helps.

    DNS Configuration


    How can Docker supply each container with a hostname and DNS configuration, without having to build a custom image with the hostname written inside? Its trick is to overlay three crucial /etc files inside the container with virtual files where it can write fresh information. You can see this by running mount inside a container:[29]

    # mount
    /dev/mapper/vg--docker-dockerVolume on /etc/resolv.conf type btrfs ...
    /dev/mapper/vg--docker-dockerVolume on /etc/hostname type btrfs ...
    /dev/mapper/vg--docker-dockerVolume on /etc/hosts type btrfs ...

    This arrangement allows Docker to do clever things like keep resolv.conf up to date across all containers when the host machine receives new configuration over DHCP later.

    With DHCP, computers request IP addresses and networking parameters automatically from a DHCP server, reducing the need for a network administrator or a user to configure these settings manually. For resource-constrained routers and firewalls, dnsmasq is often used for its small footprint. Dnsmasq provides network infrastructure for small networks: DNS, DHCP, router advertisement and network boot.
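A minimal dnsmasq configuration illustrating the combined DNS-plus-DHCP role described above (the addresses and ranges here are made up for illustration):

```
# /etc/dnsmasq.conf (illustrative values)
domain-needed          # never forward plain names without a domain part
bogus-priv             # never forward reverse lookups for private ranges
server=8.8.8.8         # upstream DNS server
dhcp-range=192.168.1.50,192.168.1.150,12h   # DHCP pool and lease time
```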

    References

    1. The TCP Maximum Segment Size and Related Topics
    2. Jumbo/Giant Frame Support on Catalyst Switches Configuration Example
    3. Ethernet Jumbo Frames
    4. IP Fragmentation: How to Avoid It? (Xml and More)
    5. The Great Jumbo Frames Debate
    6. Resolve IP Fragmentation, MTU, MSS, and PMTUD Issues with GRE and IPSEC
    7. Sites with Broken/Working PMTUD
    8. Path MTU Discovery
    9. TCP headers
    10. bad TCP checksums
    11. MSS performance consideration
    12. Understanding Routing Table
    13. route (Linux man page)
    14. Docker should set host-side veth MTU #4378
    15. Add MTU to lxc conf to make host and container MTU match
    16. Xen Networking
    17. TCP parameter settings (/proc/sys/net/ipv4)
    18. Change the MTU of a network interface
      • tcp_base_mss, tcp_mtu_probing, etc
    19. MTU manipulation
    20. Jumbo Frames, the gotcha's you need to know! (good)
    21. Understand container communication (Docker)
    22. calicoctl should allow configuration of veth MTU #488 - GitHub
    23. Linux MTU Change Size
    24. Changing the MTU size in Windows Vista, 7 or 8
    25. Linux Configure Jumbo Frames to Boost Network Performance
    26. Path MTU discovery in practice
    27. 10 iptables rules to help secure your Linux box
    28. An Updated Performance Comparison of Virtual Machines and Linux Containers
    29. Network Configuration (Docker)
    30. Storage Concepts in Docker: Persistent Storage
    31. Xen org
    32. Cloud Architectures, Networks, Services, and Management
    33. Cloud Networking
    34. Docker Networking 101 – Host mode
    35. Configuring DNS (Docker)
    36. Configuring dnsmasq to serve my own domain name zone
    37. Understand Docker container networks (Docker)
    38. dnsmasq - A lightweight DHCP and caching DNS server.
    39. Understand Container Communication
    40. Linux: Check if in Same Network
    41. Packet flow in Netfilter and General Networking (diagram)
    42. How to Enable IP Forwarding in Linux
    43. Exposing a port on a live docker container
    44. The docker-proxy
    45. Linux Firewall Tutorial: IPTables Tables, Chains, Rules Fundamentals
    46. iptables (ipset.netfilter.org)
    47. How to find out capacity for network interfaces?
    48. Security Considerations: Enabling/Disabling Ping /Traceroute for Your Network (Xml and More)
    49. How to Read a Traceroute (good)

    Monday, February 15, 2016

    Docker Container: How to Check Memory Size

    Inside a Docker container, the correct way to check its memory size is not to use regular Linux commands such as top or free.

    In this article, we will review how to discover runtime metrics inside a container and focus specifically on memory statistics.  Note that the docker version used in this discussion is 1.6.1.

    Misleading Metrics


    Using either the "top" or "free" command will report a memory size of 7 GiB instead of 2 GiB (the correct answer) for our container.  These commands are unaware of containers and hence report the memory metrics of the host only.

    Docker Stats API


    One way to find out the correct memory statistics is to use the docker stats sub-command.
    For example, you can type:

    # docker ps
    CONTAINER ID  
    66f4084c6a36  

    # docker stats 66f4084c6a36
    CONTAINER         CPU %       MEM USAGE/LIMIT    MEM %       NET I/O
    66f4084c6a36      0.05%       257.1 MiB/2 GiB    12.55%      198.5 KiB/2.008 MiB


    From the above, we can see that the memory limit of the container is 2 GiB.  Note that the information returned by the "top" or "free" command is retrieved from /proc/meminfo:

    # cat /proc/meminfo
    MemTotal:        7397060 kB
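The MemTotal value above is reported in kB, so it works out to roughly 7 GiB for the host; the conversion can be reproduced from a shell:

```shell
# MemTotal in /proc/meminfo is reported in kB; convert to (integer) GiB.
kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
echo "MemTotal: ${kb} kB = $((kb / 1024 / 1024)) GiB (rounded down)"
```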

    cgroups (or Control Groups)


    As described in [3], Docker containers are built on top of cgroups.  For cgroups, runtime metrics are exposed through:
    • Newer builds
      • Control groups are exposed through a pseudo-filesystem named[4]
        • /sys/fs/cgroup
          • /sys/fs/cgroup/memory/docker/
    • Older builds
      • The control groups might be mounted on /cgroup, without distinct hierarchies.
      • To figure out where your control groups are mounted, you can run:
        • $ grep cgroup /proc/mounts
          • /cgroup/memory/docker/
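For either kind of build, the mount table shows which cgroup layout your host actually uses:

```shell
# List cgroup-related mounts; on newer hosts you will see entries under
# /sys/fs/cgroup, on older ones possibly under /cgroup.
grep cgroup /proc/mounts
```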
    In either newer or older builds, you need to first fetch the long-form container ID by typing:

    # docker ps --no-trunc
    CONTAINER ID  
    66f4084c6a3683cc2f41242e4d58a6381072ba64f41ce2e94b75c82099acd732

    In our system, we can find memory runtime metrics under the folder:

    • /cgroup/memory/docker/66f4084c6a3683cc2f41242e4d58a6381072ba64f41ce2e94b75c82099acd732

    For example, relevant memory runtime metrics can be found as follows:
    # cat memory.stat
    cache 84217856
    rss 186290176
    mapped_file 14630912
    swap 0
    pgpgin 83557
    pgpgout 17515
    pgfault 78655
    pgmajfault 43
    inactive_anon 0
    active_anon 186290176
    inactive_file 68890624
    active_file 15327232
    unevictable 0
    hierarchical_memory_limit 2147483648
    hierarchical_memsw_limit 4294967296
    total_cache 84217856
    total_rss 186290176
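The hierarchical_memory_limit field above is in bytes; a quick conversion confirms that it matches the 2 GiB limit reported by docker stats:

```shell
# hierarchical_memory_limit from memory.stat is in bytes.
limit=2147483648
echo "$((limit / 1024 / 1024 / 1024)) GiB"   # prints: 2 GiB
```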

    In summary, cgroups allow Docker to
    • Group processes and manage their aggregate resource consumption 
    • Share available hardware resources among containers 
    • Limit the memory and CPU consumption of containers 
      • A container can be resized by simply changing the limits of its corresponding cgroup
      • You can gather information about resource usage in the container by examining Linux control groups in /sys/fs/cgroup
    • Provide a reliable way of terminating all processes inside a container.

    Monday, January 11, 2016

    Docker Filesystems: Understanding the btrfs Backend

    The basis of filesystem use in Docker is the storage backend abstraction.[14] A storage backend allows you to store a set of layers, each addressed by a unique name.

    Various storage backends are supported in Docker filesystems:[1]
    • vfs backend
    • devicemapper backend[21]
    • btrfs backend
    • aufs backend
    In this article, we will discuss Docker filesystems in general and the btrfs backend in particular.

    Images and Containers


    A core part of the Docker model is the efficient use of layered images and containers:
    • Images
      • Each Docker image on the system is stored as a layer, with the parent being the layer of the parent image. 
        • To create such an image a new layer is created (based on the right parent) and then the changes in that image are applied to the newly mounted filesystem.
      • Docker images have intermediate layers that increase reusability, decrease disk usage, and speed up docker build by allowing each step to be cached. These intermediate layers are not shown by default in the "docker images" command.
        • Each layer is a filesystem tree that can be mounted[2] when needed and modified. New layers can be started from scratch, but they can also be created with a specified parent.
    • Containers
      • Docker containers are isolated mini Linux environments built from Docker images, base images with zero or more filesystem layers on top of them.
    As shown below, there is 1 container and there are 118 images in this Docker installation.  In its storage backend, btrfs is the configured storage driver,[13] which will be the focus of this article:

    # docker info
    Containers: 1
    Images: 118
    Storage Driver: btrfs

    To retrieve low-level information on a container or image, you can use "docker inspect" command which takes a required ID argument (either container's or image's).  You can use "docker ps" to find the ID of a specific container or use "docker images" to list the IDs of all images.

    Base Image


    Base images are typically minimal operating system images and the layers on top of them are added by developers to create convenience images (such as an image which already has Java SE installed and configured) for direct use or for use as building blocks.

    Each container is related to a top image which is built up from layers of images starting from a base image.

    To find the top image associated with a container, type:

    # docker inspect --format "{{ .Image }}" ce483e532466
    eca6affff525415c7e2199f1e8b2222ffce31d4bcf4a0cd05a48807d2c1f7647


    To find the layers of images that a container is built up from, type:

    # docker history eca6affff525415c7e2199f1e8b2222ffce31d4bcf4a0cd05a48807d2c1f7647
    IMAGE               CREATED             CREATED BY                                      SIZE
    eca6affff525        4 days ago          /bin/sh -c #(nop) WORKDIR /u01/app              0 B
    ccf8bd04df89        4 days ago          /bin/sh -c #(nop) ENV APP_HOME=/u01/app/        0 B
    c91b83e8c828        4 days ago          /bin/sh -c #(nop) USER [apaas]                  0 B
    ab06ea65ece3        4 days ago          /bin/sh -c chown -R apaas:apaas /u01/           9.146 MB
    2354b0ad9541        4 days ago          /bin/sh -c #(nop) ADD dir:ff4334d8629caee02b1   9.144 MB
    246fb66aa39e        4 days ago          /bin/sh -c chmod -R +x /u01/scripts/            1.383 kB
    21c5ddd9b74c        4 days ago          /bin/sh -c #(nop) COPY dir:17f42381efa361f6c6   1.383 kB
    c347b96af5be        4 days ago          /bin/sh -c mkdir -p /u01/scripts /u01/logs      0 B
    00c1fc450430        4 days ago          /bin/sh -c #(nop) USER [root]                   0 B
    f52b843cf97e        7 weeks ago         /bin/sh -c mv java java.orig && chmod +x ./ja   7.718 kB
    7c6d6279239c        7 weeks ago         /bin/sh -c #(nop) USER [apaas]                  0 B
    5c6ad3a0ad33        7 weeks ago         /bin/sh -c mkdir -p /u01/logs && chown -R apa   306.5 MB
    0100a4922bfb        7 weeks ago         /bin/sh -c #(nop) WORKDIR /u01/jdk/jdk1.7.0_9   0 B
    4983e8502db6        7 weeks ago         /bin/sh -c #(nop) ADD file:3511bd6019a189ef28   226 B
    266e209d77d3        7 weeks ago         /bin/sh -c #(nop) ENV PATH=/u01/jdk/jdk1.7.0_   0 B
    c00eef371809        7 weeks ago         /bin/sh -c #(nop) ENV JAVA_HOME=/u01/jdk/jdk1   0 B
    db5d61324db8        7 weeks ago         /bin/sh -c #(nop) ADD file:babe1a2cf183ba22e4   306.5 MB
    10287b34527b        5 months ago        /bin/sh -c groupadd apaas && useradd -g apaas   296.1 kB
    035a8c863461        5 months ago        /bin/sh -c mkdir -p /u01/jdk/ && mkdir -p /u0   0 B
    a555d44630e2        10 months ago       /bin/sh -c #(nop) CMD [/bin/bash]               0 B
    23a9eb33093d        10 months ago       /bin/sh -c #(nop) ADD file:33b9447cdbd58ef81b   195.1 MB
    7258693d533e        10 months ago       /bin/sh -c #(nop) MAINTAINER Oracle Linux Pro   0 B
    


    Note that "eca6affff525" is the top image which is built on top of "ccf8bd04df89" and so on.  The base image is "7258693d533e", which doesn't have a parent.  For example, if you display the parent of the base image, it displays nothing (i.e., no parent):

    # docker inspect --format "{{ .Parent }}" 7258693d533e
    <blank>

    The btrfs Backend


    The btrfs backend requires /var/lib/docker to be on a btrfs filesystem and uses filesystem-level snapshotting to implement layers.

    # df -h
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/xvdb4       11G  7.1G  2.5G  75% /var/lib/docker
    # mount -l
    /dev/xvdb4 on /var/lib/docker type btrfs (rw)
    <snipped>

    You can find the layers of the images in the folder /var/lib/docker/btrfs/subvolumes.  Each layer is stored as a btrfs subvolume inside that folder and starts out as a snapshot of the parent subvolume (if any).

    This backend is pretty fast. Mounting /var/lib/docker on a different filesystem than the rest of your system is recommended in order to limit the impact of filesystem corruption.

    Image Cleanup


    One of the purposes of learning about Docker filesystems and storage backends is to ensure that you know what you are doing before cleaning up unwanted images.[15,16]

    For example, before removing an image, make sure that no containers (running or stopped) are using it. After you've verified that, the following commands can clean up untagged images (see also filtering) or all images.

    # batch cleanup untagged images 
    docker rmi $(docker images -q -f "dangling=true")

    # remove all images by id 
    docker rmi $(docker images -aq)

    References

    1. Supported Filesystems (Docker)
    2. Concept of Mounting
      • The concept of mounting allows programs to be agnostic about where your data is structured
      • From an application (or user) point of view, the file system is one tree. Under the hood, the file system structure can be on a single partition, but also on a dozen partitions, network storage, removable media and more.
    3. Displaying Physical Volumes (Redhat)
    4. Docker - How to analyze a container's disk usage? (good)
    5. Finding all storage devices attached to a Linux machine
    6. /dev/dm-1 (block device)
      • /dev/dm-1 is for "device mapper n.1". Basically, it is a logical unit carved out using the kernel's embedded device-mapper layer. From a userspace application's point of view, it is a RAW block device.
    7. Linux file system
    8. Docker images command
    9. Docker cp command
      • You can copy to or from either a running or stopped container.
      • Behavior is similar to the common Unix utility cp -a in that 
        • directories are copied recursively with permissions preserved if possible. 
        • Ownership is set to the user and primary group on the receiving end of the transfer.  For example, 
          • Files copied to a container will be created with UID:GID of the root user. 
          • Files copied to the local machine will be created with the UID:GID of the user which invoked the docker cp command.
        • It is not possible to copy certain system files such as resources under /proc,/sys, /dev, and mounts created by the user in the container.
    10. Understanding Volumes in Docker (good) 
    11. Docker Volume Manager
    12. Docker Quicksheet 
    13. Storage Driver (Docker)
      • A storage driver is how docker implements a particular union file system. 
      • Keeping with the “batteries included, but replaceable” philosophy, Docker supports a number of different union file systems. 
        • For instance, Ubuntu’s default storage driver is AUFS, while for Red Hat and CentOS it’s Device Mapper.
    14. Docker Images
      • Docker images are stored as series of read-only layers. 
      • When we start a container, Docker takes the read-only image and adds a read-write layer on top. 
      • If the running container modifies an existing file, the file is copied out of the underlying read-only layer and into the top-most read-write layer where the changes are applied. 
        • The version in the read-write layer hides the underlying file, but does not destroy it — it still exists in the underlying image. 
      • When a Docker container is deleted, relaunching the image will start a fresh container without any of the changes made in the previously running container — those changes are lost. 
      • Docker calls this combination of read-only layers with a read-write layer on top a Union File System.
    15. Why is docker image eating up my disk space that is not used by docker
    16. Docker error : no space left on device
    17. docker ps -s
      • -s, --size=false Display total file sizes
    18. Advanced Docker Volumes
    19. Resizing Docker containers with the Device Mapper plugin
    20. Question on Resource Limits? (Docker)
    21. devicemapper - a storage backend based on Device Mapper
    22. Docker: Btrfs Storage in Practice (Xml and More)

    Saturday, November 7, 2015

    Security and Isolation Implementation in Docker Containers

    Multitenancy is regarded as an important feature of cloud computing. If we consider the applications running on a container a tenant, the goal of good security-and-isolation design is to ensure that tenants running on a host only use resources visible to them.

    As container technology evolves, its implementation of security, isolation and resource control has been continually improved.  In this article, we will review how Docker container achieves its security and isolation utilizing native container features of Linux such as namespaces, cgroups, capabilities, etc.

    Virtualization and Isolation


    Operating system-level virtualization, containers, zones, or even "chroot with steroids" are names that describe the same concept of user-space isolation. Products such as Docker make use of user-space isolation on top of OS-level virtualization facilities to provide extra security.

    Since version 0.9, Docker includes the libcontainer library as its own way to directly use virtualization facilities provided by the Linux kernel, in addition to using abstracted virtualization interfaces via LXC,[1] systemd-nspawn,[2] and libvirt.[3]

    These virtualization libraries all utilize native container features of Linux (see Diagram above):
    • namespaces
    • cgroups
    • capabilities
    and more. Docker combines these components into a wrapper which it calls a container format.

    libcontainer


    The default container format is called libcontainer. Docker also supports traditional Linux containers using LXC. In the future, Docker may support other container formats, for example, by integrating with BSD Jails or Solaris Zones.

    An execution driver is the implementation of a specific container format and is used for running Docker containers. In the latest release, libcontainer
    • Is the default execution driver for running docker containers
    • Is shipped alongside the LXC driver
    • Is a pure Go library which is developed to access the kernel’s container APIs directly, without any other dependencies
      • Docker out of the box can now manipulate namespaces, control groups, capabilities, apparmor profiles, network interfaces and firewalling rules – all in a consistent and predictable way, and without depending on LXC or any other userland package.[6]
      • You provide a root filesystem and a configuration on how libcontainer is supposed to execute a container and it does the rest.
      • It allows spawning new containers or attaching to an existing container.
      • In fact, libcontainer delivered such much-needed stability that the team decided to make it the default.
        • As of Docker 0.9, LXC is now optional
          • Note that LXC driver will continue to be supported going forward.
        • To switch back to the LXC driver, simply restart the Docker daemon with
          • docker -d -e lxc

    namespaces


    Docker isn't virtualization as such; instead, it's an abstraction on top of the kernel's support for namespaces, which provides the isolated workspace (or container). When you run a container, Docker creates a set of namespaces for that container.

    Some of the namespaces that Docker uses on Linux are:
    • pid namespace
      • Used for process isolation (PID: Process ID).
      • Processes running inside the container appear to be running on a normal Linux system although they are sharing the underlying kernel with processes located in other namespaces.
    • net namespace
      • Used for managing network interfaces (NET: Networking).
      • DNAT allows you to configure your guest's networking independently of your host's and have a convenient interface for forwarding only the ports you want between them.
        • However, you can replace this with a bridge to a physical interface.
    • ipc namespace
      • Used for managing access to IPC resources (IPC: InterProcess Communication).
    • mnt namespace
      • Used for managing mount-points (MNT: Mount).
    • uts namespace
      • Used for isolating kernel and version identifiers. (UTS: Unix Timesharing System).
    These isolation benefits naturally come with costs. Based on your network access patterns and memory constraints, you can choose how to configure namespaces for your containers with Docker.
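    To see these namespaces in action, you can list the namespace links the kernel exposes for any process under /proc. This is generic Linux inspection (not Docker-specific), and the exact set of entries shown depends on your kernel version:

```shell
# Each entry under /proc/<pid>/ns is a symbolic link that names a
# namespace type and its inode number; two processes that show the
# same inode for a given type are in the same namespace.
ls -l /proc/self/ns

# The pid namespace of the current shell; inside a Docker container
# this value differs from the host's, which is why a container only
# sees its own processes.
readlink /proc/self/ns/pid
```

    Comparing the inode printed for a process inside a container with the one on the host shows that they live in different pid namespaces.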

    cgroups (or Control Groups)


    Docker on Linux makes use of another technology called cgroups. Because each container is just a process, all normal Linux resource management facilities such as scheduling and cgroups apply to containers. Furthermore, there is only one level of resource allocation and scheduling, because a containerized Linux system has only one kernel and the kernel has full visibility into the containers.

    In summary, cgroups allow Docker to
    • Group processes and manage their aggregate resource consumption
    • Share available hardware resources among containers
    • Limit the memory and CPU consumption of containers
      • A container can be resized by simply changing the limits of its corresponding cgroup.
      • You can gather information about resource usage in the container by examining Linux control groups in /sys/fs/cgroup.
    • Provide a reliable way of terminating all processes inside a container.
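    The control-group membership of any process can be read straight out of /proc, and the limits themselves live under the cgroup mount. The paths below assume a typical modern layout, and the docker run flags in the final comment are the standard resource-limit options (shown with a sample image name):

```shell
# Where the control groups are mounted on this system
# (see also reference 22 below).
grep cgroup /proc/mounts

# Which cgroups the current process belongs to; each line has the
# form hierarchy-id:controller-list:cgroup-path.
cat /proc/self/cgroup

# A container's limits are set when it is created, e.g.:
#   docker run -m 512m --cpu-shares 512 ubuntu sleep 60
```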

    Capabilities[20]


    "POSIX capabilities" is what Linux uses.[9] These capabilities are a partitioning of the all powerful root privilege into a set of distinct privileges. You can see a full list of available capabilities in Linux manpages. Docker drops all capabilities except those needed, a whitelist instead of a blacklist approach.

    Your average server (bare metal or virtual machine) needs to run a bunch of processes as root. Those typically include:
    • SSH 
    • cron 
    • syslogd 
    • Hardware management tools (e.g., load modules) 
    • Network configuration tools (e.g., to handle DHCP, WPA, or VPNs),
    and much more.

    A container is very different, because almost all of those tasks are handled by the infrastructure around the container. By default, Docker starts containers with a restricted set of capabilities. In most cases, containers do not need "real" root privileges at all. For example, processes (like web servers) that just need to bind to a port below 1024 do not have to run as root: they can simply be granted the CAP_NET_BIND_SERVICE capability instead. Containers can therefore run with a reduced capability set, meaning that "root" within a container has far fewer privileges than the real "root".
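    You can inspect the capability sets of any process from /proc; the CapEff line is the effective set as a hexadecimal bitmask (CAP_NET_BIND_SERVICE, for example, is capability number 10, i.e. bit 0x400). The docker run flags in the comment are the standard way to adjust the whitelist; the image name is just an example:

```shell
# Capability bitmasks of the current process:
# CapPrm = permitted, CapEff = effective, CapBnd = bounding set.
grep ^Cap /proc/self/status

# To start a container with everything dropped except one capability:
#   docker run --cap-drop ALL --cap-add NET_BIND_SERVICE nginx
```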

    Capabilities are just one of the many security features provided by modern Linux kernels. To harden a Docker host, you can also leverage other existing, well-known systems such as AppArmor and SELinux.
    If your distribution comes with security model templates for Docker containers, you can use them out of the box. For instance, Docker ships a template that works with AppArmor, and Red Hat provides SELinux policies for Docker.



    References

    1. LXC—Linux containers.
    2. Control Centre: The systemd Linux init system
    3. The virtualization API: libvirt
    4. Solomon Hykes and others. What is Docker?
    5. How is Docker different from a normal virtual machine? (Stackoverflow)
    6. Docker 0.9: introducing execution drivers and libcontainer
      • Uses layered filesystems (AUFS).
    7. Is there a formula for calculating the overhead of a Docker container?
    8. An Updated Performance Comparison of Virtual Machines and Linux Containers
    9. capabilities(7) - Linux man page
    10. Netlink (Wikipedia)
    11. The lost packages of docker
    12. ebtables/iptables interaction on a Linux-based bridge
    13. Comparison Between AppArmor and SELinux
    14. The docker-proxy (netfilter)
    15. Hardware isolation
    16. Understand the architecture (docker)
    17. Linux kernel capabilities FAQ
    18. Docker: Differences between Container and Full VM (Xml and More)
    19. Docker vs VMs
      • There is one key metric where Docker containers are weaker than virtual machines, and that's "isolation". Intel's VT-d and VT-x technologies provide virtual machines with ring -1 hardware isolation, of which they take full advantage. It keeps virtual machines from breaking down and interfering with each other.
    20. Docker Security
    21. Introduction to Control Groups (Cgroups)
    22. Docker Runtime Metrics
      • Control groups are exposed through a pseudo-filesystem. In recent distros, you should find this filesystem under /sys/fs/cgroup
      • On older systems, the control groups might be mounted on /cgroup, without distinct hierarchies.
        • To figure out where your control groups are mounted, you can run:
          • $ grep cgroup /proc/mounts
    23. Oracle WebLogic Server on Docker Containers (white paper)
    24. WebLogic on Docker (GitHub)
      • Sample Docker configurations to facilitate installation, configuration, and environment setup for DevOps users. This project includes quick start dockerfiles and samples for both WebLogic 12.1.3 and 12.2.1 based on Oracle Linux and Oracle JDK 8 (Server).

    Sunday, November 1, 2015

    Docker: Differences between Container and Full VM

    A virtual machine (VM) is an emulation of a particular computer system. Virtual machines operate based on the computer architecture and functions of a real or hypothetical computer, and their implementations may involve specialized hardware, software, or a combination of both.

    In this article, we will examine the differences between a Docker Container and a Full VM  (see Note 1).

    Docker Container


    Docker is a facility for creating encapsulated computer environments; each encapsulated computer environment is called a container.[2,7]

    Starting up a Docker container is lightning fast because:
    • Each container shares the host computer's copy of the kernel
      • However, each has its own running copy of the Linux userland (its own root filesystem)
    • This means there's no hypervisor and no extended bootup

    In contrast, virtual machine implementations such as KVM, VirtualBox, or VMware work differently.


    Terminology

    • Host OS vs Guest OS
      • Host OS
        • is the original OS installed on a computer
      • Guest OS
        • is installed in a virtual machine or disk partition in addition to the host or main OS
          • In virtualization, a guest OS can be different from the host OS
          • In disk partitioning, a guest OS must be the same as the host OS
    • Hypervisor (or virtual machine monitor)
      • is a piece of computer software, firmware or hardware that creates and runs virtual machines.
      • A computer on which a hypervisor is running one or more virtual machines is defined as a host machine
      • Each virtual machine is called a guest machine.
    • Docker Container
      • An encapsulated computer environment created by Docker
      • Docker on Linux platforms
        • Builds on top of facilities provided by the Linux kernel (primarily cgroups and namespaces)
        • Unlike a virtual machine, does not require or include a separate operating system
      • Docker on non-Linux platforms
        • Uses a Linux virtual machine to run the containers (see Note 1)
    • Docker daemon
      • is the persistent process that manages containers. 
        • Docker uses the same binary for both the daemon and client.
      • uses Linux-specific kernel features


    Container vs Full VM


    A fully virtualized system gets its own set of resources allocated to it and does minimal sharing. You get more isolation, but it is much heavier (requires more resources). With Docker containers you get less isolation, but they are more lightweight and require fewer resources. So you could easily run thousands of containers on a host, and it doesn't even blink.[1]

    Basically, a Docker container (see Note 1) and a full VM have different fundamental goals:
    • VM is to fully emulate a foreign environment
      • The hypervisor in a full VM implementation is required to translate commands between the guest OS and the host OS
      • Each VM requires a full copy of the OS, the application being run and any supporting libraries
      • If you need to simultaneously run different operating systems (like Windows, OS X, or BSD), or run programs compiled for other operating systems, you need a full virtual machine implementation.
        • In contrast, the container OS (or, more accurately, the kernel) must be the same as the host OS and is shared between container and host (see Note 1).
    • Container is to make applications portable and self-contained
      • Each container shares the host computer's copy of the kernel. 
        • This means there's no hypervisor and no extended bootup.
      • The container engine is responsible for starting and stopping containers in a similar way to the hypervisor on a VM. 
        • However, processes running inside containers are equivalent to native processes on the host and do not incur the overheads associated with hypervisor execution.
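    The shared-kernel point is easy to verify: a container reports exactly the same kernel release as its host. The docker command below is shown as a comment because it needs a running daemon, and 'ubuntu' is just an example image:

```shell
# Kernel release on the host:
uname -r

# The same command inside a container prints an identical value,
# since there is no guest kernel:
#   docker run --rm ubuntu uname -r
```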

    Notes

    1. In this article, we focus only on Docker implementations on Linux platforms. In other words, our discussions here exclude non-Linux platforms (e.g., Windows, Mac OS X, etc.).[2]
      • Because the Docker daemon uses Linux-specific kernel features, you can’t run Docker natively in either Windows or Mac OS X.
      • Docker on non-Linux platforms uses a Linux virtual machine to run the containers.


    References

    1. How is Docker different from a normal virtual machine? (Stackoverflow)
    2. Newbie's Overview of Docker
    3. Supported Installation (Docker)
    4. EXTERIOR: Using Dual-VM Based External Shell for Guest-OS Introspection, Configuration, and Recovery
    5. Comparing Virtual Machines and Linux Containers Performance
    6. An Updated Performance Comparison of Virtual Machines and Linux Containers
    7. Security and Isolation Implementation in Docker Containers