Saturday, November 7, 2015

Security and Isolation Implementation in Docker Containers

Multitenancy is regarded as an important feature of cloud computing. If we consider an application running in a container to be a tenant, the goal of good security-and-isolation design is to ensure that tenants running on a host only use resources visible to them.

As container technology evolves, its implementation of security, isolation and resource control is continually improved. In this article, we will review how Docker containers achieve security and isolation by utilizing native container features of Linux such as namespaces, cgroups, and capabilities.

Virtualization and Isolation


Operating system-level virtualization, containers, zones, or even "chroot with steroids" are names for the same concept of user-space isolation. Products such as Docker make use of user-space isolation on top of OS-level virtualization facilities to provide extra security.

Since version 0.9, Docker includes the libcontainer library as its own way to directly use virtualization facilities provided by the Linux kernel, in addition to using abstracted virtualization interfaces via LXC,[1] systemd-nspawn,[2] and libvirt.[3]

These virtualization libraries all utilize native container features of Linux (see Diagram above):
  • namespaces
  • cgroups
  • capabilities
and more. Docker combines these components into a wrapper which it calls a container format.

libcontainer


The default container format is called libcontainer. Docker also supports traditional Linux containers using LXC. In the future, Docker may support other container formats, for example, by integrating with BSD Jails or Solaris Zones.

An execution driver is the implementation of a specific container format and is used for running Docker containers. In the latest release, libcontainer:
  • Is the default execution driver for running Docker containers
  • Is shipped alongside the LXC driver
  • Is a pure Go library which is developed to access the kernel’s container APIs directly, without any other dependencies
    • Docker out of the box can now manipulate namespaces, control groups, capabilities, apparmor profiles, network interfaces and firewalling rules – all in a consistent and predictable way, and without depending on LXC or any other userland package.[6]
    • You provide a root filesystem and a configuration on how libcontainer is supposed to execute a container and it does the rest.
    • It allows spawning new containers or attaching to an existing container.
    • In fact, libcontainer delivered so much needed stability that the team decided to make it the default.
      • As of Docker 0.9, LXC is optional
        • Note that the LXC driver will continue to be supported going forward.
      • To switch back to the LXC driver, simply restart the Docker daemon with
        • docker -d -e lxc

namespaces


Docker isn't virtualization, as such. Instead, it's an abstraction on top of the kernel's support for namespaces, which provide the isolated workspace (or container). When you run a container, Docker creates a set of namespaces for that container.

Some of the namespaces that Docker uses on Linux are:
  • pid namespace
    • Used for process isolation (PID: Process ID).
    • Processes running inside the container appear to be running on a normal Linux system although they are sharing the underlying kernel with processes located in other namespaces.
  • net namespace
    • Used for managing network interfaces (NET: Networking).
    • DNAT allows you to configure your guest's networking independently of your host's and have a convenient interface for forwarding only the ports you want between them.
      • However, you can replace this with a bridge to a physical interface.
  • ipc namespace
    • Used for managing access to IPC resources (IPC: InterProcess Communication).
  • mnt namespace
    • Used for managing mount-points (MNT: Mount).
  • uts namespace
    • Used for isolating kernel and version identifiers, such as the hostname and domain name (UTS: Unix Timesharing System).
These isolation benefits naturally come with costs. Depending on your network access patterns and memory constraints, you may choose how to configure namespaces for your containers with Docker.
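
The namespaces a process belongs to can be inspected under /proc/<pid>/ns, where each entry is a symbolic link whose target names the namespace type and an inode number. The Python sketch below is not taken from Docker; it assumes a Linux host, and the helper names parse_ns_link and list_namespaces are made up for illustration. Two processes share a namespace when they report the same inode number for it.

```python
import os
import re


def parse_ns_link(target):
    """Split a link target such as 'pid:[4026531836]' into (type, inode)."""
    match = re.fullmatch(r"(\w+):\[(\d+)\]", target)
    if match is None:
        raise ValueError("unexpected namespace link: %r" % target)
    return match.group(1), int(match.group(2))


def list_namespaces(pid="self"):
    """Return {entry name: inode} for the namespaces of a process (Linux only)."""
    ns_dir = "/proc/%s/ns" % pid
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        # Each entry is a symlink; its target encodes the namespace identity.
        _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        result[name] = inode
    return result
```

Calling list_namespaces() inside a container and again on the host (for a host process) should show different inode numbers for the namespaces Docker created, such as pid, net, ipc, mnt and uts.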

cgroups (or Control Groups)


Docker on Linux makes use of another technology called cgroups. Because the processes in a container are ordinary Linux processes, all normal Linux resource management facilities such as scheduling and cgroups apply to them. Furthermore, there is only one level of resource allocation and scheduling because a containerized Linux system only has one kernel and the kernel has full visibility into the containers.

In summary, cgroups allow Docker to
  • Group processes and manage their aggregate resource consumption
  • Share available hardware resources among containers
  • Limit the memory and CPU consumption of containers
    • A container can be resized by simply changing the limits of its corresponding cgroup.
    • You can gather information about resource usage in the container by examining Linux control groups in /sys/fs/cgroup.
  • Provide a reliable way of terminating all processes inside a container.
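
Control group hierarchies show up as mounted pseudo-filesystems, so one way to locate them is to look for cgroup entries in /proc/mounts. The Python sketch below illustrates that lookup on a sample string; it is an illustration rather than anything Docker itself runs, and the function name cgroup_mounts and the sample mount lines are assumptions.

```python
def cgroup_mounts(mounts_text):
    """Return (mount point, options) for every cgroup entry in /proc/mounts text."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Field order: device, mount point, filesystem type, options, dump, pass
        if len(fields) >= 4 and fields[2] in ("cgroup", "cgroup2"):
            found.append((fields[1], fields[3].split(",")))
    return found


# Hypothetical excerpt in the format used by /proc/mounts.
SAMPLE = """\
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
cgroup /sys/fs/cgroup/cpu cgroup rw,nosuid,nodev,noexec,relatime,cpu 0 0
"""

for mount_point, options in cgroup_mounts(SAMPLE):
    print(mount_point, options)
```

On a real Linux host, the same function can be given the contents of /proc/mounts, for example cgroup_mounts(open("/proc/mounts").read()).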

Capabilities[20]


Linux uses what are known as "POSIX capabilities".[9] These capabilities partition the all-powerful root privilege into a set of distinct privileges. You can see a full list of available capabilities in the Linux man pages. Docker drops all capabilities except those needed, using a whitelist rather than a blacklist approach.

Your average server (bare metal or virtual machine) needs to run a bunch of processes as root. Those typically include:
  • SSH 
  • cron 
  • syslogd 
  • Hardware management tools (e.g., load modules) 
  • Network configuration tools (e.g., to handle DHCP, WPA, or VPNs),
and much more.

A container is very different, because almost all of those tasks are handled by the infrastructure around the container. By default, Docker starts containers with a restricted set of capabilities. In most cases, containers will not need "real" root privileges at all. For example, processes (like web servers) that just need to bind on a port below 1024 do not have to run as root: they can just be granted the CAP_NET_BIND_SERVICE capability instead. Containers can therefore run with a reduced capability set, meaning that "root" within a container has far fewer privileges than the real "root".
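
The capability set of a process is reported as a hexadecimal bit mask (for example, the CapEff field in /proc/<pid>/status), with one bit per capability. The Python sketch below decodes such a mask; it is a minimal illustration and not Docker code. The function names decode_caps and effective_caps are made up here, and the table deliberately lists only a handful of bit numbers (as defined in the kernel header linux/capability.h), so unlisted bits are printed by number.

```python
# Partial table of capability bit numbers; only a few entries are
# listed to keep the sketch short.
CAP_BITS = {
    0: "CAP_CHOWN",
    5: "CAP_KILL",
    10: "CAP_NET_BIND_SERVICE",
    12: "CAP_NET_ADMIN",
    21: "CAP_SYS_ADMIN",
}


def decode_caps(hex_mask):
    """Translate a hexadecimal capability mask into a list of names."""
    mask = int(hex_mask, 16)
    names = []
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            # Fall back to the bit number for capabilities not in the table.
            names.append(CAP_BITS.get(bit, "bit %d" % bit))
        bit += 1
    return names


def effective_caps(status_text):
    """Find the CapEff line in /proc/<pid>/status text and decode it."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return decode_caps(line.split()[1])
    return []


print(decode_caps("0000000000000400"))
```

A mask with only bit 10 set corresponds to a process holding CAP_NET_BIND_SERVICE and nothing else, which is enough to bind to a port below 1024 without the rest of root's privileges.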

Capabilities are just one of the many security features provided by modern Linux kernels. To harden a Docker host, you can also leverage other existing, well-known systems such as AppArmor and SELinux.
If your distribution comes with security model templates for Docker containers, you can use them out of the box. For instance, Docker ships a template that works with AppArmor and Red Hat comes with SELinux policies for Docker.

References

  1. LXC—Linux containers.
  2. Control Centre: The systemd Linux init system
  3. The virtualization API: libvirt
  4. Solomon Hykes and others. What is Docker?
  5. How is Docker different from a normal virtual machine? (Stack Overflow)
  6. Docker 0.9: introducing execution drivers and libcontainer
    • Uses the layered filesystem AuFS.
  7. Is there a formula for calculating the overhead of a Docker container?
  8. An Updated Performance Comparison of Virtual Machines and Linux Containers
  9. capabilities(7) - Linux man page
  10. Netlink (Wikipedia)
  11. The lost packages of docker
  12. ebtables/iptables interaction on a Linux-based bridge
  13. Comparison Between AppArmor and SELinux
  14. The docker-proxy (netfilter)
  15. Hardware isolation
  16. Understand the architecture (docker)
  17. Linux kernel capabilities FAQ
  18. Docker: Differences between Container and Full VM (Xml and More)
  19. Docker vs VMs
    • There is one key metric where Docker containers are weaker than virtual machines, and that's "isolation". Intel's VT-d and VT-x technologies provide virtual machines with ring -1 hardware isolation, of which they take full advantage. This helps prevent virtual machines from breaking out and interfering with each other.
  20. Docker Security
  21. Introduction to Control Groups (Cgroups)
  22. Docker Runtime Metrics
    • Control groups are exposed through a pseudo-filesystem. In recent distros, you should find this filesystem under /sys/fs/cgroup
    • On older systems, the control groups might be mounted on /cgroup, without distinct hierarchies.
      • To figure out where your control groups are mounted, you can run:
        • $ grep cgroup /proc/mounts
  23. Oracle WebLogic Server on Docker Containers (white paper)
  24. WebLogic on Docker (GitHub)
    • Sample Docker configurations to facilitate installation, configuration, and environment setup for DevOps users. This project includes quick start dockerfiles and samples for both WebLogic 12.1.3 and 12.2.1 based on Oracle Linux and Oracle JDK 8 (Server).

Sunday, November 1, 2015

Docker: Differences between Container and Full VM

A virtual machine (VM) is an emulation of a particular computer system. Virtual machines operate based on the computer architecture and functions of a real or hypothetical computer, and their implementations may involve specialized hardware, software, or a combination of both.

In this article, we will examine the differences between a Docker Container and a Full VM  (see Note 1).

Docker Container


Docker is a facility for creating encapsulated computer environments; each encapsulated computer environment is called a container.[2,7]

Starting up a Docker container is lightning fast because each container shares the host computer's copy of the kernel.
  • However, each container has its own isolated Linux user-space environment
  • Sharing the kernel means there's no hypervisor and no extended bootup

In contrast, virtual machine implementations such as KVM, VirtualBox or VMware work differently.


Terminology

  • Host OS vs Guest OS
    • Host OS
      • is the original OS installed on a computer
    • Guest OS
      • is installed in a virtual machine or disk partition in addition to the host or main OS
        • In virtualization, a guest OS can be different from the host OS
        • In disk partitioning, a guest OS must be the same as the host OS
  • Hypervisor (or virtual machine monitor)
    • is a piece of computer software, firmware or hardware that creates and runs virtual machines.
    • A computer on which a hypervisor is running one or more virtual machines is defined as a host machine
    • Each virtual machine is called a guest machine.
  • Docker Container
    • An encapsulated computer environment created by Docker
    • Docker on Linux platforms
      • Builds on top of facilities provided by the Linux kernel (primarily cgroups and namespaces)
      • Unlike a virtual machine, does not require or include a separate operating system
    • Docker on non-Linux platforms
      • Uses a Linux virtual machine to run the containers (see Note 1)
  • Docker daemon
    • is the persistent process that manages containers. 
      • Docker uses the same binary for both the daemon and client.
    • uses Linux-specific kernel features


Container vs Full VM


A fully virtualized system gets its own set of resources allocated to it, and does minimal sharing. You get more isolation, but it is much heavier (requires more resources). With Docker containers you get less isolation, but they are more lightweight and require fewer resources, so you could easily run thousands on a host without it even blinking.[1]

Basically, a Docker container (see Note 1) and a full VM have different fundamental goals:
  • A VM aims to fully emulate a foreign environment
    • The hypervisor in a full VM implementation is required to translate commands between the guest OS and the host OS
    • Each VM requires a full copy of the OS, the application being run and any supporting libraries
    • If you need to simultaneously run different operating systems (like Windows, OS X or BSD), or run programs compiled for other operating systems, you need a full virtual machine implementation.
      • In contrast, the container OS (or, more accurately, the kernel) must be the same as the host OS and is shared between container and host (see Note 1).
  • A container aims to make applications portable and self-contained
    • Each container shares the host computer's copy of the kernel. 
      • This means there's no hypervisor and no extended bootup.
    • The container engine is responsible for starting and stopping containers in a similar way to the hypervisor on a VM. 
      • However, processes running inside containers are equivalent to native processes on the host and do not incur the overheads associated with hypervisor execution.

Notes

  1. In this article, we focus only on Docker implementations on Linux platforms. In other words, our discussion here excludes non-Linux platforms (e.g., Windows and Mac OS X).[2]
    • Because the Docker daemon uses Linux-specific kernel features, you can’t run Docker natively in either Windows or Mac OS X.
    • Docker on non-Linux platforms uses a Linux virtual machine to run the containers.

References

  1. How is Docker different from a normal virtual machine? (Stack Overflow)
  2. Newbie's Overview of Docker
  3. Supported Installation (Docker)
  4. EXTERIOR: Using Dual-VM Based External Shell for Guest-OS Introspection, Configuration, and Recovery
  5. Comparing Virtual Machines and Linux Containers Performance
  6. An Updated Performance Comparison of Virtual Machines and Linux Containers
  7. Security and Isolation Implementation in Docker Containers