Xml and More: November 2015

Saturday, November 7, 2015

Security and Isolation Implementation in Docker Containers

Multitenancy is regarded as an important feature of cloud computing. If we consider the applications running in a container to be a tenant, the goal of good security-and-isolation design is to ensure that tenants running on a host only use resources visible to them.

As container technology evolves, its implementation of security, isolation and resource control has been continually improved. In this article, we will review how Docker containers achieve security and isolation by utilizing native container features of Linux such as namespaces, cgroups, capabilities, etc.

Virtualization and Isolation

Operating-system-level virtualization, containers, zones, or even "chroot with steroids" are names for the same concept of user-space isolation. Products such as Docker make use of user-space isolation on top of OS-level virtualization facilities to provide extra security.

Since version 0.9, Docker includes the libcontainer library as its own way to directly use virtualization facilities provided by the Linux kernel, in addition to using abstracted virtualization interfaces via LXC,^[1] systemd-nspawn,^[2] and libvirt.^[3]

These virtualization libraries all utilize native container features of Linux (see Diagram above):

namespaces
cgroups
capabilities

and more. Docker combines these components into a wrapper which it calls a container format.

libcontainer

The default container format is called libcontainer. Docker also supports traditional Linux containers using LXC. In the future, Docker may support other container formats, for example, by integrating with BSD Jails or Solaris Zones.

An execution driver is the implementation of a specific container format and is used for running Docker containers. In the latest release, libcontainer

Is the default execution driver for running Docker containers
Is shipped alongside the LXC driver
Is a pure Go library developed to access the kernel’s container APIs directly, without any other dependencies

Docker out of the box can now manipulate namespaces, control groups, capabilities, apparmor profiles, network interfaces and firewalling rules – all in a consistent and predictable way, and without depending on LXC or any other userland package.^[6]
You provide a root filesystem and a configuration on how libcontainer is supposed to execute a container and it does the rest.
It allows spawning new containers or attaching to an existing container.
In fact, libcontainer delivered the much-needed stability that led the team to make it the default.

As of Docker 0.9, LXC is now optional

Note that the LXC driver will continue to be supported going forward.

To switch back to the LXC driver, simply restart the Docker daemon with

docker -d -e lxc

namespaces

Docker isn't virtualization as such; instead, it's an abstraction on top of the kernel's support for namespaces, which provide the isolated workspace (or container). When you run a container, Docker creates a set of namespaces for that container.
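One way to see these namespaces from the host is to read the symbolic links the kernel exposes under /proc/<pid>/ns/. Two processes are in the same namespace when the inode numbers in those links match. The Python sketch below is a minimal illustration assuming a Linux host; the helper names (parse_ns_link, namespaces_of, shared_namespaces) are hypothetical and not part of Docker or libcontainer.

```python
import os
import re

# Namespace types discussed in this section; the kernel exposes each one
# as a symbolic link under /proc/<pid>/ns/.
NAMESPACE_TYPES = ("pid", "net", "ipc", "mnt", "uts")

# Link targets look like "pid:[4026531836]": a type, then an inode number.
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a link target such as 'pid:[4026531836]' into ('pid', 4026531836)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def namespaces_of(pid="self"):
    """Return {namespace type: inode} for a process; empty when /proc is unavailable."""
    result = {}
    for ns_type in NAMESPACE_TYPES:
        try:
            target = os.readlink(f"/proc/{pid}/ns/{ns_type}")
        except OSError:
            continue  # no /proc (not Linux) or not allowed to inspect this process
        result[ns_type] = parse_ns_link(target)[1]
    return result


def shared_namespaces(pid_a, pid_b):
    """List the namespace types in which two processes share the same instance."""
    a, b = namespaces_of(pid_a), namespaces_of(pid_b)
    return [ns for ns in NAMESPACE_TYPES if ns in a and a[ns] == b.get(ns)]


if __name__ == "__main__":
    for ns_type, inode in namespaces_of().items():
        print(f"{ns_type:4} {inode}")
```

On a Docker host, running as root (reading another process's ns links needs ptrace-level permission), `docker inspect --format '{{.State.Pid}}' <container>` prints the host PID of a container's main process. For a container started with default settings, shared_namespaces(1, that_pid) would be expected to come back empty.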

Some of the namespaces that Docker uses on Linux are:

pid namespace

Used for process isolation (PID: Process ID).
Processes running inside the container appear to be running on a normal Linux system although they are sharing the underlying kernel with processes located in other namespaces.
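The same process carries a different PID in each PID namespace it belongs to. On Linux 4.1 and later, /proc/<pid>/status includes an NSpid line listing those PIDs from the outermost namespace to the innermost, so a container's main process typically shows its host PID followed by 1. The parser below is a hypothetical helper for illustration; it returns an empty list on older kernels that lack the field.

```python
def parse_nspid(status_text):
    """Return the PIDs from the NSpid line of /proc/<pid>/status text.

    The list runs from the outermost PID namespace (the host) to the
    innermost one (the container). An empty list means the kernel does
    not report NSpid.
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []


def read_nspid(pid="self"):
    """Read and parse /proc/<pid>/status for a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        return parse_nspid(status.read())


# Example status excerpt for a process that is PID 12345 on the host
# and PID 1 inside its container.
SAMPLE = "Name:\tnginx\nNSpid:\t12345\t1\n"
print(parse_nspid(SAMPLE))  # [12345, 1]
```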

net namespace

Used for managing network interfaces (NET: Networking).
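By default, Docker attaches each container to a virtual bridge on the host and uses DNAT (destination network address translation). DNAT allows you to configure the container's networking independently of the host's and gives you a convenient interface for forwarding only the ports you want between them.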

However, you can replace this with a bridge to a physical interface.
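To make the DNAT idea concrete, the sketch below builds the kind of iptables rule that forwards one host port to a port inside a container, roughly what publishing a port with `docker run -p 8080:80` sets up. It is simplified and rests on assumptions: the DOCKER chain name and the 172.17.0.2 container address are typical defaults used for illustration, the function name is hypothetical, and Docker's actual rules vary by version (they add further matches and may route some traffic through docker-proxy). The function only builds the argument list; it does not run iptables.

```python
import shlex


def dnat_rule(host_port, container_ip, container_port, proto="tcp", chain="DOCKER"):
    """Build an iptables command that DNATs host_port to container_ip:container_port."""
    for port in (host_port, container_port):
        if not 1 <= port <= 65535:
            raise ValueError(f"port out of range: {port}")
    if proto not in ("tcp", "udp"):
        raise ValueError(f"unsupported protocol: {proto}")
    return [
        "iptables", "-t", "nat",  # destination NAT rules live in the nat table
        "-A", chain,              # append to the (assumed) DOCKER chain
        "-p", proto, "--dport", str(host_port),
        "-j", "DNAT",
        "--to-destination", f"{container_ip}:{container_port}",
    ]


# Prints:
# iptables -t nat -A DOCKER -p tcp --dport 8080 -j DNAT --to-destination 172.17.0.2:80
print(shlex.join(dnat_rule(8080, "172.17.0.2", 80)))
```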

ipc namespace

Used for managing access to IPC resources (IPC: InterProcess Communication).

mnt namespace

Used for managing mount-points (MNT: Mount).

uts namespace

Used for isolating the hostname and NIS domain name (UTS: Unix Timesharing System).

These isolation benefits naturally come with costs. Depending on your network access patterns and memory constraints, you may choose how to configure namespaces for your containers with Docker.
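One way this choice shows up in practice is Docker's option to let a container join one of the host's namespaces instead of getting a fresh one, for example `--net=host` to skip the bridge and NAT layer. The sketch below assembles a `docker run` command line from a set of namespaces to share. The flag spellings (--net, --pid, --ipc and --uts with the value host) come from Docker 1.x releases later than the 0.9 release discussed above, so check `docker run --help` for your version; the helper name is hypothetical, and it only builds the command line.

```python
# Namespaces that this sketch can share with the host, mapped to docker run flags.
# The mnt namespace is not covered here.
HOST_NAMESPACE_FLAGS = {
    "net": "--net=host",  # use the host's network stack directly (no bridge/NAT)
    "pid": "--pid=host",  # see and signal host processes
    "ipc": "--ipc=host",  # share System V IPC objects and POSIX message queues
    "uts": "--uts=host",  # use the host's hostname
}


def docker_run_args(image, share_with_host=()):
    """Build a docker run argument list that shares the given namespaces with the host."""
    unknown = set(share_with_host) - set(HOST_NAMESPACE_FLAGS)
    if unknown:
        raise ValueError(f"cannot share namespaces: {sorted(unknown)}")
    args = ["docker", "run", "--rm"]
    # Sorted so the output is deterministic regardless of input order.
    args += [HOST_NAMESPACE_FLAGS[ns] for ns in sorted(set(share_with_host))]
    args.append(image)
    return args


print(" ".join(docker_run_args("nginx", ["net"])))
# docker run --rm --net=host nginx
```

Each shared namespace trades isolation for convenience or speed: with `--net=host`, the container's processes bind directly to host interfaces, which avoids NAT overhead but also removes network isolation.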

cgroups (or Control Groups)

Docker on Linux also makes use of another technology called cgroups. Because each container is a process, all normal Linux resource management facilities such as scheduling and cgroups apply to containers. Furthermore, there is only one level of resource allocation and scheduling, because a containerized Linux system has only one kernel and the kernel has full visibility into the containers.

In summary, cgroups allow Docker to

Group processes and manage their aggregate resource consumption
Share available hardware resources among containers
Limit the memory and CPU consumption of containers
Provide a reliable way of terminating all processes inside a container

A container can be resized by simply changing the limits of its corresponding cgroup.
You can gather information about resource usage in the container by examining Linux control groups in /sys/fs/cgroup.
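The resize and termination points above can be sketched against the cgroup filesystem. This is a minimal Python illustration under stated assumptions: cgroup v1 is mounted at /sys/fs/cgroup, and Docker's cgroupfs layout places each container in a directory named after its full container ID under a docker parent (other setups, such as the systemd cgroup driver, use different paths). All function names are hypothetical, and writing limits or sending signals requires root on a real host.

```python
import os
import signal

# Assumed cgroup v1 mount point and Docker cgroupfs layout:
#   /sys/fs/cgroup/<controller>/docker/<container-id>/
CGROUP_ROOT = "/sys/fs/cgroup"

# Binary units, matching how sizes such as `docker run -m 512m` are interpreted.
_UNITS = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text):
    """Convert a size such as '512m' or '1g' into a byte count."""
    text = text.strip().lower()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)


def container_cgroup(container_id, controller, root=CGROUP_ROOT):
    """Path of one controller's cgroup directory for a container."""
    return os.path.join(root, controller, "docker", container_id)


def resize_memory(container_id, size, root=CGROUP_ROOT):
    """Resize a running container by rewriting its memory limit."""
    path = os.path.join(container_cgroup(container_id, "memory", root),
                        "memory.limit_in_bytes")
    with open(path, "w") as limit_file:
        limit_file.write(str(parse_size(size)))


def pids_in_cgroup(procs_text):
    """Parse the newline-separated PIDs found in a cgroup.procs file."""
    return [int(token) for token in procs_text.split()]


def kill_container_processes(container_id, root=CGROUP_ROOT, send=os.kill):
    """Send SIGKILL to every process listed in the container's cgroup.

    Any v1 controller directory lists the same processes; memory is used here.
    """
    path = os.path.join(container_cgroup(container_id, "memory", root), "cgroup.procs")
    with open(path) as procs_file:
        pids = pids_in_cgroup(procs_file.read())
    for pid in pids:
        try:
            send(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the process exited after the list was read
    return pids
```

Because every process forked inside the container lands in the same cgroup, cgroup.procs covers the whole container no matter how deep the process tree goes. One caveat: a process could fork between reading the list and sending signals; freezing the cgroup first (the v1 freezer controller) is one way to close that window.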

Capabilities^[20]

Linux uses "POSIX capabilities".^[9] These capabilities partition the all-powerful root privilege into a set of distinct privileges. You can find the full list of available capabilities in the capabilities(7) man page. Docker drops all capabilities except those needed, a whitelist rather than a blacklist approach.

Your average server (bare metal or virtual machine) needs to run a bunch of processes as root. Those typically include:

SSH
cron
syslogd
Hardware management tools (e.g., load modules)
Network configuration tools (e.g., to handle DHCP, WPA, or VPNs),

and much more.

A container is very different, because almost all of those tasks are handled by the infrastructure around the container. By default, Docker starts containers with a restricted set of capabilities. In most cases, containers will not need “real” root privileges at all. For example, processes (like web servers) that just need to bind to a port below 1024 do not have to run as root: they can just be granted CAP_NET_BIND_SERVICE instead. Containers can therefore run with a reduced capability set, meaning that “root” within a container has far fewer privileges than the real “root”.
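The capability model described in this section can be sketched with plain bit arithmetic. The kernel reports a process's capability sets as hexadecimal bitmasks in /proc/<pid>/status (CapEff is the effective set), where bit n stands for capability number n in capabilities(7); libcap's `capsh --decode=<mask>` performs the same decoding from the shell. The Python sketch below decodes such masks and models the whitelist by starting from a default set and applying `--cap-drop`/`--cap-add` style adjustments. The 14-capability default set is the one commonly reported for Docker 1.x and is an assumption to check against your Docker version; all function names are hypothetical. The Docker CLI spells names without the CAP_ prefix (for example `--cap-add NET_BIND_SERVICE`), whereas this sketch keeps the prefix used by capabilities(7).

```python
# Capability names indexed by bit number, as listed in capabilities(7)
# (capabilities 0 to 37; newer kernels define more).
CAPABILITY_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID",
    "CAP_SETPCAP", "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ",
]

# Assumed default whitelist (commonly reported for Docker 1.x). Anything not
# listed here, such as CAP_SYS_ADMIN or CAP_SYS_MODULE, is dropped.
DEFAULT_CAPS = frozenset({
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_FOWNER", "CAP_FSETID", "CAP_KILL",
    "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP", "CAP_NET_BIND_SERVICE",
    "CAP_NET_RAW", "CAP_SYS_CHROOT", "CAP_MKNOD", "CAP_AUDIT_WRITE", "CAP_SETFCAP",
})


def to_mask(caps):
    """Encode capability names in the bitmask format used by /proc/<pid>/status."""
    mask = 0
    for name in caps:
        mask |= 1 << CAPABILITY_NAMES.index(name)
    return mask


def decode_caps(mask):
    """Decode a capability bitmask into names; unknown bits are reported by number."""
    names = []
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            if bit < len(CAPABILITY_NAMES):
                names.append(CAPABILITY_NAMES[bit])
            else:
                names.append(f"capability {bit}")
        bit += 1
    return names


def caps_from_status(status_text, field="CapEff"):
    """Decode one capability field (CapEff, CapPrm, CapBnd, ...) from status text."""
    for line in status_text.splitlines():
        if line.startswith(field + ":"):
            return decode_caps(int(line.split()[1], 16))
    raise KeyError(field)


def effective_caps(cap_add=(), cap_drop=()):
    """Apply --cap-drop then --cap-add style adjustments to the default whitelist."""
    caps = set() if "ALL" in cap_drop else set(DEFAULT_CAPS) - set(cap_drop)
    return caps | set(cap_add)


# A web server that only needs to bind to port 80:
web = effective_caps(cap_add=["CAP_NET_BIND_SERVICE"], cap_drop=["ALL"])
print(f"{to_mask(web):016x}")           # 0000000000000400
print(f"{to_mask(DEFAULT_CAPS):016x}")  # 00000000a80425fb
print(decode_caps(to_mask(web)))        # ['CAP_NET_BIND_SERVICE']
```

Passing the contents of /proc/self/status from a root shell inside a default container to caps_from_status would be expected to list the names in DEFAULT_CAPS, provided the assumed whitelist matches your Docker version.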

Capabilities are just one of the many security features provided by modern Linux kernels. To harden a Docker host, you can also leverage other existing, well-known systems like

TOMOYO
AppArmor
SELinux
GRSEC, etc.

If your distribution comes with security model templates for Docker containers, you can use them out of the box. For instance, Docker ships a template that works with AppArmor and Red Hat comes with SELinux policies for Docker.

Photo Credit

Docker Blog

References

[1] LXC - Linux containers.
[2] Control Centre: The systemd Linux init system
[3] The virtualization API: libvirt
[4] Solomon Hykes and others. What is Docker?
[5] How is Docker different from a normal virtual machine? (Stack Overflow)
[6] Docker 0.9: introducing execution drivers and libcontainer

Uses layered filesystems (AuFS).

[7] Is there a formula for calculating the overhead of a Docker container?
[8] An Updated Performance Comparison of Virtual Machines and Linux Containers
[9] capabilities(7) - Linux man page
[10] Netlink (Wikipedia)
[11] The lost packages of docker
[12] ebtables/iptables interaction on a Linux-based bridge
[13] Comparison Between AppArmor and SELinux
[14] The docker-proxy (netfilter)
[15] Hardware isolation
[16] Understand the architecture (docker)
[17] Linux kernel capabilities FAQ
[18] Docker: Differences between Container and Full VM (Xml and More)
[19] Docker vs VMs

There is one key metric where Docker containers are weaker than virtual machines, and that’s “isolation”. Intel’s VT-d and VT-x technologies provide virtual machines with ring -1 hardware isolation, of which they take full advantage. This helps keep virtual machines from breaking down and interfering with each other.

[20] Docker Security
[21] Introduction to Control Groups (Cgroups)
[22] Docker Runtime Metrics

Control groups are exposed through a pseudo-filesystem. In recent distros, you should find this filesystem under /sys/fs/cgroup.
On older systems, the control groups might be mounted on /cgroup, without distinct hierarchies.

To figure out where your control groups are mounted, you can run:

$ grep cgroup /proc/mounts

[23] Oracle WebLogic Server on Docker Containers (white paper)
[24] WebLogic on Docker (GitHub)

Sample Docker configurations to facilitate installation, configuration, and environment setup for DevOps users. This project includes quick start Dockerfiles and samples for both WebLogic 12.1.3 and 12.2.1 based on Oracle Linux and Oracle JDK 8 (Server).

Sunday, November 1, 2015

Docker: Differences between Container and Full VM

A virtual machine (VM) is an emulation of a particular computer system. Virtual machines operate based on the computer architecture and functions of a real or hypothetical computer, and their implementations may involve specialized hardware, software, or a combination of both.

In this article, we will examine the differences between a Docker Container and a Full VM (see Note 1).

Docker Container

Docker is a facility for creating encapsulated computer environments; each encapsulated computer environment is called a container.^[2,7]

Starting up a Docker container is lightning fast because:

Each container shares the host computer's copy of the kernel.

However, each container runs its own copy of the Linux user space.

This means there's no hypervisor and no extended bootup.

In contrast, virtual machines implemented with KVM, VirtualBox or VMware work differently: each one runs on a hypervisor and boots a full guest OS.

Terminology

Host OS vs Guest OS

Host OS

is the original OS installed on a computer

Guest OS

is installed in a virtual machine or disk partition in addition to the host or main OS

In virtualization, a guest OS can be different from the host OS
In disk partitioning, a guest OS must be the same as the host OS

Hypervisor (or virtual machine monitor)

is a piece of computer software, firmware or hardware that creates and runs virtual machines.
A computer on which a hypervisor is running one or more virtual machines is defined as a host machine.
Each virtual machine is called a guest machine.

Docker Container

An encapsulated computer environment created by Docker
Docker on Linux platforms

Building on top of facilities provided by the Linux kernel (primarily cgroups and namespaces)
Unlike a virtual machine, does not require or include a separate operating system

Docker on non-Linux platforms

Uses a Linux virtual machine to run the containers.

Docker daemon

is the persistent process that manages containers.

Docker uses the same binary for both the daemon and client.

uses Linux-specific kernel features

Container vs Full VM

A fully virtualized system gets its own set of resources allocated to it and does minimal sharing. You get more isolation, but it is much heavier (requires more resources). With Docker containers you get less isolation, but they are more lightweight and require fewer resources. So you could easily run thousands of them on a host, and it doesn't even blink.^[1]

Basically, a Docker container (see Note 1) and a full VM have different fundamental goals:

A VM is meant to fully emulate a foreign environment

In a full VM implementation, a hypervisor is required to translate commands between the guest OS and the host OS.
Each VM requires a full copy of the OS, the application being run and any supporting libraries.
If you need to simultaneously run different operating systems (like Windows, OS X or BSD), or run programs compiled for other operating systems, you need a full virtual machine implementation.

In contrast, the container OS (or, more accurately, the kernel) must be the same as the host OS and is shared between container and host (see Note 1).

A container is meant to make applications portable and self-contained

Each container shares the host computer's copy of the kernel.

This means there's no hypervisor and no extended bootup.

The container engine is responsible for starting and stopping containers in a similar way to the hypervisor on a VM.

However, processes running inside containers are equivalent to native processes on the host and do not incur the overheads associated with hypervisor execution.

Notes

In this article, we only focus on Docker implementations on Linux platforms. In other words, our discussion here excludes non-Linux platforms (e.g., Windows, Mac OS X).^[2]

Because the Docker daemon uses Linux-specific kernel features, you can’t run Docker natively in either Windows or Mac OS X.
Docker on non-Linux platforms uses a Linux virtual machine to run the containers.

Photo Credit

Docker Blog
