Insights on Oracle & Tech: Security and Isolation Implementation in Docker Containers

Multitenancy is regarded as an important feature of cloud computing. If we consider the applications running in a container to be a tenant, the goal of good security and isolation design is to ensure that tenants running on a host can only use the resources visible to them.

As container technology evolves, its implementation of security, isolation, and resource control has been continually improved. In this article, we review how Docker containers achieve security and isolation by using native Linux container features such as namespaces, cgroups, and capabilities.

Virtualization and Isolation

Operating-system-level virtualization, containers, zones, and even "chroot on steroids" are all names for the same concept: user-space isolation. Products such as Docker build user-space isolation on top of OS-level virtualization facilities to provide extra security.

Since version 0.9, Docker includes the libcontainer library as its own way to use the virtualization facilities provided by the Linux kernel directly, in addition to using abstracted virtualization interfaces via LXC^[1], systemd-nspawn^[2], and libvirt^[3].

These virtualization libraries all utilize native container features of Linux (see Diagram above):

namespaces
cgroups
capabilities

and more. Docker combines these components into a wrapper which it calls a container format.

libcontainer

The default container format is called libcontainer. Docker also supports traditional Linux containers using LXC. In the future, Docker may support other container formats, for example, by integrating with BSD Jails or Solaris Zones.

An execution driver is the implementation of a specific container format and is used for running Docker containers. In the latest release, libcontainer:

Is the default execution driver for running Docker containers
Is shipped alongside the LXC driver
Is a pure Go library which is developed to access the kernel’s container APIs directly, without any other dependencies

Out of the box, Docker can now manipulate namespaces, control groups, capabilities, AppArmor profiles, network interfaces, and firewalling rules, all in a consistent and predictable way and without depending on LXC or any other userland package.^[6]
You provide a root filesystem and a configuration describing how libcontainer should execute a container, and it does the rest.
It allows spawning new containers or attaching to an existing container.
In fact, libcontainer delivered the much-needed stability that led the team to make it the default.

As of Docker 0.9, LXC is now optional

Note that the LXC driver will continue to be supported going forward.

To switch back to the LXC driver, restart the Docker daemon with:

docker -d -e lxc

namespaces

Docker isn't virtualization as such; instead, it's an abstraction on top of the kernel's support for namespaces, which provide the isolated workspace (or container). When you run a container, Docker creates a set of namespaces for that container.

Some of the namespaces that Docker uses on Linux are:

pid namespace

Used for process isolation (PID: Process ID).
Processes running inside the container appear to be running on a normal Linux system although they are sharing the underlying kernel with processes located in other namespaces.
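The kernel makes this PID-namespace mapping visible without Docker itself: the NSpid field of /proc/<pid>/status (Linux 4.1 and later) lists a process's PID in every nested PID namespace, outermost first. A minimal sketch, assuming bash and awk; the function names nspid and inner_pid are our own, not part of any Docker tool:

```shell
#!/usr/bin/env bash
# Print a process's PID as seen from each nested PID namespace,
# outermost (the namespace of this /proc mount) first, innermost last.
# Requires Linux 4.1+ for the NSpid field.
nspid() {
  local pid="${1:-self}"
  # Drop the "NSpid:" label and print the remaining PIDs space-separated.
  awk '/^NSpid:/ { $1 = ""; sub(/^ /, ""); print }' "/proc/${pid}/status"
}

# Innermost PID: the value the process itself gets from getpid().
inner_pid() {
  nspid "$1" | awk '{ print $NF }'
}

echo "PIDs of this shell per namespace level: $(nspid $$)"
```

Run on the host against the host PID of a container's main process, nspid should print two numbers: the host PID followed by the in-container PID, which is typically 1 for the container's main process.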

net namespace

Used for managing network interfaces (NET: Networking).
DNAT allows you to configure your container's networking independently of your host's, and it gives you a convenient interface for forwarding only the ports you want between them.

However, you can replace this with a bridge to a physical interface.
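To make the DNAT idea concrete, the helper below only builds the text of a simplified iptables rule that forwards a host TCP port to a port inside a container's network namespace; it does not run it. This is a sketch under stated assumptions: dnat_rule is a hypothetical name, the rule targets PREROUTING for traffic arriving from other hosts, and Docker itself manages its own nat chain (DOCKER) when you publish ports with `docker run -p`, so its real rules differ in detail.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (but do not execute) a simplified DNAT rule that
# forwards TCP traffic arriving on HOST_PORT to CONTAINER_IP:CONTAINER_PORT.
# Locally generated connections would additionally need a rule in OUTPUT.
dnat_rule() {
  local host_port="$1" container_ip="$2" container_port="$3"
  printf 'iptables -t nat -A PREROUTING -p tcp --dport %s -j DNAT --to-destination %s:%s\n' \
    "$host_port" "$container_ip" "$container_port"
}

# Example: forward host port 8080 to port 80 of a container at 172.17.0.2
# (an address on Docker's default bridge subnet, used here for illustration).
dnat_rule 8080 172.17.0.2 80
```

Only the ports listed in such rules are reachable from outside; everything else in the container's network namespace stays private.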

ipc namespace

Used for managing access to IPC resources (IPC: InterProcess Communication).

mnt namespace

Used for managing mount-points (MNT: Mount).

uts namespace

Used for isolating system identifiers, namely the hostname and NIS domain name, so that each container can have its own hostname (UTS: Unix Timesharing System).
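Each of the namespaces above is visible as a symlink under /proc/<pid>/ns, such as net:[4026531840]; two processes share a namespace exactly when those identifiers match. A minimal sketch, assuming bash and readlink; ns_list and same_ns are hypothetical helper names:

```shell
#!/usr/bin/env bash
# List every namespace a process belongs to as "<type> <identifier>".
ns_list() {
  local pid="${1:-self}" entry
  for entry in /proc/"$pid"/ns/*; do
    printf '%s %s\n' "${entry##*/}" "$(readlink "$entry")"
  done
}

# Exit 0 if two processes are in the same namespace of the given type
# (pid, net, ipc, mnt, uts, ...); exit 2 if either link cannot be read.
same_ns() {
  local a b
  a=$(readlink "/proc/$1/ns/$3") || return 2
  b=$(readlink "/proc/$2/ns/$3") || return 2
  [ "$a" = "$b" ]
}

ns_list $$
```

On a Docker host, running as root, comparing a host shell with a container's main process (whose host PID `docker inspect --format '{{.State.Pid}}' NAME` prints) should report different pid, net, ipc, mnt, and uts namespaces. Reading another user's namespace links requires ptrace-level permission, hence root.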

These isolation benefits naturally come with costs. Depending on your network access patterns and memory constraints, you may choose how to configure namespaces for your containers with Docker.

cgroups (or Control Groups)

Docker on Linux makes use of another technology called cgroups. Because each container is simply a group of processes on the host, all normal Linux resource management facilities such as scheduling and cgroups apply to containers. Furthermore, there is only one level of resource allocation and scheduling, because a containerized Linux system has only one kernel and that kernel has full visibility into the containers.

In summary, cgroups allow Docker to

Group processes and manage their aggregate resource consumption
Share available hardware resources among containers
Limit the memory and CPU consumption of containers
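CPU and memory limits end up as plain values in cgroup files. The sketch below, assuming a cgroup v2 host, converts a fractional CPU count into the "quota period" pair that cpu.max expects and a size such as 512M into the byte count memory.max accepts. Docker documents `--cpus=1.5` as equivalent to a 100000 µs period with a 150000 µs quota; the function names and the group path in the comments are hypothetical.

```shell
#!/usr/bin/env bash
# Convert a CPU count into a cgroup v2 cpu.max value "<quota> <period>" (µs).
# With the default 100000 µs period, 1.5 CPUs allows 150000 µs of CPU time
# per period. awk handles the fractional math; +0.5 rounds to nearest.
cpus_to_cpu_max() {
  local cpus="$1" period="${2:-100000}"
  awk -v c="$cpus" -v p="$period" 'BEGIN { printf "%d %d\n", c * p + 0.5, p }'
}

# Convert a size with an optional K/M/G suffix (base 1024) into bytes.
mem_to_bytes() {
  local size="$1" num unit
  num="${size%[KkMmGg]}"
  unit="${size#"$num"}"
  case "$unit" in
    "")   echo "$num" ;;
    [Kk]) echo $(( num * 1024 )) ;;
    [Mm]) echo $(( num * 1024 * 1024 )) ;;
    [Gg]) echo $(( num * 1024 * 1024 * 1024 )) ;;
  esac
}

cpus_to_cpu_max 1.5   # 150000 100000
mem_to_bytes 512M     # 536870912

# Applying the limits requires root; "mygroup" is a hypothetical cgroup:
# cpus_to_cpu_max 1.5 > /sys/fs/cgroup/mygroup/cpu.max
# mem_to_bytes 512M   > /sys/fs/cgroup/mygroup/memory.max
```

Writing new values into the same files while processes are running is what makes resizing a container cheap.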

A container can be resized by simply changing the limits of its corresponding cgroup.
You can gather information about resource usage in the container by examining Linux control groups in /sys/fs/cgroup.
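A minimal sketch of that lookup, assuming bash, GNU coreutils `stat`, and cgroups mounted at /sys/fs/cgroup (older systems may mount them elsewhere). It detects whether the host uses the unified cgroup v2 hierarchy, finds a process's cgroup from /proc/<pid>/cgroup, and reads its memory usage; all function names are our own.

```shell
#!/usr/bin/env bash
cgroup_root=/sys/fs/cgroup   # assumption: standard mount point

# Print 2 for a unified cgroup v2 mount, 1 otherwise (v1 or hybrid layout).
cgroup_version() {
  if [ "$(stat -f -c %T "$cgroup_root")" = cgroup2fs ]; then echo 2; else echo 1; fi
}

# Print the cgroup path of a process: the "0::/path" line on v2,
# or the line of the v1 memory controller ("N:memory:/path").
cgroup_path() {
  local pid="${1:-self}"
  if [ "$(cgroup_version)" = 2 ]; then
    awk -F: '$1 == "0" && $2 == "" { print $3 }' "/proc/$pid/cgroup"
  else
    awk -F: '$2 ~ /(^|,)memory(,|$)/ { print $3 }' "/proc/$pid/cgroup"
  fi
}

# Current memory usage in bytes. The host's root cgroup has no
# memory.current file, and on v1 without cgroup namespaces the path is
# only meaningful when run on the host.
memory_usage_bytes() {
  local path
  path=$(cgroup_path "$1")
  if [ "$(cgroup_version)" = 2 ]; then
    cat "$cgroup_root$path/memory.current"
  else
    cat "$cgroup_root/memory$path/memory.usage_in_bytes"
  fi
}

echo "cgroup v$(cgroup_version), path $(cgroup_path $$)"
```

Given the host PID of a container's main process, memory_usage_bytes should report the aggregate usage of every process in that container's cgroup.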

cgroups also provide a reliable way of terminating all processes inside a container.

Capabilities^[20]

"POSIX capabilities" is what Linux uses.^[9] These capabilities are a partitioning of the all powerful root privilege into a set of distinct privileges. You can see a full list of available capabilities in Linux manpages. Docker drops all capabilities except those needed, a whitelist instead of a blacklist approach.

Your average server (bare metal or virtual machine) needs to run a bunch of processes as root. Those typically include:

SSH
cron
syslogd
Hardware management tools (e.g., load modules)
Network configuration tools (e.g., to handle DHCP, WPA, or VPNs),

and much more.

A container is very different, because almost all of those tasks are handled by the infrastructure around the container. By default, Docker starts containers with a restricted set of capabilities. In most cases, containers will not need “real” root privileges at all. For example, processes (like web servers) that only need to bind to a port below 1024 do not have to run as root: they can be granted the CAP_NET_BIND_SERVICE capability instead. Therefore, containers can run with a reduced capability set, which means that “root” within a container has far fewer privileges than the real “root”.
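CAP_NET_BIND_SERVICE is bit 10 of the capability mask, so a process can check for it by reading its own effective set. A minimal sketch, assuming bash; has_cap and cap_eff are hypothetical helpers, and the check ignores the net.ipv4.ip_unprivileged_port_start sysctl, which can lower the privileged-port boundary on newer kernels.

```shell
#!/usr/bin/env bash
CAP_NET_BIND_SERVICE=10   # bit number from <linux/capability.h>

# Exit 0 if bit number $2 is set in the hexadecimal mask $1.
has_cap() {
  local mask=$(( 16#$1 )) bit="$2"
  (( (mask >> bit) & 1 ))
}

# Effective capability mask of a process, from /proc/<pid>/status.
cap_eff() {
  awk '/^CapEff:/ { print $2 }' "/proc/${1:-self}/status"
}

if has_cap "$(cap_eff $$)" "$CAP_NET_BIND_SERVICE"; then
  echo "this shell may bind ports below 1024"
else
  echo "binding ports below 1024 will likely be refused"
fi
```

Starting a container with `docker run --cap-drop ALL --cap-add NET_BIND_SERVICE ...` should leave a root process inside it with only this bit set, an effective mask of 0000000000000400.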

Capabilities are just one of the many security features provided by modern Linux kernels. To harden a Docker host, you can also leverage other existing, well-known systems like

TOMOYO
AppArmor
SELinux
GRSEC, etc.

If your distribution comes with security model templates for Docker containers, you can use them out of the box. For instance, Docker ships a template that works with AppArmor and Red Hat comes with SELinux policies for Docker.
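To see which security profile actually confines a process, read /proc/<pid>/attr/current, which reports the AppArmor profile or SELinux context assigned by the active Linux Security Module. A minimal sketch, assuming bash; security_label is a hypothetical name, and it prints "unconfined" when no label can be read.

```shell
#!/usr/bin/env bash
# Print the LSM label of a process: an AppArmor profile such as
# "docker-default (enforce)" or an SELinux context. Falls back to
# "unconfined" if no LSM is active or the attribute cannot be read.
security_label() {
  local label=""
  # SELinux labels end with a NUL byte; strip it. Errors are silenced.
  label=$(tr -d '\0' 2>/dev/null < "/proc/${1:-self}/attr/current") || label=""
  echo "${label:-unconfined}"
}

security_label $$
```

Inside a container started by Docker on an AppArmor host, this typically prints docker-default (enforce), the profile generated from the template Docker ships.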

Photo Credit

Docker Blog

References

LXC—Linux containers.
Control Centre: The systemd Linux init system
The virtualization API: libvirt
Solomon Hykes and others. What is Docker?
How is Docker different from a normal virtual machine? (Stackoverflow)
Docker 0.9: introducing execution drivers and libcontainer

Uses layered filesystems AuFS.

Is there a formula for calculating the overhead of a Docker container?
An Updated Performance Comparison of Virtual Machines and Linux Containers
capabilities(7) - Linux man page
Netlink (Wikipedia)
The lost packages of docker
ebtables/iptables interaction on a Linux-based bridge
Comparison Between AppArmor and SELinux
The docker-proxy (netfilter)
Hardware isolation
Understand the architecture (docker)
Linux kernel capabilities FAQ
Docker: Differences between Container and Full VM (Xml and More)
Docker vs VMs

There is one key metric where Docker containers are weaker than virtual machines, and that is isolation. Intel's VT-d and VT-x technologies give virtual machines ring -1 hardware isolation, of which they take full advantage. This helps prevent virtual machines from breaking down and interfering with each other.

Docker Security
Introduction to Control Groups (Cgroups)
Docker Runtime Metrics

Control groups are exposed through a pseudo-filesystem. In recent distros, you should find this filesystem under /sys/fs/cgroup.
On older systems, the control groups might be mounted on /cgroup, without distinct hierarchies.

To figure out where your control groups are mounted, you can run:

$ grep cgroup /proc/mounts

Oracle WebLogic Server on Docker Containers (white paper)
WebLogic on Docker (GitHub)

Sample Docker configurations to facilitate installation, configuration, and environment setup for DevOps users. This project includes quick start dockerfiles and samples for both WebLogic 12.1.3 and 12.2.1 based on Oracle Linux and Oracle JDK 8 (Server).

Cross Column

Saturday, November 7, 2015

Security and Isolation Implementation in Docker Containers

Virtualization and Isolation

libcontainer

namespaces

cgroups (or Control Groups)

Capabilities^[20]

Photo Credit

References
