Chapter 6

Using and Managing Containers

“To see a world in a grain of sand ...”

—William Blake, Auguries of Innocence

Containers have become one of the most interesting and versatile alternatives to

virtual machines for encapsulating applications for cloud execution. They enable

you to run an application without source code modiﬁcation or even recompilation

on your Windows PC, your Mac, or any cloud. In this chapter we look at the

history of their development, the key ideas that make them work, and of course

how to use them on clouds. We focus on Docker container technology, because it is the most widely known and used, is easy to download and install, and is free.

We begin with some basics about containers. We show how to install Jupyter with Docker, and we mention other scientific applications that have been containerized. We then provide a simple example of how to create your own container that you can then run in the cloud or on your laptop.

6.1 Container Basics

People used to think that the best (indeed, in many cases, only) way to encapsulate

software for deployment in the cloud was to create a VM image. That image

could be shared with others, for example by placing it in a repository for Amazon

images or in the Microsoft VM depot. Anybody could then deploy and run the

image within an appropriate data center. However, not all virtualization tools are

the same, so running a VM from Amazon on Azure or other clouds was a real

problem. Endless debates arose about the merits and demerits of this situation,

which usually went something like this: “This is just another form of evil vendor

lock-in!” A great deal of thought was given to ﬁnding ways to address this evilness.

Meanwhile, others realized that the Linux kernel had some nice features that could be used to bound and contain the resource utilization of processes: in particular, control groups and namespace isolation. These features allow for the layering of new private virtual file system components on top of the host file system and a special partitioned process space for applications to run using libraries virtually stored in the layered file system. For all practical purposes, a program running in a container looks like it is running in its own VM, but without the extra baggage of a complete OS. A contained application uses the resources of the host OS, which can even control the amount of resources devoted to each container: for example, the CPU percentage and the amount of memory and disk space.

By mid-2013, a little company called dotCloud released a tool that provided a better way to deploy encapsulated applications. This tool became Docker, and dotCloud became Docker, Inc. (docker.com). Microsoft also figured out how to do the same thing with Windows. While other container technologies exist, we focus primarily on Docker in this book.

Docker allows applications to be provisioned in containers that encapsulate all application dependencies. The application sees a complete, private process space, file system, and network interface isolated from applications in other containers on the same host operating system. Docker isolation provides a way to factor large applications, as well as simple ways for running containers to communicate with each other. On Linux, Docker containers share the kernel of the host; on Windows 10 or Mac, Docker runs them on a small Linux virtual machine (based on Alpine Linux at the time of writing) whose kernel serves every container instance. As we describe below, additional OS features are layered on top of that base. This layering is the key to container portability across clouds.

Docker is designed to support a variety of distributed applications. It is now

widely used in the Internet industry, including by major companies like Yelp,

Spotify, Baidu, Yandex, and eBay. Importantly, Docker is supported by major

public cloud providers, including Google, Microsoft, Amazon, and IBM.

To understand how containers are built and used, one must understand how the file system in a container is layered on top of the existing host services. The key is the Union File System (more precisely, the advanced multilayered unification file system, AuFS) and a special property called copy on write that allows the system to reuse many data objects in multiple containers.

Docker images are composed of layers in the Union File System. The image is itself a stack of read-only directories. The base is a simplified Linux or Windows file system. As illustrated in figure 6.1, additional tools that the container needs are then layered on top of that base, each in its own layer. When the container is run, a final writable file system is layered on top.

Figure 6.1: The Docker Union File System is layered on a standard base.

As an application in the container executes, it uses the writable layer. If it needs to modify an object in the read-only layers, it first copies that object into the writable layer. Otherwise, it uses the data in the read-only layers, which are shared with other container instances. Thus, typically only a little of the container image needs to be actually loaded when a container is run, which means that containers can load and run much faster than virtual machines. In fact, launching a container typically takes less than a second, while starting a virtual machine can take minutes.
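The copy-on-write behavior just described can be mimicked in a few lines of Python. The sketch below is only an analogy, not how Docker or AuFS is implemented: it assumes each layer is a plain dictionary mapping a path to its contents, and it uses collections.ChainMap, which looks keys up from the top layer down and sends all writes to the topmost dictionary. The layer names and file contents are hypothetical.

```python
from collections import ChainMap

# Read-only image layers (hypothetical contents), shared by all containers.
base_layer = {"/bin/sh": "shell binary", "/etc/hosts": "127.0.0.1 localhost"}
tools_layer = {"/usr/bin/python": "python binary"}


def start_container():
    """Return a container view: a fresh writable layer on top of the shared layers."""
    writable_layer = {}
    # ChainMap searches its maps left to right and applies writes to the first one.
    return ChainMap(writable_layer, tools_layer, base_layer)


container_a = start_container()
container_b = start_container()

# Reads fall through to the shared read-only layers.
print(container_a["/bin/sh"])

# A "modification" lands in container A's writable layer only.
container_a["/etc/hosts"] = "127.0.0.1 localhost myapp"

print(container_a["/etc/hosts"])   # A sees its private copy
print(container_b["/etc/hosts"])   # B still sees the shared original
print(base_layer["/etc/hosts"])    # the read-only layer is untouched
```

Starting a second container costs only an empty dictionary here, which loosely mirrors why a new container instance needs so little to be loaded.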

In addition to the file system layers in the container image, you can mount a host machine directory as a file system in the container’s OS. In this way a container can share data with the host. Multiple containers can also share these mounted directories and can use them for basic communication of shared data.

6.2 Docker and the Hub

The Docker website, Docker.com, provides the tools needed to install Docker on a Linux, Mac, or Windows 10 PC. This site also links to the Docker Hub, a public resource where you can store your own containers, and search for and download any of the hundreds of public containers.

Install Jupyter with Docker on your laptop. You must first install Docker on your machine. While the details differ on Linux, Mac, or PC, the installation is a simple process, similar to that of installing a new browser or other desktop application. Follow the download and install instructions on the Docker.com website. Docker does not have a graphical interface: it is based on a command line API. Hence, you need to open a “powershell” or “terminal” window on your machine. The Docker commands are then the same on Linux, Mac, or PC.

Once you have installed Docker, you can verify that it is running by executing the docker ps command, which tells you which containers are running. You should see the following output, as no containers are running.

C:\> docker ps

CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES

C:\>

We launch Jupyter with the docker run command. The following example uses two of the many parameters supported. The first flag, -it, attaches an interactive terminal to the container, so that you see Jupyter print a URL with a token that you can use to connect to the new Jupyter instance. The second flag, -p 8888:8888, binds port 8888 in the container’s IP stack to port 8888 on our machine. Finally, the command specifies the name of the container, jupyter/scipy-notebook, as it can be found in the Docker Hub.

C:\> docker run -it -p 8888:8888 jupyter/scipy-notebook

Copy/paste this URL into your browser when you connect for the
first time, to login with a token:
http://localhost:8888/?token=b9fc19aa8762a6c781308bb8dae27a...

Rerunning the docker ps command shows that our newly started Jupyter notebook is now running. (Confusingly, each of the two output lines is wrapped.)

C:\> docker ps

CONTAINER ID  IMAGE                   COMMAND               CREATED
STATUS        PORTS                   NAMES
6cb4532fa0b   jupyter/scipy-notebook  "tini -- start-note"  6 seconds ago
Up 5 seconds  0.0.0.0:8888->8888/tcp  prickly_meitner

C:\>

The first time you execute this command for a specific container, Docker must search for and download the elements of the container file system, which may take several minutes. Because the container jupyter/scipy-notebook is in the Docker Hub, Docker finds it there and begins to download it. Once the download has completed, it starts the container. The container image is now local; thus, the next time that you run it, it can start in a few seconds.

The docker ps output includes an autogenerated instance name, in this case prickly_meitner. To kill the instance, run docker kill prickly_meitner.

There are a number of standard Docker features that are good to know. The flag -it connects the container’s standard I/O to the shell that ran the docker command. We used that when starting Jupyter, which is why we saw the output. If you do not want to interact with the container while it is running, you can use the flag -d to make it run in detached mode.

Also useful is the Docker mechanism that allows a container to access a disk volume on the host machine, so that a process in your container can save files (when a container is terminated, its file system goes away as well), or access your local data collections. To mount a local directory on your laptop as a volume on the Docker container file system, use the -v localdir:/containername flag. (If you are running on Windows 10, you need to access the Docker settings and give Docker permission to see and modify drive C.)

The following commands illustrate the use of both -it and -v. We first use the docker command on a Mac to launch a Linux Ubuntu container with the Mac’s /tmp directory mounted as /localtmp. Due to -it, we are presented with a command prompt for the newly started Ubuntu container. We then run df in the container to list its file systems, which include /localtmp.

docker run -it -v /tmp:/localtmp ubuntu
root@3148dd31e6c7:/# df
Filesystem     1K-blocks      Used Available Use% Mounted on
none            61890340  41968556  16754860  72% /
tmpfs            1022920         0   1022920   0% /dev
tmpfs            1022920         0   1022920   0% /sys/fs/cgroup
osxfs          975568896 143623524 831689372  15% /localtmp
/dev/vda2       61890340  41968556  16754860  72% /etc/hosts
shm                65536         0     65536   0% /dev/shm
root@3148dd31e6c7:/#

Notice that when we connect to the new container, we do so as root. Running Jupyter always presents a security challenge, especially if you are running it on a machine with a public IP address. You should certainly use HTTPS and a password, especially if you are running it on a remote VM in the cloud. We can configure these options by using the -e flag on the run command to pass environment variables through to Jupyter. For example, -e GEN_CERT=yes tells Jupyter to generate a self-signed SSL certificate and to use HTTPS instead of HTTP for access. To tell Jupyter to use a password, we need to do a bit more work. Start IPython and issue the following commands to create a hashed password:

In [1]: import IPython
In [2]: IPython.lib.passwd()
Enter password:
Verify password:
Out[2]: 'sha1:db02b6ac4747:fc0561c714e52f9200a058b529376bc1c7cb7398'
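The string returned above has three colon-separated fields: the hash algorithm, a random salt, and a digest. As a rough illustration of how such a value can be produced and checked, the following sketch uses only Python's standard hashlib and secrets modules. It assumes that the digest is the SHA-1 of the password followed by the salt, which is our understanding of the scheme rather than something stated in this chapter; the helper names make_hash and check_hash are hypothetical. To generate the value you actually pass to Jupyter, use IPython.lib.passwd() itself.

```python
import hashlib
import secrets


def make_hash(password, salt=None):
    """Return 'sha1:<salt>:<digest>' for the given password (illustrative only)."""
    if salt is None:
        salt = secrets.token_hex(6)  # 12 hex characters, like the example above
    # Assumption: digest = SHA-1 over the UTF-8 password followed by the ASCII salt.
    digest = hashlib.sha1(password.encode("utf-8") + salt.encode("ascii")).hexdigest()
    return ":".join(("sha1", salt, digest))


def check_hash(hashed, password):
    """Recompute the digest with the stored salt and compare in constant time."""
    algorithm, salt, digest = hashed.split(":")
    if algorithm != "sha1":
        return False
    expected = make_hash(password, salt).split(":")[2]
    return secrets.compare_digest(expected, digest)


hashed = make_hash("correct horse")
print(hashed)
print(check_hash(hashed, "correct horse"))
print(check_hash(hashed, "wrong guess"))
```

Because the salt is random, two calls with the same password give different strings, yet either one can be checked against the password later.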

Remember your password and copy the output string. Let’s assume that we also want to mount a local directory /tmp/docmnt as a directory docmnt inside the container. Jupyter has a user called jovyan and the working directory is /home/jovyan/work. The complete command for running Jupyter is then:

$ docker run -e GEN_CERT=yes -d -p 8888:8888 \
    -v /tmp/docmnt:/home/jovyan/work/docmnt \
    jupyter/scipy-notebook start-notebook.sh \
    --NotebookApp.password='sha1:....value from above'

This command launches Jupyter via HTTPS with your new password. When the container is up, you can connect to it via HTTPS at your host’s IP address and port 8888. Your browser may complain that the site is not secure, because you have created a self-signed certificate for this site. You can accept the risk, and you should see the page shown in figure 6.2.

Figure 6.2: View of Jupyter in the browser after accepting the security exception.

We’ll mention one last flag: If you are running on a machine with multiple cores, you can use --cpuset-cpus to specify which cores your container may use. For example, --cpuset-cpus 0-7 specifies that cores 0 through 7 are to be used.
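When the flags discussed in this section start to pile up, it can help to assemble the command in a script. The sketch below is one possible way to do that in Python. The function docker_run_args and its parameter names are hypothetical, and it only builds and prints the argument list (using the -d, -e, -p, -v, and --cpuset-cpus flags described above) rather than invoking Docker.

```python
import shlex


def docker_run_args(image, ports=None, volumes=None, env=None,
                    cpuset=None, detached=False, command=None):
    """Build the argument list for a `docker run` command (hypothetical helper)."""
    args = ["docker", "run"]
    if detached:
        args.append("-d")                      # run in detached mode
    for name, value in (env or {}).items():
        args += ["-e", f"{name}={value}"]      # environment variable for the container
    for host_port, container_port in (ports or {}).items():
        args += ["-p", f"{host_port}:{container_port}"]   # host:container port binding
    for host_dir, container_dir in (volumes or {}).items():
        args += ["-v", f"{host_dir}:{container_dir}"]     # host directory as a volume
    if cpuset is not None:
        args += ["--cpuset-cpus", cpuset]      # which cores the container may use
    args.append(image)
    args += list(command or [])
    return args


args = docker_run_args(
    "jupyter/scipy-notebook",
    ports={8888: 8888},
    volumes={"/tmp/docmnt": "/home/jovyan/work/docmnt"},
    env={"GEN_CERT": "yes"},
    cpuset="0-7",
    detached=True,
)
print(shlex.join(args))
```

On a machine where Docker is installed, passing the same list to subprocess.run(args) would launch the container.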

6.3 Containers for Science

An impressive number of scientiﬁc applications have already been containerized,

and the number is growing daily. The following are just a few of those to be found

in the Docker repository.

• Radio astronomy tools are available, including containers for LOFAR, PyImager, and MeqTree.

• For bioinformatics, the ever-popular Galaxy toolkit is available in various forms. The University of Hamburg genome toolkit is also available.

• For mathematics and statistics, there are R and Python, which have been packaged with NumPy; others also are available in various combinations.

• For machine learning, there are complete collections of ML algorithms written in Julia as well as many versions of Spark, the Vowpal Wabbit tools, and the scikit-learn Python tools.

• For geospatial data, there is a container with geoserver.

• For digital archiving and data curation, there are containers for DSpace and iRODS.

• The iPlant consortium has developed the Agave science-as-a-service platform (agaveapi.co), the various components of which are now containerized.

• The UberCloud project (theubercloud.com) has produced many containerized science and engineering applications.

In each case, you can spin up a running instance of the software in seconds on a Linux, Mac, or PC with Docker installed. Can we assume, then, that all the problems of science and engineering in the cloud are solved? Unfortunately not. What if you want to run a cluster of Docker containers that are to share a large workload? Or a big Spark deployment? We cover these topics in chapter 7.

Binder

represents another excellent use of containers. This tool allows you

to take a Jupyter notebook in GitHub and automatically build a container that,

when invoked from GitHub, is launched on a Kubernetes cluster. (We describe

Kubernetes in section 7.6.5 on page 120.)

6.4 Creating Your Own Container

Creating your own container image and storing it in the Docker Hub is simple. Suppose you have a Python application that opens a web server, waits for you to provide input, and then uses that input to pull up an image and display it. Now, let’s build this little server and its image data as a container. We use a Python application based on the Bottle framework for creating the web server. (All code for this example is available in the Extras tab of the book website.) Assume the images are all stored as jpg files in a directory called images. We are going to use this container as the basis for other scientific Python containers, so we make sure that it includes all the standard SciPy tools. In addition, since future versions of this container will want to interact with Amazon, we include the Amazon Boto3 SDK in the container. We now have all the elements we need for the container’s layered file system. We next create a file named Dockerfile, as follows.

FROM jupyter/scipy-notebook
MAINTAINER your name <yourname@gmail.com>
RUN pip install boto3
RUN pip install bottle
COPY images /images
COPY bottleserver.py /
ENTRYPOINT ["ipython", "/bottleserver.py"]

The first line specifies that we want to build on jupyter/scipy-notebook, a well-maintained container in the Docker Hub. We could have started at a lower level, such as basic Ubuntu Linux. But jupyter/scipy-notebook has everything we need except for Boto3 and Bottle, so we add those layers by running a pip install for each. We next add our images and finally our Python source code. The ENTRYPOINT line tells Docker what to execute when the container runs. We can now build the container with the docker build command. The result is shown in figure 6.3.
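The source of bottleserver.py is not reproduced in this chapter. To give a feel for what such a program might look like, here is a rough stand-in written with Python's standard http.server module instead of Bottle, so that it runs without extra packages. Everything in it is an assumption made for illustration: the URL form /show?name=..., the helper names, and the choice of port 8000 (picked to match the port published when the container is run) are hypothetical and are not taken from the book's code.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path
from urllib.parse import parse_qs, urlparse

IMAGE_DIR = Path("/images")  # where the Dockerfile copies the jpg files


def image_path(name, image_dir=IMAGE_DIR):
    """Map a user-supplied name to a jpg path, rejecting anything that is not a plain name."""
    if not name or not name.replace("_", "").replace("-", "").isalnum():
        return None
    return image_dir / f"{name}.jpg"


class ImageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Expect requests of the form /show?name=<image name>.
        url = urlparse(self.path)
        name = parse_qs(url.query).get("name", [""])[0]
        path = image_path(name) if url.path == "/show" else None
        if path is None or not path.is_file():
            self.send_error(404, "No such image")
            return
        data = path.read_bytes()
        self.send_response(200)
        self.send_header("Content-Type", "image/jpeg")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8000):
    """Listen on all interfaces so the port can be published with `docker run -p`."""
    HTTPServer(("0.0.0.0", port), ImageHandler).serve_forever()
```

Calling serve() at the end of the file would start the server. The name check in image_path keeps a request from reaching files outside the images directory.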

The first time you run docker build, it downloads all the components for jupyter/scipy-notebook, Boto3, and Bottle, which takes a minute or so. After these components have been downloaded, they are cached on your local machine. Note that all the pip installs have been run and layered into the file system, so when the container is run, everything is there already. (Step 7 merely records the entry point; the Python code itself is not executed until the container starts.)

docker run -d -p 8000:8000 yourname/bottlesamp

Create a free Docker account and save your container to the Docker Hub as

follows. You can then download your container to the cloud and run it there.

docker push yourname/bottlesamp

> docker build -t="yourname/bottlesamp" .
Sending build context to Docker daemon 264.2 kB
Step 1 : FROM jupyter/scipy-notebook
 ---> 3e6809ce29ee
Step 2 : MAINTAINER Your Name <you@gmail.com>
 ---> Using cache
 ---> 5f09a5508c7b
Step 3 : RUN pip install boto3
 ---> Using cache
 ---> 74cfec535986
Step 4 : RUN pip install bottle
 ---> Using cache
 ---> d5a33c1b900a
Step 5 : COPY images /images
 ---> Using cache
 ---> 8cff8cd7c147
Step 6 : COPY bottleserver.py /
 ---> ebcb834dcc23
Removing intermediate container 8b3415f5ab12
Step 7 : ENTRYPOINT ipython /bottleserver.py
 ---> Running in c6d63ce5b327
 ---> c0ba12fd36d8
Removing intermediate container c6d63ce5b327
Successfully built c0ba12fd36d8

Figure 6.3: Output from a Docker build command.

6.5 Summary

Containers provide an excellent tool for encapsulating scientific applications in a way that allows them to be deployed on any cloud. We have shown how to deploy Jupyter on any Docker-enabled laptop or cloud VM with one docker run command. Many other scientific applications have also been packaged as containers and are easily downloaded and used, from standard bioinformatics tools such as Galaxy to commercial packages such as Matlab.

We have also shown how to build a container from the most basic layer, layering

your own applications and their dependencies on top. Containers can run on a

fraction of a core or control multiple cores on a multicore server. Containers can

mount directories in the host system as volumes, and multiple containers on a

single host can share these volumes. A tool called Docker Compose, not discussed here, lets you define and run applications composed of multiple containers that communicate with each other. We describe in chapter 7 how to coordinate hundreds of containers on large clusters.

6.6 Resources

We have covered only a few Docker capabilities here. Fortunately, excellent online resources are available. Of the many books on Docker, we particularly like The Docker Book [249] and the more recent Docker: Up & Running [192]. Other technologies exist for building and executing containers. Singularity [177] (singularity.lbl.gov) has attracted interest from the HPC community.