Chapter 6
Using and Managing Containers
“To see a world in a grain of sand ...”
—William Blake, Auguries of Innocence
Containers have become one of the most interesting and versatile alternatives to
virtual machines for encapsulating applications for cloud execution. They enable
you to run an application without source code modification or even recompilation
on your Windows PC, your Mac, or any cloud. In this chapter we look at the
history of their development, the key ideas that make them work, and of course
how to use them on clouds. We focus on Docker container technology, because it
is the most widely known and used, is easy to download and install, and is free.
We begin with some basics about containers. We show how to install Jupyter
with Docker, and we mention other scientific applications that have been
containerized. We then provide a simple example of how to create your own
container that you can then run in the cloud or on your laptop.
6.1 Container Basics
People used to think that the best (indeed, in many cases, only) way to encapsulate
software for deployment in the cloud was to create a VM image. That image
could be shared with others, for example by placing it in a repository for Amazon
images or in the Microsoft VM depot. Anybody could then deploy and run the
image within an appropriate data center. However, not all virtualization tools are
the same, so running a VM from Amazon on Azure or other clouds was a real
problem. Endless debates arose about the merits and demerits of this situation,
which usually went something like this: “This is just another form of evil vendor
lock-in!” A great deal of thought was given to finding ways to address this evilness.
Meanwhile, others realized that the Linux kernel had some nice features that
could be used to bound and contain the resource utilization of processes: in
particular, control groups and namespace isolation. These features allow for the
layering of new private virtual file system components on top of the host file system
and a special partitioned process space for applications to run using libraries
virtually stored in the layered file system. For all practical purposes, a program
running in a container looks like it is running in its own VM, but without the
extra baggage of a complete OS. A contained application uses the resources of the
host OS, which can even control the amount of resource devoted to each container:
for example, the CPU percentage and the amount of memory and disk space.
By mid-2013, a little company called dotCloud released a tool that provided
a better way to deploy encapsulated applications. This tool became Docker [71],
and dotCloud became Docker, Inc. (docker.com). Microsoft also figured out how to
do the same thing with Windows. While other container technologies exist, we
focus primarily on Docker in this book.
Docker allows applications to be provisioned in containers that encapsulate all
application dependencies. The application sees a complete, private process space,
file system, and network interface isolated from applications in other containers on
the same host operating system. Docker isolation provides a way to factor large
applications, as well as simple ways for running containers to communicate with
each other. When Docker is installed on Linux, containers share the host's
Linux kernel; on Windows 10 or Mac, Docker runs a lightweight Linux virtual
machine (based on Alpine Linux at the time of writing) whose kernel is used for
every container instance. As we describe below, additional OS features are
layered on top of that base. This layering is the key to container portability
across clouds.
Docker is designed to support a variety of distributed applications. It is now
widely used in the Internet industry, including by major companies like Yelp,
Spotify, Baidu, Yandex, and eBay. Importantly, Docker is supported by major
public cloud providers, including Google, Microsoft, Amazon, and IBM.
To understand how containers are built and used, one must understand how the
file system in a container is layered on top of the existing host services. The key is
the Union File System (more precisely, the advanced multilayered unification
file system, AuFS) and a special property called copy on write that allows the
system to reuse many data objects in multiple containers.
Docker images are composed of layers in the Union File System. The image is
itself a stack of read-only directories. The base is a simplified Linux or Windows
file system. As illustrated in figure 6.1, additional tools that the container needs
are then layered on top of that base, each in its own layer. When the container is
run, a final writable file system is layered on top.
Figure 6.1: The Docker Union File System is layered on a standard base.
As an application in the container executes, it uses the writable layer. If it
needs to modify an object in the read-only layers, it copies that object into
the writable layer. Otherwise, it uses the data in the read-only layer, which is
shared with other container instances. Thus, typically only a little of the container
image needs to be actually loaded when a container is run, which means that
containers can load and run much faster than virtual machines. In fact, launching
a container typically takes less than a second, while starting a virtual machine can
take minutes.
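The layering and copy-on-write behavior described above can be mimicked in a few lines of Python. The following is only an analogy, not how AuFS is implemented: we assume that each layer can be modeled as a dictionary from path to contents, and we use the standard library's ChainMap, which looks keys up layer by layer but sends all writes to the topmost mapping. The layer names and file contents are hypothetical.

```python
from collections import ChainMap

# Read-only image layers (hypothetical contents), shared by every container.
base_layer = {"/etc/hostname": "base", "/bin/sh": "shell binary"}
tools_layer = {"/usr/bin/python": "python binary"}


def start_container():
    """Return a layered view with a fresh, private writable layer on top."""
    return ChainMap({}, tools_layer, base_layer)


a = start_container()
b = start_container()

# Reads fall through the empty writable layer to the shared read-only layers.
print(a["/bin/sh"])

# A write lands only in container a's writable layer; the shared layers
# and container b are untouched.
a["/etc/hostname"] = "container-a"
print(a["/etc/hostname"])
print(b["/etc/hostname"])
print(a.maps[0])
```

Because both containers reference the same two read-only dictionaries, starting a second container costs only one new empty dictionary, which is the intuition behind the fast startup times noted above.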
In addition to the file system layers in the container image, you can mount
a host machine directory as a file system in the container’s OS. In this way a
container can share data with the host. Multiple containers can also share these
mounted directories and can use them for basic communication of shared data.
6.2 Docker and the Hub
The Docker website, Docker.com, provides the tools needed to install Docker on a
Linux, Mac, or Windows 10 PC. This site also links to the Docker Hub, a public
resource where you can store your own containers, and search for and download
any of the hundreds of public containers.
Install Jupyter with Docker on your laptop. You must first install Docker on
your machine. While the details differ on Linux, Mac, or PC, the installation is a
simple process, similar to that of installing a new browser or other desktop application.
Follow the download and install instructions on the Docker.com website. Docker
does not have a graphical interface: it is based on a command line API. Hence, you
need to open a “powershell” or “terminal” window on your machine. The Docker
commands are then the same on Linux, Mac, or PC.
Once you have installed Docker, you can verify that it is running by executing
the docker ps command, which tells you which containers are running. You should
see the following output, as no containers are running.
C:\> docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
C:\>
We launch Jupyter with the docker run command. The following example uses
two of the many parameters supported. The first flag, -it, causes the printing
of a URL with a token that you can use to connect to the new Jupyter instance.
The second flag, -p 8888:8888, binds port 8888 in the container’s IP stack to port
8888 on our machine. Finally, the command specifies the name of the container,
jupyter/scipy-notebook, as it can be found in the Docker Hub.
C:\> docker run -it -p 8888:8888 jupyter/scipy-notebook
Copy/paste this URL into your browser when you connect for the
first time, to login with a token:
http://localhost:8888/?token=b9fc19aa8762a6c781308bb8dae27a...
Rerunning the docker ps command shows that our newly started Jupyter notebook
is now running. (Confusingly, each of the two output lines is wrapped.)
C:\> docker ps
CONTAINER ID IMAGE COMMAND CREATED
STATUS PORTS NAMES
6cb4532fa0b jupyter/scipy-notebook "tini -- start-note" 6 seconds ago
Up 5 seconds 0.0.0.0:8888->8888/tcp prickly_meitner
C:\>
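Because the tabular docker ps output wraps awkwardly, it can be easier to ask Docker for one JSON object per container and process that in a script. The sketch below assumes the --format '{{json .}}' option of docker ps and the field names ID, Image, Status, Ports, and Names; the helper names are ours, and the sample text is hand-written from the values shown above rather than captured from a real run.

```python
import json
import subprocess


def parse_docker_ps(output):
    """Parse text from `docker ps --format '{{json .}}'`, one JSON object per line."""
    return [json.loads(line) for line in output.splitlines() if line.strip()]


def list_containers():
    """Run docker ps and return its parsed rows (requires Docker to be installed)."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{json .}}"],
        capture_output=True, text=True, check=True,
    )
    return parse_docker_ps(result.stdout)


# Hand-written sample standing in for real docker ps output.
sample = (
    '{"ID": "6cb4532fa0b", "Image": "jupyter/scipy-notebook", '
    '"Status": "Up 5 seconds", "Ports": "0.0.0.0:8888->8888/tcp", '
    '"Names": "prickly_meitner"}\n'
)

for row in parse_docker_ps(sample):
    print(row["Names"], row["Image"], row["Ports"])
```

On a machine with Docker running, calling list_containers() in place of the sample yields the same structure for the live containers.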
The first time you execute this command for a specific container, Docker must
search for and download the various elements of the container file system, which
may take several minutes. Because the container jupyter/scipy-notebook is in the
Docker Hub, Docker finds it there and begins to download it. Once the download
completes, Docker starts the container. The container image is now local; thus,
the next time that you run it, it can start in a few seconds.
The docker ps output includes an autogenerated instance name, in this case
prickly_meitner. To kill the instance, run docker kill prickly_meitner.
There are a number of standard Docker features that are good to know. The
flag -it connects the container’s standard I/O to the shell that ran the docker
command. We used that when starting Jupyter, which is why we saw the output.
If you do not want to interact with the container while it is running, you can use
the flag -d to make it run in detached mode.
Also useful is the Docker mechanism that allows a container to access a disk
volume on the host machine, so that a process in your container can save files
(when a container is terminated, its file system goes away as well), or access your
local data collections. To mount a local directory on your laptop as a volume on
the Docker container file system, use the -v localdir:/containername flag. (If
you are running on Windows 10, you need to access the Docker settings and give
Docker permission to see and modify drive C.)
The following commands illustrate the use of both -it and -v. We first use
the docker command on a Mac to launch a Linux Ubuntu container with the
Mac’s /tmp directory mounted as /localtmp. Due to -it, we are presented with
a command prompt for the newly started Ubuntu container. We then run df in
the container to list its file systems, which include /localtmp.
docker run -it -v /tmp:/localtmp ubuntu
root@3148dd31e6c7:/# df
Filesystem     1K-blocks      Used Available Use% Mounted on
none            61890340  41968556  16754860  72% /
tmpfs            1022920         0   1022920   0% /dev
tmpfs            1022920         0   1022920   0% /sys/fs/cgroup
osxfs          975568896 143623524 831689372  15% /localtmp
/dev/vda2       61890340  41968556  16754860  72% /etc/hosts
shm                65536         0     65536   0% /dev/shm
root@3148dd31e6c7:/#
Notice that when we connect to the new container, we do so as root. Running
Jupyter always presents a security challenge, especially if you are running it on a
machine with a public IP address. You should certainly use HTTPS and a password,
especially if you are running it on a remote VM in the cloud. We can configure
these options by using the -e flag on the run command to pass environment
variables through to Jupyter. For example, -e GEN_CERT=yes tells Jupyter to
generate a self-signed SSL certificate and to use HTTPS instead of HTTP for
access. To tell Jupyter to use a password, we need to do a bit more work. Start
Python and issue the following commands to create a hashed password:
In [1]: import IPython
In [2]: IPython.lib.passwd()
Enter password:
Verify password:
Out[2]: 'sha1:db02b6ac4747:fc0561c714e52f9200a058b529376bc1c7cb7398'
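The output string has three colon-separated fields: an algorithm name, a random salt, and a hex digest. The sketch below shows how such a string can be built and checked with the standard library alone. It assumes that the digest is the SHA-1 hash of the password bytes followed by the salt bytes, which matches the 12-character salt and 40-character digest seen above; the function names make_hash and check_hash are ours and are not part of IPython.

```python
import hashlib
import secrets


def make_hash(password, salt=None):
    """Build an 'algorithm:salt:digest' string in the style shown above."""
    salt = salt or secrets.token_hex(6)  # 6 random bytes give 12 hex characters
    digest = hashlib.sha1(password.encode("utf-8") + salt.encode("ascii")).hexdigest()
    return f"sha1:{salt}:{digest}"


def check_hash(hashed, password):
    """Return True if password matches a string produced by make_hash."""
    algorithm, salt, digest = hashed.split(":", 2)
    candidate = hashlib.new(
        algorithm, password.encode("utf-8") + salt.encode("ascii")
    ).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return secrets.compare_digest(candidate, digest)


hashed = make_hash("my secret password")
print(hashed)
print(check_hash(hashed, "my secret password"))
```

The salt differs on every call, so two hashes of the same password will not match as strings; only check_hash can tell whether a password is correct.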
Remember your password and copy the output string. Let’s assume that we
also want to mount a local directory c:/tmp/docmnt as a directory docmnt
inside the container. Jupyter has a user called jovyan and the working directory
is /home/jovyan/work. The complete command for running Jupyter is then:
$ docker run -e GEN_CERT=yes -d -p 8888:8888 \
    -v /tmp/docmnt:/home/jovyan/work/docmnt \
    jupyter/scipy-notebook start-notebook.sh \
    --NotebookApp.password='sha1:.... value from above'
This command launches Jupyter via HTTPS with your new password. When
the container is up, you can connect to it via HTTPS at your host’s IP address and
port 8888. Your browser may complain that this is not a valid web page, because
you have created a self-signed certificate for this site. You can accept the risk, and
you should see the page shown in figure 6.2.
Figure 6.2: View of Jupyter in the browser after accepting the security exception.
We’ll mention one last flag: If you are running on a machine with multiple cores,
you can use --cpuset-cpus to specify which cores your container may use.
For example, --cpuset-cpus 0-7 specifies that cores 0 through 7 are to be used.
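When a docker run command accumulates this many flags, it can help to assemble it in a script. The following sketch builds the argument list for the flags used in this section (-d, -it, -e, -p, -v, and --cpuset-cpus). The function docker_run_args and its parameter names are our own invention for illustration, and the password value is left elided just as in the command above.

```python
import shlex


def docker_run_args(image, command=(), *, ports=(), volumes=(), env=None,
                    detach=False, interactive=False, cpuset=None):
    """Assemble a `docker run` argument list from the flags used in this section."""
    args = ["docker", "run"]
    if detach:
        args.append("-d")
    if interactive:
        args.append("-it")
    for key, value in (env or {}).items():
        args += ["-e", f"{key}={value}"]
    for host_port, container_port in ports:
        args += ["-p", f"{host_port}:{container_port}"]
    for host_dir, container_dir in volumes:
        args += ["-v", f"{host_dir}:{container_dir}"]
    if cpuset:
        args += ["--cpuset-cpus", cpuset]
    args.append(image)        # the image name comes after all options
    args.extend(command)      # anything after the image is passed to the container
    return args


args = docker_run_args(
    "jupyter/scipy-notebook",
    ["start-notebook.sh", "--NotebookApp.password=sha1:...."],
    detach=True,
    env={"GEN_CERT": "yes"},
    ports=[(8888, 8888)],
    volumes=[("/tmp/docmnt", "/home/jovyan/work/docmnt")],
    cpuset="0-7",
)
print(shlex.join(args))
```

Passing the resulting list to subprocess.run(args, check=True) would launch the container without involving a shell, which sidesteps quoting problems with the password string.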
6.3 Containers for Science
An impressive number of scientific applications have already been containerized,
and the number is growing daily. The following are just a few of those to be found
in the Docker repository.
Radio astronomy tools are available, including containers for LOFAR, PyImager,
and MeqTree.
For bioinformatics, the ever-popular Galaxy toolkit is available in various
forms. The University of Hamburg genome toolkit is also available.
For mathematics and statistics, there are R and Python, which have been
packaged with NumPy; others also are available in various combinations.
For machine learning, there are complete collections of ML algorithms written
in Julia as well as many versions of Spark, the Vowpal Wabbit tools, and the
scikit-learn Python tools.
For geospatial data, there is a container with geoserver.
For digital archiving and data curation, there are containers for DSpace and
iRODS.
The iPlant consortium has developed the Agave science-as-a-service platform
(agaveapi.co), the various components of which are now containerized.
The UberCloud project (theubercloud.com) has produced many containerized
science and engineering applications.
In each case, you can spin up a running instance of the software in seconds on
a Linux, Mac, or PC with Docker installed. Can we assume, then, that all the
problems of science and engineering in the cloud are solved? Unfortunately not.
What if you want to run a cluster of Docker containers that are to share a large
workload? Or a big Spark deployment? We cover these topics in chapter 7.
Binder represents another excellent use of containers. This tool allows you
to take a Jupyter notebook in GitHub and automatically build a container that,
when invoked from GitHub, is launched on a Kubernetes cluster. (We describe
Kubernetes in section 7.6.5 on page 120.)
6.4 Creating Your Own Container
Creating your own container image and storing it in the Docker Hub is simple.
Suppose you have a Python application that opens a web server, waits for you to
provide input, and then uses that input to pull up an image and display it. Now,
let’s build this little server and its image data as a container. We use a Python
application based on the Bottle framework for creating the web server. (All code
for this example is available in the Extras tab of the book website.) Assume the
images are all stored as jpg files in a directory called images. We are going to use
this container as the basis for other scientific Python containers, so we make sure
that it includes all the standard SciPy tools. In addition, since future versions of
this container will want to interact with Amazon, we include the Amazon Boto3
SDK in the container. We now have all the elements we need for the container’s
layered file system. We next create a file named Dockerfile, as follows.
FROM jupyter/scipy-notebook
MAINTAINER your name <yourname@gmail.com>
RUN pip install boto3
RUN pip install bottle
COPY images /images
COPY bottleserver.py /
ENTRYPOINT ["ipython", "/bottleserver.py"]
The first line specifies that we want to build on jupyter/scipy-notebook, a
well-maintained container in the Docker Hub. We could have started at a lower
level, such as basic Ubuntu Linux. But jupyter/scipy-notebook has everything
we need except for Boto3 and Bottle, so we add those layers by running a
pip install for each. We next add our images and finally our Python source code.
The ENTRYPOINT line tells Docker what to execute when the container runs. We
can now build the container with the docker build command. The result is shown
in figure 6.3.
The first time you run docker build, it downloads all the components for
jupyter/scipy-notebook, Boto3, and Bottle, which takes a minute or so. After
these components have been downloaded, they are cached on your local machine.
Note that all the pip installs have been run and layered into the file system.
Even the Python code was parsed to check for errors (see step 7). Because of this
preinstallation, when the container is run, everything is there already.
docker run -d -p 8000:8000 yourname/bottlesamp
Create a free Docker account and save your container to the Docker Hub as
follows. You can then download your container to the cloud and run it there.
docker push yourname/bottlesamp
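The server code itself is not listed in this chapter. To give a feel for the kind of program being packaged, here is a rough stand-in that uses only the Python standard library's http.server module rather than Bottle, so it is not the book's bottleserver.py. It assumes images live in /images (as in the COPY line of the Dockerfile) and that the user supplies a name through a hypothetical ?name= query parameter; the port matches the -p 8000:8000 mapping used above.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path
from urllib.parse import parse_qs, urlparse

IMAGES_DIR = Path("/images")  # matches "COPY images /images" in the Dockerfile


def resolve_image(images_dir, name):
    """Map a user-supplied name to a .jpg file inside images_dir, or None."""
    # Keep only the final path component so "../secret" cannot escape images_dir.
    safe_name = Path(name).name
    if not safe_name:
        return None
    candidate = Path(images_dir) / f"{safe_name}.jpg"
    return candidate if candidate.is_file() else None


class ImageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Read the requested image name from a URL such as /?name=cat
        query = parse_qs(urlparse(self.path).query)
        name = query.get("name", [""])[0]
        image = resolve_image(IMAGES_DIR, name)
        if image is None:
            self.send_error(404, "No such image")
            return
        data = image.read_bytes()
        self.send_response(200)
        self.send_header("Content-Type", "image/jpeg")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8000):
    """Listen on all interfaces so the port published by Docker is reachable."""
    HTTPServer(("0.0.0.0", port), ImageHandler).serve_forever()
```

Calling serve() at the end of the file would start the server; binding to 0.0.0.0 rather than localhost matters inside a container, because requests forwarded by -p arrive on the container's network interface.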
> docker build -t="yourname/bottlesamp" .
Sending build context to Docker daemon 264.2 kB
Step 1 : FROM jupyter/scipy-notebook
 ---> 3e6809ce29ee
Step 2 : MAINTAINER Your Name <you@gmail.com>
 ---> Using cache
 ---> 5f09a5508c7b
Step 3 : RUN pip install boto3
 ---> Using cache
 ---> 74cfec535986
Step 4 : RUN pip install bottle
 ---> Using cache
 ---> d5a33c1b900a
Step 5 : COPY images /images
 ---> Using cache
 ---> 8cff8cd7c147
Step 6 : COPY bottleserver.py /
 ---> ebcb834dcc23
Removing intermediate container 8b3415f5ab12
Step 7 : ENTRYPOINT ipython /bottleserver.py
 ---> Running in c6d63ce5b327
 ---> c0ba12fd36d8
Removing intermediate container c6d63ce5b327
Successfully built c0ba12fd36d8
Figure 6.3: Output from a Docker build command.
6.5 Summary
Containers provide an excellent tool for encapsulating scientific applications in
a way that allows them to be deployed on any cloud. We have shown how to
deploy Jupyter on any Docker-enabled laptop or cloud VM with one docker run
command. Many other scientific applications have also been packaged as containers
and are easily downloaded and used, from standard bioinformatics tools such as
Galaxy to commercial packages such as Matlab.
We have also shown how to build a container from the most basic layer, layering
your own applications and their dependencies on top. Containers can run on a
fraction of a core or control multiple cores on a multicore server. Containers can
mount directories in the host system as volumes, and multiple containers on a
single host can share these volumes. A tool called Docker Compose, not discussed
here, lets you define and run applications composed of multiple containers that
communicate with each other. We describe in chapter 7 how to coordinate
hundreds of containers on large clusters.
6.6 Resources
We have covered only a few Docker capabilities here. Fortunately, excellent
online resources are available. Of the many books on Docker, we particularly
like The Docker Book [249] and the more recent Docker: Up & Running [192].
Other technologies exist for building and executing containers. Singularity [177]
(singularity.lbl.gov) has attracted interest from the HPC community.