Part II
Computing in the Cloud
[Figure: Book roadmap. Part I, Managing data in the cloud; Part II, Computing in the cloud; Part III, The cloud as platform; Part IV, Building your own cloud; Part V, Security and other topics.]
While scientists were first attracted to the cloud by the ability to store and
share data, it was the introduction of cheap on-demand computing that created a
paradigm shift. In this second part of the book, we follow the pattern established
in the preceding one: we first introduce principles and then show how you can use
both cloud portals and Python SDKs to compute on various cloud platforms.
Computing in the cloud has gone through a fascinating evolution. It started
with virtualization, an old computing technology first invented in the context of
mainframe computers and later adopted within data centers as a means of allowing
customers to create environments and services that are uniquely tailored to their
needs. Virtual machines can be started and stopped easily, and the customer is
charged only for the time that the machine instance is running. In chapter 5, we
describe how to create and manage virtual machines on cloud platforms.
A second stage of the evolution of computing in the cloud was the introduction
of containers as a means of encapsulating software. Container technologies allow
researchers to share packaged applications that can be deployed rapidly on any
cloud and then run with a single command. In chapter 6, we show you how to
create and deploy containers based on a technology called Docker.
Scale has always been a critical cloud capability and a major requirement of
scientists. By “scale” we mean the ability of computation to be spread over multiple
cloud servers to exploit parallelism in the application. In chapter 7, we consider
four types of parallel application execution:
• SPMD clusters in the cloud, for traditional HPC-style computing.
• Many-task or high-throughput parallel computation, characterized by a
  large bag of tasks with few or no dependencies and that thus can be executed
  in parallel.
• MapReduce- and BSP-style parallelism, in which a single thread of control
  applies parallel operators over distributed data. In the cloud, such
  computations often involve executing a directed graph of data parallel operations.
  This model is used in tools such as Spark, many of the open source data
  analytics tools, and most of the deep learning systems that we discuss in
  part III.
• Microservices, the most “cloud native” computational model. It uses
  frameworks such as Mesos and Kubernetes to allow applications to be composed
  of swarms of dockerized, mostly stateless, small communicating services.
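To make two of these styles concrete, the following minimal Python sketch counts words with a map step followed by a reduce step. It is only an illustration under stated assumptions: a local thread pool stands in for multiple cloud servers, and the names `word_count`, `merge`, and `map_reduce` are hypothetical helpers, not part of any cloud SDK or of Spark.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def word_count(chunk):
    """Map step: count the words in one chunk, independently of all others."""
    return Counter(chunk.split())


def merge(left, right):
    """Reduce step: combine two partial word counts into one."""
    return left + right


def map_reduce(chunks, workers=4):
    # The map calls have no dependencies on one another (a "bag of tasks"),
    # so a pool of workers can execute them in parallel.
    # Assumption: local threads stand in for separate cloud servers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(word_count, chunks))
    # A single thread of control then folds the partial results together.
    return reduce(merge, partials, Counter())


chunks = ["the cloud scales", "the cloud computes", "scale the cloud"]
totals = map_reduce(chunks)
```

The map phase alone is an instance of many-task parallelism; adding the reduce phase gives the MapReduce pattern in miniature. A real deployment would distribute both the data and the workers across machines.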
We also include a short discussion of serverless computing, a relatively new
capability of the big public clouds that, like many computational ideas, has deep
roots in operating system design. Briefly stated, it allows a programmer to define
the code for an application plus the events that should cause that code to execute,
and then release the code to the cloud in such a way that its execution does not
require any cloud resource deployment by the user or programmer.
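The programming model described here, code plus the events that trigger it, can be sketched in plain Python. This is a toy, in-process stand-in and not the API of any cloud provider: the registry `_handlers`, the decorator `on_event`, the function `emit`, and the event name `file_uploaded` are all hypothetical, and a real platform would deliver the events and run the code on infrastructure that the programmer never deploys.

```python
from collections import defaultdict

# Hypothetical in-process stand-in for a cloud event service.
_handlers = defaultdict(list)


def on_event(event_type):
    """Register a function to run whenever an event of this type occurs."""
    def register(func):
        _handlers[event_type].append(func)
        return func
    return register


def emit(event_type, payload):
    """Deliver an event to every registered handler and collect the results.

    In a serverless platform the provider performs this step, not the user.
    """
    return [handler(payload) for handler in _handlers[event_type]]


# The programmer supplies only the code and the event that should trigger it.
@on_event("file_uploaded")
def summarize(payload):
    return f"{payload['name']}: {payload['size_bytes']} bytes"


results = emit("file_uploaded", {"name": "run42.csv", "size_bytes": 1024})
```

Note that the handler never refers to a server, a virtual machine, or a container. That separation is the point of the model.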