Chapter 4. Computing as a Service
called MapReduce [108], made popular by the Hadoop computational framework
[258]. MapReduce is related to a style of parallel computing known as bulk
synchronous parallelism (BSP) [139]. We also cover these topics in chapter 7.
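To make the MapReduce style concrete, here is a minimal single-process sketch in
plain Python. It does not use Hadoop or any cluster framework; the function names
(map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only
for illustration. The grouping step between the two phases plays the role of the
synchronization barrier that links MapReduce to BSP.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    # Map: each document is handled independently, so on a real cluster
    # these calls could run in parallel on different machines.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    # Barrier: all map output is grouped before any reduction starts.
    grouped = shuffle(mapped)
    # Reduce: each key is reduced independently of the others.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["the cloud scales", "the cloud is interactive"]
    print(word_count(docs))
```

In a real deployment the framework, not the user, would distribute the map and
reduce calls and move the intermediate pairs between machines; the sketch only
shows the shape of the computation.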
A compelling feature of the cloud is that it provides many ways to create highly
scaled applications that are also interactive. The Spark system [265], originally
developed at the University of California, Berkeley, is more flexible than Hadoop
and is a form of BSP computing that can be used interactively from Jupyter. Google
has released a service called Cloud Datalab, based on Jupyter, for interactive
control of its data analytics cloud. The Microsoft Cloud Business Intelligence
(Cloud BI) tool supports interactive access to data queries and visualization. We
discuss these tools in chapter 8.
Managing your cloud computing resources can become complicated when
you need to scale beyond a few VMs or containers. Keeping track of many
processes spread over many cloud VMs is not easy. Fortunately, several new
tools have been adopted by the public clouds to help with this challenge. For
managing large numbers of containers, you can use the Docker Swarm tools
(docker.com/products/docker-swarm) and Google’s Kubernetes container
management [26, 79] (which Google uses for its own container management). Many
people use the venerable HTCondor system [243] to manage many-task parallel
computations. (HTCondor is used in the Globus Genomics system that we describe
in chapter 11.) Mesos [154] provides another distributed operating system with
a web interface that allows you to manage many applications in the cloud at the
same time. All of these systems are already available, or can easily be deployed,
on cloud platforms. We describe them in chapter 7.
Another programming model for computation services that is common in cloud
computing is dataflow. This model plays a significant role in the analysis of
streaming data, as discussed in chapter 9.
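As a rough illustration of the dataflow idea, the sketch below chains Python
generators so that each event flows through a sequence of operators as soon as it
arrives, rather than waiting for the whole data set. It is a single-process toy
under stated assumptions, not the API of any cloud streaming service; the
operator names (source, to_celsius, sliding_mean) and the temperature readings
are hypothetical.

```python
from collections import deque


def source(readings):
    """Yield events one at a time, standing in for an unbounded stream."""
    for reading in readings:
        yield reading


def to_celsius(events):
    """Stateless operator: convert each Fahrenheit reading to Celsius."""
    for fahrenheit in events:
        yield (fahrenheit - 32.0) * 5.0 / 9.0


def sliding_mean(events, window=3):
    """Stateful operator: emit the mean of the most recent `window` events."""
    recent = deque(maxlen=window)
    for event in events:
        recent.append(event)
        yield sum(recent) / len(recent)


if __name__ == "__main__":
    # Wire the operators into a linear dataflow graph:
    # source -> to_celsius -> sliding_mean
    pipeline = sliding_mean(to_celsius(source([32.0, 212.0, 50.0])), window=2)
    for value in pipeline:
        print(value)
```

Each operator pulls from the one before it, so nothing is computed until a
downstream consumer asks for the next value. A production streaming system would
add distribution, fault tolerance, and more general graph shapes on top of this
basic pattern.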
4.3 Serverless Computing
An interesting recent trend in cloud computing is the introduction of serverless
computing as a new paradigm for service delivery. As we show in the chapters
ahead, computation and data analysis can be deployed in the cloud via a range
of special services. In the majority of cases, the user must deploy VMs, either
directly or indirectly, to support these capabilities. Doing so takes time, and the
user is responsible for deleting the VMs when they are no longer needed. At times,
however, this overhead is not acceptable, such as when you want an action to take
place in response to a relatively rare event. The cost of keeping a VM running