Chapter 1
Orienting in the cloud universe
“I’ve also grown weary of reading about clouds in a book. Doesn’t this
piss you off? You’re reading a nice story, and suddenly the writer has
to stop and describe the clouds. Who cares?”
—George Carlin, “Seven Things I’m Tired Of”
We start this journey into cloud computing for science and engineering by introducing
important concepts and the structure of the book, and reviewing tools that
you should know in order to obtain the most value from this material.
1.1 Cloud: Computer, assistant, and platform
Scientists and engineers can apply cloud capabilities in their work in many different
ways. We find it useful to think in terms of three categories of use.
First, a cloud is an elastic computer: a source of on-demand computing
and storage that you can call upon when you need computing or storage capacity
larger than, or different from, what is available locally. Accessing this capacity
in the cloud may be cheaper, faster, and/or more convenient than acquiring and
operating your own computing and storage systems. While there are differences
between the cloud computing and storage offerings from different cloud providers,
they provide quite similar capabilities: in particular, object storage and execution
of virtual machines and containers. We cover this infrastructure as a service
(IaaS) technology and its applications in Parts I and II.
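To make the "may be cheaper" claim concrete, the following minimal sketch compares pay-as-you-go cloud cost with the cost of owning a comparable machine. Every price and function name here is a hypothetical assumption chosen for illustration; none comes from a real provider's price list, and real comparisons involve more factors, such as storage, data transfer, and staff time.

```python
def cloud_cost(hours_used: float, hourly_rate: float) -> float:
    """Pay-as-you-go cost: you pay only for the hours you actually use."""
    return hours_used * hourly_rate


def owned_cost(purchase_price: float, yearly_operating_cost: float, years: float) -> float:
    """Ownership cost: purchase plus operations, regardless of how much you use it."""
    return purchase_price + yearly_operating_cost * years


def break_even_hours(purchase_price: float, yearly_operating_cost: float,
                     years: float, hourly_rate: float) -> float:
    """Hours of use at which renting costs exactly as much as owning."""
    return owned_cost(purchase_price, yearly_operating_cost, years) / hourly_rate


# Hypothetical figures, for illustration only.
HOURLY_RATE = 0.50              # assumed cloud price per machine-hour
PURCHASE_PRICE = 6000.0         # assumed price of a comparable local server
YEARLY_OPERATING_COST = 1000.0  # assumed power, space, and administration per year
YEARS = 3.0                     # assumed useful lifetime of the local server

if __name__ == "__main__":
    hours = break_even_hours(PURCHASE_PRICE, YEARLY_OPERATING_COST, YEARS, HOURLY_RATE)
    total_hours = YEARS * 365 * 24
    print(f"Break-even usage: {hours:.0f} hours over {YEARS:.0f} years")
    print(f"That is {hours / total_hours:.0%} utilization of an owned machine")
```

Under these assumed numbers, renting is cheaper whenever you would keep an owned machine busy for less than the break-even number of hours. Bursty or occasional workloads tend to fall on that side, which is one reason elasticity matters.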
Figure 1.1: Scientists can use clouds in three distinct ways: as a source of on-demand
computing and storage on which to run their own software (left); as a source of software
that can be run over the network (center); and as a source of new platform capabilities that
can allow development of new types of software (right).
Second, a cloud is a tireless laboratory assistant: a source of powerful
software that can perform certain tasks more effectively and/or cheaply than you
can yourself: for example, Academia.edu, Google Scholar, and ResearchGate to
access information about publications, facilitating research and citation; GitHub to
manage software and documents, facilitating collaboration, software sharing, and
reproducibility; Google Docs, Box, and Dropbox to share data; Science Exchange
to order experiments; Figshare for publishing data; Globus to move and manage
large data; Skype and other services for communication; and many others. In each
case, you can avoid substantial cognitive, administrative, and financial burdens
that you or members of your laboratory would incur if they had to perform these
tasks themselves. These software as a service (SaaS) capabilities are important,
but are largely out of scope for this book, although we do discuss how to build
your own software as a service in Chapter 14.
Third, a cloud is a programming platform: that is, a collection of powerful
software mechanisms that you can use to build software with capabilities that
would be difficult or expensive to duplicate in your own lab: for example, an event
processing system that can process millions of events per second, a database that can
scale to billions of rows, an identity management service that can handle dozens of
different identity providers, a data transfer service that can move terabytes securely
and reliably, or a service that is replicated in multiple geographic regions to ensure
continuous operations. These platform capabilities are arguably the most exciting
part of cloud computing, because they enable individual programmers to create
and operate software systems that would otherwise require large teams. They allow
the cloud to be used as an interactive environment for large-scale computational
experimentation and discovery. They can also be the most challenging to use
effectively, because they have often been developed for use cases rather different
from traditional technical computing. In addition, it is in this area that we see the
biggest variation across cloud vendors in terms of capabilities and interfaces. We
discuss these platform as a service (PaaS) capabilities in Part III.
Inevitably, the boundaries between these different types of cloud system and
cloud usage are not always crisply defined. For example, a growing number of
software-as-a-service offerings provide APIs that allow them to be used as platform
services, and we often see (as discussed in Part III) platform services enhancing
the value of virtual computer offerings.
1.2 The cloud landscape
The cloud landscape is large, diverse, and complex. The U.S. National Institute of
Standards and Technology lists five essential characteristics of cloud computing:
on-demand self-service, broad network access, resource pooling, rapid elasticity
or expansion, and measured service [197]. Today, thousands of companies offer
services with some or all of these characteristics, from low-level computing and
storage to sophisticated software: see Figure 1.2. But apart from the collaboration
and content management systems listed above, few of the commercial cloud services
shown in the figure are relevant to science and engineering.
One major exception is in the realm of cloud infrastructure: the elastic compute
services that allow individuals to acquire storage and computing on demand. Here,
the landscape is simpler, particularly when we focus on providers with offerings
relevant to science and engineering. (Others specialize in specific products, such
as Oracle for databases or AT&T for telecom.) Three vendors, Amazon, Google,
and Microsoft, dominate the industry, as shown in Table 1.1,
and each has proven useful for science and engineering. We focus in this book
on the services provided by those three providers and by one academic research
cloud, Jetstream [122] (jetstream-cloud.org). Nevertheless, other cloud providers
are also impressive. For example, the New York-based DigitalOcean is popular
in the software engineering and cloud application development community, while
Rackspace supports those using the Amazon and Microsoft clouds as well as
running its own cloud servers. European cloud providers include 1&1, UpCloud,
City Cloud, CloudSigma, CloudWatt, and Aruba Cloud. Large telecommunications
and search companies, such as China’s Baidu, are also rapidly building cloud data
centers. Together, these various companies operate more than one hundred data
centers around the globe, containing an estimated ten million servers and vast
storage. (We base these estimates on news articles [63, 169, 96, 204].)
Footnote: A shaded, rounded rectangle denotes an https URL, in this case https://jetstream.org.
Figure 1.2: While dated, Bessemer Venture Partners’ picture of the top 300 cloud
computing companies in 2012 conveys the vast range of cloud service providers.
The cloud services operated by Amazon, Google, and Microsoft are commonly
referred to as public clouds, by analogy with the public utilities (power, telephone,
Table 1.1: Major cloud infrastructure providers of relevance to research.
Amazon      Market leader. Computing, storage, and platform services. Extensively used in science and engineering.
Microsoft   Second biggest player. Computing, storage, and platform services with both individual and enterprise customers.
Google      Began with a service called App Engine and is now using that experience to release a full suite of cloud capabilities.