Chapter 16
History, Critiques, Futures
“I’ve looked at clouds from both sides now / From up and down and
still somehow / It’s cloud’s illusions I recall / I really don’t know clouds
at all”
—Joni Mitchell
We have devoted 14 chapters to how you can use the cloud for scientific research.
We now spend some time on context, covering, in turn, the historical context
from which today’s cloud emerged; contemporary critiques of cloud computing;
and some important directions in which cloud technologies are developing. This
material is brief, but we hope it will stimulate thought and discussion.
16.1 Historical Perspectives
The idea of computing as a utility is far from new. Artificial intelligence pioneer
Professor John McCarthy, speaking at MIT’s centennial celebration in 1961, opined
that: “Computing may someday be organized as a public utility just as the telephone
system is a public utility.” He went on to predict a future in which:
“Each subscriber needs to pay only for the capacity he actually uses,
but he has access to all programming languages characteristic of a
very large system ... Certain subscribers might offer service to other
subscribers ... The computer utility could become the basis of a new
and important industry.”
McCarthy’s words were inspired by what he saw as the possibilities of time
sharing, recently demonstrated in project Multics [100]. If many people could
run on the same computer at the same time, then why not leverage economies
of scale and use one computer to serve the needs of many people? This concept
led to the mainframe, but it seems that McCarthy had something more ambitious
in mind: perhaps a single computing utility to serve an entire nation? (At a
similar talk at Stanford, McCarthy was apparently challenged by a physicist who
observed that “this idea will never work: a simple back-of-the-envelope calculation
shows that the amount of copper wire required to connect users to the computing
utility would be impossible.” This exchange provides a useful warning of the
difficulties inherent in technological predictions, when new developments—in this
case optical fiber—can upend fundamental assumptions. But it was also accurate:
the large-scale realization of computing utilities was for a long time hindered by
network limitations.)
These ideas continued to percolate in the imaginations of researchers. In 1966,
Parkhill produced a prescient book-length analysis [217] of the challenges and
opportunities of utility computing, and in 1969, when UCLA turned on the first
node of the ARPANET, Internet pioneer Leonard Kleinrock claimed that “as
[computer networks] grow up and become more sophisticated, we will probably see
the spread of ‘computer utilities’ which, like present electric and telephone utilities,
will service individual homes and offices across the country” [248].
The large-scale realization of computing utilities had to wait until networks
were faster. In the early 1990s, various groups started to deploy then-new optical
networking technologies for research purposes. In the US, gigabit testbeds
linked a number of universities and research laboratories. Inspired by what might
be possible now that computers were connected at speeds close to the memory
bandwidth, researchers started to talk about metacomputers [237]—virtual
computational systems created by linking components at different sites. Out of
these discussions grew the idea of a computational grid, which “by analogy to
the electric power grid provides access to power on demand, achieves economies
of scale by aggregation of supply, and depends on large-scale federation of many
suppliers and consumers for its effective operation” [126]. Software and protocols
were developed for remote access to storage and computing, and many scientific
communities leveraged these developments to federate computing facilities on local,
national, and even global scales. For example, high energy physicists designing
the Large Hadron Collider (LHC) realized that they needed to federate computing
systems at hundreds of sites if they were to analyze the many petabytes of data to
be produced by LHC experiments; in response, they developed the LHC Computing
Grid (LCG) [175].
Grid computing enabled on-demand access to computing, storage, and other
services, but its impact was primarily limited to science [127]. (One exception
was within the enterprise, where “enterprise Grids” were widely deployed. These
deployments are today often called “private clouds,” with the principal difference
being the use of virtualization to facilitate dynamic resource provisioning.) The
emergence of cloud computing around 2006 is a fascinating story of marketing,
business model, and technological innovation. A cynic could observe, with some
degree of truth, that many articles from the 1990s and early 2000s on grid computing
could be—and often were—republished by replacing every occurrence of “grid”
with “cloud.” But this is more a comment on the fashion- and hype-driven nature
of technology journalism (and, we fear, much academic research in computer
science) than on cloud itself. In practice, cloud is about the effective realization
of the economies of scale to which early grid work aspired but which it could not
achieve because of inadequate supply and demand. The success of cloud is due to profound
transformations in these and other aspects of the computing ecosystem.
Cloud is driven, first and foremost, by a transformation in demand. It is
no accident that the first successful infrastructure-as-a-service business emerged
from an e-commerce provider. As Amazon CTO Werner Vogels tells the story,
Amazon realized, after its first dramatic expansion, that it was building out
literally hundreds of similar work-unit computing systems to support the different
services that contributed to Amazon’s online e-commerce platform. Each such
system needed to be able to scale its capacity rapidly to queue requests, store
data, and acquire computers for data processing. Refactoring across the different
services produced services like Amazon’s Simple Queue Service, Simple Storage
Service, and Elastic Compute Cloud. Those services (and other similar services
from other cloud providers, as described in previous chapters) have in turn been
successful in the marketplace because many other e-commerce businesses need
similar capabilities, whether to host simple e-commerce sites or to provide more
sophisticated services such as video on demand.
Cloud is also enabled by a transformation in transmission. While the U.S. and
Europe still lag behind broadband leaders such as South Korea and Japan, the
number of households with megabits per second or faster connections is large and
growing. One consequence is the widespread adoption of data-intensive services
such as YouTube and Netflix. Another is that businesses feel increasingly able to
outsource business processes such as email, customer relationship management,
and accounting to software-as-a-service (SaaS) vendors.
Finally, cloud is enabled by a transformation in supply. Both IaaS vendors and
companies offering consumer-facing services (e.g., search: Google, auctions: eBay,
social networking: Facebook, Twitter) require enormous quantities of computing
and storage. Leveraging advances in commodity computer technologies, these
and other companies have learned how to meet those needs cost effectively within
enormous data centers themselves [69] or, alternatively, have outsourced this aspect
of their business to IaaS vendors. The commoditization of virtualization [67, 227]
has facilitated this transformation, making it far easier than before to allocate
computing resources on demand, with a precisely defined software stack installed.
In our opinion, it is the transformational changes in demand, transmission, and
supply, and the resulting virtuous circle of increased use, better networks, and
reduced costs, that account for the tremendous success of cloud technologies. It
will be interesting to see where the next set of disruptive changes will occur, a
topic that we consider in section 16.3.
16.2 Critiques
The reader will by now have realized that we are great fans of the power of the
outsourcing and automation that cloud computing provides. We believe that by
enabling users facing mundane or challenging computational tasks to focus on
their problem, rather than the task of acquiring and operating computational
infrastructure, cloud computing can frequently increase productivity and thus
discovery and innovation.
Nevertheless, it is also important to be aware of the various critiques that have
been levied against cloud, some of which, in our opinion, speak to real or potential
limitations and some to misunderstandings or differences of opinion. We review
some of those critiques in the following. (As we have already discussed security
concerns in chapter 15, we do not revisit them here.)
16.2.1 Cost
A common critique of cloud is that it is too expensive. We do not dismiss the
importance of such concerns, particularly in academic settings where personnel
and equipment spending may not be fungible. But without getting into the details
of cost comparisons between on-premises and commercial cloud providers, we point
out that when performing such comparisons, it is important to consider all costs,
including personnel, space, and power. See, for example, Burt Holzman’s 2016
analysis of in-house vs. public cloud computing costs for high energy physics [155].
He found that when power, cooling, and staff costs were included, on-premises
computing in the Fermilab data center cost 0.9 cents per core hour under the
assumption of 100% utilization, while off-premises computing on Amazon cost 1.4
cents per core hour. The observed computational speeds for their application were
close to identical. Experience suggests that, depending on the specifics of your
institutional computing environment and workload, cloud costs can be insignificant,
greater than local costs, or less than local costs.
16.2.2 Lock In
Free software evangelist Richard Stallman [162] has argued that cloud computing
is “simply a trap aimed at forcing more people to buy into locked, proprietary
systems that [will] cost them more and more over time.” He expands upon this
point in an article in the Boston Review [238].
This is a common critique of cloud computing. At issue is the risk that arises
when we become dependent for our computing on a third-party provider. What
if that provider goes out of business, discontinues services on which we depend,
fails to meet desired quality of service commitments, or raises prices? What if
they lose our data? These are real risks that any potential cloud user needs
to evaluate, balancing them against the benefits that cloud brings. One partial
hedge is to use only services for which equivalents exist from other providers,
and to develop applications that use those services so that they can easily be
retargeted. One way to do this is to build applications in containers, such as
Docker, that allow them to run without modification on any commercial cloud.
However, if the application in the container invokes a special platform service, such
as a cloud-specific NoSQL service or stream broker, then changes are required.
Good design and encapsulation of these dependencies in microservices can mitigate
this problem. A more fundamental issue is the data stored in the cloud. Moving
data can be difficult if they are large. The best solution may be to maintain an
archive of the data elsewhere.
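One way to encapsulate such a dependency is to hide it behind a provider-neutral interface, so that only a thin adapter must change when moving between clouds. The Python sketch below illustrates the idea; the `ObjectStore` interface, the `InMemoryStore` backend, and `archive_results` are hypothetical names invented for illustration, and a real port would add, say, an S3- or Azure-backed class implementing the same two methods via the provider's SDK.

```python
from abc import ABC, abstractmethod


class ObjectStore(ABC):
    """Provider-neutral interface: application code depends only on this."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class InMemoryStore(ObjectStore):
    """Stand-in backend for local testing. A hypothetical S3Store or
    AzureBlobStore would implement the same two methods, so swapping
    providers touches only this adapter, not the application."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]


def archive_results(store: ObjectStore, run_id: str, results: bytes) -> None:
    # Application logic sees only the ObjectStore interface.
    store.put(f"runs/{run_id}", results)


store = InMemoryStore()
archive_results(store, "exp-001", b"temperature,3.14")
print(store.get("runs/exp-001"))
```

The same pattern applies to queues, NoSQL tables, and other platform services: isolate each behind one small interface, and lock-in becomes a bounded porting cost rather than a rewrite.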
16.2.3 Education
We have heard people critique the use of cloud computing in education on the basis
that students who rely on cloud services for storage and computing will not gain
the hands-on knowledge that is gained from, for example, installing and operating
Linux on a computer cluster. (We have both been asked variants of this rather
disturbing question: “How will graduate students gain employment if they cannot
perform systems administration tasks?”)
It is easy to dismiss such concerns as Luddite misunderstandings of new
technologies, but we feel that an important point is being made. We should rejoice
in the capabilities that cloud computing provides, but we are the poorer if in
seizing those benefits we lose understanding of the technologies that we are using.
We should be educating students not simply how to use simple cloud services to
perform simple tasks, but how cloud can be a platform for new approaches to
science. We hope that this book can help in that process.
16.2.4 Black Box Algorithms
Another critique of cloud computing concerns the impact of handing off various
aspects of your work to proprietary software developed and operated by third parties.
If one cannot read the source code for a software component, obtain accurate
documentation of the methods that it uses, or even test it comprehensively, then
one has presumably lost the ability to determine the precise provenance of any
results obtained with that software [205]. A related concern is that software on
which one depends may be updated by a cloud provider without one’s knowledge,
in ways that turn out to affect one’s results.
These concerns appear to us to be quite real in the case of, for example,
proprietary machine learning, data analytics, or computational modeling packages
operated by cloud providers: in such cases, the result of a computation may
indeed depend on decisions, changes, or errors made deep within complex software
packages. We see fewer concerns in the case of systems software: while we may
lack knowledge of how exactly a cloud provider implements a particular data
management function, for example, the range of people using the software is larger
and thus undetected errors are less likely.
These concerns are by no means new to cloud computing: they arise whenever
results derive from software that cannot easily be studied or understood. (Microsoft
Excel, for example, while simple to use, is a complex black box.) The high
complexity and frequent updates associated with cloud software packages do
arguably raise new challenges, but we suggest that simple approaches can be
adopted. Use signals such as peer opinion and documentation to evaluate software
quality. Test with problems for which you know the answers. Use cloud services for
which source code is available—as it often is, as we have detailed in other chapters.
In the case of machine learning methods, seek methods that yield models that
are interpretable by human readers, so that the implications of a model can be
understood and reviewed for hidden biases.
16.2.5 Hardware Limitations
A common critique of cloud computing, at least in the early days, was that cloud
provided only limited hardware choices: it was fine if you wanted a vanilla x86
box, but not if you wanted something special. Today, the range of available
hardware options is surely far greater than exists in any laboratory. Amazon,
Azure and Google provide dozens of machine types, with varying quantities of
CPU cores, memory, GPUs, and other capabilities, as we have described
in section 7.2.2 on page 98.
16.3 Futures
The cloud that we have described in this book is the cloud of 2017. We believe
that many of the technologies and the principles presented here will have a long
life, but we also know that cloud technologies are evolving with great rapidity.
(Amazon, Azure and Google all regularly announce dozens of new services and
capabilities.) Thus we spend some time prognosticating about areas in which we
believe cloud computing is likely to evolve in the next several years.
16.3.1 Cloud-native Applications
What does it mean to develop an application for the cloud? As we saw in chapter 4,
it is straightforward to take many existing applications, package them so that they
run in a virtual machine, and deploy that virtual machine onto a cloud compute
service. But in so doing, all you have done is eliminate (or at least shift) hardware
costs. You have not changed the essential nature of your applications in ways
that take advantage of cloud features such as fault tolerant storage, elasticity, and
powerful services such as those described in part III.
The term cloud native is used to describe applications that are written to
take advantage of the powerful collections of services provided by cloud platforms.
The Cloud Native Computing Foundation (www.cncf.io) writes that: “[c]loud
native computing [deploys] applications as microservices, packaging each part into
its own container, and dynamically orchestrating those containers to optimize
resource utilization.” They describe open source software packages available on
Amazon, Azure, and Google, such as Kubernetes and Prometheus, that can be used
to support such applications. We described microservice architecture in section 7.6
on page 110 and illustrated it with our simple scientific document analyzer. The
cloud-native concept is more than just microservice implementations [120]. Cloud-native
applications have a clear separation between persistent state, such as a
database, and logic that runs in ephemeral virtual machines or containers, as shown
in figure 16.1. The Globus service described in section 14.5 on page 307 has these
characteristics.
The tools that you use to deploy such applications (Kubernetes, Mesos, etc.)
also allow you to easily monitor and manage them. Such applications scale effortlessly
and can be partitioned so that new versions of an application’s microservices
can be deployed and tested alongside the current “active” deployment. If the new
versions work as planned, the old versions can be scaled back and no interruption
in external service is seen.
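The side-by-side deployment pattern just described can be sketched as weighted routing between two versions of a microservice. The sketch below is illustrative only: `make_router` and the toy handlers are invented names, and in practice the traffic split would be performed by the deployment tool or a load balancer rather than in application code.

```python
import random


def make_router(handlers, weights, seed=0):
    """Return a dispatch function that routes each request to one of several
    deployed versions, chosen according to the given traffic weights."""
    rng = random.Random(seed)  # seeded for reproducibility in this sketch
    versions = list(handlers)

    def route(request):
        version = rng.choices(versions, weights=weights)[0]
        return handlers[version](request)

    return route


# Two versions of a microservice running side by side.
handlers = {
    "v1": lambda req: ("v1", req.upper()),
    "v2": lambda req: ("v2", req.upper()),
}

# Canary phase: send roughly 10% of traffic to the new version.
route = make_router(handlers, weights=[90, 10])
results = [route("ping")[0] for _ in range(1000)]
print(results.count("v2"))  # roughly 100 of the 1000 requests hit v2
```

If v2 misbehaves, setting its weight to zero drains it without interrupting service; if it works as planned, the weights are gradually reversed until v1 can be scaled back entirely.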
Figure 16.1: On the left, the conventional deployment approach: each application is
deployed in a virtual machine or container that contains all application state. On the right,
the cloud-native approach: state is maintained in cloud data services and computation is
performed by ephemeral service instances.
As we discussed in section 4.3 on page 67, serverless computing is about having
the cloud manage collections of your functions, to be executed when conditions
that you define occur. This concept is closely related to cloud-native design. Unlike
traditional scientific computations, which run from start to finish, cloud-native apps
run until you scale their implementation back to zero. Even in that quiescent state, they can
be restarted simply by telling the deployment tool to scale the replica count up
from zero. One can arrange for an external event to trigger a serverless responder
such as Amazon Lambda, which invokes the deployment system to scale up the
application.
So what does cloud-native have to do with the future of science? Consider the
following scenario. Suppose you have a network of experimental instruments that
produce data in large volumes and bursts, which you need to analyze as they arrive
in real time. This application can naturally be structured as a set of interacting
microservices. One microservice receives and scans data. If something interesting
is spotted, it invokes other microservices to perform additional processing, each
of which may need to scale up to take on these tasks. These various components
all send results to cataloging microservices that push data to a persistent data
repository. A second category of events may be triggered by users making queries
concerning the data that has been gathered. Such events can also cause additional
analysis tasks to be performed, or may simply involve access to the data repository.
The resulting cloud-native experiment management system may have dozens of
individual microservice types, all interacting and scaling according to demand.
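The core of such a system can be sketched as a chain of functions, one per microservice type. Everything here is an invented stand-in: the detection threshold, the record format, and the in-memory `repository`, which in a real cloud-native deployment would be separately deployed, independently scaled services connected by message queues and a persistent data service.

```python
# Each function stands in for one microservice type in the pipeline.
repository = []  # stand-in for a persistent cloud data repository


def catalog(record):
    """Cataloging microservice: push processed results to storage."""
    repository.append(record)


def process(reading):
    """Analysis microservice, invoked only when data is interesting;
    the doubling here is a placeholder for real processing."""
    catalog({"id": reading["id"], "score": reading["value"] * 2})


def scan(reading, threshold=50):
    """Ingest microservice: receive and scan data, triggering further
    processing when something interesting is spotted."""
    if reading["value"] > threshold:
        process(reading)


# A burst of instrument readings arrives; only some merit processing.
for i, value in enumerate([10, 72, 33, 95]):
    scan({"id": i, "value": value})

print(len(repository))  # 2 readings exceeded the threshold
```

In the deployed system, each stage would scale independently: a burst of interesting readings spins up more `process` instances without touching the ingest or catalog tiers.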
16.3.2 Architectural Evolution
Once upon a time, cloud data centers were built with racks of off-the-shelf servers
from companies like Dell and HP. The relentless economics of competing in the
cloud marketplace has completely changed the way data centers are designed. The
first thing to go was off-the-shelf servers. Google was early in moving to cheap
blade servers packed densely into racks. Amazon followed this practice and was
soon building its own servers in collaboration with companies like Taiwan’s Quanta.
Traditional servers were just too expensive.
Big changes came around 2005 when those building massive data centers were
forced to confront the fact that energy consumption was a major cost of doing
business. Amazon, Google and Microsoft were experimenting with a variety of
ideas to reduce the energy footprint of their data centers. This included tapping
into renewable sources of energy such as geothermal, wind and wave action. Data
center designs began to adopt supercomputer-style hot-cold aisle air conditioning.
Microsoft was able to move to a system in which 2000 servers were packaged into
a large shipping container that could be deployed outside.
The next phase of design evolution involved the server and not just its packaging
and cooling. By 2010, many cloud vendors were designing their own servers.
In 2011, Facebook started the Open Compute Project [35] to create an open
source design for the server itself. Facebook, Google, and Microsoft also began
experimenting with ARM processors as a lower power alternative to the traditional
Intel processor. As it became clear that different cloud workloads required different
resources, the variety of server configurations began to explode.
The original data center designs used conventional commercial networking gear
at the top of each rack and between racks. As these centers grew, their networking
needs became more demanding. Institutions demanded ways to extend their private
network directly into the cloud through scalable, virtual private networks. By 2012,
the Azure network was all based on software defined networks [228]; the same is
true for Amazon and Google.
The most recent architectural changes in the cloud are being driven by the
performance requirements of search, analytics, and machine learning. In 2010,
Microsoft Research began a study of how to optimize the Bing search algorithms.
This work evolved into a major redesign of server architecture around Field
Programmable Gate Arrays (FPGAs) that have been added to the Azure
servers [85]. The FPGAs are situated between the network switches and the
servers so that this programmable logic lies in a plane allowing FPGA-to-FPGA
direct communication. This architecture, called Catapult, allows applications
needing special acceleration to group together a set of FPGAs and servers into a
special purpose mesh. This configuration is used for applications like high speed
encryption and accelerating deep learning [216]. Microsoft is not the only cloud that
is deploying custom hardware. Google recently announced the Tensor Processing
Unit [164], which is designed to be a better accelerator for TensorFlow than GPUs.
These examples of cloud data center evolution illustrate that the designs are
moving rapidly toward a possible convergence with supercomputer technology.
While the cloud will always have a different use model than the largest supercomputers,
we expect the value of the cloud for science to only increase.
16.3.3 Edge Computing
Cloud computing has become synonymous with massive, hyper-connected data
centers, within which storage and computation are allocated fluidly in response
to user demand. This highly centralized architecture has been central to cloud
computing’s success, permitting both economies of scale in terms of operations
costs and innovative applications that depend on the aggregation and analysis
of large quantities of data. And as cloud provider services continue to increase
in sophistication, and as businesses, homes, and people become increasingly well
connected, it can easily seem that there is no limit to the applications that can
be moved from personal computers to the cloud. Perhaps, we may think, all
computing will soon occur elsewhere.
Yet at the same time as cloud data centers become more powerful and people
become more connected to those data centers, other important trends are pushing
towards decentralization. Increasingly powerful sensors generate vast quantities of
data that often cannot be cost effectively transferred to cloud data centers but must
be processed locally. Increasing demands for computer-in-the-loop control make
latency increasingly critical. Consider, for example, an automated observation
system that is to detect migrating birds and then zoom in to obtain high-resolution
images that can be used to identify individual animals. It is likely not practical
to stream real-time video from thousands of cameras to the cloud, process the
data, and return results in time to zoom the cameras. But an inexpensive local
processing unit, perhaps running algorithms configured based on large-scale offline
machine learning, can easily perform such tasks.
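The local-processing idea can be sketched as a filter that runs an inexpensive, locally deployed detector on every frame and forwards only the interesting ones to the cloud. The frame format, the motion-score detector, and the `upload` callback below are all invented for illustration; in a real system the detector would be a model trained offline in the cloud and pushed to the edge node.

```python
def edge_filter(frames, detector, upload):
    """Edge-node sketch: apply a cheap detector to every frame and forward
    only interesting frames, so full-rate video never leaves the camera."""
    uploaded = 0
    for frame in frames:
        if detector(frame):
            upload(frame)  # e.g., an HTTPS POST to a cloud ingest service
            uploaded += 1
    return uploaded


# Hypothetical stand-ins: frames carry a precomputed motion score, and the
# detector is a simple threshold chosen by offline training in the cloud.
frames = [{"id": i, "motion": m}
          for i, m in enumerate([0.1, 0.9, 0.2, 0.8, 0.05])]
sent = []
n = edge_filter(frames, detector=lambda f: f["motion"] > 0.5,
                upload=sent.append)
print(n, len(frames))  # 2 of the 5 frames are forwarded to the cloud
```

The bandwidth saving scales with the selectivity of the detector: thousands of cameras can stream continuously while only the rare interesting frames consume the link to the data center.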
For applications such as these, computing needs to occur “at the edge” of
the network: hence edge computing [232]; the term “fog computing,” another
nebulous neologism, is sometimes also used [75]. Of course, that is where computing
]. Of course, that is where computing
has always been performed, at least since the PC era. But a new question being
considered is how the edge and the cloud may be connected. Will we see cloud
providers start to engineer cloud services that extend out to the edge? What will
this mean for what we choose to outsource to the cloud? It will be fascinating to
see how these questions are answered over the next decade and beyond.
We can already see early examples of cloud providers extending the reach of their
services beyond their primary data centers. Content distribution networks
(e.g., Akamai, Amazon CloudFront, Azure CDN) run edge servers distributed
worldwide (68 such servers for Amazon CloudFront, as of 2017) to cache content
(e.g., web pages) that is to be made available rapidly to clients. More intriguing
are developments in serverless computing. As we saw in section 4.3 on page 67,
services such as Amazon Lambda, Azure Functions, and Google Cloud Functions
allow users to define functions to be performed when certain events occur. While
these services make it possible to implement powerful reactive applications, their
responsiveness will be limited if every event notification and subsequent response
have to travel from the origin site to a cloud data center. Thus, Amazon provides
Lambda@Edge, which allows functions to run on Amazon CloudFront content
delivery network nodes. Intriguingly, they have also announced plans to allow
Lambda functions to “execute on hardware that isn’t a part of Amazon’s cloud or
doesn’t have a consistent connection to the internet” [132]: perhaps, for example,
on computers associated with experimental apparatus in a scientific laboratory or
on Internet of Things components such as the Array of Things nodes described in
section 9.1.2 on page 163.
16.4 Resources
The History of the Grid [126] reviews many developments relevant to utility, grid,
and cloud computing.