Chapter 14

Building Your Own SaaS

“Great services are not canceled by one act or by one single error.”

—Benjamin Disraeli

We saw in part II how software can be made portable and reusable by encapsulating it within a virtual machine or container. Anyone with a cloud account can then run that code at the click of a mouse or button. Nevertheless, users of such software must still obtain the latest software package, launch it on the cloud, verify correct execution, and so forth. Each of these activities introduces cost and complexity.

Software as a service (SaaS) seeks to overcome these challenges by introducing a further degree of automation in software delivery. With SaaS, software is operated on the user’s behalf by a SaaS provider. Users then access the software over the network, often from a web browser. They need not install, configure, or update any software, and software providers only have to support a single software version, which they can update at any time to correct errors or add features. SaaS providers typically use techniques such as replication to achieve high reliability, so that the service is not liable to be “canceled by one single error.” SaaS advocates argue that this approach reduces both complexity for software consumers and costs for providers. Subscription-based payment schemes further reduce friction for consumers and enable providers to scale delivery with demand. As we saw in chapter 1, thousands of SaaS providers now operate software in this manner.

In this chapter, we examine how SaaS methods can be applied in science. We first dissect why SaaS is conventionally defined as both a technology and a business model. We then review approaches to on-demand software in science, including the popular concept of a science gateway, a form of remote software access that has primarily targeted research supercomputer systems for computation. Next, we look at more cloud-native versions of the science gateway SaaS concept, using two examples to illustrate how you can build SaaS solutions for your own purposes. The first, Globus Genomics, provides on-demand access to bioinformatics pipelines. The second is the Globus research data management service, already introduced in chapter 11. While both systems that we present here happen to build on Amazon cloud services, similar implementation and deployment strategies can be followed on other public clouds. A key message is that leveraging cloud platform services can facilitate the creation of SaaS offerings that are reliable, secure, and scalable, both in the number of users supported and in the amount of service delivered.

14.1 The Meaning of SaaS

Gartner defines SaaS as software that is “owned, delivered, and managed remotely by [a provider who] delivers software based on one set of common code and data definitions that is consumed in a one-to-many model by all contracted customers at any time, on a pay-for-use basis or as a subscription” [136]. Analyst David Terrar adds: “[and t]he application behind the service is properly web architected—not an existing application web enabled.”

This definition is as much about business model as technology. From a technology perspective, it speaks to a delivery model: the provider runs a single version of the software, deployed so as to permit access over the Internet via web interfaces. By implication, the software is architected to scale to large numbers of consumers and to enable multitenant operation, meaning that multiple consumers can access the software at the same time without interfering with each other. From a business model perspective, the definition speaks to software that consumers do not buy outright, as they would a home appliance, but instead pay for per use, as they would a movie streamed online, or subscribe to, as they would a newspaper.

This juxtaposition of technology and business model may seem odd, but it is in fact central to the success of SaaS, in industry at least. (The fact that some SaaS providers use advertising revenue rather than subscriptions to cover their costs does not change the essential economics.) In brief, the centralized operation model has allowed SaaS providers to slash per-user costs relative to software distributed via traditional channels because, for example, there is no longer a need to support multiple computer architectures and versions. It has also greatly reduced barriers to access. Most SaaS software is accessible to anyone with a web browser, whereas much enterprise software requires specialized hardware and expertise to install and run. Together, these two factors mean that SaaS providers can deliver software to many more people at far lower costs than were previously possible: literally cents on the dollar.

While the cost of serving each new customer may be low, it is not zero. Furthermore, establishing and operating a SaaS system involves significant fixed costs, such as 24x7 monitoring to ensure high availability. The SaaS industry has determined that pay-per-use or subscription-based payment models are the best way to recoup these costs. Such approaches provide a low barrier to entry (anyone with a credit card can access a service), and they mean that revenue scales linearly with usage. Thus, many users and per-user payments make SaaS sustainable by providing positive returns to scale. More users means more income, which can pay for scaled-up operations and/or reduce subscription charges, encouraging yet broader adoption.
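To make these economics concrete, suppose (hypothetically) that a service has fixed monthly costs F for monitoring, operations staff, and base infrastructure. It also has a marginal cost c per user. With N users, the average cost of serving one user is F/N + c, which falls toward c as N grows. The sketch below evaluates that expression for invented figures. The numbers are illustrative assumptions, not data from any real provider.

```python
def per_user_monthly_cost(fixed_monthly: float, marginal_per_user: float, users: int) -> float:
    """Average monthly cost per user: fixed costs amortized over all users plus marginal cost."""
    if users <= 0:
        raise ValueError("users must be positive")
    return fixed_monthly / users + marginal_per_user


# Hypothetical inputs: $20,000/month fixed cost, $0.50/month marginal cost per user.
FIXED, MARGINAL = 20_000.0, 0.50
for n in (100, 1_000, 10_000, 100_000):
    print(f"{n:>7,} users: ${per_user_monthly_cost(FIXED, MARGINAL, n):,.2f} per user per month")
```

Under these assumptions, each of 100 users must cover about $200 a month. At 100,000 users the figure falls to $0.70, close to the $0.50 marginal cost: the positive return to scale described above.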

14.2 SaaS Architecture

Leaving the business model aside, we next introduce approaches that have proven successful for architecting and engineering SaaS systems.

In general, the goal of a SaaS architect is to create a system that can deliver powerful capabilities to many customers at low cost and with high reliability, security, and performance. This overarching goal allows for many tradeoffs: for example, between capabilities and reliability, or between optimizing for base cost or per-user cost. Nevertheless, some basic principles can be identified.

The most sophisticated SaaS systems are often built using a microservice architecture. In this architecture, state is maintained in one or more persistent, replicated storage services, and computation is performed in short-lived, stateless services that can be rerun if necessary. This architecture provides a high degree of fault tolerance and also facilitates scaling: more virtual machines can be allocated dynamically as load increases. The following example illustrates these principles.

Video rendering SaaS. A company called Animoto has had a lot of success with its video rendering service. You upload a set of photos, and they create an animated video from the set of images, with smooth transitions from picture to picture and musical accompaniment. They thus need to run many somewhat data- and computation-intensive tasks. As shown in figure 14.1 on the next page, their architecture uses three kinds of Amazon services:

- Amazon S3 cloud object storage, to hold all images and data.
- Dynamically managed Amazon EC2 virtual machine instances, to run web servers, data ingest servers, rendering servers, and the like.
- The Amazon SQS queuing service, to coordinate between these activities, for example to maintain pending data ingest, analysis, and rendering tasks.

At least, that is how they described their architecture almost a decade ago; they likely use a richer set of services now. Animoto describes scaling to support 750,000 new users in three days by adding virtual machine instances [4].

Figure 14.1: A schematic of the Animoto SaaS pipeline, showing the logical structure at the top and its realization in terms of Amazon services below.
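The queue-coordinated pattern in figure 14.1 can be sketched in a few lines. This is an illustration under stated assumptions, not Animoto's code. Python `queue.Queue` objects stand in for SQS queues, and a dictionary stands in for S3. Each worker is stateless. It takes a message, reads its input from the store, writes its output under a key derived from its inputs, and only then passes a message on. A worker that is lost mid-task can therefore simply be rerun.

```python
from queue import Empty, Queue

# Assumptions: Queue objects stand in for Amazon SQS queues; a dict stands in for S3.
store: dict[str, str] = {}
ingest_q: Queue = Queue()
render_q: Queue = Queue()
done_q: Queue = Queue()


def ingest_worker() -> bool:
    """Normalize a job's photos and hand the job on to the rendering stage."""
    try:
        msg = ingest_q.get_nowait()
    except Empty:
        return False
    key = f"{msg['job']}/normalized"
    store[key] = ",".join(sorted(msg["photos"]))  # placeholder for resizing, etc.
    render_q.put({"job": msg["job"], "input": key})
    return True


def render_worker(crash: bool = False) -> bool:
    """Render a video. It keeps no local state, so a failed attempt can be retried."""
    try:
        msg = render_q.get_nowait()
    except Empty:
        return False
    if crash:
        # SQS makes an unacknowledged message visible again after a timeout;
        # putting it back on the queue simulates that behavior.
        render_q.put(msg)
        raise RuntimeError("render instance lost")
    key = f"{msg['job']}/video"
    store[key] = f"video({store[msg['input']]})"  # placeholder for rendering
    done_q.put({"job": msg["job"], "video": key})
    return True


# Usage: submit two jobs, lose one render attempt, then let the workers drain the queues.
ingest_q.put({"job": "j1", "photos": ["b.jpg", "a.jpg"]})
ingest_q.put({"job": "j2", "photos": ["c.jpg"]})
while ingest_worker():
    pass
try:
    render_worker(crash=True)
except RuntimeError:
    pass
while render_worker():
    pass
print(store["j1/video"], store["j2/video"])
```

In a real deployment, each worker type would run on its own pool of instances. Each pool could be grown or shrunk independently as its queue lengthens or empties.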

Let us now dive more deeply into the process by which a SaaS system is created. Imagine that you have developed your own video rendering application, myrender. It creates videos like those generated by Animoto, but from the command line, for example as follows.

myrender -i image-directory-name -o video-file-name

Your application runs nicely on your GPU-equipped workstation, but it is becoming too popular for you to handle the growing number of requests by hand. You share the source code, but people keep asking you to perform more and more tasks. You want to deliver this application to many people as SaaS. How might you proceed? Let us consider three alternatives.

In the first, you write a program that accepts user rendering requests via a web form and, for each request, launches myrender on your workstation. However, this approach breaks down as the number of requests continues to grow: your workstation is overwhelmed. A second approach is to adapt your program to instead instantiate a cloud-hosted virtual machine instance for each request. The virtual machine image is configured to accept user images, run the myrender application, wait for the user to download the resulting video, and then terminate.

A third approach has three parts:

- Partition the myrender code into separate analysis, rendering, and assembly components.
- Create a virtual machine or container for each component.
- Create or use a framework akin to Animoto's. It dispatches appropriate tasks to the different virtual machines or containers and scales their numbers up and down in response to changing load.

Each individual request is thus represented solely by a set of objects in object storage and transient requests in a queue, plus perhaps entries in other database tables.
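The third approach can be sketched as follows. The decomposition of myrender into `analyze`, `render_segment`, and `assemble` functions is hypothetical. A local thread pool stands in for the separate pools of virtual machines or containers that would run each stage in a real deployment. The point is the shape of the computation: per-image work fans out so that it can proceed in parallel, and a final assembly step fans the results back in.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical decomposition of myrender into three stateless stages. In a real system,
# each call would be a queued task handled by its own pool of VMs or containers.


def analyze(image_name: str) -> dict:
    """Placeholder analysis: decide how long this image should appear on screen."""
    return {"image": image_name, "duration_s": 2.0}


def render_segment(analysis: dict) -> str:
    """Placeholder rendering of the video segment for one image."""
    return f"segment[{analysis['image']}:{analysis['duration_s']}s]"


def assemble(segments: list[str]) -> str:
    """Placeholder assembly of segments (with transitions) into one video."""
    return " -> ".join(segments)


def handle_request(image_names: list[str], workers: int = 4) -> str:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        analyses = list(pool.map(analyze, image_names))       # fan out per image
        segments = list(pool.map(render_segment, analyses))   # fan out again
    return assemble(segments)  # fan in; pool.map preserves input order


print(handle_request(["a.jpg", "b.jpg", "c.jpg"]))
```

Because each stage takes its inputs as arguments and returns its outputs, a failed segment can be recomputed on its own. The whole request does not have to be restarted.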

Which approach is better? The first does not scale, so let us consider the second and third. The second is surely less work to implement than the third: you do not need to alter your application code at all. However, running each user rendering request on a separate virtual machine instance also has disadvantages:

(1) Cost: You pay for each virtual machine even when it is not fully occupied, for example because it is waiting on user input.

(2) Speed: Each user request is processed sequentially, one image at a time. Opportunities for parallelism, for example to process all images in parallel, cannot be exploited without changes to the application.

(3) Reliability: Failure of a virtual machine instance at any time results in the loss of all work performed by that instance up to that point. Recovery requires restarting from the beginning.

The third approach, on the other hand, requires more work up front but can provide big improvements in cloud cost, execution speed, and service reliability.
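The cost argument can be quantified with a back-of-the-envelope calculation. The sketch below rests on illustrative assumptions, none of which comes from a real system:

- A rendering job keeps a CPU busy for 10 minutes.
- A dedicated VM also sits idle for 20 minutes waiting for the user to upload images and download the video.
- Instances cost $0.90 per hour.
- A shared pool keeps its instances 80% utilized.

```python
def dedicated_vm_cost(requests: int, busy_min: float, idle_min: float,
                      price_per_hour: float) -> float:
    """Second approach: every request holds its own VM while working and while idle."""
    return requests * (busy_min + idle_min) / 60 * price_per_hour


def shared_pool_cost(requests: int, busy_min: float, price_per_hour: float,
                     utilization: float = 0.8) -> float:
    """Third approach: a shared pool pays for busy time, inflated by imperfect utilization."""
    return requests * busy_min / 60 / utilization * price_per_hour


dedicated = dedicated_vm_cost(1_000, busy_min=10, idle_min=20, price_per_hour=0.90)
shared = shared_pool_cost(1_000, busy_min=10, price_per_hour=0.90)
print(f"dedicated VMs: ${dedicated:,.2f}   shared pool: ${shared:,.2f}")
```

Under these assumptions, 1,000 requests cost $450 with dedicated VMs versus $187.50 with a shared pool. Almost all of the difference comes from paying for idle time.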

The third approach is commonly referred to as a multitenant architecture. All requests are fulfilled by the same (albeit elastically scalable) set of cloud resources, each of which may host different user requests at different times, and indeed multiple requests at the same time. Multitenancy is the approach adopted in the vast majority of large SaaS systems. A vital issue in multitenant systems is ensuring isolation among different users.
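One simple way to enforce isolation in a multitenant store is to derive every storage key from the authenticated caller's tenant identity. No request can then even name another tenant's data. The class below is a minimal in-memory sketch of that idea. `TenantStore` and the tenant names are hypothetical. A real system would also need to isolate compute, quotas, and logs.

```python
class TenantStore:
    """In-memory sketch of a multitenant object store with per-tenant key namespaces."""

    def __init__(self) -> None:
        self._objects: dict[tuple[str, str], bytes] = {}

    def put(self, caller_tenant: str, key: str, data: bytes) -> None:
        # caller_tenant must come from the authenticated session, never from the request body.
        self._objects[(caller_tenant, key)] = data

    def get(self, caller_tenant: str, key: str) -> bytes:
        try:
            return self._objects[(caller_tenant, key)]
        except KeyError:
            # Same error whether the key is absent or belongs to another tenant,
            # so callers cannot probe for other tenants' data.
            raise KeyError(key) from None


store = TenantStore()
store.put("alice-lab", "video.mp4", b"alice data")
store.put("bob-lab", "video.mp4", b"bob data")
store.put("alice-lab", "notes.txt", b"private")
print(store.get("bob-lab", "video.mp4"))  # b'bob data'
```

Both tenants can use the same object name, video.mp4, without collision. Bob's attempt to read Alice's notes.txt fails exactly as if the file did not exist.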

14.3 SaaS and Science

The scientific community has a long history of providing online access to software. After all, the original motivation for the ARPANET, precursor to today’s Internet, was to enable remote access to scarce computers [257]. With the advent of high-speed networks followed by the World Wide Web, many such experiments were conducted [126, 239]. One early system, the Network Enabled Optimization Server (NEOS, neos-server.org), has been in operation for more than 20 years [105]. It solves optimization problems submitted via email or the web.

The term science gateway is increasingly used to denote a system that provides online access to scientific software [261]. In general, a science gateway is a (typically web) portal that allows users to configure and invoke scientific applications, often on supercomputers. It thus provides a convenient gateway to otherwise hard-to-access computers and software.

The impact of such systems on science has been considerable. For example, consider the MG-RAST metagenomics analysis service (metagenomics.anl.gov), which provides online access to services for the analysis of genetic material in environmental samples [199]. As of 2017 it had more than 22,000 registered users. Together they had uploaded for analysis some 280,000 metagenomes containing more than 10 base pairs. That is a tremendous amount of science being supported by a single service!

Other successful systems also have thousands of users and correspondingly large impacts on both science and education:

- CIPRES [201], which provides access to phylogenetic reconstruction software.
- CyberGIS [185], for collaborative geospatial problem solving.
- nanoHUB [172], which provides access to hundreds of computational simulation codes in nanotechnology.

A recent survey [176] provides further insights into how and where science gateways are used.

While it is hard to generalize across such a broad spectrum of activities, we can state that the typical science software service has some but not all of the properties of SaaS as commonly understood. First, from a technology perspective: Most such services make a single version of a science application available to many people, and many leverage modern web interface technologies to provide intuitive interactive interfaces. Some also provide REST APIs and even SDKs to permit programmatic access. On the other hand, many are less than fully elastic because they need to run on specialized and typically overloaded supercomputers. Few are architected to leverage the power of modern cloud platforms. Thus, they handle modest numbers of users well, but may not scale.

From a business model perspective, few science software systems implement pay-by-use or subscription-based payment schemes. Instead, they typically rely on research grant support and/or allocations of compute and storage resources at scientific computing centers. This lack of a business model raises two concerns. The first is long-term sustainability: what happens when grants end? The second is scaling. An allocation of supercomputer time may be enough to support 10 concurrent users, but what happens when demand increases to 1,000 concurrent users, or 10,000?

We next present two example systems that have each taken a different approach to science SaaS from both technology and business model perspectives: Globus Genomics and the Globus service.


14.4 The Globus Genomics Bioinformatics System

Globus Genomics [187] (globus.org/genomics), developed by Ravi Madduri, Paul Davé, Alex Rodriguez, Dina Sulakhe, and others, is a cloud-hosted software service. It supports the rapid analysis of biomedical data, in particular next-generation sequencing (NGS) data. The basic idea is as follows:

1. A customer (individual researcher, laboratory, or community) signs up for the service.
2. The Globus Genomics team establishes a service instance, configured with applications and pipelines specific to the new customer’s disciplinary requirements. Access to this instance is managed via a Globus Group.
3. Any authorized user can then sign on to the instance and use its Galaxy interface to select an existing application or pipeline (or create a new pipeline).
4. The user specifies the data to be processed and launches a computation that applies the selected pipeline to that data.

Computational results can be maintained within the instance or, alternatively, returned to the user’s laboratory for further processing or long-term storage.

Figure 14.2 shows Globus Genomics in action. We see its Galaxy interface being used to display a pipeline commonly employed for the analysis of data from an RNA-seq experiment, a method for determining the type and quantity of RNA in a biological sample [ ]. Globus Genomics provides research teams with a personal cloud-powered data storage and analysis “virtual computer.” Researchers can thus perform fully automated analysis of large genetic sequence datasets from a web browser. They need no software installation, nor indeed any expertise in cloud or parallel computing. In one common use case, a researcher sends a biological sample to a commercial sequencing provider, which sends the resulting data over the network to cloud storage (e.g., Amazon S3). The researcher then accesses and analyzes the data with an analysis pipeline running within Globus Genomics.

14.4.1 Globus Genomics Architecture and Implementation

As shown in figure 14.3 on page 307, the Globus Genomics implementation comprises six components, all deployed on a single Amazon EC2 node. Galaxy and a web server handle workflow management and the user interface. HTCondor and an elastic provisioner handle computation management. Globus Connect Server (GCS) and a shared file system handle data management. These services themselves engage other cloud services:

- Globus identity and data management services (see section 3.6 on page 51), for user authentication and to initiate data transfers.
- Amazon EC2, to create and delete the virtual machine instances on which user computations run.
- The Amazon Relational Database Service and Elastic File System, for storing user data that needs to persist over time.

We describe each element in turn.

Figure 14.2: The Galaxy web interface used by Globus Genomics, showing an RNA-seq pipeline that allows a researcher to detect various features in experimental samples.

The Galaxy system [141] supports construction and execution of workflows. Thanks to integration with Globus Auth [247], a user signs on to the cloud-hosted Galaxy using campus credentials. They can then select an existing workflow or create a new one, identify data to be processed, and launch computational tasks.

The web server, an integral part of the Galaxy system, serves the Galaxy user interface to Globus Genomics users. Users need only a web browser to access Globus Genomics capabilities.

The HTCondor system [243] (see section 7.7 on page 128) maintains a queue of tasks to be executed, dispatches tasks from that queue to available EC2 worker nodes, and monitors those tasks for successful completion or failure. The elastic provisioner manages the pool of worker nodes. It allocates nodes of the right type for the tasks to be executed, adds nodes when the HTCondor queue becomes long, and de-allocates nodes when there is little or no work to do. The elastic provisioner is designed to use spot instances (see section 5.2.2 on page 77) where possible, in order to reduce costs.
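The behavior of an elastic provisioner can be illustrated with a simple scaling rule. The sketch below is hypothetical, not the Globus Genomics provisioner's actual logic. It sizes the worker pool in proportion to the number of queued tasks and clamps the result between a minimum and maximum pool size. It prefers spot instances when their price is sufficiently below the on-demand price. The tasks-per-worker ratio, the pool bounds, and the 70% price threshold are all assumptions.

```python
import math


def desired_workers(queued_tasks: int, tasks_per_worker: int = 4,
                    min_workers: int = 0, max_workers: int = 50) -> int:
    """Hypothetical rule: one worker per `tasks_per_worker` queued tasks, clamped."""
    target = math.ceil(queued_tasks / tasks_per_worker)  # 0 when the queue is empty
    return max(min_workers, min(max_workers, target))


def purchase_option(spot_price: float, on_demand_price: float,
                    max_fraction: float = 0.7) -> str:
    """Prefer spot capacity when it costs at most `max_fraction` of the on-demand price."""
    return "spot" if spot_price <= max_fraction * on_demand_price else "on-demand"


def plan(queued_tasks: int, running_workers: int,
         spot_price: float, on_demand_price: float) -> str:
    """Describe the provisioning action for one iteration of the control loop."""
    delta = desired_workers(queued_tasks) - running_workers
    if delta > 0:
        return f"launch {delta} {purchase_option(spot_price, on_demand_price)} instance(s)"
    if delta < 0:
        # A real provisioner would terminate only nodes that are not running tasks.
        return f"terminate {-delta} idle instance(s)"
    return "no change"


print(plan(queued_tasks=10, running_workers=1, spot_price=0.3, on_demand_price=1.0))
```

Run periodically against the HTCondor queue length, a rule like this grows the pool when work piles up and shrinks it to zero when the queue drains.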

Globus Connect Server (GCS), as discussed in section 3.6 on page 51, implements the protocols that the Globus cloud service uses to manage data transfers between pairs of Globus Connect instances. (The related Globus Connect Personal service is designed for use on single-user personal computers.) We can think of this component as equivalent to the agent that runs on your personal computer to interact with the Dropbox file sharing system, although GCS supports specialized high-speed protocols.

AWS Batch (aws.amazon.com/batch). Amazon recently released this job scheduling and management service, which they indicate can run hundreds of thousands of batch computing jobs on the Amazon cloud. It dynamically provisions compute resources of the types (e.g., CPU- or memory-optimized instances; EC2 and Spot instances) and numbers required to meet the resource needs of the jobs submitted. If AWS Batch behaves as advertised, it might obviate the need for the HTCondor and elastic provisioner components of the Globus Genomics solution.
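For comparison, the sketch below shows what submitting work to AWS Batch might look like from Python, using the boto3 Batch client's `submit_job` call. The job queue name, job definition name, and `run-pipeline` command are hypothetical and would have to be created beforehand. The code builds the request separately from sending it, so the request can be inspected without AWS credentials. For more than one sample it uses an array job, in which Batch sets the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable in each child job.

```python
def batch_job_request(sample_ids: list[str], job_queue: str, job_definition: str) -> dict:
    """Build keyword arguments for boto3's batch.submit_job().

    Queue, definition, and command names are hypothetical. With more than one sample,
    an array job is used: each child can read AWS_BATCH_JOB_ARRAY_INDEX to pick its sample.
    """
    request = {
        "jobName": "rnaseq-analysis",
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "containerOverrides": {
            # Hypothetical container command; the sample list is passed as one argument.
            "command": ["run-pipeline", "--samples", ",".join(sample_ids)],
        },
    }
    if len(sample_ids) > 1:  # Batch array jobs require a size of at least 2
        request["arrayProperties"] = {"size": len(sample_ids)}
    return request


def submit(sample_ids: list[str],
           job_queue: str = "genomics-spot-queue",
           job_definition: str = "rnaseq-pipeline:1") -> str:
    """Send the request to AWS Batch and return the new job's ID (needs AWS credentials)."""
    import boto3  # imported here so batch_job_request() works without boto3 installed

    batch = boto3.client("batch")
    response = batch.submit_job(**batch_job_request(sample_ids, job_queue, job_definition))
    return response["jobId"]


print(batch_job_request(["s1", "s2", "s3"], "genomics-spot-queue", "rnaseq-pipeline:1"))
```

In this arrangement, AWS Batch rather than the application decides how many instances, and of what purchase type, to run for the queue. That decision is the role the HTCondor pool and elastic provisioner play in Globus Genomics.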

Finally, the shared file system uses the Network File System (NFS) to provide a uniform file system namespace across the manager and worker nodes. This mechanism simplifies the execution of Galaxy workflows, which are designed to run in a shared file system environment.

Globus Genomics uses Chef (chef.io) for configuration management. Service components can therefore be updated and replaced without error-prone manual configuration steps [186]. Chef recipes encode the following steps.

1. Provision an Identity and Access Management (IAM; see section 15.2 on page 319) user under the customer’s Amazon account. Give it a security policy that allows the IAM user to create and remove AWS resources and to perform the other actions required to set up and run a production-grade science SaaS.

2. Provision an EC2 instance with the following:
   - HTCondor software, configured to serve as the head node for an HTCondor computational cluster;
   - Network File System server software for data sharing with computational cluster nodes;
   - an NGINX web proxy and WSGI Python web server;
   - a Galaxy process;
   - a Globus Connect Server for external access to the network file system, configured for optimal performance;
   - Unix accounts for the administrators;
   - security updates/patches; and
   - Domain Name System (DNS) support, via Amazon’s Route 53 service.

3. Provision the following additional components:

   (a) An Amazon Virtual Private Cloud with appropriate network routes between the head nodes and compute nodes;

   (b) An EBS/EFS-based network file system with optimized I/O configuration;

   (c) An elastic provisioner along with network configurations to support the creation of spot instances across multiple Availability Zones;

   (d) A read-only network volume configured with the tools and pipelines required for a specific scientific domain, and with reference datasets that may be used by analysis pipelines;

   (e) Monitoring of the health of various system components, generating alerts as required;

   (f) An Amazon Relational Database Service database for persisting the state of application workflows; and

   (g) Identity integration with Globus, and groups configured for authorization of user accesses to the instance.
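Recipes like these are safe to rerun because each step is idempotent. A step first checks whether the system is already in the desired state and acts only if it is not. The sketch below illustrates that converge pattern in plain Python. It is not Chef's API (Chef recipes are written in a Ruby-based DSL), and the step names and resource identifiers are hypothetical.

```python
from typing import Callable

# A step is (name, is_satisfied(state), apply(state)). All names here are hypothetical.
Step = tuple[str, Callable[[dict], bool], Callable[[dict], None]]


def converge(state: dict, steps: list[Step]) -> list[str]:
    """Apply only the steps whose desired state is not already present; return their names."""
    changed = []
    for name, is_satisfied, apply in steps:
        if not is_satisfied(state):
            apply(state)
            changed.append(name)
    return changed


steps: list[Step] = [
    ("iam-user", lambda s: "iam_user" in s, lambda s: s.update(iam_user="gg-admin")),
    ("head-node", lambda s: "head_node" in s, lambda s: s.update(head_node="i-head")),
    ("shared-fs", lambda s: s.get("nfs_exported", False),
     lambda s: s.update(nfs_exported=True)),
]

state: dict = {}
first = converge(state, steps)   # ['iam-user', 'head-node', 'shared-fs']
second = converge(state, steps)  # [] because everything is already in place
print(first, second)
```

Because a second run changes nothing, the same recipe can be used both to build a new customer instance and to repair or update an existing one.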

14.4.2 Globus Genomics as SaaS

Globus Genomics has many SaaS attributes. From a technology perspective, it is accessible remotely over the network. It runs a single copy of its software (Galaxy and the various genomics analysis tools and pipelines). It leverages cloud platform services provided by Amazon and Globus for scalability and to simplify its implementation.

Globus Genomics is not multitenant: it creates a separate instance of the system (the manager node in figure 14.3) for each customer, rather than having one scalable instance serving all customers. This characteristic is not a problem for users. Indeed, some may prefer a single-tenant architecture because of its (at least apparently) increased security and the clean, transparent billing for Amazon cloud charges that it allows. However, single tenancy increases costs for the Globus Genomics team over time. Each new customer requires the instantiation of a complete new Globus Genomics configuration, increasing Amazon usage and other operations costs, and different customers cannot share compute nodes.

We note also that the implementation does not guard against failure of the node on which the manager logic runs. The Globus Genomics team can detect such failures and restart the service, but the failure is not transparent to users. One solution would be to re-engineer the system to leverage more microservices, as we described for Animoto above.

From a business model perspective, Globus Genomics also has SaaS characteristics, in that its use is supported by a subscription model. A lab or individual user signs up for a Globus Genomics subscription that covers the base cost of operating their private instance. As part of the configuration of a Globus Genomics instance, Amazon account details are provided. Resources consumed by any users granted access to that instance can then be charged to that account.

[Diagram components: Globus Genomics manager (Galaxy; Web server; HTCondor; Elastic provisioner; Shared file system; GCS); Globus cloud services (Globus Auth; Globus transfer); Globus endpoints; Elastic File System; Relational Database Service; Dynamic EC2 pool.]

Figure 14.3: The Globus Genomics system dispatches tasks to a dynamically instantiated HTCondor pool, with virtual nodes added and removed by the elastic provisioner in response to changing load.

14.5 The Globus Research Data Management Service

The two limitations noted in our discussion of Globus Genomics illustrate tradeoffs that frequently arise when developing cloud-hosted SaaS, particularly in science. Multitenancy and microservice architectures tend to reduce operating costs and increase reliability, but can increase up-front development costs. In our second example, we describe a system that is more cloud native, namely the Globus research data management service introduced in section 1.5.4 on page 15.

Globus, developed at the University of Chicago since 2010, leverages software-as-a-service methods to deliver data management capabilities to the research community. As shown in figure 14.4 on the next page, software running on the Amazon cloud implements those capabilities. They include data transfer, sharing, publication, and discovery, as well as identity and credential management. Globus Connect software deployed on file systems at research institutions and on personal computers enables those systems to participate in the Globus file sharing network. REST APIs support programmatic access, as we have described in chapter 11 and in the preceding Globus Genomics section.

Globus is popular because researchers and developers of research tools alike can hand off to the Globus service responsibility for otherwise time-consuming tasks, such as babysitting file transfers. For example, consider a researcher who wants to transfer data from site A to site B. With Globus, the researcher can simply make a request to the cloud-hosted service, via API or web interface. The Globus service then handles user authentication, negotiation of access at sites A and B, configuration of the transfer, and monitoring and control of the transfer activity.

[Diagram components: Applications; Globus APIs; Auth & groups; File transfer & replication; File sharing; Data publication & discovery; Storage systems.]

Figure 14.4: Globus SaaS provides authentication and data transfer, sharing, publication, and discovery capabilities, accessible via APIs (left) and web clients (not shown). Globus Connect software on storage systems enables access to data in many locations.
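The babysitting that such a service takes on can be sketched as a retry loop. The function below is a hypothetical illustration, not Globus code. `copy_file` stands for whatever actually moves one file, and transient failures are modeled as `OSError`. Each file is retried with exponential backoff before being marked as failed. A production service would need more, for example checksum verification and persisting progress so that the loop itself can be restarted.

```python
import time
from typing import Callable, Iterable


def babysit_transfer(files: Iterable[str], copy_file: Callable[[str], None],
                     max_retries: int = 3, backoff_s: float = 0.5) -> dict[str, str]:
    """Copy each file, retrying transient failures with exponential backoff."""
    status: dict[str, str] = {}
    for name in files:
        for attempt in range(max_retries + 1):
            try:
                copy_file(name)
                status[name] = "SUCCEEDED"
                break
            except OSError:
                if attempt < max_retries:
                    time.sleep(backoff_s * 2 ** attempt)  # e.g., 0.5 s, 1 s, 2 s
        else:
            status[name] = "FAILED"  # every attempt raised OSError
    return status


# Usage with a simulated flaky network: "b.dat" fails once, "c.dat" always fails.
attempts: dict[str, int] = {}


def flaky_copy(name: str) -> None:
    attempts[name] = attempts.get(name, 0) + 1
    if name == "c.dat" or (name == "b.dat" and attempts[name] == 1):
        raise OSError(f"transient error copying {name}")


result = babysit_transfer(["a.dat", "b.dat", "c.dat"], flaky_copy, backoff_s=0.0)
print(result)
```

Moving this loop into a hosted service means the user's laptop need not stay connected for hours while it runs.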

Many important projects depend on Globus for authentication, authorization, data access, and other purposes, so high availability is essential. Thus, the Globus implementation leverages public cloud services to replicate state data in multiple locations, operate redundant servers with dynamic failover, monitor service status, and so forth. Table 14.1 provides a partial list of the Amazon services used by Globus.
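Redundant servers with failover can be illustrated from the client side. The sketch below is hypothetical and is not how Globus or Amazon's load balancer is implemented. It rotates requests across equivalent instances and, when one fails to respond, retries the same request on the next. This is safe only because the instances are stateless, so any instance can serve any request.

```python
import itertools
from typing import Callable


class FailoverClient:
    """Rotate requests across redundant stateless instances; fail over on errors."""

    def __init__(self, instances: list[str]) -> None:
        self._instances = list(instances)
        self._next_start = itertools.cycle(range(len(self._instances)))

    def call(self, send: Callable[[str, str], str], request: str) -> str:
        start = next(self._next_start)  # round-robin starting point
        errors = []
        for offset in range(len(self._instances)):
            instance = self._instances[(start + offset) % len(self._instances)]
            try:
                return send(instance, request)
            except ConnectionError as err:  # instance down: try the next replica
                errors.append(f"{instance}: {err}")
        raise ConnectionError("all instances failed: " + "; ".join(errors))


def send(instance: str, request: str) -> str:
    """Simulated network call in which the (hypothetical) instance api-1 is down."""
    if instance == "api-1":
        raise ConnectionError("connection refused")
    return f"{instance} handled {request}"


client = FailoverClient(["api-1", "api-2", "api-3"])
for req in ("GET /a", "GET /b", "GET /c"):
    print(client.call(send, req))
```

Every request still succeeds even though one of the three instances is down; the failure is invisible to the caller.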

14.5.1 Globus Service Architecture

The Globus SaaS is broken down into logical units of services. Each service comprises three key components: a REST API, a set of one or more backend task workers, and a persistence layer. Some services need additional components, and some components may be colocated to reduce cost or complexity. All services share this breakdown and expose themselves to one another only via their REST APIs. These choices provide several key properties that allow different parts of the SaaS to scale independently of one another.

Table 14.1: Some of the Amazon cloud services used in the Globus SaaS implementation.

Service      Use made by Globus
EC2          Provide high-availability instances of Globus services; serve web APIs; run background tasks; internal infrastructure
RDS          Store Globus service state with high availability and durability
DynamoDB     Store Globus service state with high availability and durability
VPC          Establish private Amazon cloud with secure virtual network
ELB          Direct client requests to an available service instance
S3           Store state of in-progress tasks, service data backups, static web content
IAM          Manage access to Amazon resources within Globus
CloudWatch   Monitor the status of Globus resources
SNS          Simple Notification Service, to send notifications to Globus staff
SES          Simple Email Service, to deliver email to users

Globus REST APIs are typically deployed on EC2 instances. All logic used to handle REST API requests is performed synchronously. Any asynchronous or long-term activity requested of a service is handled by creating records of the desired activity in the persistence layer. The REST API handlers do not wait for these actions to be completed; they simply register the desired activity in persistent storage and terminate. The backend task workers then handle further processing. They either poll the persistence layer or are notified by API workers.

This approach makes the REST API instances stateless: their contents on disk and in memory are ephemeral, and any REST API instance can process any request. As a result, these microservices can scale up and down more or less trivially. The Globus team can add or remove capacity to serve APIs in direct proportion to observed system load. So long as the backend task workers rely on the persistence layer and are also stateless, they can scale up or down just as easily.
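In this pattern, API handlers only record intent and stateless workers do the work. It can be sketched with Python's built-in `sqlite3` module standing in for the persistence layer; Globus itself uses Amazon storage services such as RDS. The functions `api_submit`, `api_status`, and `worker_poll_once` and the `tasks` table are hypothetical. Note the conditional UPDATE in the worker. It claims a task only if the task is still pending, so several workers can poll the same table without running a task twice.

```python
import sqlite3
import uuid

# In-memory SQLite stands in for the replicated persistence layer (assumption).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tasks (id TEXT PRIMARY KEY, arg TEXT, status TEXT, result TEXT)")


def api_submit(arg: str) -> dict:
    """REST handler body: record the desired activity, then return immediately."""
    task_id = str(uuid.uuid4())
    with db:  # commits the insert
        db.execute("INSERT INTO tasks VALUES (?, ?, 'PENDING', NULL)", (task_id, arg))
    return {"task_id": task_id, "status": "PENDING"}  # e.g., sent with HTTP 202 Accepted


def api_status(task_id: str) -> dict:
    """REST handler body: report whatever the persistence layer currently says."""
    status, result = db.execute(
        "SELECT status, result FROM tasks WHERE id = ?", (task_id,)).fetchone()
    return {"task_id": task_id, "status": status, "result": result}


def worker_poll_once() -> bool:
    """Backend task worker: claim one pending task, do the work, record the result.

    Returns False when no pending task is left.
    """
    row = db.execute("SELECT id, arg FROM tasks WHERE status = 'PENDING' LIMIT 1").fetchone()
    if row is None:
        return False
    task_id, arg = row
    with db:
        claimed = db.execute(
            "UPDATE tasks SET status = 'ACTIVE' WHERE id = ? AND status = 'PENDING'",
            (task_id,)).rowcount
    if claimed != 1:  # another worker claimed it first
        return True
    result = arg.upper()  # placeholder for the real long-running work
    with db:
        db.execute("UPDATE tasks SET status = 'SUCCEEDED', result = ? WHERE id = ?",
                   (result, task_id))
    return True


task = api_submit("hello")
print(api_status(task["task_id"])["status"])  # PENDING
while worker_poll_once():
    pass
print(api_status(task["task_id"]))
```

Nothing in `api_submit` or `api_status` depends on which process handled an earlier request. That independence is what lets API instances be added or removed freely.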

Globus employs the same public-facing REST APIs for all internal communications between services, and thus no two Globus services are tightly coupled. Each service can scale, rearrange infrastructure, and alter core service components without affecting the others. This separation of concerns is key to Globus operations and allows improvements to be made safely, easily, and frequently.

The persistence layer is implemented on Amazon storage services. It leverages their replication across availability zones for fault tolerance and creates periodic remote snapshots for disaster recovery. Globus uses, in particular, S3 and the PostgreSQL Relational Database Service (RDS); one service uses DynamoDB. The various system components are encapsulated in Virtual Private Clouds (VPCs). Each VPC is a logically isolated section of the Amazon cloud, within which Amazon resources can be launched in a managed virtual network.

14.5.2 Globus Service Operations

In addition to segmenting the SaaS into component services, Globus applies a number of common practices across all of these services so as to maintain them uniformly. As a result, operational and infrastructural improvements apply across the entire product offering, and their effects are amplified. Internal components shared across multiple or all Globus services include the following:

- service health and performance monitoring;
- continuous integration and continuous delivery pipelines;
- security monitoring and intrusion detection;
- log aggregation;
- configuration management; and
- backups of disk, database, S3, and other storage.

14.6 Summary

Our goal in providing this brief review of SaaS methods is to give you a framework for thinking about software as a service and its role in science. True SaaS is cloud-hosted software supported by pay-for-use or subscription payments. It can conveniently address three major challenges of science software, namely usability, scalability, and sustainability. But a certain scale of use is required to justify the costs of a multitenant architecture, and not all software will generate the interest and subscription income required to support that scale of use. The science community will surely learn a lot more about the pros and cons of SaaS in the next few years.

14.7 Resources

Dubey and Wagle provide a somewhat dated but still excellent overview of software as a service [113]. Fox, Patterson, and Joseph’s Engineering Software as a Service [128], designed to accompany their edX online course, provides in-depth discussion of many issues that arise when building software as a service.