Chapter 14
Building Your Own SaaS
“Great services are not canceled by one act or by one single error.”
—Benjamin Disraeli
We saw in part II how software can be made portable and reusable by encapsulating
it within a virtual machine or container. Anyone with a cloud account can then
run that code at a click of a mouse or button. Nevertheless, users of such software
must be concerned with obtaining the latest software package, launching it on the
cloud, verifying correct execution, and so forth. Each of these activities introduces
cost and complexity.
Software as a service (SaaS) seeks to overcome these challenges by introducing a
further degree of automation in software delivery. With SaaS, software is operated
on the user’s behalf by a SaaS provider. Users then access the software over the
network, often from a web browser. They need not install, configure, nor update
any software, and software providers only have to support a single software version,
which they can update at any time to correct errors or add features. SaaS providers
typically use techniques such as replication to achieve high reliability, so that
the service is not liable to be “canceled by one single error.” SaaS advocates
argue that this approach reduces both complexity for software consumers and
costs for providers. Subscription-based payment schemes further reduce friction
for consumers and enable providers to scale delivery with demand. As we saw in
chapter 1, thousands of SaaS providers now operate software in this manner.
In this chapter, we examine how SaaS methods can be applied in science. We
first dissect why it is that SaaS is conventionally defined as both a technology and
business model. We then review approaches to on-demand software in science,
including the popular concept of a science gateway, a form of remote software
access that has primarily targeted research supercomputer systems for computa-
tion. We then look at more cloud native versions of the science gateway SaaS
concept, using two examples to illustrate how you can build SaaS solutions for
your own purposes. The first, Globus Genomics, provides on-demand access to
bioinformatics pipelines. The second is the Globus research data management
service, already introduced in chapter 11. While it so happens that both systems
that we present here build on Amazon cloud services, similar implementation and
deployment strategies can be followed on other public clouds. A key message is
that leveraging cloud platform services can facilitate the creation of SaaS offerings
that are reliable, secure, and scalable in terms of the number of users supported
and amount of service delivered.
14.1 The Meaning of SaaS
Gartner defines SaaS as software that is “owned, delivered, and managed remotely
by [a provider who] delivers software based on one set of common code and data
definitions that is consumed in a one-to-many model by all contracted customers at
any time, on a pay-for-use basis or as a subscription” [136]. Analyst David Terrar
adds: “[and t]he application behind the service is properly web architected—not
an existing application web enabled.”
This definition is as much about business model as technology. From a technology
perspective, it speaks to a delivery model: the provider runs a single version of
the software, deployed so as to permit access over the Internet via web interfaces.
By implication, the software is architected to scale to large numbers of consumers,
and to enable multitenant operations, meaning that multiple consumers can access
the software at the same time without interfering with each other. From a business
model perspective, the definition speaks to software that consumers do not buy,
as they would a home appliance, but instead pay per use, as they would a movie
online, or subscribe to, as they would a newspaper.
This juxtaposition of technology and business model may seem odd, but in fact
it is central to the success of SaaS, in industry at least. (The fact that some SaaS
providers use advertising revenue rather than subscriptions to cover their costs
does not change the essential economics.) In brief, the centralized operation model
has allowed SaaS providers to slash per-user costs relative to software distributed
via traditional channels, because there is, for example, no longer a need to support
multiple computer architectures and versions. It has also greatly reduced barriers
to access: most SaaS software is accessible to anyone with a Web browser, in
contrast to much enterprise software that might require specialized hardware and
expertise to install and run. These two factors mean that SaaS providers can deliver
software to many more people at far lower costs than were previously possible:
literally cents on the dollar.
While the cost of serving each new customer may be low, it is not zero.
Furthermore, establishing and operating a SaaS system involves significant
fixed costs (for example, 24x7 monitoring to ensure high availability).
The SaaS industry has determined that pay-per-use or subscription-based payment
models are the best way to recoup these costs. Such approaches provide a low
barrier to entry (anyone with a credit card can access a service) and mean that
revenue scales linearly with usage. Thus, many users and per-user payments
make SaaS sustainable by providing positive returns to scale: more users means
more income that can pay for the scaled-up operations and/or reduce subscription
charges, encouraging yet broader adoption.
14.2 SaaS Architecture
Leaving business model aside, we next introduce approaches that have proven
successful for architecting and engineering SaaS systems.
In general, the goal of a SaaS architect is to create a system that can deliver
powerful capabilities to many customers at low cost and with high reliability,
security, and performance. This overarching goal allows for many tradeoffs: for
example, between capabilities and reliability, or between optimizing for base cost
or per-user cost. Nevertheless, some basic principles can be identified.
The most sophisticated SaaS systems are often architected using a microservice
architecture, in which state is maintained in one or more persistent, replicated
storage services, and computation is performed in short-lived, stateless services
that can be rerun if necessary. This architecture provides for a high degree of
fault tolerance and also facilitates scaling: more virtual machines can be allocated
dynamically as load increases. The following example illustrates these principles.
Video rendering SaaS. A company called Animoto has had a lot of success with
their video rendering service. You upload a set of photos; they create an animated
video from those images, with smooth transitions from picture to picture and musical
accompaniment. They thus need to run many somewhat data- and computation-intensive
tasks. As shown in figure 14.1 on the next page, their architecture uses
Amazon S3 cloud object storage to hold all images and data; dynamically managed
Amazon EC2 virtual machine instances to run web servers, data ingest servers,
rendering servers, and the like; and the Amazon SQS queuing service to coordinate
between these activities, for example to maintain pending data ingest, analysis, and
rendering tasks. At least that is how they described their architecture almost a
decade ago; they likely use a richer set of services now. Animoto describes scaling to
support 750,000 new users in three days by adding virtual machine instances [4].
Figure 14.1: A schematic of the Animoto SaaS pipeline, showing the logical structure at
the top and its realization in terms of Amazon services below.
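The pattern just described, with durable state in storage and short-lived
stateless workers coordinated through a queue, can be sketched in a few lines.
This is a minimal illustration only, not Animoto's implementation: the in-memory
`store` dictionary and `deque` stand in for an object store such as Amazon S3
and a queue such as Amazon SQS, and the names `render_task` and `run_worker`,
as well as the simulated crash, are hypothetical.

```python
from collections import deque


def render_task(store, task):
    """Stateless step: read the input from the store, write the output back."""
    photos = store[task["input_key"]]
    store[task["output_key"]] = "video(" + ",".join(photos) + ")"


def run_worker(store, queue, fail_first_n=0):
    """Drain the queue; a task whose worker fails is put back and rerun."""
    failures_left = fail_first_n  # simulated crashes, for illustration only
    attempts = 0
    while queue:
        task = queue.popleft()
        attempts += 1
        try:
            if failures_left > 0:
                failures_left -= 1
                raise RuntimeError("simulated worker crash")
            render_task(store, task)
        except RuntimeError:
            # Nothing is lost: the inputs are still in the store,
            # so the task can simply be queued again.
            queue.append(task)
    return attempts


store = {"req1/photos": ["a.jpg", "b.jpg"]}
queue = deque([{"input_key": "req1/photos", "output_key": "req1/video"}])
attempts = run_worker(store, queue, fail_first_n=1)
```

Because the worker keeps no state of its own, the failed first attempt costs
only a retry; a real queuing service would typically redeliver an
unacknowledged task rather than rely on the worker to re-enqueue it.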
Let us now dive more deeply into the process by which a SaaS system is created.
Imagine that you developed your own video rendering application, myrender, that
creates videos like those generated by Animoto, but from the command line, for
example as follows.
myrender -i image-directory-name -o video-file-name
Your application runs nicely on your GPU-equipped workstation, but it is
becoming too popular for you to handle the growing number of requests by hand.
You share the source code, but people keep asking you to perform more and more
tasks. You want to deliver this application to many people as SaaS. How might
you proceed? Let us consider three alternatives.
In the first, you write a program that accepts user rendering requests via a
web form, and for each request launches myrender on your workstation. However,
this approach breaks down as the number of requests continues to grow: your
workstation is overwhelmed. A second approach is to adapt your program to
instead instantiate a cloud-hosted virtual machine instance for each request. The
virtual machine image is configured to accept user images, run the myrender
application, wait for the user to download the resulting video, and then terminate.
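The core of the first alternative, launching myrender once per request, might
look like the following sketch. The web form itself is omitted, and the function
names `build_command` and `handle_request` are hypothetical; only the `-i` and
`-o` options come from the command line shown earlier. The second alternative
would run the same logic inside a freshly started virtual machine rather than
on the workstation.

```python
import subprocess


def build_command(image_dir, video_file):
    """Build the argument list for the myrender command line shown earlier."""
    return ["myrender", "-i", image_dir, "-o", video_file]


def handle_request(image_dir, video_file, runner=subprocess.run):
    """Run one rendering request; `runner` can be replaced for testing."""
    # check=True makes subprocess.run raise CalledProcessError
    # if myrender exits with a nonzero status.
    return runner(build_command(image_dir, video_file), check=True)
```

Passing the arguments as a list, rather than as a single shell string, avoids
shell interpretation of user-supplied directory and file names.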
A third approach is to partition the myrender code into separate analysis,
rendering, and assembly components; create a virtual machine or container for
each component; and create or use a framework akin to that used by Animoto to
dispatch appropriate tasks to the different virtual machines or containers, while
scaling their numbers up and down in response to changing load. Each individual
request is thus represented solely by a set of objects in object storage and transient
requests in a queue, plus perhaps entries in other database tables.
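To make the third approach concrete, the sketch below chains three stages so
that a request exists only as objects in a store and transient entries in a
queue. It is a toy model under stated assumptions: the stage functions are
placeholders rather than real analysis, rendering, or assembly code, and the
dictionary and `deque` stand in for cloud object storage and a queuing service.

```python
from collections import deque

PIPELINE = ["analyze", "render", "assemble"]

# Placeholder stage bodies (hypothetical); each maps one stored object to the next.
FUNCS = {
    "analyze": lambda photos: sorted(photos),
    "render": lambda photos: [p + ":frames" for p in photos],
    "assemble": lambda clips: "+".join(clips),
}


def step(store, queue):
    """Process one queued task, store its result, and enqueue the next stage."""
    request_id, index = queue.popleft()
    stage = PIPELINE[index]
    source = "input" if index == 0 else PIPELINE[index - 1]
    store[(request_id, stage)] = FUNCS[stage](store[(request_id, source)])
    if index + 1 < len(PIPELINE):
        queue.append((request_id, index + 1))


store = {("req1", "input"): ["b.jpg", "a.jpg"]}
queue = deque([("req1", 0)])
while queue:
    step(store, queue)
```

Any worker can execute `step`, because everything it needs is named by the
queue entry and held in the store; that property is what allows the number of
workers per stage to grow and shrink with load.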
Which approach is better? The first does not scale, so let us consider the
second and third. The second is surely less work to implement than the third:
you do not need to alter your application code at all. However, the fact that
each user rendering request is run on a separate virtual machine instance also
has disadvantages. In particular: (1) Cost: You pay for each virtual machine,
even when it is not fully occupied, for example because it is waiting on user
input. (2) Speed: Each user request is processed sequentially, one image at a
time. Opportunities for parallelism, for example to process all images in parallel,
cannot be exploited without changes to the application. (3) Reliability: Failure of
a virtual machine instance at any time results in the loss of all work performed by
that instance to that time. Recovery requires restarting from the beginning. The
third approach, on the other hand, requires more work up front but can provide
big improvements in cloud cost, execution speed, and service reliability.
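The speed argument can be illustrated with a small sketch in which per-image
work is handed to a pool of workers instead of being done one image at a time.
Here a local thread pool stands in for a pool of cloud worker instances, and
`process_image` is a hypothetical placeholder for the real per-image
computation.

```python
from concurrent.futures import ThreadPoolExecutor


def process_image(name):
    """Placeholder for the per-image work (hypothetical)."""
    return name.upper()


def process_all(names, max_workers=4):
    """Process all images concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_image, names))


results = process_all(["a.jpg", "b.jpg", "c.jpg"])
```

In a deployed system the same fan-out would be expressed as one queued task per
image, so that the degree of parallelism is set by the number of worker
instances rather than by threads in one process.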
The third approach is commonly referred to as a multitenant architecture
because all requests are fulfilled by the same (albeit elastically scalable) set of cloud
resources, each of which may host different user requests at different times, and
indeed multiple requests at the same time. A vital issue in multitenant systems is
ensuring isolation among different users. Multitenancy is the approach adopted in
the vast majority of large SaaS systems.
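One simple way to think about isolation in a multitenant store is to make the
tenant identity part of every key, so that one user's request can never name
another user's objects. The class below is a minimal sketch of that idea; the
name `TenantStore` and its methods are hypothetical, and a production system
would also enforce isolation through authentication and access policies.

```python
class TenantStore:
    """In-memory object store that confines each tenant to its own key space."""

    def __init__(self):
        self._objects = {}

    def put(self, tenant, key, value):
        # The tenant is part of the key, so equal key names never collide.
        self._objects[(tenant, key)] = value

    def get(self, tenant, key):
        try:
            return self._objects[(tenant, key)]
        except KeyError:
            raise KeyError(f"{key!r} not found for tenant {tenant!r}") from None

    def list_keys(self, tenant):
        """Return only the keys that belong to the given tenant."""
        return sorted(k for (t, k) in self._objects if t == tenant)


shared = TenantStore()
shared.put("alice", "video", "alice-video")
shared.put("bob", "video", "bob-video")
shared.put("alice", "photos", "alice-photos")
```

Both tenants share one store, yet `shared.get("bob", "photos")` raises
`KeyError` because that object exists only in the other tenant's key space.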
14.3 SaaS and Science
The scientific community has a long history of providing online access to software.
After all, the original motivation for the ARPANET, precursor to today’s Internet,
was to enable remote access to scarce computers [257]. With the advent of
high-speed networks followed by the World Wide Web, many such experiments were
conducted [126, 239]. One early system, the Network Enabled Optimization Server
(NEOS) neos-server.org, has been in operation for more than 20 years [105],
solving optimization problems delivered via email or the web.
The term science gateway is increasingly used to denote
a system that provides online access to scientific software [261]. In general, a
science gateway is a (typically web) portal that allows users to configure and invoke
scientific applications, often on supercomputers, providing a convenient gateway
to otherwise hard-to-access computers and software.
The impact of such systems on science has been considerable. For example, the
MG-RAST metagenomics analysis service metagenomics.anl.gov, which provides
online access to services for the analysis of genetic material in environmental
samples [199], has more than 22,000 registered users as of 2017, who have
collectively uploaded for analysis some 280,000 metagenomes containing more than
10^14 base pairs. That is a tremendous amount of science being supported by a single
service! Other successful systems, such as CIPRES [201], which provides access to
phylogenetic reconstruction software; CyberGIS [185], for collaborative geospatial
problem solving; and nanoHUB [172], which provides access to hundreds of
computational simulation codes in nanotechnology, also have thousands of users and
correspondingly large impacts on both science and education. A recent survey [176]
provides further insights into how and where science gateways are used.
While it is hard to generalize across such a broad spectrum of activities, we can
state that the typical science software service has some but not all of the properties
of SaaS as commonly understood. First, from a technology perspective: Most such
services make a single version of a science application available to many
people, and many leverage modern web interface technologies to provide intuitive
interactive interfaces. Some also provide REST APIs and even SDKs to permit
programmatic access. On the other hand, many are less than fully elastic, due to
a need to run on specialized and typically overloaded supercomputers, and few are
architected to leverage the power of modern cloud platforms. Thus, they handle
modest numbers of users well, but may not scale.
From a business model perspective, few science software systems implement
pay-by-use or subscription-based payment schemes. Instead, they typically rely
on research grant support and/or allocations of compute and storage resources on
scientific computing centers. This lack of a business model can be a subject of
concern, because it raises a question about their long-term sustainability (what
happens when grants end?) and also hinders scaling (an allocation of supercomputer
time may be enough to support 10 concurrent users, but what happens when
demand increases to 1000 concurrent users? 10,000?).
We next examine two example systems that have each taken a different approach
to science SaaS from both technology and business model perspectives: Globus
Genomics and the Globus service.
14.4 The Globus Genomics Bioinformatics System
Globus Genomics [187] globus.org/genomics, developed by Ravi Madduri, Paul
Davé, Alex Rodriguez, Dina Sulakhe, and others, is a cloud-hosted software service
for the rapid analysis of biomedical, and in particular next generation sequencing
(NGS), data. The basic idea is as follows: a customer (individual researcher,
laboratory, community) signs up for the service. The Globus Genomics team
then establishes a service instance, configured with applications and pipelines
specific to the new customer’s disciplinary requirements. Access to this instance
is managed via a Globus Group. Any authorized user can then sign on to the
instance, use its Galaxy interface to select an existing application or pipeline (or
create a new pipeline), specify the data to be processed, and launch a computation
that processes the specified data with the specified pipeline. Computational results
can be maintained within the instance or, alternatively, returned to the user’s
laboratory for further processing or long-term storage.
Figure 14.2 shows Globus Genomics in action. We see its Galaxy interface
being used to display a pipeline commonly employed for the analysis of data from
an RNA-seq experiment, which is a method for determining the type and quantity
of RNA in a biological sample [99]. By providing research teams with a personal
cloud-powered data storage and analysis “virtual computer,” Globus Genomics
allows researchers to perform fully automated analysis of large genetic sequence
datasets from a web browser, without any need for software installation or indeed
any expertise in cloud or parallel computing. In one common use case, a researcher
sends a biological sample to a commercial sequencing provider, has the resulting
data communicated over the network to cloud storage (e.g., Amazon S3), and
then accesses and analyzes the data with an analysis pipeline running within Globus
Genomics.
14.4.1 Globus Genomics Architecture and Implementation
As shown in figure 14.3 on page 307, the Globus Genomics implementation
comprises six components, all deployed on a single Amazon EC2 node: Galaxy and
web server for workflow management and user interface; HTCondor and an elastic
provisioner for computation management; and Globus Connect Server (GCS) and
shared file system for data management. These services themselves engage other
cloud services, notably Globus identity and data management services (see
section 3.6 on page 51) for user authentication and to initiate data transfers; Amazon
EC2 to create and delete the virtual machine instances on which user computations
run; and the Amazon Relational Database Service and Elastic File System for
Figure 14.2: The Galaxy web interface used by Globus Genomics, showing an RNA-seq
pipeline that allows a researcher to detect various features in experimental samples.
storing user data that needs to persist over time. We describe each element in
turn.
The Galaxy system [141] supports construction and execution of workflows.
A user signs on to the cloud-hosted Galaxy, using campus credentials thanks to
integration with Globus Auth [247]. They can then select an existing workflow or
create a new one, identify data to be processed, and launch computational tasks.
The web server, an integral part of the Galaxy system, serves the Galaxy user
interface to Globus Genomics users. Users need only a web browser to access
Globus Genomics capabilities.
The HTCondor system [243] (see section 7.7 on page 128) maintains a queue
of tasks to be executed, dispatches tasks from that queue to available EC2 worker
nodes, and monitors those tasks for successful completion or failure. The elastic
provisioner manages the pool of worker nodes, allocating nodes of the right type
for the tasks that are to be executed, increasing the number of nodes when the
HTCondor queue becomes long, and de-allocating nodes when there is little or
no work to do. The elastic provisioner is designed to use spot instances (see
section 5.2.2 on page 77) where possible, in order to reduce costs.
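The scaling decision that an elastic provisioner makes can be sketched as a
small function of the queue length. This is an illustrative policy only, not
the Globus Genomics provisioner: the function names, the `tasks_per_worker`
ratio, and the worker limits are all hypothetical, and node types and spot
pricing are not modeled.

```python
import math


def desired_workers(queued_tasks, tasks_per_worker=4, min_workers=0, max_workers=20):
    """Number of worker nodes to run for the current queue length."""
    needed = math.ceil(queued_tasks / tasks_per_worker)
    return max(min_workers, min(max_workers, needed))


def scaling_action(current_workers, queued_tasks, **policy):
    """Positive result: nodes to add; negative result: nodes to release."""
    return desired_workers(queued_tasks, **policy) - current_workers
```

With these defaults, a queue of nine tasks and two running workers yields an
action of +1, while an empty queue and five running workers yields -5, which
releases every idle node.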
Globus Connect Server (GCS), as discussed in section 3.6 on page 51,
implements the protocols that the Globus cloud service uses to manage data
transfers between pairs of Globus Connect instances. (The related Globus Connect
Personal service is designed for use on single-user personal computers.) We can
think of this component as being equivalent to the agent that runs on your personal
computer to interact with the Dropbox file sharing system, although GCS supports
specialized high-speed protocols.
AWS Batch aws.amazon.com/batch. Amazon recently released this job scheduling
and management service, which they indicate can run hundreds of thousands of batch
computing jobs on the Amazon cloud, dynamically provisioning compute resources
of the types (e.g., CPU or memory optimized instances; EC2 and Spot instances) and
in the numbers required to meet the resource needs of the jobs submitted. If AWS Batch
behaves as advertised, it might obviate the need for the HTCondor and elastic
provisioner components of the Globus Genomics solution.
Finally, the shared file system uses the Network File System (NFS) to
provide a uniform file system name space across the manager and worker nodes.
This mechanism simplifies the execution of Galaxy workflows, which are designed
to run in a shared file system environment.
Globus Genomics uses Chef chef.io for configuration management, allowing
service components to be updated and replaced without any error-prone manual
configuration steps [186]. It uses Chef recipes to encode the following steps.
1. Provision an Identity and Access Management (IAM: see section 15.2 on
page 319) user under the customer’s Amazon account, with a security policy
that allows the IAM user to create and remove AWS resources and perform
other actions required to set up and run a production-grade science SaaS.
2. Provision an EC2 instance with HTCondor software, configured to serve as
the head node for an HTCondor computational cluster; Network File Server
software for data sharing with computational cluster nodes; an NGINX web
proxy and WSGI Python web server; a Galaxy process; a Globus Connect
server for external access to the network file system, configured for optimal
performance; Unix accounts for the administrators; security updates/patches;
and Domain Name System (DNS) support, via Amazon’s Route 53 service.
3. Provision the following additional components:
(a) An Amazon Virtual Private Cloud with appropriate network routes
between the head nodes and compute nodes;
(b) An EBS/EFS-based network file system with optimized I/O configuration;
(c) An elastic provisioner along with network configurations to support the
creation of spot instances across multiple Availability Zones;
(d) A read-only network volume configured with the tools and pipelines
required for a specific scientific domain, and with reference datasets
that may be used by analysis pipelines;
(e) Monitoring of the health of various system components, generating
alerts as required;
(f) An Amazon Relational Database Service database for persisting
the state of application workflows; and
(g) Identity integration with Globus, and groups configured for authorization
of user accesses to the instance.
14.4.2 Globus Genomics as SaaS
Globus Genomics has many SaaS attributes. From a technology perspective, it is
accessible remotely over the network, runs a single copy of its software (Galaxy, the
various genomics analysis tools and pipelines), and leverages cloud platform services
provided by Amazon and Globus for scalability and to simplify its implementation.
Globus Genomics is not multitenant: it creates a separate instance of the
system (the manager node in figure 14.3) for each customer, rather than having
one scalable instance serving all customers. This characteristic of the Globus
Genomics system is not a problem for users: indeed, a single-tenant architecture
may appear preferable to some due to (at least an appearance of) increased security
and the clean, transparent billing for Amazon cloud charges that it allows. However,
single tenancy increases costs for the Globus Genomics team over time, as each
new customer requires the instantiation of a complete new Globus Genomics
configuration, increasing Amazon usage and other operations costs, and different
customers cannot share compute nodes.
We note also that the implementation does not guard against failure of the
node on which the manager logic runs. The Globus Genomics team can detect
such failures and restart the service, but the failure is not transparent to users.
One solution would be to re-engineer the system to leverage more microservices,
as we described for Animoto above.
From a business model perspective, Globus Genomics also has SaaS characteristics
in that its use is supported by a subscription model. A lab or individual user
signs up for a Globus Genomics subscription that covers the base cost of operating
their private instance. As part of the configuration of a Globus Genomics instance,