hosted on the cloud, these same tools can become collaborative information-
processing laboratories.
In general, then, a cloud platform comprises a set of software components that
are operated by the cloud provider and that software developers can incorporate
into their applications, for example by REST API calls. Many systems satisfy
this broad definition, and a surprising number of those systems have been used
in science and engineering in one way or another. (Scientists and engineers are
enterprising people!) For example, Facebook provides a set of programming
interfaces and tools that developers can use to integrate with the “social graph”
that Facebook maintains of personal relations and information. Researchers have
used this platform’s capabilities to implement scientific collaboration systems and
even peer-to-peer resource-sharing systems in which Facebook friends share storage
space on their computers [88]. The Twitter and Salesforce platforms have seen
similar use.
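To make the phrase "REST API calls" concrete, the following minimal sketch shows how an application might address a provider-operated component over HTTP. The endpoint `api.example-cloud.test`, the token, and the resource names are hypothetical placeholders, not any real provider's API; real platforms differ in authentication and URL layout.

```python
import json
import urllib.request

# Hypothetical endpoint and token: placeholders, not a real provider API.
BASE_URL = "https://api.example-cloud.test/v1"
TOKEN = "REPLACE_WITH_REAL_TOKEN"


def build_request(resource, payload=None):
    """Build (but do not send) an authenticated REST request.

    A GET is built when payload is None, otherwise a POST with a JSON body.
    """
    url = f"{BASE_URL}/{resource.lstrip('/')}"
    headers = {"Authorization": f"Bearer {TOKEN}", "Accept": "application/json"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    method = "GET" if payload is None else "POST"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def call(request, timeout=10):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating `build_request` from `call` keeps request construction inspectable without network access; an application would pass the built request to `call` once a real endpoint and credentials are substituted.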
The number of cloud platform capabilities is so large that we cannot hope to
do them justice here. Instead, we focus on four classes of cloud platform services:
• Data analytics, as implemented with the Hadoop and YARN tools, including
Spark. We show how data analytics can be used on Amazon Elastic
MapReduce, Azure HDInsight, and Google’s Cloud Datalab. We also look
at data warehouse tools such as Azure Data Lake and Amazon Athena.
• Streaming data services, which have become a fully integrated part of
the public cloud landscape. Amazon Kinesis and its analytics tools, along
with Azure Event Hubs and Stream Analytics, are easily used and powerful.
The open source community also has developed a rich collection of tools for
monitoring and analyzing streaming data.
• Machine learning services, which combine open source libraries and
interactive cloud-based development environments to provide exciting new
capabilities. Deep learning is revolutionizing the field because of the
availability of extremely large data collections and powerful computing platforms.
• Globus platform services, which provide identity, group, and research
data management capabilities that simplify the development of applications
and systems that integrate people and data at disparate locations, such as
research data management portals.