Index
23andMe genotyping service, 253
academic cloud, 6, 284
examples
Aristotle, see Aristotle academic cloud
Bionimbus, see Bionimbus academic cloud
Chameleon, see Chameleon academic cloud
Jetstream, see Jetstream academic cloud
RedCloud, see RedCloud academic cloud
access tokens, in OAuth2, 231
ACID semantics, in database, 27
activation functions, 204
actor model of computation, 96, 111
ADIOS, 165
Advanced Message Queuing Protocol, 112, 122
Advanced Photon Source, 240
Amazon cloud services, 29
Amazon Machine Learning, 202
Athena analytics, 149
Aurora MySQL-compatible database, 30, 33
Batch, 305
CloudFormation, 100
CloudFront, 339
CloudTrail auditing, 318
CloudWatch metrics, 318
Deep Learning AMI, 212
DynamoDB, 31, 116, 310
EC2 Container Service, 114–120, 343
Elastic Block Store (EBS), 29, 77, 264, 305
Elastic Compute Cloud (EC2), 30, 75–80, 98,
100, 303, 309
Elastic File System (EFS), 29, 80, 305
Elastic MapReduce (EMR), 31, 134, 143–146
Elasticsearch, 34
Glacier archival storage, 6, 31
Identity and Access Management (IAM), 115,
320
Kinesis, 34, 162, 167, 343
Analytics, 167
compared with Simple Queue Service, 170
Firehose, 167, 170
Streams, 167–170
Lambda, 336, 346
Lambda@Edge, 339
Lex voice input, 202
Polly text to speech, 202
Redshift data warehouse, 33
Rekognition deep learning, 202
Relational Database Service (RDS), 33, 306, 310
Route 53 service, 305
Simple Email Service (SES), 309
Simple Notification Service (SNS), 309
Simple Queue Service (SQS), 34, 116, 118, 170,
300
Simple Storage Service (S3), 10, 31, 38–41, 309
Titan graph database, 34
Virtual Private Cloud (VPC), 305, 310
Amazon machine image, 212, 326
AMI, see Amazon machine image
Anaconda data science platform, 13
Animoto, 299
Apache Software Foundation projects
Beam, 184
CloudStack, 260, 261
Edgent edge-analytics tools, 162
Flink stream processing framework, 187–188
Hadoop YARN, 33, 108, 137
HBase NoSQL database, 32, 47
Hive data warehouse, 158
Kafka message system, 180
Libcloud, 38
Mesos, 62, 67, 97, 98, 124, 137, 336
Oozie workflow scheduler for Hadoop, 158
Parquet, 150
ZooKeeper, 124
API, see application programming interface
application programming interface, 9
application whitelisting, 318
Argonne National Laboratory, 240
Aristotle academic cloud, 7, 70
Array of Things urban observatory, 164, 171, 339
artificial neural networks, 204
arXiv, document classifier for, 113, 197
Atmosphere cloud platform, 35, 82
Aurora, see Amazon cloud services
availability zones, in Amazon cloud, 33, 78, 306
availability zones, in Eucalyptus cloud, 263, 273
AWS Batch, see Amazon cloud services
Azure cloud services, 29
Azure Stack private cloud, 260
Batch, 105
Batch Shipyard, 129, 218
Blob storage service, 31
Cloud BI, 67
Content Delivery Network (CDN), 339
Cortana cognitive services, 220
Data Lake, 33, 148
DocumentDB, 33
Event Hubs, 34, 162, 175–179
File Storage, 30
Functions, 339
Graph Engine, 34
HDInsight, 32, 134, 147–149
notebook, 344
with Spark MLlib, 193
Machine Learning, 197–201
Queue storage service, 34
Quick Start orchestration, 105
role-based access control, 319
Security Center service, 318
SQL Database service, 33
Storage Explorer, 31, 44
Stream Analytics, 175–179
Table storage service, 32
Threat Analytics service, 318
U-SQL data analytics tool, 149
Virtual Machines, 80–81
back propagation, 205
bag of tasks parallelism, 107
BigQuery, see Google cloud services
Binder, tool for creating containers, 91
Bionimbus academic cloud, 7
bisection bandwidth, 106
Blob, 31
blob, binary large object, 19, 24
boto3, see Python packages
Bridges computer system, 284
BSP, see bulk synchronous parallelism
bucket, storage aggregation concept, 10
as used in Amazon cloud, 31
bulk synchronous parallelism, 67, 96, 108
CAP theorem, 28
Catapult, Microsoft FPGA architecture, 338
Cayley, see Google cloud services
Celery, a Python package, 122
Ceph object store software, 35, 51, 265, 289
CfnCluster (CloudFormation Cluster), 100
Chameleon academic cloud, 7, 69, 284
Cinder, see OpenStack cloud software
CIPRES science gateway, 302
client-side encryption, 323
Cloud BI, see Azure cloud services
cloud bursting, 6, 70
cloud data center networks, 129
Cloud DataLab, see Google cloud services
cloud native application, 62, 298, 335–337
Cloud Native Computing Foundation, 335
Cloud Pub/Sub, see Google cloud services, 34
Cloud Security Alliance, 328
cloud, private vs. public, 6, 68
cloud, types of
academic, see academic cloud
community, see community cloud
discovery, see discovery cloud
hybrid, see hybrid cloud
private, see private cloud
public, see public cloud
CloudBridge Python SDK, 38, 50, 84
CloudStack, see Apache Software Foundation projects
CloudTrail, see Amazon cloud services
CloudWatch, see Amazon cloud services
CNTK, see Microsoft Cognitive Toolkit, 210
community cloud, 6
container, aggregation construct in object store, 24
container, server virtualization method, 64, 85–94
compared with virtual machine, 66
Docker support for, 86
sharing secrets, 320
Singularity as alternative to Docker, 94
content distribution network, 339
convolutional neural network, 206
Cortana, see Azure cloud services
cost
of cloud for pCT analysis, 100
of cloud vs. supercomputer, 98
of different instance types, 80
of physics applications on cloud, 128, 332
savings by elastic provisioner, 80
CyberGIS science gateway, 302
data analytics, 135
examples
famous people in Wikipedia, 144
k-means clustering, 140
rubella in Washington and Indiana, 152
weather station anomalies, 153
tools
Amazon Athena analytics, 149
Amazon Elastic MapReduce, 143
Azure HDInsight and Data Lake, 147
Google Cloud Datalab, 150
Hadoop, 136
Spark, 137
data model, 26
data stream analytics, 161
data transfer node, 244
data warehouse, 20, 29, 148, 151
database management system, 26
dataflow, 67
Datalab, see Google cloud services
deep learning, 134, 204–212
toolkits
Microsoft Cognitive Toolkit, 210
MXNet, 205
Rekognition deep learning, 202
see also TensorFlow machine learning
deep neural network, 98, 206
Department of Energy, xiii, 128, 245
DevOps, 111
digital object identifier, 240
discovery cloud, 346
DMagic system, 240
Docker, 85
and Azure Batch Shipyard, 218
and Datalab, 151
and EC2 Container Service, 114
and Jetstream, 82, 83
and Microsoft Cognitive Toolkit, 218
and Spark, 141
and the Hub, 87, 123
and Union File System, 87
creating your own container, 91
history, 86
security issues, 320, 325
Swarm container management, 67, 125, 320
document store, 27
DOI, see digital object identifier
DSpace, 91
EBS, see Amazon cloud services
EC2, see Amazon cloud services
edge computing, 162, 338
EFS, see Amazon cloud services
Elastic MapReduce, see Amazon cloud services
elastic provisioner, 303
enhanced networking, 102
ESnet, 128, 245
Eucalyptus cloud software, 73, 259, 261–281
deployment planning, 263
euca2ools command line interface, 275
single cluster cloud, 267
Walrus object store, 264, 265
European cloud providers, 4
European science cloud, 69
eventual consistency, 27
Fermilab, 128, 332
field programmable gate array, 338
file shares, 29
file system, 19
filtered back projection, 99
firewall, 64, 69, 324
and science DMZ, 244
in Eucalyptus, 263, 268
in OpenStack, 287
fourth paradigm, 128
FPGA, see field programmable gate array
Galaxy workflow system, 82, 91, 304
used in Globus Genomics, 303
Ganglia monitoring tool, 101
gcfuse, 128
genome wide association study, 253
GeoDeepDive, 128
gigabit testbeds, 330
GitHub, 13
and cloud access keys, 317, 326
Glance, see OpenStack cloud software
Globus Genomics, 108, 128
Globus platform, 225–255
application examples
data sharing at Advanced Photon Source, 240
NCAR Research Data Archive, 247
Sanger Imputation Service, 253
example code
creating a shared endpoint, 227
remotely accessible service, 241
research data portal, 248
SDKs
Globus Auth SDK, 231–239
Globus Transfer SDK, 227–230, 239
Globus research data management service, 15, 298
accounts, 236
as software as a service, 15, 307–310
data search service, 240
endpoints, 226
Globus Connect, 51, 304
identity providers supported, 236
publication service, 240
GlusterFS, 35
Google cloud services, 29
AppEngine, 82
BigQuery, 33, 150–157
Bigtable NoSQL, 32, 47–48
Cayley graph database, 30, 34
Cloud Dataflow, 184
Cloud Datalab, 67, 134, 150–157
Cloud Datastore, 30, 32, 122
Cloud Functions, 339
Cloud Pub/Sub, 34, 122
Cloud SQL relational database, 33
Cloud Storage, 31
Coldline archival storage, 31
Compute Engine, 30, 82
Datastore NoSQL, 48–50
Drive storage, 19, 51
Kubernetes, see Kubernetes container management
local SSD storage, 30
persistent disk storage, 30
Spanner distributed relational database, 33
storage services, 46–50
GPFS file system, 51
graph database, 20, 28
graph execution model, 96
graphics processing unit, 80, 98, 212, 218, 335
GWAS, see genome wide association study
Hadoop, 96
Hadoop Distributed File System, 35, 51, 136
happy scientist, 96, 221
HBase, see Apache Software Foundation projects
HDInsight, see Azure cloud services, 147
Health Insurance Portability and Accountability Act,
69, 323
HEPCloud project, 128
high performance computing, 69, 70, 94, 97–107,
222, 283, 284
and streaming, 165
on Amazon cloud, 100–103
on Azure cloud, 105–106
scaling challenges, 106
types of parallel execution, 61
HIPAA, see Health Insurance Portability and Accountability Act
Hive, see Apache Software Foundation projects
HPC, see high performance computing
HPSS hierarchical storage management system, 51
HTCondor job management system, 67, 128, 304
hybrid cloud, 6, 259
hypervisor, 64, 74, 286
IaaS, see infrastructure as a service
IAM, see Amazon cloud services
InCommon identity management federation, 226
infinite loop, see loop, infinite
infrastructure as a service, 1, 63, 262, 318
security responsibilities, 319
Internet of Things, 175, 339
iRODS, 91
iSCSI Extensions for RDMA, 287
JavaScript, 9, 235, 252
Jetstream academic cloud, 7, 35, 69, 284
Jupyter, 13
and Google Cloud Datalab, 67
create container with Binder, 91
environment, 342
JupyterHub multiuser system, 326
notebooks, 341–344
Kafka, see Apache Software Foundation projects
Kerberos, used by Lustre, 288
key pair, obtaining for Amazon cloud, 38, 75, 112
key-value store, 27
Keystone, see OpenStack cloud software
Kinesis, see Amazon cloud services
Kubernetes container management, 67, 97, 120–124,
335
and Binder, 91
used by Google, 67
KVM, 74, 272, 294
Lambda, see Amazon cloud services
Lambda@Edge, see Amazon cloud services
Lex, see Amazon cloud services
Libcloud, see Apache Software Foundation projects
local SSD, Google cloud service, 30
logistic function, 193
logistic regression, 193
long short-term memory, 209
loop, infinite, see infinite loop
Lustre parallel file system, 24, 51, 288
machine learning, 134, 191–223
examples
admitting students, 216
classifying scientific papers, 197
generating text, 210
inspecting restaurants, 193
labeling faces, 220
recognizing images, 214
methods
clustering, 192
deep learning, 204
logistic regression, 193
neural network, 200
random forest, 200
tools
Amazon Machine Learning platform, 202–203
Azure Machine Learning, 197–201
MXNet open source library, 212–215
scikit-learn package, 91
see also Spark MLlib machine learning
see also TensorFlow machine learning
Vowpal Wabbit, 91, 198
magic operators, 143
manager worker parallelism, 107
many task parallelism, 66, 96, 107–108
MapReduce, 67, 96, 108
Mesos, see Apache Software Foundation projects
Mesosphere, 124
Message Passing Interface, 66, 218, 222
application to proton therapy, 98
in the cloud, 97
metacomputer, 330
MG-RAST metagenomics analysis service, 302
microservice, 96, 110–122
and cloud native applications, 335
managing keys for, 320
Microsoft cloud, see Azure cloud services
Microsoft Cognitive Toolkit, 96, 210, 218
multitenancy, 288, 298, 301, 306
MySQL database, 26
Amazon Aurora compatible with, 33
nanoHUB science gateway, 302
National Center for Atmospheric Research, 247
National Institute of Standards and Technology, xiii,
3, 328
National Institutes of Health, xiii
National Science Foundation, xiii, 163
National Security Agency, 321
Network Enabled Optimization Server, 301
Network File System, 35, 80, 305
Neutron, see OpenStack cloud software
NGINX web proxy with load balancing, 125
Nimbus cloud software, 73
non-uniform memory access, 285, 287, 295
NoSQL database, 27–28, 36, 113
Amazon DynamoDB, 31
Azure DocumentDB, 33
Azure Table, 32
document store variant, 27
Google Cloud Bigtable, 32
Google Cloud Datastore, 32
HBase, 32, 157
Nova, see OpenStack cloud software
NUMA, see non-uniform memory access
OAuth 2.0 authorization framework, 231
object store, 24, 51, 264
object, cloud storage unit, 10
Ocean Observatories Initiative, 163
Oozie, see Apache Software Foundation projects
Open Compute Project, 337
Open Researcher and Contributor ID, 236
Open Science Data Cloud, 159
OpenID Connect Core 1.0, 231, 236
OpenNebula cloud software, 73, 259, 261
OpenStack cloud software, 73, 259, 283–296
and high performance computing, 284
and scientific workloads, 285
Cinder block storage, 284
core services, 284
deployment, 288
Glance image service, 284
Keystone identity component, 284
Neutron networking component, 284
Nova compute component, 284
Shared File Systems, 35
Swift object storage, 34, 284
OrangeFS file system, 51
ORCID, see Open Researcher and Contributor ID
PaaS, see platform as a service
parallel computing paradigms, 96
Parquet, see Apache Software Foundation projects
PCollection, in Apache Beam, 184
persistent disk, Google cloud service, 30
personal health information, 323
PHI, see personal health information
Phoenix relational layer over HBase, 158
Pig, 158
platform as a service, 3, 318
security responsibilities, 319
see also data analytics
see also Globus platform
see also machine learning
see also streaming
Portable Operating System Interface, 23, 51, 56
POSIX, see Portable Operating System Interface
PostgreSQL database, 26, 33, 80, 310
precision, in machine learning, 197
private cloud, 5, 259
with Azure Stack, 260
with Eucalyptus, 261
with OpenStack, 283
Prometheus, 335
public cloud, 4, 95
pros and cons, 68
public datasets, sources of, 158
publish/subscribe, 34
in Apache Kafka, 180
Python packages
Apache Libcloud SDK, 38
Azure Data Lake Store SDK, 148
Boto3 SDK, 11, 40, 77, 92, 168
Celery remote procedure call, 122
CloudBridge SDK, 38, 50, 84
Globus Auth SDK, 231–239
Globus Transfer SDK, 53, 227–230, 239
Google Cloud SDK, 46
Requests HTTP library, 253
scikit-learn machine learning, 91, 192
query language, 19, 26, 177
RabbitMQ message broker, 122, 126
Rackspace, 3, 5
RDD, see resilient distributed dataset
RDS, see Amazon cloud services
recall, in machine learning, 197
recurrent neural network, 209
RedCloud academic cloud, 7
reinforcement learning, 220
relational database, 19, 26, 33
Relational Database Service, see Amazon cloud services
representational state transfer, 9, 33, 37, 226
use by cloud platforms, 134
research data portal, 243
design pattern, 243–252
examples
data delivery at Advanced Photon Source, 240
NCAR Research Data Archive, 247
Sanger Imputation Service, 253
resilient distributed dataset, 138, 171, 192, 194
resource owner, 231
resource server, 231
REST, see representational state transfer
Riak CS, 265
RNN, see recurrent neural network
role-based security, 113
Route 53, see Amazon cloud services
S3, see Amazon cloud services
SaaS, see software as a service
Sanger Imputation Service, 253
Scala, 138
scale, challenges of, 66
science DMZ, 244
science gateway, 302
see also research data portal
scikit-learn machine learning, 91
SDK, see software development kit
Secure Sockets Layer, 321
server-side encryption, 322
serverless computing, 62, 67
service level agreement, 107, 262
SES, see Amazon cloud services
Simple Azure, 81
single program multiple data, 96
Single-Root I/O Virtualization, 287
Singularity container technology, 94
SMB protocol, used by Azure File Storage, 30
software as a service, 2, 299–301, 318
and multitenancy, 301
and science gateways, 301
examples
Animoto video rendering, 299
Globus Genomics bioinformatics, 303–307
Globus research data management, 307–310
security re sponsibilities, 319
software development kit, 9, 38
solid state disk, 30, 98
Southern California Earthquake Center, 162
Spark, 67, 96, 137–143
and SQL, 142
DataFrames, 142, 192
in a container, 141
simple example program, 138
Streaming, 170
Spark MLlib machine learning, 192–197
Chicago restaurant example, 193
estimators, 192
pipeline, 192
transformers, 192
Spectra Logic BlackPearl, 51
SPIDAL data analytics tools, 222
SSD, see solid state disk
SSH back door in AMI, 326
Storage Service Encryption, in Azure, 323
supervised learning, 223
Swarm, 67
Swift, see OpenStack cloud software
Swift parallel scripting language, 34
Tensor Processing Unit, 338
TensorFlow machine learning, 96, 157, 206–208, 212,
215–218, 338
notebook, 344
school admissions example, 216–218
tensors, in Microsoft Cognitive Toolkit, 218
Titan graph database, 34
topologies, 180
training set, 194
Transport Layer Security, 321
tumbling window, 178
U-SQL data analytics tool, 149
Union File System, 86
unsupervised learning, 223
urban informatics, 163
UUID, universally unique identifier
use to name buckets, 46
used by Globus, 53, 236
virtual machine, 64, 73–84
compared with container, 66
instance storage, 77
Virtual Private Cloud, see Amazon cloud services
virtual private network, 324, 326
virtualization, 74
VMware Cloud Foundation, 260
Vowpal Wabbit learning system, 91, 198
VPC, see Amazon cloud services
Walrus, see Eucalyptus cloud software
WebHDFS, 33, 148
Wikipedia, analysis of names within, 144
XSEDE, 7, 35, 82
YARN, see Apache Software Foundation projects
Zeppelin web-based notebook, 143
Scientific and Engineering Computation
William Gropp and Ewing Lusk, editors; Janusz Kowalik, founding editor
Data-Parallel Programming on MIMD Computers, Philip J. Hatcher and Michael J. Quinn, 1991
Enterprise Integration Modeling: Proceedings of the First International Conference, edited by Charles
J. Petrie, Jr., 1992
The High Performance Fortran Handbook, Charles H. Koelbel, David B. Loveman, Robert S. Schreiber,
Guy L. Steele Jr. and Mary E. Zosel, 1994
PVM: A User’s Guide and Tutorial for Network Parallel Computing, Al Geist, Adam Beguelin, Jack
Dongarra, Weicheng Jiang, Robert Manchek, and Vaidyalingham S. Sunderam, 1994
Practical Parallel Programming, Gregory V. Wilson, 1995
Enabling Technologies for Petaflops Computing, Thomas Sterling, Paul Messina, and Paul H. Smith,
1995
An Introduction to High-Performance Scientific Computing, Lloyd D. Fosdick, Elizabeth R. Jessup,
Carolyn J. C. Schauble, and Gitta Domik, 1995
Parallel Programming Using C++, edited by Gregory V. Wilson and Paul Lu, 1996
Using PLAPACK: Parallel Linear Algebra Package, Robert A. van de Geijn, 1997
Fortran 95 Handbook, Jeanne C. Adams, Walter S. Brainerd, Jeanne T. Martin, Brian T. Smith, and
Jerrold L. Wagener, 1997
MPI—The Complete Reference: Volume 1, The MPI Core, Marc Snir, Steve Otto, Steven Huss-Lederman,
David Walker, and Jack Dongarra, 1998
MPI—The Complete Reference: Volume 2, The MPI-2 Extensions, William Gropp, Steven Huss-Lederman,
Andrew Lumsdaine, Ewing Lusk, Bill Nitzberg, William Saphir, and Marc Snir, 1998
A Programmer's Guide to ZPL, Lawrence Snyder, 1999
How to Build a Beowulf, Thomas L. Sterling, John Salmon, Donald J. Becker, and Daniel F. Savarese,
1999
Using MPI-2: Advanced Features of the Message-Passing Interface, William Gropp, Ewing Lusk, and
Rajeev Thakur, 1999
Beowulf Cluster Computing with Windows, edited by Thomas Sterling, William Gropp, and Ewing Lusk,
2001
Beowulf Cluster Computing with Linux, second edition, edited by Thomas Sterling, William Gropp, and
Ewing Lusk, 2003
Scalable Input/Output: Achieving System Balance, edited by Daniel A. Reed, 2003
Using OpenMP: Portable Shared Memory Parallel Programming, Barbara Chapman, Gabriele Jost, and
Ruud van der Pas, 2008
Quantum Computing without Magic: Devices, Zdzislaw Meglicki, 2008
Quantum Computing: A Gentle Introduction, Eleanor G. Rieffel and Wolfgang H. Polak, 2011
Using MPI: Portable Parallel Programming with the Message-Passing Interface, third edition, William
Gropp, Ewing Lusk, and Anthony Skjellum, 2015
Using Advanced MPI: Beyond the Basics, Pavan Balaji, William Gropp, Torsten Hoefler, Rajeev Thakur,
and Ewing Lusk, 2015
Scientific Programming and Computer Architecture, Divakar Viswanath, 2017
Cloud Computing for Science and Engineering, Ian Foster and Dennis B. Gannon, 2017