3.2. Using Amazon Cloud Storage Services
need to do on a daily basis. For such tasks, we need an interface to the cloud that
we can program. Cloud providers make this possible by providing REST APIs that
programmers can use to access their services programmatically. For programming
convenience, you will usually access these APIs via software development kits
(SDKs), which give programmers language-specific functions for interacting with
cloud services. We discuss the Python SDKs here. The code below is written for
Python 2.7 but is easily converted to Python 3.
Each cloud has special features that make it unique, and thus the different
cloud providers' REST APIs and SDKs are not identical. Two efforts are under
way to create a standard Python SDK: CloudBridge [11] and Apache Libcloud
(libcloud.apache.org). While both aim to support the standard tasks for all clouds,
those tasks are only the lowest common denominator of cloud capabilities; many
unique capabilities of each cloud are available only through the REST API and
SDK for that platform. At the time of this writing, Libcloud is not complete; we
will provide an online update when it is ready and fully documented. However, we
do make use of CloudBridge in our OpenStack examples.
Building a data sample collection in the cloud. We use the following simple
example throughout this chapter to illustrate the use of Amazon, Azure, and Google
cloud storage services. We have a collection of data samples stored on our personal
computer and for each sample we have four items of metadata: item number, creation
date, experiment id, and a text string comment. To enable access to these samples by
our collaborators, we want to upload them to cloud storage and to create a searchable
table, also hosted in the cloud, containing the metadata and cloud storage URL for
each object, as shown in figure 3.1 on the following page.
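To make the target table concrete, here is a minimal Python 3 sketch of one table row: the metadata items plus the object's cloud storage URL. The `SampleRecord` class, the `object_url` helper, and the bucket name `my-sample-bucket` are hypothetical illustrations, not part of any SDK. The URL follows one common S3 virtual-hosted form; the exact endpoint for a real bucket may depend on its region.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass
class SampleRecord:
    """One row of the hypothetical metadata table: metadata items plus a URL."""
    item_id: str
    experiment_id: str
    date: str
    comment: str
    url: str


def object_url(bucket, key):
    """Build a virtual-hosted-style S3 URL for an object (illustrative only).

    quote() percent-encodes characters such as spaces but leaves "/" intact,
    so keys that look like paths stay readable.
    """
    return "https://{}.s3.amazonaws.com/{}".format(bucket, quote(key))


if __name__ == "__main__":
    row = SampleRecord(
        item_id="1",
        experiment_id="exp1",
        date="3/15/2002",
        comment="first sample",
        url=object_url("my-sample-bucket", "sample1.bin"),  # hypothetical bucket
    )
    print(row.url)  # https://my-sample-bucket.s3.amazonaws.com/sample1.bin
```

Storing the URL next to the metadata means a collaborator who finds a row by searching the table can fetch the corresponding blob directly.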
We assume that each data sample is in a binary file on our personal computer
and that the associated metadata are contained in a comma-separated values (CSV)
file, with one line per item, also on our personal computer. Each line in this CSV file
has the following format:
item id, experiment id, date, filename, comment string
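The line format above maps naturally onto Python's standard `csv` module. The following minimal sketch, written for Python 3, turns each line into a dictionary keyed by field name. The labels in `FIELDS` and the function `parse_metadata` are our own, and we assume that any comment containing a comma is enclosed in double quotes.

```python
import csv
import io

# Our own labels for the five columns, in the order given in the text
FIELDS = ["item_id", "experiment_id", "date", "filename", "comment"]


def parse_metadata(lines):
    """Yield one dict per non-blank line of metadata CSV.

    skipinitialspace=True drops the space that follows each comma, which also
    lets a quoted comment such as "a, b" be parsed as a single field.
    """
    reader = csv.reader(lines, skipinitialspace=True)
    for row in reader:
        if not row:  # csv.reader returns [] for a blank line
            continue
        if len(row) != len(FIELDS):
            raise ValueError("line {}: expected {} fields, got {}".format(
                reader.line_num, len(FIELDS), len(row)))
        yield dict(zip(FIELDS, row))


if __name__ == "__main__":
    demo = io.StringIO('1, exp1, 3/15/2002, sample1.bin, "first sample, run A"\n')
    for record in parse_metadata(demo):
        print(record["filename"], "->", record["comment"])
```

With a real file, open it with `open(path, newline="")`, as the `csv` documentation recommends, and pass the file object to `parse_metadata`.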
3.2 Using Amazon Cloud Storage Services
Our Amazon solution to the example problem uses S3 to store the blobs and
DynamoDB to store the table. We first need our Amazon key pair, i.e., access
key plus secret key, which we can obtain from the Amazon IAM Management
Console. Having created a new user, we select the create access key button to
create our security credentials, which we can then download, as shown in figure 3.2
on the following page.
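Once downloaded, the key pair can be handed to Boto3, the AWS SDK for Python. The Python 3 sketch below rests on two assumptions: that the downloaded CSV has a header row with the columns `Access key ID` and `Secret access key` (check your own file, since the console's format may change), and that `us-west-2` is a suitable region, which is only an example. `load_access_keys` and `make_s3_resource` are hypothetical helpers of our own; `boto3.resource` and its `aws_access_key_id`, `aws_secret_access_key`, and `region_name` parameters belong to Boto3.

```python
import csv
import io

# ASSUMPTION: header names as they appear in the console's downloaded CSV;
# verify against your own file.
KEY_ID_COLUMN = "Access key ID"
SECRET_COLUMN = "Secret access key"


def load_access_keys(f):
    """Return (access_key_id, secret_access_key) from the first data row of f."""
    row = next(csv.DictReader(f))
    # Strip a possible UTF-8 byte-order mark and stray spaces from header names
    row = {(name or "").lstrip("\ufeff").strip(): value for name, value in row.items()}
    return row[KEY_ID_COLUMN].strip(), row[SECRET_COLUMN].strip()


def make_s3_resource(access_key, secret_key, region="us-west-2"):
    """Create a Boto3 S3 resource from explicit credentials (region is an example)."""
    import boto3  # imported here so load_access_keys works without boto3 installed

    return boto3.resource(
        "s3",
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        region_name=region,
    )


if __name__ == "__main__":
    demo = io.StringIO("Access key ID,Secret access key\nAKIAEXAMPLE,secretEXAMPLE\n")
    key_id, secret = load_access_keys(demo)
    print(key_id)  # print only the key ID, never the secret
```

With the real file you might write `with open("credentials.csv", encoding="utf-8-sig") as f:` (the file name is hypothetical) and then `s3 = make_s3_resource(*load_access_keys(f))`. Because the secret key grants access to your account, keep the file out of version control; Boto3 can also pick up credentials from the shared `~/.aws/credentials` file, which avoids handling keys in code at all.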