This notebook works through the demo in section 3.2 of the book. If you do not already have boto3, the Amazon Python SDK, installed, uncomment and run the following line.
#!pip install boto3
import boto3
Follow the instructions on the IAM portal to get an access key and a secret key, then substitute them below.
s3 = boto3.resource('s3',
aws_access_key_id='your access key',
aws_secret_access_key='your secret key'
)
Next let's test this by creating our bucket "datacont" in the Oregon data center. The CreateBucketConfiguration location is optional, but the location will be encoded into our URLs later. The creation call will throw an exception if the bucket already exists.
try:
    s3.create_bucket(Bucket='datacont',
                     CreateBucketConfiguration={'LocationConstraint': 'us-west-2'})
except Exception:
    print("this may already exist")
Now we will make this bucket publicly readable. We will also need to make each blob in the bucket publicly readable.
bucket = s3.Bucket("datacont")
bucket.Acl().put(ACL='public-read')
Now let's try to upload a file into the bucket.
# upload a new object into the bucket
body = open('path-to-a-file/exp1', 'rb')
s3.Object('datacont', 'test').put(Body=body)
s3.Object('datacont', 'test').Acl().put(ACL='public-read')
The URL for the test item in the bucket should be https://s3-us-west-2.amazonaws.com/datacont/test.
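Since we will need this URL pattern again when we load the data files, it may help to see it as a small helper function (hypothetical, not part of boto3) that builds the path-style URL from the region, bucket, and key:

```python
def blob_url(region, bucket, key):
    # path-style S3 URL of the form used throughout this demo
    return "https://s3-{0}.amazonaws.com/{1}/{2}".format(region, bucket, key)

print(blob_url('us-west-2', 'datacont', 'test'))
# https://s3-us-west-2.amazonaws.com/datacont/test
```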
Next we will create the DynamoDB table. Note that creating the resource does not create the table; the try-block below does that. We need to give it a key schema: the first element is hashed to select the partition that stores a row, and the second is the RowKey. The pair (PartitionKey, RowKey) is a unique identifier for a row in the table.
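The uniqueness of the composite key can be pictured with a plain Python dict keyed by the (PartitionKey, RowKey) pair. This is only a sketch of the semantics, not DynamoDB itself, and the row values are made up:

```python
rows = {}
rows[('experiment1', '1')] = {'date': '3/15/2002'}
rows[('experiment1', '2')] = {'date': '3/16/2002'}
rows[('experiment1', '1')] = {'date': '3/17/2002'}  # same key pair: overwrites the first row

print(len(rows))  # 2 -- the third put replaced the first
```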
dyndb = boto3.resource('dynamodb',
region_name='us-west-2',
aws_access_key_id='your access key',
aws_secret_access_key='your secret key'
)
try:
    table = dyndb.create_table(
        TableName='DataTable',
        KeySchema=[
            {'AttributeName': 'PartitionKey', 'KeyType': 'HASH'},
            {'AttributeName': 'RowKey', 'KeyType': 'RANGE'}
        ],
        AttributeDefinitions=[
            {'AttributeName': 'PartitionKey', 'AttributeType': 'S'},
            {'AttributeName': 'RowKey', 'AttributeType': 'S'}
        ],
        ProvisionedThroughput={
            'ReadCapacityUnits': 5,
            'WriteCapacityUnits': 5
        }
    )
except Exception:
    # if there is an exception, the table may already exist; if so, fetch it
    table = dyndb.Table("DataTable")

# wait for the table to be created
table.meta.client.get_waiter('table_exists').wait(TableName='DataTable')
print(table.item_count)
import csv
We assume that each row of the csv file looks like: (experiment-name, id-number, date, filename, comments), matching the indexing in the code below. We create a URL from the location where we know the blobs are stored, append it to the tuple above, and insert the resulting item into the table.
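Before touching AWS, you can check the row-to-item mapping on a single made-up line (the field values here are hypothetical, and io.StringIO stands in for the real file):

```python
import csv
import io

# one sample row in the same |-quoted format the reader below expects
sample = '|experiment1|,|1|,|3/15/2002|,|exp1|,|this is the comment|'
item = next(csv.reader(io.StringIO(sample), delimiter=',', quotechar='|'))

url = "https://s3-us-west-2.amazonaws.com/datacont/" + item[3]
metadata_item = {'PartitionKey': item[0], 'RowKey': item[1],
                 'description': item[4], 'date': item[2], 'url': url}
print(metadata_item)
```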
with open('c:/users/dennis/documents/experiments.csv', 'r') as csvfile:
    csvf = csv.reader(csvfile, delimiter=',', quotechar='|')
    for item in csvf:
        print(item)
        body = open('c:/users/dennis/documents/datafiles/'+item[3], 'rb')
        s3.Object('datacont', item[3]).put(Body=body)
        s3.Object('datacont', item[3]).Acl().put(ACL='public-read')
        url = "https://s3-us-west-2.amazonaws.com/datacont/"+item[3]
        metadata_item = {'PartitionKey': item[0], 'RowKey': item[1],
                         'description': item[4], 'date': item[2], 'url': url}
        try:
            table.put_item(Item=metadata_item)
        except Exception:
            print("item may already be there or another failure")
Now let's search for an item.
response = table.get_item(
Key={
'PartitionKey': 'experiment3',
'RowKey': '4'
}
)
item = response['Item']
print(item)
response
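One caution: get_item returns the row's attributes under the 'Item' key only when the key pair matched a row; otherwise 'Item' is absent and response['Item'] raises a KeyError. A guarded lookup, sketched here against mocked response dicts rather than a live table, avoids that:

```python
def extract_item(response):
    # returns None instead of raising KeyError when no row matched
    return response.get('Item')

found = {'Item': {'PartitionKey': 'experiment3', 'RowKey': '4'}}
missing = {'ResponseMetadata': {'HTTPStatusCode': 200}}

print(extract_item(found))    # the row's attributes
print(extract_item(missing))  # None
```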