This notebook works through the demo in section 3.2 of the book. If you do not already have boto3, the Amazon Python SDK, installed, uncomment and run the following line.
#!pip install boto3
import boto3
Follow the instructions on the IAM portal to get an access key and a secret key, then substitute them below.
s3 = boto3.resource('s3',
aws_access_key_id='your access key',
aws_secret_access_key='your secret key'
)
Next let's test this by creating our bucket "datacont" in the Oregon data center. The CreateBucketConfiguration location is optional, but the location will be encoded into our URLs later. The creation call will throw an exception if the bucket already exists.
try:
    s3.create_bucket(Bucket='datacont',
                     CreateBucketConfiguration={'LocationConstraint': 'us-west-2'})
except Exception:
    print("this may already exist")
Now we will make this bucket publicly readable. We will also need to make each blob in the bucket publicly readable.
bucket = s3.Bucket("datacont")
bucket.Acl().put(ACL='public-read')
Now let's try to upload a file into the bucket.
# upload a new object into the bucket
body = open('path-to-a-file/exp1', 'rb')
s3.Object('datacont', 'test').put(Body=body)
s3.Object('datacont', 'test').Acl().put(ACL='public-read')
The URL for the test item in the bucket should be https://s3-us-west-2.amazonaws.com/datacont/test.
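Since we will need this URL pattern again when we load the data files, it may help to see it as a small helper function (hypothetical, not part of boto3) that builds the path-style URL from the region, bucket, and key:

```python
def blob_url(region, bucket, key):
    # path-style S3 URL of the form used throughout this demo
    return "https://s3-{0}.amazonaws.com/{1}/{2}".format(region, bucket, key)

print(blob_url('us-west-2', 'datacont', 'test'))
# https://s3-us-west-2.amazonaws.com/datacont/test
```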
Next we will create the DynamoDB table. Note that creating the resource does not create the table; the try-block below does that. We need to give it a key schema: the first element is hashed to select the partition that stores a row, and the second is the RowKey. The pair (PartitionKey, RowKey) is a unique identifier for a row in the table.
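The uniqueness of the composite key can be pictured with a plain Python dict keyed by the (PartitionKey, RowKey) pair. This is only a sketch of the semantics, not DynamoDB itself, and the row values are made up:

```python
rows = {}
rows[('experiment1', '1')] = {'date': '3/15/2002'}
rows[('experiment1', '2')] = {'date': '3/16/2002'}
rows[('experiment1', '1')] = {'date': '3/17/2002'}  # same key pair: overwrites the first row

print(len(rows))  # 2 -- the third put replaced the first
```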
dyndb = boto3.resource('dynamodb',
region_name='us-west-2',
aws_access_key_id='your access key',
aws_secret_access_key='your secret key'
)
try:
    table = dyndb.create_table(
        TableName='DataTable',
        KeySchema=[
            {'AttributeName': 'PartitionKey', 'KeyType': 'HASH'},
            {'AttributeName': 'RowKey', 'KeyType': 'RANGE'}
        ],
        AttributeDefinitions=[
            {'AttributeName': 'PartitionKey', 'AttributeType': 'S'},
            {'AttributeName': 'RowKey', 'AttributeType': 'S'}
        ],
        ProvisionedThroughput={
            'ReadCapacityUnits': 5,
            'WriteCapacityUnits': 5
        }
    )
except Exception:
    # if there is an exception, the table may already exist; if so, fetch it
    table = dyndb.Table("DataTable")

# wait for the table to be created
table.meta.client.get_waiter('table_exists').wait(TableName='DataTable')
print(table.item_count)
import csv
We assume that each row of the csv file looks like: (experiment-name, id-number, date, filename, comments), matching the indexing in the code below. We create a URL from the location where we know the blobs are stored, append it to the tuple above, and insert the resulting item into the table.
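Before touching AWS, you can check the row-to-item mapping on a single made-up line (the field values here are hypothetical, and io.StringIO stands in for the real file):

```python
import csv
import io

# one sample row in the same |-quoted format the reader below expects
sample = '|experiment1|,|1|,|3/15/2002|,|exp1|,|this is the comment|'
item = next(csv.reader(io.StringIO(sample), delimiter=',', quotechar='|'))

url = "https://s3-us-west-2.amazonaws.com/datacont/" + item[3]
metadata_item = {'PartitionKey': item[0], 'RowKey': item[1],
                 'description': item[4], 'date': item[2], 'url': url}
print(metadata_item)
```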
with open('c:/users/dennis/documents/experiments.csv', 'r') as csvfile:
    csvf = csv.reader(csvfile, delimiter=',', quotechar='|')
    for item in csvf:
        print(item)
        body = open('c:/users/dennis/documents/datafiles/'+item[3], 'rb')
        s3.Object('datacont', item[3]).put(Body=body)
        s3.Object('datacont', item[3]).Acl().put(ACL='public-read')
        url = "https://s3-us-west-2.amazonaws.com/datacont/"+item[3]
        metadata_item = {'PartitionKey': item[0], 'RowKey': item[1],
                         'description': item[4], 'date': item[2], 'url': url}
        try:
            table.put_item(Item=metadata_item)
        except Exception:
            print("item may already be there or another failure")
Now let's search for an item.
response = table.get_item(
Key={
'PartitionKey': 'experiment3',
'RowKey': '4'
}
)
item = response['Item']
print(item)
response
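One caution: get_item returns the row's attributes under the 'Item' key only when the key pair matched a row; otherwise 'Item' is absent and response['Item'] raises a KeyError. A guarded lookup, sketched here against mocked response dicts rather than a live table, avoids that:

```python
def extract_item(response):
    # returns None instead of raising KeyError when no row matched
    return response.get('Item')

found = {'Item': {'PartitionKey': 'experiment3', 'RowKey': '4'}}
missing = {'ResponseMetadata': {'HTTPStatusCode': 200}}

print(extract_item(found))    # the row's attributes
print(extract_item(missing))  # None
```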