Guide to Creating an S3 Bucket and Using It with SageMaker
Table of Contents
Introduction
Step 1: Create an S3 Bucket
Step 2: Upload Dataset to S3 Bucket
Step 3: Link S3 Dataset to SageMaker Notebook
Step 4: Request Permission (If Necessary)
References
Introduction
This document outlines the process of creating an S3 bucket on AWS, uploading data to the
bucket, and linking it to an Amazon SageMaker notebook for further data analysis and
processing. It includes a detailed step-by-step guide and relevant code snippets.
Additionally, information on IAM roles and permissions is provided to ensure smooth
integration between the S3 bucket and SageMaker.
Step 1: Create an S3 Bucket
To begin, log in to the AWS Management Console and create an S3 bucket where the
dataset will be stored. Follow these steps:
1. Log in to AWS Management Console: Navigate to the AWS Management Console and log
in.
2. Access S3 Service: In the console, search for 'S3' and select the service.
3. Create a New S3 Bucket:
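- Provide a globally unique name for your bucket (e.g., mybucket, mybucket1name). The name
must be unique across all AWS accounts, so simple names are often already taken. Names must
be 3 to 63 characters long and use only lowercase letters, numbers, hyphens, and periods.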
- Select the AWS Region closest to your location, ideally the same Region as your
SageMaker notebook.
- Configure the bucket settings as needed (versioning, logging, etc.).
4. Set Permissions: Keep 'Block all public access' enabled unless the data must be publicly
readable.
5. Create the Bucket: Click 'Create bucket' to finalize the creation of the bucket.
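The console steps above can also be scripted with boto3. The following is a minimal sketch, not part of the original walkthrough. The helper names (is_valid_bucket_name, create_bucket_kwargs, create_bucket), the bucket name, and the Region in the usage comment are illustrative assumptions, and the name check covers only the basic naming rules. It also assumes your credentials are allowed to create buckets.

```python
import re


def is_valid_bucket_name(name: str) -> bool:
    """Check the basic S3 naming rules only (hypothetical helper).

    3-63 characters; lowercase letters, digits, periods, and hyphens;
    must start and end with a letter or digit.
    """
    return re.fullmatch(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]", name) is not None


def create_bucket_kwargs(name: str, region: str) -> dict:
    """Build the keyword arguments for the boto3 S3 client's create_bucket call."""
    if not is_valid_bucket_name(name):
        raise ValueError(f"invalid bucket name: {name!r}")
    kwargs = {"Bucket": name}
    # us-east-1 is the default Region and must not be sent as a LocationConstraint.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_bucket(s3_client, name: str, region: str):
    """Create the bucket with an existing boto3 S3 client."""
    return s3_client.create_bucket(**create_bucket_kwargs(name, region))


# Usage (requires AWS credentials; the name and Region are placeholders):
# import boto3
# s3 = boto3.client("s3", region_name="eu-west-1")
# create_bucket(s3, "my-example-dataset-bucket", "eu-west-1")
```

The client is passed in as an argument so that the same function works both in a SageMaker notebook and on a local machine, where credentials are configured differently.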
Step 2: Upload Dataset to S3 Bucket
After creating the S3 bucket, you can upload your dataset. Follow these steps:
1. Open Your S3 Bucket: Navigate to the bucket you created.
2. Upload Data:
- Click on 'Upload'.
- Click 'Add files' to select the dataset you want to upload.
- Review the settings, then confirm with the 'Upload' button at the bottom of the page and
wait for the upload to finish.
3. Verify the Upload: Ensure the dataset is listed in the bucket's contents.
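The upload and verification steps can also be done from Python. This is a hedged sketch rather than part of the original procedure. The helper names (object_key_for, upload_dataset), the file name, and the prefix are assumptions chosen for illustration. It relies on the boto3 S3 client methods upload_file and head_object.

```python
import os


def object_key_for(local_path: str, prefix: str = "") -> str:
    """Derive the S3 object key from the local file name and an optional prefix."""
    filename = os.path.basename(local_path)
    clean_prefix = prefix.strip("/")
    return f"{clean_prefix}/{filename}" if clean_prefix else filename


def upload_dataset(s3_client, local_path: str, bucket: str, prefix: str = ""):
    """Upload a local file, then confirm that it exists in the bucket.

    Returns the object key and the size in bytes reported by S3.
    """
    key = object_key_for(local_path, prefix)
    s3_client.upload_file(local_path, bucket, key)
    # head_object raises an error if the object is not in the bucket.
    size = s3_client.head_object(Bucket=bucket, Key=key)["ContentLength"]
    return key, size


# Usage (requires AWS credentials; the file and bucket names are placeholders):
# import boto3
# s3 = boto3.client("s3")
# key, size = upload_dataset(s3, "data/ratings.csv", "my-example-dataset-bucket", "datasets")
# print(f"Uploaded {key} ({size} bytes)")
```

Comparing the returned size with the size of the local file is a quick way to confirm that the upload completed.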
Step 3: Link S3 Dataset to SageMaker Notebook
Once the dataset is uploaded to the S3 bucket, you need to link it to your SageMaker
notebook for analysis. Follow the steps below:
1. Create a SageMaker Notebook: Follow this tutorial to create and set up your
SageMaker notebook:
https://www.youtube.com/watch?v=J5l7P593beg
2. Configure SageMaker to Access S3: In your notebook, import the 'boto3' library to access
AWS services, including S3. Use the following Python code to connect to the S3 bucket and
list the files:
```python
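import boto3

# Create an S3 client; in a SageMaker notebook it uses the notebook's IAM role.
conn_s3 = boto3.client('s3')
bucket = 'myrecommendation1system'  # replace with your own bucket name

# 'Contents' is absent when the bucket is empty, so fall back to an empty list.
content = conn_s3.list_objects_v2(Bucket=bucket).get('Contents', [])
content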
```
This code returns the list of objects stored in your S3 bucket, which confirms that the
notebook can reach the bucket. Each entry is a dictionary with fields such as 'Key' (the
object name) and 'Size' (in bytes).
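A single listing call returns at most 1,000 objects, and the dataset usually has to be copied to the notebook before it can be analyzed. The sketch below shows one way to handle both, using the boto3 paginator for list_objects_v2 and the download_file method. The helper names (list_all_keys, download_dataset) and the object key 'ratings.csv' are assumptions for illustration only.

```python
def list_all_keys(s3_client, bucket: str, prefix: str = "") -> list:
    """Return every object key under the prefix, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is missing from a page when nothing matches.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys


def download_dataset(s3_client, bucket: str, key: str, local_path: str) -> str:
    """Copy one object from the bucket to the notebook's local disk."""
    s3_client.download_file(bucket, key, local_path)
    return local_path


# Usage in a SageMaker notebook (the object key is a placeholder):
# import boto3
# s3 = boto3.client("s3")
# print(list_all_keys(s3, "myrecommendation1system"))
# download_dataset(s3, "myrecommendation1system", "ratings.csv", "ratings.csv")
```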
Step 4: Request Permission (If Necessary)
If you encounter permission issues while accessing the S3 bucket or SageMaker, you may
need to request appropriate IAM roles and permissions. Contact your AWS administrator to
ensure the following permissions are granted to the IAM role associated with your
SageMaker notebook:
1. s3:ListBucket
2. s3:GetObject
3. s3:PutObject
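These permissions allow your notebook to list the bucket's contents (s3:ListBucket), read
objects (s3:GetObject), and write objects (s3:PutObject). Note that s3:ListBucket applies to
the bucket itself, while s3:GetObject and s3:PutObject apply to the objects inside it.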
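When you ask your administrator for access, it can help to send a concrete policy. The sketch below builds one possible IAM policy document that grants the three permissions above. The function name (notebook_s3_policy) and the statement IDs are assumptions, and the ARNs assume the standard 'aws' partition. Your administrator may prefer a narrower or differently structured policy.

```python
import json


def notebook_s3_policy(bucket: str) -> dict:
    """Build a policy document granting list, read, and write access to one bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"  # assumes the standard 'aws' partition
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, so the resource is the bucket.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Reading and writing are object-level actions, hence the '/*'.
                "Sid": "ReadWriteObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


# Print the policy as JSON so it can be pasted into a request or the IAM console.
print(json.dumps(notebook_s3_policy("myrecommendation1system"), indent=2))
```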
References
1. How to Upload Dataset to S3 and Link It to SageMaker: Watch this tutorial to
understand how to upload your dataset to the S3 bucket and link it to the notebook:
https://www.youtube.com/watch?v=ab-rU8MdbI0
2. How to Create and Setup SageMaker Notebook: This tutorial explains how to
create and set up SageMaker and notebooks:
https://www.youtube.com/watch?v=J5l7P593beg