Lab 4
Computing on the Cloud I: Amazon Web Services (AWS)
Dec 11, 2014
Overview of Lab 4: We will create a developer account on Amazon (aws.amazon.com). We will
work through several exercises to explore the features of the Amazon cloud and understand how
it can support your existing IT infrastructure. We will also work through a complete example of
collecting live Twitter data and performing a simple analysis on it.
Preliminaries:
1. Create an AWS developer account at http://aws.amazon.com
2. Apply your credits, if you have any, to the account.
3. Navigate the AWS console. Browse through the wide range of infrastructure services offered
by AWS.
4. Create an Amazon key pair. This is the key pair we will use for accessing
applications/services on the cloud. Call it Richs2014 or something you will remember. This is
like the key to your safe deposit box: don't lose it. Store the private key Richs2014.pem (and its
PuTTY equivalent, Richs2014.ppk) in a safe location.
5. Identify your Amazon security credentials; just be knowledgeable about where to locate them
when you need them to authenticate yourself or your application:
https://console.aws.amazon.com/iam/home?#security_credential
6. Identity and Access Management (IAM) is an important service. Read
http://docs.aws.amazon.com/IAM/latest/UserGuide/IAM_Introduction.html
Exercise 1: Launch an EC2 instance.
1. Click EC2, the first item in the Services dashboard. Study the various items, then click the
Launch Instance button.
Step 1: Choose an AMI (Amazon Machine Image) for the instance you want: this can range from a
single-CPU machine to a sophisticated cluster of powerful processors.
The instance can come from the Amazon Marketplace, the Community (contributed) AMIs, or My
AMIs (ones you may have created yourself, e.g., RichsAMI). Choose a "free-tier"-eligible
Windows AMI.
Step 2: Choose an instance type: small, large, micro, medium, etc.
Step 3: Review and launch. We are done; we have a Windows machine.
Step 4: Create a new key pair to access the instance that will be created. We will access the
instance using a public/private key pair. Download the key pair and store it.
Launch the instance. Once it is ready, you will use its public IP, the key pair we saved, and the
RDP protocol to access the instance on the cloud.
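For reference, the same launch can be scripted rather than clicked through. The following is a
minimal sketch using the boto Python library; the region, instance type, and AMI ID here are
placeholder assumptions, not values prescribed by this lab.

# Hypothetical sketch: launch an EC2 instance with boto.
import boto.ec2

conn = boto.ec2.connect_to_region("us-east-1")   # assumed region
reservation = conn.run_instances(
    "ami-xxxxxxxx",             # placeholder: substitute a free-tier AMI ID
    instance_type="t1.micro",   # a free-tier-eligible size
    key_name="Richs2014",       # the key pair from the preliminaries
)
instance = reservation.instances[0]
print("%s %s" % (instance.id, instance.state))   # e.g. "i-0abc1234 pending"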
Exercise 2: Working with S3, using the S3Fox Organizer.
1. Use your security credentials to set up and manage an account in S3Fox, and transfer
folders and files up and down.
2. You can also use the S3 console itself to upload and download files, create folders, update
access privileges on files and folders, etc.
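The same transfers can also be scripted. Here is a minimal sketch using the boto Python library,
assuming your AWS credentials are configured (e.g., in ~/.boto or environment variables) and
the bucket already exists; the bucket and file names are placeholders.

# Hypothetical sketch: upload and download an S3 object with boto.
import boto

conn = boto.connect_s3()                  # picks up stored credentials
bucket = conn.get_bucket("richs.com")     # placeholder bucket name

key = bucket.new_key("docs/hello.txt")    # S3 "folders" are just key prefixes
key.set_contents_from_filename("hello.txt")        # upload
key.get_contents_to_filename("hello-copy.txt")     # download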
Exercise 3: Hosting a static web site on Amazon AWS.
Overview: Simply load the web site components into appropriate S3 folders/directories,
configure a few parameters and a policy file, and the web site is all set to go!
Step 1:
1. When you host a website on Amazon S3, AWS assigns your website a URL based on the
name of the storage location you create in Amazon S3 to hold the website files (called an S3
bucket) and the geographical region where you created the bucket.
2. For example, if you create a bucket called richs on the east coast of the United States and
use it to host your website, the default URL will be http://richs.s3-website-us-east-1.amazonaws.com/.
3. We will not use Route 53 and CloudFront for this proof-of-concept implementation.
Step 2:
1. Open the Amazon S3 console at https://console.aws.amazon.com/s3/.
2. Create 3 buckets in S3: richs.com, www.richs.com, logs.richs.com.
3. Upload the files of your static web page into the richs.com bucket: upload index.html and
rangen.js from your lab documents.
4. Set the permissions of richs.com to allow others to view it: in the policy edit window, enter
the code given below and save.
{
  "Version": "2008-10-17",
  "Statement": [{
    "Sid": "Allow Public Access to All Objects",
    "Effect": "Allow",
    "Principal": {
      "AWS": "*"
    },
    "Action": ["s3:GetObject"],
    "Resource": ["arn:aws:s3:::richs.com/*"]
  }]
}
Step 3: Enable logging and redirection (note: for some reason this collides with richs.com).
1. In the logging window, enter logs.richs.com, and enter /root in the next box; right-click on
www.richs.com, open its properties, and redirect it to richs.com.
2. In the richs.com bucket, enable web hosting and enter index.html as the index
document. If you have an error document, you can add it in the next box.
3. Click on the endpoint address that shows up in the properties window of richs.com.
4. You should be able to see the static application.
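If you prefer to script Steps 2 and 3, a rough boto equivalent is sketched below. It assumes the
policy JSON above has been saved as policy.json, and it does not cover the logging or
www-redirect settings.

# Hypothetical sketch: enable static website hosting on a bucket with boto.
import boto

conn = boto.connect_s3()
bucket = conn.get_bucket("richs.com")

bucket.set_policy(open("policy.json").read())    # the public-read policy above
bucket.configure_website(suffix="index.html")    # same as the console setting
print(bucket.get_website_endpoint())             # the endpoint clicked in Step 3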
Exercise 4: Twitter Live Data Collection and Analytics using AWS
Overview: Collect live data from Twitter and analyze it. For this we need a Twitter developer
account and credentials. We will collect data using a CloudFormation template provided by
Amazon and store the data in S3; then we can run any analysis we like. We will do a "wordcount".
Step 1: Create a Twitter account if you do not have one. Log into Twitter and examine how you
can manually view/collect tweets about a topic of interest to you.
Step 2: In order for an application to connect directly to Twitter and automatically collect
tweets, the application (which works on your behalf) needs strong security credentials.
Log into Twitter at http://apps.twitter.com and click the Create New App button.
Step 3: Follow the on-screen instructions. For the application Name, Description, and Website,
you can specify any text; you're simply generating credentials to use with this demo, rather
than creating a real application.
Twitter displays the details page for your new application. Click the Keys and Access Tokens
tab to collect your Twitter developer credentials. You'll see a Consumer key and Consumer secret.
Make a note of these values; you'll need them later in this demo. You may want to store your
credentials in a text file in a secure location.
At the bottom of the page click Create my access token. Make a note of the Access token and
Access token secret values that appear, or add them to the text file you created in the
preceding step.
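A quick way to confirm that the four credentials work together is a short script using Tweepy,
the same library the instance will use later. This is only a sketch; paste your own values in
place of the placeholders.

# Hypothetical sketch: sanity-check Twitter credentials with Tweepy.
import tweepy

CONSUMER_KEY = "..."             # from the Keys and Access Tokens tab
CONSUMER_SECRET = "..."
ACCESS_TOKEN = "..."             # from Create my access token
ACCESS_TOKEN_SECRET = "..."

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)
print(api.verify_credentials().screen_name)   # prints your handle on success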
Step 4: Create an Amazon S3 bucket to store the data your application will collect from Twitter.
1. Open the Amazon S3 console.
2. Click Create Bucket.
3. In the Create Bucket dialog box, do the following:
a. Specify a name for your bucket, such as tweetanalytics. To meet Hadoop
requirements, your bucket name is restricted to lowercase letters, numbers,
periods (.), and hyphens (-).
b. For the region, select US Standard.
c. Click Create.
4. Select your new bucket from the All Buckets list and click Create Folder. In the text box,
specify input as the folder name, and then press Enter or click the check mark.
5. For the purposes of this demo (to ensure that all services can use the folders), make the
folders public as follows:
a. Select the input folder.
b. Click Actions and then click Make Public. When prompted, click OK.
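If you would rather script this step, the same setup takes a few lines of boto. Note that S3 has
no real folders; the input "folder" below is just an empty key with a trailing slash.

# Hypothetical sketch: create the bucket and a public input/ folder with boto.
import boto

conn = boto.connect_s3()
bucket = conn.create_bucket("tweetanalytics")   # US Standard is the default region

folder = bucket.new_key("input/")               # trailing slash marks a "folder"
folder.set_contents_from_string("")
folder.set_acl("public-read")                   # equivalent of Actions > Make Public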
Step 5: In this step, we'll use an AWS CloudFormation template to launch an instance, and then
use a command-line tool on the instance to collect the Twitter data. We'll also use a
command-line tool on the instance to store the data in the Amazon S3 bucket that we created.
Step 6: To launch the AWS CloudFormation stack
1. Open the AWS CloudFormation console.
2. Make sure that US East (N. Virginia) is selected in the region selector of the navigation
bar.
3. Click Create New Stack.
4. On the Select Template page, do the following:
a. Under Stack, in the Name box, specify a name that is easy for you to remember.
For example, my-sentiment-stack.
b. Under Template, select Specify an Amazon S3 template URL, and specify
https://s3.amazonaws.com/awsdocs/gettingstarted/latest/sentiment/sentimentGSG.template
in the text box.
c. Click Next.
5. On the Specify Parameters page, do the following:
a. In the KeyPair box, specify the name of the key pair that you created in Create a
Key Pair. Note that this key pair must be in the US East (N. Virginia) region.
b. In the TwitterConsumerKey, TwitterConsumerSecret, TwitterToken, and
TwitterTokenSecret boxes, specify your Twitter credentials. For best results, copy
and paste the Twitter credentials from the Twitter developer site or the text file
you saved them in.
c. Click Next.
6. On the Options page, click Next.
7. On the Review page, select the I acknowledge that this template might cause AWS
CloudFormation to create IAM resources check box, and then click Create to launch the
stack.
8. Your new AWS CloudFormation stack appears in the list with its status set to
CREATE_IN_PROGRESS.
Note
Stacks take several minutes to launch. To see whether this process is complete, click
Refresh. When your stack status is CREATE_COMPLETE, it's ready to use.
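For completeness, the same stack can also be created from Python with boto. The sketch below
uses the parameter names shown on the Specify Parameters page and passes the IAM
acknowledgement as a capability; the Twitter values are placeholders.

# Hypothetical sketch: launch the sentiment stack with boto instead of the console.
import boto.cloudformation

conn = boto.cloudformation.connect_to_region("us-east-1")
stack_id = conn.create_stack(
    "my-sentiment-stack",
    template_url="https://s3.amazonaws.com/awsdocs/gettingstarted/latest/"
                 "sentiment/sentimentGSG.template",
    parameters=[("KeyPair", "Richs2014"),
                ("TwitterConsumerKey", "..."),
                ("TwitterConsumerSecret", "..."),
                ("TwitterToken", "..."),
                ("TwitterTokenSecret", "...")],
    capabilities=["CAPABILITY_IAM"],   # the check box on the Review page
)
print(stack_id)   # then watch the console until CREATE_COMPLETE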
Step 7: Collect Tweets Using the Instance
The instance has been preconfigured with Tweepy, an open-source package for use with the
Twitter API. Python scripts for running Tweepy appear in the sentiment directory.
To collect tweets using your AWS CloudFormation stack
1. Select the Outputs tab. Copy the DNS name of the Amazon EC2 instance that AWS
CloudFormation created from the EC2DNS key.
2. Connect to the instance using SSH. Specify the name of your key pair and the user name
ec2-user. For more information, see Connect to Your Linux Instance.
3. In the terminal window, run the following command:
$ cd sentiment
4. To collect tweets, run the following command, where term1 is your search term. Note
that the collector script is not case sensitive. To use a multi-word term, enclose it in
quotation marks.
$ python collector.py term1
Examples:
$ python collector.py ebola
$ python collector.py "ebola"
5. Press Enter to run the collector script. Your terminal window displays the following
message:
Collecting tweets. Please wait.
When the script has finished running, your terminal window displays the following message:
Finished collecting tweets.
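The collector.py used above is supplied by the stack, and we have not reproduced it here. As a
rough idea of what such a script does, a minimal Tweepy-based collector might look like the
following; the tweet count, file-name format, and credential placeholders are our assumptions,
and older Tweepy versions expose the search call as api.search (newer ones renamed it
api.search_tweets).

# Hypothetical sketch of a collector; the stack's own collector.py may differ.
import io
import sys
import time
import tweepy

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")   # placeholders
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth)

term = sys.argv[1]                                      # search term, e.g. ebola
outfile = "tweets.%s.txt" % time.strftime("%b%d-%H%M")  # e.g. tweets.Dec11-1227.txt
print("Collecting tweets. Please wait.")
with io.open(outfile, "w", encoding="utf-8") as f:
    for tweet in tweepy.Cursor(api.search, q=term).items(500):   # assumed count
        f.write(u"%s\t%s\n" % (tweet.id, tweet.text.replace(u"\n", u" ")))
print("Finished collecting tweets.")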
Step 8: Store the Tweets in Amazon S3
Your sentiment analysis stack has been preconfigured with s3cmd, a command-line tool for
Amazon S3. You'll use s3cmd to store your tweets in the bucket that you created in Create an
Amazon S3 Bucket.
To store the collected tweets in your bucket
1. In the SSH window, run the following command. (The current directory should still be
sentiment. If it's not, use cd to navigate to the sentiment directory.)
$ ls
You should see a file named tweets.date-time.txt, where date and time reflect when the
script was run. This file contains the ID numbers and full text of the tweets that matched
your search terms.
2. To copy the Twitter data to Amazon S3, run the following command, where tweet-file is
the file you identified in the previous step and your-bucket is the name of your bucket.
Important: Be sure to include the trailing slash, to indicate that input is a folder.
Otherwise, Amazon S3 will create an object called input in your base S3 bucket.
$ s3cmd put tweet-file s3://your-bucket/input/
For example:
$ s3cmd put tweets.Dec11-1227.txt s3://tweetanalytics/input/
3. To verify that the file was uploaded to Amazon S3, run the following command:
$ s3cmd ls s3://tweetanalytics/input/
You can also use the Amazon S3 console to view the contents of your bucket and
folders.
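Besides s3cmd and the console, you can verify the upload from any machine with a short boto
script; the bucket name below matches the one suggested in Step 4.

# Hypothetical sketch: list the uploaded tweet files with boto.
import boto

conn = boto.connect_s3()
bucket = conn.get_bucket("tweetanalytics")
for key in bucket.list(prefix="input/"):
    print("%s\t%d bytes" % (key.name, key.size))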
Step 9: Run analytics on the data. We will run "wordcount" as the analytics. We can also run
more sophisticated Bayesian analysis on the data once it is processed with Natural Language
Processing (NLP) tools.
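As a point of reference, a word count over the collected file can be run locally in a few lines
of Python. This sketch assumes the one-tweet-per-line, id<TAB>text format produced by the
collector sketch above.

# Hypothetical sketch: a simple local word count over a collected tweet file.
import collections
import io
import re
import sys

counts = collections.Counter()
with io.open(sys.argv[1], encoding="utf-8") as f:    # e.g. tweets.Dec11-1227.txt
    for line in f:
        text = line.split("\t", 1)[-1]               # drop the tweet ID column
        counts.update(re.findall(r"[a-z']+", text.lower()))

for word, n in counts.most_common(20):               # the 20 most frequent words
    print("%s %d" % (word, n))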
Step 10: Download and examine the results. Understand the sentiments about the topic.
Exercise 5: Billing calculator.
1. It is located at http://calculator.s3.amazonaws.com/index.html
2. This calculator itself is a web application hosted on S3.
3. We will study the various features of the "simple monthly calculator" service.
4. On the right panel you will see a lot of sample applications.
5. On the left panel you will see the various services.
6. You can configure your system and estimate the monthly cost!
Exercise 6: Public data sets available on AWS.
1. Amazon provides access to a huge variety of data sets. See
http://aws.amazon.com/datasets/
2. You can browse by category, look at sample applications, etc.
Exercise 7: Clean up: shut down, delete, and close down any unwanted services and
resources. Otherwise you will accrue charges; though these may be a small amount per day, if
you forget them they can add up to a lot of money.