Lab 4
Computing on the Cloud I: Amazon Web Services (AWS)
Dec 11, 2014
Overview of Lab 4: We will create a developer account on Amazon (aws.amazon.com). We will
work through several exercises to explore the features of the Amazon cloud and understand how
it can support your existing IT infrastructure. We will also work through a complete example of
collecting live Twitter data and performing a simple analysis on it.
Preliminaries:
1. Create an AWS developer account at http://aws.amazon.com
2. Apply your credits, if you have any, to the account.
3. Navigate the AWS console. Browse through the wide range of infrastructure services offered
by AWS.
4. Create an Amazon key pair. This is the key pair we will use for accessing
applications/services on the cloud. Call it Richs2014 or something you will remember. This is
like the key to your safe deposit box: don't lose it. Store the private key Richs2014.pem (and its
PuTTY equivalent, Richs2014.ppk) in a safe location.
5. Identify your Amazon security credentials; just be knowledgeable about where to locate them
when you need them to authenticate yourself or your application:
https://console.aws.amazon.com/iam/home?#security_credential
6. Identity and Access Management (IAM) is an important service. Read
http://docs.aws.amazon.com/IAM/latest/UserGuide/IAM_Introduction.html
Exercise 1: Launch an EC2 instance.
1. Click EC2, the first item in the Services dashboard. Study the various items, then click the
Launch Instance button.
Step 1: Choose an AMI (Amazon Machine Image) for the instance you want: this can range from a
single-CPU machine to a sophisticated cluster of powerful processors.
The instance can come from the Amazon Marketplace, the Community (contributed) AMIs, or My
AMIs (ones you may have created yourself, e.g., RichsAMI). Choose a "free-tier"-eligible
Windows AMI.
Step 2: Choose an instance type: small, large, micro, medium, etc.
Step 3: Review and launch. We are done; we have a Windows machine.
Step 4: Create a new key pair to access the instance that will be created. We will access the
instance using a public/private key pair. Download the key pair and store it.
Launch the instance. Once it is ready, you will use its public IP, the key pair we saved, and the
RDP protocol to access the instance on the cloud.
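For reference, the same launch can be scripted rather than clicked through. The following is a
minimal sketch using the boto Python library; the region, instance type, and AMI ID here are
placeholder assumptions, not values prescribed by this lab.

# Hypothetical sketch: launch an EC2 instance with boto.
import boto.ec2

conn = boto.ec2.connect_to_region("us-east-1")   # assumed region
reservation = conn.run_instances(
    "ami-xxxxxxxx",             # placeholder: substitute a free-tier AMI ID
    instance_type="t1.micro",   # a free-tier-eligible size
    key_name="Richs2014",       # the key pair from the preliminaries
)
instance = reservation.instances[0]
print("%s %s" % (instance.id, instance.state))   # e.g. "i-0abc1234 pending"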
Exercise 2: Working with S3, using the S3Fox Organizer.
1. Use your security credentials to set up and manage an account in S3Fox, and transfer
folders and files up and down.
2. You can also use the S3 console itself to upload and download files, create folders, update
access privileges on files and folders, etc.
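The same transfers can also be scripted. Here is a minimal sketch using the boto Python library,
assuming your AWS credentials are configured (e.g., in ~/.boto or environment variables) and
the bucket already exists; the bucket and file names are placeholders.

# Hypothetical sketch: upload and download an S3 object with boto.
import boto

conn = boto.connect_s3()                  # picks up stored credentials
bucket = conn.get_bucket("richs.com")     # placeholder bucket name

key = bucket.new_key("docs/hello.txt")    # S3 "folders" are just key prefixes
key.set_contents_from_filename("hello.txt")        # upload
key.get_contents_to_filename("hello-copy.txt")     # download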
Exercise 3: Hosting a static web site on Amazon AWS.
Overview: Simply load the web site components into appropriate S3 folders/directories,
configure a few parameters and a policy file, and the web site is all set to go!
Step 1:
1. When you host a website on Amazon S3, AWS assigns your website a URL based on the
name of the storage location you create in Amazon S3 to hold the website files (called an S3
bucket) and the geographical region where you created the bucket.
2. For example, if you create a bucket called richs on the east coast of the United States and
use it to host your website, the default URL will be http://richs.s3-website-us-east-1.amazonaws.com/.
3. We will not use Route 53 and CloudFront for this proof-of-concept implementation.
Step 2:
1. Open the Amazon S3 console at https://console.aws.amazon.com/s3/.
2. Create 3 buckets in S3: richs.com, www.richs.com, logs.richs.com.
3. Upload the files of your static web page into the richs.com bucket: upload index.html and
rangen.js from your lab documents.
4. Set the permissions of richs.com to allow others to view it: in the policy edit window, enter
the code given below and save.
{
  "Version": "2008-10-17",
  "Statement": [{
    "Sid": "Allow Public Access to All Objects",
    "Effect": "Allow",
    "Principal": {
      "AWS": "*"
    },
    "Action": ["s3:GetObject"],
    "Resource": ["arn:aws:s3:::richs.com/*"]
  }]
}
Step 3: Enable logging and redirection (note: for some reason this collides with richs.com).
1. In the logging window, enter logs.richs.com, and enter /root in the next box; right-click on
www.richs.com, open its properties, and redirect it to richs.com.
2. In the richs.com bucket, enable web hosting and enter index.html as the index
document. If you have an error document, you can add it in the next box.
3. Click on the endpoint address that shows up in the properties window of richs.com.
4. You should be able to see the static application.
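If you prefer to script Steps 2 and 3, a rough boto equivalent is sketched below. It assumes the
policy JSON above has been saved as policy.json, and it does not cover the logging or
www-redirect settings.

# Hypothetical sketch: enable static website hosting on a bucket with boto.
import boto

conn = boto.connect_s3()
bucket = conn.get_bucket("richs.com")

bucket.set_policy(open("policy.json").read())    # the public-read policy above
bucket.configure_website(suffix="index.html")    # same as the console setting
print(bucket.get_website_endpoint())             # the endpoint clicked in Step 3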
Exercise 4: Twitter Live Data Collection and Analytics using AWS
Overview: Collect live data from Twitter and analyze it. For this we need a Twitter developer
account and credentials. We will collect data using a CloudFormation template provided by
Amazon and store the data in S3; then we can run any analysis we like. We will do a "wordcount".
Step 1: Create a Twitter account if you do not have one. Log into Twitter and examine how you
can manually view/collect tweets about a topic of interest to you.
Step 2: In order for an application to connect directly to Twitter and automatically collect
tweets, the application (which works on your behalf) needs strong security credentials.
Log into Twitter at http://apps.twitter.com and click the Create New App button.
Step 3: Follow the on-screen instructions. For the application Name, Description, and Website,
you can specify any text; you're simply generating credentials to use with this demo, rather
than creating a real application.
Twitter displays the details page for your new application. Click the Keys and Access Tokens
tab to collect your Twitter developer credentials. You'll see a Consumer key and Consumer secret.
Make a note of these values; you'll need them later in this demo. You may want to store your
credentials in a text file in a secure location.
At the bottom of the page click Create my access token. Make a note of the Access token and
Access token secret values that appear, or add them to the text file you created in the
preceding step.
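A quick way to confirm that the four credentials work together is a short script using Tweepy,
the same library the instance will use later. This is only a sketch; paste your own values in
place of the placeholders.

# Hypothetical sketch: sanity-check Twitter credentials with Tweepy.
import tweepy

CONSUMER_KEY = "..."             # from the Keys and Access Tokens tab
CONSUMER_SECRET = "..."
ACCESS_TOKEN = "..."             # from Create my access token
ACCESS_TOKEN_SECRET = "..."

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)
print(api.verify_credentials().screen_name)   # prints your handle on success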
Step 4: Create an Amazon S3 bucket to store the data your application will collect from Twitter.
1. Open the Amazon S3 console.
2. Click Create Bucket.
3. In the Create Bucket dialog box, do the following:
a. Specify a name for your bucket, such as tweetanalytics. To meet Hadoop
requirements, your bucket name is restricted to lowercase letters, numbers,
periods (.), and hyphens (-).
b. For the region, select US Standard.
c. Click Create.
4. Select your new bucket from the All Buckets list and click Create Folder. In the text box,
specify input as the folder name, and then press Enter or click the check mark.
5. For the purposes of this demo (to ensure that all services can use the folders), make the
folders public as follows:
a. Select the input folder.
b. Click Actions and then click Make Public. When prompted, click OK.
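If you would rather script this step, the same setup takes a few lines of boto. Note that S3 has
no real folders; the input "folder" below is just an empty key with a trailing slash.

# Hypothetical sketch: create the bucket and a public input/ folder with boto.
import boto

conn = boto.connect_s3()
bucket = conn.create_bucket("tweetanalytics")   # US Standard is the default region

folder = bucket.new_key("input/")               # trailing slash marks a "folder"
folder.set_contents_from_string("")
folder.set_acl("public-read")                   # equivalent of Actions > Make Public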
Step 5: In this step, we'll use an AWS CloudFormation template to launch an instance, and then
use a command-line tool on the instance to collect the Twitter data. We'll also use a
command-line tool on the instance to store the data in the Amazon S3 bucket that we created.
Step 6: To launch the AWS CloudFormation stack
1. Open the AWS CloudFormation console.
2. Make sure that US East (N. Virginia) is selected in the region selector of the navigation
bar.
3. Click Create New Stack.
4. On the Select Template page, do the following:
a. Under Stack, in the Name box, specify a name that is easy for you to remember.
For example, my-sentiment-stack.
b. Under Template, select Specify an Amazon S3 template URL, and specify
https://s3.amazonaws.com/awsdocs/gettingstarted/latest/sentiment/sentimentGSG.template
in the text box.
c. Click Next.
5. On the Specify Parameters page, do the following:
a. In the KeyPair box, specify the name of the key pair that you created in Create a
Key Pair. Note that this key pair must be in the US East (N. Virginia) region.
b. In the TwitterConsumerKey, TwitterConsumerSecret, TwitterToken, and
TwitterTokenSecret boxes, specify your Twitter credentials. For best results, copy
and paste the Twitter credentials from the Twitter developer site or the text file
you saved them in.
c. Click Next.
6. On the Options page, click Next.
7. On the Review page, select the I acknowledge that this template might cause AWS
CloudFormation to create IAM resources check box, and then click Create to launch the
stack.
8. Your new AWS CloudFormation stack appears in the list with its status set to
CREATE_IN_PROGRESS.
Note
Stacks take several minutes to launch. To see whether this process is complete, click
Refresh. When your stack status is CREATE_COMPLETE, it's ready to use.
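For completeness, the same stack can also be created from Python with boto. The sketch below
uses the parameter names shown on the Specify Parameters page and passes the IAM
acknowledgement as a capability; the Twitter values are placeholders.

# Hypothetical sketch: launch the sentiment stack with boto instead of the console.
import boto.cloudformation

conn = boto.cloudformation.connect_to_region("us-east-1")
stack_id = conn.create_stack(
    "my-sentiment-stack",
    template_url="https://s3.amazonaws.com/awsdocs/gettingstarted/latest/"
                 "sentiment/sentimentGSG.template",
    parameters=[("KeyPair", "Richs2014"),
                ("TwitterConsumerKey", "..."),
                ("TwitterConsumerSecret", "..."),
                ("TwitterToken", "..."),
                ("TwitterTokenSecret", "...")],
    capabilities=["CAPABILITY_IAM"],   # the check box on the Review page
)
print(stack_id)   # then watch the console until CREATE_COMPLETE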
Step 7: Collect Tweets Using the Instance
The instance has been preconfigured with Tweepy, an open-source package for use with the
Twitter API. Python scripts for running Tweepy appear in the sentiment directory.
To collect tweets using your AWS CloudFormation stack
1. Select the Outputs tab. Copy the DNS name of the Amazon EC2 instance that AWS
CloudFormation created from the EC2DNS key.
2. Connect to the instance using SSH. Specify the name of your key pair and the user name
ec2-user. For more information, see Connect to Your Linux Instance.
3. In the terminal window, run the following command:
$ cd sentiment
4. To collect tweets, run the following command, where term1 is your search term. Note
that the collector script is not case sensitive. To use a multi-word term, enclose it in
quotation marks.
$ python collector.py term1
Examples:
$ python collector.py ebola
$ python collector.py "ebola"
5. Press Enter to run the collector script. Your terminal window displays the following
message:
Collecting tweets. Please wait.
When the script has finished running, your terminal window displays the following message:
Finished collecting tweets.
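The collector.py used above is supplied by the stack, and we have not reproduced it here. As a
rough idea of what such a script does, a minimal Tweepy-based collector might look like the
following; the tweet count, file-name format, and credential placeholders are our assumptions,
and older Tweepy versions expose the search call as api.search (newer ones renamed it
api.search_tweets).

# Hypothetical sketch of a collector; the stack's own collector.py may differ.
import io
import sys
import time
import tweepy

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")   # placeholders
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth)

term = sys.argv[1]                                      # search term, e.g. ebola
outfile = "tweets.%s.txt" % time.strftime("%b%d-%H%M")  # e.g. tweets.Dec11-1227.txt
print("Collecting tweets. Please wait.")
with io.open(outfile, "w", encoding="utf-8") as f:
    for tweet in tweepy.Cursor(api.search, q=term).items(500):   # assumed count
        f.write(u"%s\t%s\n" % (tweet.id, tweet.text.replace(u"\n", u" ")))
print("Finished collecting tweets.")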
Step 8: Store the Tweets in Amazon S3
Your sentiment analysis stack has been preconfigured with s3cmd, a command-line tool for
Amazon S3. You'll use s3cmd to store your tweets in the bucket that you created in Create an
Amazon S3 Bucket.
To store the collected tweets in your bucket
1. In the SSH window, run the following command. (The current directory should still be
sentiment. If it's not, use cd to navigate to the sentiment directory.)
$ ls
You should see a file named tweets.date-time.txt, where date and time reflect when the
script was run. This file contains the ID numbers and full text of the tweets that matched
your search terms.
2. To copy the Twitter data to Amazon S3, run the following command, where tweet-file is
the file you identified in the previous step and your-bucket is the name of your bucket.
Important: Be sure to include the trailing slash, to indicate that input is a folder.
Otherwise, Amazon S3 will create an object called input in your base S3 bucket.
$ s3cmd put tweet-file s3://your-bucket/input/
For example:
$ s3cmd put tweets.Dec11-1227.txt s3://tweetanalytics/input/
3. To verify that the file was uploaded to Amazon S3, run the following command:
$ s3cmd ls s3://tweetanalytics/input/
You can also use the Amazon S3 console to view the contents of your bucket and
folders.
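Besides s3cmd and the console, you can verify the upload from any machine with a short boto
script; the bucket name below matches the one suggested in Step 4.

# Hypothetical sketch: list the uploaded tweet files with boto.
import boto

conn = boto.connect_s3()
bucket = conn.get_bucket("tweetanalytics")
for key in bucket.list(prefix="input/"):
    print("%s\t%d bytes" % (key.name, key.size))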
Step 9: Run analytics on the data. We will run "wordcount" as the analytics. We can also run
more sophisticated Bayesian analysis on the data once it is processed with Natural Language
Processing (NLP) tools.
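As a point of reference, a word count over the collected file can be run locally in a few lines
of Python. This sketch assumes the one-tweet-per-line, id<TAB>text format produced by the
collector sketch above.

# Hypothetical sketch: a simple local word count over a collected tweet file.
import collections
import io
import re
import sys

counts = collections.Counter()
with io.open(sys.argv[1], encoding="utf-8") as f:    # e.g. tweets.Dec11-1227.txt
    for line in f:
        text = line.split("\t", 1)[-1]               # drop the tweet ID column
        counts.update(re.findall(r"[a-z']+", text.lower()))

for word, n in counts.most_common(20):               # the 20 most frequent words
    print("%s %d" % (word, n))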
Step 10: Download and examine the results. Understand the sentiments about the topic.
Exercise 5: Billing calculator.
1. It is located at http://calculator.s3.amazonaws.com/index.html
2. This calculator itself is a web application hosted on S3.
3. We will study the various features of the "simple monthly calculator" service.
4. On the right panel you will see a lot of sample applications.
5. On the left panel you will see the various services.
6. You can configure your system and estimate the monthly cost!
Exercise 6: Public data sets available on AWS.
1. Amazon provides access to a huge variety of data sets. See
http://aws.amazon.com/datasets/
2. You can browse by category, look at sample applications, etc.
Exercise 7: Clean up: shut down, delete, and close down any unwanted services and
resources. Otherwise you will accrue charges; though these may be a small amount per day, if
you forget them they can add up to a lot of money.