Access to MIDAS AWS Twitter Data

This document describes how MIDAS researchers can request access to the 1% API sample Twitter
data archive that is being maintained on AWS EC2 cloud resources. The first step is to request an
account on the AWS Twitter data analysis compute instance by sending an email to the system
maintainer, Doug Roberts, droberts@rti.org. Access control to this compute instance is maintained via
ssh key pairs.
The following sections of this document describe how to generate a key pair, and how to log on to
the instance. Once the requestor’s public ssh key has been sent to the systems maintainer, a user
account for the requestor will be created, and instructions will be sent for how to access the data and
the keyword search code.
It should be noted that this AWS instance is a Linux host.
SSH Login to the MIDAS-AWS Twitter instance
SSH key-based login authenticates you without sending a password over the network, which removes the
risk of the password being captured in transit. Combined with proper configuration on the server, this
reduces the chances of an attacker getting in. It is the authentication method used for the
MIDAS-AWS Twitter instance. If you are familiar with SSH keys, you can skip the next section.
SSH Keys Basics: (https://wiki.archlinux.org/index.php/SSH_Keys;
https://help.ubuntu.com/community/SSH/OpenSSH/Keys)

SSH keys serve as a means of identifying yourself to an SSH server using public-key
cryptography and challenge-response authentication. One immediate advantage this method
has over traditional password authentication is that you can be authenticated by the server
without ever having to send your password over the network.

SSH keys always come in pairs, one private and the other public. The private key is known only
to you and it should be safely guarded. By contrast, the public key can be shared freely with any
SSH server to which you would like to connect.

When an SSH server has your public key on file and sees you requesting a connection, it uses
your public key to construct and send you a challenge. This challenge is like a coded message
and it must be met with the appropriate response before the server will grant you access. What
makes this coded message particularly secure is that it can only be understood by someone with
the private key. While the public key can be used to encrypt the message, it cannot be used to
decrypt that very same message. Only you, the holder of the private key, will be able to
correctly understand the challenge and produce the correct response.

This challenge-response phase happens behind the scenes and is invisible to the user. As long as
you hold the private key, which is typically stored in the ~/.ssh/ directory, your SSH client
should be able to reply with the appropriate response to the server.

Because private keys are considered sensitive information, they are often stored on disk in an
encrypted form. In this case, when the private key is required, a passphrase must first be
entered in order to decrypt it. While this might superficially appear the same as entering a login
password on the SSH server, it is only used to decrypt the private key on the local system. This
passphrase is not, and should not, be transmitted over the network.
Creating the key files on a Linux client:
Create a key pair on a Linux machine using the command
ssh-keygen -t rsa
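This generates two files: a private key and a matching public key with a “.pub” extension. By default
they are written to ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub. If you instead enter a bare file name such as
“yourid” at the prompt, the files “yourid” and “yourid.pub” are created in the current working directory.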
Provide the “yourid.pub” file to the MIDAS-AWS server admin (Doug) so that he can add you as a
user on the AWS instance.
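The key generation step can also be run without prompts. The following is a minimal sketch, assuming the OpenSSH ssh-keygen tool is installed. The variable names KEY_DIR and KEY_NAME, the scratch directory, and the empty passphrase are illustrative choices for this sketch and are not requirements of the MIDAS-AWS instance.

```shell
#!/bin/sh
# Sketch: generate an RSA key pair at an explicit location without prompts.
# KEY_DIR defaults to a scratch directory so the sketch can be tried safely;
# set KEY_DIR="$HOME/.ssh" to create the key where ssh normally looks.
KEY_DIR="${KEY_DIR:-$(mktemp -d)}"
KEY_NAME="yourid"   # placeholder; use the name you intend to send to the admin

if command -v ssh-keygen >/dev/null 2>&1; then
    # -t rsa  key type, as in the command above
    # -f      path of the private key; the public key gets a .pub suffix
    # -C      comment stored at the end of the public key line
    # -N ""   empty passphrase, only to keep this sketch non-interactive;
    #         omit -N to be prompted for a passphrase instead
    # -q      suppress progress output
    ssh-keygen -t rsa -f "$KEY_DIR/$KEY_NAME" -C "$KEY_NAME" -N "" -q

    # Print the public key, which is the part that gets sent to the admin.
    cat "$KEY_DIR/$KEY_NAME.pub"
else
    echo "ssh-keygen not found; install the OpenSSH client first" >&2
fi
```

For a key you intend to keep, leaving out -N so that ssh-keygen asks for a passphrase is usually the safer choice, since the private key is then stored on disk in encrypted form.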
Logging in from a Linux PC
If the key files were properly generated and located, you should be able to log in as indicated
below.
ssh -l yourid 23.21.236.206
If you would like to see diagnostic and error messages, use the verbose (-v) option as below
ssh -v yourid@23.21.236.206
Using an SSH config file to point to the proper key file for authentication:
If the ssh client fails to find your private key file, it is a good idea to have a ~/.ssh/config file which
tells the client where to look for the key file. A sample config file is shown below. If you already have a
config file with entries for other keys, append the information about the new key file to it.
# .ssh/config
# host example
Host 23.21.236.206
User yourID
IdentityFile ~/.ssh/keyfile
Please restrict the key file's permissions so that only you can read and write it, using the following command
chmod 600 ~/.ssh/keyfile
In the sample config above, type the keywords (Host, User, IdentityFile) and the IP address as is, and
substitute your own user name and key file path.
If this is successful you should be able to log in by typing the command
ssh -l yourid 23.21.236.206
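The config and permission steps above can be combined into one script. This is a sketch under stated assumptions: the variable names (SSH_DIR, HOST_IP, USER_ID, KEY_FILE) are hypothetical, and SSH_DIR points at a scratch directory unless you set it yourself, so that trying the script does not touch your real ~/.ssh directory.

```shell
#!/bin/sh
# Sketch: add a Host entry to an ssh config file and tighten permissions.
# SSH_DIR defaults to a scratch directory; set SSH_DIR="$HOME/.ssh" to apply
# the changes to your real configuration.
SSH_DIR="${SSH_DIR:-$(mktemp -d)}"
HOST_IP="23.21.236.206"
USER_ID="yourid"              # placeholder user name
KEY_FILE="$SSH_DIR/keyfile"   # placeholder private key path

mkdir -p "$SSH_DIR"
chmod 700 "$SSH_DIR"          # directory accessible by the owner only
touch "$SSH_DIR/config"

# Append the entry only if this host is not configured yet
# (-x matches the whole line, -F treats the text literally).
if ! grep -qxF "Host $HOST_IP" "$SSH_DIR/config"; then
    cat >> "$SSH_DIR/config" <<EOF
Host $HOST_IP
    User $USER_ID
    IdentityFile $KEY_FILE
EOF
fi

# Owner read/write only, for the config and (if present) the private key.
chmod 600 "$SSH_DIR/config"
if [ -f "$KEY_FILE" ]; then
    chmod 600 "$KEY_FILE"
fi
```

Running the script a second time leaves the config unchanged, because the Host line is already present.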
Working from a Windows PC:

Tools required:
1. PuTTYgen to generate the key pair: You can download the PuTTYgen tool at
(http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html). Note that it
also comes packaged with WinSCP.
2. WinSCP to transfer files between the local PC and the remote server
3. PuTTY for logging in to the remote server via SSH
Generating key pairs from a Windows machine:
Instructions for generating the key pairs can be found at:
http://the.earth.li/~sgtatham/putty/0.53b/htmldoc/Chapter8.html. Pay attention to section 8.2.10
"Public key for pasting into authorized_keys file". This is the best way to prepare your public key for
sending to the remote administrator.
Start the PuTTYgen tool on your machine:
Click the “Generate” button
Move the mouse around as instructed to generate some randomness.
Once the key is generated, PuTTYgen displays the key details, including the public key text box.
Saving Private Key:

Enter a “Key passphrase” for extra security, then click the “Save private key” button. Save your
private key file in a location of your choice as “yourid.ppk”.
Saving the Public Key:
The public key needs to be sent to the remote administrator in a suitable format.
Please refer to section 8.2.10. Resist the temptation to click the “Save public key” button.
Instead, select all the contents of the text box "Public key for pasting into authorized_keys file".

Copy and paste this content into a text file (using Notepad, TextPad, or WordPad) called
“yourid.pub” and send it to your administrator.
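Before sending the file, it can help to confirm that it is in the single-line form used in authorized_keys. The check below is a rough sketch. It assumes that the “Save public key” button writes the multi-line format that starts with a "---- BEGIN SSH2 PUBLIC KEY ----" header, while the text box holds the single-line OpenSSH form. The function name pubkey_format, the labels it prints, and the truncated sample keys are made up for illustration.

```shell
#!/bin/sh
# Sketch: report which public-key format a file appears to use,
# judging only by its first line.
pubkey_format() {
    first_line=$(head -n 1 "$1")
    case "$first_line" in
        "ssh-rsa "*|"ssh-dss "*|"ssh-ed25519 "*|"ecdsa-sha2-"*)
            echo "openssh" ;;   # single-line form used in authorized_keys
        "---- BEGIN SSH2 PUBLIC KEY ----"*)
            echo "rfc4716" ;;   # multi-line form, needs conversion first
        *)
            echo "unknown" ;;
    esac
}

# Example with made-up, truncated key material (not real keys).
tmp=$(mktemp -d)
printf '%s\n' 'ssh-rsa AAAAB3NzaC1yc2E... yourid' > "$tmp/good.pub"
printf '%s\n' '---- BEGIN SSH2 PUBLIC KEY ----' 'Comment: "yourid"' > "$tmp/saved.pub"

pubkey_format "$tmp/good.pub"    # prints: openssh
pubkey_format "$tmp/saved.pub"   # prints: rfc4716
```

If the file turns out to be in the multi-line form, the OpenSSH command ssh-keygen -i -f yourid.pub should print the equivalent single-line key on a Linux machine.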
Using PuTTYgen to convert the private key:
If you have a key generated on a Linux machine which you plan to use on your PC, you can use PuTTYgen
to convert the private key file to the ppk format suitable for your PC.
Start PuTTYgen either by selecting the tool under WinSCP or by selecting the puttygen.exe file
Click the “Load” button
Browse to the location of the private key file by selecting the “All Files” option
Click the “Open” button
Click “OK”
Click “Save private key”
At this stage you can enter a passphrase or choose to have none by clicking “Yes”
Choose a proper name and location for your private key with a “ppk” extension and click the “Save”
button
Logging in from a Windows machine:
Logging in to the MIDAS-AWS Twitter instance (23.21.236.206)
Start PuTTY. Enter the IP address in the “Host Name” box in the “Session” category.
Then select Connection > SSH > Auth
Browse to the location of the private key (yourid.ppk) and click on “Open”
Type in your username (provided by the MIDAS-AWS admin) and you should be authenticated