Project 1

advertisement
CS143 Project 1
Due: Oct 24th, 11:59 PM
All the materials will be posted in courseweb.
Before we start
• Two things to do:
• Find your partner
• At most 2 students
• Send team information (you and your partner's name, UID, email,
expected password for your MySQL account) to one of TAs by Oct 12th
• Note: This team is for your project, NOT homework. You need to finish
your homework individually.
• Get familiar with Linux and MySQL
Linux
• A Unix-like, open source operating system
• All the projects will be done on the SEASNET linux server.
• lnxsrv03.seas.ucla.edu
• How to access the server?
• If you are using SEASNET machine, all SEASNET machines already
have a secure shell client installed, so you simply need to run the
client.
• If you need to access from a personal machine that does not have a
secure shell client, you can download a windows secure shell client
http://www.filewatcher.com/m/SSHSecureShellClient3.2.9.exe.5517312.0.0.html
Or Putty http://www.chiark.greenend.org.uk/~sgtatham/putty/
• Mac OS X or Unix machines have a secure shell client preinstalled.
Simply type "ssh -l <userid> lnxsrv03.seas.ucla.edu" within your
command line interface
Linux
• Account for Linux Server
• Apply SEASNET account if you don’t have one
• Frequently used Linux command:
• http://linuxcommand.org/learning_the_shell.php
• Try it!
MySQL
• Already installed on lnxsrv03 server
• Username & password will be assigned after you submit your
team request.
• MySQL document: http://dev.mysql.com/techresources/articles/mysql_intro.html
Project 1
• Step 1: Loading the data
• Step 2: Running easy queries
• Step 3: Applying some constraints
• Step 4: Join operation
• Step 5: A more complicated query
• Step 6: Putting all together
Step 1: Loading the data
• There are 5 data files located at
/u/cs/class/cs143/cs143ta/proj1/data/
• Author.csv, Coauthored.csv, Authored.csv, Paper.csv, Cites.csv
• Also available at the courseweb
• Task: Load these 5 data files to MySQL using the “load data”
command
• Before Loading, you should:
• 1. Log in MySQL
• 2. Use your own database
• 3. Create 5 tables: Author, Coauthored, Authored, Paper, Cites
Step 1: Loading the data
Table Paper:
• ID (INTEGER)
• paper_id (INTEGER)
• title_str (VARCHAR)
• authors_str (VARCHAR)
• area (VARCHAR)
• num_abstract_wds (INTEGER)
• num_authors (INTEGER)
• num_kb (INTEGER)
• num_pages (INTEGER)
• num_revisions (INTEGER)
• num_title_wds (INTEGER)
• comments_str (VARCHAR)
• submit_date (DATE)
• submitter_email (VARCHAR)
• submitter_name (VARCHAR)
Table Authored:
• ID (INTEGER)
• AuthorID (INTEGER)
• paperID (INTEGER)
• Email (VARCHAR)
• rank_in_author_list (INTEGER)
• original_name_str (VARCHAR)
• email_domain (VARCHAR)
• email_country (VARCHAR)
• affiliation_str (VARCHAR)
• affil (VARCHAR)
Table CoAuthored:
• ID (INTEGER)
• author1ID (INTEGER)
• author2ID (INTEGER)
• paper_ID (INTEGER)
Table Cites:
• ID (INTEGER)
• paper1ID (INTEGER)
• paper2ID (INTEGER)
• is_self_citation (INTEGER)
Table Author:
• ID (INTEGER)
• author_name (VARCHAR)
• first_name (VARCHAR)
• last_name (VARCHAR)
• preferred_name(VARCHAR)
Step 2: Running some easy queries
• Write queries that return the answers to these questions:
1) “Give me the author_name of all the Authors with first_name
‘Kevin’.”
2) “Return author_name and preferred_name of all the Authors who
have different author_name and preferred_name.” Sort your
results first by author_name then by preferred_name.
Step 3: Applying some constraints
• Add a unique key constraint to the CoAuthored table in
which the combination of (author1ID, author2ID, paper_ID)
should be unique.
• Add foreign key constraints for author1ID and author2ID.
• More details in project description.
Step 4: Join operation
• Write queries that return the answers to these questions:
1) “Return the author_name of all co-authors of the author with ID
‘42673’.”
2) “Return the author_name of all authors who have more than 10
co-authors.”
Step 5: A more complicated query
• Write one query that returns the answer to following
question:
• “Give me the author_name of all the authors with the number of
papers they co-authored in, in the decreasing order of the number of
paper.”
Step 6: Putting all together
• Create a script named P1 that shows every step in this part
of the project. You can use the ‘--' tag to make comments
within your SQL script. Make sure you give adequate
comments documenting each part of each step.
• Execute the script and save all outputs in a file call
P1_Output.
• Add one README file, which includes you and your partner's
name, UID, email, and any other information you think is
useful.
• Make a zip file and submit through courseweb.
Download