CS143 Project 1
Due: Oct 24th, 11:59 PM
All materials will be posted on courseweb.

Before we start
• Two things to do:
  • Find your partner
    • At most 2 students per team
    • Send your team information (both partners' names, UIDs, and emails, plus the password you would like for your MySQL account) to one of the TAs by Oct 12th
    • Note: This team is for your project, NOT homework. You need to finish your homework individually.
  • Get familiar with Linux and MySQL

Linux
• A Unix-like, open source operating system
• All the projects will be done on the SEASNET Linux server.
  • lnxsrv03.seas.ucla.edu
• How to access the server?
  • If you are using a SEASNET machine, a secure shell client is already installed, so you simply need to run the client.
  • If you need access from a personal machine that does not have a secure shell client, you can download a Windows secure shell client
    http://www.filewatcher.com/m/SSHSecureShellClient3.2.9.exe.5517312.0.0.html
    or PuTTY
    http://www.chiark.greenend.org.uk/~sgtatham/putty/
  • Mac OS X and Unix machines have a secure shell client preinstalled. Simply type "ssh -l <userid> lnxsrv03.seas.ucla.edu" in your command line interface.
• Account for the Linux server
  • Apply for a SEASNET account if you don’t have one
• Frequently used Linux commands:
  • http://linuxcommand.org/learning_the_shell.php
  • Try it!

MySQL
• Already installed on the lnxsrv03 server
• Username & password will be assigned after you submit your team request.
• MySQL documentation: http://dev.mysql.com/techresources/articles/mysql_intro.html

Project 1
• Step 1: Loading the data
• Step 2: Running easy queries
• Step 3: Applying some constraints
• Step 4: Join operation
• Step 5: A more complicated query
• Step 6: Putting it all together

Step 1: Loading the data
• There are 5 data files located at /u/cs/class/cs143/cs143ta/proj1/data/
  • Author.csv, Coauthored.csv, Authored.csv, Paper.csv, Cites.csv
  • Also available on courseweb
• Task: Load these 5 data files into MySQL using the “load data” command
• Before loading, you should:
  • 1. Log in to MySQL
  • 2. Use your own database
  • 3. Create 5 tables: Author, Coauthored, Authored, Paper, Cites

Table Paper:
• ID (INTEGER)
• paper_id (INTEGER)
• title_str (VARCHAR)
• authors_str (VARCHAR)
• area (VARCHAR)
• num_abstract_wds (INTEGER)
• num_authors (INTEGER)
• num_kb (INTEGER)
• num_pages (INTEGER)
• num_revisions (INTEGER)
• num_title_wds (INTEGER)
• comments_str (VARCHAR)
• submit_date (DATE)
• submitter_email (VARCHAR)
• submitter_name (VARCHAR)

Table Authored:
• ID (INTEGER)
• AuthorID (INTEGER)
• paperID (INTEGER)
• Email (VARCHAR)
• rank_in_author_list (INTEGER)
• original_name_str (VARCHAR)
• email_domain (VARCHAR)
• email_country (VARCHAR)
• affiliation_str (VARCHAR)
• affil (VARCHAR)

Table CoAuthored:
• ID (INTEGER)
• author1ID (INTEGER)
• author2ID (INTEGER)
• paper_ID (INTEGER)

Table Cites:
• ID (INTEGER)
• paper1ID (INTEGER)
• paper2ID (INTEGER)
• is_self_citation (INTEGER)

Table Author:
• ID (INTEGER)
• author_name (VARCHAR)
• first_name (VARCHAR)
• last_name (VARCHAR)
• preferred_name (VARCHAR)

Step 2: Running some easy queries
• Write queries that return the answers to these questions:
  1) “Give me the author_name of all the Authors with first_name ‘Kevin’.”
  2) “Return author_name and preferred_name of all the Authors who have different author_name and preferred_name.” Sort your results first by author_name, then by preferred_name.
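
The workflow in Steps 1 and 2 (create a table, load a file, then filter and sort) might look roughly like the sketch below for the Cites table. Several details are assumptions rather than facts from the project: the database name my_team_db is hypothetical, the file is assumed to be comma-separated with optionally quoted fields, and whether LOCAL is needed or whether the file starts with a header row depends on the server setup and the data. The query at the end is only an illustration of WHERE and a two-column ORDER BY, not one of the required queries.

```sql
-- Hypothetical database name; use the one assigned to your team.
USE my_team_db;

-- Columns follow the Cites schema listed above.
CREATE TABLE Cites (
    ID               INTEGER,
    paper1ID         INTEGER,
    paper2ID         INTEGER,
    is_self_citation INTEGER
);

-- Assumptions: comma-separated fields, optional double quotes,
-- and LOCAL permitted by the server. Check the file before relying on these.
LOAD DATA LOCAL INFILE '/u/cs/class/cs143/cs143ta/proj1/data/Cites.csv'
INTO TABLE Cites
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';
-- If the file turns out to have a header row, append: IGNORE 1 LINES

-- Illustration only: filter rows, then sort by two columns in order.
SELECT paper1ID, paper2ID
FROM Cites
WHERE is_self_citation = 1
ORDER BY paper1ID, paper2ID;
```

A quick sanity check after each load is to compare SELECT COUNT(*) on the table with the number of lines in the corresponding .csv file.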

Step 3: Applying some constraints
• Add a unique key constraint to the CoAuthored table so that the combination (author1ID, author2ID, paper_ID) is unique.
• Add foreign key constraints for author1ID and author2ID.
• More details are in the project description.

Step 4: Join operation
• Write queries that return the answers to these questions:
  1) “Return the author_name of all co-authors of the author with ID ‘42673’.”
  2) “Return the author_name of all authors who have more than 10 co-authors.”

Step 5: A more complicated query
• Write one query that returns the answer to the following question:
  • “Give me the author_name of every author together with the number of papers they co-authored, in decreasing order of the number of papers.”

Step 6: Putting it all together
• Create a script named P1 that shows every step in this part of the project. You can use '--' to write comments within your SQL script. Make sure you give adequate comments documenting each part of each step.
• Execute the script and save all output in a file called P1_Output.
• Add one README file that includes both partners' names, UIDs, and emails, and any other information you think is useful.
• Make a zip file and submit it through courseweb.
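
As a rough idea of how a commented P1 script could be laid out, the fragment below shows the Step 3 constraints written with '--' comments. It is a sketch under assumptions, not the required solution: the constraint names are hypothetical, the foreign keys are assumed to reference Author (ID), and MySQL is assumed to be able to enforce them (Author.ID declared as a key, tables using InnoDB). The project description takes precedence wherever it differs.

```sql
-- P1: CS143 Project 1 script (illustrative skeleton)
-- Note: MySQL treats "--" as a comment only when it is followed by a space.

-- Step 3a: the combination (author1ID, author2ID, paper_ID) must be unique.
-- uq_coauthored is a hypothetical constraint name.
ALTER TABLE CoAuthored
    ADD CONSTRAINT uq_coauthored UNIQUE (author1ID, author2ID, paper_ID);

-- Step 3b: both author columns must refer to an existing author.
-- Assumption: the referenced column is Author (ID) and it is indexed,
-- for example declared as PRIMARY KEY when the Author table was created.
ALTER TABLE CoAuthored
    ADD CONSTRAINT fk_coauthored_author1
        FOREIGN KEY (author1ID) REFERENCES Author (ID),
    ADD CONSTRAINT fk_coauthored_author2
        FOREIGN KEY (author2ID) REFERENCES Author (ID);
```

One way to produce P1_Output, assuming the standard mysql command-line client, is to run the script non-interactively and redirect its output, for example `mysql -u <username> -p <database> < P1 > P1_Output`; the exact invocation expected for the course may differ.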