Haas Career Management Center Database IEOR 115 Ken Goldberg Final Presentation December, 10, 2011 Samuel Lin | AlliZheng | Tao Luo | Yorke Lee | Carney Lu Justin Pao | Butian Li | AdibKashem Overview Background Haas School of Business Career Management Group Provides candidates access to employers Organizes on campus career events Handles 1000+ electronic job postings per year Ranked #4 by BusinessWeek survey Maintains relationships with hiring managers and recruiters to track industry firm nights, career panels, corporate visits, alumni mixers and job fairs. Plans workshops, panels, networking events and receptions Background Existing Database Support Excel Data stored in unrelated Excel spreadsheets Custom reports formulated manually by tracking across many spreadsheets •Extremely tedious and time consuming •Possibility of Human Errors Project Objective Clients Expectation Ability to build customized reports instantly Ability to search for records with specific properties Ability to come up with intelligent recommendations for events, job listings etc. for students. Reduce time and stress involved in data entry and management Reduce human errors Project Objective Database Objective To provide ranking metrics to magazines such as US News To provide all kinds of employment reports including, but not limited to: Top Full-Time Employers Class-wise Full-Time Employment Report Employment By Industry/ Function Advance functions Salary Forecast Class preferences based to career inclination Search alumni nearby EER Diagram EER Diagram Relational Schema Relational Schema 1. Person(pid,fname, lname, mi, street_address, apt_no, zip_code4, salutation, gender) a. Student(sid,pid1, date_of_admission, date_of_graduation, department7) b. Professor(faculty_id,pid1) c. Supervisor(pid1, cid2) d. Staff(SSN,pid1) e. Advisor(SSN, pid1, department7) f. Philanthrophist(pid1) g. Potential_applicant(pid1, sourced_through) h. Subscriber(pid1) 2. Company (cid, name, address, liaison_name, liaison_tel, liaison_email) 2a. Branch (cid2,branch no., address) 3. University (uid, name, university_location4, liaison_name,liaison_tel, liaison_email) 4. Location (zipcode, country, state, city, latitude, longitude) 5. Job_Listing (listingid, title, description, job_location4, uploaded_by_company2, salary_range) 6. Job_Offer (sid1, company_id2, position, job_location4, salary, bonus, stock_options, date, accepted_or_declined) 7. Department (did, name, head_of_department1b) 8. Course_instance (ccn, course_no8a, semester, instructor1b) 8a. Course(course_no, title, department7) Relational Schema 9. Magazine_Publication (mid, name, address, liaison_name, liaison_tel, liaison_email) 10. Data_Request (drid, date, magazine_or_publication9, data_requested, handled_by_staff1d, deadline, fulfillment_status) 11. Ranking (rid, magazine_or_publication9, date, ranking_details) 12. Social_Media_Account (acc_id, profile_link, staff_in_charge1d, acc_email, acc_password) 13. Social_Media_Subscription(pid1, social_media_acc_id12, profile_email) 14. Announcement (aid, date, title, description, posted_by_staff1d, posted_on_account12) 15. Event (eid, created_by_staff1d, held_at_venue17, caterer_id16, title, date) a. Workshop (eid15, advisor_in_charge1e) b. Recruitment_Event (eid15, advisor_in_charge1e) c. Alumni_Mixer (eid15) d. Outreach_Event (eid15, hosted_by_university3) 16. Caterer (caterer_id, name, address, liaison_name, liaison_tel, liaison_email) 17. Venue (vid, building, room, contact_name, contact_tel, contact_email) 18. Donation (donation_id, donation_date, donation_amount, donated_by1, received_by1dspecial_notes) Relational Schema •These are junction tables to help describe the n to nrelationships 19. Working_Experience(sid1a, company_worked_for2, type_of_employment, start_date, end_date) 20. Studied_At(sid1a, school_attended3, degree_type, field, start_date, end_date, GPA) 21. Supervisor_Recommendation(sid1a, supervisor_pid1, cid2, date, letter) 22. Professor_Recommendation(sid1a, professor_pid1, date, letter) 23. Event_Attendance(eid15, attendee_pid1) 24. Professor_In_Department(prof_pid1b, department7 ) 25. Event_Invite(eid15, cid2) 26. Advisor_Hosts_Event(pid1,eid15) 27. Venue_Booking(staff_in_charge1d, booking_for_event15, date_reserved, venue_booked17) 28. Caterer_Order(staff_in_charge1d, caterer_id16, date_of_order) 29. Applies_For(sid, listingid, date, application) 30. Student Takes Course in Semester Course_Taken(sid1, ccn8,grade) 31. Professor Teaches Course in Semester Course_Taught(prof_pid1b, ccn8) 32. Email_Address(pid1, email_address) 33. Contact_Number(pid1, contact_no) 34. Student_In_Department(sid1, department7) 35. Advisor_In_Department(advisor_id1e, department7) Normalization Analysis Normalization Unnormalized Relation 1. Person (pid,fname, lname, mi, street_address, apt_no, city, state, country, zip_code, contact_no, email, salutation, gender) 1a. Student(sid,pid1, date_of_admission, date_of_graduation, department) 1e. Advisor(SSN, pid1, department7) 1NF Person (pid,fname, lname, mi, street_address, apt_no, zip_code4, salutation, gender) 1a. Student(sid,pid1, date_of_admission, date_of_graduation) 1e.Advisor(advisor_id,SSN, pid1) 32. Email_Address(pid1, email_address) 33. Contact_Number(pid1, contact_no) 24. Student_In_Department(sid1, department7) 25. Advisor_In_Department(advisor_id1e, department7) 1. Normalization Unnormalized Relation 1. Person(pid,fname, lname, mi, street_address, apt_no, city, state, country, zip_code, contact_no, email, salutation, gender) 2NF 1. Person (pid,fname, lname, mi, street_address, apt_no, zip_code4, salutation, gender) 4. Location (zipcode, country, state, city, latitude, longitude) 3NF (Not Implemented) 4. Location (zipcode, country, state, city) 5. City_Location(Longitude,Latitude, City) Normalization Unnormalized Relation 8. Course (course_id, title, department7, semester, instructor1b) BCNF 8. Course_instance (ccn, course_no8a, semester, instructor1b) 8a. Course(course_no, title, department7) Queries Queries Query 1: Salary forecast Description: •Given a selected industry, predict the expected average salary for a new job position within that industry over the next X years •Useful to help job-seeking graduates easily visualize the salary trends for each industry Queries Java Code: •Connects to Access database and executes the sql query: select o.year_offered, avg(o.salary) from [Job Offer] as o, Company as c, Department as i where o.status = 'Accepted' and o.cid = c.cid and c.iid = <iid> group by o.year_offered order by o.year_offered ASC; •This extracts the average salary for each year for the given industry. • The Java code performs a linear regression in order to determine the linear relationship between the year and expected salary, then creates a new table and inserts those predicted values (over the next X years) into the database Queries Demo Queries Query 2: Nearby alumni Description: •Given a selected city, select all the alumni who are currently working within X km from that city •Used for current students to seek out seniors in the region (for advice, meet-ups), or to invite alumni to attend events that are being held in a certain city Queries VBA Code: • Contains function needed to determine distance between two geographical coordinates • Iterates through location database and determines which cities are within x km of given city using the haversine formula: • Creates and populates NearbyCities table with list of cities To retrieve all alumni working in nearby cities: SELECT s.SID, c.city, p.First_Name, p.Last_Name, o.position FROM [Job Offer] AS o, nearbyCities AS c, Student AS s, Person AS p WHERE o.sid=s.SID And o.job_location=c.LID And p.pid=s.pid; Queries Demo Queries Query 3: Donation Capacity Description •Ranks alumni by their donation capacity •The donation capacity of each alumni is calculated by a weighted combination of several factors: salary, years out of school, donation history •The weights of these factors were researched and simplified from research papers) to result in the following formula: Donation_Capacity = 0.26 * salary/10000 + 0.17 * years_out_of_school + 0.64 * sum_of_previous_donations/5000 Queries Retrieve the last recorded salary of each student SQL: SELECT s.sid, o.salary FROM student AS s, [Job Offer] AS o WHERE o.sid = s.sid and o.status = 'Accepted' and o.[date accepted] in ( select max(o2.[date accepted]) from [Job Offer] o2 where s.sid = o2.sid ); Retrieve the sum of all donations made by each student SQL: SELECT s.pid, Sum(d.donation_amount) AS total_donations FROM student AS s, donation AS d WHERE d.donated_by=s.pid GROUP BY s.pid; Queries SQL: SELECT DISTINCT s.sid, p.First_name, p.Last_Name, ((0.26*ls.salary)/10000+(0.64*ds.total_donations)/5000+0.17*((Date()-s.[date of graduation])/365)) AS Donation_Capacity FROM person AS p, student AS s, donation AS d, (SELECT s2.pid, Sum(d2.donation_amount) AS total_donations FROM student AS s2, donation AS d2 WHERE d2.donated_by=s2.pid GROUP BY s2.pid) AS ds, (SELECT s3.sid, o3.salary FROM student AS s3, [Job Offer] AS o3 WHERE o3.sid = s3.sid and o3.status = 'Accepted' and o3.date accepted in ( select max(o2.[date accepted]) from [Job Offer] o2 where s3.sid = o2.sid )) AS ls WHERE s.sid=ls.sid And s.pid=p.pid And s.pid=ds.pid ORDER BY ((0.26*ls.salary)/10000+(0.64*ds.total_donations)/5000+0.17*((Date()-s.[date of graduation])/365)) DESC; Queries Query 3: Donation Capacity Queries Query 4: Magazine Influence Description •Rank each business magazine/publication rankings by their influence on the number of job offers our graduates get •Help the career management center prioritize data requests from each magazine/publication •Perform a linear regression to obtain the coefficient of determination for each magazine Queries Query 4: Magazine Influence Java Code •Connects to Access database and executes the dynamically created SQL query for each magazine in the table: SELECT Count(*) AS Offer_Count, r.ranking, o.year_offered FROM [Job Offer] AS o, ranking AS r, [Magazine Publication] AS mp WHERE (((r.magazine_or_publication)=mp.mid) And ((o.year_offered)=r.year) And ((mp.mid)=<selected mid>)) GROUP BY o.year_offered, r.ranking; •This extracts a pairing of the number of offers received by graduates together with how highly we were ranked by that magazine that year. With these data pairs •Performs a regression to calculate the correlation factor between number of job offers received and our ranking in the magazine •Writes these correlation factors back into the database under a table called Influence Queries Query 4: Magazine Influence SQL Code to retrieve magazines sorted by influence: SELECT i.mid, m.name, i.influence_factor FROM [Magazine Publication] AS m, Influence AS i WHERE m.mid=i.mid ORDER BY i.influence_factor DESC; Queries Query 5: Course Recommendations Description •Suggest recommended courses a student should take if he wants to enter a specific industry •Ranks each class by the number of times they were enrolled in by students who end up receiving offers from companies in a particular industry SQL SELECT c.course_id, c.title, Count(*) AS course_count FROM course AS c, student AS s, [Job Offer] AS o, course_taken AS ct, company AS com WHERE o.sid=s.sid And ct.sid=s.sid And ct.course_id=c.course_id And o.cid=com.cid And com.iid= <selected industry> GROUP BY c.course_id, c.title ORDER BY Count(*) DESC; Queries Query 5: Course Recommendations to ask Q and