Uploaded by Michelle Lai

idis3802-m5-assignment-instructions

Assignment 5: DataFrame I
Start by downloading the files cleaned_survey.csv, students.csv, and survey.csv, along with the
Jupyter Notebook file.
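As a minimal sketch of the loading step, assuming the files are read with pandas: the snippet below parses a small made-up stand-in for survey.csv from a string so that it runs without the real file. The column names and values in the stand-in are illustrative only; in the notebook you would pass the file path to pd.read_csv instead.

```python
import io
import pandas as pd

# Hypothetical stand-in for survey.csv (made-up rows and columns),
# used only so this sketch runs without the downloaded file.
csv_text = """Program,Job,Python,Java
MBA,0.5,1,0
Other,1,1,1
"""

# With the real file this would be: survey = pd.read_csv("survey.csv")
survey = pd.read_csv(io.StringIO(csv_text))

print(survey.shape)   # (number of rows, number of columns)
print(survey.dtypes)  # check that numeric columns were parsed as numbers
```

Checking shape and dtypes right after loading is a quick way to confirm that the file was parsed as expected before answering the questions.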
Data Set description
The data set contains one row for every person who completed the survey. We use person and
student interchangeably.
Job: 0 for those students without a job, 0.5 for those students with a part-time job, and 1 for those
students with a full-time job
Program: program of enrollment
C-Regression (the columns from C through Regression): each column indicates whether the student
knows (1) or doesn't know (0) a certain programming language or topic
Classification: indicates the level of knowledge (1-5) on classification
Clustering: indicates the level of knowledge (1-5) on clustering
Bach_0to1: 1 if time elapsed from graduation is less than a year; 0 otherwise
Bach_1to3: 1 if time elapsed from graduation is between one and three years; 0 otherwise
Bach_3to5: 1 if time elapsed from graduation is between three and five years; 0 otherwise
Bach_5Plus: 1 if time elapsed from graduation is more than five years; 0 otherwise
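The Job coding described above (0, 0.5, 1) can be turned into readable labels when exploring the data. The following is only a sketch on made-up values; the JobLabel column name and the label strings are hypothetical and not part of the data set.

```python
import pandas as pd

# Made-up Job values following the coding in the description.
df = pd.DataFrame({"Job": [0, 0.5, 1, 0.5]})

# Hypothetical labels for the three codes.
job_labels = {0.0: "no job", 0.5: "part-time", 1.0: "full-time"}

# Series.map replaces each code with its label.
df["JobLabel"] = df["Job"].map(job_labels)
print(df)
```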
Complete the Jupyter Notebook file on the assignment page to answer the following questions.
Question 1
For each programming skill level, compute:


Find the ratio of people who know at least one of Python and Java and whose Classification
knowledge is at least 2.
Compute the standard deviation of the Clustering knowledge (function std) for people with
ProgSkills of 3.
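One possible way to approach Question 1 is with boolean masks. The sketch below runs on a small made-up DataFrame, not the survey data, and assumes the columns are named Python, Java, Classification, Clustering, and ProgSkills. The per-level grouping at the end is one reading of "for each programming skill level" and may not be the intended one.

```python
import pandas as pd

# Made-up toy data; column names are assumed to match the survey file.
df = pd.DataFrame({
    "Python": [1, 0, 0, 1],
    "Java": [0, 1, 0, 0],
    "Classification": [3, 1, 4, 2],
    "Clustering": [2, 4, 5, 4],
    "ProgSkills": [3, 3, 2, 3],
})

# True for people who know Python or Java (or both).
knows_either = (df["Python"] == 1) | (df["Java"] == 1)
# Additionally require Classification knowledge of at least 2.
mask = knows_either & (df["Classification"] >= 2)

# The mean of a boolean Series is the fraction of True values.
ratio = mask.mean()

# The same ratio, computed separately for each ProgSkills level.
ratio_per_level = mask.groupby(df["ProgSkills"]).mean()

# Standard deviation of Clustering for people with ProgSkills of 3.
# pandas' std uses the sample formula (ddof=1) by default.
clustering_std = df.loc[df["ProgSkills"] == 3, "Clustering"].std()

print(ratio, clustering_std)
print(ratio_per_level)
```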
Question 2
For the MBA program, how many people have a programming skill level (ProgSkills) of less than 4?
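Counting rows that satisfy two conditions can be sketched as follows, again on made-up data. It assumes the program is stored as the string "MBA" in a Program column and the skill level in a ProgSkills column; the "Other" program value is hypothetical.

```python
import pandas as pd

# Made-up toy data; "Other" is a placeholder program name.
df = pd.DataFrame({
    "Program": ["MBA", "MBA", "Other", "MBA"],
    "ProgSkills": [2, 4, 1, 3],
})

# Combine both conditions; summing a boolean Series counts the True values.
count = ((df["Program"] == "MBA") & (df["ProgSkills"] < 4)).sum()
print(count)
```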
Question 3
Return the rows of those people who know the most languages.
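A sketch of one approach, assuming the language columns are 0/1 indicators: sum them across each row and keep the rows that reach the maximum. The list of language columns below (C, CPP, CS, Java, Python, SAS) is an assumption based on the names mentioned in this assignment; the real file may contain more.

```python
import pandas as pd

# Assumed language columns; adjust to the columns in the actual file.
lang_cols = ["C", "CPP", "CS", "Java", "Python", "SAS"]

# Made-up toy data.
df = pd.DataFrame({
    "C":      [1, 0, 1],
    "CPP":    [0, 0, 1],
    "CS":     [0, 0, 0],
    "Java":   [1, 1, 0],
    "Python": [1, 1, 1],
    "SAS":    [0, 0, 0],
})

# Number of languages each person knows (row-wise sum of the indicators).
n_langs = df[lang_cols].sum(axis=1)

# Keep every row tied for the maximum, not just the first one.
top = df[n_langs == n_langs.max()]
print(top)
```

Comparing against the maximum, rather than using idxmax, returns all tied rows.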
Question 4
Let us define the "data science experience" of a given person as the person's largest score among
Regression, Classification, and Clustering. Compute the average data science experience among all
MBA students. (Pick the correct number in Camino)
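The definition above can be sketched as a row-wise maximum followed by a filtered mean. The data below is made up, and the DSExperience column name is a hypothetical helper, not a column in the data set.

```python
import pandas as pd

# Made-up toy data; "Other" is a placeholder program name.
df = pd.DataFrame({
    "Program": ["MBA", "MBA", "Other"],
    "Regression": [1, 0, 1],
    "Classification": [4, 2, 1],
    "Clustering": [2, 3, 5],
})

ds_cols = ["Regression", "Classification", "Clustering"]

# Largest of the three scores for each person (hypothetical helper column).
df["DSExperience"] = df[ds_cols].max(axis=1)

# Average over MBA students only.
avg_mba = df.loc[df["Program"] == "MBA", "DSExperience"].mean()
print(avg_mba)
```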
Question 5 (Optional)
Among those with at least one year elapsed from their Bachelor's degree, find out who is the "most
knowledgeable" person. The most knowledgeable person is the one who knows Classification best
(in case of ties, consider whether they know C, then CPP, then CS, then Java, then SAS). Retrieve
that person's Program information.
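The tie-breaking rule can be expressed as a multi-key sort. The sketch below uses made-up data and assumes that "at least one year elapsed" corresponds to Bach_0to1 being 0; check that assumption against the actual data before relying on it.

```python
import pandas as pd

# Made-up toy data; "Other" is a placeholder program name.
df = pd.DataFrame({
    "Program": ["MBA", "Other", "MBA", "Other"],
    "Bach_0to1": [1, 0, 0, 0],
    "Classification": [5, 4, 4, 3],
    "C":    [1, 0, 0, 1],
    "CPP":  [1, 0, 1, 1],
    "CS":   [0, 1, 0, 1],
    "Java": [0, 0, 0, 1],
    "SAS":  [0, 0, 0, 1],
})

# Assumption: at least one year elapsed means Bach_0to1 is 0.
eligible = df[df["Bach_0to1"] == 0]

# Sort by Classification first, then by each tie-breaker in the stated order.
keys = ["Classification", "C", "CPP", "CS", "Java", "SAS"]
best = eligible.sort_values(keys, ascending=False).iloc[0]

best_program = best["Program"]
print(best_program)
```

Sorting on all keys at once applies each later column only when the earlier ones are tied, which matches the order given in the question.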