In [1]: import pandas as pd

In [2]: data = pd.read_csv('data/course_eval.csv')
        data.head()

Out[2]:
   Instructor    Course  Semester  Year  Evaluation
0      Safadi  MIST4610      Fall  2019           5
1       Aguar  MIST6380    Spring  2018           4
2      Safadi  MIST5730    Summer  2018           3
3    Boudreau  MIST4610      Fall  2018           4
4      Safadi  MIST4610    Summer  2017           3

In [4]: # iterating over a data frame yields its column names
        for x in data:
            print(x)
            #break

Instructor
Course
Semester
Year
Evaluation

The data frame contains MIS instructor course evaluations for several courses over different semesters and years.

NOTE: This is simulated data. The course evaluation numbers are not real!

1. How many courses did each instructor teach?
2. What is the average course evaluation per instructor?
3. Report the minimum, median, and maximum evaluation of Safadi.
4. Report the average evaluation per semester and year.
5. Format the previous result in a data frame where rows are semesters and years are columns.
6. Transform evaluation by subtracting the average evaluation per course (de-mean evaluation per course).
7. Filter the data to keep entries in which Evaluation is larger than the average evaluation.
8. Filter the data to keep entries in which Evaluation is larger than the average evaluation per Course.

In [3]: # 1. number of course entries per instructor
        test_1 = data.groupby('Instructor').Course.count()
        test_1

Out[3]:
Instructor
Aguar         22
Boudreau      17
Safadi        23
Srinivasan    16
Name: Course, dtype: int64

In [4]: # 2. average evaluation per instructor
        test_2 = data.groupby('Instructor').Evaluation.mean()
        test_2

Out[4]:
Instructor
Aguar         2.909091
Boudreau      3.411765
Safadi        2.739130
Srinivasan    3.375000
Name: Evaluation, dtype: float64

In [7]: # 3. min, median, and max evaluation of Safadi
        test_3 = data.groupby('Instructor').Evaluation.aggregate(['min', 'median', 'max']).loc['Safadi']
        # or data.loc[data.Instructor == 'Safadi', 'Evaluation'].aggregate(['min', 'median', 'max'])
        test_3

Out[7]:
min       1
median    3
max       5
Name: Safadi, dtype: int64

In [11]: # 4. average evaluation per semester and year
         test_4 = data.groupby(['Semester', 'Year']).Evaluation.mean()
         test_4

Out[11]:
Semester  Year
Fall      2017    2.666667
          2018    2.200000
          2019    3.333333
          2020    3.500000
Spring    2017    2.833333
          2018    3.250000
          2019    3.000000
          2020    3.571429
Summer    2017    2.166667
          2018    3.625000
          2019    2.500000
          2020    3.000000
Name: Evaluation, dtype: float64

In [12]: # 5. semesters as rows, years as columns
         test_5 = test_4.unstack()
         test_5

Out[12]:
Year          2017   2018      2019      2020
Semester
Fall      2.666667  2.200  3.333333  3.500000
Spring    2.833333  3.250  3.000000  3.571429
Summer    2.166667  3.625  2.500000  3.000000

In [16]: # 6. subtract each course's mean evaluation from its entries
         data['Demeaned_evaluation'] = data.groupby('Course')['Evaluation'].transform(lambda x: x - x.mean())
         data.head()

Out[16]:
   Instructor    Course  Semester  Year  Evaluation  Demeaned_evaluation
0      Safadi  MIST4610      Fall  2019           5               1.5000
1       Aguar  MIST6380    Spring  2018           4               1.2500
2      Safadi  MIST5730    Summer  2018           3               0.0625
3    Boudreau  MIST4610      Fall  2018           4               0.5000
4      Safadi  MIST4610    Summer  2017           3              -0.5000

In [19]: # overall mean evaluation
         data.Evaluation.mean()

Out[19]: 3.0641025641025643

In [18]: # 7. entries above the overall mean evaluation
         test_7 = data[data.Evaluation > data.Evaluation.mean()]
         test_7.head()

Out[18]:
   Instructor    Course  Semester  Year  Evaluation  Demeaned_evaluation
0      Safadi  MIST4610      Fall  2019           5             1.500000
1       Aguar  MIST6380    Spring  2018           4             1.250000
3    Boudreau  MIST4610      Fall  2018           4             0.500000
5    Boudreau  MIST4600    Summer  2018           4             0.916667
6    Boudreau  MIST4610    Summer  2018           5             1.500000

In [22]: # evaluation by course
         data.groupby('Course').Evaluation.mean()

Out[22]:
Course
MIST4600    3.083333
MIST4610    3.500000
MIST5730    2.937500
MIST6380    2.750000
Name: Evaluation, dtype: float64

In [28]: # 8. entries above the mean evaluation of their own course
         better_than_average = data.groupby('Course').Evaluation.transform(lambda x: x > x.mean())
         test_8 = data[better_than_average]
         test_8.head()

Out[28]:
   Instructor    Course  Semester  Year  Evaluation  Demeaned_evaluation
0      Safadi  MIST4610      Fall  2019           5             1.500000
1       Aguar  MIST6380    Spring  2018           4             1.250000
2      Safadi  MIST5730    Summer  2018           3             0.062500
3    Boudreau  MIST4610      Fall  2018           4             0.500000
5    Boudreau  MIST4600    Summer  2018           4             0.916667

In [29]: # or, given we already de-meaned the data,
         # select based on positive value of Demeaned_evaluation
         data[data.Demeaned_evaluation > 0]

Out[29]:
    Instructor    Course  Semester  Year  Evaluation  Demeaned_evaluation
 0      Safadi  MIST4610      Fall  2019           5             1.500000
 1       Aguar  MIST6380    Spring  2018           4             1.250000
 2      Safadi  MIST5730    Summer  2018           3             0.062500
 3    Boudreau  MIST4610      Fall  2018           4             0.500000
 5    Boudreau  MIST4600    Summer  2018           4             0.916667
 6    Boudreau  MIST4610    Summer  2018           5             1.500000
 8      Safadi  MIST6380    Spring  2018           3             0.250000
11      Safadi  MIST6380    Spring  2019           3             0.250000
12      Safadi  MIST6380      Fall  2020           4             1.250000
15       Aguar  MIST5730    Spring  2018           3             0.062500
19    Boudreau  MIST4600      Fall  2017           4             0.916667
20  Srinivasan  MIST6380    Summer  2018           3             0.250000
21  Srinivasan  MIST4610    Spring  2020           5             1.500000
22  Srinivasan  MIST4600      Fall  2019           4             0.916667
23  Srinivasan  MIST4610    Spring  2017           4             0.500000
27  Srinivasan  MIST5730      Fall  2020           5             2.062500
29    Boudreau  MIST5730      Fall  2020           5             2.062500
31  Srinivasan  MIST4600      Fall  2020           5             1.916667
32       Aguar  MIST5730      Fall  2019           3             0.062500
34      Safadi  MIST6380    Spring  2017           4             1.250000
37    Boudreau  MIST4600    Spring  2019           4             0.916667
41       Aguar  MIST4610    Spring  2017           4             0.500000
42      Safadi  MIST4600    Spring  2020           4             0.916667
43    Boudreau  MIST6380      Fall  2019           5             2.250000
44       Aguar  MIST6380    Spring  2019           3             0.250000
45    Boudreau  MIST5730    Spring  2019           4             1.062500
46      Safadi  MIST4600    Summer  2019           4             0.916667
47    Boudreau  MIST5730    Spring  2020           5             2.062500
49    Boudreau  MIST6380      Fall  2017           3             0.250000
52      Safadi  MIST5730      Fall  2017           3             0.062500
53       Aguar  MIST4610    Spring  2019           5             1.500000
57       Aguar  MIST4600    Summer  2018           5             1.916667
60       Aguar  MIST5730    Summer  2018           5             2.062500
61       Aguar  MIST4610    Summer  2020           5             1.500000
65      Safadi  MIST4610    Spring  2020           5             1.500000
66       Aguar  MIST4600    Summer  2017           4             0.916667
68  Srinivasan  MIST6380      Fall  2020           4             1.250000
70       Aguar  MIST4600    Spring  2018           5             1.916667
73      Safadi  MIST5730    Summer  2020           4             1.062500
75    Boudreau  MIST4610      Fall  2020           4             0.500000
77  Srinivasan  MIST6380    Spring  2018           4             1.250000
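
To see why questions 7 and 8 can select different rows, the sketch below repeats the reshaping, de-meaning, and filtering steps on a six-row toy data frame. The toy values and the course labels X and Y are invented for illustration and are not taken from course_eval.csv. It also uses transform('mean') instead of a lambda, which is one alternative way to broadcast each course's mean back onto its rows.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration (not from course_eval.csv).
toy = pd.DataFrame({
    "Course": ["X", "Y", "X", "Y", "Y", "X"],
    "Semester": ["Fall", "Spring", "Fall", "Fall", "Spring", "Spring"],
    "Year": [2019, 2019, 2019, 2020, 2020, 2020],
    "Evaluation": [5, 2, 4, 1, 3, 3],
})

# Question 5 pattern: semesters as rows, years as columns.
wide = toy.groupby(["Semester", "Year"])["Evaluation"].mean().unstack()

# Question 6 pattern: per-course mean aligned to the original rows.
course_mean = toy.groupby("Course")["Evaluation"].transform("mean")
toy["Demeaned_evaluation"] = toy["Evaluation"] - course_mean

# Question 7 pattern: compare every row with the single overall mean (3.0 here).
above_overall = toy[toy["Evaluation"] > toy["Evaluation"].mean()]

# Question 8 pattern: compare every row with the mean of its own course
# (4.0 for X and 2.0 for Y here).
above_course = toy[toy["Evaluation"] > course_mean]

print(wide)
print(above_overall.index.tolist())  # [0, 2]
print(above_course.index.tolist())   # [0, 4]
```

Row 2 (an evaluation of 4 in course X) beats the overall mean but not its course mean, while row 4 (an evaluation of 3 in course Y) does the opposite. Selecting rows with a positive Demeaned_evaluation gives the same result as above_course.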