Uploaded by ohmray09

Pandas Data Analysis: Course Evaluation

In [1]: import pandas as pd
In [2]: data = pd.read_csv('data/course_eval.csv')
data.head()
Out[2]:
   Instructor    Course  Semester  Year  Evaluation
0      Safadi  MIST4610      Fall  2019           5
1       Aguar  MIST6380    Spring  2018           4
2      Safadi  MIST5730    Summer  2018           3
3    Boudreau  MIST4610      Fall  2018           4
4      Safadi  MIST4610    Summer  2017           3
In [4]: # iterating over a DataFrame yields its column names
for x in data:
    print(x)
    #break
Instructor
Course
Semester
Year
Evaluation
The data frame contains MIS instructor course evaluations for several courses over different semesters and years.
NOTE: This is simulated data. The course evaluation numbers are not real!
1. How many courses did each instructor teach?
2. What is the average course evaluation per instructor?
3. Report the minimum, median, and maximum evaluation for Safadi.
4. Report the average evaluation per semester and year.
5. Format the previous result as a data frame in which rows are semesters and columns are years.
6. Transform Evaluation by subtracting the average evaluation per course (de-mean the evaluation per course).
7. Filter the data to keep entries in which Evaluation is larger than the overall average evaluation.
8. Filter the data to keep entries in which Evaluation is larger than the average evaluation per course.
In [3]: test_1 = data.groupby('Instructor').Course.count()
test_1
Out[3]: Instructor
Aguar         22
Boudreau      17
Safadi        23
Srinivasan    16
Name: Course, dtype: int64
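Note that count() tallies rows, so an instructor who taught the same course in several semesters is counted once per offering. If the question is read as "how many distinct courses", nunique() may be the better fit. The sketch below contrasts the two on a small hypothetical frame (the toy rows are made up for illustration and are not taken from course_eval.csv).

```python
import pandas as pd

# Hypothetical toy rows with the same columns as the notebook's data
toy = pd.DataFrame({
    "Instructor": ["Safadi", "Safadi", "Safadi", "Aguar", "Aguar"],
    "Course": ["MIST4610", "MIST4610", "MIST5730", "MIST6380", "MIST6380"],
    "Semester": ["Fall", "Summer", "Summer", "Spring", "Fall"],
    "Year": [2019, 2017, 2018, 2018, 2019],
    "Evaluation": [5, 3, 3, 4, 2],
})

# count(): number of non-null Course entries per instructor (one per offering)
offerings = toy.groupby("Instructor").Course.count()

# nunique(): number of distinct course codes per instructor
distinct_courses = toy.groupby("Instructor").Course.nunique()

print(offerings)
print(distinct_courses)
```

Which of the two answers question 1 depends on whether repeat offerings should count separately.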
In [4]: test_2 = data.groupby('Instructor').Evaluation.mean()
test_2
Out[4]: Instructor
Aguar         2.909091
Boudreau      3.411765
Safadi        2.739130
Srinivasan    3.375000
Name: Evaluation, dtype: float64
In [7]: # summary statistics per instructor, then select Safadi's row
test_3 = data.groupby('Instructor').Evaluation.aggregate(['min', 'median', 'max']).loc['Safadi']
# or data.loc[data.Instructor == 'Safadi', 'Evaluation'].aggregate(['min', 'median', 'max'])
test_3
Out[7]:
min       1
median    3
max       5
Name: Safadi, dtype: int64
In [11]: test_4 = data.groupby(['Semester', 'Year']).Evaluation.mean()
test_4
Out[11]:
Semester  Year
Fall      2017    2.666667
          2018    2.200000
          2019    3.333333
          2020    3.500000
Spring    2017    2.833333
          2018    3.250000
          2019    3.000000
          2020    3.571429
Summer    2017    2.166667
          2018    3.625000
          2019    2.500000
          2020    3.000000
Name: Evaluation, dtype: float64
In [12]: test_5 = test_4.unstack()
test_5
Out[12]:
Year          2017   2018      2019      2020
Semester
Fall      2.666667  2.200  3.333333  3.500000
Spring    2.833333  3.250  3.000000  3.571429
Summer    2.166667  3.625  2.500000  3.000000
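The groupby-then-unstack step can also be written as a single pivot_table call. The following is a minimal sketch on a hypothetical toy frame (the rows are invented for illustration); it is an alternative formulation, not the approach used in the notebook.

```python
import pandas as pd

# Hypothetical toy rows; only the columns needed for the reshape
toy = pd.DataFrame({
    "Semester": ["Fall", "Fall", "Spring", "Spring", "Fall"],
    "Year": [2017, 2017, 2017, 2018, 2018],
    "Evaluation": [2, 4, 3, 5, 1],
})

# Two-step version: mean per (Semester, Year), then move Year into the columns
via_unstack = toy.groupby(["Semester", "Year"]).Evaluation.mean().unstack()

# One-step version: pivot_table aggregates and reshapes together
via_pivot = toy.pivot_table(
    index="Semester", columns="Year", values="Evaluation", aggfunc="mean"
)

print(via_unstack)
print(via_pivot)
```

Both produce a frame with semesters as rows and years as columns; any semester/year combination with no data would show up as NaN in either version.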
In [16]: # subtract each course's mean evaluation from its rows
data['Demeaned_evaluation'] = data.groupby('Course')['Evaluation'].transform(lambda x: x - x.mean())
data.head()
Out[16]:
   Instructor    Course  Semester  Year  Evaluation  Demeaned_evaluation
0      Safadi  MIST4610      Fall  2019           5               1.5000
1       Aguar  MIST6380    Spring  2018           4               1.2500
2      Safadi  MIST5730    Summer  2018           3               0.0625
3    Boudreau  MIST4610      Fall  2018           4               0.5000
4      Safadi  MIST4610    Summer  2017           3              -0.5000
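A quick sanity check on a de-meaning step is that the new column averages to zero within each course. The sketch below shows this on a hypothetical toy frame (invented rows), using transform("mean") to broadcast each course's mean back onto its rows; this is one possible way to write the step, assuming the same Course and Evaluation column names.

```python
import pandas as pd

# Hypothetical toy rows; two courses with two evaluations each
toy = pd.DataFrame({
    "Course": ["MIST4610", "MIST4610", "MIST5730", "MIST5730"],
    "Evaluation": [5, 3, 4, 1],
})

# transform("mean") returns a Series aligned with toy, holding each row's course mean
course_mean = toy.groupby("Course").Evaluation.transform("mean")
toy["Demeaned_evaluation"] = toy.Evaluation - course_mean

# After de-meaning, the per-course mean of the new column should be (numerically) zero
check = toy.groupby("Course").Demeaned_evaluation.mean()

print(toy)
print(check)
```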
In [19]: # overall mean evaluation
data.Evaluation.mean()
Out[19]: 3.0641025641025643
In [18]: test_7 = data[data.Evaluation > data.Evaluation.mean()]
test_7.head()
Out[18]:
   Instructor    Course  Semester  Year  Evaluation  Demeaned_evaluation
0      Safadi  MIST4610      Fall  2019           5             1.500000
1       Aguar  MIST6380    Spring  2018           4             1.250000
3    Boudreau  MIST4610      Fall  2018           4             0.500000
5    Boudreau  MIST4600    Summer  2018           4             0.916667
6    Boudreau  MIST4610    Summer  2018           5             1.500000
In [22]: # evaluation by course
data.groupby('Course').Evaluation.mean()
Out[22]: Course
MIST4600    3.083333
MIST4610    3.500000
MIST5730    2.937500
MIST6380    2.750000
Name: Evaluation, dtype: float64
In [28]: # boolean mask: True where a row's evaluation exceeds its course's mean
better_than_average = data.groupby('Course').Evaluation.transform(lambda x: x > x.mean())
test_8 = data[better_than_average]
test_8.head()
Out[28]:
   Instructor    Course  Semester  Year  Evaluation  Demeaned_evaluation
0      Safadi  MIST4610      Fall  2019           5             1.500000
1       Aguar  MIST6380    Spring  2018           4             1.250000
2      Safadi  MIST5730    Summer  2018           3             0.062500
3    Boudreau  MIST4610      Fall  2018           4             0.500000
5    Boudreau  MIST4600    Summer  2018           4             0.916667
In [29]: # or, given we already de-meaned the data
# select based on positive value of demeaned_evaluation
data[data.Demeaned_evaluation>0]
Out[29]:
    Instructor    Course  Semester  Year  Evaluation  Demeaned_evaluation
 0      Safadi  MIST4610      Fall  2019           5             1.500000
 1       Aguar  MIST6380    Spring  2018           4             1.250000
 2      Safadi  MIST5730    Summer  2018           3             0.062500
 3    Boudreau  MIST4610      Fall  2018           4             0.500000
 5    Boudreau  MIST4600    Summer  2018           4             0.916667
 6    Boudreau  MIST4610    Summer  2018           5             1.500000
 8      Safadi  MIST6380    Spring  2018           3             0.250000
11      Safadi  MIST6380    Spring  2019           3             0.250000
12      Safadi  MIST6380      Fall  2020           4             1.250000
15       Aguar  MIST5730    Spring  2018           3             0.062500
19    Boudreau  MIST4600      Fall  2017           4             0.916667
20  Srinivasan  MIST6380    Summer  2018           3             0.250000
21  Srinivasan  MIST4610    Spring  2020           5             1.500000
22  Srinivasan  MIST4600      Fall  2019           4             0.916667
23  Srinivasan  MIST4610    Spring  2017           4             0.500000
27  Srinivasan  MIST5730      Fall  2020           5             2.062500
29    Boudreau  MIST5730      Fall  2020           5             2.062500
31  Srinivasan  MIST4600      Fall  2020           5             1.916667
32       Aguar  MIST5730      Fall  2019           3             0.062500
34      Safadi  MIST6380    Spring  2017           4             1.250000
37    Boudreau  MIST4600    Spring  2019           4             0.916667
41       Aguar  MIST4610    Spring  2017           4             0.500000
42      Safadi  MIST4600    Spring  2020           4             0.916667
43    Boudreau  MIST6380      Fall  2019           5             2.250000
44       Aguar  MIST6380    Spring  2019           3             0.250000
45    Boudreau  MIST5730    Spring  2019           4             1.062500
46      Safadi  MIST4600    Summer  2019           4             0.916667
47    Boudreau  MIST5730    Spring  2020           5             2.062500
49    Boudreau  MIST6380      Fall  2017           3             0.250000
52      Safadi  MIST5730      Fall  2017           3             0.062500
53       Aguar  MIST4610    Spring  2019           5             1.500000
57       Aguar  MIST4600    Summer  2018           5             1.916667
60       Aguar  MIST5730    Summer  2018           5             2.062500
61       Aguar  MIST4610    Summer  2020           5             1.500000
65      Safadi  MIST4610    Spring  2020           5             1.500000
66       Aguar  MIST4600    Summer  2017           4             0.916667
68  Srinivasan  MIST6380      Fall  2020           4             1.250000
70       Aguar  MIST4600    Spring  2018           5             1.916667
73      Safadi  MIST5730    Summer  2020           4             1.062500
75    Boudreau  MIST4610      Fall  2020           4             0.500000
77  Srinivasan  MIST6380    Spring  2018           4             1.250000
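Filtering on a positive de-meaned value and comparing each evaluation against its course mean should select the same rows. The sketch below checks that on a hypothetical toy frame (invented rows). For the comparison it uses transform("mean") rather than a boolean-returning lambda, which is a slightly different formulation from the notebook's cell.

```python
import pandas as pd

# Hypothetical toy rows; two courses with a few evaluations each
toy = pd.DataFrame({
    "Course": ["MIST4610", "MIST4610", "MIST5730", "MIST5730", "MIST5730"],
    "Evaluation": [5, 3, 4, 1, 4],
})

# Per-row course mean, aligned with toy's index
course_mean = toy.groupby("Course").Evaluation.transform("mean")
toy["Demeaned_evaluation"] = toy.Evaluation - course_mean

# Mask 1: evaluation strictly above its course's mean
mask_vs_mean = toy.Evaluation > course_mean

# Mask 2: de-meaned evaluation strictly positive
mask_demeaned = toy.Demeaned_evaluation > 0

# The two masks should agree row by row
same_rows = bool((mask_vs_mean == mask_demeaned).all())

print(toy[mask_demeaned])
print(same_rows)
```

Rows that sit exactly at the course mean are excluded by both masks, since both comparisons are strict.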