4. (1.6)
(a) Plot the marginal dot diagrams for all the variables.
[Marginal dot diagrams, one panel per variable (wind, solar radiation, CO, NO, NO2, O3, HC), each with a count axis.]
Figure 1: Dot plots of the variables in the air pollution dataset.
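As a rough illustration of how a marginal dot diagram is built, the sketch below computes the stacking height of each dot for one variable. The function name `dot_positions`, the rounding rule, and the sample readings are assumptions made for illustration; they are not taken from the air pollution data or from the software used to draw Figure 1.

```python
from collections import Counter

def dot_positions(values, decimals=1):
    """Return (value, height) pairs for a stacked dot diagram.

    Values are rounded to `decimals` places so that near-ties stack on
    top of each other; the k-th occurrence of a value gets height k.
    """
    seen = Counter()
    points = []
    for v in values:
        key = round(v, decimals)
        seen[key] += 1              # how many dots already sit at this value
        points.append((key, seen[key]))
    return points

# Hypothetical readings, NOT taken from the air pollution dataset.
sample = [8, 7, 7, 10, 6, 8, 8]
points = dot_positions(sample)
print(points)
```

Plotting the pairs as a scatter (value on the horizontal axis, height on the count axis) gives one panel of the figure; repeating this per variable gives the full set of marginal diagrams.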
(b) Construct the x̄, Sn, and R arrays, and interpret the entries in R.
             Wind   Solar      CO      NO     NO2      O3      HC
x̄            7.50   73.86    4.55    2.19   10.05    9.40    3.10
Table 1: The sample means of variables.
             Wind   Solar      CO      NO     NO2      O3      HC
Wind         2.50   -2.78   -0.38   -0.46   -0.59   -2.23    0.17
Solar       -2.78  300.52    3.91   -1.39    6.76   30.79    0.62
CO          -0.38    3.91    1.52    0.67    2.31    2.82    0.14
NO          -0.46   -1.39    0.67    1.18    1.09   -0.81    0.18
NO2         -0.59    6.76    2.31    1.09   11.36    3.13    1.04
O3          -2.23   30.79    2.82   -0.81    3.13   30.98    0.59
HC           0.17    0.62    0.14    0.18    1.04    0.59    0.48
Table 2: The sample variance-covariance matrix of the variables.
             Wind   Solar      CO      NO     NO2      O3      HC
Wind         1.00   -0.10   -0.19   -0.27   -0.11   -0.25    0.16
Solar       -0.10    1.00    0.18   -0.07    0.12    0.32    0.05
CO          -0.19    0.18    1.00    0.50    0.56    0.41    0.17
NO          -0.27   -0.07    0.50    1.00    0.30   -0.13    0.23
NO2         -0.11    0.12    0.56    0.30    1.00    0.17    0.45
O3          -0.25    0.32    0.41   -0.13    0.17    1.00    0.15
HC           0.16    0.05    0.17    0.23    0.45    0.15    1.00
Table 3: The sample correlation matrix of the variables.
Most pairs of variables have only weak linear associations, with correlations
close to zero; the largest correlation is 0.56, between CO and NO2. The pollutants
are mostly positively correlated with each other (the exception is NO and O3, at
-0.13). Wind is negatively correlated with every pollutant except HC (0.16), while
solar radiation is positively correlated with every pollutant except NO (-0.07).
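The entries of R follow from the covariance matrix through r_ik = s_ik / sqrt(s_ii s_kk). The sketch below applies that formula to the rounded entries of Table 2. The helper name `cov_to_corr` is an illustrative choice, and because the inputs are already rounded to two decimals, the output can differ from Table 3 in the second decimal place. The divisor used for the covariance (n for Sn, or n - 1) cancels in this ratio, so it does not affect R.

```python
import numpy as np

names = ["Wind", "Solar", "CO", "NO", "NO2", "O3", "HC"]

# Covariance entries copied from Table 2 (already rounded to two decimals).
S = np.array([
    [ 2.50,  -2.78, -0.38, -0.46, -0.59, -2.23, 0.17],
    [-2.78, 300.52,  3.91, -1.39,  6.76, 30.79, 0.62],
    [-0.38,   3.91,  1.52,  0.67,  2.31,  2.82, 0.14],
    [-0.46,  -1.39,  0.67,  1.18,  1.09, -0.81, 0.18],
    [-0.59,   6.76,  2.31,  1.09, 11.36,  3.13, 1.04],
    [-2.23,  30.79,  2.82, -0.81,  3.13, 30.98, 0.59],
    [ 0.17,   0.62,  0.14,  0.18,  1.04,  0.59, 0.48],
])

def cov_to_corr(S):
    """Convert a covariance matrix to a correlation matrix.

    Each entry is divided by the product of the two standard deviations:
    r_ik = s_ik / sqrt(s_ii * s_kk).
    """
    d = np.sqrt(np.diag(S))      # standard deviations
    return S / np.outer(d, d)

R = cov_to_corr(S)
print(names)
print(np.round(R, 2))
```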
[Star plots, one star per observation (numbered 1 to 42), in four panels: Windy and Sunny; Windy and Not Sunny; Not Windy and Sunny; Not Windy and Not Sunny.]
Figure 2: Star plots of the air pollution variables.
To investigate whether wind and sun affect air pollution, we can split the
wind and solar radiation variables at their medians and then draw the star
plots in Figure 2. From the stars, we can see that solar radiation has some
effect on air pollution: when it is sunny, most pollutants are at a relatively
low level. Within each group, the patterns are quite different, so there is a
fair amount of variation in each of the four groups: each group contains days
with very little pollution as well as days with quite a bit of air pollution.
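The median split described above can be sketched as follows. The function name `median_split_groups`, the convention that a value strictly above the median counts as "Windy" or "Sunny", and the four sample days are all assumptions for illustration; the original analysis may have handled ties at the median differently.

```python
import numpy as np

def median_split_groups(wind, solar):
    """Label each observation by which side of the two medians it falls on."""
    wind = np.asarray(wind, dtype=float)
    solar = np.asarray(solar, dtype=float)
    # Assumption: strictly above the median counts as "Windy" / "Sunny".
    windy = wind > np.median(wind)
    sunny = solar > np.median(solar)
    labels = []
    for w, s in zip(windy, sunny):
        wind_part = "Windy" if w else "Not Windy"
        sun_part = "Sunny" if s else "Not Sunny"
        labels.append(wind_part + " and " + sun_part)
    return labels

# Hypothetical days, NOT taken from the air pollution dataset.
wind = [5, 8, 10, 7]
solar = [90, 60, 80, 70]
labels = median_split_groups(wind, solar)
print(labels)
```

Each label then selects the panel in which that observation's star is drawn.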
5. (1.17)
In this dataset the first three events (100m, 200m, 400m) are measured in
seconds, while the last four (800m, 1500m, 3000m, Marathon) are measured in
minutes.
          100m (s)  200m (s)  400m (s)  800m (min)  1500m (min)  3000m (min)  Marathon (min)
x̄            11.62     23.64     53.41        2.08         4.33         9.45          173.25
Table 4: The sample means for the track record.
              100m     200m     400m     800m    1500m    3000m  Marathon
100m          0.20     0.48     1.01     0.04     0.11     0.28      9.44
200m          0.48     1.23     2.55     0.09     0.26     0.65     23.18
400m          1.01     2.55     7.17     0.26     0.70     1.72     57.49
800m          0.04     0.09     0.26     0.01     0.03     0.08      2.57
1500m         0.11     0.26     0.70     0.03     0.11     0.27      8.88
3000m         0.28     0.65     1.72     0.08     0.27     0.68     22.57
Marathon      9.44    23.18    57.49     2.57     8.88    22.57    925.96
Table 5: The sample variance-covariance matrix for the track record variables.
              100m     200m     400m     800m    1500m    3000m  Marathon
100m          1.00     0.95     0.83     0.73     0.73     0.74      0.69
200m          0.95     1.00     0.86     0.72     0.70     0.71      0.69
400m          0.83     0.86     1.00     0.90     0.79     0.78      0.71
800m          0.73     0.72     0.90     1.00     0.90     0.86      0.78
1500m         0.73     0.70     0.79     0.90     1.00     0.97      0.88
3000m         0.74     0.71     0.78     0.86     0.97     1.00      0.90
Marathon      0.69     0.69     0.71     0.78     0.88     0.90      1.00
Table 6: The sample correlation matrix for the track record variables.
All seven variables are strongly positively correlated, and the correlations
tend to be larger when the distances are close to each other. For example, the
correlation between 100m and 200m is 0.95, while the correlation between 100m
and marathon is 0.69. This makes sense, since runners tend to be good at
races of similar length.
(1.18)
              100m     200m     400m     800m    1500m    3000m  Marathon
x̄             8.62     8.48     7.51     6.44     5.81     5.33      4.15
Table 7: The sample mean track records measured in meters/second.
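The conversion behind these speeds is distance divided by time in seconds. The sketch below applies it to the mean times of Table 4, assuming the standard marathon distance of 42,195 m (the source does not state the distance used); the helper name `speed_mps` is hypothetical. Table 7 presumably averages the speeds of the individual records, so the speed at the mean time computed here is close to, but not the same as, the Table 7 entry: the mean of d/t is generally not d divided by the mean of t.

```python
MARATHON_M = 42195  # assumption: standard marathon distance in meters

def speed_mps(distance_m, time, unit):
    """Convert a race time to an average speed in meters per second."""
    if unit == "s":
        seconds = time
    elif unit == "min":
        seconds = time * 60
    else:
        raise ValueError("unit must be 's' or 'min'")
    return distance_m / seconds

# (distance in meters, mean time from Table 4, unit of that time)
mean_times = {
    "100m": (100, 11.62, "s"),
    "200m": (200, 23.64, "s"),
    "400m": (400, 53.41, "s"),
    "800m": (800, 2.08, "min"),
    "1500m": (1500, 4.33, "min"),
    "3000m": (3000, 9.45, "min"),
    "Marathon": (MARATHON_M, 173.25, "min"),
}

for event, (dist, t, unit) in mean_times.items():
    print(event, round(speed_mps(dist, t, unit), 2))
```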
              100m     200m     400m     800m    1500m    3000m  Marathon
100m          0.11     0.12     0.10     0.08     0.10     0.10      0.13
200m          0.12     0.15     0.13     0.09     0.11     0.12      0.16
400m          0.10     0.13     0.14     0.11     0.12     0.12      0.15
800m          0.08     0.09     0.11     0.11     0.12     0.12      0.15
1500m         0.10     0.11     0.12     0.12     0.16     0.16      0.20
3000m         0.10     0.12     0.12     0.12     0.16     0.17      0.21
Marathon      0.13     0.16     0.15     0.15     0.20     0.21      0.32
Table 8: The sample variance-covariance matrix of track records measured in
meters/second.
              100m     200m     400m     800m    1500m    3000m  Marathon
100m          1.00     0.95     0.84     0.73     0.74     0.75      0.72
200m          0.95     1.00     0.86     0.73     0.72     0.72      0.71
400m          0.84     0.86     1.00     0.90     0.80     0.78      0.71
800m          0.73     0.73     0.90     1.00     0.92     0.87      0.79
1500m         0.74     0.72     0.80     0.92     1.00     0.96      0.86
3000m         0.75     0.72     0.78     0.87     0.96     1.00      0.89
Marathon      0.72     0.71     0.71     0.79     0.86     0.89      1.00
Table 9: The sample correlation matrix of track records measured in meters/second.
The results are similar to those which I obtained in Exercise 1.17. All have
strong, positive linear relationships with each other, and races that are closer
together in distance have a stronger relationship. The differences in correlation
are slightly smaller. Since a change of units alone leaves correlations unchanged,
this is most likely due to the nonlinear conversion of times to speeds rather
than to the common units.
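A small check of how unit changes interact with correlation: rescaling one variable (for example, minutes to seconds) leaves the sample correlation unchanged, while converting times to speeds is a reciprocal, nonlinear transformation and can shift it slightly. The times below are hypothetical values chosen for illustration, not records from the dataset, and `corr` is just a thin wrapper around `numpy.corrcoef`.

```python
import numpy as np

def corr(a, b):
    """Sample correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

# Hypothetical times, NOT taken from the track record dataset.
t_100 = np.array([11.0, 11.4, 11.9, 12.5, 13.0])   # seconds
t_800 = np.array([2.00, 2.03, 2.15, 2.20, 2.40])   # minutes

r_times = corr(t_100, t_800)                  # mixed units, as in Exercise 1.17
r_rescaled = corr(t_100, t_800 * 60)          # both in seconds: same correlation
r_speeds = corr(100 / t_100, 800 / (t_800 * 60))  # meters/second: may differ

print(r_times, r_rescaled, r_speeds)
```

Comparing `r_times` with `r_rescaled` shows the invariance under a change of units; comparing either with `r_speeds` shows the effect of the time-to-speed conversion.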
I chose to use a pairwise scatterplot to represent the data. Other choices are acceptable.