4. (1.6)

(a) Plot the marginal dot diagrams for all the variables.

[Figure 1 image: one marginal dot plot per variable, with counts on the vertical axis.]
Figure 1: Dot plots of the variables in the air pollution dataset.

(b) Construct the x̄, Sn, and R arrays, and interpret the entries in R.

        Wind   Solar     CO     NO    NO2     O3     HC
x̄       7.50   73.86   4.55   2.19  10.05   9.40   3.10
Table 1: The sample means of variables.

         Wind    Solar     CO     NO    NO2     O3     HC
Wind     2.50    -2.78  -0.38  -0.46  -0.59  -2.23   0.17
Solar   -2.78   300.52   3.91  -1.39   6.76  30.79   0.62
CO      -0.38     3.91   1.52   0.67   2.31   2.82   0.14
NO      -0.46    -1.39   0.67   1.18   1.09  -0.81   0.18
NO2     -0.59     6.76   2.31   1.09  11.36   3.13   1.04
O3      -2.23    30.79   2.82  -0.81   3.13  30.98   0.59
HC       0.17     0.62   0.14   0.18   1.04   0.59   0.48
Table 2: The sample variance-covariance matrix of the variables.

         Wind   Solar     CO     NO    NO2     O3     HC
Wind     1.00   -0.10  -0.19  -0.27  -0.11  -0.25   0.16
Solar   -0.10    1.00   0.18  -0.07   0.12   0.32   0.05
CO      -0.19    0.18   1.00   0.50   0.56   0.41   0.17
NO      -0.27   -0.07   0.50   1.00   0.30  -0.13   0.23
NO2     -0.11    0.12   0.56   0.30   1.00   0.17   0.45
O3      -0.25    0.32   0.41  -0.13   0.17   1.00   0.15
HC       0.16    0.05   0.17   0.23   0.45   0.15   1.00
Table 3: The sample correlation matrix of the variables.

The majority of the variables have only weak linear associations, with correlations close to zero.
The pollutants are mostly positively correlated with each other (the exception is NO and O3, at -0.13). Wind is negatively correlated with every pollutant except HC (0.16), while solar radiation is positively correlated with every pollutant except NO (-0.07).

[Figure 2 image: star plots in four panels, titled "Windy and Sunny", "Windy and Not Sunny", "Not Windy and Sunny", and "Not Windy and Not Sunny"; each star is labeled with its observation number.]
Figure 2: Star plots of the air pollution variables.

To investigate whether wind and sun have an effect on air pollution, we can split the wind and solar radiation variables at their medians and then make the star plots in Figure 2. From the stars, we can see that solar radiation has some effect on air pollution. When it is sunny, most pollutants are at a relatively low level. Within each group, the patterns are quite different. So there is a fair amount of variation in each of the four groups: each group contains days with very little pollution as well as days with quite a bit of air pollution.

5. (1.17) In this dataset the first three variables (100m, 200m, 400m) are measured in seconds, while the last four (800m, 1500m, 3000m, Marathon) are measured in minutes.

        100m    200m    400m   800m  1500m  3000m  Marathon
x̄      11.62   23.64   53.41   2.08   4.33   9.45    173.25
Table 4: The sample means for the track record.

            100m    200m    400m   800m  1500m  3000m  Marathon
100m        0.20    0.48    1.01   0.04   0.11   0.28      9.44
200m        0.48    1.23    2.55   0.09   0.26   0.65     23.18
400m        1.01    2.55    7.17   0.26   0.70   1.72     57.49
800m        0.04    0.09    0.26   0.01   0.03   0.08      2.57
1500m       0.11    0.26    0.70   0.03   0.11   0.27      8.88
3000m       0.28    0.65    1.72   0.08   0.27   0.68     22.57
Marathon    9.44   23.18   57.49   2.57   8.88  22.57    925.96
Table 5: The sample variance covariance matrix for the track record variables.
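The x̄, Sn, and R arrays in both exercises follow the same recipe: column means, centered cross-products divided by n, and then r_ik = s_ik / sqrt(s_ii s_kk). The following is a minimal Python sketch of that recipe, not the code used for the tables above; the function names `summary_arrays` and `cov_to_corr` are illustrative, and the toy data matrix is hypothetical. The divisor actually used for Tables 2 and 5 is not stated here, but R does not depend on that choice.

```python
import numpy as np

def cov_to_corr(S):
    """Scale a covariance matrix to a correlation matrix:
    r_ik = s_ik / sqrt(s_ii * s_kk)."""
    S = np.asarray(S, dtype=float)
    sd = np.sqrt(np.diag(S))          # sample standard deviations
    return S / np.outer(sd, sd)

def summary_arrays(X):
    """Return (xbar, Sn, R) for an n-by-p data matrix X (rows are observations)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    xbar = X.mean(axis=0)
    centered = X - xbar
    Sn = centered.T @ centered / n    # divisor n; use n - 1 for the unbiased S
    return xbar, Sn, cov_to_corr(Sn)

# The 200m, 400m and Marathon block copied from Table 5.
S_sub = np.array([
    [ 1.23,  2.55,  23.18],
    [ 2.55,  7.17,  57.49],
    [23.18, 57.49, 925.96],
])
R_sub = cov_to_corr(S_sub)
print(np.round(R_sub, 2))

# Hypothetical toy data: the second column is twice the first, so r = 1.
xbar_toy, Sn_toy, R_toy = summary_arrays([[1, 2], [3, 6], [5, 10]])
```

The off-diagonal entries of `R_sub` round to 0.86, 0.69 and 0.71, which agree with the corresponding entries of the track record correlation matrix. Entries that involve very small variances (for example the 800m variance, printed as 0.01) cannot be reproduced this way, because the two-decimal rounding in Table 5 is too coarse for them.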

            100m   200m   400m   800m  1500m  3000m  Marathon
100m        1.00   0.95   0.83   0.73   0.73   0.74      0.69
200m        0.95   1.00   0.86   0.72   0.70   0.71      0.69
400m        0.83   0.86   1.00   0.90   0.79   0.78      0.71
800m        0.73   0.72   0.90   1.00   0.90   0.86      0.78
1500m       0.73   0.70   0.79   0.90   1.00   0.97      0.88
3000m       0.74   0.71   0.78   0.86   0.97   1.00      0.90
Marathon    0.69   0.69   0.71   0.78   0.88   0.90      1.00
Table 6: The sample correlation matrix for the track record variables.

All seven variables are strongly positively correlated, and the correlations tend to be larger when the distances are close to each other. For example, the correlation between 100m and 200m is 0.95, while the correlation between 100m and marathon is 0.69. This makes sense, since runners tend to be good at races of similar length.

(1.18)

        100m   200m   400m   800m  1500m  3000m  Marathon
x̄       8.62   8.48   7.51   6.44   5.81   5.33      4.15
Table 7: The sample mean track records measured in meters/second.

            100m   200m   400m   800m  1500m  3000m  Marathon
100m        0.11   0.12   0.10   0.08   0.10   0.10      0.13
200m        0.12   0.15   0.13   0.09   0.11   0.12      0.16
400m        0.10   0.13   0.14   0.11   0.12   0.12      0.15
800m        0.08   0.09   0.11   0.11   0.12   0.12      0.15
1500m       0.10   0.11   0.12   0.12   0.16   0.16      0.20
3000m       0.10   0.12   0.12   0.12   0.16   0.17      0.21
Marathon    0.13   0.16   0.15   0.15   0.20   0.21      0.32
Table 8: The sample variance covariance matrix of track records measured in meters/second.

            100m   200m   400m   800m  1500m  3000m  Marathon
100m        1.00   0.95   0.84   0.73   0.74   0.75      0.72
200m        0.95   1.00   0.86   0.73   0.72   0.72      0.71
400m        0.84   0.86   1.00   0.90   0.80   0.78      0.71
800m        0.73   0.73   0.90   1.00   0.92   0.87      0.79
1500m       0.74   0.72   0.80   0.92   1.00   0.96      0.86
3000m       0.75   0.72   0.78   0.87   0.96   1.00      0.89
Marathon    0.72   0.71   0.71   0.79   0.86   0.89      1.00
Table 9: The sample correlation matrix of track records measured in meters/second.

The results are similar to those which I obtained in Exercise 1.17. All the variables have strong, positive linear relationships with each other, and races that are closer together in distance have a stronger relationship.
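The differences in correlation are slightly smaller. A change of units alone leaves a correlation unchanged, so the small shifts more plausibly come from the nonlinear conversion from time to speed than from all the races now being measured in the same units.

Footnote 1: I chose to use a pairwise scatterplot to represent the data. Other choices are acceptable.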
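For Exercise 1.18, the conversion from record times to meters/second can be sketched as below. This is a minimal sketch under stated assumptions, not the code behind Tables 7 to 9: the name `to_speed` is illustrative, and the marathon distance is assumed to be 42,195 m. Only the mean times from the 1.17 table of sample means are used as input, since the raw records are not reproduced here.

```python
import numpy as np

# Race distances in meters; the marathon is assumed to be 42,195 m.
DISTANCES_M = np.array([100, 200, 400, 800, 1500, 3000, 42195], dtype=float)

# Seconds per recorded unit: the first three events are in seconds,
# the last four are in minutes.
SECONDS_PER_UNIT = np.array([1, 1, 1, 60, 60, 60, 60], dtype=float)

def to_speed(records):
    """Convert record times (one column per event, in the order above)
    to speeds in meters/second. Accepts a single row or an n-by-7 array."""
    records = np.asarray(records, dtype=float)
    return DISTANCES_M / (records * SECONDS_PER_UNIT)

# Mean times from the sample means of the track record (Exercise 1.17).
mean_times = np.array([11.62, 23.64, 53.41, 2.08, 4.33, 9.45, 173.25])
speed_at_mean_time = to_speed(mean_times)
print(np.round(speed_at_mean_time, 2))
```

The speed at the mean time is not the same as the mean speed: the 100m value here is about 8.61 m/s, slightly below the 8.62 reported in Table 7, and the same ordering holds for every event. That is expected, because speed is a convex function of time, so averaging the converted records gives a value at least as large as converting the average. To reproduce Tables 7 to 9, `to_speed` would be applied to each country's records before computing the summary arrays.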