ECM2709 Statistics – Assessed Coursework 1 (1) Observed Reading Observed Exeter Minimum 19.00 14.20 1st Quartile 20.90 18.85 Median 21.80 19.90 Mean 22.03 19.93 3rd Quartile 22.80 21.10 Maximum 27.20 25.00 IQR 1.90 2.25 Overall, Reading has higher temperatures than Exeter. Exeter has a slightly bigger range, with outliers at both the minimum and maximum values. Reading has far more outliers on the upper side, which suggests that it had more very hot days than Exeter. Exeter has a larger interquartile range, which shows that the temperatures are spread more evenly around the median, compared to in Reading where they are clustered more densely. Reading’s median, and all of it’s significant points (25th & 75th quartiles) are higher than Exeter’s. Looking at the histograms, it could be suggested that Exeter has a positive skew to it’s data, whereas Reading has more of a symmetrical distribution, with little skewness. (2) These scatter plots compare the forecasted temperatures and the observed temperatures, in both Reading and Exeter. Reading has a positive best-line-of-fit, but then it settles off as the temperatures pass the mean. Whereas Exeter has a positive best-line-of-fit throughout the spread of the data. This indicates that the forecasters were more accurate in the Exeter area, than the Reading area. Correlation coefficients have been calculated as: Reading = 0.3844058 Exeter = 0.7783475 Because both the correlation coefficients are >0 then this shows that both data sets have a positive correlation, with Exeter’s positive correlation being strong, and Reading’s being weak. (3a) Mean = µ = 22.03043 Variance = σ2 = 3.013129 Standard Deviation = σ = 1.735837 (3b) This is a histogram that has been made, with the normal distribution lines put on top of it. The red line represents the normal distribution line of the data, and the black dashed line represents the theoretical normal distribution line. The fit is reasonably good, but by definition, normal distributions are symmetrically distributed, where we know that the observed temperatures in Reading are not. (3c) The sample 5% quantile = 19.800 The sample 95% quantile = 25.625 Using the ‘pnorm’ function, the probability of the function falling below the sample 5% quantile is 0.09940828. Using the ‘pnorm’ function, the probability of the function falling above the 95% quantile is 1 0.9808112 = 0.0191888 These values are very different, but being based on a Normal Distribution (which is symmetrical), they should be the same or very close. This therefore shows that the Normal Distribution isn’t particularly great for mapping the data. From what I have calculated, it is more likely that a single data piece will fall in the first 5% compared to in the last 5%, by about 5-fold. (4a) Mean = µ = 19.93478 Variance = σ2 = 3.89548 Standard Deviation = σ = 1.973697 (4b) As on the previous graph, this is a histogram with a normal distribution line of the data (the red line) and the theoretical normal distribution line (the black dashed line) plotted on top. It is noted that both lines are very similar, and seem to sit on top of one another, which only slight variation. It looks as if the central location of the data line is slightly higher than the theoretical line, with identical shape. This proves that the Exeter data is very well mapped with the normal distribution, which is mainly because the data is symmetrical and has little skewness. (4c) The sample 5% quantile = 16.665 The sample 95% quantile = 23.090 Using the ‘pnorm’ function, the probability of the function falling below the sample 5% quantile is 0.04879232 Using the ‘pnorm’ function, the probability of the function falling above the 95% quantile is 1 0.9450491 = 0.0549509 The ‘pnorm’ values are very similar in the case of Exeter. This shows that the Normal distribution fits quite well, because (as stated before), the Normal Distribution is symmetrical, so the probabilities of falling in the first 5% or the last 5% should be the same of very close. In this case, the difference between them is only 0.00615858 which is negligible.