Statistics Assessed Coursework 1

ECM2709 Statistics – Assessed Coursework 1
(1)
                Observed Reading    Observed Exeter
Minimum               19.00              14.20
1st Quartile          20.90              18.85
Median                21.80              19.90
Mean                  22.03              19.93
3rd Quartile          22.80              21.10
Maximum               27.20              25.00
IQR                    1.90               2.25
Overall, Reading has higher temperatures than Exeter. Exeter has a larger range (10.80 against 8.20),
with outliers at both the minimum and maximum ends. Reading has far more outliers on the upper side,
which suggests that it had more very hot days than Exeter. Exeter also has a larger interquartile range
(2.25 against 1.90), so the middle half of its temperatures is more spread out around the median,
whereas in Reading they are clustered more densely. Reading’s median and both of its quartiles
(1st and 3rd) are higher than Exeter’s. Looking at the histograms, it could be suggested that
Reading’s data has a positive skew (its mean, 22.03, lies above its median, 21.80, and its outliers are
on the upper side), whereas Exeter has a more symmetrical distribution with little skewness (mean
19.93, median 19.90).
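The range and outlier statements can be checked from the table values alone. The following is a minimal sketch, assuming the box plots flag outliers with the common 1.5 × IQR rule (the coursework does not say which rule was used). The names `summary` and `describe` are illustrative only.

```python
# Quartiles and extremes copied from the summary table in (1).
summary = {
    "Reading": {"min": 19.00, "q1": 20.90, "q3": 22.80, "max": 27.20},
    "Exeter": {"min": 14.20, "q1": 18.85, "q3": 21.10, "max": 25.00},
}


def describe(stats, k=1.5):
    """Return range, IQR and the k * IQR fences for one site.

    k = 1.5 is an assumption: it is the usual box-plot convention,
    not something stated in the coursework.
    """
    iqr = stats["q3"] - stats["q1"]
    lower = stats["q1"] - k * iqr
    upper = stats["q3"] + k * iqr
    return {
        "range": stats["max"] - stats["min"],
        "iqr": iqr,
        "lower_fence": lower,
        "upper_fence": upper,
        # True if the extreme value lies beyond the fence.
        "low_outliers": stats["min"] < lower,
        "high_outliers": stats["max"] > upper,
    }


results = {site: describe(stats) for site, stats in summary.items()}
for site, r in results.items():
    print(site, {key: round(v, 3) if isinstance(v, float) else v
                 for key, v in r.items()})
```

Under that assumption, Reading's minimum lies inside its lower fence while its maximum lies beyond the upper fence, and Exeter's minimum and maximum both lie beyond their fences. This agrees with the description of the outliers given above.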
(2)
These scatter plots compare the forecast temperatures with the observed temperatures in both
Reading and Exeter. Reading’s line of best fit is positive, but it levels off as the temperatures
pass the mean, whereas Exeter’s line of best fit is positive throughout the spread of the data. This
indicates that the forecasts were more accurate for the Exeter area than for the Reading area.
Correlation coefficients have been calculated as:
Reading = 0.3844058
Exeter = 0.7783475
Because both correlation coefficients are greater than 0, both data sets show a positive
correlation, with Exeter’s positive correlation being strong and Reading’s being weak.
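For reference, a correlation coefficient of this kind can be computed as below. This is only a sketch: it assumes the Pearson coefficient was used (the coursework does not name the type), and the `forecast` and `observed` lists are made-up values for illustration, not the coursework data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical values for illustration only, not the coursework data.
forecast = [18.0, 19.5, 21.0, 22.5, 24.0]
observed = [18.4, 19.1, 21.6, 22.0, 24.3]
r = pearson_r(forecast, observed)
print(round(r, 3))
```

A value near +1 means the observed temperatures rise almost linearly with the forecasts, and a value near 0 means there is little linear relationship.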
(3a) Mean = µ = 22.03043
Variance = σ² = 3.013129
Standard Deviation = σ = 1.735837
(3b)
This is a histogram of the observed Reading temperatures with normal distribution lines drawn on top
of it. The red line represents the normal distribution line of the data, and the black dashed line
represents the theoretical normal distribution line. The fit is reasonably good, but normal
distributions are by definition symmetrical, whereas we know that the observed temperatures in
Reading are not.
(3c) The sample 5% quantile = 19.800
The sample 95% quantile = 25.625
Using the ‘pnorm’ function, the probability under the fitted Normal distribution of a value falling
below the sample 5% quantile is 0.09940828.
Using the ‘pnorm’ function, the probability of a value falling above the sample 95% quantile is
1 - 0.9808112 = 0.0191888.
If the Normal distribution described the data well, both probabilities would be close to 0.05 and,
because the distribution is symmetrical, close to each other. Instead they are very different, which
shows that the Normal distribution is not a particularly good model for the Reading data. From what I
have calculated, a single value is about five times more likely to fall below the sample 5% quantile
than above the sample 95% quantile under the fitted distribution.
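The two tail probabilities can be reproduced outside R. The sketch below uses Python's standard-library `statistics.NormalDist` as a stand-in for ‘pnorm’, with the mean, standard deviation and sample quantiles reported in (3a) and (3c). The variable names are illustrative.

```python
from statistics import NormalDist

# Fitted parameters from (3a) and sample quantiles from (3c).
reading = NormalDist(mu=22.03043, sigma=1.735837)
q05_sample, q95_sample = 19.800, 25.625

p_below = reading.cdf(q05_sample)      # analogue of pnorm(q, mean, sd)
p_above = 1 - reading.cdf(q95_sample)  # upper-tail probability
ratio = p_below / p_above              # how lopsided the two tails are

print(round(p_below, 4), round(p_above, 4), round(ratio, 1))
```

With these inputs the lower-tail probability comes out near 0.099 and the upper-tail probability near 0.019, a ratio of a little over 5, which is where the "about five times" figure comes from.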
(4a) Mean = µ = 19.93478
Variance = σ² = 3.89548
Standard Deviation = σ = 1.973697
(4b)
As on the previous graph, this is a histogram with the normal distribution line of the data (the red
line) and the theoretical normal distribution line (the black dashed line) plotted on top. The two
lines are very similar and sit almost on top of one another, with only slight variation. It looks
as if the central location of the data line is slightly higher than that of the theoretical line, with
an otherwise identical shape. This suggests that the Exeter data is very well described by the normal
distribution, mainly because the data is symmetrical and has little skewness.
(4c) The sample 5% quantile = 16.665
The sample 95% quantile = 23.090
Using the ‘pnorm’ function, the probability under the fitted Normal distribution of a value falling
below the sample 5% quantile is 0.04879232.
Using the ‘pnorm’ function, the probability of a value falling above the sample 95% quantile is
1 - 0.9450491 = 0.0549509.
The ‘pnorm’ values are very similar in the case of Exeter, and both are close to 0.05. This shows that
the Normal distribution fits quite well because (as stated before) the Normal distribution is
symmetrical, so the probabilities of falling below the 5% quantile or above the 95% quantile should be
the same or very close. In this case, the difference between them is only 0.00615858, which is
negligible.
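The same comparison can be made on the quantile scale by asking where the fitted Normal distribution places its own 5% and 95% quantiles. This is a sketch using Python's standard-library `statistics.NormalDist.inv_cdf`, which plays the role of R's ‘qnorm’. The parameters are those reported in (4a), and the names `sample_q` and `gaps` are illustrative.

```python
from statistics import NormalDist

# Fitted parameters from (4a) and sample quantiles from (4c).
exeter = NormalDist(mu=19.93478, sigma=1.973697)
sample_q = {0.05: 16.665, 0.95: 23.090}

gaps = {}
for p, q_sample in sample_q.items():
    q_model = exeter.inv_cdf(p)    # analogue of qnorm(p, mean, sd)
    gaps[p] = q_sample - q_model   # sample quantile minus fitted quantile
    print(p, round(q_model, 3), round(gaps[p], 3))
```

With these parameters the fitted quantiles are roughly 16.69 and 23.18, each within about 0.1 of the sample quantiles. This is consistent with the conclusion that the Normal distribution describes the Exeter data well.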