Practical 2

Module I3 Sessions 6&7
Practical 2: Processing single variables 2
This practical sheet uses both Excel and CAST.
More summary statistics with Excel

Go to the same Excel workbook as the last practical, Summary statistics.xls.
Open the sheet called Rice yields.
Column B contains the rice yield data from the workbook called survey.xls, arranged in
ascending order. Summary statistics are given below the data. They should be familiar
by now.
To reinforce the calculations of the mean deviation and the standard deviation, columns D,
E and F have been started.

Double click or drag the 2nd value (-16.28) in column D to complete that column.
Do the same for columns E and F.

Are the statistics calculated from first principles the same as those given by the Excel
functions? Explain briefly any differences:
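The first-principles calculation can also be sketched outside Excel. The short Python example below is only an illustration: the data values are invented (they are not the rice yields), and the three working lists are an assumption about what columns D, E and F hold (deviations, absolute deviations and squared deviations). It also computes the standard deviation with both divisors, n - 1 and n, because the choice of divisor is one common reason why two calculations of the standard deviation differ slightly.

```python
import math
import statistics

# Hypothetical values for illustration only; these are NOT the rice yields.
values = [28.0, 35.5, 40.2, 44.1, 52.7, 60.3]

n = len(values)
mean = sum(values) / n

# Assumed working columns: deviation, absolute deviation, squared deviation.
deviations = [x - mean for x in values]
abs_deviations = [abs(d) for d in deviations]
squared_deviations = [d * d for d in deviations]

# Mean deviation: the average of the absolute deviations.
mean_deviation = sum(abs_deviations) / n

# Standard deviation with divisor n - 1 (sample) and with divisor n (population).
sd_sample = math.sqrt(sum(squared_deviations) / (n - 1))
sd_population = math.sqrt(sum(squared_deviations) / n)

print("mean:", round(mean, 2))
print("mean deviation:", round(mean_deviation, 2))
print("sd, divisor n-1:", round(sd_sample, 2), "library:", round(statistics.stdev(values), 2))
print("sd, divisor n:  ", round(sd_population, 2), "library:", round(statistics.pstdev(values), 2))
```

The deviations always sum to zero (apart from rounding), which is why the absolute or squared values are used to measure spread.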
The quartile deviation is a third measure of variation. Your next tasks are to calculate its
value, as well as the median, from first principles.
Cells I22 and I23 give the observation numbers of the lower quartile and the median,
using the general formula:
Observation = r * (n+1)/100, where r = 25 for the lower quartile, and 50 for
the median.

Use the same formula to give the observation number of the upper quartile in cell
I24.
The formula and corresponding values for the lower quartile are then given in cells I10 and
I11.

Use the observation number for the upper quartile to give its value in cell I12.

Then give the inter-quartile range, IQR (the difference between the quartiles), in I13.

And give the quartile deviation (half this range) in I14.
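The steps above can be sketched in Python as follows. This is a minimal illustration, not the formulas used in the workbook: the data are invented, the function names are hypothetical, and linear interpolation between neighbouring observations is an assumption about how a fractional observation number is turned into a value.

```python
def observation_number(r, n):
    """Observation number for the r-th percentile: r * (n + 1) / 100."""
    return r * (n + 1) / 100


def value_at_position(sorted_values, position):
    """Value at a 1-based, possibly fractional, observation number.

    Assumption: interpolate linearly between the two neighbouring observations.
    """
    lower_index = int(position)            # whole part of the observation number
    fraction = position - lower_index      # fractional part
    if lower_index < 1:
        return sorted_values[0]
    if lower_index >= len(sorted_values):
        return sorted_values[-1]
    below = sorted_values[lower_index - 1]
    above = sorted_values[lower_index]
    return below + fraction * (above - below)


# Hypothetical data in ascending order; NOT the rice yields.
values = sorted([12, 15, 17, 20, 22, 25, 29, 31, 34, 38, 41])
n = len(values)

lower_quartile = value_at_position(values, observation_number(25, n))
median = value_at_position(values, observation_number(50, n))
upper_quartile = value_at_position(values, observation_number(75, n))

iqr = upper_quartile - lower_quartile
quartile_deviation = iqr / 2

print(lower_quartile, median, upper_quartile)  # 17 25 34
print(iqr, quartile_deviation)                 # 17 8.5
```

With 11 observations the observation numbers are whole (3, 6 and 9). With 36 observations, the lower quartile would be observation 9.25, so a quarter of the gap between the 9th and 10th values is added to the 9th value.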
You now have 3 measures of spread: the quartile deviation, the mean deviation and the
standard deviation. Their respective sizes are in the following table.
SADC Course in Statistics

Complete the table:
Summary statistic | Value | Reason it is in this order
Quartile deviation |      | I expect it to be less than the standard deviation because……………………
Mean deviation | 9.6 | I expect it to be less than the standard deviation because……………………..
Standard deviation | 11.9 | I expect it to be greater than the quartile deviation because of the 70-95-100 rule of thumb
Quartiles in Excel
First use Excel to get the MIN and MAX (Hint: use the arrow next to Σ or use Excel’s
functions). Excel also has a function called QUARTILE. It is used in cell J10 as an
alternative way of giving the lower quartile value. The function there is
=QUARTILE(B5:B40,1), where 1 represents the lower quartile and 3 represents the upper
(0 and 4 return the minimum and maximum). Using column J, give the corresponding values
from Excel for the quartiles, and hence for the IQR and the quartile deviation. (For
further explanation refer to fx>QUARTILE>Help on this function.)
Compare the values from Excel and from first principles for the quartiles. Does the
difference concern you? Explain why, or why not. (Hint: See CAST page 2.2.2.)
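Different software uses slightly different rules for placing the quartiles, and the effect can be seen with a small example. The Python sketch below uses invented data and the standard library function statistics.quantiles, which offers two rules. The "exclusive" method places quartile q at observation number q(n+1)/4, the same rule as the first-principles formula in this practical. The "inclusive" method places it at 1 + q(n-1)/4; this is the rule Excel's QUARTILE is generally documented as using, but treat that correspondence as an assumption to check against the Excel help.

```python
import statistics

# Hypothetical data for illustration only.
values = [1, 2, 3, 4, 5, 6, 7, 8]

# (n + 1) rule: observation numbers 2.25, 4.5 and 6.75.
quartiles_n_plus_1 = statistics.quantiles(values, n=4, method="exclusive")

# (n - 1) rule: observation numbers 2.75, 4.5 and 6.25.
quartiles_n_minus_1 = statistics.quantiles(values, n=4, method="inclusive")

print(quartiles_n_plus_1)   # [2.25, 4.5, 6.75]
print(quartiles_n_minus_1)  # [2.75, 4.5, 6.25]
```

The medians agree, and the two sets of quartiles differ by less than the gap between neighbouring observations. With larger data sets the difference is usually smaller still.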
Warnings about means and standard deviations

Read CAST page 3.3.2 that shows possible problems using the mean and
standard deviation in an uncritical way. Hence complete the following table:
Data set | Problems with mean and s.d. | What to do
Symmetrical | None | Use them
Clusters | |
Outlier | |
Skew | |
Read CAST page 3.3.3 on the possible problems with using the mean, and particularly the
s.d., when there are outliers. Hence complete the following table:
Data | Mean | Mean (day in year) | Standard deviation
As given | 126.2 | 5th May | 18.7 days (about 2.5 weeks)
Low outlier (50) | 124.8 | 3rd May | 21.1 days (about 3 weeks)
No planting rain (0) | | |
High outlier (220) | | |
High outlier (365) | | |
Missing value (999) | | |
What conclusion is written in CAST following this exercise?
More practice with outliers
Go back to the Excel workbook Summary statistics.xls. Open the sheet called Samaru.
The data in column B are the same as you examined in CAST on page 3.3.3.

Try inserting each value from CAST (0, 50, 220, 365, 999) in turn in cell B61. Check
that the mean and sd in cells B64 and B65 give the same results as CAST.

Now extend the mean and sd calculations along that row for all the other columns.

Then try adding each of the odd values from the table below (200, 0, 365, 999) in turn
to the end of the season data (Column I) and complete the table below.
Data | Mean | Mean (day in year) | Standard deviation
As given | 292.9 | 18 Oct | 8.9 days (just over 1 week)
Low outlier (200) | 291.3 | 16 Oct | 15.1 days (just over 2 weeks)
No ending date (0) | | |
High outlier (365) | | |
Missing value (999) | | |
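The same experiment can be sketched in Python. The day numbers below are invented (they are not the Samaru data), so the results will not match the table; the point is only to show how one extra odd value changes the mean a little and the standard deviation a lot. The function name summarise is hypothetical.

```python
import statistics

# Hypothetical end-of-season day numbers; NOT the Samaru data.
days = [285, 288, 290, 292, 293, 295, 297, 300, 301, 304]


def summarise(data):
    """Return the mean and the sample standard deviation (divisor n - 1)."""
    return statistics.mean(data), statistics.stdev(data)


base_mean, base_sd = summarise(days)
print(f"as given : mean {base_mean:6.1f}, sd {base_sd:6.1f}")

# Add one odd value at a time, as in the exercise, and recompute.
for extra in (200, 0, 365, 999):
    new_mean, new_sd = summarise(days + [extra])
    print(f"extra {extra:>3}: mean {new_mean:6.1f}, sd {new_sd:6.1f}")
```

In this invented example a single value of 999 moves the mean by about 2 months and makes the standard deviation many times larger than it was.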
More on the coefficient of variation

Take away the extra observations to use the actual data.

Give the coefficient of variation in cell B66 for the start of the rains.
What is its value? _____________

What is the value of the cv for the end of the rains? ___________
Can you explain why the cv is not a useful summary for this sort of data? No/Maybe/Yes
If “Maybe” or “Yes”, then please try to explain.
Here is an exercise to help. The end of the season (Column called End) was the first day
after 1st September that there was no water in the soil. We started counting days from the
1st January. But for the end of the rains we could equally well define 1 September (day 245)
as day 1.
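Day numbers and dates can be converted with a short calculation. The Python sketch below is an illustration only: the function names are hypothetical, and it assumes a 366-day calendar (the year 2000 is used as the reference year), because that is the calendar in which 1 September is day 245, as stated above.

```python
from datetime import date, timedelta


def day_number_to_date(day_number, year=2000):
    """Convert a day number (1 = 1st January) to a date.

    Assumption: a leap year is used, so that 1 September is day 245.
    """
    return date(year, 1, 1) + timedelta(days=day_number - 1)


def shift_origin(day_number, new_day_one):
    """Renumber a day so that the day numbered new_day_one becomes day 1."""
    return day_number - (new_day_one - 1)


print(day_number_to_date(245))   # 2000-09-01
print(shift_origin(245, 245))    # 1
print(shift_origin(260, 245))    # 16
```

Renumbering with 1 September as day 1 is the same as subtracting 244 from every value.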

Create a new column, J, that is the same as End but with 244 subtracted from each
value.

Work out the mean and standard deviation of these new values. How do they
compare? Complete the table below.
Summary statistic | Original data (Column I) | New data (Column J) | Comment
Mean | Day 293 (18 October) | Day ….. (          ) | Same date
Standard deviation | 8.9 days | |
cv | | |
Now are you able to complete the comment above? If not, then try the exercise again,
making 1st October (day 275) into the new day 1.
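The exercise can also be tried in Python. This is a minimal sketch with invented day numbers (not the Samaru data) and a hypothetical function cv_percent that gives the standard deviation as a percentage of the mean. It applies both changes of starting day mentioned above.

```python
import statistics

# Hypothetical end-of-season day numbers, counted from 1st January; NOT the Samaru data.
end_days = [285, 288, 290, 292, 293, 295, 297, 300, 301, 304]


def cv_percent(data):
    """Coefficient of variation: standard deviation as a percentage of the mean."""
    return 100 * statistics.stdev(data) / statistics.mean(data)


from_september = [d - 244 for d in end_days]  # 1 September (day 245) becomes day 1
from_october = [d - 274 for d in end_days]    # 1 October (day 275) becomes day 1

for label, data in (("from 1 Jan", end_days),
                    ("from 1 Sep", from_september),
                    ("from 1 Oct", from_october)):
    print(f"{label}: mean {statistics.mean(data):6.1f}, "
          f"sd {statistics.stdev(data):5.2f}, cv {cv_percent(data):5.1f}%")
```

The dates themselves have not changed, only the day from which they are counted, so a summary that changes with the choice of starting day says nothing about the data.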
Using SSC-Stat
Make the active cell somewhere in the data set. Then use SSCstat => Analysis =>
Descriptive statistics. Complete the dialogue as shown below, selecting Start1 and End
as the Variables, to give all the summary statistics used in this practical for those
2 columns.
This shows it is easy to give any summary statistics you would like. What is more
important is to provide summary values that are appropriate and that you and the readers
can interpret.
From the work in CAST and in this practical, you should now be able to interpret the mean
and standard deviation and know when they are appropriate summaries. Similarly you should
be able to interpret the 5-number summary. If (as here) the coefficient of variation
cannot usefully be interpreted, it should not be given.
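For reference, the 5-number summary (minimum, lower quartile, median, upper quartile, maximum) can be sketched in Python as below. The data are invented, the function name five_number_summary is hypothetical, and the quartiles use the (n+1) rule via the "exclusive" method of statistics.quantiles; other software may place the quartiles slightly differently.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    lower_q, median, upper_q = statistics.quantiles(data, n=4, method="exclusive")
    return min(data), lower_q, median, upper_q, max(data)


# Hypothetical data for illustration only.
values = [12, 15, 17, 20, 22, 25, 29, 31, 34, 38, 41]

print(five_number_summary(values))  # (12, 17.0, 25.0, 34.0, 41)
```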