Activity 5

ISP 121, Winter 2007, Activity 5
ISP 121, Winter, 2007
Section 201 (TTh, 10:10 – 11:40)
Section 202 (TTh, 11:50 – 1:20)
Activity 5: Descriptive Statistics
Answer the questions given here in a Word document. E-mail the document to
me at the end of class.
1. From the QRC data site, download and open the file AgeAtInauguration.xls,
which lists the age at inauguration of every US President to date. We want to
summarize this data in a number of ways. Along the way I will remind you of the
Excel commands to produce the relevant summaries. Remember that you really
don't need to memorize the names of Excel functions. They can always be
accessed through the paste function button.
a. Who was the oldest president at inauguration? Who was the youngest? (Hey!
I always thought John F. Kennedy was the youngest president. Can you figure
out what's going on here?)
b. Calculate the mean (average) age at inauguration. (To calculate the average
of a data series, use the command =AVERAGE(......).)
c. Calculate the median age at inauguration. (To calculate the median of a data
series, use the command =MEDIAN(......).)
d. How does our current president, George W. Bush, compare to the average?
e. How did the previous president, Bill Clinton, compare to the average?
f. Just as we did in d. and e., it is often interesting to know where a particular
data point in a dataset lies compared to the others. (You might be too well
acquainted with this phenomenon in standardized tests.) A useful tool for
calculating the position of a data point in a dataset is the percent rank. This number
tells you approximately what percentage of the data is less than the data point.
The syntax for this command is =PERCENTRANK(dataseries, datavalue). Using
this command, find the percentage of presidents whose age at inauguration was
younger than Bill Clinton's.
g. Find the percentage of presidents who were older than George H. W. Bush
(that is the elder one) when inaugurated.
h. What is the percent rank of the median?
Printed 3/9/2016
i. Recently we had two of the older presidents (Ronald Reagan, the oldest in
history, and George H. W. Bush) but we have also had two of the youngest (John
F. Kennedy and Bill Clinton). Using this data, investigate whether
presidents inaugurated since 1950 are on average older or younger than the
presidents inaugurated before 1950. Briefly explain your methodology.
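For the curious, the idea behind the percent rank in part f can be sketched outside Excel. The Python snippet below is only an illustration under stated assumptions: the ages are made up (they are not the contents of AgeAtInauguration.xls), and the helper name percent_rank is hypothetical. It assumes the value being ranked appears in the data with no ties, in which case the percent rank is the number of values below it divided by one less than the number of data points. Excel may round the result or treat ties and missing values differently.

```python
def percent_rank(data, value):
    """Fraction of the other data points that lie strictly below `value`.

    Assumption: `value` occurs in `data` (no interpolation is attempted)
    and there are at least two data points.
    """
    if value not in data:
        raise ValueError("this sketch only handles values present in the data")
    if len(data) < 2:
        raise ValueError("need at least two data points")
    below = sum(1 for x in data if x < value)
    return below / (len(data) - 1)


# Made-up ages for illustration only, NOT the real inauguration data.
ages = [42, 46, 51, 54, 55, 57, 61, 64, 69]

print(percent_rank(ages, 46))  # 1 of the 8 other values is below 46 -> 0.125
print(percent_rank(ages, 64))  # 7 of the 8 other values are below 64 -> 0.875
```

To estimate the share of data points above a value instead, as in part g, count the values greater than it in the same way.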
2.
a. Can it happen in a dataset that almost every data point is above the average?
Explain why or why not. If it can, make up an example.
b. Can it happen in a dataset that almost every data point is above the median?
Explain why or why not. If it can, make up an example.
3. From the QRC site, download and open the file ChicagoBulls1996-97.xls
which contains the salaries of the Chicago Bulls players at the start of the 1996-97 season.
a. Calculate the mean and median salary and include them in your Word document.
b. Suppose Michael Jordan had been paid 60 million dollars instead of 30
million. What would the mean have been in that situation? What would the
median have been in that situation? (If you used Excel to do part a, all you have to
do is type in 60000000 in place of 30140000 and everything will update
automatically.)
c. Suppose Michael Jordan had been paid 500 million dollars instead of 30
million. What would the mean have been in that situation? What would the
median have been in that situation?
Because of the property demonstrated in b and c, the median is called a
resistant measure: it is not very sensitive to extreme outliers. Generally,
the median is a more realistic measure of the center of a dataset, but it is not
always the most useful. If the distribution of the data is relatively symmetric, then
the mean and the median will be close to each other.
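The resistance of the median can also be seen in a few lines of Python. This is a minimal sketch, not part of the assignment: the five salaries are made up (they are not the contents of ChicagoBulls1996-97.xls), and the helper with_top_salary is a hypothetical name. It replaces the largest salary with bigger and bigger values and prints the mean and median each time.

```python
from statistics import mean, median

# Made-up salaries in dollars for illustration only, NOT the real Bulls data.
salaries = [300_000, 500_000, 1_000_000, 2_000_000, 30_000_000]


def with_top_salary(values, new_top):
    """Return a sorted copy of `values` with its largest entry replaced by `new_top`."""
    changed = sorted(values)
    changed[-1] = new_top
    return changed


for top in (30_000_000, 60_000_000, 500_000_000):
    sample = with_top_salary(salaries, top)
    # The mean moves with the top salary; the median stays where it was.
    print(top, mean(sample), median(sample))
```

In this toy example only one value changes, so the middle value of the sorted list never moves, while the sum (and therefore the mean) grows with the outlier.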
Copied from http://www.nps.gov/archive/yell/tours/livecams/oldfaithful/oldfaithfulphoto.htm.
4. Download and open the file OldFaithful.xls which contains data on the Old
Faithful geyser in Yellowstone National Park and pictured above. When this data
was collected, the geyser erupted about every hour with some consistency,
hence its name. (It is now erupting about every 1.5 hours.) The file contains
data on the length of the eruption and the interval between eruptions.
a. What is the mean interval between eruptions?
b. Give the five number summary (min, first quartile, median, third quartile, max)
for the interval between eruptions.
c. Make a frequency distribution of the interval data by using Excel's histogram
tool. Here is how. Go to Tools->Data Analysis->Histogram and click ok. (If Data
Analysis does not appear on this menu, go to Tools->Add-Ins, select Analysis
ToolPak and click OK. Now if you go back to Tools, Data Analysis should be
showing.) (If this doesn’t work on your machine, then use the Chart Wizard to
create a column or bar chart.) You will get a window that looks like:
[Screenshot: Histogram dialog box]
Fill it in so that it looks like:
[Screenshot: completed Histogram dialog box]
Then click OK. Fix the graph up a bit (delete the legend, add a title), then paste it
into your Word document.
d. Describe the distribution of the intervals between eruptions.
e. Do the standard measures (means, medians, standard deviations, quartiles)
adequately describe this data?
f. What advice would you give visitors to Yellowstone National Park about Old
Faithful based on the data you looked at here?
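The five number summary asked for in part b can be sketched in Python as well. This is only an illustration under stated assumptions: the waiting times are made up (they are not the contents of OldFaithful.xls), the helper name five_number_summary is hypothetical, and the quartiles use the "inclusive" interpolation method from Python's statistics module (Python 3.8 or later), which I believe corresponds to the way Excel's QUARTILE function interpolates. Other software may use slightly different quartile rules.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (min, first quartile, median, third quartile, max) of `data`."""
    # n=4 splits the data into quarters and returns the three cut points.
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, med, q3, max(data)


# Made-up waiting times in minutes for illustration only, NOT the real geyser data.
intervals = [45, 50, 52, 55, 75, 78, 80, 82, 90]

print(five_number_summary(intervals))  # (45, 52.0, 75.0, 80.0, 90)
```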
5. The histogram you created in the previous step can be improved upon.
a. Go back into the Old Faithful spreadsheet and re-select Tools / Data Analysis /
Histogram and click OK. Enter the proper data range in the Input Range box.
For now, leave the Bin Range box blank. Make sure the Chart Output box is
checked, and click OK. To make your chart really look like a histogram, you must
double click on one of the bars on the chart, go to the options tab, and set the
gap width to 0. While you are there, delete the “Frequency” legend.
b. To get control of the bins, you have to set them up in a column. In column D
on your original sheet, type 40 in cell D8, 45 in cell D9, 50 in cell D10, etc., up to
110. (There is an easy way of doing this if you use a formula.) Repeat the
process of creating the histogram: enter the proper data range, check the Chart
Output box, and this time in the Bin Range box enter the cell range of your 40 –
110 values (this should be D8 to D22). Make the histogram, set the gap width to 0,
and delete the legend. Paste this copy of the histogram into your Word
document.
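If you want to see what the bins in part b are doing, here is a minimal Python sketch of the same counting. It rests on assumptions: the waiting times are made up (they are not the contents of OldFaithful.xls), the helper name bin_counts is hypothetical, and each bin value is treated as an upper limit, so a data point is counted in the first bin whose value it does not exceed, with anything above the last bin going into a final "More" slot. That is my understanding of how Excel's Histogram tool counts, so treat it as an approximation rather than a specification.

```python
from bisect import bisect_left

# Bin upper limits 40, 45, ..., 110: the same 15 values typed into D8:D22.
bins = list(range(40, 111, 5))


def bin_counts(data, bins):
    """Count how many values fall at or below each bin limit (and above the previous one).

    The returned list has one extra slot at the end for values above the last bin.
    Assumption: `bins` is sorted in increasing order.
    """
    counts = [0] * (len(bins) + 1)
    for x in data:
        # bisect_left finds the first bin limit that is >= x.
        counts[bisect_left(bins, x)] += 1
    return counts


# Made-up waiting times in minutes for illustration only, NOT the real geyser data.
intervals = [45, 50, 52, 55, 75, 78, 80, 82, 90]

for limit, count in zip(bins + ["More"], bin_counts(intervals, bins)):
    print(limit, count)
```

Changing the step in range(40, 111, 5) shows how wider or narrower bins change the shape of the histogram.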