Investigation #1: ES25 Students’ Water Consumption

ES 25 Quantitative Thinking
Lab 4: Data Description: Summary Statistics and Histograms
Due: Tuesday, May 1st, before 12 noon (E-mail to your facilitator)
Note: “interpret or explain for this problem” means: write 1-4 complete, precise sentences that explain
the phenomenon in question, at a high school reading level (using non-math lingo), and involving the particular
problem that you are working with. Example: NOT “z-score = 3 since standard deviation = 2 and observation
= 36, mean = 30” but “my average daily water consumption was 36 gal/day, which was three standard deviations
above the average value of 30 gallons per day. For the ES 25 population, the water consumption observations
were an average distance of 2 gal/day away from the mean.” (NOTE: this is an example; the numbers are not
accurate.)
The data files needed for the following exercises are found in Lab4.xls on the course website.
 INSTALL the data analysis toolpak (if you have it) as follows:
o Tools/Add-ins/Check “data analysis toolpak (VBA)”
 IF you missed lab, or if you need more help, you will also want to check out the
tutorials on making histograms in Excel available on the course website.
Investigation #1: ES25 Students’ Water Consumption
1. Summary Statistics
 Generate “summary statistics” for the list of students’ water consumption (measured
in gallons/day)
 Tools/Data Analysis/Descriptive Statistics/Input range= (select data)/Output
range (select destination cell on same worksheet)/Check “summary statistics”
 You should get the following chart (if you do NOT have the Data Analysis Toolpak
on your computer, you can get the values in the chart using the formulas given):
Summary Statistics
Statistic             Value       If you don't have the Data Analysis Toolpak, use:
Mean                  76.93487    AVERAGE(A1:A46)
Standard Error        7.030648    STDEV(A1:A46)/SQRT(COUNT(A1:A46))
Median                65.04       MEDIAN(A1:A46)
Mode                  #N/A        MODE(A1:A46)
Standard Deviation    47.68418    STDEV(A1:A46)
Sample Variance       2273.781    VAR(A1:A46)
Kurtosis              0.572544    KURT(A1:A46)
Skewness              1.100719    SKEW(A1:A46)
Range                 192.2925    MAX(A1:A46)-MIN(A1:A46)
Minimum               24.7075     MIN(A1:A46)
Maximum               217         MAX(A1:A46)
Sum                   3539.004    SUM(A1:A46)
Count                 46          COUNT(A1:A46)
 Interpret what each of the values highlighted in yellow means for this problem
(using complete sentences). You may report your answer in a chart (for example,
replace the 3rd column in the chart above with your “interpretations”) if you like.
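For readers who want to check the spreadsheet values outside Excel, the quantities in the summary table can be sketched in Python. This is a minimal sketch, not part of the lab: the function name `summary_statistics` and the sample list are hypothetical, and the skewness and kurtosis lines use the bias-corrected sample formulas that Excel documents for SKEW and KURT (at least four observations are needed).

```python
import math
import statistics
from collections import Counter


def summary_statistics(values):
    """Return the Descriptive Statistics quantities for a list of numbers."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)           # sample standard deviation, like STDEV
    z = [(x - mean) / s for x in values]   # standardized observations

    # Bias-corrected sample skewness and excess kurtosis (SKEW and KURT formulas).
    skew = n / ((n - 1) * (n - 2)) * sum(t ** 3 for t in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))) * sum(t ** 4 for t in z) \
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))

    # Excel shows #N/A when no value repeats; None plays that role here.
    value, count = Counter(values).most_common(1)[0]
    mode = value if count > 1 else None

    return {
        "Mean": mean,
        "Standard Error": s / math.sqrt(n),
        "Median": statistics.median(values),
        "Mode": mode,
        "Standard Deviation": s,
        "Sample Variance": statistics.variance(values),
        "Kurtosis": kurt,
        "Skewness": skew,
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Count": n,
    }


# Hypothetical gal/day values for illustration only (NOT the Lab4.xls data).
hypothetical_gallons = [30.0, 45.5, 52.0, 65.0, 65.0, 80.25, 110.0, 217.0]
for name, result in summary_statistics(hypothetical_gallons).items():
    print(f"{name:20s} {result}")
```

Replacing the hypothetical list with the 46 values from the worksheet should give numbers comparable to the table above.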
2. Frequency table and Histogram
 Calculate the “frequencies” for the water consumption data, using the following
bins: (0-40], (40-80], (80-120], (120-160], (160-200], (200-240]
 NOTE: for Excel to recognize the bins above, you must type them in as: 40,
80, 120, 160, 200, 240. Excel does not recognize parentheses as number values.
 Frequencies can be calculated using Tools/Data Analysis/Histogram:
o input = water consumption data, bins = bins in the left-hand column below, output =
(click on one cell above where you want the values to start)
 Calculate the “relative frequency” and the “density,” to generate the following table
(replace the one below with your completed table):
Bin Width = 40    Bin Range    Frequency    Relative Frequency    Density
40                (0-40]       14           0.3043                0.0076
80                (40-80]      16           0.3478                0.0087
Sum
 Use the “Chart” to make a “Frequency histogram.” Try to make it look like this:
[Figure: “Histogram: ES 25 Student Water Consumption.” Vertical axis: Frequency (0 to 18). Horizontal axis: Total Daily Water Consumption (gal/day), with bins (0-40], (40-80], (80-120], (120-160], (160-200], (200-240].]
 Use the “Chart” to make a Relative Frequency histogram. Paste below.
 Use the “Chart” to make a Density histogram. Paste below.
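The frequency, relative frequency, and density columns can also be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: the function name `frequency_table` and the sample values are hypothetical, bins are treated as right-closed intervals (lower, upper] as in the bin list above, relative frequency is frequency divided by the number of observations, and density is relative frequency divided by the bin width.

```python
def frequency_table(values, upper_limits, bin_width):
    """Build rows of (bin label, frequency, relative frequency, density).

    Each bin is the right-closed interval (upper - bin_width, upper].
    """
    n = len(values)
    rows = []
    for upper in upper_limits:
        lower = upper - bin_width
        frequency = sum(1 for x in values if lower < x <= upper)
        relative = frequency / n          # share of all observations
        density = relative / bin_width    # share per unit (gal/day) of bin width
        rows.append((f"({lower}-{upper}]", frequency, relative, density))
    return rows


# Hypothetical gal/day values for illustration only (NOT the Lab4.xls data).
hypothetical_gallons = [10, 40, 41, 80, 120, 200.5]
limits = [40, 80, 120, 160, 200, 240]

for label, freq, rel, dens in frequency_table(hypothetical_gallons, limits, 40):
    print(f"{label:10s} {freq:3d} {rel:8.4f} {dens:8.4f}")
```

As a check against the table in the handout, a frequency of 14 out of 46 observations gives 14/46 ≈ 0.3043, and 0.3043/40 ≈ 0.0076. Because density is relative frequency per unit of width, the densities multiplied by the bin width add up to 1.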
3. Investigate effects of bin width:
 Make Relative Frequency histograms for the bin widths below. Format the
histograms like the previous group, for easier comparison. Paste below (resize them
so that you can easily and accurately compare the graphs).
o Bin width = 25
o Bin width = 10
 Which bin width do you think is most appropriate for displaying the water
consumption data? Write a paragraph justifying your choice (think about what
patterns you can see that may be “masked” by larger or smaller bin choices)
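To see how the choice of bin width changes the picture, the bin limits and relative frequencies for any width can be generated programmatically. The sketch below is illustrative only: `bin_upper_limits` and `relative_frequencies` are hypothetical helper names, bins are assumed to start at 0 and to be right-closed, and the data are assumed to be positive (a value of exactly 0 would fall outside the first bin).

```python
import math


def bin_upper_limits(max_value, bin_width):
    """Upper limits of right-closed bins starting at 0 that cover max_value."""
    n_bins = math.ceil(max_value / bin_width)
    return [bin_width * (i + 1) for i in range(n_bins)]


def relative_frequencies(values, bin_width):
    """Pairs of (upper limit, relative frequency) for the given bin width."""
    n = len(values)
    return [
        (upper, sum(1 for x in values if upper - bin_width < x <= upper) / n)
        for upper in bin_upper_limits(max(values), bin_width)
    ]


# Hypothetical gal/day values for illustration only (NOT the Lab4.xls data).
hypothetical_gallons = [30.0, 45.5, 52.0, 65.0, 65.0, 80.25, 110.0, 217.0]

for width in (40, 25, 10):
    print(f"Bin width = {width}")
    for upper, rel in relative_frequencies(hypothetical_gallons, width):
        # A crude text histogram: one '#' per 5 percentage points.
        print(f"  ({upper - width}-{upper}]".ljust(14), "#" * round(rel * 20))
```

With the maximum of 217 gal/day reported in the summary statistics, a width of 40 gives 6 bins, a width of 25 gives 9, and a width of 10 gives 22, which is why narrow bins tend to look ragged and wide bins tend to hide clusters.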
4. Draw conclusions (answer in complete sentences):
 Write a paragraph description of the distribution (think about type, outliers,
clusters, maximums, minimums, see pages 26-27 workbook).
 What proportion of students used greater than or equal to 120 gallons of water
per day? Explain how you got your answer (which histogram did you use,
did you have to add numbers up, did you compare areas, etc.)
 What percentage of students consume less water than the mean (you may
have to calculate this by hand, or assume that values are uniformly
distributed within bins)? What percentage of students consume less water
than the median? Would you say “most students consume less than
average?”
 What is the probability that a randomly selected student in ES 25 would use
less than 50 gallons of water per day? Explain how you got your answer
(which histogram did you use, did you have to add numbers up, did you
compare areas, etc.)
 Pretend that we actually did this experiment for five days, and every
single day a different, randomly selected student reported using less than
50 gal/day. How would you explain the discrepancy between the
probability that you calculated above and the observed phenomenon?
(Clearly, there is no “right answer”; we are looking for logical reasoning.)
 What is the z-score associated with your personal water consumption (report
your water consumption, in gal/day)? If you did not do HW1, choose a value,
report the value (in gal/day), and calculate the associated z-score. Interpret
your answer, for this problem.
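Two of the calculations in this section lend themselves to a short sketch: estimating the share of observations below a cutoff when values are assumed to be uniformly distributed within bins, and computing a z-score. The function names and the bin counts below are hypothetical and chosen only for illustration; the z-score example reuses the numbers from the note at the top of the lab, which are themselves stated to be inaccurate.

```python
def proportion_below(cutoff, upper_limits, frequencies, bin_width):
    """Estimate the share of observations below cutoff from a frequency table.

    Assumes values are spread uniformly inside each (lower, upper] bin, so a
    bin that is cut by the cutoff contributes a proportional part of its count.
    """
    total = sum(frequencies)
    count = 0.0
    for upper, freq in zip(upper_limits, frequencies):
        lower = upper - bin_width
        if cutoff >= upper:
            count += freq                                   # whole bin is below
        elif cutoff > lower:
            count += freq * (cutoff - lower) / bin_width    # part of the bin
    return count / total


def z_score(observation, mean, std_dev):
    """Number of standard deviations the observation lies from the mean."""
    return (observation - mean) / std_dev


# Hypothetical frequency table (NOT the lab data): bins (0-10], (10-20], ...
limits = [10, 20, 30, 40]
counts = [2, 4, 3, 1]
print(proportion_below(15, limits, counts, 10))   # cutoff halfway through bin 2

# Numbers from the example in the note: observation 36, mean 30, std. dev. 2.
print(z_score(36, 30, 2))
```

The uniform-within-bin assumption is only an approximation; counting the raw observations below the cutoff by hand gives the exact proportion.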
Investigation #2: Distribution of Carbon Dioxide for a One-year Period at Mauna Loa
1. Copy the 2004 monthly CO2 data from the “MaunaLoaCO2” spreadsheet into a new
spreadsheet (name it 2004CO2). Paste special/transpose to get the data in a column
rather than a row.
2. Summary Statistics
 Generate “summary statistics” for the distribution of CO2 during the year 2004.
Paste your findings (in chart form) below.
 Interpret the values: mean, median, standard deviation, and range, in terms of this
problem. You may report your answer in the chart above, if you like.
3. Frequency table and Histogram
 Using a bin width of 2, calculate the “frequencies,” “relative frequency” and the
“density,” to generate the following table (paste your completed version, below):
Bin Width = 2
Bin Range    Frequency    Relative Frequency    Density
(371-373]
(373-375]
Sum
 Using the chart tool, make a “relative frequency” histogram (format it like you
did in the water problem, with no space between the bins). Paste below.
4. Investigate effects of bin width:
 Make a new frequency table (like the one above) and relative frequency histogram
for a bin width= 1 ppm. Paste below.
 Which bin width (1 ppm or 2 ppm) is best at revealing patterns of variation in the
data? Justify your choice.
5. Is the distribution changing over time?
 Generate a new frequency table and histogram, for the CO2 distribution in the year
that you were born (your “birth year”). If you were born before 1958, use the year
1960 data.
 Your histogram should be of “relative frequency,” and should use a bin width of 1
ppm. Paste the frequency table, and histogram, below.
6. Draw conclusions (answer in complete sentences):
 In the year 2004, what proportion of months had carbon dioxide
concentrations greater than 379 ppm?
 In your birth year, what proportion of months had carbon dioxide
concentrations greater than 379 ppm?
 Compare the ‘typical’ CO2 observation in 2004 to the ‘typical’ CO2
observation in your birth year (think: should you be using the mean, or the
median?) Explain why the values (from your birth year to 2004) are so
different (hint: see Keeling Curve, page 4 of your workbook).
 Compare the range and standard deviation for the 2004 data and your birth
year data. Offer logical explanations (they don’t need to be “correct”) for the
differences that you observe.
 Compare the shape of the distribution to what you predicted it would look
like (in HW3). Offer a reasonable hypothesis for the observed shape.
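The year-to-year comparisons above can be sketched as follows. This is a minimal illustration, not the lab's method: `proportion_above` and `describe_year` are hypothetical helper names, and both monthly lists are made-up placeholder values, NOT measurements from the “MaunaLoaCO2” spreadsheet.

```python
import statistics


def proportion_above(values, threshold):
    """Share of observations strictly greater than the threshold."""
    return sum(1 for x in values if x > threshold) / len(values)


def describe_year(values):
    """Center and spread measures used in the comparison questions."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "std_dev": statistics.stdev(values),   # sample standard deviation
    }


# Placeholder monthly values in ppm (hypothetical, NOT Mauna Loa measurements).
placeholder_later_year = [376, 377, 378, 379, 380, 381, 380, 379, 377, 376, 377, 378]
placeholder_earlier_year = [340, 341, 342, 343, 344, 344, 343, 341, 339, 339, 340, 341]

for label, year in (("later", placeholder_later_year),
                    ("earlier", placeholder_earlier_year)):
    print(label, proportion_above(year, 379), describe_year(year))
```

Pasting the twelve monthly values for 2004 and for the birth year in place of the placeholder lists gives the quantities needed for the comparison; the interpretation is still up to the reader.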
If you don’t have the data analysis toolpak, you can calculate frequencies as follows:
http://www.meadinkent.co.uk/xlfreq.htm