ES 25 Quantitative Thinking
Lab 4: Data Description: Summary Statistics and Histograms
Due: Tuesday, May 1st, before 12 noon (e-mail to your facilitator)

Note: “interpret or explain for this problem” means: write 1-4 complete, precise sentences that explain the phenomenon in question, at a high school reading level (using non-math lingo), and in terms of the particular problem that you are working with. Example: NOT “z-score = 3 since standard deviation = 2 and observation = 36, mean = 30” but “My average daily water consumption was 36 gal/day, which was three standard deviations above the average value of 30 gallons per day. For the ES 25 population, the water consumption observations were an average distance of 2 gal/day away from the mean.” (NOTE: this is an example; the numbers are not accurate.)

The data files needed for the following exercises are found in Lab4.xls on the course website.

INSTALL the data analysis toolpak (if you have it) as follows:
o Tools/Add-ins/Check “data analysis toolpak (VBA)”

IF you missed lab, or if you need more help, you will also want to check out the tutorials on making histograms in Excel, available on the course website.

Investigation #1: ES 25 Students’ Water Consumption

1.
Summary Statistics

Generate “summary statistics” for the list of students’ water consumption (measured in gallons/day):
o Tools/Data Analysis/Descriptive Statistics/Input range = (select data)/Output range (select destination cell on same worksheet)/Check “summary statistics”

You should get the following chart (if you do NOT have the Data Analysis Toolpak on your computer, you can get the values in the chart using the formulas given):

Summary Statistics     Value        If you don't have the Data Analysis Toolpak, use:
Mean                   76.93487     AVERAGE(A1:A46)
Standard Error         7.030648     STDEV(A1:A46)/SQRT(COUNT(A1:A46))
Median                 65.04        MEDIAN(A1:A46)
Mode                   #N/A         MODE(A1:A46)
Standard Deviation     47.68418     STDEV(A1:A46)
Sample Variance        2273.781     VAR(A1:A46)
Kurtosis               0.572544     KURT(A1:A46)
Skewness               1.100719     SKEW(A1:A46)
Range                  192.2925     MAX(A1:A46)-MIN(A1:A46)
Minimum                24.7075      MIN(A1:A46)
Maximum                217          MAX(A1:A46)
Sum                    3539.004     SUM(A1:A46)
Count                  46           COUNT(A1:A46)

Interpret what each of the values highlighted in yellow means for this problem (using complete sentences). You may report your answer in a chart (for example, replace the 3rd column in the chart above with your “interpretations”) if you like.

2. Frequency table and Histogram

Calculate the “frequencies” for the water consumption data, using the following bins: (0-40], (40-80], (80-120], (120-160], (160-200], (200-240]

NOTE: for Excel to recognize the bins above, you must type them in as 40, 80, 120, 160, 200, 240. Excel does not recognize parentheses as number values.
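The upper-limit convention in the NOTE can be sketched outside Excel. The following is a minimal illustration in Python, not part of the lab itself: the function names `bin_index` and `count_frequencies` and the sample values are hypothetical and are not taken from Lab4.xls. Each value is assigned to the first bin whose upper limit is greater than or equal to it, so 40 lands in (0-40] while 40.5 lands in (40-80].

```python
from bisect import bisect_left

# Upper limits as typed into Excel for the bins (0-40], (40-80], ..., (200-240].
UPPER_LIMITS = [40, 80, 120, 160, 200, 240]


def bin_index(value, upper_limits=UPPER_LIMITS):
    """Return the position of the right-closed bin that contains value.

    bisect_left finds the first upper limit that is >= value, so a value
    equal to an upper limit stays in that bin rather than moving to the next.
    """
    return bisect_left(upper_limits, value)


def count_frequencies(values, upper_limits=UPPER_LIMITS):
    """Count the values in each bin; values above the last limit are skipped here."""
    counts = [0] * len(upper_limits)
    for v in values:
        i = bin_index(v, upper_limits)
        if i < len(upper_limits):
            counts[i] += 1
    return counts


# Hypothetical observations (gal/day), chosen only to show the boundary cases.
sample = [24.7, 40, 40.5, 80, 119.9, 217]
print(count_frequencies(sample))
```

The two boundary values (40 and 80) are the ones worth checking by hand against your own Excel output, since they show whether a bin includes its upper limit.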
Frequencies can be calculated using Tools/Data Analysis/Histogram:
o input = water consumption data, bins = bins in the left-hand column below, output = click on one cell above where you want the values to start

Calculate the “relative frequency” and the “density” to generate the following table (replace the one below with your completed table):

Bin Width = 40

Bin    Range      Frequency    Relative Frequency    Density
40     (0-40]     14           0.3043                0.0076
80     (40-80]    16           0.3478                0.0087
Sum

Use the “Chart” to make a “Frequency histogram.” Try to make it look like this:

[Figure: “Histogram: ES 25 Student Water Consumption.” Vertical axis: Frequency (0 to 18). Horizontal axis: Total Daily Water Consumption (gal/day), with bins (0-40], (40-80], (80-120], (120-160], (160-200], (200-240].]

Use the “Chart” to make a Relative Frequency histogram. Paste below.

Use the “Chart” to make a Density histogram. Paste below.

3. Investigate effects of bin width:

Make Relative Frequency histograms for the bin widths below. Format the histograms like the previous group, for easier comparison. Paste below (resize them so that you can easily and accurately compare the graphs).
o Bin width = 25
o Bin width = 10

Which bin width do you think is most appropriate for displaying the water consumption data? Write a paragraph justifying your choice (think about what patterns you can see that may be “masked” by larger or smaller bin choices).

4. Draw conclusions (answer in complete sentences):

o Write a paragraph description of the distribution (think about type, outliers, clusters, maximums, minimums; see pages 26-27 of the workbook).
o What proportion of students used greater than or equal to 120 gallons of water per day? Explain how you got your answer (which histogram did you use, did you have to add numbers up, did you compare areas, etc.).
o What percentage of students consume less water than the mean (you may have to calculate this by hand, or assume that values are uniformly distributed within bins)?
o What percentage of students consume less water than the median?
o Would you say “most students consume less than average?”
o What is the probability that a randomly selected student in ES 25 would use less than 50 gallons of water per day? Explain how you got your answer (which histogram did you use, did you have to add numbers up, did you compare areas, etc.).
o Pretend that we actually did this experiment for five days, and every single day a different, randomly selected student reported using less than 50 gal/day. How would you explain the discrepancy between the probability that you calculated above and the observed phenomenon? (Clearly, there is no “right answer”; we are looking for logical reasoning.)
o What is the z-score associated with your personal water consumption (report your water consumption, in gal/day)? If you did not do HW1, choose a value, report the value (in gal/day), and calculate the associated z-score. Interpret your answer, for this problem.

Investigation #2: Distribution of Carbon Dioxide for a One-year Period at Mauna Loa

1. Copy the 2004 monthly CO2 data from the “MaunaLoaCO2” spreadsheet into a new spreadsheet (name it 2004CO2). Use Paste special/transpose to get the data in a column rather than a row.

2. Summary Statistics

Generate “summary statistics” for the distribution of CO2 during the year 2004. Paste your findings (in chart form) below.

Interpret the values: mean, median, standard deviation, and range, in terms of this problem. You may report your answer in the chart above, if you like.

3. Frequency table and Histogram

Using a bin width of 2, calculate the “frequencies,” “relative frequency,” and the “density” to generate the following table (paste your completed version below):

Bin Width = 2

Bin Range    Frequency    Relative Frequency    Density
(371-373]
(373-375]
Sum

Using the chart tool, make a “relative frequency” histogram (format it like you did in the water problem, with no space between the bins). Paste below.

4.
Investigate effects of bin width:

Make a new frequency table (like the one above) and relative frequency histogram for a bin width = 1 ppm. Paste below.

Which bin width (1 ppm or 2 ppm) is best at revealing patterns of variation in the data? Justify your choice.

5. Is the distribution changing over time?

Generate a new frequency table and histogram for the CO2 distribution in the year that you were born (your “birth year”). If you were born before 1958, use the year 1960 data. Your histogram should be of “relative frequency,” and should use a bin width of 1 ppm. Paste the frequency table and histogram below.

Draw conclusions (answer in complete sentences):

o In the year 2004, what proportion of months had carbon dioxide concentrations greater than 379 ppm?
o In your birth year, what proportion of months had carbon dioxide concentrations greater than 379 ppm?
o Compare the ‘typical’ CO2 observation in 2004 to the ‘typical’ CO2 observation in your birth year (think: should you be using the mean, or the median?). Explain why the values (from your birth year to 2004) are so different (hint: see Keeling Curve, page 4 of your workbook).
o Compare the range and standard deviation for the 2004 data and your birth year data. Offer logical explanations (they don’t need to be “correct”) for the differences that you observe.
o Compare the shape of the distribution to what you predicted it would look like (in HW3). Offer a reasonable hypothesis for the observed shape.

If you don’t have the data analysis toolpak, you can calculate frequencies as follows: http://www.meadinkent.co.uk/xlfreq.htm
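Once the frequencies are counted, the relative frequency and density columns of a frequency table are plain arithmetic. The sketch below assumes Python and uses hypothetical function names; it is an illustration, not a required part of the lab. It uses only numbers printed in this handout for the water consumption data: the two frequencies shown in the table (14 and 16), the count of 46 students, and the bin width of 40 gal/day.

```python
def relative_frequency(frequency, total_count):
    """Fraction of all observations that fall in one bin."""
    return frequency / total_count


def density(frequency, total_count, bin_width):
    """Relative frequency per unit of bin width (here, per gal/day)."""
    return relative_frequency(frequency, total_count) / bin_width


TOTAL_COUNT = 46  # "Count" in the summary statistics chart
BIN_WIDTH = 40    # gal/day

# Only the two rows printed in the handout's table are used here.
for label, freq in [("(0-40]", 14), ("(40-80]", 16)]:
    rel = relative_frequency(freq, TOTAL_COUNT)
    dens = density(freq, TOTAL_COUNT, BIN_WIDTH)
    print(f"{label:8s} {freq:3d} {rel:.4f} {dens:.4f}")
```

Because density is relative frequency divided by bin width, each bar's area in a density histogram (density times width) equals that bin's relative frequency, and the areas over all bins add up to 1. This is what makes density histograms with different bin widths comparable by area.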