Biometry (Biol4090) - Fall 2015
Homework #4
Student name: ______KEY__________

NOTE: Do not turn this in. Homeworks will not be graded.
Homework in preparation for quiz #6, on October 15th.
Use SPSS exercises to start developing your computer skills.

1) Explain the following terms using the words provided in parentheses (+0.25 each):

- Parametric statistics (normality) = Techniques built upon the assumption that the observations (data) come from a normal distribution, which make inferences about the specific parameters of that distribution.

- Non-parametric statistics (distribution-free) = Distribution-free techniques that do not rely on any assumptions concerning the distribution of the observations (data). Two examples of distribution-free statistics are randomization tests, which compare the observed pattern against the patterns expected for shuffled data, and non-parametric statistical tests, which use ranked data to compare medians rather than means.

- P-P plot (proportions) = A two-dimensional plot of the cumulative probabilities (proportions) of two different distributions. Usually applied to compare an observed distribution against a theoretical expectation (e.g., one-sample K-S test), or two observed distributions against each other (e.g., two-sample K-S test). P-P plots are used to visually assess the similarity between the two distributions. When the theoretical distribution is normal, data points that all fall on the diagonal of the plot indicate that the variable is normally distributed; points that depart from the diagonal show deviations from normality.

- Shapiro-Wilk test (normality) = A statistical test developed to determine whether a given frequency distribution originated from a normally distributed population with the same parameters as the sample (mean, S.D.). The null hypothesis is that the sample came from a normal distribution. A significant result suggests that the sample is not normally distributed.

2) Highlight the difference between a 1-sample and a 2-sample Kolmogorov-Smirnov test by filling in the blanks below.
With either: “1-sample K-S test”, “2-sample K-S test” or “both” (each is 0.10 points):

- compares observed data distribution against theoretical distribution with same mean and S.D.: 1-sample K-S test
- compares two observed data distributions against each other: 2-sample K-S test
- quantifies the location (central tendency) and spread (shape) of continuous distributions: both
- null hypothesis for this test states that the two data distributions are the same (all data come from the same biological population): 2-sample K-S test
- null hypothesis for this test states that the observed data and the theoretical distribution are the same (the observations come from the same theoretical biological population): 1-sample K-S test

3) Open the file BIOL4090_Hw4_data.xls with SPSS and use these observations, drawn from three random variable distributions (all derived from theoretical distributions with a mean = 10 and a variance = 10), for the following exercise. Make sure the variables are “numeric” and the measure is “scale”.

Create a frequency table for these three datasets of 100 data points each and use this information to fill in the table below (+0.05 each entry). Note: because these are random samples from theoretical distributions, the parameter estimates (x-bar, S.D.) will vary from the real parameters (µ, σ).

DATASET          MEAN    STDEV   MEDIAN   5th PERCENTILE   95th PERCENTILE
Distribution_1   9.724   2.943    9.644   5.208            15.799
Distribution_2   9.720   3.000   10.000   5.000            14.950
Distribution_3   9.705   3.033    9.762   4.937            15.167

Use SPSS to calculate the skewness and kurtosis of each distribution and to create a histogram with a superimposed normal distribution. Note: you can make multiple calculations at once by dragging several distributions into the statistics “box”.
Paste the information below (+0.25 for each distribution):

NOTE: this is the SPSS output

Distribution_1:

Skewness: 0.625 (S.E. of skewness = 0.241)
Interpret this skewness: Looks very symmetrical, but it has positive skewness, suggesting that the right tail is longer and the mass of the distribution is concentrated on the left of the figure.

Kurtosis: 0.843 (S.E. of kurtosis = 0.478)
Interpret this kurtosis: Distribution is leptokurtic, meaning that it has a taller and narrower peak around the mean and heavier tails (more observations in the tails) than expected for a normal distribution, as indicated by the positive kurtosis. Note, however, that the kurtosis is not significant (the 95% C.I. overlaps “0”).

Paste Histogram, with superimposed normal curve, below:

Briefly explain: discuss whether this distribution looks like a normal distribution, on the basis of the skewness / kurtosis and the shape of the histogram. Explain why / why not:
Looks like a normal distribution. Although the skewness is less than the “rule of thumb” threshold (1), it does seem to be significant, since skewness +/- 2 S.E.s ranges from 0.143 to 1.107 (does not overlap 0). The kurtosis is not very pronounced either: it is less than the “rule of thumb” threshold (1), and it does not seem to be significant, since kurtosis +/- 2 S.E.s ranges from -0.113 to 1.799 (overlaps 0).

Distribution_2:

Skewness: 0.180 (S.E. of skewness = 0.241)
Interpret this skewness: Looks very symmetrical, but it has positive skewness, suggesting that the right tail is longer and the mass of the distribution is concentrated on the left of the figure.
Kurtosis: 0.362 (S.E. of kurtosis = 0.478)
Interpret this kurtosis: Distribution is leptokurtic, meaning that it has a taller and narrower peak right on the mean and heavier tails (more observations in the tails) than expected for a normal distribution, as indicated by the positive kurtosis. Note, however, that the kurtosis is not significant (the 95% C.I. overlaps “0”).

Paste Histogram, with superimposed normal curve, below:

Briefly explain: discuss whether this distribution looks like a normal distribution, on the basis of the skewness / kurtosis and the shape of the histogram. Explain why / why not:
______________________________________________________________________________
However, the skewness is less than the “rule of thumb” threshold (1), and it does not seem to be significant, since skewness +/- 2 S.E.s ranges from -0.302 to 0.662 (overlaps 0). The kurtosis is not very pronounced: it is less than the “rule of thumb” threshold (1), and it does not seem to be significant, since kurtosis +/- 2 S.E.s ranges from -0.594 to 1.318 (overlaps 0).

Distribution_3:

Skewness: -0.082 (S.E. of skewness = 0.241)
Interpret this skewness: Looks very symmetrical, but it has negative skewness, suggesting that the left tail is longer and the mass of the distribution is concentrated on the right of the figure.

Kurtosis: -0.095 (S.E. of kurtosis = 0.478)
Interpret this kurtosis: Distribution is platykurtic, meaning that it has a flatter, “wider” peak around the mean (fewer observations right on the mean) and thinner tails (fewer extreme observations) than expected for a normal distribution, as indicated by the negative kurtosis. Note, however, that the kurtosis is not significant (the 95% C.I. overlaps “0”).
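The significance notes above rest on a rule of thumb: skewness or kurtosis is treated as significant when the interval "statistic +/- 2 S.E.s" does not overlap 0. A minimal Python sketch of that check follows, using the Distribution_3 values reported above. It is only an illustration and is not part of the SPSS workflow; the function name `two_se_interval` is hypothetical.

```python
def two_se_interval(statistic, standard_error):
    """Return (lower, upper, overlaps_zero) for statistic +/- 2 S.E.s.

    This is the rule-of-thumb interval used in this homework,
    not an exact confidence interval.
    """
    lower = statistic - 2 * standard_error
    upper = statistic + 2 * standard_error
    overlaps_zero = lower <= 0 <= upper
    return round(lower, 3), round(upper, 3), overlaps_zero


# Distribution_3 values taken from the SPSS output quoted above
skew_low, skew_high, skew_overlaps = two_se_interval(-0.082, 0.241)
kurt_low, kurt_high, kurt_overlaps = two_se_interval(-0.095, 0.478)

print("Skewness interval:", skew_low, "to", skew_high, "| overlaps 0:", skew_overlaps)
print("Kurtosis interval:", kurt_low, "to", kurt_high, "| overlaps 0:", kurt_overlaps)
```

An interval that overlaps 0 means the statistic is not distinguishable from the value expected for a normal distribution under this rule of thumb.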
Paste Histogram, with superimposed normal curve, below:

Briefly explain: discuss whether this distribution looks like a normal distribution, on the basis of the skewness / kurtosis and the shape of the histogram. Explain why / why not:
It looks like a normal distribution. The skewness is less than the “rule of thumb” threshold (1), and it does not seem to be significant, since skewness +/- 2 S.E.s ranges from -0.564 to 0.400 (overlaps 0). The kurtosis is not very pronounced either: it is less than the “rule of thumb” threshold (1), and it does not seem to be significant, since kurtosis +/- 2 S.E.s ranges from -1.051 to 0.861 (overlaps 0).

4) Compare each distribution to a normal distribution with the same mean / S.D. (use the parameters estimated in question #3, above). For each distribution, use the Shapiro-Wilk test and paste the table of results and the Q-Q plot (+0.50 for each distribution):

This is the output from SPSS, showing the three test results. Focus on the Shapiro-Wilk results:

Distribution_1: Paste results table here, and interpret the S-W test result:

- was this result significant: (Y / N), why? Yes, p = 0.022 is < 0.05
- is distribution 1 normally distributed: (Y / N), why? No, because the null was rejected
- using the Q-Q plot, are there more observations than expected in the tails or on the center of mass of the observed distribution? Does this agree with the sign (+/-) of the kurtosis of this distribution you calculated in question #3? Why / why not?

Paste the detrended normal Q-Q plot here, for reference:

This plot agrees with the answers in question 3. The Q-Q plot shows an excess of observations (positive values) in the two tails of the distribution (the smaller and larger values), and a deficit of values (negative values) around the mean (from ~ 7 to ~ 12).
This suggests that the observed distribution has more observations in the tails (leptokurtic). The asymmetry of the deviations around the mean, with smaller deviations on the left (smaller values), suggests that the distribution has a positive skew, with a longer right tail and more observations to the left.

NOTE: deviations are calculated as “observed proportion - expected proportion”

Distribution_2: Paste results table here, and interpret the S-W test result:

- was this result significant: (Y / N), why? No, p = 0.154 is > 0.05
- is distribution 2 normally distributed: (Y / N), why? Yes, because the null was not rejected (the sample is consistent with a normal distribution)
- using the Q-Q plot, are there more observations than expected in the tails or on the center of mass of the observed distribution? Does this agree with the sign (+/-) of the kurtosis of this distribution you calculated in question #3? Why / why not?

Paste the detrended normal Q-Q plot here, for reference:

This plot agrees with the answers in question 3. The Q-Q plot shows an excess of observations (positive values) in the right tail of the distribution (the larger values), and a deficit of values (negative values) in the left tail. The asymmetry of the deviations about the mean, with smaller positive deviations to the left (smaller values), suggests that the distribution has a right skew (positive skew) with a longer right tail.

NOTE: deviations are calculated as “observed proportion - expected proportion”

Distribution_3: Paste results table here, and interpret the S-W test result:

- was this result significant: (Y / N), why? No, p = 0.154 is > 0.05
- using the Q-Q plot, are there more observations than expected in the tails or on the center of mass of the observed distribution? Does this agree with the sign (+/-) of the kurtosis of this distribution you calculated in question #3?
Why / why not?

Paste the Normal Q-Q plot here (not the detrended plot), for reference:

This plot agrees with the answers in question 3. The skewness is less clear, because the Q-Q plot shows a scatter of positive and negative points, with an excess of observations (positive values) along the right tail of the distribution (the larger values), and a deficit of values (negative values) in the left tail. This suggests that the skewness is very small, since large asymmetries in the distribution are not clearly visible. Similarly, the kurtosis is also hard to evaluate visually, since the positive and negative deviations are spread throughout the range of observed values. These graphs reinforce the notion that the skew and kurtosis quantified in question 3 are very small.

NOTE: deviations are calculated as “observed proportion - expected proportion”

5) Given that distribution_1 was log-normal, distribution_2 was Poisson, and distribution_3 was normal, answer the following questions (+0.25 each). Note: Re-read the notes from lecture 5:

- What is the only parameter of the Poisson distribution (Lambda) and what does it quantify?

The Poisson distribution has only one parameter (Lambda), which quantifies both its mean and its variance. In the Poisson distribution, the mean = the variance.

- Briefly describe how the shape of the Poisson distribution changes as the parameter Lambda increases from 0.1 to 10 (range of values from lecture and this example). Specifically, describe what happens to the skewness and the kurtosis of the distribution.

The Poisson distribution changes shape as Lambda increases from a value of 0.1 to a value of 10 (the current value in this example). Please consult the following figure from your class notes (taken from the Gotelli book), showing that the distribution starts out highly asymmetrical and then becomes increasingly symmetrical.
Thus, the skewness ranges from a large positive value (when Lambda = 0.1) to a small positive value (when Lambda = 10). Please check out this slide from lecture 6.
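The trend described above can also be checked numerically. For a Poisson distribution the theoretical skewness is 1 / sqrt(Lambda) and the theoretical excess kurtosis is 1 / Lambda, so both shrink toward 0 (the values for a normal distribution) as Lambda increases. The short Python sketch below tabulates these formulas over the range of Lambda used in this question. It is an illustration only and does not come from the lecture notes; the function name `poisson_shape` is hypothetical.

```python
import math


def poisson_shape(lam):
    """Return (skewness, excess kurtosis) of a Poisson distribution.

    Uses the standard closed-form results:
    skewness = 1 / sqrt(Lambda), excess kurtosis = 1 / Lambda.
    """
    if lam <= 0:
        raise ValueError("Lambda must be positive")
    return 1 / math.sqrt(lam), 1 / lam


# Range of Lambda values discussed in this question
for lam in (0.1, 1.0, 10.0):
    skewness, excess_kurtosis = poisson_shape(lam)
    print(f"Lambda = {lam:>4}: skewness = {skewness:.3f}, "
          f"excess kurtosis = {excess_kurtosis:.3f}")
```

Both statistics stay positive over the whole range, which matches the sign of the sample skewness (0.180) and kurtosis (0.362) reported for Distribution_2 in question 3.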