Stat 401G Lab 1 Solution

Use JMP to analyze the data in problem 1 below. It is not sufficient to simply hand in JMP output. You must answer the questions completely. Hand in the JMP output attached at the end of your written (word processed) solutions.

1. In a study of plant communities in tropical forests researchers quantified nonstructural carbohydrates (NSC) concentrations (mg/g) in sapling stems in moist semi-evergreen and dry deciduous tropical forests in the rainy season in Bolivia. NSC stored in stems allows plants to overcome periods of stress and should enhance survival. One question the researchers had was whether the NSC concentrations differed for species of plants in the two types of forest. Go to the course Web site http://www.public.iastate.edu/~wrstephe/stat401.html Download/open the JMP file that contains the NSC data.

a) Summarize the NSC data for the 69 species of trees/shrubs graphically by creating a histogram and box plot. Referring to these graphs, describe the distribution of NSC concentration. Refer to each graph separately and comment on the shape of the distribution, if there are any possible outliers and describe what in the graph supports your statements.
[Figure: JMP histogram and box plot of NSC (axes: NSC, Count)]
The histogram is mounded around 50 mg/g and has a slight skew towards higher concentrations (skewed right) as indicated by the stair step down as you move to the right. The box plot also indicates this skew to the right with the longer “whisker” stretching to the right. The box plot indicates one potential outlier at 137 mg/g.

b) Summarize the data numerically and report a five number summary, the sample mean and sample standard deviation for the entire sample of 69 species.
Minimum = 19 mg/g, Lower quartile = 45 mg/g, Median = 58 mg/g, Upper quartile = 81 mg/g, Maximum = 137 mg/g
Sample mean = 63.6 mg/g
Sample standard deviation = 27.42 mg/g

c) What does the comparison of the mean to the median tell you about the shape of the distribution of values?
The fact that the sample mean is higher than the sample median supports the notion that the distribution is skewed toward higher values (skewed right). The mean will be pulled in the direction of the skew.

d) Report the 95% confidence interval for the population mean NSC concentration. Give an interpretation of this interval and the confidence level.
The 95% confidence interval for the population mean NSC concentration is from 57.05 mg/g to 70.23 mg/g. This interval provides plausible values for the population mean NSC concentration for tropical forest trees/shrubs. 95% confidence tells us that if we were to create many intervals, like the one above, based on random samples from the population, about 95% of those intervals would capture the population mean NSC concentration.

e) Test the hypothesis that the population mean NSC concentration is 55 mg/g against an alternative that the population mean NSC concentration is greater than 55 mg/g.
$H_0: \mu = 55$
$H_A: \mu > 55$
$t = \dfrac{63.64 - 55}{27.423 / \sqrt{69}} = 2.617$
P-value = 0.0055
Because the P-value is so small, reject the null hypothesis. There is strong evidence that the population mean NSC concentration is greater than 55 mg/g.

f) Use JMP to create residuals by subtracting off the sample mean from each of the 69 NSC concentrations. Use Analyze + Distribution to create JMP output that looks at the distribution (histogram, box plot and Normal Quantile Plot) of the 69 residuals.
[Figure: JMP Distribution output for Residual (NSC concentrations): Normal Quantile Plot, box plot and histogram (vertical axis Count)]
Species        Forest   NSC   Residual (one-sample)
Ocotea sp. 2   Moist    137   73.3623188

g) Refer to the histogram of residuals and indicate what you see and what this tells you about the condition that the random errors are normally distributed.
The histogram of residuals is mounded between -20 and 0. There is a skew toward larger values (skewed right) as indicated by the stair step pattern down to the right. The skewed shape is an indication that the random errors, and hence the NSC concentrations, may not be normally distributed.

h) Refer to the box plot of residuals. Are there any possible outliers? If so, what are the corresponding NSC concentrations, species and type of forest? What does the box plot tell you about the condition of identically distributed random errors?
The box plot shows that there is a potential outlier for the largest positive residual. This corresponds to the species Ocotea sp. 2 in a moist forest with NSC concentration of 137 mg/g. Because of this potential outlier, the condition of identically distributed random errors may not be met.

i) Refer to the Normal Quantile Plot of residuals and indicate what you see and what this tells you about the condition that the random errors come from a normal distribution.
The normal quantile plot shows a curved pattern. The residuals start off below the diagonal line representing a normal model, curve above and then below the diagonal line. This is consistent with the right skewed shape of the histogram. Because of this curved pattern, the condition that the random errors come from a normal distribution may not be met.

j) Summarize what you have learned about the conditions necessary for the statistical analysis of the NSC concentrations from your analysis of the residuals.
There are problems with the residuals. Both the histogram and the normal quantile plot indicate a right skewed distribution. This puts the condition of normally distributed random errors in doubt.
The potential outlier puts the condition of identically distributed random errors in doubt.

2. On a separate piece of paper write a brief summary of what you have learned about NSC concentrations in trees/shrubs in tropical forests.
NSC concentrations in 69 species of trees/shrubs in tropical forests vary from about 20 mg/g to about 140 mg/g. A majority of the NSC concentrations are between 25 and 75 mg/g. The average NSC concentration is 63.6 mg/g, which is slightly higher than the median NSC concentration of 58 mg/g. The population mean NSC concentration is likely more than 55 mg/g. This statement is supported by a test of hypothesis (P-value = 0.0055) and a 95% confidence interval that indicates that plausible values for the population mean NSC concentration are between 57.05 mg/g and 70.23 mg/g. There are some issues as to whether the conditions for statistical inference are met. The P-value may not be exactly as reported; however, there is still strong evidence that the population mean NSC concentration is greater than 55 mg/g. Additionally, the confidence level may be less than the stated value of 95%.
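
The test statistic in part e) and the confidence interval in part d) can be checked by hand from the reported summary statistics. The following Python sketch is not part of the lab, which uses JMP; it is only a cross-check under stated assumptions. The helper names t_pdf and upper_tail are illustrative, the P-value is approximated by numerical integration of the t density with 68 degrees of freedom, and the critical value 1.9955 is an assumed table value for the 97.5th percentile of that distribution.

```python
import math

# Summary statistics as reported in parts b) and e).
n = 69            # number of species
mean = 63.64      # sample mean, mg/g
sd = 27.423       # sample standard deviation, mg/g
mu0 = 55.0        # hypothesized population mean, mg/g


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def upper_tail(t, df, upper=60.0, steps=20000):
    """Approximate P(T > t) by Simpson's rule on [t, upper] (steps is even).

    The area beyond `upper` is ignored; for df = 68 it is negligible.
    """
    h = (upper - t) / steps
    total = t_pdf(t, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(t + i * h, df)
    return total * h / 3


se = sd / math.sqrt(n)                # standard error of the mean
t_stat = (mean - mu0) / se            # one-sample t statistic
p_value = upper_tail(t_stat, n - 1)   # one-sided (greater-than) P-value

t_crit = 1.9955                       # ASSUMED table value: t(0.975, df = 68)
margin = t_crit * se
ci = (mean - margin, mean + margin)   # 95% confidence interval

print(f"t = {t_stat:.3f}, one-sided P-value = {p_value:.4f}")
print(f"95% CI: {ci[0]:.2f} to {ci[1]:.2f} mg/g")
```

Up to rounding, the printed values should agree with the JMP results quoted in parts d) and e). Working from summary statistics rather than the raw data keeps the check self-contained, at the cost of small rounding differences.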