Lab 10 - Statistics

Statistics 400
Lab #10
Objectives: The objective of this lab is to learn to perform linear regression analysis on
real data. Scatter-plots and a regression analysis should be done.
Instructions:
1. Read the problem statements on the following pages.
2. Conduct a regression analysis on the following data sets.
3. The data sets can be found in:
(a) SPSS data set http://www.stat.lsa.umich.edu/~dbingham/Stat400/brain.sav
ASCII data set http://www.stat.lsa.umich.edu/~dbingham/Stat400/brain.dat
(b) SPSS data set http://www.stat.lsa.umich.edu/~dbingham/Stat400/hubble.sav
ASCII data set http://www.stat.lsa.umich.edu/~dbingham/Stat400/hubble.dat
(c) SPSS data set http://www.stat.lsa.umich.edu/~dbingham/Stat400/cpk.sav
ASCII data set http://www.stat.lsa.umich.edu/~dbingham/Stat400/cpk.dat
In either case, save the file to the local disk before beginning the analysis.
4. Write a small report outlining your analysis and conclusions for each data set.
Problem 1:
Is the size of your brain an indicator of your mental capacity?
In this study by Willerman, Schultz, Rutledge and Bigler (1991, "In Vivo Brain Size
and Intelligence," Intelligence, 15, 223-228), the researchers use Magnetic Resonance
Imaging (MRI) to determine the brain size of the subjects with the aim of
investigating whether there is a relationship between brain size and IQ.
Willerman et al. (1991) conducted their study at a large southwestern university. They
selected a sample of 38 right-handed Anglo introductory psychology students who
had indicated no history of alcoholism, unconsciousness, brain damage, epilepsy, or
heart disease. With prior approval of the University's research review board, students
meeting the criteria were selected for MRI scans and IQ tests. The MRI scans were performed
at the same facility for all 38 subjects. The scans consisted of 18 horizontal MR
images. The computer counted all pixels with non-zero gray scale in each of the 18
images and the total count served as an index for brain size. A large pixel count
implies a large brain size.
1. Treating IQ as the response variable, construct a scatter-plot of IQ versus MRI
Pixel Count.
2. Does there appear to be a significant linear relationship between IQ and MRI
Pixel Count?
3. Is there really a response variable in this case?
4. Using SPSS and treating IQ and Pixel Count as the response and explanatory
variables, respectively, obtain the least squares regression line. What would the
slope be if there were no linear relationship?
5. Does there appear to be a significant linear relationship between IQ and MRI
Pixel Count based on the least squares line?
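For readers who want to check the SPSS output by hand, the least squares computation can be sketched in Python. This is only an illustration: the function name least_squares is made up for this sketch, and the numbers below are invented, not the values in brain.dat. The t statistic returned is the usual one for testing whether the slope differs from zero.

```python
import math

def least_squares(x, y):
    """Fit y = b0 + b1*x by ordinary least squares.

    Returns the intercept b0, the slope b1, the correlation r,
    and the t statistic for the slope (n - 2 degrees of freedom).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    r = sxy / math.sqrt(sxx * syy)
    # Residual (error) mean square; assumes the fit is not perfect.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    t = b1 / math.sqrt(mse / sxx)
    return b0, b1, r, t

# Invented values for illustration only; NOT the brain.dat measurements.
pixel_count = [8.2, 8.9, 9.1, 9.5, 10.0, 10.4]  # hypothetical, in units of 100,000 pixels
iq = [96, 104, 99, 110, 108, 117]               # hypothetical IQ scores

b0, b1, r, t = least_squares(pixel_count, iq)
print("intercept:", b0)
print("slope:", b1)
print("correlation:", r)
print("t statistic for slope:", t)
```

To use it on the lab data, replace the two lists with the columns read from brain.dat and compare the results with the SPSS coefficients table.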
Problem 2:
In 1929, Edwin Hubble investigated the relationship between distance of a galaxy
from the earth and the velocity with which it appears to be receding. Galaxies appear
to be moving away from us no matter which direction we look. This is thought to be
the result of the "Big Bang". Hubble hoped to provide some knowledge about how
the universe was formed and what might happen in the future. The data collected
include distances (megaparsecs) to 24 galaxies and their recession velocities
(km/sec). Note: 1 parsec = 3.26 light years.
Hubble's law:
Recession Velocity = Bo*Distance
where Bo is the famous Hubble's Constant. Hence, for every additional Megaparsec
away from the earth, a galaxy recedes faster by approximately 75 km/sec. By working
backward in time, the galaxies appear to meet in the same place. Thus 1/ Bo can be
used to estimate the time since the "Big Bang" -- a measure of the age of the universe.
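The arithmetic behind the age estimate can be sketched as follows, taking the figure of 75 km/sec per megaparsec quoted above as the value of Bo. The function name and the two conversion constants are not part of the lab; the constants are standard rounded values (one megaparsec is about 3.0857e19 km, and one year is about 3.156e7 seconds).

```python
KM_PER_MEGAPARSEC = 3.0857e19  # kilometres in one megaparsec (rounded)
SECONDS_PER_YEAR = 3.156e7     # seconds in one year of about 365.25 days

def age_in_years(hubble_constant):
    """Return 1/Bo in years, for Bo given in km/sec per megaparsec."""
    # Dividing km per megaparsec by (km/sec per megaparsec) leaves seconds.
    seconds = KM_PER_MEGAPARSEC / hubble_constant
    return seconds / SECONDS_PER_YEAR

age = age_in_years(75.0)
print("Estimated age in billions of years:", age / 1e9)
```

With Bo = 75 this comes out to roughly 13 billion years. A slope estimated from hubble.dat can be passed in the same way, provided it is in km/sec per megaparsec.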
1. Treating Recession Velocity as the response variable, construct a scatter-plot of
Recession Velocity versus Distance.
2. Does there appear to be a significant linear relationship between Recession
Velocity and Distance?
3. Using SPSS and treating Velocity and Distance as the response and explanatory
variables, respectively, obtain the least squares regression line containing an intercept term.
4. Using SPSS and treating Velocity and Distance as the response and explanatory
variables, respectively, obtain the least squares regression line WITHOUT an intercept
term (from the linear regression window, select Options and then de-select the
Include Constant in Equation check-box).
5. Which of the two models is better in this case? Why? (Hint: Look at the
correlation coefficients and the residual (error) mean squares.)
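The two fits asked for above differ only in whether the line is forced through the origin. A minimal sketch of both, with their residual mean squares, is given here. The function names are made up for this illustration and the numbers are invented, not the values in hubble.dat.

```python
def fit_with_intercept(x, y):
    """Fit y = b0 + b1*x; return b0, b1 and the residual mean square."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return b0, b1, sse / (n - 2)  # two estimated parameters

def fit_through_origin(x, y):
    """Fit y = b1*x (no intercept); return b1 and the residual mean square."""
    n = len(x)
    b1 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
    sse = sum((yi - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b1, sse / (n - 1)      # one estimated parameter

# Invented values for illustration only; NOT the hubble.dat measurements.
distance = [0.3, 0.5, 0.9, 1.1, 1.4, 2.0]  # hypothetical, megaparsecs
velocity = [150, 290, 500, 650, 800, 1090]   # hypothetical, km/sec

b0, b1, mse_full = fit_with_intercept(distance, velocity)
b1_origin, mse_origin = fit_through_origin(distance, velocity)
print("with intercept:", b0, b1, mse_full)
print("through origin:", b1_origin, mse_origin)
```

The residual mean squares use different degrees of freedom (n - 2 versus n - 1) because the no-intercept model estimates one fewer parameter. When reading the SPSS output, check how R Square is defined for the no-intercept model, since SPSS measures it about the origin in that case and flags this in a footnote.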
Problem 3:
CPK (creatine phosphokinase) is an enzyme contained within muscle cells which is
necessary for the storage and release of energy. It can be released into the blood in
response to vigorous exercise from damaged (leaky) muscle cells. This occurs often
even in healthy athletes. (Source: Zuliani, U., Mandras, A., Beltrami, G. F., Bonetti, A.,
Montani, G., and Novarini, A. (1983). Metabolic modifications caused by sport activity: effect in
leisure-time cross-country skiers. Journal of Sports Medicine and Physical Fitness, 23, 385-392.)
This study investigated the metabolic effect of cross-country skiing. Subjects were
participants in a 24 hour cross-country relay. Weight (kg) and blood CPK
concentration 12 hours into the relay were recorded. The purpose of the study was to
see if weight impacted blood CPK in skiers.
1. Construct a scatter-plot of Blood CPK versus Weight. Does there appear to be a
linear relationship between the two variables?
2. Frequently, transforming the response helps make the relationship appear
more linear. Common transformations are the natural log and square root
transformations. Construct a scatter-plot of the Log(Blood CPK) versus Weight.
Does the relationship between the two variables Log(Blood CPK) and Weight
appear more linear than the relationship between Blood CPK and Weight?
3. Perform a regression analysis of Log(Blood CPK) with Weight as the independent
variable. Does there appear to be a linear relationship between Log(Blood CPK)
and Weight?
4. What would you predict the Log(Blood CPK) to be for an individual who
weighed 73 kg?
5. What would you predict the Blood CPK to be for an individual who weighed 73
kg?
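The last two questions involve predicting on the log scale and then undoing the transformation. The steps can be sketched as below, assuming that Log means the natural log. The function names are made up for this illustration and the numbers are invented, not the values in cpk.dat.

```python
import math

def fit_line(x, y):
    """Fit y = b0 + b1*x by ordinary least squares; return b0, b1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def predict_cpk(b0, b1, weight):
    """Return the predicted log(CPK) and its back-transformed value."""
    log_pred = b0 + b1 * weight
    # exp() undoes the natural log used to transform the response.
    return log_pred, math.exp(log_pred)

# Invented values for illustration only; NOT the cpk.dat measurements.
weight = [60, 65, 70, 75, 80, 85]     # hypothetical, kg
cpk = [120, 160, 200, 290, 380, 520]  # hypothetical blood CPK values

log_cpk = [math.log(v) for v in cpk]  # transform the response
b0, b1 = fit_line(weight, log_cpk)    # regress log(CPK) on weight
log_pred, cpk_pred = predict_cpk(b0, b1, 73)
print("predicted log(CPK) at 73 kg:", log_pred)
print("predicted CPK at 73 kg:", cpk_pred)
```

If the transformation used in SPSS is log base 10 instead, the back-transformation is 10 raised to the predicted value rather than exp().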