Statistics 400 Lab #10

Objectives: The objective of this lab is to learn to perform linear regression analysis on real data. Scatter-plots and a regression analysis should be done.

Instructions:
1. Read the problem statements on the following pages.
2. Conduct a regression analysis on the following data sets.
3. The data sets can be found in:
   (a) SPSS data set http://www.stat.lsa.umich.edu/~dbingham/Stat400/brain.sav
       ASCII data set http://www.stat.lsa.umich.edu/~dbingham/Stat400/brain.dat
   (b) SPSS data set http://www.stat.lsa.umich.edu/~dbingham/Stat400/hubble.sav
       ASCII data set http://www.stat.lsa.umich.edu/~dbingham/Stat400/hubble.dat
   (c) SPSS data set http://www.stat.lsa.umich.edu/~dbingham/Stat400/cpk.sav
       ASCII data set http://www.stat.lsa.umich.edu/~dbingham/Stat400/cpk.dat
   In either case, save the file to the local disk before beginning the analysis.
4. Write a small report outlining your analysis and conclusions for each data set.

Problem 1: Is the size of your brain an indicator of your mental capacity?

In this study by Willerman, Schultz, Rutledge and Bigler (1991, "In Vivo Brain Size and Intelligence," Intelligence, 15, 223-228), the researchers used Magnetic Resonance Imaging (MRI) to determine the brain size of the subjects, with the aim of investigating whether there is a relationship between brain size and IQ. Willerman et al. (1991) conducted their study at a large southwestern university. They selected a sample of 38 right-handed Anglo introductory psychology students who had indicated no history of alcoholism, unconsciousness, brain damage, epilepsy, or heart disease. With prior approval of the University's research review board, students meeting the criteria were selected for MRI scans and IQ tests. The MRI scans were performed at the same facility for all 38 subjects. The scans consisted of 18 horizontal MR images. The computer counted all pixels with non-zero gray scale in each of the 18 images, and the total count served as an index for brain size.
A large pixel count implies a large brain size.

1. Treating IQ as the response variable, construct a scatter-plot of IQ versus MRI Pixel Count.
2. Does there appear to be a significant linear relationship between IQ and MRI Pixel Count?
3. Is there really a response variable in this case?
4. Using SPSS and treating IQ and Pixel Count as the response and explanatory variables, respectively, obtain the least squares regression line. What would the slope be if there were no linear relationship?
5. Does there appear to be a significant linear relationship between IQ and MRI Pixel Count based on the least squares line?

Problem 2: In 1929, Edwin Hubble investigated the relationship between the distance of a galaxy from the earth and the velocity with which it appears to be receding. Galaxies appear to be moving away from us no matter which direction we look. This is thought to be the result of the "Big Bang". Hubble hoped to provide some knowledge about how the universe was formed and what might happen in the future. The data collected include distances (megaparsecs) to 24 galaxies and their recession velocities (km/sec). Note: 1 parsec = 3.26 light years.

Hubble's law: Recession Velocity = Bo*Distance, where Bo is the famous Hubble's Constant. Taking Bo to be approximately 75 km/sec per megaparsec, a galaxy recedes faster by approximately 75 km/sec for every additional megaparsec away from the earth. By working backward in time, the galaxies appear to meet in the same place. Thus 1/Bo can be used to estimate the time since the "Big Bang" -- a measure of the age of the universe.

1. Treating Recession Velocity as the response variable, construct a scatter-plot of Recession Velocity versus Distance.
2. Does there appear to be a significant linear relationship between Recession Velocity and Distance?
3. Using SPSS and treating Velocity and Distance as the response and explanatory variables, respectively, obtain the least squares regression line containing an intercept term.
4. Using SPSS and treating Velocity and Distance as the response and explanatory variables, respectively, obtain the least squares regression line WITHOUT an intercept term (from the linear regression window, select Options and then de-select the Include Constant in Equation check-box).
5. Which of the two models is better in this case? Why? (Hint: Look at the correlation coefficients and the Mean Squares due to residual (or error).)

Problem 3: CPK (creatine phosphokinase) is an enzyme contained within muscle cells that is necessary for the storage and release of energy. It can be released into the blood from damaged (leaky) muscle cells in response to vigorous exercise. This occurs often, even in healthy athletes. (Source: Zuliani, U., Mandras, A., Beltrami, G. F., Bonetti, A., Montani, G., and Novarini, A. (1983). Metabolic modifications caused by sport activity: effect in leisure-time cross-country skiers. Journal of Sports Medicine and Physical Fitness, 23, 385-392.)

This study investigated the metabolic effect of cross-country skiing. Subjects were participants in a 24-hour cross-country relay. Weight (kg) and blood CPK concentration 12 hours into the relay were recorded. The purpose of the study was to see if weight affected blood CPK in skiers.

1. Construct a scatter-plot of Blood CPK versus Weight. Does there appear to be a linear relationship between the two variables?
2. Frequently, transforming the response helps make the relationship appear more linear. Common transformations are the natural log and square root transformations. Construct a scatter-plot of Log(Blood CPK) versus Weight. Does the relationship between Log(Blood CPK) and Weight appear more linear than the relationship between Blood CPK and Weight?
3. Perform a regression analysis of Log(Blood CPK) with Weight as the independent variable. Does there appear to be a linear relationship between Log(Blood CPK) and Weight?
4. What would you predict the Log(Blood CPK) to be for an individual who weighed 73 kg?
5. What would you predict the Blood CPK to be for an individual who weighed 73 kg?
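
Note on Problem 3, parts 4 and 5: a prediction on the log scale is turned into a prediction on the original scale by exponentiating it. The Python sketch below is only an illustration of that arithmetic and is not part of the SPSS procedure. The data are synthetic and made up for the example (they are NOT the values in cpk.dat), and the function names fit_log_response and predict are hypothetical.

```python
import numpy as np

def fit_log_response(x, y):
    """Least squares fit of ln(y) = b0 + b1*x; returns (b0, b1)."""
    x = np.asarray(x, dtype=float)
    log_y = np.log(np.asarray(y, dtype=float))
    # np.polyfit returns coefficients from the highest power down
    b1, b0 = np.polyfit(x, log_y, deg=1)
    return b0, b1

def predict(b0, b1, x_new):
    """Return the prediction on the log scale and on the original scale."""
    log_pred = b0 + b1 * x_new
    return log_pred, np.exp(log_pred)

# Synthetic, exactly log-linear data (assumed for illustration only)
weight = np.array([55.0, 60.0, 65.0, 70.0, 75.0, 80.0, 85.0])
cpk = np.exp(3.0 + 0.03 * weight)

b0, b1 = fit_log_response(weight, cpk)
log_cpk_73, cpk_73 = predict(b0, b1, 73.0)
print("Predicted Log(CPK) at 73 kg:", log_cpk_73)
print("Predicted CPK at 73 kg:", cpk_73)
```

With the real data, the same two steps apply: substitute 73 into the fitted line to get the predicted Log(Blood CPK), then take the exponential of that value to get the predicted Blood CPK.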
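
Note on Problem 2, parts 3 to 5: the two models differ only in whether a constant is estimated. The Python sketch below shows one way the two least squares fits and their residual mean squares could be computed outside SPSS. It is a minimal illustration under stated assumptions: the data are synthetic and made up for the example (they are NOT Hubble's measurements), and the function names fit_with_intercept and fit_through_origin are hypothetical.

```python
import numpy as np

def fit_with_intercept(x, y):
    """Least squares fit of y = b0 + b1*x; returns (b0, b1, mse)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])   # column of ones = intercept
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    mse = (resid @ resid) / (len(x) - 2)        # two estimated parameters
    return coef[0], coef[1], mse

def fit_through_origin(x, y):
    """Least squares fit of y = b1*x (no intercept); returns (b1, mse)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b1 = (x @ y) / (x @ x)
    resid = y - b1 * x
    mse = (resid @ resid) / (len(x) - 1)        # one estimated parameter
    return b1, mse

# Synthetic data (assumed for illustration only): a line through the
# origin plus random noise
rng = np.random.default_rng(0)
distance = np.linspace(0.1, 2.0, 24)
velocity = 75.0 * distance + rng.normal(0.0, 50.0, size=24)

b0, b1, mse_int = fit_with_intercept(distance, velocity)
b1_origin, mse_origin = fit_through_origin(distance, velocity)
print("With intercept: b0 =", b0, " b1 =", b1, " MSE =", mse_int)
print("Through origin: b1 =", b1_origin, " MSE =", mse_origin)
```

The residual mean squares use different degrees of freedom (n - 2 versus n - 1) because the models estimate different numbers of parameters. When comparing the models, keep in mind that R squared for a no-intercept fit is usually computed about zero rather than about the mean of the response, so it is not directly comparable to R squared from the model with an intercept.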