IB Studies 1/28/2014 Name ____________________________

Pearson’s correlation coefficient, r

Coefficient of determination, r²: If there is a causal relationship between two variables, then r² indicates the degree to which the change in the independent variable explains the change in the dependent variable.

Example
Father’s height (cm)   Son’s height (cm)
       175                  183
       167                  178
       170                  158
       167                  158
       179                  180
       183                  185
       171                  167
       180                  177
       170                  152

1) Find r:
2) Find r²:
r² = ____, which is the proportion of the variation in the son’s height that can be explained by the variation in the father’s height.

χ² (chi-squared) determines whether one categorical variable is related to another. Put another way, χ² measures the difference between the observed values in a single sample and the expected values, which are the values we would expect if the variables were independent.

To conclude that the variables are not independent, χ² must be compared with a critical value. The critical value of χ² depends on two things:
a) the size of the table, measured by the degrees of freedom, df = (number of rows - 1)(number of columns - 1), and
b) the significance level (usually 10%, 5%, or 1%), which is the risk we accept of concluding that the variables are not independent when they actually are independent.

The table for the given df and significance level gives the critical value of χ². If χ² is above this value, we conclude that the variables are not independent.
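Back to the father and son heights at the start of this sheet: you can check your answers to questions 1) and 2) with a short script. It computes Pearson’s r as Sxy / sqrt(Sxx · Syy), where Sxy is the sum of products of deviations from the means and Sxx and Syy are the sums of squared deviations. This is a minimal sketch. The function name `pearson_r` is our own, and a GDC’s linear regression mode should give the same r.

```python
import math

# Father's and son's heights (cm) from the example table, in the same order
fathers = [175, 167, 170, 167, 179, 183, 171, 180, 170]
sons = [183, 178, 158, 158, 180, 185, 167, 177, 152]


def pearson_r(xs, ys):
    """Pearson's correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(fathers, sons)
r2 = r ** 2  # coefficient of determination
print(f"r = {r:.3f}, r^2 = {r2:.3f}")
```

Multiply r² by 100 to read it as the percentage of the variation in the son’s height that is explained by the father’s height.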
Table of the chi-square distribution (critical values of χ²)

                            Level of significance
DF     0.200   0.100   0.075   0.050   0.025   0.010   0.005   0.001  0.0005
1      1.642   2.706   3.170   3.841   5.024   6.635   7.879  10.828  12.116
2      3.219   4.605   5.181   5.991   7.378   9.210  10.597  13.816  15.202
3      4.642   6.251   6.905   7.815   9.348  11.345  12.838  16.266  17.731
4      5.989   7.779   8.496   9.488  11.143  13.277  14.860  18.467  19.998
5      7.289   9.236  10.008  11.070  12.833  15.086  16.750  20.516  22.106
6      8.558  10.645  11.466  12.592  14.449  16.812  18.548  22.458  24.104
7      9.803  12.017  12.883  14.067  16.013  18.475  20.278  24.322  26.019
8     11.030  13.362  14.270  15.507  17.535  20.090  21.955  26.125  27.869
9     12.242  14.684  15.631  16.919  19.023  21.666  23.589  27.878  29.667
10    13.442  15.987  16.971  18.307  20.483  23.209  25.188  29.589  31.421

For example, at a 5% significance level with DF = 1, the critical value is 3.841. At a 5% significance level, this means the difference between the observed and expected values is too great for the variables to be independent if χ² > 3.841.

For χ² to follow the chi-square distribution closely enough, the sample size must be sufficiently large. Generally it is large enough if no value in the expected frequency table is less than 5. If an expected value is less than 5, we can combine its column with an adjacent column.

3) What is the number of degrees of freedom for the table shown below?

         C     D     E     F    sum
A       23    17    43     7     90
B        7     3    17    13     40
sum     30    20    60    20    130

4) Find the expected values and put them into an appropriate table. If any expected values are insufficiently large, combine an appropriate set of columns.

         Rarely   A few times   More than usual   Many
Large      12         13               8           10
Small      22         21               1           11

Your calculator also gives a p-value when it finds χ². Use the p-value together with the χ² value and the critical value to decide whether to reject the claim that the variables are independent. For a given contingency table, the p-value is the probability of obtaining observed values at least as far from the expected values as those in the table, assuming the variables are independent.
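Questions 3) and 4) rely on two calculations. The first is the degrees of freedom. The second is the expected frequency table, where each expected count is (row total × column total) ÷ grand total. The minimal Python sketch below does both. It also includes a helper that combines two adjacent columns when an expected value falls below 5. The function names are our own and the column labels are reconstructed from the worksheet. Merging "More than usual" with "Many" is an assumed choice; merging it with "A few times" would also be adjacent.

```python
# Question 4 data; column labels are reconstructed from the worksheet
labels = ["Rarely", "A few times", "More than usual", "Many"]
observed = [
    [12, 13, 8, 10],  # Large
    [22, 21, 1, 11],  # Small
]


def expected_table(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


def degrees_of_freedom(table):
    """df = (number of rows - 1)(number of columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


def merge_adjacent(table, labels, j):
    """Combine column j with column j + 1: counts are added, labels joined."""
    merged = [row[:j] + [row[j] + row[j + 1]] + row[j + 2:] for row in table]
    merged_labels = labels[:j] + [labels[j] + " or " + labels[j + 1]] + labels[j + 2:]
    return merged, merged_labels


expected = expected_table(observed)
too_small = sorted({labels[c] for row in expected for c, e in enumerate(row) if e < 5})
print("Columns with an expected value below 5:", too_small)

# Assumed choice: merge "More than usual" (index 2) with the adjacent "Many" column
merged, merged_labels = merge_adjacent(observed, labels, 2)
print(merged_labels, "df =", degrees_of_freedom(merged))
for row in expected_table(merged):
    print([round(e, 2) for e in row])
```

Combining columns reduces the number of columns, so remember to recompute df for the merged table before looking up the critical value.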
If the p-value is smaller than the significance level, then it is unlikely that we would have obtained the observed results if the variables had been independent. We therefore conclude that the variables are not independent.

The Formal Test for Independence:
Step 1: State H0, called the null hypothesis. This is a statement that the two variables being considered are independent. State H1, called the alternative hypothesis. This is a statement that the two variables being considered are not independent.
Step 2: State the rejection inequality χ² > k, where k is the critical value of χ².
Step 3: Construct the expected frequency table.
Step 4: Use technology to find χ².
Step 5: Either reject H0 or do not reject H0, depending on the result of the rejection inequality.
Step 6: Also use the p-value to help with the decision. For example, at a 5% significance level: if p < 0.05, we reject H0; if p > 0.05, we do not reject H0 (don’t say "we accept H0").

5) A survey on possible changes to the school’s canteen was given to randomly chosen high school students from years 9 to 12. The contingency table shows the results.

                  Year group
               9    10    11    12
Change         7     9    13    14
No change     14    12     9     7

At a 5% significance level, test whether students’ canteen preference depends on their year group. Find:
a) Null hypothesis:
b) Alternative hypothesis:
c) Degrees of freedom:
d) Critical value at a 5% significance level (i.e., we reject H0 if χ² > ________):
e) Expected frequency table:
f) χ² = ________
g) Is χ² > critical value?
h) What conclusion do you make with this information?
i) What does the calculator say about the p-value? Is it > 0.05, which says not to reject H0, or is it < 0.05, which means we reject H0?

We conclude that at a 5% level of significance, the variables year group and canteen preference ________ (are / are not) independent.
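You can check the formal test for question 5 with the Python sketch below. It builds the expected table (Step 3) and computes Pearson’s statistic χ² = Σ (observed - expected)² / expected (Step 4). It then applies the rejection inequality (Steps 2 and 5) and finds the p-value (Step 6). This is a minimal sketch, not what a calculator does internally. The function names are our own, and the tail probability uses a closed-form series that only works for whole-number df. The critical value 7.815 is read from the table above (DF = 3, 0.050).

```python
import math

# Canteen survey: rows = Change / No change, columns = years 9, 10, 11, 12
observed = [
    [7, 9, 13, 14],   # Change
    [14, 12, 9, 7],   # No change
]
CRITICAL_5PCT_DF3 = 7.815  # chi-square table: DF = 3, significance level 0.050


def expected_table(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


def chi_squared(table):
    """Pearson's statistic: sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_table(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def chi2_upper_tail(x, df):
    """p-value P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{(df-1)/2} (x/2)^(i-1/2) / Gamma(i+1/2)
    term = math.sqrt(half) / math.gamma(1.5)
    series = 0.0
    for i in range(1, (df - 1) // 2 + 1):
        series += term
        term *= half / (i + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * series


df = (len(observed) - 1) * (len(observed[0]) - 1)
stat = chi_squared(observed)
p = chi2_upper_tail(stat, df)
reject = stat > CRITICAL_5PCT_DF3  # rejection inequality chi^2 > k
print(f"df = {df}, chi2 = {stat:.3f}, p = {p:.4f}")
print("Reject H0" if reject else "Do not reject H0")
```

The two decision rules should agree: χ² > k exactly when p < 0.05, up to rounding in the table’s critical value.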
Problems:

1) This contingency table shows the responses of a randomly chosen sample of adults about their weight and whether they are diabetic. At a 5% significance level, the critical value of χ² is 5.99. Test at a 5% level whether there is a link between weight and suffering from diabetes. State H0.

                 Light   Medium   Heavy
Diabetic           11      19       26
Non-diabetic       79      68       69

H0: A person’s weight is independent of whether they are diabetic.
χ² ≈ 6.61, df = 2, p ≈ 0.0368.
If χ² > 5.99, we reject H0; equivalently, if p < 0.05, we reject H0.

2) The guests staying at a hotel are asked to give their reason for traveling and to rate the hotel on a scale from Poor to Excellent. The results are shown below. State H0.

             Poor   Fair   Good   Excellent
Business      27     25     20        8
Holiday        9     17     23       30

Show that, at a 5% significance level, the variables reason for traveling and rating are not independent.

3) The hair and eye colors of 150 randomly selected individuals are shown in the table below. State H0.

          Blonde   Black   Brunette   Red
Blue        14       10       21        5
Brown       11       32       20       12
Green        5        2       14        4

Find the critical value of χ². Determine whether there is an association between hair color and eye color.

4) A study followed a random sample of 8474 people with normal blood pressure for about four years. All the individuals were free of heart disease at the beginning of the study. Each person took the Spielberger Trait Anger Scale test, which measures how prone a person is to sudden anger. Researchers also recorded whether each individual developed coronary heart disease (CHD). This includes people who had heart attacks and those who needed medical treatment for heart disease. Here is a two-way table that summarizes the data.

            Low anger   Moderate anger   High anger
CHD             53            110             27
No CHD        3057           4621            606

a) State H0.
b) Find the expected values.
c) Find χ².
d) Find the p-value.
e) What conclusion do you make?

5) A great folk story is that dog owners and their dogs tend to look alike.
Here is a two-way table from a study that investigates that theory.

                     Resembles owner   Doesn’t resemble owner
Purebred dogs              16                    9
Mixed-breed dogs            7                   13

H0: There is no association between dog breed and resemblance to the owner in the population.
a) Determine the expected values.
b) What is the χ² value?
c) What is the p-value?
d) What is your conclusion?
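Problem 1 quotes χ² ≈ 6.61, df = 2 and p ≈ 0.0368. The short Python sketch below reproduces these from the table, so you can check your own calculator work. It relies on one special fact: for df = 2 only, the p-value P(X ≥ χ²) is exactly e^(-χ²/2). This shortcut does not apply to the other problems, whose df differ, so for those use your calculator’s χ² test. The variable names are our own.

```python
import math

# Problem 1: rows = Diabetic / Non-diabetic, columns = Light / Medium / Heavy
observed = [
    [11, 19, 26],  # Diabetic
    [79, 68, 69],  # Non-diabetic
]
CRITICAL_VALUE = 5.99  # given in the problem (5% level, df = 2)

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell if weight and diabetes were independent
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson's chi-squared statistic: sum of (observed - expected)^2 / expected
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Shortcut valid for df = 2 only: P(X >= chi2) = exp(-chi2 / 2)
assert df == 2
p_value = math.exp(-chi2 / 2)

print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.4f}")
print("Reject H0" if chi2 > CRITICAL_VALUE else "Do not reject H0")
```

Both rules give the same decision here: χ² > 5.99 and p < 0.05, so H0 is rejected at the 5% level.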