One more example of a hypothesis test
Chapter 10: Scatterplots
If time: Joy of Stats – 200 Countries, 200 Years

You should know this.
You should be familiar with all of this, but don’t waste too much time memorizing.

Alzheimer’s Onset and Gender (From Ch. 7 exercises, #21)

We’ve been given a list of ages of Alzheimer’s onset for men and women. We want to find out if there is a difference between the age at which men get Alzheimer’s and the age at which women get it.

Alzheimer’s Onset    Sample Mean    Standard Deviation    Sample Size
Men                  67.75          6.58                  8
Women                66.55          5.34                  9

Is this a one-sided or two-sided test?
“If there’s a difference” tells us this is… a two-sided test.

It’s a two-sided test…
Is it about means or proportions?
We’re talking about ages, so the mean is appropriate.

It’s a two-sided test of a mean or means.
We weren’t told otherwise, so we don’t know the true standard deviation. We use the sample standard deviation instead.

It’s a two-sided test of a mean or means, using the sample standard deviation.
We’re comparing men and women, so we’re interested in two means.

It’s a two-sided, two-sample test of means, using the sample standard deviation.
Is it an independent or a paired test?
There are different numbers of people in the two groups, so the samples must be independent.

Hope you’re not getting caught up in the concepts

We have this data:

Alzheimer’s Onset    Sample Mean    Standard Deviation    Sample Size
Men                  67.75          6.58                  8
Women                66.55          5.34                  9

And we know to do an independent two-sample test.
All our formulae will have t and s in them, instead of z and sigma.
If we do a confidence interval, it will be for mu, instead of for pi.

We input the raw data into SPSS and then click Analyze > Compare Means > Independent-Samples T Test.

We get this for the first couple of columns (the test for equality of variances):
From this result we can assume equal variances. This is because Sig. (p-value) is large, so we use the top row.

We get this for the middle five columns of the table.
Sig. (2-tailed) is large, so we fail to reject the null. That means we detect no significant difference in the onset age of Alzheimer’s between men and women.

We can also tell that the difference between the means was 0.413 standard errors (t-score).
The area beyond this score (on both sides) was found on the t-distribution with 15 degrees of freedom (8 + 9 - 2 = 15).
The area was .686 (p-value).

To a new chapter we go!

Chapter 10: Correlations

Correlations are one way to quantify and show the relationship between two features of the same object. Usually this is between two sets of interval data; otherwise it’s called an association.

If two values increase together, they are said to be positively correlated.
(As one goes up, so does the other.)
If one value increases as the other decreases, they are said to be negatively correlated.

Example: Longer bearded dragons tend to be larger all around, so they weigh more. Length and Weight are positively correlated in bearded dragons.
Example: Heating bills tend to be a lot lower when it’s warmer out. Heating Cost and Outdoor Temperature are negatively correlated.

The most common graph to show two sets of interval data together is the scatterplot.

Each dot represents a subject. In Length vs. Weight, each dot is a dragon.
The height of the dot represents the length of the dragon. How far it is to the right represents the weight of the dragon.
The dragon for this dot is 18 cm long and weighs 700 g.
There is an obvious upward trend in the graph. This shows a positive correlation.

The negative correlation between heating cost and outdoor temperature can be shown the same way.
The lack of correlation between two variables can also be shown in a scatterplot.

The strength of a correlation is how well the data points fit onto a straight line. Stronger correlations are easier to see and have less random scatter or variation.

We can quantify the strength and direction of a correlation with the correlation coefficient.

The correlation coefficient is called r when it comes from a sample (we’ll see r frequently) and ρ, or rho, when it comes from a population (we’ll see rho rarely).
It is a value between -1 and 1 that tells how strong a correlation is and in what direction.
The stronger a correlation, the farther the coefficient is from zero (and the closer it is to 1 or -1).
Positive correlations have positive coefficients r.
Negative correlations have negative coefficients r. The stronger the negative correlation, the closer r is to -1.
A perfect correlation, one in which all the values fit perfectly on a line, has a correlation coefficient of 1 (for positive) or -1 (for negative).
If there is no correlation at all, r will have a value of zero.
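
A rough sketch of how r behaves, for anyone who wants to try it outside SPSS. This uses the standard Pearson formula for a sample correlation, which these slides do not spell out, so treat it as an illustration and not as the course's method. The function name pearson_r and all of the dragon and heating numbers below are made up for the example; they are not data from the course.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient r between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable never changes")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical bearded dragons: heavier ones tend to be longer.
weights_g = [300, 450, 500, 700, 820]
lengths_cm = [12, 14, 15, 18, 19]
r_dragons = pearson_r(weights_g, lengths_cm)

# Hypothetical heating bills: warmer days, lower cost.
temps_c = [-5, 0, 5, 10, 15]
costs = [210, 180, 140, 120, 80]
r_heating = pearson_r(temps_c, costs)

# Points that sit exactly on a line give a perfect correlation.
r_perfect_up = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])
r_perfect_down = pearson_r([1, 2, 3, 4], [9, 7, 5, 3])

print(round(r_dragons, 3), round(r_heating, 3))
print(r_perfect_up, r_perfect_down)
```

The dragon r comes out positive and the heating r comes out negative, both close to but not exactly at the ends of the -1 to 1 range, while the two perfect lines land on 1 and -1. Note that r has no units: swapping grams for kilograms changes the weights but not r.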
Since r comes from a sample, though, it will vary like everything else from a sample. When there is no correlation, r is usually not exactly zero; it has some value close to zero on either side.

Recall the Burnaby vs. Coquitlam gas example from last week. One reason a paired t-test was appropriate was that gas prices in the two cities were correlated.

Guess the correlation coefficient between Burnaby and Coquitlam gas prices.
A) r = 0.05
B) r = 0.97
C) r = 0.592
D) r = -0.592

C) r = 0.592
There is a relationship between Burnaby and Coquitlam gas prices, but it’s not a perfect relationship. It’s positive, so the correlation coefficient r is positive, not negative.

Which of these is a possible correlation coefficient?
A) r = -0.28
B) r = 1.21
C) r = 0.41 grams per bean

A) r = -0.28
r is always between -1 and 1. Also, it has no units, so ‘grams per bean’ doesn’t make much sense, even though it’s a relationship between two variables.

Joy of Stats 28:45 – 33:00 (200 Countries, 200 Years)

Next time: r-squared, significance test for correlation, nonlinearity.