Chapter 1

Review of Variables

Types:

Discrete: e.g. counts such as family size, or indicator variables (0 or 1; success or failure). When plotted, say as a dot plot or bar graph, the values show consistent gaps.

Continuous: e.g. measured variables such as age, height, and weight. When plotted, say as a histogram or boxplot, there are either no gaps or the gaps are inconsistent.

Levels of Measurement

Nominal: categories are named rather than measured, e.g. sex, eye color, race.

Ordinal: categories are named and ordered, e.g. shirt size (S, M, L), agreement (disagree, agree, strongly agree).

Interval: equal differences between numbers represent equal differences in the variable or attribute being measured, e.g. degrees Fahrenheit, year (A.D.). The zero point is arbitrary. For instance, zero degrees Fahrenheit does not indicate an absence of heat, and thus 100 degrees F does not mean "twice as hot" as 50 degrees F.

Ratio: numbers represent equal amounts measured from an absolute zero, so scores can be compared as ratios, e.g. age, dollars, time, and height. With ratio measurements such comparisons are meaningful, e.g. a person who is 6 feet tall is twice the height of someone who is 3 feet tall.

Review of Statistical Concepts

Measures of Central Tendency: mode (the most frequently occurring value), median (half the observations fall at or below this point), and mean (the mathematical average, equal to the sum of the observations divided by the number of observations):

$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Measures of Spread: range, quantiles, variance

$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1} \quad\text{or}\quad S^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{X}^2}{n - 1}$$

and standard deviation, $S = \sqrt{S^2}$.

Random Variables

Binomial: consists of a fixed number n of independent trials, each with the same probability p of success and q = 1 - p of failure. With X the number of successes, the probability of exactly x successes in n trials is:

$$P(X = x) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, \ldots, n$$

Normal: $X \sim N(\mu, \sigma^2)$ has a normal distribution with parameters mean $\mu$ and variance $\sigma^2$.
Standardizing X by $Z = (X - \mu)/\sigma$ produces a standard normal variable with mean 0 and variance 1, probabilities for which can be found in Table B.1 of your text.

Properties relating the sample mean, $\bar{X}$, and the normal distribution:

1. For $x_1, x_2, \ldots, x_n$ a random sample taken from $N(\mu, \sigma^2)$, the distribution of $\bar{X}$ is normal with mean $\mu$ and variance $\sigma^2/n$.

2. For a random sample from any distribution (not necessarily normal) with mean $\mu$ and variance $\sigma^2$, the Central Limit Theorem gives $\bar{X} \sim N(\mu, \sigma^2/n)$ approximately for large n. The approximation is typically adequate for n > 30 when there is little skewness and there are few outliers.

t-distribution: Under assumption 1 above,

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

has a t-distribution with n - 1 degrees of freedom, written $T \sim t_{n-1}$.

Chi-square ($\chi^2$) distribution: For a normal random sample,

$$\chi^2 = \frac{(n - 1)S^2}{\sigma^2}$$

has a $\chi^2$ distribution with n - 1 degrees of freedom.

F-distribution (used in ANOVA): For two independent normal samples of sizes $n_1$ and $n_2$ with variances $\sigma_1^2$ and $\sigma_2^2$,

$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$

(which reduces to $S_1^2/S_2^2$ when $\sigma_1^2 = \sigma_2^2$) has an F distribution with degrees of freedom $n_1 - 1$ and $n_2 - 1$, written $F \sim F_{n_1-1,\,n_2-1}$.

Confidence Intervals and Hypothesis Testing

Example: 2-sample t-test

Two groups of mental patients undergo different treatments for the same disorder. A measure of change in health status is based on a questionnaire, and the responses are assumed normal with the same variance in each group.

Group 1: $n_1 = 15$, $S_1 = 2.5$, $\bar{X}_1 = 15.1$
Group 2: $n_2 = 15$, $S_2 = 3.0$, $\bar{X}_2 = 12.3$

a. Find a 99% confidence interval for the true mean difference, $\mu_1 - \mu_2$.
b. Test whether the true means are equal or not equal at alpha = 1%.

Solution (a): Since the variances are assumed equal, we use the "pooled" two-sample t-procedure.
In this case we have the following:

$$T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \quad\text{where}\quad S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$

and $T \sim t_{n_1+n_2-2}$.

The 99% confidence interval is

$$(\bar{X}_1 - \bar{X}_2) \pm t^* S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

where $t^*$ comes from the t distribution with degrees of freedom (df) = $n_1 + n_2 - 2 = 28$ and cumulative probability $1 - \alpha/2 = 0.995$, i.e. $t_{28,\,0.995}$, which from Table B.2 is 2.763. Here $S_p^2 = (14(2.5)^2 + 14(3.0)^2)/28 = 7.625$, so $S_p \approx 2.76$, and

$$(15.1 - 12.3) \pm 2.763 \times 2.76\sqrt{\frac{1}{15} + \frac{1}{15}} = 2.80 \pm 2.78$$

so the 99% confidence interval is (0.02, 5.58).

Solution (b): We wish to test $H_0: \mu_1 = \mu_2$ versus $H_a: \mu_1 \neq \mu_2$. We reject $H_0$ if the p-value for the test statistic is "too small", i.e. less than alpha. The test statistic is

$$T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

and under $H_0$, $\mu_1 - \mu_2 = 0$, so

$$T = \frac{2.80}{2.76\sqrt{\frac{1}{15} + \frac{1}{15}}} = 2.78$$

From Table B.2 with 28 degrees of freedom, 2.78 lies between 2.763 and 3.047, which correspond to upper-tail probabilities of 0.005 and 0.0025. Since $H_a$ is two-sided, the p-value is twice the tail probability associated with these two boundaries. Thus the p-value satisfies 0.005 < p < 0.01. So we reject $H_0$ and conclude that there is a difference between the two means at the 1% level. Alternatively, since the 99% confidence interval does not contain zero, we could reach the same conclusion.
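
The arithmetic in Solutions (a) and (b) can be checked with a short script. What follows is a minimal Python sketch using only the standard library, not a method from the text. The function names (`pooled_sd`, `t_pdf`, `t_two_sided_p`) are illustrative assumptions. The critical value 2.763 is taken from Table B.2 as quoted in the solution. The p-value is approximated by integrating the t density numerically with Simpson's rule rather than read from a table.

```python
import math

def pooled_sd(n1, s1, n2, s2):
    """Pooled standard deviation S_p for two samples assumed to share a variance."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(sp2)

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value, using Simpson's rule on the density over [0, |t|].

    steps must be even for Simpson's rule.
    """
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3          # approximates P(0 < T < |t|)
    return 2 * (0.5 - central)       # both tails beyond |t|

# Summary statistics from the example.
n1, s1, xbar1 = 15, 2.5, 15.1
n2, s2, xbar2 = 15, 3.0, 12.3
t_star = 2.763                       # t_{28, 0.995}, as quoted from Table B.2

df = n1 + n2 - 2
sp = pooled_sd(n1, s1, n2, s2)
se = sp * math.sqrt(1 / n1 + 1 / n2) # standard error of the mean difference
diff = xbar1 - xbar2

ci = (diff - t_star * se, diff + t_star * se)
t_stat = diff / se                   # under H0 the true difference is 0
p_value = t_two_sided_p(t_stat, df)

print(f"S_p = {sp:.3f}, df = {df}")
print(f"99% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"T = {t_stat:.2f}, two-sided p = {p_value:.4f}")
```

Because the script carries $S_p$ at full precision instead of rounding it to 2.76, its margin of error comes out closer to 2.79 than to 2.78, so the interval endpoints may differ from the hand calculation in the second decimal place. The conclusion is unchanged: the interval excludes zero and the p-value falls between 0.005 and 0.01.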