UCLA STAT 233: Statistical Methods in Biomedical Imaging
Instructor: Ivo Dinov, Asst. Prof. in Statistics and Neurology
University of California, Los Angeles, Spring 2004
http://www.stat.ucla.edu/~dinov/

General Statistical Techniques
• Intro to stats, vocabulary
• Displaying data
• Central tendency and variability
• Normal z-scores, standardized distribution
• Probability, samples & sampling error
• Type I and Type II errors; power of a test
• Intro to hypothesis testing
• One-sample tests & two independent-samples tests
• Two-sample tests with dependent samples & estimation
• Correlation and regression techniques
• Non-parametric statistical tests

General Statistical Techniques (cont.)
• Applications of the Central Limit Theorem and the Law of Large Numbers
• Design of studies and experiments
• Fisher's F-test & analysis of variance (ANOVA, 1- or 2-way)
• Principal Component Analysis (PCA)
• χ² (chi-square) goodness-of-fit test
• Multiple linear regression
• General Linear Model
• Bootstrapping and resampling

Errors in Samples
• Selection bias: the sampled population is not a representative subgroup of the population actually under investigation.
• Non-response bias: if a particular subgroup of the population studied does not respond, the resulting responses may be skewed.
• Question effects: survey questions may be slanted or loaded to influence the result of the sampling.
• Is quota sampling reliable? Each interviewer is assigned a fixed quota of subjects (district, sex, age, and income exactly specified), so within the quota the interviewer is free to select whichever individuals they like.

Terminology
• Target population – the entire group of individuals, objects, or units we study.
• Study population – a subset of the target population containing all "units" that could possibly be used in the study.
• Sampling protocol – the procedure used to select the sample.
• Sample – the subset of "units" about which we actually collect information.

More terminology
• Census – an attempt to sample the entire population.
• Parameter – a numerical characteristic of the population, e.g., income, age, etc. Often we want to estimate population parameters.
• Statistic – a numerical characteristic of the sample. A (sample) statistic is used to estimate a corresponding population parameter.
• Why do we sample at random? We draw "units" from the study population at random to avoid bias: every subject in the study population is equally likely to be selected. Random sampling also allows us to calculate the likely size of the error in our sample estimates.

Terminology (cont.)
• Random allocation – randomly assigning treatments to units; this leads to a representative sample only if we have a large number of experimental units.
• Completely randomized design – the simplest experimental design; it allows comparisons that are unbiased (though not necessarily fair). Randomly allocate treatments to all experimental units so that every treatment is applied to the same number of units. E.g., with 12 units and 3 treatments in a treatment-efficacy study, we randomly assign each of the 3 treatments to exactly 4 units (a sketch of this allocation follows below).
• Blocking – grouping units into blocks of similar units and making treatment-effect comparisons only within individual blocks. E.g., in a study of human life expectancy where income is clearly a factor, we can form high- and low-income blocks and compare, say, gender differences within these blocks separately.
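To make the completely randomized design concrete, here is a minimal Python sketch (the function name and unit labels are our own illustration, not from the slides) that deals 12 units evenly across 3 treatments at random:

```python
import random

def randomize_design(units, treatments):
    """Completely randomized design: shuffle the units, then deal them out
    so that every treatment is applied to the same number of units."""
    assert len(units) % len(treatments) == 0
    shuffled = random.sample(units, k=len(units))   # a random permutation
    group_size = len(units) // len(treatments)
    return {t: shuffled[i * group_size:(i + 1) * group_size]
            for i, t in enumerate(treatments)}

# 12 experimental units, 3 treatments -> each treatment gets exactly 4 units
print(randomize_design(list(range(1, 13)), ["A", "B", "C"]))
```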
Experiments vs. observational studies — for comparing the effects of treatments
• In an experiment, the experimenter determines which units receive which treatments (ideally using some form of random allocation).
• An observational study compares units that happen to have received each of the treatments. It is useful when we can't design a controlled randomized study, and it is ideal for describing relationships between different characteristics in a population. It is often useful for identifying possible causes of effects, but it cannot reliably establish causation.
• Only properly designed and executed experiments can reliably demonstrate causation.

Terminology
• Why should we try to "blind" the investigator in an experiment?
• Why should we try to "blind" human experimental subjects?
• The basic rule of experimentation: "Block what you can and randomize what you cannot."

Types of variables
• Quantitative variables are measurements and counts. Variables with few repeated values are treated as continuous; variables with many repeated values are treated as discrete.
• Qualitative variables (a.k.a. factors or class variables) describe group membership.

Distinguishing between types of variable:
Types of Variables
  Quantitative (measurements and counts)
    Continuous (few repeated values)
    Discrete (many repeated values)
  Qualitative (define groups)
    Categorical (no idea of order)
    Ordinal (fall in natural order)

Histograms vs. stem-and-leaf plots
• What advantages does a stem-and-leaf plot have over a histogram? (S&L plots retain information on individual values, are quick to produce by hand, and provide a data-sorting mechanism; but histograms are more attractive and more understandable. A small sketch follows this group of slides.)
• The shape of a histogram can be quite drastically altered by choosing different class-interval boundaries. What type of plot does not have this problem? (A density trace.) What other factor affects the shape of a histogram? (Bin size.)
• What was another reason given for plotting data on a variable, apart from interest in how the data on that variable behave? (Plots reveal features such as clusters/gaps and outliers, as well as trends.)

Interpreting stem-plots and histograms
[Figure: gallery of distribution shapes — (a) unimodal, (b) bimodal, (c) trimodal, (d) symmetric, (e) positively skewed (long upper tail), (f) negatively skewed (long lower tail), (g) symmetric, (h) bimodal with gap, (i) exponential shape.]
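Since the slides contrast stem-and-leaf plots with histograms above, here is a minimal sketch of one, assuming integer-valued data; the helper name and the sample data are hypothetical:

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Print a stem-and-leaf display: stems are the leading digits, leaves
    the final digit, so individual data values stay visible in the plot."""
    stems = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(int(x), 10)
        stems[stem].append(leaf)
    for stem in range(min(stems), max(stems) + 1):
        print(f"{stem:3d} | {''.join(str(leaf) for leaf in stems[stem])}")

stem_and_leaf([22, 25, 31, 33, 34, 34, 38, 41, 47, 52])
#  2 | 25
#  3 | 13448
#  4 | 17
#  5 | 2
```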
Skewness & Kurtosis
• What do we mean by symmetry and positive and negative skewness? Kurtosis? Properties?
  Skewness = Σₖ₌₁ᴺ (Yₖ − Ȳ)³ / ((N − 1)·SD³);   Kurtosis = Σₖ₌₁ᴺ (Yₖ − Ȳ)⁴ / ((N − 1)·SD⁴)
• Skewness is linearly invariant, Sk(aX + b) = Sk(X); it is a measure of asymmetry.
• Kurtosis is also linearly invariant; it is a measure of flatness.
• Both are used to quantify departures from the standard Normal: Skewness(StdNorm) = 0; Kurtosis(StdNorm) = 3.

Descriptive statistics from computer programs like STATA
[Minitab/STATA output for Variable age: N 45, Mean 50.133, Median 51.000, TrMean 50.366, StDev 6.092, SE Mean 0.908, Minimum 36.000, Maximum 59.000, Q1 (lower quartile) 46.500, Q3 (upper quartile) 56.000.]

Box plot compared to dot plot — construction of a box plot
• The box spans Q1 to Q3, with a line at the median; each whisker extends 1.5 IQR beyond its quartile and is then pulled back until it hits an observation.
[Figure 2.4.3: Box plot for SYSVOL, shown above the corresponding dot plot on a scale from 50 to 200.]
[Figure 2.4.4: Construction of a box plot.]

Describing data with pictures and two numbers
• Random number generation: frequency histogram.
• Descriptive statistics — central tendency (mode, median, mean) and variability (variance, standard deviation).

TABLE 2.5.1: Word lengths for the first 100 words on a randomly chosen page
2 3 6 5 7 2 4 9 9 5 4 5 6 2 10 4 2 3 3 3
4 9 2 7 2 3 5 3 11 3 9 8 4 2 9 9 3 4 3 4
3 2 4 6 5 6 4 2 4 5 2 5 2 4 4 3 2 4 7 4
3 2 3 2 7 2 4 2 6 3 3 1 3 6 5 4 4 7 10 2
6 2 4 4 5 5 5 2 3 2 3 2 6 5 4 4 5 4 7 2

Frequency table for the word lengths:
Value u_j:      1   2   3   4   5   6   7   8   9  10  11
Frequency f_j:  1  22  18  22  13   8   6   1   6   2   1

Histogram of random numbers generated
Notes:
1. A histogram allows you to see the distribution — estimate its middle and spread.
2. We sacrifice some information (within intervals) to gain this perspective.
3. How can we further reduce this information to a single number for the middle and another for the spread?
[Figure: frequency histogram of the generated random numbers, binned 0–19, 20–39, …, 300–319.]

Central tendency: the middle in a single number
• Mode: the most frequent score in the distribution.
• Median: the centermost score if there is an odd number of scores, or the average of the two centermost scores if there is an even number of scores.
• Mean: the sum of the scores divided by the number of scores.

Five-number summary
The five-number summary = (Min, Q1, Median, Q3, Max).

Neat things about the mean
• If you add/subtract a constant to/from each score, you change the mean by adding/subtracting that constant to/from it.
• If you multiply/divide each score by a constant, you change the mean by multiplying/dividing it by that constant.
• Summed deviations from the mean equal zero: Σ(xᵢ − x̄) = 0.
• The mean is very sensitive to extreme scores (outliers).
• The sum of squared deviations from the mean (SS) is minimized: Σ(xᵢ − x̄)² = minimum, equivalently Σx² − (Σx)²/N = minimum.

Conceptual formula vs. computational formula for SS
• Conceptual SS: Σ(x − x̄)² — "the sum of each x's squared difference from the mean."
• Computational SS: Σx² − (Σx)²/N — "the sum of the squared x's, minus the square of the sum of the x's divided by the total number of x's."
• Example with x = (1, 2, 3), mean = 2: the squared deviations are 1, 0, 1, so the conceptual SS = 2. Computationally,
  SS = Σx² − (Σx)²/N = (1² + 2² + 3²) − (1 + 2 + 3)²/3 = 14 − 36/3 = 14 − 12 = 2.
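The skewness/kurtosis and SS formulas above translate directly into code. A minimal sketch following the slide's (N − 1)-divisor conventions; it reproduces the worked SS example, where (1, 2, 3) gives SS = 2:

```python
import statistics as st

def skewness(y):
    """Skewness per the slide: sum((y_k - ybar)^3) / ((N-1) * SD^3)."""
    n, ybar, sd = len(y), st.mean(y), st.stdev(y)
    return sum((v - ybar) ** 3 for v in y) / ((n - 1) * sd ** 3)

def kurtosis(y):
    """Kurtosis per the slide: sum((y_k - ybar)^4) / ((N-1) * SD^4)."""
    n, ybar, sd = len(y), st.mean(y), st.stdev(y)
    return sum((v - ybar) ** 4 for v in y) / ((n - 1) * sd ** 4)

def ss_conceptual(x):
    xbar = st.mean(x)
    return sum((v - xbar) ** 2 for v in x)

def ss_computational(x):
    return sum(v * v for v in x) - sum(x) ** 2 / len(x)

x = [1, 2, 3]
print(ss_conceptual(x), ss_computational(x))   # both give 2.0, as on the slide
print(skewness(x))                             # 0.0 for symmetric data
```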
Questions about measures of central tendency
• Why is the mean our preferred measure of central tendency?
  - It is adjusted for the number of scores.
  - The sum of squared deviations from the mean — SS = Σ(x − x̄)² = Σx² − (Σx)²/N — is minimized at the mean.
  - It takes into account the numerical "weight" of each score: as scores of greater magnitude are added, the mean increases; as scores of lesser magnitude are added, the mean decreases.

Variability
• Not only are we interested in a distribution's middle (central tendency); we are also interested in its spread (variability).
• We define distributions by central tendency and variability.
• How do these distributions differ? How can we describe variability with a single number?
[Figure: two frequency histograms — "Frequency histogram for women in 214" and another — both of heights in inches with mean X̄ = 64.3, but one is more spread out (greater variability) than the other.]

Describing variability around the mean with one number
• Spread out around what? We describe variability around the mean.
• We want to adjust for the number of scores.
• We want to take into account the numerical "weight" of each score: as scores fall farther from the mean, the index of variability should increase; as scores fall closer to the mean, it should decrease.
• Suppose we measured each score's distance from its mean, and then used the average distance as our measure. Measuring the distance from the mean tells us how spread out each score is relative to the mean, and using the average distance adjusts for the number of scores.

Population variance and standard deviation
• Population variance (sigma squared, σ²) is the average of the squared deviations from the mean:
  σ² = Σ(x − µ)²/N, also written SS/N or [Σx² − (Σx)²/N]/N.
• Population standard deviation (sigma, σ) is the square root of the average of the squared deviations:
  σ = √(Σ(x − µ)²/N), also written √(SS/N) or √([Σx² − (Σx)²/N]/N).

Sample variance and standard deviation
• Sample variance (s²) is the (corrected) average of the squared deviations from the mean:
  s² = Σ(x − x̄)²/(N − 1), also written SS/(N − 1) or [Σx² − (Σx)²/N]/(N − 1).
• Sample standard deviation (s) is the square root of the (corrected) average of the squared deviations:
  s = √(Σ(x − x̄)²/(N − 1)), also written √(SS/(N − 1)) or √([Σx² − (Σx)²/N]/(N − 1)).
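A minimal sketch contrasting the population (divide by N) and sample (divide by N − 1) formulas above; the data values are hypothetical:

```python
def pop_variance(x):
    """Population variance: average squared deviation from the mean (divide by N)."""
    mu = sum(x) / len(x)
    return sum((v - mu) ** 2 for v in x) / len(x)

def sample_variance(x):
    """Sample variance: 'corrected' average squared deviation (divide by N-1)."""
    xbar = sum(x) / len(x)
    return sum((v - xbar) ** 2 for v in x) / (len(x) - 1)

data = [60, 62, 64, 64, 65, 67, 68]     # hypothetical heights in inches
print(pop_variance(data) ** 0.5)        # sigma
print(sample_variance(data) ** 0.5)     # s, slightly larger than sigma
```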
The Normal distribution density curve — effects of µ and σ
• The Normal density curve is symmetric about the mean, bell-shaped, and unimodal; for the Normal, mean = median.
• The mean is a measure of central tendency; the standard deviation is a measure of variability/spread.
• (a) Changing µ shifts the curve along the axis.
• (b) Increasing σ increases the spread and flattens the curve.
[Figure: (a) two Normal curves with σ₁ = σ₂ = 6 and means µ₁ = 160, µ₂ = 174; (b) two curves with means near 170 and σ₁ = 6 vs. σ₂ = 12. Each curve has 50% of its area on either side of its mean.]

Understanding the standard deviation: probabilities/areas and numbers of standard deviations for the Normal distribution
• Shaded area = 0.683: a 68% chance of falling between µ − σ and µ + σ.
• Shaded area = 0.954: a 95% chance of falling between µ − 2σ and µ + 2σ.
• Shaded area = 0.997: a 99.7% chance of falling between µ − 3σ and µ + 3σ.

Chebyshev's Theorem
• Applies to all distributions.
• Pafnuty Chebyshev (Пафнутий Чебышёв) (1821–1894); also transliterated Chebyshov, Tchebycheff, or Tschebyscheff.
• The theorem gives a lower bound for the probability that a value of a random variable with finite variance lies within a certain distance of the variable's mean; equivalently, it gives an upper bound for the probability that values lie outside that distance. It applies even to non-"bell-shaped" distributions and puts bounds on how much of the data is or is not "in the middle."
• Let X be a random variable with mean µ and finite variance σ². For any real number k > 0,
  P(µ − kσ < X < µ + kσ) ≥ 1 − 1/k², i.e., P(|X − µ| < kσ) ≥ 1 − 1/k² ⇔ P(|X − µ|/σ ≥ k) ≤ 1/k².
• Only the cases k > 1 provide useful information. Why? (For k ≤ 1 the bound 1 − 1/k² is at most 0, so it is trivially true.)

Markov's & Chebyshev's inequalities
• Markov's inequality (Markov was a student of Chebyshev): if Y ≥ 0 and d > 0, then P(Y ≥ d) ≤ E(Y)/d.
  Sketch: let X = d if Y ≥ d and X = 0 otherwise; note Y ≥ 0 ⇒ Y ≥ X ≥ 0, so E(Y) ≥ E(X) = d × P(Y ≥ d).
• Chebyshev from Markov: let Y = |X − E(X)|² and d = k² with k > 0. Then
  P(|X − E(X)|² ≥ k²) ≤ E|X − E(X)|²/k² = Var(X)/k² = σ²/k²,
  so P(|X − E(X)| ≥ k) ≤ σ²/k²; substituting k′ = k/σ (i.e., k = k′σ) gives P(|X − E(X)| ≥ kσ) ≤ 1/k².

Chebyshev's Theorem — applies to all distributions
Distance from the mean   Minimum proportion of values within that distance
µ ± 2σ (k = 2)           1 − 1/2² = 0.75
µ ± 3σ (k = 3)           1 − 1/3² ≈ 0.89
µ ± 4σ (k = 4)           1 − 1/4² ≈ 0.94

Coefficient of Variation
• A measurement of relative dispersion: the ratio of the standard deviation to the mean, expressed as a percentage:
  C.V. = (σ/µ) × 100.

Coefficient of Variation — an example
• Group 1: µ₁ = 29, σ₁ = 4.6, so C.V.₁ = (4.6/29) × 100 = 15.86.
• Group 2: µ₂ = 84, σ₂ = 10, so C.V.₂ = (10/84) × 100 = 11.90.

The inverse problem — percentiles/quantiles (SOCR)
• (a) The p-quantile xₚ is the x-value for which pr(X ≤ xₚ) = p; programs supply xₚ.
• (b) The 80th percentile (0.8-quantile) of women's heights, modeled Normal(µ = 162.7, σ = 6.2): 80% of people have height below the 80th percentile, i.e., prob = 0.8 lies to the left of xₚ. The program returns 167.9, so 80% lie below 167.9 cm. Equivalently, there is an 80% chance that a random observation from the distribution falls below the 80th percentile.
• So far we studied: given the height value, what is the corresponding percentile? The inverse problem asks: what is the height for, e.g., the 80th percentile/quantile?
• (c) Further percentiles of women's heights:
  Percent:          1%    5%    10%   20%   30%   70%   80%   90%   95%   99%
  Proportion:       0.01  0.05  0.10  0.20  0.30  0.70  0.80  0.90  0.95  0.99
  Percentile (cm):  148.3 152.5 154.8 157.5 159.4 166.0 167.9 170.6 172.9 177.1
  (The slide also lists approximate ft′in″ equivalents, from about 4′10″ up to about 5′10″.)
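Python's standard library can answer both the percentile question and the Chebyshev comparison above; statistics.NormalDist is a standard-library class (Python 3.8+), and the parameters come from the slide's women's-heights model:

```python
from statistics import NormalDist

heights = NormalDist(mu=162.7, sigma=6.2)   # women's-heights model from the slide
print(heights.inv_cdf(0.80))                # ~167.9 cm: the 80th percentile
print(heights.cdf(167.9))                   # ~0.80: the inverse check

# Chebyshev lower bound vs. the exact Normal probability for k = 2
k = 2
print(1 - 1 / k**2)                         # Chebyshev: >= 0.75 for ANY distribution
print(heights.cdf(162.7 + k * 6.2) - heights.cdf(162.7 - k * 6.2))   # Normal: ~0.954
```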
5'0" 5'2" 5'2" 5'5" 5'6" 5'7" 5'8” 5'9" (ft'in") 4'10" 5'0" x p) = p Normal( = 162.7, 80% = 6.2)of prob = 0.8 3/8" 3/8" 1/8" 1/8" 3/4" frac) The(+inverse problem is7/8" what is the3/4" height for1/8" the 80th percentile/quantile? So far we studied given the height value what’s the corresponding percentile? Slide 39 Stat 233, UCLA, Ivo Dinov Slide 40 Stat 233, UCLA, Ivo Dinov General Normal Curve Standard Normal Approximation z The standard normal curve can be used to estimate the percentage of entries in an interval for any process. Here is the protocol for this approximation: Convert the interval (we need the assess the percentage of entries in) to standard units. We saw the algorithm already. Find the corresponding area under the normal curve (from tables or online databases); z The general normal curve is defined by: Where µ is the average of (the symmetric) normal curve, and σ is the standard deviation (spread of the distribution). − y= Why worry about a standard and general normal curves? How to convert between the two curves? ( x − µ )2 2 2 e σ 2πσ 2 Compute % Transform to Std.Units Data What percentage of the density scale histogram is shown on this graph? 12 18 22 Report back % Slide 41 Stat 233, UCLA, Ivo Dinov Slide 42 Stat 233, UCLA, Ivo Dinov 7 Continuous Variables and Density Curves Summary z There are no gaps between the values a continuous random variable can take. QuincunxApplet.html z Random observations arise in two main ways: (i) by sampling populations; and (ii) by observing processes. Show Sampling Distribution Simulation Applet file:///C:/Ivo.dir/UCLA_Classes/Winter2002/AdditionalInstructorAids/ SamplingDistributionApplet.html Stat 233, UCLA, Ivo Dinov Slide 43 Slide 44 The density curve Stat 233, UCLA, Ivo Dinov For any random variable X z The probability distribution of a continuous variable is represented by a density curve. Probabilities are represented by areas under the curve, the probability that a random observation falls between a and b equal to the area under the density curve between a and b. z E(aX +b) = a E(X) +b and SD(aX +b) = | a | SD(X) The total area under the curve equals 1. The population (or distribution) mean µX = E(X), is where the density curve balances. When we calculate probabilities for a continuous random variable, it does not matter whether interval endpoints are included or excluded. Slide 45 Standard Units Stat 233, UCLA, Ivo Dinov Ranges, extremes and z-scores The z-score of a value a is …. Central ranges: z the number of standard deviations a is away from the mean z positive if a is above the mean and negative if a is below the mean. The standard Normal distribution has µ = 0 and σ = 0. z We usually use Z to represent a random variable with a standard Normal distribution. Slide 47 Slide 46 Stat 233, UCLA, Ivo Dinov Stat 233, UCLA, Ivo Dinov P(-z ≤ Z ≤ z) is the same as the probability that a random observation from an arbitrary Normal distribution falls within z SD's either side of the mean. Extremes: P(Z ≥ z) is the same as the probability that a random observation from an arbitrary Normal distribution falls more than z standard deviations above the mean. P(Z ≤ -z) is the same as the probability that a random observation from an arbitrary Normal distribution falls more than z standard deviations below the mean. Slide 48 Stat 233, UCLA, Ivo Dinov 8 Combining Random Quantities Independence We model variables as being independent …. Variation and independence: z No two animals, organisms, natural or man-made objects are ever identical. 
Combining Random Quantities
Variation and independence:
• No two animals, organisms, or natural or man-made objects are ever identical.
• There is always variation. The only question is whether it is large enough to have a practical impact on what you are trying to achieve.
• Variation in component parts leads to even greater variation in the whole.

Independence
We model variables as being independent:
• if we think they relate to physically independent processes, and
• if we have no data that suggests they are related.
Both sums and differences of independent random variables are more variable than any of the component random variables.

Formulas
• For a constant number a: E(aX) = aE(X) and SD(aX) = |a| SD(X).
• For constant numbers a and b: E(aX + b) = aE(X) + b and SD(aX + b) = |a| SD(X).
• Means of sums and differences of random variables act in an obvious way: E(X₁ + X₂) = E(X₁) + E(X₂); the mean of the sum is the sum of the means, and the mean of the difference is the difference of the means.
• For independent random variables (cf. the Pythagorean theorem):
  SD(X₁ + X₂) = SD(X₁ − X₂) = √(SD²(X₁) + SD²(X₂)).
  [Aside: sums and differences of independent Normally distributed random variables are also Normally distributed.]
• For dependent variables these formulas fail. Example: E(X) = 1, SD(X) = 3, and Y = 2X − 1, so E(Y) = 1 and SD(Y) = 6. Then SD(X + Y) = SD(3X − 1) = 9, NOT √(SD²(X) + SD²(Y)) = √45 ≈ 6.7. A defense vs. prosecution argument may come out differently for an X + Y value of 18, say, depending on which SD is used.

Example
• The mean rest-state fMRI response for one subject in the left OFC is 125 ± 12. Signals exceeding 150 are thought to indicate imaging effects of cognitive stimulation. Approximately what proportion of the OFC regions would be expected to light up (exceed 150) on average? About what percentage of all voxels will have mean intensities between 111 and 130? What assumptions are needed?

Areas under the Standard Normal Curve — Normal approximation.

Conditional Probability
• The conditional probability of A occurring given that B occurs is
  pr(A | B) = pr(A and B) / pr(B).
• Example: suppose we select one of the 400 patients in the melanoma study and want the probability that the cancer is on the extremities given that it is of type nodular:
  P(cancer on extremities | nodular) = (# nodular patients with cancer on extremities)/(# nodular patients) = 73/125.

Melanoma — a type of skin cancer — an example of the laws of conditional probability
• pr(A and B) = pr(A | B)pr(B) = pr(B | A)pr(A);   pr(Ā) = 1 − pr(A).

TABLE 4.6.1: 400 melanoma patients by type and site (a contingency table based on melanoma histological type and its location)
Type                            Head and Neck   Trunk   Extremities   Row totals
Hutchinson's melanomic freckle        22           2         10            34
Superficial                           16          54        115           185
Nodular                               19          33         73           125
Indeterminant                         11          17         28            56
Column totals                         68         106        226           400
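A minimal check of the conditional-probability computation against the melanoma table above (the dictionary layout is our own):

```python
# Melanoma table from the slide: rows = histological type, columns = site
table = {
    "Hutchinson's freckle": {"head/neck": 22, "trunk": 2,  "extremities": 10},
    "superficial":          {"head/neck": 16, "trunk": 54, "extremities": 115},
    "nodular":              {"head/neck": 19, "trunk": 33, "extremities": 73},
    "indeterminant":        {"head/neck": 11, "trunk": 17, "extremities": 28},
}

def pr_site_given_type(site, typ):
    """pr(site | type) = pr(site and type) / pr(type), computed from counts."""
    row = table[typ]
    return row[site] / sum(row.values())

print(pr_site_given_type("extremities", "nodular"))   # 73/125 = 0.584
```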
Review (cont.)
2. Properties of probabilities: the numbers {pₖ, k = 1, …, N} define probabilities ⇔ pₖ ≥ 0 for all k and Σₖ pₖ = 1.
1. Proportions (a partial description of a real population) and probabilities (giving the chance of something happening in a random experiment) may be identical — under the experiment "choose a unit at random."

Conditional probabilities and 2-way tables
• Many problems involving conditional probabilities can be solved by constructing two-way tables.
• This includes reversing the order of conditioning:
  P(A & B) = P(A | B) × P(B) = P(B | A) × P(A).

TABLE 4.6.5: Number of individuals having a given mean absorbance ratio (MAR) in the ELISA for HIV antibodies (adapted from Weiss et al. [1985])
MAR          Healthy donors   HIV patients
< 2               202               0
2 – 2.99           73               2
--- test cut-off ---
3 – 3.99           15               7
4 – 4.99            3               7
5 – 5.99            2              15
6 – 11.99           2              36
12+                 0              21
Total             297              88
• Healthy donors above the cut-off are false positives; the 2 HIV patients below the cut-off are false negatives (FNE).
• The power of the test is 1 − P(FNE) = 1 − P(Neg | HIV) ≈ 0.976.

Sensitivity vs. Specificity of a Test
• An ELISA is developed to diagnose HIV infections. Serum from 10,000 patients who were positive by Western blot (the gold-standard assay) was tested, and 9,990 were found positive by the new ELISA. The manufacturers then used the ELISA to test serum from 10,000 nuns who denied risk factors for HIV infection; 9,990 were negative, and the 10 positive results were negative by Western blot. (A computational check of the two rates appears after the next slides.)

                  HIV infected (true case)   Not infected
ELISA test +            9990 (TP)               10 (FP, α)
ELISA test −              10 (FN, β)          9990 (TN)
Totals                10,000 (TP+FN)        10,000 (FP+TN)

• Sensitivity = TP/(TP + FN) = 9990/(9990 + 10) = 0.999.
• Specificity = TN/(FP + TN) = 9990/(9990 + 10) = 0.999.

4 factors affecting the power
A larger value of each of the following causes the power to change in the direction shown:
• Sample size (positive: larger n → more power)
• Sample variance (negative: larger variance → less power)
• Effect size (positive: larger effect → more power)
• The chosen level for α (positive: larger α → more power)

Hypotheses — guiding principles
• We cannot rule in a hypothesized value for a parameter; we can only determine whether there is evidence to rule out a hypothesized value.
• The null hypothesis tested is typically a skeptical reaction to a research hypothesis.

Comments
• Why can't we (rule in) prove that a hypothesized value of a parameter is exactly true? (Because when constructing estimates based on data, there are always sampling — and possibly non-sampling — errors, which are normal and will affect the resulting estimate. Even if we do 60,000 ESP tests, as we saw earlier, repeatedly we are likely to get estimates like 0.2, 0.200001, 0.199999, etc. — none of which may be exactly the theoretically correct value 0.2.)
• Why use the rule-out principle? (Since we can't use the rule-in method, we instead look for compelling evidence against the hypothesized value — to reject it.)
• Why are the null hypothesis and significance testing typically used? (H₀ is a skeptical reaction to a research hypothesis; significance testing is used to check whether differences or effects seen in the data can be explained simply in terms of sampling variation.)
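The computational check promised above for the ELISA sensitivity/specificity slide, a minimal sketch using the slide's counts:

```python
def sensitivity(tp, fn):
    """P(test positive | truly infected) = TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """P(test negative | truly healthy) = TN / (TN + FP)."""
    return tn / (tn + fp)

# Slide's ELISA example: 9990 TP, 10 FN among the infected;
# 9990 TN, 10 FP among the uninfected nuns
print(sensitivity(9990, 10))   # 0.999
print(specificity(9990, 10))   # 0.999
```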
The t-test
STEP 1: Using θ̂ to test H₀: θ = θ₀ versus some alternative H₁, calculate the test statistic
  t₀ = (θ̂ − θ₀)/se(θ̂) = (estimate − hypothesized value)/standard error.
  [This tells us how many standard errors the estimate is above the hypothesized value (t₀ positive) or below the hypothesized value (t₀ negative).]
STEP 2: Calculate the P-value using the following table, where T ~ Student(df):
  • H₁: θ > θ₀ — evidence against H₀ provided by θ̂ too much bigger than θ₀ (i.e., θ̂ − θ₀ too large); P-value = pr(T ≥ t₀).
  • H₁: θ < θ₀ — evidence against H₀ provided by θ̂ too much smaller than θ₀ (i.e., θ̂ − θ₀ too negative); P-value = pr(T ≤ t₀).
  • H₁: θ ≠ θ₀ — evidence against H₀ provided by θ̂ too far from θ₀ (i.e., |θ̂ − θ₀| too large); P-value = 2 pr(T ≥ |t₀|).
STEP 3: Interpret the P-value in the context of the data.

Comparing proportions between matched groups
• When the samples are large enough to use the Normal approximation, and the two proportions come from entirely different subjects (e.g., different mothers), they are independent, so
  t₀ = (estimate − hypothesized value)/SE = (p̂₁ − p̂₂ − 0)/SE(p̂₁ − p̂₂),
  where SE(p̂₁ − p̂₂) = √(p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂), and P-value = Pr(T ≥ t₀).

Statistical vs. practical significance — confidence intervals
• Computed by: estimate ± z × SE, i.e.,
  p̂₁ − p̂₂ ± 1.96 × SE(p̂₁ − p̂₂) = p̂₁ − p̂₂ ± 1.96 × √(p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂).

Analysis of two independent samples — urinary androsterone levels
• Data, dot plots, and 95% CI. Relations between hormonal levels and homosexuality, Margolese, 1970. Hormonal levels are lower for homosexuals. The samples are independent, as they are unrelated.
• Results: P-value of the t-test 0.004, with CI(µ_Het − µ_Hom) = [0.4, 1.7]. Is the Normal hypothesis satisfied, or are the data skewed?

TABLE 10.2.1: Urinary androsterone levels (mg/24 hr); source: Margolese [1970]
Heterosexual: 3.9, 4.0, 3.9, 3.8, 2.9, 3.2, 4.6, 4.3, 3.1, 2.7, 2.3
Homosexual:   2.5, 1.8, 1.6, 2.2, 3.1, 3.4, 1.3, 2.3, 1.6, 2.5, 3.4, 1.6, 4.3, 2.0, 3.9

[Figure 10.2.1: Dot plots of the androsterone data for the two groups (with 95% CIs), on a 1–5 mg/24 hr scale.]

Figure 10.2.3: Minitab 2-sample t-output for the androsterone data (from Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000):
Two-sample T for androsterone
          N    Mean   StDev   SE Mean
hetero   11   3.518   0.721      0.22
homose   15   2.500   0.923      0.24
95% CI for mu(hetero) − mu(homose): (0.35, 1.69)
T-Test mu(hetero) = mu(homose) (vs not =): T = 3.16 (t-test statistic), P = 0.0044 (P-value), DF = 23

Comparing two means for independent samples
Suppose we have two samples/means/distributions: {x̄₁, N(µ₁, σ₁)} and {x̄₂, N(µ₂, σ₂)}. We've seen before that to make inference about µ₁ − µ₂ we can use a t-test for H₀: µ₁ − µ₂ = 0 with
  t₀ = (x̄₁ − x̄₂ − 0)/SE(x̄₁ − x̄₂), and CI(µ₁ − µ₂) = x̄₁ − x̄₂ ± t × SE(x̄₁ − x̄₂).
If the two samples are independent, we use the SE formula
  SE = √(s₁²/n₁ + s₂²/n₂), with df = min(n₁ − 1, n₂ − 1).
This gives a conservative approach for hand calculation of an approximation to what is known as the Welch procedure, which has a complicated exact formula.
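A minimal sketch of the conservative two-sample procedure just described, using the androsterone summaries; it reproduces the Minitab t-statistic (Minitab's df = 23 comes from the exact Welch formula, which we do not implement here):

```python
from math import sqrt

def conservative_two_sample_t(x1, s1, n1, x2, s2, n2):
    """Unpooled two-sample t with the hand-calculation df = min(n1-1, n2-1),
    a conservative approximation to the exact Welch procedure."""
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    return (x1 - x2) / se, min(n1 - 1, n2 - 1)

# Margolese androsterone summaries from the slide (mg/24 hr)
t0, df = conservative_two_sample_t(3.518, 0.721, 11,    # heterosexual group
                                   2.500, 0.923, 15)    # homosexual group
print(t0, df)   # t0 ~ 3.16, matching the Minitab output; conservative df = 10
```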
Comparing two means for independent samples — questions
1. How sensitive is the two-sample t-test to non-Normality in the data? (The two-sample t-tests and CIs are even more robust than the one-sample tests against non-Normality, particularly when the shapes of the two distributions are similar and n₁ = n₂ = n, even for small n; remember df = n₁ + n₂ − 2.)
3. Are there nonparametric alternatives to the two-sample t-test? (The Wilcoxon rank-sum test and the Mann-Whitney test — equivalent tests, same P-values.)
4. What difference is there between the quantities tested and estimated by the two-sample t-procedures and the nonparametric equivalent? (Nonparametric tests are based on the ordering, not the size, of the data, and hence use the median, not the mean, as the average: the t-procedures test the equality of two means and estimate CI(µ₁ − µ₂), while the nonparametric equivalents test the equality of two medians and estimate CI(µ̃₁ − µ̃₂).)

Means for independent samples — equal or unequal variances?
• The pooled t-test is used for samples with assumed equal variances. Under the Normality assumption with equal variances,
  (x̄₁ − x̄₂ − 0)/SE(x̄₁ − x̄₂), where SE = s_p √(1/n₁ + 1/n₂) and s_p = √(((n₁ − 1)s₁² + (n₂ − 1)s₂²)/(n₁ + n₂ − 2)),
  is exactly Student's t-distributed with df = n₁ + n₂ − 2.
• Here s_p is called the pooled estimate of the variance, since it pools information from the two samples to form a combined estimate of the single variance σ₁² = σ₂² = σ².
• The book recommends routine use of the Welch unequal-variance method.

Paired Comparisons
• An fMRI study of N subjects: the point in the time course of maximal activation in the rostral and caudal medial premotor cortex was identified, and the percentage changes in response to the go and no-go tasks from the rest state were measured. Similarly, the points of maximal activity during the go and no-go tasks were identified in the primary motor cortex. Paired t-test comparisons between the go and no-go percentage changes were performed across subjects for these regions of maximum activity.

Paired data
• We have to distinguish between independent and related samples because they require different methods of analysis. Paired data is an example of related data.
• With paired data we analyze the differences; this converts the initial problem into a one-sample problem.
• The sign test and the Wilcoxon signed-rank test are nonparametric alternatives to the one-sample or paired t-test; the Wilcoxon (rank-sum) or Mann-Whitney test is a nonparametric alternative to the two-sample t-test.

2-sample t-tests and intervals for differences between means µ₁ − µ₂
• Assume statistically independent random samples from the two populations of interest, with both samples coming from Normal distributions. The pooled method also assumes σ₁ = σ₂; the Welch (unpooled) method does not.
• Two-sample t-methods are remarkably robust against non-Normality, but they can be sensitive to the presence of outliers in small to moderate-sized samples, and one-sided tests are reasonably sensitive to skewness.
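For comparison, a sketch of the pooled vs. unpooled standard errors discussed above, evaluated on the same androsterone summaries; with these data the two SEs are nearly equal, which is why both methods lead to similar conclusions:

```python
from math import sqrt

def pooled_se(s1, n1, s2, n2):
    """Pooled SE for x1bar - x2bar, assuming sigma1 = sigma2; df = n1 + n2 - 2."""
    sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return sp * sqrt(1 / n1 + 1 / n2)

def unpooled_se(s1, n1, s2, n2):
    """Unpooled (Welch-style) SE; no equal-variance assumption."""
    return sqrt(s1**2 / n1 + s2**2 / n2)

print(pooled_se(0.721, 11, 0.923, 15))     # ~0.335
print(unpooled_se(0.721, 11, 0.923, 15))   # ~0.323
```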
One-way ANOVA and the F-test
• We know how to analyze one- and two-sample data. What if we have more than two samples? One-way ANOVA and the F-test.
• One-way ANOVA refers to the situation of having one factor (or categorical variable) that defines group membership — e.g., comparing 4 fMRI stimuli in a block design.
• Hypotheses for the one-way analysis-of-variance F-test — Null hypothesis: all of the underlying true means are identical. Alternative: differences exist between some of the true means.

Interpreting the P-value from the F-test
• A large P-value indicates that the differences seen between the sample means could be explained simply in terms of sampling variation.
• A small P-value indicates evidence that real differences exist between at least some of the true means, but gives no indication of where the differences are or how big they are.
• To find out how big any differences are, we need confidence intervals.

Form of a typical ANOVA table for one-way ANOVA
Source    Sum of squares            df          Mean sum of squares   F-statistic       P-value
Between   Σᵢ nᵢ(x̄ᵢ. − x̄..)²        k − 1       s_B²                  f₀ = s_B²/s_W²    pr(F ≥ f₀)
Within    Σᵢ (nᵢ − 1)sᵢ²            n_tot − k   s_W²
Total     Σᵢ Σⱼ (xᵢⱼ − x̄..)²       n_tot − 1
• Mean sum of squares = (sum of squares)/df.
• The F-test statistic f₀ applies when we have independent samples, each from one of k Normal populations N(µᵢ, σ) — note that the same variance is assumed.

Where did the F-statistic come from?
• How do we obtain intuitive evidence against H₀? Far-separated sample means, plus differences of sample means that are large compared to their internal (within-group) variability.
[Figure: Examples 1–3, each showing dot plots for Gp 1, Gp 2, Gp 3 — which of these examples indicate that group differences are "large"?]

More about the F-test
• s_B² = Σᵢ nᵢ(x̄ᵢ. − x̄..)²/(k − 1) is a measure of the variability of the sample means — how far apart they are.
• s_W² = Σᵢ (nᵢ − 1)sᵢ²/(n_tot − k) reflects the average internal variability within the samples.
• The F-test statistic f₀ tests H₀ by comparing the variability of the sample means (numerator) with the variability within the samples (denominator).
• Evidence against H₀ is provided by values of f₀ that would be unusually large if H₀ were true.

F-test assumptions
1. The samples are independent — physically independent subjects, units, or objects are being studied.
2. The samples come from Normal distributions, N(µᵢ, σ); this is especially sensitive for small nᵢ.
3. The standard deviations should be equal within all samples: σ₁ = σ₂ = … = σ_k = σ (rule of thumb: 1/2 ≤ σ_k/σ_j ≤ 2).
How to check/validate these assumptions for your data? For the reading-score improvement data: independence is clear since different groups of students are used; dot plots of the group data show no evidence of non-Normality; and the sample SDs are very similar, so we assume the population SDs are similar.

F-test assumptions (cont.)
1. Whether two samples are independent of each other is generally determined by the structure of the experiment from which they arise. Obviously correlated samples, such as a set of pre- and post-test observations on the same subjects, are not independent, and such data would be more appropriately tested by a one-sample test on the paired differences.
2. Two random variables are independent if their joint probability density is the product of their individual (marginal) probability densities. Less technically, if two random variables A and B are independent, then the probability of any given value of A is unchanged by knowledge of the value of B. A sample of mutually independent random variables is an independent sample.
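A minimal one-way ANOVA sketch implementing the table's quantities; the three groups are hypothetical, and the resulting f₀ would be compared against an F(k−1, n_tot−k) critical value or p-value from tables:

```python
def one_way_F(groups):
    """One-way ANOVA F-statistic f0 = s_B^2 / s_W^2, per the slide's ANOVA table."""
    k = len(groups)
    n_tot = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_tot
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    s2b = ssb / (k - 1)            # between-groups mean square, df = k - 1
    s2w = ssw / (n_tot - k)        # within-groups mean square, df = n_tot - k
    return s2b / s2w

print(one_way_F([[5, 7, 6, 9], [8, 10, 9, 11], [4, 6, 5, 7]]))   # f0 ~ 8.0
```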
Stat 233, UCLA, Ivo Dinov Stat 233, UCLA, Ivo Dinov 9000 10000 11000 9000 12000 Disposable income ($) 10000 11000 12000 Correlation coefficient (-1<=R<=1): a measure of linear association, or clustering around a line of multivariate data. Relationship between two variables (X, Y) can be summarized by: (µX, σX), (µY, σY) and the correlation coefficient, R. R=1, perfect positive correlation (straight line relationship), R =0, no correlation (random cloud scatter), R = –1, perfect negative correlation. Computing R(X,Y): (standardize, multiply, average) Disposable income ($) z Regression is a way of studying relationships between variables (random/nonrandom) for predicting or explaining behavior of 1 variable (response) in terms of others (explanatory variables or predictors). From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 1999. Slide 83 Stat 233, UCLA, Ivo Dinov Slide 82 Correlation Coefficient Regression relationship = trend + residual scatter (a) Sales/income Stat 233, UCLA, Ivo Dinov R( X , Y ) = N x − µ y − µ X={x1, x2,…, xN,} 1 ∑ Y={y , y ,…, y ,} N − 1 k = 1 σ σ (µ , σ1 ),2(µ , σN ) k x x k y y X X Y Y sample mean / SD. Slide 84 Stat 233, UCLA, Ivo Dinov 14 Correlation Coefficient - Properties Correlation Coefficient - Properties Correlation is pseudo-invariant w.r.t. linear transformations of X or Y N xk − µx yk − µy 1 R( X , Y ) = ∑ = N − 1 k = 1 σx σy R( aX + b, cY + d ), since axk + b − µax + b axk + b − ( aµx + b) = = a × σx σax + b Correlation is Associative R( X , Y ) = 1 N x − µ y − µ ∑ = R (Y , X ) N k = 1 σ σ k x k x y y Correlation measures linear association, NOT an association in general!!! So, Corr(X,Y) could be misleading for X & Y related in a non-linear fashion. a ( xk − µ ) + b − b xk − µx = sign( a ) a × σx σx Slide 85 Slide 86 Stat 233, UCLA, Ivo Dinov Correlation Coefficient - Properties R( X , Y ) = x k y x 1. 2. Least squares criterion 1 N x − µ y − µ ∑ = R (Y , X ) N k = 1 σ σ k y R measures the extent of linear association between two continuous variables. Association does not imply causation - both variables may be affected by a third variable – age was a confounding variable. Least squares criterion: Choose the values of the parameters to minimize the sum of squared prediction errors (or sum of squared residuals), n ∑ (y i =1 Slide 87 2 Slope: Least-squares line: ith data point yˆ = βˆ0 + βˆ1x (x , y ) i yi ^y i i 0 ^ x1 x 2 . . Least-squares line: Slide 89 n ∑ ( xi − x )( yi − y ) βˆ1 = i = 1 ; n 2 ∑ ( xi − x ) i =1 [ Prediction y - ^y error i i ^ 1 Stat 233, UCLA, Ivo Dinov The least squares line Its parameters are denoted: Intercept: 2 − yˆi ) Slide 88 (c) Prediction errors Choose line with smallest sum of squared prediction errors Min Σ (yi − y^i ) i Stat 233, UCLA, Ivo Dinov The least squares line Least-squares line Stat 233, UCLA, Ivo Dinov xi . . . xn ] βˆ = y − βˆ x 0 1 yˆ = βˆ0 + βˆ1x Stat 233, UCLA, Ivo Dinov Slide 90 Stat 233, UCLA, Ivo Dinov 15 The simple linear model y Data generated from Y = 6 + 2x + error (U) Dotted line is true line and solid line is the data-estimated LS line. Note differences between true β0=6, β1=2 and their estimates β0^ & β1^. 
The simple linear model
• Data generated from Y = 6 + 2x + error (U): the dotted line is the true line and the solid line is the data-estimated LS line. Note the differences between the true β₀ = 6, β₁ = 2 and their estimates β̂₀ and β̂₁.
• (a) The simple linear model: when X = x, Y ~ Normal(µ_Y, σ), where µ_Y = β₀ + β₁x; OR, equivalently,
• (b) data sampled from the model: when X = x, Y = β₀ + β₁x + U, where the random error U ~ Normal(0, σ).
[Figure: panel (a) shows the model's Normal error distributions centered on the line at x₁, …, x₄; panel (b) shows data sampled from the model.]

Data generated from Y = 6 + 2x + error(U)
[Figure: five simulated samples and the combined data with their LS fits; the estimates scatter around the true values — intercept estimates range from roughly 3.6 to 9.1 and slope estimates from roughly 1.1 to 2.3, with the combined sample giving approximately β̂₀ = 7.44, β̂₁ = 1.59.]

[Figure 12.4.3: Histograms of least-squares estimates from 1,000 data sets generated from the model Y = 6 + 2x + U, where U ~ Normal(µ = 0, σ = 3). Estimates of the intercept β̂₀: mean = 6.05, std. dev. = 2.34 (true value 6). Estimates of the slope β̂₁: mean = 1.98, std. dev. = 0.46 (true value 2).]

R-Square
• A mathematical term describing how much of the variation in the data is being explained by the predictor X.
• R² = 1 − SS(residual)/SS(total), i.e., R² = 1 − Σ(data − regression)²/Σ(data − data_mean)², where SS(total) measures the total variation of the data about its mean. R² approaches 1 as the regression approaches a perfect fit.
• The R-squared value is the fraction of the variance (not 'variation') in the data that is explained by a regression model (which does not have to be simple-linear).

Summary
• For the simple linear model, least-squares estimates are unbiased [E(β̂) = β] and Normally distributed. Noisier data produce more-variable least-squares estimates.

Summary (cont.)
1. Before considering using the simple linear model, what sort of pattern would you be looking for in the scatter plot? (A linear trend with constant scatter spread across the range of X.)
2. What assumptions are made by the simple linear model (SLM)? (X is linearly related to the mean value of the Y observations at each x, µ_Y = β₀ + β₁x, where β₀ and β₁ are the true values of the intercept and slope of the SLM; the LS estimates β̂₀ and β̂₁ estimate the true values of β₀ and β₁; and the random errors U = Y − µ_Y ~ Normal(0, σ).)
3. If the simple linear model holds, what do you know about the sampling distributions of the least-squares estimates? (Unbiased and Normally distributed.)
4. In the simple linear model, what behavior is governed by σ? (The spread of the scatter of the data around the trend.)
5. Our estimate of σ can be thought of as a sample standard deviation for the set of prediction errors from the least-squares line.
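A re-simulation of the slide's 1,000-data-set experiment under the stated model Y = 6 + 2x + U, U ~ Normal(0, 3); the averaged estimates should land near the true (6, 2), illustrating the unbiasedness claim (the x-grid is our assumption):

```python
import random
from statistics import mean

def simulate_ls(b0=6.0, b1=2.0, sigma=3.0, reps=1000):
    """Simulate data sets from Y = b0 + b1*x + U, U ~ Normal(0, sigma),
    fit each by least squares, and average the estimates."""
    xs = [0, 1, 2, 3, 4, 5, 6, 7, 8]
    slopes, intercepts = [], []
    for _ in range(reps):
        ys = [b0 + b1 * x + random.gauss(0, sigma) for x in xs]
        xbar, ybar = mean(xs), mean(ys)
        sxx = sum((x - xbar) ** 2 for x in xs)
        b1_hat = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
        slopes.append(b1_hat)
        intercepts.append(ybar - b1_hat * xbar)
    return mean(intercepts), mean(slopes)   # should be close to (6, 2)

print(simulate_ls())
```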
RMS Error for regression
• Error = actual value − predicted value.
• For five data points with fitted values ŷₖ = β̂₀ + β̂₁xₖ, 1 ≤ k ≤ 5, the RMS error of the regression line Y = β₀ + β₁X is
  √[((y₁ − ŷ₁)² + (y₂ − ŷ₂)² + (y₃ − ŷ₃)² + (y₄ − ŷ₄)² + (y₅ − ŷ₅)²)/(5 − 1)].

Compute the RMS error for this regression line (worked exercise; a computational sketch appears at the end of this regression review)
X: 1   2   3   4   5   6   7   8
Y: 9  15  12  19  11  20  22  18
• First compute the LS linear fit (estimate β̂₀ and β̂₁).
• Then compute the individual errors yₖ − ŷₖ, where ŷₖ = β̂₀ + β̂₁xₖ, 1 ≤ k ≤ 8.
• Finally compute the cumulative RMS measure √(Σₖ (yₖ − ŷₖ)²/(n − 1)).
• Note on the correlation coefficient formula: R(X, Y) = (1/(N − 1)) Σₖ ((xₖ − µ_X)/σ_X)((yₖ − µ_Y)/σ_Y), with sample means and SDs.

Recall the correlation coefficient
Another form for the correlation coefficient is
  R(X; Y) = Corr(X; Y) = Σᵢ(xᵢ − x̄)(yᵢ − ȳ) / √(Σᵢ(xᵢ − x̄)² × Σᵢ(yᵢ − ȳ)²)
          = (Σᵢ xᵢyᵢ − n·x̄·ȳ) / √(Σᵢ(xᵢ − x̄)² × Σᵢ(yᵢ − ȳ)²),
using the expansion Σᵢ(xᵢ − x̄)(yᵢ − ȳ) = Σᵢ(xᵢyᵢ + x̄ȳ − x̄yᵢ − xᵢȳ) = Σᵢ xᵢyᵢ − n·x̄·ȳ.

Misuse of the correlation coefficient
[Figure: some patterns with r = 0 — panels (a), (b), (c), each showing clearly structured (nonlinear) data whose correlation coefficient is nevertheless 0.]

Linear regression
• Regression relationship = trend + residual scatter: ŷ = β̂₀ + β̂₁x + Err.
• Trend = best linear fit line (LS): β̂₁ = Σᵢ(xᵢ − x̄)(yᵢ − ȳ)/Σᵢ(xᵢ − x̄)²; β̂₀ = ȳ − β̂₁x̄.
• Scatter = residual (prediction) error, Err = Obs − Pred: Σᵢ(yᵢ − ŷᵢ)² = (y₁ − ŷ₁)² + (y₂ − ŷ₂)² + … + (yₙ − ŷₙ)².

Another notation for the slope of the LS line
• Note that there is a slightly different form of the formula for the slope of the least-squares best-linear-fit line:
  β̂₁ = Σᵢ(xᵢ − x̄)(yᵢ − ȳ)/Σᵢ(xᵢ − x̄)² = Corr(X; Y) × SD(Y)/SD(X);   β̂₀ = ȳ − β̂₁x̄.

Definition of the population median
1. The population median is defined as the number in the middle of the distribution of the RV: 50% of the data lies below and 50% above the median.
2. Under what circumstances is the population median the same as the population mean? (When the distribution is symmetric.)
3. Why do we use the population median rather than the population mean in the sign test? (For a skewed distribution the mean may not be representative, or may be heavily influenced by outliers.)
4. Why is the model for the sign test like tossing a fair coin? (In the sign test we test H₀: µ̃ = 0; under H₀ a random observation is as likely to fall below µ̃ as above µ̃, so each observation carries a + or − sign with the same probability — hence the coin-toss model, a distribution-free, nonparametric approach. Testing H₀ is just like testing whether a coin is biased or unbiased.)
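The computational sketch promised in the RMS-error exercise above, using the slide's eight (X, Y) pairs:

```python
from math import sqrt
from statistics import mean

def rms_error(x, y):
    """Fit the LS line, then take the RMS of the prediction errors
    with the slide's (n - 1) divisor."""
    xbar, ybar = mean(x), mean(y)
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / \
         sum((a - xbar) ** 2 for a in x)
    b0 = ybar - b1 * xbar
    errs = [b - (b0 + b1 * a) for a, b in zip(x, y)]   # actual - predicted
    return sqrt(sum(e * e for e in errs) / (len(x) - 1))

x = list(range(1, 9))
y = [9, 15, 12, 19, 11, 20, 22, 18]    # the slide's eight data points
print(rms_error(x, y))
```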
Why Use Nonparametric Statistics?
• Parametric tests are based upon assumptions that may include the following:
  - The data have the same variance, regardless of the treatments or conditions in the experiment.
  - The data are normally distributed for each of the treatments or conditions in the experiment.
• What happens when we are not sure that these assumptions have been satisfied?

How Do Nonparametric Tests Compare with the Usual z, t, and F Tests?
• Studies have shown that when the usual assumptions are satisfied, nonparametric tests are about 95% efficient when compared to their parametric counterparts.
• When normality and common variance are not satisfied, the nonparametric procedures can be much more efficient than their parametric equivalents.

The Wilcoxon Rank Sum Test
• Suppose we wish to test the hypothesis that two distributions have the same center.
• We select two independent random samples, one from each population. Designate each observation from population 1 as an "A" and each observation from population 2 as a "B".
• If H₀ is true, and the two samples have been drawn from the same population, then when we rank the values in both samples from small to large, the A's and B's should be randomly mixed in the rankings.

What happens when H₀ is true?
• Suppose we had 5 measurements (A) from population 1 and 6 measurements (B) from population 2.
• If they were drawn from the same population, the rankings might look like this: A B A B B A B A B B A.
• In this case, if we summed the ranks of the A measurements and the ranks of the B measurements, the sums would be similar.

What happens if H₀ is not true?
• If the observations come from two different populations, perhaps with population 1 lying to the left of population 2, the ranking of the observations might take the following ordering: A A A B A B A B B B.
• In this case the sum of the ranks of the B observations would be larger than that for the A observations.

How to Implement Wilcoxon's Rank Test
• Rank the combined sample from smallest to largest.
• Let T₁ represent the sum of the ranks of the first sample (the A's).
• Then T₁* = n₁(n₁ + n₂ + 1) − T₁ is the sum of the ranks that the A's would have had if the observations were ranked from large to small.

The Wilcoxon Rank Sum Test — aka the Mann-Whitney U test
• H₀: the two population distributions are the same; Hₐ: the two populations are in some way different.
• The test statistic is the smaller of T₁ and T₁*.
• Reject H₀ if the test statistic is less than the critical value found in the Wilcoxon rank sum test table.

Example: the bee problem
The wing-stroke frequencies of two species of bees were recorded for samples of n₁ = 4 from species 1 and n₂ = 6 from species 2. Can you conclude that the distributions of wing strokes differ for these two species? Use α = .05.
• H₀: the two species are the same; Hₐ: the two species are in some way different.
• The sample with the smaller sample size is called sample 1.
• We rank the 10 observations from smallest to largest, shown in parentheses below. If several measurements are tied, each gets the average of the ranks they would have gotten if they were not tied (see x = 180).
  Species 1: 235 (10), 225 (9), 190 (8), 188 (7)
  Species 2: 180 (3.5), 169 (1), 180 (3.5), 185 (6), 178 (2), 182 (5)
The bee problem (cont.)
• Calculate T₁ = 7 + 8 + 9 + 10 = 34 and T₁* = n₁(n₁ + n₂ + 1) − T₁ = 4(4 + 6 + 1) − 34 = 10.
• The test statistic is T = min(T₁, T₁*) = 10.
• The critical value of T from Table 7(b) for a two-tailed test with α/2 = .025 is T = 12; H₀ is rejected if T ≤ 12.
• Reject H₀: the data provide sufficient evidence indicating a difference in the distributions of wing-stroke frequencies. (A computational check appears after the next slide.)

Minitab output
Mann-Whitney Test and CI: Species1, Species2
Species1  N = 4  Median = 207.50
Species2  N = 6  Median = 180.00
Point estimate for ETA1-ETA2 is 30.50
95.7 Percent CI for ETA1-ETA2 is (5.99, 56.01)
W = 34.0
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.0142
The test is significant at 0.0139 (adjusted for ties)
• Minitab calls the procedure the Mann-Whitney U test, equivalent to the Wilcoxon rank sum test. The test statistic is W = T₁ = 34, with p-value = .0142, so H₀ is rejected at α = .05 (though it would not be at α = .01).

Large Sample Approximation: Wilcoxon Rank Sum Test
When n₁ and n₂ are large (greater than 10 is large enough), a normal approximation can be used to approximate the critical values:
1. Calculate T₁ and T₁* = n₁(n₁ + n₂ + 1) − T₁, where T₁ is the sum of the ranks of sample 1 (the A's). Let T = min(T₁, T₁*).
2. The statistic z = (T − µ_T)/σ_T has an approximate standard Normal distribution, with
   µ_T = n₁(n₁ + n₂ + 1)/2 and σ_T² = n₁n₂(n₁ + n₂ + 1)/12.

Wilcoxon Rank Sum test ↔ two-sample t test
When should you use the Wilcoxon rank sum test instead of the two-sample t test for independent samples?
• when the responses can only be ranked and not quantified (e.g., ordinal qualitative data);
• when the F test or the rule of thumb shows a problem with equality of variances;
• when a normality plot shows a violation of the normality assumption.
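The computational check promised for the bee example: a minimal rank-sum implementation with average ranks for ties, plus the large-sample z (shown only for illustration, since n₁ and n₂ are small here):

```python
from math import sqrt

def rank_sum(sample1, sample2):
    """Wilcoxon rank-sum: rank the pooled data (ties share the average rank),
    then sum sample 1's ranks (T1) and compute T1* = n1(n1+n2+1) - T1."""
    pooled = sorted(sample1 + sample2)
    def avg_rank(v):                        # ties (e.g., x = 180) share a rank
        idx = [i + 1 for i, p in enumerate(pooled) if p == v]
        return sum(idx) / len(idx)
    t1 = sum(avg_rank(v) for v in sample1)
    n1, n2 = len(sample1), len(sample2)
    return t1, n1 * (n1 + n2 + 1) - t1

species1 = [235, 225, 190, 188]              # the slide's bee data (n1 = 4)
species2 = [180, 169, 180, 185, 178, 182]    # (n2 = 6)
t1, t1_star = rank_sum(species1, species2)
print(t1, t1_star)                           # 34.0 and 10.0, as on the slide

# Large-sample Normal approximation for T = min(T1, T1*)
n1, n2 = len(species1), len(species2)
mu_t = n1 * (n1 + n2 + 1) / 2
sigma_t = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
print((min(t1, t1_star) - mu_t) / sigma_t)   # z ~ -2.56
```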
The Sign Test
• The sign test is a fairly simple procedure that can be used to compare two populations when the samples consist of paired observations.
• It can be used when the assumptions required for the paired-difference test are not valid, or when the responses can only be ranked as "one better than the other" but cannot be quantified.
• Procedure: for each pair, measure whether the first response — say, A — exceeds the second response — say, B. The test statistic is x, the number of times that A exceeds B in the n pairs of observations. Only pairs without ties are included in the test. Critical values for the rejection region or exact p-values can be found using the cumulative binomial distribution (SOCR resource online).
• Hypotheses: H₀: the two populations are identical versus Hₐ: a one- or two-tailed alternative. This is equivalent to H₀: p = P(A exceeds B) = .5 versus Hₐ: p (≠, <, or >) .5. Test statistic: x = number of plus signs; rejection region and p-values come from Bin(n = size, p).

The Gourmet Chefs example
Two gourmet chefs each tasted and rated eight different meals from 1 to 10. Does it appear that one of the chefs tends to give higher ratings than the other? Use α = .01.
Meal:    1  2  3  4  5  6  7  8
Chef A:  6  4  7  8  2  4  9  7
Chef B:  8  5  4  7  3  7  9  8
Sign:    −  −  +  +  −  −  0  −
• H₀: the rating distributions are the same (p = .5); Hₐ: the ratings are different (p ≠ .5).
• With n = 7 (the tied pair is omitted), the test statistic is x = number of plus signs = 2.
• Use the cumulative binomial table with n = 7 and p = .5:
  k:         0     1     2     3     4     5     6     7
  P(x ≤ k): .008  .062  .227  .500  .773  .938  .992  1.000
• p-value = P(observe x = 2 or something equally as unlikely) = P(x ≤ 2) + P(x ≥ 5) = 2(.227) = .454.
• The p-value = .454 is too large to reject H₀: there is insufficient evidence to indicate that one chef tends to rate one meal higher than the other.

Large Sample Approximation: The Sign Test
When n ≥ 25, a normal approximation can be used in place of the Binomial critical values:
1. Calculate x = number of plus signs.
2. The statistic z = (x − .5n)/(.5√n) has an approximate standard Normal distribution.
   (Recall: Y ~ Bin(n, p) ⇒ E(Y) = np and Var(Y) = np(1 − p).)

Example: accidents by shift
You record the number of accidents per day at a large manufacturing plant for both the day and evening shifts for n = 100 days. You find that the number of accidents per day for the evening shift x_E exceeded the corresponding number for the day shift x_D on 63 of the 100 days. Do these results provide sufficient evidence to indicate that more accidents tend to occur on one shift than on the other?
• H₀: the distributions (of accident counts) are the same (p = .5); Hₐ: the distributions are different (p ≠ .5).
• Test statistic: z = (x − .5n)/(.5√n) = (63 − .5(100))/(.5√100) = 2.60.
• For a two-tailed test, we reject H₀ if |z| > 1.96 (5% level). H₀ is rejected: there is evidence of a difference between the day and night shifts.
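A minimal exact sign-test sketch using the Binomial(n, 0.5) model above; it reproduces the chefs example's two-sided p-value (≈ .454 on the slide, from rounded table entries):

```python
from math import comb

def sign_test_pvalue(x, n):
    """Two-sided exact sign-test p-value from Binomial(n, 0.5)."""
    def cdf(k):   # P(X <= k) for X ~ Bin(n, 0.5)
        return sum(comb(n, j) for j in range(k + 1)) / 2 ** n
    tail = min(cdf(x), 1 - cdf(x - 1))   # the smaller tail containing x
    return 2 * tail

# Chefs example: x = 2 plus signs among n = 7 untied pairs
print(sign_test_pvalue(2, 7))   # ~0.453
```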
Which test should you use?

• We compare statistical tests using their power.
Definition: Power = 1 − β = P(reject H0 when Ha is true).
• The power of the test is the probability of rejecting the null hypothesis when it is false and some specified alternative is true.
• The power is the probability that the test will do what it was designed to do: detect a departure from the null hypothesis when a departure exists.
• If all parametric assumptions have been met, the parametric test will be the most powerful.
• If not, a nonparametric test may be more powerful.
• If you can reject H0 with a less powerful nonparametric test, you will not have to worry about parametric assumptions.
• If not, you might try a more powerful nonparametric test or increase the sample size to gain more power.

The Wilcoxon Signed-Rank Test

• The Wilcoxon Signed-Rank Test (the paired-sample counterpart of the Wilcoxon Rank Sum Test) is a more powerful nonparametric procedure that can be used to compare two populations when the samples consist of paired observations.
• It uses the ranks of the differences, d = x1 − x2, that we used in the paired-difference test.

Procedure:
✓ For each pair, calculate the difference d = x1 − x2. Eliminate zero differences.
✓ Rank the absolute values of the differences from 1 to n. Tied observations are assigned the average of the ranks they would have gotten if not tied.
✓ T+ = rank sum for positive differences; T− = rank sum for negative differences.
✓ If the two populations are the same, T+ and T− should be nearly equal. If either T+ or T− is unusually large, this provides evidence against the null hypothesis.

H0: the two populations are identical versus Ha: one- or two-tailed alternative.
Test statistic: T = min(T+, T−).
Critical values for a one- or two-tailed rejection region can be found using the Wilcoxon Signed-Rank Test Table.
Example (cake densities): To compare the densities of cakes using mixes A and B, six pairs of pans (A and B) were baked side-by-side in six different oven locations. Is there evidence of a difference in density for the two cake mixes?

Location     1      2      3      4      5      6
Cake Mix A   .135   .102   .098   .141   .131   .144
Cake Mix B   .129   .120   .112   .152   .135   .163
d = xA − xB  .006  −.018  −.014  −.011  −.004  −.019
Rank         2      5      4      3      1      6

H0: the density distributions are the same
Ha: the density distributions are different

Rank the 6 differences, without regard to sign.
Calculate: T+ = 2 and T− = 5 + 4 + 3 + 1 + 6 = 19. The test statistic is T = min(T+, T−) = 2.
Rejection region: Use Table 8. For a two-tailed test with α = .05, reject H0 if T ≤ 1.
Do not reject H0. There is insufficient evidence to indicate that there is a difference in densities for the two cake mixes.

Large-Sample Approximation: The Signed-Rank Test

When n ≥ 25, a normal approximation can be used for the critical values in Table 8:
1. Calculate T+ and T−. Let T = min(T+, T−).
2. The statistic z = (T − µT)/σT has an approximate N(0, 1) distribution, with
   µT = n(n + 1)/4 and σT² = n(n + 1)(2n + 1)/24.
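The cake-density test can be reproduced with scipy.stats.wilcoxon; a minimal sketch (with n = 6, no zeros, and no ties, scipy uses the exact null distribution by default; the exact p-value is not on the slide and is shown only as program output):

```python
from scipy.stats import wilcoxon

# Paired differences d = xA - xB from the six oven locations.
d = [0.006, -0.018, -0.014, -0.011, -0.004, -0.019]

# For a two-sided test, wilcoxon returns T = min(T+, T-);
# here T+ = 2 and T- = 19, so T = 2.
res = wilcoxon(d, alternative="two-sided")
print(res.statistic, res.pvalue)   # 2.0, exact p = 0.09375 -> do not reject at alpha = .05
```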
The Kruskal-Wallis H Test

• The Kruskal-Wallis H Test is a nonparametric procedure that can be used to compare more than two populations in a completely randomized design. It is the nonparametric equivalent of the ANOVA F-test.
• All n = n1 + n2 + … + nk measurements are jointly ranked, and we use the sums of the ranks of the k samples to compare the distributions.

Procedure:
✓ Rank the total measurements in all k samples from 1 to n. Tied observations are assigned the average of the ranks they would have gotten if not tied.
✓ Calculate Ti = rank sum for the i-th sample, i = 1, 2, …, k, where n = n1 + n2 + … + nk.
✓ The test statistic H (the analog of F = MST/MSE) is
   H = [12/(n(n + 1))] Σᵢ Ti²/ni − 3(n + 1).

H0: the k distributions are identical versus Ha: at least one distribution is different.
When H0 is true, the test statistic H has an approximate χ² distribution with df = k − 1. Use a right-tailed rejection region or p-value based on the chi-square distribution.

Example (teaching methods): Four groups of students were randomly assigned to be taught with four different techniques, and their achievement test scores were recorded. Are the distributions of test scores the same, or do they differ in location?

Method   1        2        3        4
Scores   65 (3)   75 (7)   59 (1)   94 (16)
         87 (13)  69 (5)   78 (8)   89 (15)
         73 (6)   83 (12)  67 (4)   80 (10)
         79 (9)   81 (11)  62 (2)   88 (14)
Ti       31       35       15       55

Rank the 16 measurements from 1 to 16 (shown in parentheses), and calculate the four rank sums.
H0: the distributions of scores are the same; Ha: the distributions differ in location.
Test statistic:
H = [12/(n(n + 1))] Σ Ti²/ni − 3(n + 1) = [12/(16·17)]·(31² + 35² + 15² + 55²)/4 − 3(17) = 8.96.
Rejection region: Use Table 5. For a right-tailed chi-square test with α = .05 and df = 4 − 1 = 3, reject H0 if H ≥ 7.81.
Reject H0. There is sufficient evidence to indicate that there is a difference in test scores for the four teaching techniques.
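The H statistic above is exactly what scipy.stats.kruskal computes (its tie correction is inactive here since there are no ties); a minimal check:

```python
from scipy.stats import kruskal, chi2

g1 = [65, 87, 73, 79]   # teaching method 1
g2 = [75, 69, 83, 81]   # teaching method 2
g3 = [59, 78, 67, 62]   # teaching method 3
g4 = [94, 89, 80, 88]   # teaching method 4

H, p = kruskal(g1, g2, g3, g4)
print(H, p)                    # H ~ 8.96, p ~ 0.030
print(chi2.isf(0.05, df=3))    # 7.81, the critical value used above
```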
Key Concepts

I. Nonparametric Methods
1. These methods can be used when the data cannot be measured on a quantitative scale, or when
2. the numerical scale of measurement is arbitrarily set by the researcher, or when
3. the parametric assumptions such as normality or constant variance are seriously violated.

II. Wilcoxon Rank Sum Test: Independent Random Samples
1. Jointly rank the two samples. Designate the smaller sample as sample 1. Then T1 = rank sum of sample 1 and T1* = n1(n1 + n2 + 1) − T1.
2. Use T1 to test for population 1 to the left of population 2; use T1* to test for population 1 to the right of population 2; use the smaller of T1 and T1* to test for a difference in the locations of the two populations.
3. Table 7 of Appendix I has critical values for the rejection of H0.
4. When the sample sizes are large, use the normal approximation z = (T − µT)/σT, with µT = n1(n1 + n2 + 1)/2 and σT² = n1·n2(n1 + n2 + 1)/12.

III. Wilcoxon Signed-Rank Test: Paired Experiment
1. Calculate the differences in the paired observations. Rank the absolute values of the differences. Calculate the rank sums T+ and T− for the positive and negative differences, respectively. The test statistic T is the smaller of the two rank sums.
2. Table 8 of Appendix I has critical values for the rejection of H0 for both one- and two-tailed tests.
3. When the sample sizes are large, use the normal approximation z = (T − n(n + 1)/4)/√(n(n + 1)(2n + 1)/24).

IV. Sign Test for a Paired Experiment
1. Find x, the number of times that observation A exceeds observation B for a given pair.
2. To test for a difference in two populations, test H0: p = 0.5 versus a one- or two-tailed alternative.
3. Use Table 1 of Appendix I to calculate the p-value for the test.
4. When the sample sizes are large, use the normal approximation z = (x − .5n)/(.5√n).

V. Kruskal-Wallis H Test: Completely Randomized Design
1. Jointly rank the n observations in the k samples. Calculate the rank sums, Ti = rank sum of sample i, and the test statistic H = [12/(n(n + 1))] Σ Ti²/ni − 3(n + 1).
2. If the null hypothesis of equality of distributions is false, H will be unusually large, resulting in a one-tailed test.
3. For sample sizes of five or greater, the rejection region for H is based on the chi-square distribution with (k − 1) degrees of freedom.

VI. The Friedman Fr Test: Randomized Block Design
1. Rank the responses within each block from 1 to k. Calculate the rank sums T1, T2, …, Tk, and the test statistic Fr = [12/(bk(k + 1))] Σ Ti² − 3b(k + 1).
2. If the null hypothesis of equality of treatment distributions is false, Fr will be unusually large, resulting in a one-tailed test.
3. For block sizes of five or greater, the rejection region for Fr is based on the chi-square distribution with (k − 1) degrees of freedom.

VII. Spearman's Rank Correlation Coefficient
1. Rank the responses for the two variables from smallest to largest.
2. Calculate the correlation coefficient for the ranked observations: rs = Sxy/√(Sxx·Syy), or rs = 1 − 6Σdi²/(n(n² − 1)) if there are no ties.
3. Table 9 in Appendix I gives critical values for rank correlations significantly different from 0.
4. The rank correlation coefficient detects not only significant linear correlation but also any other monotonic relationship between the two variables.

Sensitivity vs. Specificity

• Sensitivity is a measure of the fraction of gold-standard positive examples that are correctly classified/identified: Sensitivity = TP/(TP + FN).
• Specificity is a measure of the fraction of negative examples that are correctly classified: Specificity = TN/(TN + FP).
• TP = True Positives, TN = True Negatives, FP = False Positives, FN = False Negatives.

Test results vs. true reality (H0: no effect, µ = 0):

               H0 true   H0 false
Can't reject   TN        FN
Reject H0      FP        TP

Receiver-Operating Characteristic (ROC) Curve

[Figure: ROC curves plotting sensitivity (fraction of true positives) against 1 − specificity (fraction of false positives); a better test bows toward the upper-left corner, a worse test lies closer to the diagonal. A bigger area under the curve means higher test accuracy and better discrimination, i.e., the ability of the test to correctly classify those with and without the disease.]

The ROC curve demonstrates several things:
• It shows the trade-off between sensitivity and specificity (any increase in sensitivity will be accompanied by a decrease in specificity).
• The closer the curve follows the left border and then the top border of the ROC space, the more accurate the test.
• The closer the curve comes to the 45-degree diagonal of the ROC space, the less accurate the test.
• The slope of the tangent line at a cut-point gives the likelihood ratio (LR) for that value of the test (this can be checked on the ROC graph).
• The area under the curve is a measure of test accuracy.
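A minimal from-scratch sketch of these definitions: sensitivity, specificity, and an empirical ROC curve with a trapezoid-rule AUC. The scores, labels, and the 0.5 cut-point are hypothetical, for illustration only.

```python
import numpy as np

# Hypothetical test scores and gold-standard labels (1 = disease).
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1])
truth  = np.array([1,   1,   0,   1,   0,    1,   0,   0,   1,   0])

def sens_spec(cut):
    pred = scores >= cut                       # classify "positive" at or above the cut
    tp = np.sum(pred & (truth == 1)); fn = np.sum(~pred & (truth == 1))
    tn = np.sum(~pred & (truth == 0)); fp = np.sum(pred & (truth == 0))
    return tp / (tp + fn), tn / (tn + fp)      # sensitivity, specificity

print(sens_spec(0.5))   # one operating point on the ROC curve

# Sweep the cut-point to trace the ROC curve; AUC by the trapezoid rule.
cuts = np.sort(np.unique(scores))[::-1]
pts = [(1 - sp, se) for se, sp in (sens_spec(c) for c in cuts)]
pts = [(0.0, 0.0)] + pts + [(1.0, 1.0)]
fpr, tpr = zip(*sorted(pts))
auc = sum((fpr[i + 1] - fpr[i]) * (tpr[i + 1] + tpr[i]) / 2
          for i in range(len(fpr) - 1))
print(auc)              # area under the curve = overall test accuracy
```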
Bias and Precision

• The bias in an estimator is the distance between the center of the sampling distribution of the estimator and the true value of the parameter being estimated. In math terms, bias = E(Θ̂) − θ, where Θ̂ is the estimator, as a RV, of the true (unknown) parameter θ.
• The precision of an estimator is a measure of how variable the estimator is in repeated sampling.

[Figure: four sampling distributions around the true parameter value: (a) no bias, high precision; (b) no bias, low precision; (c) biased, high precision; (d) biased, low precision.]

• Example: Why is the sample mean an unbiased estimate for the population mean? How about ¾ of the sample mean?
   E(Θ̂) − µ = E[(1/n) Σₖ₌₁ⁿ Xk] − µ = 0, but
   E[(3/(4n)) Σₖ₌₁ⁿ Xk] − µ = (3/4)µ − µ = −µ/4 ≠ 0, in general.

Bias and Precision – two unbiased estimators of µ

• Suppose {Y1, Y2, …, Y4N} are IID (µ, σ). Let
   Ȳ1 = (1/(3N)) Σₖ₌₁³ᴺ Yk,  Ȳ2 = (1/N) Σₖ₌₃ₙ₊₁⁴ᴺ Yk,
   Ȳ = (1/(4N)) Σₖ₌₁⁴ᴺ Yk,  and  Ỹ = (Ȳ1 + Ȳ2)/2.
• Which estimator is better, Ȳ or Ỹ? E(Ȳ) = E(Ỹ) = µ, but
   Var(Ȳ) = σ²/(4N) < Var(Ỹ) = (1/4)(σ²/(3N) + σ²/N) = σ²/(3N).
• Both are unbiased, but the overall mean Ȳ has the smaller variance, so it is the more precise estimator.

Continuous Distributions – χ² (Chi-Square)

• χ² (Chi-Square) goodness-of-fit background: let {X1, X2, …, XN} be IID N(0, 1) and W = X1² + X2² + … + XN². Then W ~ χ²(df = N), with E(W) = N and Var(W) = 2N.
• Note: if {Y1, Y2, …, YN} are IID N(µ, σ²) and SD²(Y) = (1/(N − 1)) Σₖ₌₁ᴺ (Yk − Ȳ)², then the statistic W = (N − 1)·SD²(Y)/σ² ~ χ²(df = N − 1).

Continuous Distributions – F-distribution

• The F-distribution is the ratio of two χ² random variables, each divided by its degrees of freedom.
• Snedecor's F distribution is most commonly used in tests of variance (e.g., ANOVA): the ratio of two chi-squares divided by their respective degrees of freedom is said to follow an F distribution. With
   SD²(Y) = (1/(N − 1)) Σₖ₌₁ᴺ (Yk − Ȳ)² and SD²(X) = (1/(M − 1)) Σₗ₌₁ᴹ (Xl − X̄)²,
   let WY = (N − 1)·SD²(Y)/σY² and WX = (M − 1)·SD²(X)/σX². Then
   F0 = [WY/(N − 1)] / [WX/(M − 1)] ~ F(df1 = N − 1, df2 = M − 1).
• ANOVA setting: k independent samples {Yi;1, Yi;2, …, Yi;Ni} IID from Normal(µi; σi), i = 1, …, k, with equal standard deviations σ1 = σ2 = … = σk = σ (rule of thumb: 1/2 ≤ σk/σj ≤ 2 is acceptable).

Continuous Distributions – Cauchy's

• Cauchy's distribution, X ~ Cauchy(t, s), t = location, s = scale.
• PDF(X): f(x) = 1/(sπ(1 + ((x − t)/s)²)), x ∈ ℝ (reals).
• PDF of the standard Cauchy(0, 1): f(x) = 1/(π(1 + x²)).
• The Cauchy distribution is (theoretically) important as an example of a pathological case. Cauchy distributions look similar to a normal distribution, but they have much heavier tails. When studying hypothesis tests that assume normality, seeing how the tests perform on data from a Cauchy distribution is a good indicator of how sensitive the tests are to heavy-tail departures from normality. The mean and standard deviation of the Cauchy distribution are undefined: the practical meaning of this is that collecting 1,000 data points gives no more accurate an estimate of the mean and standard deviation than does a single point. (Cauchy → T_df → Normal as the degrees of freedom grow.)
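The Cauchy pathology is easy to see by simulation; a minimal sketch (the sample size and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_cauchy(100_000)   # IID standard Cauchy(0, 1) draws

# The running mean of Cauchy data never settles down: the mean is undefined.
for n in (10, 1_000, 100_000):
    print(n, x[:n].mean())

# The sample median, by contrast, is a sensible estimate of the location t.
print(np.median(x))                # ~ 0
```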
Continuous Distributions – Review

• Uniform, (general/standard) Normal, Student's T, F, χ², and Cauchy distributions. By-hand calculations vs. ProbCalc.htm.
• It remains to see a good ANOVA (F-distribution) example:
   SYSTAT → File → Load (Data) → C:\Ivo.dir\Research\Data.dir\WM_GM_CSF_tissueMaps.dir\ATLAS_IVO_WM_GM.xls → Statistics → ANOVA → Est. Model → Dependent (Value) → Factors (Method, Hemi, TissueType) [for 1/2/3-way ANOVA].

Continuous Distributions – Exponential

• Exponential distribution, X ~ Exponential(λ): f(x) = λe^(−λx), x ≥ 0; E(X) = 1/λ; Var(X) = 1/λ².
• The exponential model, with only one unknown parameter, is the simplest of all life distribution models.
• Another name for the exponential mean is the Mean Time To Fail (MTTF): MTTF = 1/λ.
• If X is the time between occurrences of rare events that happen on average with a rate λ per unit of time, then X is distributed exponentially with parameter λ. Thus, the exponential distribution is frequently used to model the time interval between successive random events. Examples of variables distributed in this manner are the gap length between cars crossing an intersection, lifetimes of electronic devices, and arrivals of customers at the check-out counter in a grocery store.

• Example: On weeknight shifts between 6 pm and 10 pm, there are an average of 5.2 calls to the UCLA medical emergency number. Let X measure the time needed for the first call on such a shift. Find the probability that the first call arrives (a) between 6:15 and 6:45, (b) before 7:00. Also find the median time needed for the first call. (Answers: 34.578%; 72.865%; median ≈ 32 minutes.)
   We must first determine the correct average of this exponential distribution. If we consider the time interval to be 4 × 60 = 240 minutes, then on average there is a call every 240/5.2 ≈ 46.15 minutes. Then X ~ Exp(1/46), E(X) = 46, measures the time in minutes after 6:00 pm until the first call.

Continuous Distributions – Exponential Examples

• Customers arrive at a certain store at an average of 15 per hour. What is the probability that the manager must wait at least 5 minutes for the first customer?
   Here the average waiting time is 60/15 = 4 minutes, so X ~ Exp(1/4), E(X) = 4. We want P(X > 5) = 1 − P(X ≤ 5) = e^(−5/4) ≈ .2865. So around 28.65% of the time the store must wait at least 5 minutes for the first customer.
• The exponential distribution is often used in probability to model (remaining) lifetimes of mechanical objects for which the average lifetime is known and for which the probability distribution is assumed to decay exponentially.
• Suppose that after the first 6 hours, the average remaining lifetime of batteries for a portable compact disc player is 8 hours. Find the probability that a set of batteries lasts between 12 and 16 hours.
   The remaining lifetime can be assumed to be X ~ Exp(1/8), E(X) = 8. For the total lifetime to be from 12 to 16 hours, the remaining lifetime must be from 6 to 10 hours: P(6 ≤ X ≤ 10) = e^(−6/8) − e^(−10/8) ≈ .1859.

Recall the example of the Poisson approximation to the Binomial

• Suppose P(defective chip) = 0.0001 = 10⁻⁴. Find the probability that a lot of 25,000 chips has more than 2 defectives.
• Y ~ Binomial(25,000, 0.0001); find P(Y > 2). Note that Z ~ Poisson(λ = np = 25,000 × 0.0001 = 2.5), and
   P(Z > 2) = 1 − P(Z ≤ 2) = 1 − Σ_{z=0}^{2} 2.5^z e^(−2.5)/z! = 1 − (2.5⁰/0! + 2.5¹/1! + 2.5²/2!)·e^(−2.5) = 0.456.
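A minimal check of these exponential (and Poisson) calculations with scipy.stats; note that expon's scale parameter is the mean 1/λ:

```python
from scipy.stats import expon, poisson

# Store: 15 customers/hour -> mean wait 4 min; P(wait > 5 min).
print(expon.sf(5, scale=4))                             # e^(-5/4) ~ 0.2865

# Batteries: remaining life ~ Exp(1/8); P(6 <= X <= 10).
print(expon.cdf(10, scale=8) - expon.cdf(6, scale=8))   # ~ 0.1859

# Defective chips: Poisson(2.5) approximation to Bin(25000, 1e-4).
print(poisson.sf(2, 2.5))                               # P(Z > 2) ~ 0.456
```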
Normal approximation to Binomial

• Suppose Y ~ Binomial(n, p). Then Y = Y1 + Y2 + … + Yn, where the Yk ~ Bernoulli(p) are independent, with E(Yk) = p and Var(Yk) = p(1 − p). Hence E(Y) = np, Var(Y) = np(1 − p), and SD(Y) = √(np(1 − p)).
• Standardize Y: Z = (Y − np)/√(np(1 − p)). By the CLT, Z ≈ N(0, 1), so Y ≈ N(np, √(np(1 − p))).
• The normal approximation to the Binomial is reasonable when np ≥ 10 and n(1 − p) > 10 (i.e., p and 1 − p are not too small relative to n).

Normal approximation to Binomial – Example

• Roulette wheel investigation (roulette has 38 slots: 18 red, 18 black, 2 neutral, so p = 18/38 ≈ 0.47 for red): compute P(Y ≥ 58), where Y ~ Binomial(100, 0.47), i.e., the proportion of the Binomial(100, 0.47) population having at least 58 reds (successes) out of 100 roulette spins (trials).
• Since np = 47 ≥ 10 and n(1 − p) = 53 > 10, the normal approximation is justified.
• z = (Y − np)/√(np(1 − p)) = (58 − 100·0.47)/√(100·0.47·0.53) ≈ 2.2.
• P(Y ≥ 58) ≈ P(Z ≥ 2.2) = 0.0139.
• The true value is P(Y ≥ 58) ≈ 0.0177, using SOCR (demo!). The normal approximation is useful when no access to SOCR or similar software is available.

Normal approximation to Poisson

• If X1 ~ Poisson(λ) and X2 ~ Poisson(µ) are independent, then X1 + X2 ~ Poisson(λ + µ).
• Let X1, X2, X3, …, Xk ~ Poisson(λ) be independent, and Yk = X1 + X2 + … + Xk ~ Poisson(kλ), so E(Yk) = Var(Yk) = kλ.
• By the CLT, the distribution of the standardized variable (Yk − kλ)/(kλ)^(1/2) → N(0, 1) as k increases to infinity.
• So, for kλ ≥ 100, Zk = (Yk − kλ)/(kλ)^(1/2) ≈ N(0, 1), i.e., Yk ≈ N(kλ, (kλ)^(1/2)).

Normal approximation to Poisson – Example

• Let X1, X2, X3, …, X200 ~ Poisson(2) be independent, and Y = X1 + X2 + … + X200 ~ Poisson(400), with E(Y) = Var(Y) = 400.
• Z = (Y − 400)/20 ≈ N(0, 1), i.e., Y ≈ N(400, 20).
• P(2 < Y < 400) = (standardize 2 and 400) = P((2 − 400)/20 < Z < (400 − 400)/20) = P(−19.9 < Z < 0) ≈ 0.5.

Poisson or Normal approximation to Binomial?

• Poisson approximation (Binomial(n, pn) → Poisson(λ), with n·pn → λ): appropriate when n ≥ 100, p ≤ 0.01, and λ = np ≤ 20. Why? C(n, y)·pn^y·(1 − pn)^(n−y) → λ^y e^(−λ)/y! as n → ∞.
• Normal approximation (Binomial(n, p) → N(np, √(np(1 − p)))): appropriate when np ≥ 10 and n(1 − p) > 10.

Exponential family and arrival numbers/times

• First, let Tk denote the time of the k-th arrival, for k = 1, 2, …. The gamma experiment is to run the process until the k-th arrival occurs and note the time of this arrival.
• Next, let Nt denote the number of arrivals in the time interval (0, t], for t ≥ 0. The Poisson experiment is to run the process until time t and note the number of arrivals.
• How are Tk and Nt related? Nt ≥ k ⟺ Tk ≤ t.
• The density function of the k-th arrival time is fk(t) = (rt)^(k−1)·r·e^(−rt)/(k − 1)!, t > 0. This is the gamma distribution with shape parameter k and rate parameter r; again, 1/r is known as the scale parameter. A more general version of the gamma distribution allows non-integer k.
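Two quick scipy checks of the material above: the roulette normal approximation against the exact binomial tail, and the arrival-time identity Nt ≥ k ⟺ Tk ≤ t (the rate r and the values of k and t below are arbitrary demo choices):

```python
from scipy.stats import binom, norm, gamma, poisson

# Roulette: exact P(Y >= 58) vs. the normal approximation.
n, p = 100, 0.47
exact = binom.sf(57, n, p)                            # ~ 0.0177
approx = norm.sf((58 - n * p) / (n * p * (1 - p)) ** 0.5)   # ~ 0.014
print(exact, approx)

# Arrivals at rate r: N_t >= k  <=>  T_k <= t.
r, k, t = 1.5, 4, 3.0
print(poisson.sf(k - 1, r * t))        # P(N_t >= k)
print(gamma.cdf(t, a=k, scale=1 / r))  # P(T_k <= t), identical value
```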
Bayesian Estimation

• MLE: maximum-likelihood estimation; MLE(mean) for N(µ, σ²) and Poisson(λ), derived below.

Independence of continuous RVs

• The RVs {Y1, Y2, Y3, …, Yn} are independent if, for any n-tuple {y1, y2, y3, …, yn},
   P({Y1 ≤ y1} ∩ {Y2 ≤ y2} ∩ … ∩ {Yn ≤ yn}) = P(Y1 ≤ y1) × P(Y2 ≤ y2) × … × P(Yn ≤ yn).

Law of Total Probability

• If {A1, A2, …, An} is a partition of the sample space S (mutually exclusive, with ∪Ai = S), then for any event B:
   P(B) = P(B|A1)P(A1) + P(B|A2)P(A2) + … + P(B|An)P(An).
• Example with two sets: P(B) = P(B|A1)P(A1) + P(B|A2)P(A2).

Bayesian Rule

• If {A1, A2, …, An} is a non-trivial partition of the sample space (mutually exclusive, ∪Ai = S, P(Ai) > 0), then for any non-trivial event B (P(B) > 0):
   P(Ai|B) = P(Ai ∩ B)/P(B) = P(B|Ai)·P(Ai) / Σₖ₌₁ⁿ P(B|Ak)·P(Ak).
• Example (laboratory blood test). Let D = the test person has the disease and T = the test result is positive. Assume P(T|D) = 0.95, P(T|Dᶜ) = 0.01, P(D) = 0.005. Find P(D|T):
   P(D|T) = P(D ∩ T)/P(T) = P(T|D)P(D) / [P(T|D)P(D) + P(T|Dᶜ)P(Dᶜ)]
          = (0.95 × 0.005)/(0.95 × 0.005 + 0.01 × 0.995) = 0.00475/0.02465 ≈ 0.193.

Classes vs. Evidence Conditioning

• Classes: healthy (NC), cancer. Evidence: positive mammogram (pos), negative mammogram (neg).
• If a woman has a positive mammogram result, what is the probability that she has breast cancer?
   P(class|evidence) = P(evidence|class)·P(class) / Σ_classes P(evidence|class)·P(class).
• Given P(cancer) = 0.01, P(pos|cancer) = 0.8, P(pos|healthy) = 0.1:
   P(cancer|pos) = (0.8 × 0.01)/(0.8 × 0.01 + 0.1 × 0.99) = 0.008/0.107 ≈ 0.075.

Test Results vs. True Disease State

                 No Disease (0.995)                Disease (0.005)                    Total
Negative test    OK: TN = 0.98505                  False Negative (Type II): 0.00025  0.9853
Positive test    False Positive (Type I): 0.00995  OK: TP = 0.00475                   0.0147

• P(T ∩ Dᶜ) = P(T|Dᶜ)·P(Dᶜ) = 0.01 × 0.995 = 0.00995.
• Power of the test = 1 − P(Tᶜ|D) = 0.95.
• Sensitivity: TP/(TP + FN) = 0.00475/(0.00475 + 0.00025) = 0.95.
• Specificity: TN/(TN + FP) = 0.98505/(0.98505 + 0.00995) = 0.99.

(Log)Likelihood Function

• Suppose we have a sample {X1, …, Xn} IID D(θ) with probability density function p = p(X|θ). Then the joint density p({X1, …, Xn}|θ) is a function of the (unknown) parameter θ.
• Likelihood function: l(θ|{X1, …, Xn}) = p({X1, …, Xn}|θ).
• Log-likelihood: L(θ|{X1, …, Xn}) = log_e l(θ|{X1, …, Xn}).
• Maximum-likelihood estimation (MLE): suppose {X1, …, Xn} IID N(µ, σ²), with µ unknown. We estimate it by MLE(µ) = µ̂ = ArgMax_µ L(µ|{X1, …, Xn}):
   L(µ) = log Πᵢ₌₁ⁿ [e^(−(xi − µ)²/(2σ²)) / √(2πσ²)] = const − Σᵢ₌₁ⁿ (xi − µ)²/(2σ²),
   0 = L′(µ̂) = Σᵢ₌₁ⁿ 2(xi − µ̂)/(2σ²) ⟺ 0 = Σᵢ₌₁ⁿ (xi − µ̂) ⟺ µ̂ = (Σᵢ₌₁ⁿ xi)/n.
• Similarly one can show that MLE(σ²) = σ̂² = (1/n) Σᵢ₌₁ⁿ (xi − µ̂)² (note the divisor n, not the n − 1 of the unbiased sample variance).
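A numeric sketch of the MLE facts above: for simulated normal data, the log-likelihood in µ is maximized at the sample mean, and the MLE of σ² divides by n rather than n − 1. The simulation parameters are arbitrary.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.5, size=50)     # IID N(mu, sigma^2) sample

# Negative log-likelihood in mu (sigma fixed), up to additive constants.
negL = lambda mu: 0.5 * np.sum((x - mu) ** 2)
opt = minimize_scalar(negL)
print(opt.x, x.mean())                          # numeric argmax equals the sample mean

# MLE of sigma^2 divides by n; the unbiased estimator divides by n - 1.
mu_hat = x.mean()
print(np.sum((x - mu_hat) ** 2) / len(x), x.var(ddof=0))   # equal
```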
Hypothesis Testing – the Likelihood Ratio Principle

• Let {X1, …, Xn} be a random sample from a density f(x; p), where p is some population parameter. Then the joint density is f(x1, …, xn; p) = f(x1; p) × … × f(xn; p), and the likelihood function is L(p) = f(X1, …, Xn; p).
• Testing H0: p ∈ Ω vs. Ha: p ∈ Ωa, where Ω ∩ Ωa = ∅:
✓ Find the max of L(p) over Ω.
✓ Find the max of L(p) over Ωa.
✓ Form the likelihood ratio λ(x1, …, xn) = max_{p∈Ω} L(p) / max_{p∈Ωa} L(p).
• Reject H0 if the likelihood-ratio statistic λ is small (λ < k).

(Log)Likelihood Function – Poisson

• Suppose {X1, …, Xn} IID Poisson(λ), with λ unknown. Estimate λ by MLE(λ) = λ̂ = ArgMax_λ L(λ|{X1, …, Xn}):
   L(λ) = log Πᵢ₌₁ⁿ [e^(−λ)·λ^(xi)/xi!] = log [e^(−nλ)·λ^(Σxi) / Πxi!] = −nλ + log(λ)·Σᵢ₌₁ⁿ xi − log Πᵢ₌₁ⁿ xi!,
   0 = L′(λ̂) = −n + (1/λ̂)·Σᵢ₌₁ⁿ xi ⟺ λ̂ = (Σᵢ₌₁ⁿ xi)/n.

Hypothesis Testing – the Likelihood Ratio Principle: Example

• Let {X1, …, Xn} = {0.5, 0.3, 0.6, 0.1, 0.2} be IID N(µ, 1), so f(x; µ) = e^(−(x − µ)²/2)/√(2π) and the joint density is f(x1, …, xn; µ) = f(x1; µ) × … × f(xn; µ).
• Testing H0: µ > 0 (Ω) vs. Ha: µ ≤ 0 (Ωa). Reject H0 if the likelihood-ratio statistic λ is small (λ < k):
   λ0 = max_{µ>0} e^(−[(0.5−µ)² + (0.3−µ)² + (0.6−µ)² + (0.1−µ)² + (0.2−µ)²]/2) / max_{µ≤0} e^(−[(0.5−µ)² + (0.3−µ)² + (0.6−µ)² + (0.1−µ)² + (0.2−µ)²]/2).
• The log of the numerator and of the denominator are each quadratic in µ, so maximize both and take the ratio: the numerator is maximized at µ̂ = x̄ = 0.34 and the denominator at µ = 0, giving
   λ0 = e^(−0.086)/e^(−0.375) = e^(0.289) ≈ 1.34.
• With P(Type I) = α, a monotone transformation of 1/λ0 yields the usual one-sample t statistic, t0 ~ t(α, df = 4): the likelihood ratio test here is equivalent to a one-sample t-test.

Test Procedure

1. Calculate a test statistic (example: z0).
2. Specify a rejection region (example: z0 > zα). Let P(Type I) = α.
3. The null hypothesis is rejected iff the computed value of the statistic falls in the rejection region.

Type I and Type II Errors

• α = P(Reject H0 | H0 is true); β = P(Fail to reject H0 | H0 is false).
• The value of α is specified by the experimenter.
• The value of β is a function of α, n, and δ (the difference between the null-hypothesized mean and the true mean). For a two-sided hypothesis test of a normally distributed population,
   β = Φ(z_{α/2} + δ√n/σ) − Φ(−z_{α/2} + δ√n/σ).
• It is not true that α = 1 − β (the right-hand side is the test power!).

Type I, Type II Errors & Power of Tests

• Suppose the true MMSE score for AD subjects is ~ N(23, 1²).
• A new cognitive test is proposed, and it is assumed that its values are ~ N(25, 1²). A sample of 10 AD subjects takes the new test.
• The hypotheses are H0: µtest = 25 vs. Ha: µtest < 25 (one-sided, more power).
• α = P(false-positive, Type I, error) = 0.05.
• The critical value for α is zscore = −1.645, so X̄critical = µ + zcritical·σ/√n = 25 − 1.645/√10 ≈ 24.48. Our conclusion from {X1, …, X10}, which yields X̄, will be to reject H0 if X̄ < 24.48.
• β = P(fail to reject H0 | H0 is false) = P(X̄ ≥ 24.48 | X̄ ~ N(23, 1²/10)).
• Note: X̄ ~ N(23, 1²/10) when it is given that X ~ N(23, 1²).
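The β and power computation continued below can be verified numerically; a minimal sketch with scipy.stats.norm (one-sided z test, σ = 1, n = 10, α = 0.05 as above):

```python
from scipy.stats import norm

mu0, mu_true, sigma, n, alpha = 25, 23, 1, 10, 0.05
se = sigma / n ** 0.5                  # 1/sqrt(10) ~ 0.316

crit = mu0 - norm.isf(alpha) * se      # reject H0 if Xbar < crit ~ 24.48
beta = norm.sf((crit - mu_true) / se)  # P(Xbar >= crit | mu = 23) ~ 1.4e-6
print(crit, beta, 1 - beta)            # power ~ 0.999999
```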
Type I, Type II Errors & Power of Tests (continued)

• Standardize: Z = (24.48 − 23)/(1/√10) ≈ 4.68.
• β = P(fail to reject H0 | H0 is false) = P(Z ≥ 4.68) ≈ 0.0000014.
• ⇒ Power(new test) = 1 − β ≈ 0.999999.
• How does Power(test) depend on its ingredients? There is a different β for each different alternative Ha:
✓ α: a larger α enlarges the rejection region and increases power;
✓ size of the studied effect: effect-size increase ⇒ power increase;
✓ type of alternative hypothesis: one-sided tests are more powerful;
✓ sample size (n = 10 here): n increase ⇒ power increase.

Another Example – Type I and II Errors & Power

• About 75% of all 80-year-old humans are free of amyloid plaques and tangles, markers of AD. A new AD vaccine is proposed that is supposed to increase this proportion. Let p be the new proportion of subjects with no AD characteristics following vaccination. H0: p = 0.75 vs. Ha: p > 0.75.
• X = number of AD tests with no pathology findings in n = 20 vaccinated 80-year-old subjects. Under H0 we expect about np = 15 "no AD" results. Suppose we would invest in the new vaccine if we get ≥ 18 "no AD" tests ⇒ rejection region R = {18, 19, 20}.
• Find α and β. How powerful is this test?

• X ~ Binomial(20, 0.75), with rejection region R = {18, 19, 20}.
• α = P(Type I) = P(X ≥ 18 | X ~ Binomial(20, 0.75)) = 1 − 0.91 = 0.09 (using the SOCR resource).
• β(p = 0.85) = P(Type II) = P(fail to reject H0 | X ~ Binomial(20, 0.85)) = P(X < 18 | X ~ Binomial(20, 0.85)) = 0.595 ⇒ power of the test = 1 − β = 0.405.
• β(p = 0.95) = P(Type II) = P(fail to reject H0 | X ~ Binomial(20, 0.95)) = P(X < 18 | X ~ Binomial(20, 0.95)) = 0.076 ⇒ power of the test = 1 − β = 0.924.
• So the power of the test grows with the effect size, i.e., with the distance between the true p and the null value 0.75.

Distribution Theory (review)

• Discrete distributions
• Continuous distributions
• Normal approximation to Binomial and Poisson
• Chi-Square
• Fisher's F distributions
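The α and power values in the vaccine example above can be checked directly from the binomial tail with scipy, in place of the SOCR lookups:

```python
from scipy.stats import binom

n, reject = 20, 18                     # reject H0 if X >= 18

alpha = binom.sf(reject - 1, n, 0.75)  # P(X >= 18 | p = .75) ~ 0.091
print(alpha)

for p_true in (0.85, 0.95):            # power at two alternatives
    power = binom.sf(reject - 1, n, p_true)
    print(p_true, power, 1 - power)    # power ~ 0.405 / 0.924; beta is the remainder
```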