Healthcare Operations Management
An Integrated Approach to Improving Quality and Efficiency
CHAPTER 7. USING DATA AND STATISTICAL TOOLS FOR OPERATIONS IMPROVEMENT
Daniel B. McLaughlin
Julie M. Hays

Chapter 7. Using Data and Statistical Tools for Operations Management
• Data Collection
• Graphical Tools
• Mathematical Descriptions
• Probability and Probability Distributions
• Confidence Intervals, Hypothesis Tests
• ANOVA/MANOVA/MANCOVA
• Regression

Copyright 2008 Health Administration Press. All rights reserved.

Data Collection
• Validity: A valid study has no logic, sampling, or measurement errors.
  - Logic
  - Selection or sampling
  - Measurement

Data Collection
[Diagram omitted.] Diagram created in Inspiration® by Inspiration Software®, Inc.

Data Collection
Logic
• Why are the data needed?
• What will the data be used for?
• What questions are going to be asked of the data?
• Are the patterns of the past going to be repeated in the future?

Data Collection
Selection or Sampling
• Census versus sample
• Nonrandom methods
• Simple random sampling
• Stratified sampling
• Systematic or sequential sampling
• Cluster or area sampling
• Sample size

Data Collection
Measurement
• Accuracy
• Precision
  - How precise should the measurements be?
  - Does the measurement measure what we want it to measure (i.e., say = do)?
• Reliability
  - Would the measurement be the same if we repeated it?
Figure panels: Reliable, but not accurate; Reliable and accurate;
Not reliable, but accurate

Graphical Tools
• Mapping
• Visual representations of data
• Histograms and Pareto charts
• Stem plots, dot plots
• Box (and whisker) plots
• Normal probability plots

Graphical Tools
Histograms and Pareto Charts
[Figure: histogram titled "Length of Hospital Stay"; x-axis: length of hospital stay (days), binned from 1-2 to 17-18; y-axis: frequency.]
[Figure: Pareto chart titled "Diagnosis Category"; x-axis: diagnosis; y-axis: frequency.]
Microsoft Excel® screen shots reprinted with permission from Microsoft Corporation.

Graphical Tools
Dot Plots
[Figure: dot plot of length of hospital stay; x-axis: days.]
Produced with Minitab® Statistical Software

Graphical Tools
Turnip Graph
Percentage of diabetic Medicare enrollees receiving eye exams among 306 hospital referral regions (2001)
Source: Wennberg, J. E. 2005. Data from the Dartmouth Atlas Project. Figure copyrighted by the Trustees of Dartmouth College. Used with permission.

Graphical Tools
Normal Probability Plots
[Figure: normal probability plot of length of hospital stay; x-axis: observed cumulative probability, 0.00 to 1.00.]
Produced with SPSS for Windows

Graphical Tools
Scatter Plots
[Figure: four scatter plots of Y against X.]
• Strong positive correlation: r = 0.91
• Strong negative correlation: r = −0.86
• Positive correlation: r = 0.70
• No correlation: r = 0.06
Microsoft Excel® screen shots reprinted with permission from Microsoft Corporation.

Mathematical Descriptions
Mean
• The mean is the arithmetic average of the population:
  Population mean $\mu = \frac{\sum x}{N}$, where $x$ = individual values and $N$ = number of values in the population.
• The population mean can be estimated from a sample:
  Sample mean $\bar{x} = \frac{\sum x}{n}$, where $n$ = number of values in the sample.
For our simple data set, $\bar{x} = \frac{3 + 6 + 8 + 5 + 3}{5} = 5$.

Mathematical Descriptions
Median and Mode
• The median is the middle value of the sample or population. If the data are arranged into an array (an ordered data set):
  3, 3, 5, 6, 8
  5 would be the middle value or median.
• The mode is the most frequently occurring value. In the above example, the value 3 occurs more often (two times) than any other value, so 3 would be the mode.

Mathematical Descriptions
Range and Mean Absolute Deviation
• The range is the difference between the high and low values in a data set.
  Range $= x_{high} - x_{low} = 8 - 3 = 5$
• The mean absolute deviation (MAD) is the average of the absolute value of the differences from the mean.
  $\text{MAD} = \frac{\sum |x - \bar{x}|}{n} = \frac{2 + 2 + 0 + 1 + 3}{5} = \frac{8}{5} = 1.6$

Mathematical Descriptions
Variance, Standard Deviation
• The variance is the average squared difference from the mean.
  Population variance $\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{4 + 4 + 0 + 1 + 9}{5} = \frac{18}{5} = 3.6$
  Sample variance $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{4 + 4 + 0 + 1 + 9}{5 - 1} = \frac{18}{4} = 4.5$
• The standard deviation is the square root of the variance.
  Population standard deviation $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} = \sqrt{\frac{18}{5}} = \sqrt{3.6} = 1.9$
  Sample standard deviation $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{18}{4}} = \sqrt{4.5} = 2.1$

Mathematical Descriptions
Coefficient of Variation
The coefficient of variation (CV) is a measure of the relative variation in the data. It is the standard deviation divided by the mean.
$\text{CV} = \frac{\sigma}{\mu}$ or $\frac{s}{\bar{x}}$; for the simple data set, $\frac{1.9}{5} \approx 0.4$
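The descriptive measures above can be checked with a few lines of Python. This is a minimal sketch using only the standard library `statistics` module; the variable names are illustrative and not from the slides.

```python
from statistics import mean, median, mode, pstdev, pvariance, stdev, variance

data = [3, 6, 8, 5, 3]  # the simple data set used on the slides

x_bar = mean(data)                    # arithmetic mean
middle = median(data)                 # middle value of the ordered data
most_common = mode(data)              # most frequently occurring value
value_range = max(data) - min(data)   # high value minus low value

# Mean absolute deviation: average absolute distance from the mean.
mad = sum(abs(x - x_bar) for x in data) / len(data)

pop_var = pvariance(data)   # population variance, divides by N
samp_var = variance(data)   # sample variance, divides by n - 1
pop_sd = pstdev(data)       # square root of the population variance
samp_sd = stdev(data)       # square root of the sample variance
cv = pop_sd / x_bar         # coefficient of variation

print(x_bar, middle, most_common, value_range, mad)
print(pop_var, samp_var, round(pop_sd, 1), round(samp_sd, 1), round(cv, 1))
```

Note that `pvariance`/`pstdev` and `variance`/`stdev` differ only in the divisor, which is the same population-versus-sample distinction drawn on the slides.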

Probability and Probability Distributions
• Determination of probabilities
• Properties of probabilities
• Probability distributions
• Discrete probability distributions
• Continuous probability distributions

Determination of Probabilities
Observed Probability
Observed probability is the relative frequency of an event: the number of times the event occurred divided by the total number of trials.
$P(A) = \frac{\text{Number of times A occurred}}{\text{Total number of observations, trials, or experiments}} = \frac{r}{n}$
$P(\text{drug is effective}) = \frac{\text{Number of times patients are cured}}{\text{Total number of patients given the drug}} = \frac{r}{n}$

Determination of Probabilities
Theoretical Probability
Theoretical probability is the theoretical relative frequency of an event: the theoretical number of times an event will occur divided by the total number of possible outcomes.
$P(A) = \frac{\text{Number of times A could occur}}{\text{Total number of possible outcomes}} = \frac{r}{n}$
$P(\text{card is a spade}) = \frac{\text{Number of spades in the deck}}{\text{Total number of cards in the deck}} = \frac{13}{52} = 0.25$

Determination of Probabilities
Opinion Probability
Opinion probability is a subjective determination of the number of times an event will occur divided by the imaginary total number of possible outcomes or trials.
$P(A) = \frac{\text{Opinion of number of times an event will occur}}{\text{Theoretical total}} = \frac{r}{n}$
$P(\text{Secretariat winning the Belmont Stakes}) = \frac{\text{Opinion on the number of times Secretariat would win the Belmont}}{\text{Imaginary total number of times the Belmont would be run}} = \frac{r}{n}$

Properties of Probabilities
Bounds on Probability
• Probabilities must always be ≥ 0, and an event that cannot occur has a probability of 0.
  $P(A) = \frac{\text{Least number of times A could occur}}{\text{Total number of possible outcomes}} = \frac{0}{\text{Any number}} = 0$
• Probabilities must always be ≤ 1.
  $P(A) = \frac{\text{Greatest number of times A could occur}}{\text{Total number of possible outcomes}} = \frac{n}{n} = 1$
  $0 \le P(A) \le 1$
• P(A) + P(A') = 1 and 1 − P(A') = P(A), where A' is not A.

Properties of Probabilities
Multiplicative Property
For two independent events, the probability of both A and B occurring, or the intersection (∩) of A and B, is the probability of A occurring times the probability of B occurring.
P(A and B occurring) = P(A ∩ B) = P(A) × P(B)

Properties of Probabilities
Multiplicative Property
[Tree diagram: a coin toss (H or T) followed by a die toss (1 through 6); each of the 12 coin-and-die outcomes has probability 1/12.]
P(H) = 1/2
P(3) = 1/6
P(H) × P(3) = P(H ∩ 3) = 1/2 × 1/6 = 1/12

Properties of Probabilities
Additive Property
• For two events, the probability of A or B occurring, or the union (∪) of A with B, is the probability of A occurring plus the probability of B occurring, minus the probability of both A and B occurring.
P(A or B occurring) = P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Properties of Probabilities
Additive Property
[Tree diagram: a coin toss (H or T) followed by a die toss (1 through 6); each of the 12 coin-and-die outcomes has probability 1/12.]
P(H) = 1/2
P(3) = 1/6
P(H ∪ 3) = P(H) + P(3) − P(H ∩ 3) = 1/2 + 1/6 − 1/12 = 7/12

Properties of Probabilities
Conditional Probability
The probability of an event occurring if more information is obtained:
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$

Contingency Table for ER Wait Times
                ≤ 30 minute wait   > 30 minute wait   Total
Friday night    20                 30                 50
Other times     40                 10                 50
Total           60                 40                 100
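The conditional probability formula can be applied directly to the ER wait-time table. The sketch below is illustrative only: the key names are hypothetical, and it assumes the first column of the table counts waits of 30 minutes or less.

```python
# Counts from the ER wait-time contingency table.
# Assumption: "short_wait" is the 30-minutes-or-less column,
# "long_wait" is the more-than-30-minutes column.
counts = {
    ("friday_night", "short_wait"): 20,
    ("friday_night", "long_wait"): 30,
    ("other_times", "short_wait"): 40,
    ("other_times", "long_wait"): 10,
}
total = sum(counts.values())


def probability(when=None, wait=None):
    """Relative frequency of the cells matching the given row and/or column."""
    matching = sum(
        n
        for (w, t), n in counts.items()
        if (when is None or w == when) and (wait is None or t == wait)
    )
    return matching / total


p_friday = probability(when="friday_night")
p_long = probability(wait="long_wait")
p_long_and_friday = probability(when="friday_night", wait="long_wait")

# P(A | B) = P(A and B) / P(B)
p_long_given_friday = p_long_and_friday / p_friday

# Bayes' theorem: P(B | A) = P(A | B) P(B) / P(A)
p_friday_given_long = p_long_given_friday * p_friday / p_long

# Independence would require P(A and B) = P(A) P(B).
independent = abs(p_long_and_friday - p_long * p_friday) < 1e-12

print(p_long_given_friday, p_friday_given_long, independent)
```

Under these assumptions, a long wait is more likely on a Friday night (0.6) than overall (0.4), so wait time and time of visit are not independent in this table.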

Properties of Probabilities
Conditional Probability
• Note that:
  $P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$
  and if one event has no effect on the other event (the events are independent), then
  $P(A \mid B) = P(A)$ and $P(A \cap B) = P(A)\,P(B)$.
• Bayes' theorem
  $P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B \mid A)\,P(A)}{P(B)} = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid A')\,P(A')}$

Probability Distributions
Discrete Probability Distributions
The binomial distribution describes the number of times a binary event will occur in a sequence of events.
$P(x) = \frac{n!}{x!\,(n - x)!}\, p^x (1 - p)^{n - x}$
[Figure: binomial probabilities; x-axis: number of heads in 3 tosses; y-axis: probability.]
The Poisson distribution is used to model the number of events in a specific period.
$P(x) = \frac{e^{-\lambda} \lambda^x}{x!}$
[Figure: Poisson probabilities; x-axis: number of patient arrivals in 1 hour; y-axis: probability.]

Probability Distributions
Continuous Probability Distributions
In the uniform distribution, the probability of occurrence is the same for all outcomes.
$P(x) = \frac{1}{b - a}$ for $a \le x \le b$
[Figure: uniform distribution; P(X) against X between a and b.]
The triangular distribution is described by the mode, minimum, and maximum values.
$P(x) = \frac{2(x - a)}{(b - a)(c - a)}$ for $a \le x \le c$
$P(x) = \frac{2(b - x)}{(b - a)(b - c)}$ for $c < x \le b$
[Figure: triangular distribution with Min = 0.0, Mode = 0.5, Max = 2.0; P(X) against X.]

Probability Distributions
Exponential Distribution
The exponential distribution is used to model arrival rate, the rate of occurrence of an event.
$P(x) = \lambda e^{-\lambda x}$ for $x \ge 0$
$\mu$ = mean = $1/\lambda$, median = $\ln(2)/\lambda$, mode = 0, and $\sigma = 1/\lambda$
[Figure: exponential distribution with lambda = 2; P(X) against X.]
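The binomial, Poisson, and exponential formulas above translate almost directly into code. This is a minimal sketch using only the standard library; the function names are illustrative, and the Poisson rate of 4 arrivals per hour is a hypothetical value, since the slides do not state the rate behind the arrivals chart.

```python
import math


def binomial_pmf(x, n, p):
    """P(x successes in n independent trials with success probability p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    """P(x events in a period when the average number of events is lam)."""
    return math.exp(-lam) * lam**x / math.factorial(x)


def exponential_pdf(x, lam):
    """Exponential density with rate lam; zero for negative x."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


# Number of heads in 3 tosses of a fair coin.
heads = [binomial_pmf(k, 3, 0.5) for k in range(4)]

# Hypothetical arrival rate: 4 patients per hour (not given on the slides).
arrivals = [poisson_pmf(k, 4.0) for k in range(12)]

# Exponential distribution with lambda = 2, as in the figure.
lam = 2.0
exp_mean = 1 / lam
exp_median = math.log(2) / lam

print(heads)
print([round(p, 3) for p in arrivals])
print(exponential_pdf(0, lam), exp_mean, round(exp_median, 3))
```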

Probability Distributions
Normal Distribution
The normal distribution, $x \sim N(\mu, \sigma^2)$, is commonly observed in the world and provides a reasonable approximation for many randomly distributed variables.
$P(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x - \mu)^2 / 2\sigma^2}$
[Figure: three normal curves, with legend entries 0, 1.0; 0, 2.5; and 2, 0.7; P(X) against X from −5 to 5.]

Probability Distributions
Standard Normal Distribution
The standard normal distribution, or z distribution, is the normal distribution with $\mu = 0$ and $\sigma = 1.0$. Any normal distribution can be transformed to a standard normal distribution by:
$z = \frac{x - \mu}{\sigma}$

z-score limits   Proportion within the limits (if normally distributed)
+/− 1 z          0.680
+/− 2 z          0.950
+/− 3 z          0.997

[Figure: standard normal curve (0, 1.0); P(X) against X from −5 to 5.]

Confidence Intervals, Hypothesis Testing
• Central Limit Theorem
• Hypothesis testing
• Type I ($\alpha$) and Type II ($\beta$) errors
• T-tests
• Proportions
• Practical significance versus statistical significance

Confidence Intervals, Hypothesis Testing
Central Limit Theorem
• As the sample size becomes large, the sampling distribution of the mean approaches normality, no matter what the distribution of the original variable, and
  $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Sampling Distribution Simulation

Confidence Intervals
Confidence interval for the true value of the population mean:
$\bar{x} - z_{\alpha/2}\,\sigma_{\bar{x}} \le \mu \le \bar{x} + z_{\alpha/2}\,\sigma_{\bar{x}}$
$\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
[Figure: standard normal curve with the central 95% shaded and 2.5% in each tail; P(X) against Z.]

Hypothesis Testing
• Belief or null hypothesis, Ho: $\mu = b$
• Alternate belief or hypothesis, Ha: $\mu \ne b$
• Decision rule: If $|z| > z^*$, reject the null hypothesis, where $z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$
[Figure: standard normal curve; central region −Z* < Z < Z* (95% confidence); rejection regions Z < −Z* and Z > Z*.]
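The confidence interval and the z decision rule above can be sketched with the standard library's `statistics.NormalDist`. The function names and the numbers in the example (a sample mean wait of 32 minutes, a known standard deviation of 10, a sample of 25, and a hypothesized mean of 30) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_star(confidence=0.95):
    """Two-sided critical value z* for the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Interval x_bar +/- z* . sigma / sqrt(n) for the population mean."""
    half_width = z_star(confidence) * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width


def z_test(x_bar, mu_0, sigma, n, confidence=0.95):
    """Return (z, reject) using the rule: reject Ho if |z| > z*."""
    z = (x_bar - mu_0) / (sigma / sqrt(n))
    return z, abs(z) > z_star(confidence)


# Hypothetical example values (not from the slides).
low, high = z_confidence_interval(x_bar=32, sigma=10, n=25)
z, reject = z_test(x_bar=32, mu_0=30, sigma=10, n=25)

print(round(low, 2), round(high, 2))
print(z, reject)
```

With these made-up inputs the hypothesized mean of 30 lies inside the 95% interval, which matches the decision not to reject the null hypothesis.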

Hypothesis Testing
Type I ($\alpha$) and Type II ($\beta$) Errors
Ho: $\mu_1 = \mu_2$
Ha: $\mu_1 \ne \mu_2$

Type I and Type II Error: Clinic Wait Time Example
                                                     Reality
Assessment or guess                                  Wait times at the two clinics      Wait times at the two clinics
                                                     are the same ($\mu_1 = \mu_2$)     are NOT the same ($\mu_1 \ne \mu_2$)
Wait times are the same ($\mu_1 = \mu_2$)            No error                           Type II or $\beta$ error
Wait times are NOT the same ($\mu_1 \ne \mu_2$)      Type I or $\alpha$ error           No error

Equal Variance t-Test
• t-tests are used to test hypotheses about two means.
• Ho: $\mu_1 = \mu_2$
  Ha: $\mu_1 \ne \mu_2$
• Decision rule: If $|t| > t^*$, reject Ho
  $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$, where $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$
• Confidence interval
  $(\bar{x}_1 - \bar{x}_2) - t^* s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + t^* s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

Proportions
Ho: $\pi_1 = \pi_2$
Ha: $\pi_1 \ne \pi_2$
Decision rule: If $|z| > z^*$, reject Ho
$z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\frac{\bar{p}(1 - \bar{p})}{n_1} + \frac{\bar{p}(1 - \bar{p})}{n_2}}}$, where $\bar{p} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$
Confidence interval
$(p_1 - p_2) - z^* \sqrt{\frac{\bar{p}(1 - \bar{p})}{n_1} + \frac{\bar{p}(1 - \bar{p})}{n_2}} \le \pi_1 - \pi_2 \le (p_1 - p_2) + z^* \sqrt{\frac{\bar{p}(1 - \bar{p})}{n_1} + \frac{\bar{p}(1 - \bar{p})}{n_2}}$

Practical Significance Versus Statistical Significance
• Basic confidence interval:
  statistic − [(z*) × (s.e. statistic)] ≤ parameter ≤ statistic + [(z*) × (s.e. statistic)]
• As n increases, s.e. decreases and the confidence interval gets narrower.
• Large samples may give statistically significant results that are not practically significant.

ANOVA/MANOVA/MANCOVA
• One-way ANalysis Of VAriance (ANOVA) is used to test hypotheses about three or more levels of treatment. A t-test will give the same information as an ANOVA when there are only two treatment levels of interest.
• Two-way and higher ANOVAs are used when there is more than one type of treatment variable of interest.
• MANOVA/MANCOVA are used when there is more than one outcome or dependent variable of interest.

Regression
• Simple linear regression: used to describe the relationship between two variables
• Multiple regression: used to describe the relationship between multiple predictor variables and a single dependent variable
• General linear model
• Artificial neural networks
• Design of experiments

What Is the Equation of a Line?
Algebra: $y = mx + b$
Statistics: $\hat{Y} = bX + a$
Where
$b$ = slope = $\frac{\text{rise}}{\text{run}} = \frac{\Delta y}{\Delta x}$
$a$ = y intercept = y, when x = 0

Problem
Student A owns a health insurance firm and wants us to determine the cost (price would be a more difficult problem) of providing healthcare to insured individuals.

Seeing the Future
Data
Experiences are relevant
Judgment: To what degree are these experiences still relevant?
Experiences are irrelevant
Deductive reasoning versus inductive reasoning

What Is the Cost of Healthcare Related To?
Quantitative: ______________
Qualitative: ______________

Selection
• Define population
• Census or sample
• Type of sample
• Measurement: accurate, reliable, precise?
  X = number of dependents; Y = annual healthcare expense ($1,000)
• Is the study valid?
• How do we create knowledge from data?

Data
Number of Dependents   Annual Healthcare Expense ($1,000)
0                      3
1                      2
2                      6
3                      7
4                      7

Scatterplot
[Figure: scatterplot of Y (annual healthcare cost, $1,000) against X (number of dependents), with candidate lines y = 1.3x + 2.4, y = x + 3, y = 5, and y = 1.2x + 2.]

Scatterplot Questions
• Which is the "best" line on the scatterplot?
• How would you define "best" (e.g., must be quantifiable)?

Professor's Model
$\hat{Y} = bX + a$
$\hat{Y}$ = cost estimate ($1,000)
$a$ = Y intercept = 3
$b$ = slope = $\frac{\Delta Y}{\Delta X}$ = 1
$\hat{Y} = 1X + 3$ = knowledge

Model Comparison
Prof's model: Yhat = X + 3; Student 1: Yhat = 1.2X + 2; Student 2: Yhat = 1.3X + 2.4; e = Y − Yhat
X       Y   Prof's Yhat   Prof's e   Student 1 e   Student 2 e
0       3   3             0          1             0.6
1       2   4             −2         −1.2          −1.7
2       6   5             1          1.6           1.0
3       7   6             1          1.4           0.7
4       7   7             0          0.2           −0.6
(sum)                     0          3             0

Good Model
• A good model must be unbiased: $\sum e = 0$
• Is that enough? What else? Does this remind you of $\sigma^2$?
• How do we get rid of signs?

Model Comparison
X       Y    Yhat = X + 3   e = Y − Yhat   e²   Student 1 e²
0       3    3              0              0    1
1       2    4              −2             4    1.44
2       6    5              1              1    2.56
3       7    6              1              1    1.96
4       7    7              0              0    0.04
(sum)   25   25             0              6    7

Least Squares Technique
Gauss proved that if you use:
$b = \frac{\sum (Y - \bar{Y})(X - \bar{X})}{\sum (X - \bar{X})^2}$ and $a = \bar{Y} - b\bar{X}$
you are guaranteed that $\sum e = 0$ and $\sum e^2$ is a minimum.
Yhat = 1.3X + 2.4, $\sum e = 0$, and $\sum e^2 = 5.1$.

Coefficient of Determination
Are we better off making estimates by using information (X = number of dependents) and having created knowledge (Yhat = 1.3X + 2.4) than using no information or knowledge (i.e., is the model "better")?
How would you estimate without using our knowledge (our model)?
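The least squares formulas for b and a can be checked against the five data points from the slides. This is a minimal sketch in plain Python; the variable names are illustrative.

```python
# Data from the slides: X = number of dependents,
# Y = annual healthcare expense ($1,000).
x = [0, 1, 2, 3, 4]
y = [3, 2, 6, 7, 7]

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

# Slope: sum of cross-deviations divided by sum of squared X deviations.
b = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
# Intercept: forces the line through the point (x_bar, y_bar).
a = y_bar - b * x_bar

y_hat = [b * xi + a for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]  # e = Y - Yhat
sse = sum(e**2 for e in residuals)                 # sum of squared errors

print(round(b, 2), round(a, 2))
print(round(sum(residuals), 10), round(sse, 2))
```

The residuals sum to zero (the model is unbiased) and the sum of squared errors is 5.1, lower than the 6 and 7 obtained for the professor's and Student 1's lines.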

Sum of Squares Total
X       Y    Yhat = Ybar   e = Y − Ybar   SSTO: (Y − Ybar)²
0       3    5             −2             4
1       2    5             −3             9
2       6    5             1              1
3       7    5             2              4
4       7    5             2              4
(sum)   25   25            0              22
Note that this method is unbiased.

Graph
[Figure: scatterplot of Y (annual healthcare cost, $1,000) against X (number of dependents), with the horizontal line y = 5.]

Errors
[Figure: errors around the fitted line; Y (annual healthcare costs, $1,000) against X (number of dependents).]

Sum of Squares Error
X       Y    Yhat = 1.3X + 2.4   e = Y − Yhat   SSE: e² = (Y − Yhat)²   Ybar   Y − Ybar   SSTO: (Y − Ybar)²
0       3    2.4                 0.6            0.36                    5      −2         4
1       2    3.7                 −1.7           2.89                    5      −3         9
2       6    5                   1.0            1.00                    5      1          1
3       7    6.3                 0.7            0.49                    5      2          4
4       7    7.6                 −0.6           0.36                    5      2          4
(sum)   25   25                  0              5.1                     25     0          22

Coefficient of Determination
What is the percentage of improvement when we use knowledge gained from our model?
% improvement $= \frac{\text{Old error level} - \text{New error level}}{\text{Old error level}} \times 100 = \frac{22 - 5.1}{22} \times 100 = \frac{16.9}{22} \times 100 = 77\%$
r² = coefficient of determination = 77%
r² = 0.77

Another Viewpoint
Variation in annual healthcare cost is either explained by knowledge (the model) or not explained.

Explained and Unexplained Error
[Figure: Y (annual healthcare costs, $1,000) against X (number of dependents); dashed segments mark explained error and solid segments mark unexplained error.]
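The percentage improvement calculation can be reproduced from the raw data. The sketch below takes the fitted line Yhat = 1.3X + 2.4 from the slides as given and uses illustrative variable names.

```python
# Data from the slides.
x = [0, 1, 2, 3, 4]
y = [3, 2, 6, 7, 7]

y_bar = sum(y) / len(y)
y_hat = [1.3 * xi + 2.4 for xi in x]  # fitted line from the slides

# "Old" error level: squared errors when every estimate is the mean.
ssto = sum((yi - y_bar) ** 2 for yi in y)
# "New" error level: squared errors around the fitted line.
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

explained = ssto - sse   # variation explained by the model
r_squared = explained / ssto

print(ssto, round(sse, 2), round(explained, 2), round(r_squared, 2))
```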

Sum of Squares Regression
X       Y    Yhat = 1.3X + 2.4   e = Y − Yhat   SSE: e² = (Y − Yhat)²   Ybar   Y − Ybar   SSTO: (Y − Ybar)²   Yhat − Ybar   SSR: (Yhat − Ybar)²
0       3    2.4                 0.6            0.36                    5      −2         4                   −2.6          6.76
1       2    3.7                 −1.7           2.89                    5      −3         9                   −1.3          1.69
2       6    5                   1.0            1.00                    5      1          1                   0             0
3       7    6.3                 0.7            0.49                    5      2          4                   1.3           1.69
4       7    7.6                 −0.6           0.36                    5      2          4                   2.6           6.76
(sum)   25   25                  0              5.1                     25     0          22                  0             16.9

Coefficient of Determination
$r^2 = \frac{\text{Explained}}{\text{Total}} = \frac{SSR}{SSTO} = \frac{16.9}{22.0} = 0.77$
Note: r² is not based on statistics or probability; it is just a percentage.

Correlation Coefficient
$r = \pm\sqrt{r^2}$
r = Correlation coefficient = Measure of the strength of the linear relationship between two variables
$-1 \le r \le 1$
[Figure: two scatter plots, one with r = −1 and one with r = +1.]

Correlation Coefficient Examples
[Figure: three scatter plots with r = 0.0, r = 0.9, and r = −0.5.]

Coefficient of Determination
Questions:
• If r² is low, does that mean there is no relationship between your variables?
• If r² is high (close to 1), does that mean you always get useful predictions from your model?
• If r² is high, does that mean your model has a "good" fit?

r² and Curves
• Can we fit a straight line to this?
• Yes, and we are guaranteed that the errors sum to zero and the sum of squared errors is a minimum.
• However, a curve would be better.
[Figure: curved pattern of Y against X.]

Excel Output
To get this sheet, go to Tools -> Data Analysis -> Regression.
If you don't have Data Analysis listed in your tools, see Excel help "Install and Use the Analysis ToolPak."

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.8765
R Square            0.7682
Adjusted R Square   0.6909
Standard Error      0.8790
Observations        5

ANOVA
             df   SS       MS       F        Significance F
Regression   1    7.6818   7.6818   9.9412   0.0511
Residual     3    2.3182   0.7727
Total        4    10

                                        Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%   Lower 90.0%   Upper 90.0%
Intercept                               -0.9545        1.0162           -0.9393   0.4169    -4.1885     2.2794      -3.3460       1.4369
Y: $1,000 Annual Healthcare Expense     0.5909         0.1874           3.1530    0.0511    -0.0055     1.1873      0.1499        1.0320

RESIDUAL OUTPUT
Observation   Predicted X (Number of Dependents)   Residuals   Standard Residuals
1             0.8182                               -0.8182     -1.0747
2             0.2273                               0.7727      1.0150
3             2.5909                               -0.5909     -0.7762
4             3.1818                               -0.1818     -0.2388
5             3.1818                               0.8182      1.0747

PROBABILITY OUTPUT
Percentile   X (Number of Dependents)
10           0
30           1
50           2
70           3
90           4

[Charts in the output: Residual Plot, Line Fit Plot, and Normal Probability Plot, each plotting X (number of dependents) and Y ($1,000 annual healthcare expense).]

Note: in this output the predicted variable is the number of dependents and the predictor is annual healthcare expense, so the coefficients differ from Yhat = 1.3X + 2.4. R Square and F are the same in either direction.

F Test
$F^* = \frac{MSR}{MSE} = \frac{SSR / 1}{SSE / (n - 2)}$
If $F^* > F(1 - \alpha; 1; n - 2)$, reject H0: $\beta_1 = 0$ (in this case)
MSR/MSE ≈ 1 → $\beta_1 = 0$
MSR/MSE big → $\beta_1 \ne 0$

Assumptions of Linear Regression
Linear regression is based on several assumptions. If these assumptions are violated, the resulting model will be misleading. The principal assumptions are:
- The dependent and independent variables are linearly related.
- The errors associated with the model are not serially correlated.
- The errors are normally distributed and have constant variance.
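The F statistic from the F Test slide can be computed from the sums of squares of the fitted line Yhat = 1.3X + 2.4. This is a minimal sketch with illustrative names; the critical value is a constant taken from a standard F table for alpha = 0.05 with 1 and 3 degrees of freedom, since the Python standard library has no F distribution.

```python
# Data from the slides.
x = [0, 1, 2, 3, 4]
y = [3, 2, 6, 7, 7]
n = len(y)

y_bar = sum(y) / n
y_hat = [1.3 * xi + 2.4 for xi in x]  # fitted line from the slides

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained

msr = ssr / 1        # one predictor, so 1 degree of freedom
mse = sse / (n - 2)  # n - 2 residual degrees of freedom
f_star = msr / mse

# Assumption: tabled critical value F(0.95; 1; 3), rounded to two decimals.
F_CRITICAL = 10.13
reject_h0 = f_star > F_CRITICAL

print(round(f_star, 4), reject_h0)
```

The computed F* matches the F of 9.9412 in the Excel output, and it falls just short of the tabled critical value, which agrees with the reported Significance F of 0.0511 being slightly above 0.05.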

Transformations
If the variables are not linearly related or the assumptions of regression are violated, the variables can be transformed to produce a possibly better model.
X    Y   Transform X -> X²
−3   9   9
−2   4   4
−1   1   1
0    0   0
1    1   1
2    4   4
3    9   9
[Figure: Y plotted against X².]

Multiple Regression
• Multiple independent variables are used to predict a single dependent variable to "improve" the model.
• $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$
• Multicollinearity can be a problem.

General Linear Model
• The most general of all linear models
• Multiple predictor variables:
  - Metric
  - Categorical
  - Both
• Multiple dependent variables:
  - Metric
  - Categorical
  - Both
• Can be used to build complex models

Artificial Neural Networks
Neural Networks
• Large amounts of data
• No explanation of how/why
• Used to predict outcomes
Traditional Models
• Limited amount of data
• Model explains how/why
• Used to predict outcomes

Outline for Analyses
1. Define the problem/question.
2. Determine what data will be needed to address the problem/question.
3. Collect the data.
4. Graph the data.
5. Analyze the data using the appropriate tool.
6. "Fix" the problem.
7. Evaluate the effectiveness of the "fix."
8. Start again.

Choice of Statistical Technique
Independent variable: categorical
• One independent variable
  - Categorical dependent variable: one, χ²; many, χ² (layered)
  - Metric dependent variable: one, t-test (graphical: histogram type); many, MANOVA (graphical: box plot)
• Many independent variables
  - Categorical dependent variable: one, χ²; many, χ² (layered)
  - Metric dependent variable: one, ANOVA; many, MANOVA
  - Both types of dependent variable:
    GLM (graphical: box plots)

Choice of Statistical Technique
Independent variable: metric
• One independent variable
  - Categorical dependent variable: one, logit; many, GLM
  - Metric dependent variable: one, simple regression (graphical: scatterplot); many, GLM
  - Both types of dependent variable: MANCOVA
• Many independent variables
  - Categorical dependent variable: one, logit; many, GLM
  - Metric dependent variable: one, multiple regression; many, GLM
  - Both types of dependent variable: GLM; neural net

Choice of Statistical Technique
Independent variable: both categorical and metric
• Categorical dependent variable: one, ANCOVA; many, MANCOVA
• Metric dependent variable: one, simple regression; many, multiple regression
• Both types of dependent variable: GLM; neural net