Using Resampling Techniques to Measure the Effectiveness of Providers in Workers’ Compensation Insurance

David Speights
Senior Research Statistician
HNC Insurance Solutions
Irvine, California

Presentation Outline
• Introduction to the problem
• Introduction to Bootstrap Resampling
• Two resampling approaches for comparing two groups
• Examples
• Conclusions

Introduction to the Problem
• Compare two groups from observational data
  – Outcome (Y) {e.g. Claim Cost}
  – Characteristics (X) have distributions $F_1$ and $F_2$
• Difficulties
  – $F_1 \neq F_2$
  – X is associated with Y (i.e. X is a confounder)
  – example: claim severity associated with claim cost

Introduction to the Problem
• Ideal solution
  – Randomize subjects into the two groups
  – Ideal solution not usually possible
• Alternate solution {Topic of the paper}
  – Identify characteristics where $F_1 \neq F_2$
  – Adjust the distribution of Y to account for the differing distributions of X

Introduction to Bootstrap Resampling
• Purpose
  – Obtain the distribution of a parameter estimate (i.e. sampling distribution)
  – Not rely on assumptions about the underlying distribution
  – Often used when the parameter estimate
    • has a distribution that is difficult to obtain
    • relies heavily on unrealistic assumptions

Introduction to Bootstrap Resampling
• Given Data
  – {X1, X2, …, Xn} where Xi is a p x 1 vector
  – X has unspecified distribution F
• Parameter of interest Q
  – Q = T(F) is a parameter of interest
• We want the distribution of $\hat{Q} = T(\hat{F})$

Introduction to Bootstrap Resampling
• Distribution of $\hat{Q}$
  – usually obtained through theoretical properties if repeated sampling is performed on a population with a known distribution of X
  – bootstrap techniques resample from the data to simulate repeated sampling from the population with unknown distribution of X

Introduction to Bootstrap Resampling
Example -- Population Mean
• Example -- Population Mean: $Q = T(F) = \int x \, dF(x)$
• Resample with replacement from data
  – Data is (X1, …, Xn).
  – Each data point is equally likely to be selected
  – Resampled data is $(X^{(b)}_1, \ldots, X^{(b)}_n)$.
  – $\hat{Q}^{(b)}$ is the bth bootstrap estimate of the population mean $\mu$:
    $\hat{Q}^{(b)} = \int x \, d\hat{F}^{(b)}(x) = \frac{1}{n} \sum_{i=1}^{n} X^{(b)}_i = \bar{X}^{(b)}$

Introduction to Bootstrap Resampling
Example -- Population Mean
• B bootstrap samples are drawn
• Distribution of $\hat{Q}$ is estimated with the empirical distribution function of $(\hat{Q}^{(1)}, \ldots, \hat{Q}^{(B)})$
• Mean and variance of this distribution are used to estimate the mean and variance of $\hat{Q}$

Two Resampling Methods for Comparing Two Groups
• Method 1: Normalized comparisons
  – Y is a response of interest
  – X is a categorical variable, a confounder
  – Z=1 for group 1, Z=2 for group 2
  – F(Y|Z=1) normalized for the distribution of X in group 2:
    $F^{(2)}(Y \mid Z=1) = \sum_{\text{all } x_j} F(Y \mid Z=1, X=x_j) \, P(X=x_j \mid Z=2)$
  – F(Y|Z=2) non-normalized:
    $F(Y \mid Z=2) = \sum_{\text{all } x_j} F(Y \mid Z=2, X=x_j) \, P(X=x_j \mid Z=2)$

Two Resampling Methods for Comparing Two Groups
• Method 1: Normalized comparisons
  – Resample from $(Y_i, X_i)$ separately for groups 1 and 2
  – Construct estimates of $F(Y \mid X=x_j)$ and $P(X=x_j)$ for the two groups
  – Construct estimates of the normalized distribution functions on the previous slide
  – Parameter estimates can be obtained from these

Two Resampling Methods for Comparing Two Groups
• Method 2: Bootstrapping linear regression
  – Y is a response of interest
  – X is a vector of variables, confounders
  – Z=1 for group 1, Z=2 for group 2
  – Use the regression model $Y = \alpha + \beta I(Z=2) + X'\gamma + \varepsilon$

Two Resampling Methods for Comparing Two Groups
• Method 2: Bootstrapping linear regression
  – Estimate $(\alpha, \beta, \gamma)$ with $(\hat{\alpha}, \hat{\beta}, \hat{\gamma})$, the least squares estimates on the original data
  – Resample with replacement from the residuals
  – Construct the bth bootstrap value of Y as
    $Y_i^{(b)} = \hat{\alpha} + \hat{\beta} I(Z_i=2) + X_i'\hat{\gamma} + \hat{\varepsilon}_i^{(b)}$
  – bth bootstrap sample is $(Y_i^{(b)}, Z_i, X_i)$, $i = 1, \ldots, n$

Two Resampling Methods for Comparing Two Groups
• Method 2: Bootstrapping linear regression
  – Construct estimates of $(\alpha, \beta, \gamma)$ with $(\hat{\alpha}, \hat{\beta}, \hat{\gamma})^{(b)}$, the least squares estimates on the bootstrap sample
  – Using the B bootstrap estimates
    of $(\alpha, \beta, \gamma)$, construct the distribution of the parameters of interest

Examples Using Data from a Nationwide Data Base of Workers’ Compensation Claims
• Normalized comparisons of percentiles
  – Y = Total claim cost
  – Group 1: Providers in network A
  – Group 2: Providers not in network A
  – X is a 10 level variable representing claim severity, derived through the ICD9 code on a claim
  – B = 500 bootstrap samples drawn
  – median, 75th, and 95th percentiles compared
  – Normalization relative to group 1

Examples Using Data from a Nationwide Data Base of Workers’ Compensation Claims
• Normalized comparisons of percentiles

Examples Using Data from a Nationwide Data Base of Workers’ Compensation Claims
• Bootstrapping linear regression
  – Y = log(Total Indemnity Costs)
  – X consists of several variables
    • NCCI body part designation, nature of injury designation, accident cause, industry class code, and injury type
    • 10 level claim severity measure derived with ICD9 code
    • Age and gender
  – Group 1: Specific provider of interest (Provider Z)
  – Group 2: All other providers
  – B = 500 bootstrap samples

Conclusions
• Bootstrap methodology is a flexible, robust method for deriving sampling distributions
• Can be used to compare two groups while considering possible confounder variables
• Useful method for observational studies
• Only a few examples are shown in this paper/presentation; the method has much more potential
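
The steps of Method 2 (bootstrapping linear regression) can be sketched in Python. This is a minimal illustration under stated assumptions, not the code behind the examples above: the function name residual_bootstrap, the coefficient labels (alpha, beta, gamma), the use of NumPy, the synthetic data, and the percentile interval for the group effect are all choices made for the sketch.

```python
import numpy as np

def residual_bootstrap(y, z, X, B=500, seed=0):
    """Residual bootstrap for y = alpha + beta*I(z == 2) + X @ gamma + error.

    Returns (coef_hat, draws): the least squares fit on the original data
    and a (B, 2 + p) array of bootstrap refits, with columns ordered as
    (alpha, beta, gamma_1, ..., gamma_p).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    # Design matrix: intercept, indicator for group 2, confounders.
    D = np.column_stack([np.ones(n), (z == 2).astype(float), X])
    coef_hat = np.linalg.lstsq(D, y, rcond=None)[0]
    fitted = D @ coef_hat
    resid = y - fitted
    draws = np.empty((B, D.shape[1]))
    for b in range(B):
        # Resample residuals with replacement; (Z_i, X_i) stay fixed.
        y_b = fitted + rng.choice(resid, size=n, replace=True)
        # Refit least squares on the b-th bootstrap sample.
        draws[b] = np.linalg.lstsq(D, y_b, rcond=None)[0]
    return coef_hat, draws

# Synthetic data (hypothetical; not the workers' compensation data).
rng = np.random.default_rng(42)
n = 400
z = rng.integers(1, 3, size=n)   # group label: 1 or 2
X = rng.normal(size=(n, 2))      # two made-up confounders
y = (1.0 + 0.5 * (z == 2) + X @ np.array([0.3, -0.2])
     + rng.normal(scale=0.5, size=n))

coef_hat, draws = residual_bootstrap(y, z, X, B=500, seed=1)
beta_se = draws[:, 1].std(ddof=1)                   # bootstrap standard error
beta_ci = np.percentile(draws[:, 1], [2.5, 97.5])   # percentile interval
print(f"beta_hat = {coef_hat[1]:.3f}, bootstrap SE = {beta_se:.3f}")
print(f"95% percentile interval for beta: ({beta_ci[0]:.3f}, {beta_ci[1]:.3f})")
```

Column 1 of draws holds the B bootstrap estimates of the group effect (the coefficient on I(Z=2)). Its empirical distribution plays the role of the sampling distribution described in the slides. Resampling residuals rather than whole observations keeps the group labels and confounders fixed across bootstrap samples, which matches the construction of $Y_i^{(b)}$ given under Method 2.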