Using Resampling Techniques to Measure the
Effectiveness of Providers in Workers’
Compensation Insurance
David Speights
Senior Research Statistician
HNC Insurance Solutions
Irvine, California
Presentation Outline
• Introduction to the problem
• Introduction to Bootstrap Resampling
• Two resampling approaches for comparing
two groups
• Examples
• Conclusions
Introduction to the Problem
• Compare two groups from observational data
– Outcome (Y) {e.g. Claim Cost}
– Characteristics (X) have distributions F1 and F2
• Difficulties
– F1 F2
– X is associated with Y (i.e. X is a confounder)
– example: claim severity associated with claim cost
Introduction to the Problem
• Ideal solution
– Randomize subjects into the two groups
– Ideal solution not usually possible
• Alternate solution {Topic of the paper}
– Identify characteristics where F1 ≠ F2
– Adjust the distribution of Y to account for the
differing distributions of X
Introduction to Bootstrap
Resampling
• Purpose
– Obtain the distribution of a parameter estimate
(i.e. sampling distribution)
– Avoid relying on assumptions about the underlying
distribution
– Often used when the parameter estimate
• has a distribution that is difficult to obtain
• relies heavily on unrealistic assumptions
Introduction to Bootstrap
Resampling
• Given Data
– {X1, X2, …, Xn} where Xi is a p x 1 vector
– X has unspecified distribution F
• Parameter of interest Q
– Q = T(F) is a parameter of interest
• We want the distribution of $\hat{Q} = T(\hat{F})$
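The plug-in idea on this slide (apply T to the empirical distribution of the data instead of the unknown F) can be sketched in Python. This is only an illustration: the helper names `ecdf` and `plug_in` and the toy sample are assumptions, not part of the presentation.

```python
import numpy as np

def ecdf(data, x):
    """Empirical distribution function F_hat(x): fraction of observations <= x."""
    data = np.asarray(data, dtype=float)
    return np.mean(data <= x)

def plug_in(statistic, data):
    """Plug-in estimate Q_hat = T(F_hat).

    Applying the statistic to the observed sample is the same as evaluating
    T under the empirical distribution, which puts mass 1/n on each point.
    """
    return statistic(np.asarray(data, dtype=float))

# Hypothetical toy sample (not from the presentation)
sample = [2.0, 4.0, 4.0, 10.0]
f_hat_at_4 = ecdf(sample, 4.0)             # 3 of the 4 observations are <= 4
q_hat_mean = plug_in(np.mean, sample)      # T(F) = integral of x dF(x)
q_hat_median = plug_in(np.median, sample)  # a different choice of T
```

Any statistic that can be computed from a sample can be passed as `statistic`, which is what makes the bootstrap applicable beyond the mean.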
Introduction to Bootstrap
Resampling
• Distribution of Q̂
– usually obtained through theoretical properties
if repeated sampling is performed on a
population with a known distribution of X
– bootstrap techniques resample from the data to
simulate repeated sampling from the population
with unknown distribution of X
Introduction to Bootstrap
Resampling
Example -- Population Mean
• Parameter: $Q = T(F) = \int x \, dF(x)$
• Resample with replacement from data
– Data is (X1, …, Xn).
– Each data point equally likely to be selected
– Resampled data is $(X_1^{(b)}, \ldots, X_n^{(b)})$.
– $\hat{Q}^{(b)}$ is the $b$th bootstrap estimate of the mean $\mu$
$\hat{Q}^{(b)} = \int x \, d\hat{F}^{(b)}(x) = \frac{1}{n} \sum_{i=1}^{n} X_i^{(b)} = \bar{X}^{(b)}$
Introduction to Bootstrap
Resampling
Example -- Population Mean
• B bootstrap samples are drawn
• Distribution of $\hat{Q}$ is estimated with the
empirical distribution function of
$(\hat{Q}^{(1)}, \ldots, \hat{Q}^{(B)})$
• Mean and variance of this distribution used
to estimate mean and variance of Q̂
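The population-mean example can be sketched in a few lines of Python with NumPy. The function name `bootstrap_mean`, the simulated lognormal "claim costs", and the seed values are assumptions for illustration. The percentile interval in the last line is an added illustration of using the empirical distribution of the bootstrap estimates; the slides mention only its mean and variance.

```python
import numpy as np

def bootstrap_mean(data, n_boot=1000, seed=0):
    """Return n_boot bootstrap estimates of the mean.

    Each replicate draws n points with replacement from the data (every
    point equally likely) and records the mean of the resampled values.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = data.size
    # Index matrix of shape (n_boot, n): row b holds the indices of resample b
    idx = rng.integers(0, n, size=(n_boot, n))
    return data[idx].mean(axis=1)

# Hypothetical skewed "claim cost" sample, simulated for illustration only
rng = np.random.default_rng(42)
costs = rng.lognormal(mean=8.0, sigma=1.0, size=200)

q_boot = bootstrap_mean(costs, n_boot=500)   # B = 500 bootstrap estimates
boot_mean = q_boot.mean()                    # estimates the mean of Q_hat
boot_var = q_boot.var(ddof=1)                # estimates the variance of Q_hat
ci_low, ci_high = np.percentile(q_boot, [2.5, 97.5])  # percentile interval
```

Drawing all resampling indices at once keeps the sketch short; for large n or B, a loop over replicates would use less memory.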
Two Resampling Methods for
Comparing Two Groups
• Method 1: Normalized comparisons
– Y is a response of interest
– X is a categorical variable (a confounder)
– Z=1 for group 1, Z=2 for group 2
– F(Y|Z=1) normalized for distribution of X in group 2
$F^{(2)}(Y \mid Z=1) = \sum_{\text{all } x_j} F(Y \mid Z=1, X=x_j) \, P(X=x_j \mid Z=2)$
– F(Y|Z=2) non-normalized
$F(Y \mid Z=2) = \sum_{\text{all } x_j} F(Y \mid Z=2, X=x_j) \, P(X=x_j \mid Z=2)$
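As a rough illustration of the normalization, the sketch below evaluates the group 1 distribution function at a single point, weighting each level of X by its observed proportion in group 2. The function names and the toy data are assumptions. Requiring every group 2 level to appear in group 1 is a simplification of this sketch, not something the slides specify.

```python
import numpy as np

def stratum_probs(x):
    """Estimate P(X = x_j) by the observed proportion of each level."""
    levels, counts = np.unique(np.asarray(x), return_counts=True)
    return {lvl: c / counts.sum()
            for lvl, c in zip(levels.tolist(), counts.tolist())}

def normalized_cdf(y, x, y0, target_probs):
    """Sum over levels x_j of F_hat(y0 | X = x_j) * target_probs[x_j]."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x)
    total = 0.0
    for level, p in target_probs.items():
        in_stratum = x == level
        if not in_stratum.any():
            raise ValueError(f"no observations with X = {level!r}")
        # Within-stratum empirical distribution function at y0
        total += p * np.mean(y[in_stratum] <= y0)
    return total

# Hypothetical toy data: group 1 is mostly low severity, group 2 mostly high
y1 = [1.0, 2.0, 3.0, 10.0, 20.0]
x1 = ["low", "low", "low", "high", "high"]
y2 = [4.0, 8.0, 15.0, 30.0]
x2 = ["low", "high", "high", "high"]

p2 = stratum_probs(x2)                          # P(X = x_j | Z = 2)
f1_raw = np.mean(np.asarray(y1) <= 5.0)         # unadjusted F(5 | Z = 1)
f1_norm = normalized_cdf(y1, x1, 5.0, p2)       # normalized to group 2's X
f2 = normalized_cdf(y2, x2, 5.0, p2)            # equals the raw F(5 | Z = 2)
```

In this toy data the unadjusted value for group 1 is 0.6, but after normalization both groups give 0.25, so the apparent difference is due entirely to the differing severity mix.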
Two Resampling Methods for
Comparing Two Groups
• Method 1: Normalized comparisons
– Resample from (Yi,Xi) separately for groups 1
and 2
– Construct estimates of F(Y|X=xj) and P(X=xj) for
two groups
– Construct estimates of the normalized distribution
functions on the previous slide
– Parameter estimates (e.g. percentiles) can be obtained from
these normalized distributions
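The steps of Method 1 can be sketched as follows, using the difference in a percentile between the normalized group 1 distribution and the group 2 distribution as the parameter. All function names, the simulated data, and the handling of empty strata are assumptions of this sketch rather than details given in the presentation.

```python
import numpy as np

def weighted_quantile(y, w, q):
    """Smallest y whose cumulative (normalized) weight reaches q."""
    order = np.argsort(y)
    cum = np.cumsum(w[order]) / np.sum(w)
    k = min(np.searchsorted(cum, q, side="left"), len(y) - 1)
    return y[order][k]

def normalized_quantile(y1, x1, x2, q):
    """Quantile q of Y in group 1 with strata of X reweighted to match group 2.

    Simplification (an assumption of this sketch): strata missing from the
    group 1 resample are skipped, which renormalizes the remaining weights.
    """
    levels2, counts2 = np.unique(x2, return_counts=True)
    w = np.zeros(len(y1))
    for level, c2 in zip(levels2, counts2):
        in_stratum = x1 == level
        n1j = in_stratum.sum()
        if n1j > 0:
            # Each observation gets weight P(X = x_j | Z = 2) / n_1j
            w[in_stratum] = (c2 / len(x2)) / n1j
    return weighted_quantile(y1, w, q)

def bootstrap_normalized_diff(y1, x1, y2, x2, q=0.5, n_boot=500, seed=0):
    """Bootstrap the difference (normalized group 1 quantile) - (group 2 quantile)."""
    rng = np.random.default_rng(seed)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        # Resample (Y, X) pairs separately within each group
        i1 = rng.integers(0, len(y1), size=len(y1))
        i2 = rng.integers(0, len(y2), size=len(y2))
        q1 = normalized_quantile(y1[i1], x1[i1], x2[i2], q)
        q2 = weighted_quantile(y2[i2], np.ones(len(i2)), q)
        diffs[b] = q1 - q2
    return diffs

# Simulated data: cost depends on severity only, so there is no true group effect
rng = np.random.default_rng(1)
n = 2000
x1 = rng.choice(3, size=n, p=[0.6, 0.3, 0.1])   # group 1: mostly low severity
x2 = rng.choice(3, size=n, p=[0.2, 0.3, 0.5])   # group 2: mostly high severity
y1 = np.exp(7.0 + 1.0 * x1 + 0.5 * rng.standard_normal(n))
y2 = np.exp(7.0 + 1.0 * x2 + 0.5 * rng.standard_normal(n))

raw_diff = np.median(y1) - np.median(y2)        # confounded comparison
diffs = bootstrap_normalized_diff(y1, x1, y2, x2, q=0.5, n_boot=200)
```

Because the simulated costs depend only on severity, the bootstrap distribution in `diffs` should be centered much closer to zero than `raw_diff`, which reflects the different severity mix rather than any group effect.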
Two Resampling Methods for
Comparing Two Groups
• Method 2: Bootstrapping linear regression
– Y is a response of interest
– X is a vector of variables (confounders)
– Z=1 for group 1, Z=2 for group 2
– Use the regression model
$Y = \alpha + \beta I(Z=2) + X'\gamma + \varepsilon$
Two Resampling Methods for
Comparing Two Groups
• Method 2: Bootstrapping linear regression
– Estimate $(\alpha, \beta, \gamma)$ with $(\hat{\alpha}, \hat{\beta}, \hat{\gamma})$, the least
squares estimates on original data
– Resample with replacement from the residuals
– Construct the bth bootstrap value of Y as
$Y_i^{(b)} = \hat{\alpha} + \hat{\beta} I(Z_i=2) + X_i'\hat{\gamma} + \varepsilon_i^{(b)}$
– bth bootstrap sample is
$(Y_i^{(b)}, Z_i, X_i), \; i = 1, \ldots, n$
Two Resampling Methods for
Comparing Two Groups
• Method 2: Bootstrapping linear regression
– Construct estimates of $(\alpha, \beta, \gamma)$ with $(\hat{\alpha}, \hat{\beta}, \hat{\gamma})^{(b)}$,
the least squares estimates on bootstrap sample
– Using the B bootstrap estimates of $(\alpha, \beta, \gamma)$,
construct the distribution of the parameters of
interest
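The residual bootstrap described in Method 2 can be sketched with NumPy's least squares solver. The function name `residual_bootstrap`, the simulated data, and the true coefficient values used to generate it are assumptions for illustration only. The coefficient vector is ordered as intercept, group effect on I(Z = 2), then the confounder coefficients.

```python
import numpy as np

def residual_bootstrap(y, z, X, n_boot=500, seed=0):
    """Residual bootstrap for the model Y = intercept + effect*I(Z=2) + X'coef + error.

    Returns the least squares estimates on the original data and an
    (n_boot, k) array holding the estimates from each bootstrap sample.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    # Design matrix: intercept, indicator I(Z = 2), confounders X
    D = np.column_stack([np.ones(n), (z == 2).astype(float), X])
    coef_hat = np.linalg.lstsq(D, y, rcond=None)[0]
    fitted = D @ coef_hat
    resid = y - fitted
    coefs = np.empty((n_boot, D.shape[1]))
    for b in range(n_boot):
        # Resample residuals with replacement and rebuild the response;
        # Z and X are held fixed across bootstrap samples
        eps_b = rng.choice(resid, size=n, replace=True)
        y_b = fitted + eps_b
        coefs[b] = np.linalg.lstsq(D, y_b, rcond=None)[0]
    return coef_hat, coefs

# Simulated data (hypothetical): log cost with a group effect of -0.2
rng = np.random.default_rng(7)
n = 500
z = rng.integers(1, 3, size=n)                  # group label, 1 or 2
X = rng.standard_normal((n, 1))                 # a single confounder
y = 1.0 - 0.2 * (z == 2) + 0.5 * X[:, 0] + 0.3 * rng.standard_normal(n)

coef_hat, coefs = residual_bootstrap(y, z, X, n_boot=200)
effect_se = coefs[:, 1].std(ddof=1)             # bootstrap SE of the group effect
effect_ci = np.percentile(coefs[:, 1], [2.5, 97.5])
```

Resampling residuals rather than whole observations keeps the design (Z, X) fixed, which matches the bootstrap sample defined on the slide; it does assume the errors are exchangeable across observations.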
Examples
Using Data from a Nationwide Database of Workers'
Compensation Claims
• Normalized comparisons of percentiles
– Y = Total claim cost
– Group 1: Providers in network A
– Group 2: Providers not in network A
– X is a 10 level variable representing claim
severity derived through ICD9 code on a claim
– B = 500 bootstrap samples drawn
– median, 75th, and 95th percentiles compared
– Normalization relative to group 1
Examples
Using Data from a Nationwide Database of Workers'
Compensation Claims
• Bootstrapping linear regression
– Y = log(Total Indemnity Costs)
– X consists of several variables
• NCCI body part designation, nature of injury designation, accident
cause, industry class code, and injury type
• 10 level claim severity measure derived with ICD9 code
• Age and gender
– Group 1: Specific provider of interest (Provider Z)
– Group 2: All other providers
– B=500 bootstrap samples
Conclusions
• Bootstrap methodology is a flexible, robust
method for deriving sampling distributions
• Can be used to compare two groups while
considering possible confounder variables
• Useful method for observational studies
• Only a few examples are shown in this
paper/presentation; many more applications are possible