APPENDIX
To decompose how much of the overall sex difference in injury type and location can be
attributed to these compositional differences, we used a non-linear decomposition technique
suggested by Fairlie.26 The method was first introduced by Oaxaca and Blinder27 but was adapted
by Fairlie for use in models with binary dependent variables. The details of this method are
reported below.
The average difference between males and females in the probability of being seen for an
overuse injury can be expressed as:
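\[
\bar{Y}^M - \bar{Y}^F =
\left[ \sum_{i=1}^{N^M} \frac{F(X_i^M \hat{\beta}^M)}{N^M}
     - \sum_{i=1}^{N^F} \frac{F(X_i^F \hat{\beta}^M)}{N^F} \right]
+ \left[ \sum_{i=1}^{N^F} \frac{F(X_i^F \hat{\beta}^M)}{N^F}
       - \sum_{i=1}^{N^F} \frac{F(X_i^F \hat{\beta}^F)}{N^F} \right],
\]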
where $N^j$ is the sample size for sex $j$ (M = male, F = female), $\bar{Y}^j$ is the mean probability of being
seen for an overuse injury for sex $j$, $X_i^j$ is the vector of independent variables for case $i$ in sex $j$,
$\hat{\beta}^j$ is the vector of coefficient estimates including a constant term, and $F$ is the cumulative
distribution function of the logistic distribution. The first term is the portion of the overall
difference in overuse injuries attributed to compositional differences (i.e., differences in the
distributions of the independent variables). It can be interpreted as the extent to which the
male-female gap in overuse injuries would close if males were assigned the characteristics of females.
The second term shows the part of the overall difference due to differences in the processes that
lead to overuse injuries (i.e., differences in the coefficients). This second term also includes
differences due to unmeasured characteristics. Notably, we could just as easily use the female
coefficient estimates ($\hat{\beta}^F$) as weights in the first term of the equation and the male distribution of
independent variables ($X^M$) as weights in the second term. A third possibility is to weight the
first term of the decomposition with coefficient estimates from a pooled sample of males and
females. For the models in this paper, the choice of weights did not substantively alter our
conclusions, so for brevity’s sake we only report the results from the decompositions using the
pooled weights.
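The two-term split described above can be sketched numerically. The following is a minimal illustration, not the code used for the paper: the function names, the synthetic design matrices, and the coefficient values are all assumptions made for the example, and the coefficients are taken as given rather than estimated. It follows the displayed equation, with male coefficients weighting the first term; the pooled-weight variant would substitute pooled coefficient estimates for `beta_m` in that term.

```python
import numpy as np

def logistic_cdf(z):
    """Cumulative distribution function of the standard logistic distribution."""
    return 1.0 / (1.0 + np.exp(-z))

def two_term_decomposition(X_m, X_f, beta_m, beta_f):
    """Split the male-female gap in mean predicted probability into two terms.

    X_m, X_f       : (N_m, K) and (N_f, K) design matrices, constant column included
    beta_m, beta_f : length-K coefficient vectors from sex-specific logit fits
    """
    p_mm = logistic_cdf(X_m @ beta_m).mean()  # male cases, male coefficients
    p_fm = logistic_cdf(X_f @ beta_m).mean()  # female cases, male coefficients
    p_ff = logistic_cdf(X_f @ beta_f).mean()  # female cases, female coefficients
    composition = p_mm - p_fm  # first bracket: differences in the X distributions
    process = p_fm - p_ff      # second bracket: differences in the coefficients
    return composition, process

# Synthetic inputs for illustration only (hypothetical values, not study data).
rng = np.random.default_rng(0)
X_m = np.column_stack([np.ones(300), rng.normal(0.5, 1.0, 300)])
X_f = np.column_stack([np.ones(500), rng.normal(0.0, 1.0, 500)])
beta_m = np.array([-1.0, 0.8])
beta_f = np.array([-1.2, 0.6])

composition, process = two_term_decomposition(X_m, X_f, beta_m, beta_f)
gap = logistic_cdf(X_m @ beta_m).mean() - logistic_cdf(X_f @ beta_f).mean()
```

By construction the two terms sum to the gap in mean predicted probabilities. When the coefficients come from logit models that include a constant, each group's mean predicted probability equals its observed mean outcome, so the sum reproduces the observed male-female difference.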
This technique estimates the total contribution of sex differences in the independent
variables to the male-female gap in the dependent variable. It also allows us to estimate the
separate contribution of each independent variable to the overall gap. Each contribution is equal
to the change in the mean predicted probability of the outcome from substituting the female
distribution with the male distribution of a specific variable, holding the distributions of the other
variables constant.
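As an illustration, consider a hypothetical case with only two independent variables. The notation here is assumed for the example and does not appear elsewhere in the appendix: $X_1$ and $X_2$ are the two variables, $\hat{\alpha}^*$, $\hat{\beta}_1^*$ and $\hat{\beta}_2^*$ are pooled coefficient estimates, and $N$ is the number of one-to-one matched male and female cases. Under those assumptions, the separate contributions $C_1$ and $C_2$ can be written as:

```latex
\[
C_1 = \frac{1}{N} \sum_{i=1}^{N} \left[
  F\!\left(\hat{\alpha}^* + X_{1i}^M \hat{\beta}_1^* + X_{2i}^M \hat{\beta}_2^*\right)
- F\!\left(\hat{\alpha}^* + X_{1i}^F \hat{\beta}_1^* + X_{2i}^M \hat{\beta}_2^*\right)
\right]
\]
\[
C_2 = \frac{1}{N} \sum_{i=1}^{N} \left[
  F\!\left(\hat{\alpha}^* + X_{1i}^F \hat{\beta}_1^* + X_{2i}^M \hat{\beta}_2^*\right)
- F\!\left(\hat{\alpha}^* + X_{1i}^F \hat{\beta}_1^* + X_{2i}^F \hat{\beta}_2^*\right)
\right]
\]
```

The middle terms cancel when the two contributions are added, so $C_1 + C_2$ equals the total compositional term evaluated with pooled weights over the matched cases. Because $F$ is non-linear, switching $X_2$ before $X_1$ would generally give different values for the individual contributions, which is why the ordering of the variables matters.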
To compute the separate contributions, we follow Fairlie’s recommendation,26 pooling
the male and female samples and computing the predicted probability of being seen for an
overuse injury for each male and female in the sample. Since $N^M \neq N^F$, we draw a random
subsample of females equal to the size of the male group and match and rank the two groups
based on their predicted probabilities. The results are sensitive to the subsample chosen; we thus
draw 1,000 different subsamples and base our results on average values obtained from
decompositions carried out over these subsamples. The decomposition estimates are also
sensitive to the ordering of the variables in the equation. We therefore randomize the order of the
variables across the simulations. We use the ‘fairlie’ command in Stata to compute both
estimates and standard errors, the latter of which are approximated using the delta method.26
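The subsampling, rank matching, and order randomization described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the Stata `fairlie` implementation: the function name `detailed_contributions`, its parameters, and the synthetic data are hypothetical, the pooled coefficients are taken as given, the female group is assumed to be at least as large as the male group, and standard errors are not computed.

```python
import numpy as np

def logistic_cdf(z):
    """Cumulative distribution function of the standard logistic distribution."""
    return 1.0 / (1.0 + np.exp(-z))

def detailed_contributions(X_m, X_f, beta, n_reps=1000, seed=0):
    """Average per-variable contributions over random subsamples and orderings.

    X_m  : (N_m, K) male covariates; X_f : (N_f, K) female covariates, N_f >= N_m
    beta : length K+1 pooled coefficient vector, intercept first (assumed given)
    """
    rng = np.random.default_rng(seed)
    X_m = np.asarray(X_m, dtype=float)
    X_f = np.asarray(X_f, dtype=float)
    n_m, k = X_m.shape
    intercept, slopes = beta[0], beta[1:]
    # Rank males once by their pooled-model predicted probability.
    males = X_m[np.argsort(logistic_cdf(intercept + X_m @ slopes), kind="stable")]
    totals = np.zeros(k)
    for _ in range(n_reps):
        # Random female subsample the size of the male group, ranked the same way.
        sub = X_f[rng.choice(X_f.shape[0], size=n_m, replace=False)]
        sub = sub[np.argsort(logistic_cdf(intercept + sub @ slopes), kind="stable")]
        current = males.copy()  # matched pairs: row r of males <-> row r of sub
        p_before = logistic_cdf(intercept + current @ slopes).mean()
        for j in rng.permutation(k):   # randomized variable order
            current[:, j] = sub[:, j]  # switch variable j to the female values
            p_after = logistic_cdf(intercept + current @ slopes).mean()
            totals[j] += p_before - p_after
            p_before = p_after
    return totals / n_reps

# Synthetic inputs for illustration only (hypothetical values, not study data).
rng = np.random.default_rng(1)
X_m = rng.normal(0.5, 1.0, size=(200, 3))
X_f = rng.normal(0.0, 1.0, size=(350, 3))
beta_pooled = np.array([-1.0, 0.8, -0.4, 0.2])
contrib = detailed_contributions(X_m, X_f, beta_pooled, n_reps=200)
```

Within each replication the per-variable contributions telescope, so they sum to the difference in mean predicted probability between the males and the matched female subsample. Averaging over replications smooths out the dependence on any one subsample or variable ordering.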