I saw in the news that a program is being reviewed by CMS and/or the OPTN/UNOS for worse-than-expected 1-year outcomes, but on your website, the Program-Specific Report for that program says its 1-year outcomes were "as expected" or "not significantly different." Why do CMS and/or the OPTN/UNOS reach different conclusions from the SRTR?
If you flipped a coin 10 times, you might expect to get 5 "heads," but sometimes you would get more or fewer than 5 due to random variation. Similarly, a program may have more or fewer "observed outcomes" (deaths or graft failures) than expected simply due to random variation.
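The coin-flip intuition can be made exact with a short calculation (illustrative only; these numbers have nothing to do with SRTR's models):

```python
from math import comb

def prob_heads(k, n=10):
    """Probability of exactly k heads in n fair coin flips."""
    return comb(n, k) / 2 ** n

# Even a perfectly fair coin lands on exactly 5 heads out of 10
# only about a quarter of the time; the rest of the time, random
# variation pushes the count above or below 5.
p_exactly_5 = prob_heads(5)        # 252/1024, about 0.246
p_more_or_fewer = 1 - p_exactly_5  # about 0.754
```

So an "observed" count that differs from the "expected" count is, by itself, entirely unremarkable; the statistical test asks whether the difference is too large to be plausibly explained by chance.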
The determination as to whether a program’s outcomes are different from what would be expected
based on the national experience is made after a statistical test. A statistical test can never determine
with absolute certainty that a program’s outcomes truly are different from what would be expected;
during any specified period, a program’s outcomes may appear to be different from expected simply
because of random variation. Outcomes can differ from expected in two ways: better-than-expected or
worse-than-expected (i.e., a program can be "overperforming" or "underperforming"). Typically, statisticians allow for a 5% chance that the statistical test will wrongly conclude that a program is underperforming or overperforming when it is actually performing as expected.
The Program-Specific Reports on the SRTR website describe a program's outcomes as "as expected," "higher than expected," or "lower than expected." Because we look for both higher- and lower-than-expected outcomes, and because we allow an overall 5% chance that the statistical test will reach the wrong conclusion, we allow a 2.5% chance that the test wrongly identifies an overperforming program and a 2.5% chance that it wrongly identifies an underperforming program.
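Under the normal approximation commonly used for such tests (a simplification here, not SRTR's exact method), the 5% two-sided threshold splits evenly into 2.5% in each tail:

```python
from math import erfc, sqrt

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return erfc(abs(z) / sqrt(2))

# At the conventional cutoff z = 1.96, the two-sided test leaves
# roughly 5% probability in the two tails combined...
p_total = two_sided_p(1.96)  # about 0.05
# ...which is 2.5% in the "overperforming" tail and 2.5% in the
# "underperforming" tail.
p_per_tail = p_total / 2     # about 0.025
```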
CMS and the OPTN/UNOS are primarily concerned with identifying programs that appear to be
underperforming, so that these programs can receive further quality review. In their analyses, CMS and
the OPTN/UNOS also allow for a 5% chance that the statistical test will lead to the incorrect conclusion
that a program is underperforming when in fact the difference between what was expected and what
was observed was simply due to chance. But notice the difference with regard to potentially
underperforming programs: the SRTR allows for a 2.5% chance of concluding a program was
underperforming when in fact it performed as expected, whereas CMS and the OPTN/UNOS allow for a
5% chance. As a result, analyses by CMS and the OPTN/UNOS identify more programs for review than
would be indicated in the SRTR results.
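As a sketch of why this happens, compare the two flagging rules on some hypothetical one-sided p-values (the numbers are invented for illustration):

```python
# Hypothetical 1-sided p-values for five programs' tests of
# "worse than expected" outcomes (invented for illustration).
one_sided_p = [0.01, 0.03, 0.04, 0.06, 0.20]

# CMS/OPTN-style rule: flag if the 1-sided p-value is below 5%.
flagged_one_sided = [p for p in one_sided_p if p < 0.05]

# SRTR-style rule: the 5% is split across both tails, so only a
# 2.5% chance is allowed in the "worse than expected" direction.
flagged_two_sided = [p for p in one_sided_p if p < 0.025]

# flagged_one_sided contains 3 programs; flagged_two_sided contains 1.
```

The stricter per-tail threshold flags fewer programs, which is exactly the pattern described above.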
In more statistical terms, the test used to compare a program’s outcomes with what would be expected
based on the national experience produces a number called a p-value. If the p-value is less than 0.05 (i.e., if the program were truly performing as expected, a difference this large would arise by chance less than 5% of the time), then the conclusion is that the program is not performing as expected. A p-value can be "2-sided" or "1-sided." Analysis for program outcomes
higher or lower than expected generates a 2-sided p-value; that is the kind of analysis performed by the
SRTR for its Program-Specific Reports. Analysis for program outcomes only lower than expected
generates a 1-sided p-value; that is the kind of analysis performed by CMS and the OPTN/UNOS as part
of their program review process. A 2-sided p-value is twice as large as a 1-sided p-value when a
program’s outcomes are worse than expected. Therefore, in some instances, a 1-sided analysis may
show that a program was underperforming, whereas a 2-sided analysis may not.
For example, in an SRTR (2-sided) analysis, a program may have more patient deaths than expected and
the p-value may be 0.08 (which is greater than 0.05), leading to the conclusion that we cannot state that
this program is underperforming. However, in a CMS or OPTN/UNOS (1-sided) analysis, the p-value for
that same program would be 0.04 (which is less than 0.05), leading to the conclusion that the program
was underperforming. This is how it happens that CMS and/or the OPTN/UNOS and the SRTR sometimes
reach different conclusions. But, this does not mean one is wrong or one is better than another.
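The worked example above can be reproduced with a simple normal approximation (the observed and expected counts are invented for illustration; SRTR's actual risk-adjustment models are more complex):

```python
from math import erfc, sqrt

def p_values(observed, expected):
    """1- and 2-sided p-values under a simple normal approximation."""
    z = (observed - expected) / sqrt(expected)
    p_one_sided = erfc(z / sqrt(2)) / 2   # worse than expected only
    p_two_sided = erfc(abs(z) / sqrt(2))  # worse OR better than expected
    return p_one_sided, p_two_sided

# A hypothetical program with 23 deaths where 16 were expected:
p1, p2 = p_values(observed=23, expected=16)
# p1 is about 0.04 (below 0.05: a 1-sided review would flag it),
# while p2 is about 0.08 (above 0.05: a 2-sided test would not).
```

When outcomes are worse than expected, the 2-sided p-value is exactly twice the 1-sided one, so any program with a 1-sided p-value between 0.025 and 0.05 will be flagged by one analysis but not the other.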