
Appendix B
There are several sources and statistical methods that can be used to assess validity and reliability.
For readers wishing to learn more, there are many academic resources available1–4,6.
Traditionally, there are three types of validity (content validity; predictive and concurrent validity; and construct validity) and four methods of testing the reliability of a tool. In this paper, the following four definitions/descriptions are used, drawn from Messick5 when discussing validity and from Downing3 when discussing rater reliability:
Face or content validity
“Content validity is evaluated by showing how well the content of the test samples the class of
situations or subject matter about which conclusions are to be drawn.”
This subject relevance or representativeness is usually judged by experts and established through
consensus. E.g. Does the checklist contain items related to what is performed during an
ultrasound-guided regional anesthesia block?
Construct validity
“Construct validity is evaluated by investigating what qualities a test measures, that is, by
determining the degree to which certain explanatory concepts or constructs account for
performance on the test.”
The test does not have to define the construct completely, but rather serve as an approximate measure of it. Construct validity can be assessed by the test's ability to discriminate between levels of performance. E.g. When this evaluation is used, do novices in this procedure perform worse than experienced participants?
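As an illustrative sketch only (not the analysis used in this paper), such a known-groups comparison could be run with a nonparametric test; the scores below are invented and the example assumes the Python scipy library.

# Hypothetical known-groups check of construct validity: do experienced
# practitioners score higher on the assessment than novices?
from scipy.stats import mannwhitneyu

novice_scores = [12, 14, 11, 15, 13, 10]        # invented checklist totals
experienced_scores = [19, 21, 18, 22, 20, 17]   # invented checklist totals

# One-sided test: higher scores in the experienced group would support
# the construct validity of the assessment.
stat, p = mannwhitneyu(experienced_scores, novice_scores, alternative="greater")
print(f"Mann-Whitney U = {stat:.1f}, p = {p:.4f}")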
Concurrent validity
“Concurrent validity indicates the extent to which the tests estimate an individual's present
standing on the criterion.”
E.g. How well does this evaluation (i.e. simulation model performance) reflect or correlate with
another measure (i.e. patient block performance)?
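As a similar illustrative sketch (again with invented data and assuming the Python scipy library), concurrent validity could be examined by correlating the two measures for the same participants.

# Hypothetical concurrent-validity check: correlate simulation-model scores
# with patient-block scores from the same participants.
from scipy.stats import spearmanr

simulation_scores = [14, 18, 21, 16, 19, 23]   # invented values
patient_scores = [15, 17, 22, 15, 20, 24]      # invented values

rho, p = spearmanr(simulation_scores, patient_scores)
print(f"Spearman rho = {rho:.2f}, p = {p:.4f}")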
Inter-rater reliability
“Reliability refers to the reproducibility of assessment data or scores, over time or occasions…
All reliability estimates quantify some consistency of measurement and indicate the amount of
random error associated with the measurement data.”
Intraclass correlation coefficients (ICCs) provide an estimate of the consistency of measurements made by different raters evaluating the same task while accounting for the components of variability inherent to the study design (i.e. the raters and the participants being evaluated).3 There are six forms of ICC; which one is reported depends on the analysis of variance used (one-way or two-way), the treatment of raters (random or fixed) and the unit of analysis (individual or mean ratings).6 In our paper, two-way average-measures intraclass correlations (ICCs) were calculated for absolute agreement, as all participants were rated by the same two raters and our interest was in quantifying the reliability of average ratings.
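For readers who wish to reproduce this type of estimate, a minimal sketch follows. It assumes the Python pingouin package and uses invented ratings; the ICC2k row corresponds to the two-way random-effects, absolute-agreement, average-measures ICC described above.

# Sketch of a two-way, absolute-agreement, average-measures ICC with two
# raters scoring the same participants (data are invented).
import pandas as pd
import pingouin as pg

data = pd.DataFrame({
    "participant": [1, 1, 2, 2, 3, 3, 4, 4],
    "rater":       ["A", "B", "A", "B", "A", "B", "A", "B"],
    "score":       [18, 19, 12, 14, 21, 20, 15, 16],
})

icc = pg.intraclass_corr(data=data, targets="participant",
                         raters="rater", ratings="score")
# ICC2k: two-way random effects, absolute agreement, average of k raters
print(icc[icc["Type"] == "ICC2k"])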
References
1. Cook DA, Beckman TJ. Current concepts in validity and reliability for psychometric instruments: theory and application. Am J Med 2006; 119:166.e7–166.e16
2. Downing SM, Haladyna TM. Validity threats: overcoming interference with proposed
interpretations of assessment data. Med Educ 2004; 38:327–333
3. Downing SM. Reliability: on the reproducibility of assessment data. Med Educ 2004;
38:1006–1012
4. Downing SM. Validity: on the meaningful interpretation of assessment data. Med Educ 2003;
37:830–837
5. Messick S. Validity of test interpretation and use. Educational Testing Service 1990; 1–33
6. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull 1979; 86:420–428