Repeatability of a skin tent test for dehydration in working horses and donkeys

JC Pritchard†‡*, ARS Barr† and HR Whay†

† Department of Clinical Veterinary Science, University of Bristol, Langford House, BS40 5DU, UK
‡ Brooke Hospital for Animals, Broadmead House, 21 Panton St, London SW1Y 4DR, UK
* Contact for correspondence and request for reprints: joy.pritchard@bristol.ac.uk

Running title: Dehydration in working horses and donkeys
Abstract

Dehydration is a serious welfare issue for equines working in developing countries. Risk factors such as high ambient temperature, heavy workload and poor water availability are exacerbated by the traditional belief that provision of water to working animals will reduce their strength or cause colic and muscle cramps. As part of the welfare assessment of 4889 working horses and donkeys during 2002/3, eight observers were trained to perform a standardised skin tent test. The prevalence of a prolonged duration of skin tenting was 50% in horses and 37% in donkeys. Two studies investigated inter-observer repeatability of skin tent test techniques, using a total of 220 horses and donkeys in India and then Egypt: measures of agreement with a 'gold standard' observer varied from 40 to 99%. Simplifying the test by reducing the number of possible scores for skin tent from three (immediate return of skin to normal position; delayed return up to three seconds; delayed return more than three seconds) to two (immediate return of skin to normal position; delayed return of any duration) did not improve overall repeatability of the skin tent test. Potential reasons for not achieving high levels of agreement include variations in assessment method, assessors' previous experience, subjective demarcation between score categories and biological variability.

Keywords: animal welfare, dehydration, donkey, horse, repeatability, skin tent
Introduction

Assessment of welfare using direct, animal-based observation of health and behaviour outputs has increased in recent years. It is now being used by the Soil Association for monitoring of organic standards and by the European Union Welfare Quality Project to standardise welfare output criteria within the farming sector. Animal-based welfare assessment also has applications in other sectors and in 2002 was adopted by the UK-based charity, Brooke Hospital for Animals, as an effective way to inform and monitor welfare improvement strategies for equines working in developing countries. Although often assessed subjectively, animal-based observations should provide a more direct, and therefore more valid, assessment of welfare than resource measurements. A key consideration when developing an effective welfare assessment tool is ensuring repeatability of results between different observers. Subjective assessment of health parameters, such as locomotion scoring of pigs, has been found to be highly repeatable between trained observers (Main et al 2000).

A welfare assessment was carried out on 4889 equines that pull carts or carry people or goods in urban and peri-urban areas of Afghanistan, Egypt, India, Jordan and Pakistan (Pritchard et al 2005). As part of the study, which took place during winter 2002 and spring 2003, eight observers were trained to perform a skin tent test, using a standard method and anatomical location described by detailed guidance notes and photographs. Skin tent was scored in 4664 animals: the prevalence of Score 1 (some loss of skin elasticity) was 32% in horses and 28% in donkeys. The prevalence of Score 2 (prominent tenting of skin) was 18% in horses and 9% in donkeys.
This paper describes two studies of inter-observer repeatability of the skin tent test and discusses potential reasons why it may not be highly repeatable between observers.
Materials and methods

Study A

This investigation was carried out in Delhi, India, during August 2003. Eighty horses and 80 donkeys were recruited from the population working in the vicinity of Brooke's field clinics, in order to test inter-observer repeatability for each parameter in the animal-based welfare assessment described above, including the standardised skin tent test. The animal was positioned with its head up, in a natural position, facing straight ahead. The lateral edge of the observer's hand rested against the cranial margin of the animal's scapula and a vertical fold of skin overlying the m. brachiocephalicus was pinched and released immediately. Although the height of the skin pinch was not standardised, the fixed position of the observer's hand was designed to reduce variation whilst remaining practically applicable. Skin tent duration was scored as follows: 0) if there was no loss of skin elasticity and skin returned to normal immediately; 1) if there was some loss of skin elasticity and tented skin remained visible, but not prominent, for up to three seconds; and 2) if there was prominent tenting of skin, visible for more than three seconds.
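Although observers estimated these durations by eye rather than timing them, the cut-offs map directly onto a simple rule. The sketch below is a minimal illustration in Python, assuming a hypothetical measured return time in seconds; it is not part of the published protocol.

```python
def skin_tent_score(return_time_s: float) -> int:
    """Map the time taken for a pinched skin fold to return to its normal
    position onto the three-point scale used in Study A. The measured
    duration is hypothetical; in the study, times were estimated by eye.
    """
    if return_time_s <= 0:      # Score 0: immediate return, no loss of elasticity
        return 0
    elif return_time_s <= 3:    # Score 1: tenting visible but not prominent
        return 1
    else:                       # Score 2: prominent tenting beyond three seconds
        return 2
```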
Each animal was identified with a unique number on a hoof brand and harness tag. No prior information about the animals' health or behaviour was provided to the observers. Six observers carried out the standardised skin tent test on the same 80 animals, with an interval of approximately ten minutes between observations. Forty horses were assessed on each of days 1 and 2 and forty donkeys were assessed on each of days 3 and 4.
Study B

The results of the first repeatability study were used to modify the welfare assessment protocol with the aim of increasing the repeatability of some measures. The skin tent test was simplified from three scores to two. Animals were scored: 0) (absent) if skin returned to a normal position immediately after it was pinched and released; or 1) (present) if there was any delay in return of tented skin to its normal position. To test the success of modifications to the welfare assessment, a second repeatability study was undertaken in Cairo, Egypt, during April 2004, using the method described for Study A. Ten observers (including five who took part in Study A) assessed 30 working horses and 30 working donkeys over two days.
Statistical analysis

For both studies, data were analysed for the level of agreement between Observer 1 and each of the other observers, using Cohen's kappa coefficient and calculations of percentage agreement. Observer 1 was the same for both studies. Statistical analysis was carried out using SPSS v. 12.0 (SPSS Inc). Significance is reported at the P < 0.05 level.
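The analysis itself was run in SPSS; purely as an illustration of the two agreement measures, the sketch below reproduces them in Python using scikit-learn. The score vectors are invented for the example and are not data from either study.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical skin tent scores for the same ten animals, rated by the
# 'gold standard' observer and one other observer (illustrative values only).
observer_1 = [0, 1, 2, 1, 0, 0, 2, 1, 1, 0]
observer_2 = [0, 1, 1, 1, 0, 2, 2, 1, 0, 0]

# Percentage agreement: the proportion of animals given identical scores.
agreement = sum(a == b for a, b in zip(observer_1, observer_2)) / len(observer_1)
print(f"Percentage agreement: {agreement:.0%}")

# Cohen's kappa: agreement corrected for that expected by chance.
print(f"Cohen's kappa: {cohen_kappa_score(observer_1, observer_2):.3f}")
```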
Table 1
Results

Levels of agreement for the skin tent test between Observer 1 and each of the other observers are illustrated in Table 1. In Study A, two observers achieved greater than 75% agreement with Observer 1 for horses and three achieved greater than 75% agreement for donkeys. The kappa statistic could not be calculated for all observers in Study A because observers did not use the full range of possible scores. In Study B, one observer achieved over 75% agreement with Observer 1 for horses (κ = 0.664, P < 0.01) and three achieved over 75% agreement for donkeys (κ = 0.529-0.667, P < 0.01). Some observers improved their percentage agreement with Observer 1 when the scoring system was simplified; others had a lower percentage agreement. Overall, for both systems the repeatability was above 60%, but the two-score system achieved a lower inter-observer repeatability (61 and 64% for horses and donkeys respectively) than the three-score system (66 and 83%).
Discussion

Cohen's kappa coefficient is a measure of agreement that relates the actual agreement between observers to that which would have been obtained by chance. It is a quotient with a maximum value of 1 (perfect agreement); values at or below 0 indicate no agreement beyond chance. There are no objective criteria for judging kappa, although 0.4-0.5 is considered to be moderate agreement beyond chance. Levels below this indicate that a test has poor specificity and/or sensitivity (Martin et al 1987).
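In standard notation, kappa is calculated as

\[ \kappa = \frac{p_o - p_e}{1 - p_e} \]

where p_o is the observed proportion of agreement and p_e is the proportion of agreement expected by chance. As a purely illustrative calculation, if two observers give the same score to 80% of animals and chance agreement is 50%, then κ = (0.80 − 0.50)/(1 − 0.50) = 0.60.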
Kappa depends on the number of categories that are used in its calculation, with its value being greater if there are fewer categories (Petrie & Watson 1999). However, because kappa compares pairs of observations, it cannot generate a statistic for agreement between observers who did not use the full range of possible scores. In Study A, most observers did not allocate all possible scores within the group of animals, so could not be compared with Observer 1. Calculating the percentage agreement between Observer 1 and each other observer produced the most useful outcome in this case. It may be argued that each observer should be compared with the mode result rather than a 'gold standard'; however, this assumes that all observers have an equal level of training, use an identical method and can produce absolutely standardised observations. All observers were trained by JCP (Observer 1), HRW or both and provided with a comprehensive set of guidance notes and photographs written by JCP, so for these studies it was decided to use JCP as the gold standard.
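The interaction between category count and kappa can be illustrated with a toy example (invented scores, not study data): when two observers disagree only between adjacent non-zero scores, collapsing the three-point scale to the binary scale used in Study B converts those disagreements into agreements and inflates kappa.

```python
from sklearn.metrics import cohen_kappa_score

# Invented three-point skin tent scores for ten animals (not study data).
# The two observers disagree only between Scores 1 and 2.
obs_a = [0, 0, 1, 1, 2, 2, 1, 2, 0, 1]
obs_b = [0, 0, 2, 1, 1, 2, 2, 1, 0, 1]

kappa_three = cohen_kappa_score(obs_a, obs_b)

# Collapse to the Study B binary scale: 0 = immediate return,
# 1 = delayed return of any duration.
kappa_two = cohen_kappa_score([min(s, 1) for s in obs_a],
                              [min(s, 1) for s in obs_b])

print(f"Three-score kappa: {kappa_three:.2f}")  # below 1 because of 1-vs-2 splits
print(f"Two-score kappa:   {kappa_two:.2f}")    # 1.00 in this contrived case
```

In practice the effect is not guaranteed, because collapsing categories also changes the chance-agreement term; indeed, Study B found that simplification did not improve repeatability overall.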
Potential reasons for not achieving good agreement between observers include:

• Variation between observers in the method of assessment used for some parameters, possibly attributable to guidance notes not being sufficiently specific about how a parameter should be assessed or how to define 'normal'.
• Assessors' previous experience: unfamiliarity with the parameter from previous fieldwork, failure to recognise normal, or application of a skewed scale.
• Subjective demarcation between score categories; for example, in Study A, Scores 1 and 2 for skin tent were defined by the time taken for skin to return to normal, but times were estimated rather than measured accurately.
• Biological variability in the parameter over the time taken to carry out the repeatability test; for example, changes in skin elasticity or hydration status.
• The test not being valid for the parameter under assessment.
The first three may be improved by refining training and guidance notes, prior exposure of observers to animals demonstrating the full range of possible scores for the parameter in question, and standardising methodology. For skin tent, this may include the introduction of an objective time measurement. Biological variability in the skin tent test is poorly understood; reducing the time taken for the inter-observer repeatability test may improve agreement between observers, although it risks introducing variables relating to repeated handling of the skin within a short time period. Many observer repeatability studies use photographs or video footage, rather than consecutive direct observations, in order to reduce this variability (Fuller et al 2006). Simplifying the score categories from three in Study A to two in Study B was intended to minimise errors relating to subjective demarcation of score categories, although this did not appear to have the desired effect.
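As a sketch of what the objective time measurement suggested above might look like in the field, the hypothetical stopwatch helper below records the interval between release of the skin fold and its return to normal; it is not part of the published protocol.

```python
import time

def measure_skin_tent_seconds() -> float:
    """Hypothetical stopwatch for the skin tent test: the assessor presses
    Enter when the fold is released and again when the skin has returned
    to its normal position. Not part of the published protocol.
    """
    input("Press Enter at the moment the skin fold is released...")
    start = time.monotonic()
    input("Press Enter when the skin has returned to normal...")
    return time.monotonic() - start
```

The measured duration could then be converted to a score using the cut-offs defined for Study A.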
Conclusions and animal welfare implications

This study concluded that a standardised skin tent test for dehydration was moderately repeatable and that some observers could achieve a high level of repeatability compared with a 'gold standard' observer. Simplifying the scoring system did not result in better inter-observer repeatability overall. A repeatable and practical measure of hydration status is needed in order to develop and evaluate intervention programmes to reduce dehydration and thus improve the welfare of equines working in hot and humid environments.
Acknowledgements

This work was funded by the Brooke Hospital for Animals. The authors would like to thank all Brooke field staff overseas who participated in and supported these studies.
References

Fuller CJ, Bladon BM, Driver AJ and Barr ARS 2006 The intra- and inter-assessor reliability of measurement of functional outcome by lameness scoring in horses. The Veterinary Journal 171: 281-286
Main DCJ, Clegg J, Spatz A and Green LE 2000 Repeatability of a lameness scoring system for finishing pigs. Veterinary Record 147: 574-576
Martin SW, Meek AH and Willeberg P 1987 Measurement of disease frequency and production. In: Veterinary Epidemiology. Principles and Methods. Iowa State University Press: Ames, Iowa, USA
Petrie A and Watson P 1999 The kappa measure of agreement for a categorical variable. In: Statistics for Veterinary and Animal Science. Blackwell Science Ltd: Oxford, UK
Pritchard JC, Lindberg AC, Main DCJ and Whay HR 2005 Assessment of the welfare of working horses, mules and donkeys, using health and behaviour parameters. Preventive Veterinary Medicine 69: 265-283