the HTAi conference in Dublin in 2010

Improving search efficiency for economic evaluations in major databases
using semantic technology
Julie Glanville(1), Bill Porter(2), Pamela Negosanti(2), Carol Lefebvre(3)
(1) York Health Economics Consortium, University of York, York, YO10 5NH,
United Kingdom. Email: jmg1@york.ac.uk
(2) Expert System SpA, Modena, Italy. www.expertsystem.net. Email:
bporter@expertsystem.net
(3) UK Cochrane Centre, National Institute for Health Research, Oxford,
United Kingdom. Email: clefebvre@cochrane.ac.uk
Objective
Many technology appraisals require evidence from economic studies, and in
particular from economic evaluations, such as cost-benefit, cost-effectiveness or
cost-utility studies. Identifying economic evaluations efficiently in major
databases is problematic because it is difficult to find terms which distinguish
economic evaluations effectively from other studies, in particular terms which
distinguish them from other economic studies which are not economic
evaluations. Available search filters are sensitive but have low precision, which
means many irrelevant records have to be sifted manually to identify the few
relevant records.(1) Semantic technology software automatically interprets the
meaning of text written in natural language. This research explores whether
semantic technology post-processing software, COGITO® Studio Discover, can
help to improve search efficiency for difficult-to-identify study designs such as
economic evaluations.
Methods
We identified a gold standard set of known economic evaluation records from the
NHS EED database published in 3 years (2000, 2003 and 2006) and retrieved
their matching MEDLINE records (2). The records consisted of cost-benefit
studies, cost-effectiveness studies and cost-utility studies as shown in Table 1.
We also identified a comparison set of MEDLINE records for the same years
which were not economic evaluations but which contained economic text words
such as "cost" (Table 1).
The records were imported into the semantic environment tool COGITO® Studio
Discover in XML format. The XML structure allowed COGITO to use the different
fields of the records to inform the rules of the filter processing, allowing different
rules to be used for the different fields, for example the title and abstract. The
advantage of using semantic rules lies in two main areas: managing synonyms
and managing polysemy. Managing synonyms means that, using a powerful
semantic network, the system is able to recognize concepts according to their
meaning rather than the way they are written. Managing polysemes and
homonyms means that the algorithm uses context information to identify the
correct meaning of a word even when the word could have different meanings.
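As an illustration of field-level access only, the sketch below pulls the title and abstract out of a single XML record so that different rules could be applied to each. It is not the COGITO import itself. The element names (MedlineCitation, PMID, ArticleTitle, AbstractText) follow the PubMed XML convention and are an assumption, since the schema used in the study is not stated; the sample record and the helper name extract_fields are invented for the example.

```python
import xml.etree.ElementTree as ET

# Invented sample record; element names assume PubMed-style XML.
SAMPLE = """<MedlineCitation>
  <PMID>0</PMID>
  <Article>
    <ArticleTitle>Cost-effectiveness of a hypothetical screening programme</ArticleTitle>
    <Abstract>
      <AbstractText>We estimated costs per quality-adjusted life year.</AbstractText>
    </Abstract>
  </Article>
</MedlineCitation>"""


def extract_fields(xml_text):
    """Return the record identifier, title and abstract as separate fields."""
    root = ET.fromstring(xml_text)
    title_el = root.find(".//ArticleTitle")
    title = "".join(title_el.itertext()) if title_el is not None else ""
    # An abstract may be split into several AbstractText sections; join them.
    abstract = " ".join("".join(el.itertext()) for el in root.iter("AbstractText"))
    return {
        "pmid": root.findtext("PMID", default=""),
        "title": title,
        "abstract": abstract,
    }


fields = extract_fields(SAMPLE)
print(fields["title"])
print(fields["abstract"])
```

Keeping the fields separate is what would allow, for example, a stricter rule for titles than for abstracts.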
Within the COGITO environment the two sets of records were each divided
randomly into 2 subsets. One subset formed the training (test) sets of records
and the second formed the validation sets of records on which the performance
of the semantic rules could be validated. We trained the COGITO semantic
technology software to distinguish economic evaluations from records which
contained economic text words but which were not economic evaluations, using
the test subsets of the gold standard and comparator records. We created
semantic rules and tested the precision and sensitivity of those rules in identifying
economic evaluations accurately in the test subset of the gold standard. We then
tested how well the software performed in distinguishing economic evaluations
from comparator records which were not economic evaluations in the
validation sets of records.
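The random division into test and validation subsets can be sketched as follows. This is a minimal illustration, not the procedure used inside COGITO: the function name split_in_half, the fixed seed and the use of integer ranges as stand-ins for record identifiers are all assumptions. Only the set sizes (1,950 gold standard and 4,136 comparator records) come from Table 1.

```python
import random


def split_in_half(record_ids, seed=0):
    """Shuffle record identifiers and split them into two equal-sized subsets."""
    shuffled = list(record_ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


# Integer ranges stand in for the real record identifiers (assumption).
gold_test, gold_validation = split_in_half(range(1950))
comparator_test, comparator_validation = split_in_half(range(4136))

print(len(gold_test), len(gold_validation))
print(len(comparator_test), len(comparator_validation))
```

Each set is split independently, so each subset keeps the same ratio of gold standard to comparator records as the full collection.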
Results
The training process yielded a sensitivity of 100% and precision of 82.77% in the
test set (Table 2). When the rules were tested in the validation set the sensitivity
was maintained at 100% and the precision reduced to 71.69%. This represented
a Number Needed to Read of 1.21 in the test set and 1.39 in the validation set.
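The reported figures can be reproduced from the counts in Table 2 using the definitions given there, with Number Needed to Read taken as the reciprocal of precision. The short sketch below is for checking the arithmetic only; the function names are chosen for the example.

```python
def sensitivity(gs_retrieved, gs_total):
    """Gold standard records retrieved / all gold standard records."""
    return gs_retrieved / gs_total


def precision(gs_retrieved, comparator_retrieved):
    """Gold standard records retrieved / all records retrieved."""
    return gs_retrieved / (gs_retrieved + comparator_retrieved)


def number_needed_to_read(precision_value):
    """Records that must be read to find one relevant record (1 / precision)."""
    return 1 / precision_value


# Counts from Table 2: (label, GS retrieved, GS total, comparator retrieved)
for label, gs, gs_total, comp in [("test", 975, 975, 203),
                                  ("validation", 975, 975, 385)]:
    p = precision(gs, comp)
    print(f"{label}: sensitivity={sensitivity(gs, gs_total):.0%} "
          f"precision={p:.2%} NNR={number_needed_to_read(p):.2f}")
# test: sensitivity=100% precision=82.77% NNR=1.21
# validation: sensitivity=100% precision=71.69% NNR=1.39
```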
Discussion
In a recent assessment of the performance of economic evaluation search filters
in finding these same gold standard records, the best performing MEDLINE filters
in terms of sensitivity also achieved 100% sensitivity. However, the most
sensitive filters had very low precision, at 2%.(1,3) This represented a Number
Needed to Read of 50. The highest precision achieved in the assessment was
26% but that strategy (designed as part of the project) had only 72% sensitivity.
The current research indicates that with this technology it is possible to achieve
much higher precision with no sacrifice in sensitivity.
The value of the approach used in this study is that it focuses on the meaningful
co-presence of key terms within records to identify relevance as opposed to
simply co-presence. This approach is difficult to achieve in the current interfaces
to bibliographic databases such as MEDLINE because of the strictures of the
Boolean approach and the stepwise nature of searching using set combination.
The issue of how best to leverage the benefits of COGITO in conjunction with
databases such as MEDLINE needs to be explored. At present using COGITO is
likely to be a two-step process involving a sensitive search of the database,
loading the results into COGITO and then running the semantic rules against the
result set. The feasibility and benefits of loading database records into COGITO
and running queries directly (a one-step process) also need to be explored.
Conclusions
Initial exploration has shown that it is feasible to develop semantic rules to
identify economic evaluation records efficiently from among a mix of economic
evaluations and records which were not economic evaluations. Semantic
technology, used for post-processing the results of sensitive searches, may provide
a helpful solution to the current challenges of identifying difficult-to-distinguish
study designs such as economic evaluations, observational studies, quality of life
studies, patient preference studies and diagnostic test accuracy studies. This also
seems to be a promising technology to explore in terms of improving the
precision of searching for economic evaluations among records obtained from
EMBASE, where extensive indexing can actually impede efficient retrieval at
present. If COGITO rules can be developed to retrieve hard-to-distinguish
studies accurately, there may be real benefits for health technology
assessment in terms of reducing the resources required to scan records for
relevance.
Table 1. Numbers of economic evaluation records and comparator records
identified from MEDLINE.

Year        Number of NHS EED records with     Number of records in
published   matching MEDLINE records           MEDLINE comparator set
            (MEDLINE gold standard)
2000        577                                1,226
2003        618                                1,335
2006        755                                1,575
Total       1,950                              4,136

Table 2. Performance of semantic rules in identifying gold standard records in
test and validation sets of records.

                                   Test set                Validation set
                                   (GS=975)                (GS=975)
                                   (Comparator set 2068)   (Comparator set 2068)
Gold standard records retrieved    975                     975
Comparator records retrieved       203                     385
Sensitivity (number of GS          100%                    100%
  retrieved/number of GS)
Precision (number of GS            82.77%                  71.69%
  retrieved/number of records
  retrieved)

References
(1) Glanville J, Fleetwood K, Yellowlees A, Kaunelis D, Mensinkai S.
Development and Testing of Search Filters to Identify Economic
Evaluations in MEDLINE and EMBASE. Ottawa: Canadian Agency for
Drugs and Technologies in Health; 2009.
(2) Centre for Reviews and Dissemination. NHS Economic Evaluation
Database [database online]. York: Centre for Reviews and Dissemination;
2010. Available from: http://www.crd.york.ac.uk/crdweb/
(3) Glanville J, Kaunelis D, Mensinkai S. How well do search filters perform in
identifying economic evaluations in MEDLINE and EMBASE. International
Journal of Technology Assessment in Health Care 2009;25:522-529.