Grouping pharmacovigilance terms with
semantic distance
Marie DUPUCH a,b, Anne JAMET b,c, Marie-Christine JAULENT a,b, Reinhard FESCHAREK d, Natalia
GRABAR e
a Université Pierre et Marie Curie - Paris6, Paris, F-75006 France; b INSERM, U872 eq. 20, Paris,
F-75006 France; c HEGP AP-HP, Paris, France; d CSL Behring GmbH, Marburg, Germany;
e CNRS UMR 8163 STL, Université Lille 3, France
Abstract. Pharmacovigilance is the activity related to the collection, analysis and
prevention of adverse drug reactions (ADRs) induced by drugs or biologics. Among
other methods, statistical methods are used to detect new ADRs, and it has been noted that
combining them with groupings of terms that gather similar ADRs improves
the detection of new ADRs. SMQs, the reference groupings in the pharmacovigilance area,
are built by exploiting the MedDRA structure. Currently, SMQs are over-inclusive,
yet they can miss several relevant terms. Moreover, several important
safety topics are not covered by the SMQs. The objective of this work is to propose an
automatic method for the creation of groupings of terms. This method is based on
the semantic distance between MedDRA terms. Across the several
experiments performed, we obtain a high precision and an acceptable recall.
Keywords. Natural Language Processing, Medical informatics, Drug toxicity,
Pharmacoepidemiology, Semantics, Terminology
1. Introduction
Pharmacovigilance is the activity related to the collection, analysis and prevention of
adverse drug reactions (ADRs) induced by drugs or biologics. ADRs are encoded with
terms from dedicated terminologies, WHO-ART (World Health Organization - Adverse
Reaction Terminology) and MedDRA (Medical Dictionary for Regulatory Activities).
Signal detection, where a signal is an unexpected relation between a drug and an ADR,
exploits this encoding. Statistical methods are typically used for signal detection in databases
with a sufficiently large volume of cases [2, 11], but grouping similar pharmacovigilance
cases is a suitable way to increase the signal intensity [8]. This need becomes more
pressing because the structure of MedDRA is very granular: the use of very specific terms for
encoding ADRs may dilute the signal [7]. Thus, various hierarchical levels of
MedDRA (SOC, HLT, PT) and, more recently, SMQs (Standardised MedDRA Queries) have
been used. The PT (Preferred Terms) level, very often exploited for signal detection,
corresponds mainly to specific ADRs, while SOCs (System Organ Classes) and HLTs (High
Level Terms) are MedDRA hierarchical levels above the PTs and
provide hierarchical groupings of PTs. As for the SMQs, their objective is to group
together terms associated with a given diagnosis. SMQs are designed by groups of experts
following a manual analysis of the MedDRA structure and of the scientific literature [5].
This is a long and tedious task. The few evaluations of these groupings have
shown that SMQs often achieve the highest sensitivity [12, 14] but can be over-inclusive [14]
and, because the cases found lack specificity, their evaluation is extremely
time-consuming. Finally, relevant PTs may not be included in SMQs [14], and SMQs are not
exhaustive: several serious diagnoses are not yet addressed.
In order to ease and systematize the creation of groupings of terms,
automatic methods can be exploited. One existing work proposes hierarchical groupings of
ADRs [1], which do not necessarily follow medical reasoning. For instance, for
renal diseases, in addition to terms such as Acute nephritis and Insufficiency
renal, which are hierarchically related, it can also be relevant to consider
terms related to laboratory results or to medical procedures. Semantic distance may lead to
groupings that better follow medical reasoning: it has previously been
applied to subsets of terms from MedDRA [3] and WHO-ART [9]. In the WHO-ART
work [9], the obtained groupings exhibited several types of relations: synonyms,
antonyms, physiological functions or abnormalities, associated symptoms, abnormal
laboratory tests, pathologies and their causes, close anatomical localizations, degrees of
severity, and heterogeneous groupings. No evaluation was carried out against existing
groupings (SMQs, SOCs, HLTs). We propose to continue adapting semantic
distance approaches to the creation of ADR term groupings. The whole set of MedDRA
terms is used, and special attention is paid to the evaluation of the obtained groupings
against the SMQs, which have become the reference in the pharmacovigilance area.
2. Material and Method
Material. We use two resources related to MedDRA [4]: ontoEIM and SMQs. The ADR
ontology ontoEIM [1] was created by projecting WHO-ART and
MedDRA onto Snomed CT [18] through the UMLS [13]. Up to 85% of
WHO-ART terms but only 46% of MedDRA terms are aligned in this way. We exploit the
part of the resource built with MedDRA, whose terms are used for the creation of SMQs. Thanks to
this alignment, the representation of MedDRA terms is enriched: their structure is improved
and becomes parallel to the structure of Snomed CT, and they receive formal definitions
(terms can be defined on four Snomed CT axes: morphology, topography, causality and
expression). Our second resource, SMQs (Standardised MedDRA Queries), are groupings of
MedDRA terms related to a given diagnosis (e.g., Acute renal failure, Hepatic disorders).
SMQs are created to assist the search for cases relevant to this
diagnosis. The 82 existing SMQs contain MedDRA PTs and LLTs (Lowest Level Terms). In our
experiments, ontoEIM is the resource used for the creation of groupings of ADR terms, while
SMQs in their broad version are used as the gold standard for the evaluation of these groupings.
Method. Semantic distance between terms is often computed within terminologies or
ontologies. It depends on the number of edges on the shortest path between two terms
(four edges between the terms Abdominal abscess and Pharyngeal abscess in figure 1),
although other factors may also be exploited. A detailed description of our method is
given in [6]; here we recall its main steps and present its new aspects. First, because the
alignment of the MedDRA terms is only partial (46%), we try to improve it by transposing
alignments between LLT and PT terms when one of them is aligned and the others are not.
Four different cases were distinguished and successfully processed [6]. Then, semantic
distance [16] is computed between MedDRA PT and LLT terms through the ontoEIM
resource. For this, we exploit either the ADR terms only (which belong mainly to the
Diagnostics axis D) or the ADR terms and their formal definitions. Within the formal
definitions, we exploit elements provided by two axes: morphology M (the kind of abnormality) and
topography T (the anatomical localization). These axes are often involved in the definition
of diagnoses [17] and they are also the most frequently present in ontoEIM. Consider
figure 1 and its two terms Abdominal abscess and Pharyngeal abscess, defined as follows:
– Abdominal abscess: M = Abscess morphology, T = Abdominal cavity structure
– Pharyngeal abscess: M = Abscess morphology, T = Neck structure
The shortest paths sp are computed between these two terms (axis D) and between their
formal definitions (axes T and M). The weight of each edge is set to 1, and the value of each
shortest path is the sum of the weights of all its edges. For this pair of terms we
obtain the following values: spD = 4, spT = 10 and spM = 0.
Figure 1: The shortest paths sp between Abdominal abscess and Pharyngeal abscess
computed on the three axes: Diagnostics (D), formal definitions (T and M).
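As an illustration of the shortest-path step, the sketch below counts edges between two terms with a breadth-first search, which is sufficient when every edge has weight 1. It is a minimal sketch, not the authors' implementation: the function name shortest_path_length and the toy hierarchy (including its intermediate node names) are hypothetical and do not reproduce the real ontoEIM or Snomed CT structure. The toy graph is only built so that the two example terms are four edges apart, as on axis D.

```python
from collections import deque


def shortest_path_length(edges, source, target):
    """Return the number of unit-weight edges on the shortest path, or None."""
    # Build an undirected adjacency list: a path may go up to a common
    # ancestor and down again.
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    if source == target:
        return 0  # identical terms or definitions, e.g. spM = 0 in the text
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two terms are not connected


# Hypothetical toy hierarchy (intermediate nodes are placeholders).
toy_edges = [
    ("Abdominal abscess", "intermediate node 1"),
    ("intermediate node 1", "common ancestor"),
    ("Pharyngeal abscess", "intermediate node 2"),
    ("intermediate node 2", "common ancestor"),
]

sp_d = shortest_path_length(toy_edges, "Abdominal abscess", "Pharyngeal abscess")
print(sp_d)  # 4
```

The same function could be applied separately to the T and M axes; how the three path lengths are combined into one distance is described in [6] and is not reproduced here.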
Semantic distance is then computed [6], which makes it possible to generate a semi-matrix and to
apply an agglomerative (ascending) hierarchical classification [10] for the creation of groupings
of terms. The minimal threshold is set to 2. We perform several experiments, in which we vary:
the axes exploited, one axis (D) or three axes (D, M, T); the set of SMQ terms, all terms or
only terms aligned with Snomed CT; and the groupings considered, the best grouping for a given
SMQ or merged groupings. Groupings are compared with 9
SMQs [6] taken from [19]. Evaluation is performed with three classical measures:
precision P (number of relevant terms grouped as a percentage of the total number of
grouped terms), recall R (number of relevant terms grouped as a percentage of the number
of terms in the corresponding SMQ) and F-measure F (the harmonic mean of P and R).
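The grouping step can be sketched as a bottom-up (agglomerative) clustering over a semi-matrix of pairwise distances. The sketch below rests on several assumptions that are not stated in the text: it uses single linkage (the distance between two groupings is the smallest distance between their members), it treats the threshold as the largest distance at which two groupings may still be merged, and the terms t1 to t4 and their distances are invented toy values. The function name agglomerative_groupings is hypothetical.

```python
def agglomerative_groupings(terms, distance, threshold):
    """Single-linkage bottom-up clustering with a distance cut-off (assumed)."""
    clusters = [{t} for t in terms]  # start with one grouping per term

    def linkage(c1, c2):
        # Semi-matrix: each unordered pair of terms is stored only once.
        return min(distance[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        # Find the closest pair of groupings.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # no remaining pair is close enough to merge
        clusters[i] = clusters[i] | clusters[j]
        del clusters[j]  # j > i, so index i stays valid
    return clusters


# Toy semi-matrix of distances (invented values, for illustration only).
toy_distance = {
    frozenset(("t1", "t2")): 1,
    frozenset(("t1", "t3")): 5,
    frozenset(("t1", "t4")): 6,
    frozenset(("t2", "t3")): 2,
    frozenset(("t2", "t4")): 6,
    frozenset(("t3", "t4")): 7,
}

groups = agglomerative_groupings(["t1", "t2", "t3", "t4"], toy_distance, threshold=2)
print(sorted(sorted(g) for g in groups))  # [['t1', 't2', 't3'], ['t4']]
```

With single linkage, t3 joins the first grouping through its distance of 2 to t2 even though it is at distance 5 from t1; another linkage criterion (complete or average) would give tighter groupings and may be closer to what a given experiment needs.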
3. Results and Discussion
Figure 2 shows the results (mean values for precision, recall and F-measure, as well as their
min and max values) obtained in the eight experiments performed. Four experiments use
the best grouping for each SMQ: exploitation of one axis with the complete set of
SMQ terms (1a-bc) or with the aligned terms only (1a-ba), and exploitation of three axes with
the complete set of SMQ terms (3a-bc) or with the aligned terms only (3a-ba). Four more
experiments are similar but use a merging of groupings (with an F-measure higher than 10):
exploitation of one axis with the complete set of SMQ terms (1a-mc) or with the aligned
terms only (1a-ma), and exploitation of three axes with the complete set
of SMQ terms (3a-mc) or with the aligned terms only (3a-ma). The first four experiments,
performed with the best grouping, show a high precision, although recall and F-measure are
low [6]. Although recall and F-measure are unsatisfactory, such high precision seems
nevertheless to meet the expectations of pharmacovigilance experts looking for highly
specific groupings. The four additional experiments, in which we merge the n-best
groupings for each SMQ, were expected to increase the overall performance. As the graphs
show, we can indeed improve recall with only a small deterioration of precision. The
F-measure is also improved. As a general remark, when we take into
account one axis (D), the performance is always higher than when three axes (D, M, T)
are exploited, which seems to be due to the incompleteness of the formal definitions.
Another factor that influences the performance of the method is
whether complete SMQs or SMQs composed of only aligned terms are used. Reducing the
set of terms to aligned terms has a positive effect on recall and
F-measure, because the number of terms to be found is reduced, although it has a negative
effect on precision. In addition, the graphs in figure 2 show the min and max values
for the three evaluation measures. The interval is clearly very large, which
means that performance varies greatly across SMQs, and that
different strategies should probably be used for different diagnoses.
Figure 2: Mean, min and max values for (a) precision, (b) recall and (c) F-measure.
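The evaluation measures and the merging of groupings discussed in this section can be sketched as follows. This is a minimal sketch under stated assumptions: precision, recall and F-measure follow the definitions given in the Method section, while the merging rule (take the union of every grouping whose own F-measure against the SMQ exceeds 10, read here as a percentage) is our reading of the text rather than a confirmed implementation. The function names and the toy terms are hypothetical.

```python
def evaluate(grouping, smq):
    """Return (precision, recall, F-measure) in percent for one grouping."""
    relevant = len(grouping & smq)  # grouped terms that belong to the SMQ
    precision = 100.0 * relevant / len(grouping) if grouping else 0.0
    recall = 100.0 * relevant / len(smq) if smq else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f_measure


def merge_groupings(groupings, smq, min_f=10.0):
    """Union of the groupings whose individual F-measure exceeds min_f (assumed rule)."""
    merged = set()
    for grouping in groupings:
        if evaluate(grouping, smq)[2] > min_f:
            merged |= grouping
    return merged


# Toy data (invented): an SMQ with 8 terms and three candidate groupings.
smq = {"a", "b", "c", "d", "e", "f", "g", "h"}
groupings = [{"a", "b", "c"}, {"d", "e", "x"}, {"y", "z"}]

best = max(groupings, key=lambda g: evaluate(g, smq)[2])
merged = merge_groupings(groupings, smq)

print([round(v, 1) for v in evaluate(best, smq)])    # [100.0, 37.5, 54.5]
print([round(v, 1) for v in evaluate(merged, smq)])  # [83.3, 62.5, 71.4]
```

On this toy example the merged grouping trades some precision for a clear gain in recall and F-measure, which is the same direction of change as the one reported for the merged experiments; the numbers themselves are illustrative only.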
4. Conclusion and Perspectives
The proposed method exploits semantic distance for the creation of groupings of ADR
terms. Such groupings, especially when they show a high specificity, may be a useful tool
for signal detection in the pharmacovigilance area. The proposed method, once
optimized and improved, may also be exploited during the creation of new SMQs. In our
work, several experiments have been performed and groupings have been evaluated against
the SMQs in their broad version. A novel aspect of our work, the merging
of n-best groupings, improves recall without a significant deterioration of
precision, which is a very positive result. Future experiments may lead to an
adjustment of thresholds and variables (edge weights, coefficients of axes) and to the
definition of other factors that influence the quality of groupings. We will also provide a
detailed analysis of individual SMQs in order to decide whether different strategies may lead to
better results for different SMQs. We also assume that methods from Natural
Language Processing may enrich and improve the groupings.
Acknowledgment. This work was partly supported by funding from the European Community's Seventh
Framework Programme (FP7/2007-2013) for the Innovative Medicines Initiative (IMI) under Grant Agreement
[1150004]. The research leading to these results was conducted as part of the PROTECT consortium
(Pharmacoepidemiological Research on Outcomes of Therapeutics by a European ConsorTium, www.imi-protect.eu), which
is a public-private partnership coordinated by the European Medicines Agency. The authors thank the other
participants in this task (C. Bousquet, O. Caster, G. Declerck, R. Hill, A. Kluczka, X. Kurz, M. Lerch, N. Noren,
V. Pinkston, E. Sadou, J. Souvignet, T. Vardar); the views expressed are those of the authors only.
References
[1] Alecu I., Bousquet C., Jaulent MC. (2008). A case report: using Snomed CT for grouping adverse drug
reactions terms. BMC Med Inform Decis Mak, 8(S1), 4.
[2] Bate A., Lindquist M., Edwards I., Olsson S., Orre R., Lansner A. & De Freitas R. (1998). A bayesian neural
network method for adverse drug reaction signal generation. Eur J Clin Pharmacol, 54(4), 315–21.
[3] Bousquet C., Henegar C., Louët A., Degoulet P. & Jaulent M. (2005). Implementation of automated signal
generation in pharmacovigilance using a knowledge-based approach. Int J Med Inform, 74(7-8), 563–71.
[4] Brown E., Wood L. & Wood S. (1999). The medical dictionary for regulatory activities (MedDRA). Drug Saf,
20(2), 109–17.
[5] CIOMS (August 2004). Development and Rational Use of Standardised MedDRA Queries (SMQs): Retrieving
Adverse Drug Reactions with MedDRA. Internal report, CIOMS.
[6] Dupuch M., Jamet A., Jaulent MC., Fescharek R., Grabar N. (2011). Exploitation de la distance sémantique
pour la création de groupements de termes en pharmacovigilance. To appear in JFIM 2011.
[7] Fescharek R., Kübler J., Elsasser U., Frank M. & Güthlein P. (2004). Medical dictionary for regulatory
activities (MedDRA): Data retrieval and presentation. Int J Pharm Med, 18(5), 259–269.
[8] Hauben M. & Bate A. (2009). Decision support methods for the detection of adverse events in post-marketing
data. Drug Discov Today, 14(7-8), 343–57.
[9] Iavindrasana J., Bousquet C., Degoulet P. & Jaulent M. (2006). Clustering WHO-ART terms using semantic
distance and machine algorithms. In AMIA Annu Symp Proc, p. 369–73.
[10] Lebart L. & Salem A. (1994). Statistique textuelle. Dunod, Paris.
[11] Meyboom R., Lindquist M., Egberts A. & Edwards I. (2002). Signal selection and follow-up in
pharmacovigilance. Drug Saf, 25(6), 459–65.
[12] Mozzicato P. (2007). Standardised MedDRA queries: their role in signal detection. Drug Saf, 30(7), 617–9.
[13] NLM (2008). UMLS Knowledge Sources Manual. National Library of Medicine, Bethesda, Maryland.
www.nlm.nih.gov/research/umls/.
[14] Pearson R., Hauben M., Goldsmith D., Gould A., Madigan D., O’Hara D., Reisinger S., Hochberg A. (2009).
Influence of the MedDRA hierarchy on pharmacovigilance data mining results. Int J Med Inform, 78(12), 97–103.
[15] Petiot D., Burgun A. & Le Beux P. (1996). Modelisation of a criterion of proximity: Application to medical
thesauri. In Medical Informatics Europe, p. 149–52.
[16] Rada R., Mili H., Bicknell E. & Blettner M. (1989). Development and application of a metric on semantic
nets. IEEE Transactions on systems, man and cybernetics, 19(1), 17–30.
[17] Spackman K. & Campbell K. (1998). Compositional concept representation using Snomed: Towards further
convergence of clinical terminologies. In AMIA 1998, p. 740–744.
[18] Stearns M., Price C., Spackman K. & Wang A. (2001). Snomed clinical terms: overview of the development
process and project status. In AMIA, p. 662–666.
[19] Trifirò G., et al. (2009). EU-ADR group. Data mining on electronic health record databases for signal
detection in pharmacovigilance: Which events to monitor? Pharmacoepidemiol Drug Saf, 18(12), 1176–84.