SUPPLEMENTARY MATERIAL: ANNOTATION GUIDELINES
1. Concepts with a Concept Unique Identifier (CUI) in the Unified Medical
Language System (UMLS) should only be annotated if they belong to the Mantra
terminology. This means that concepts should only be annotated if they belong to
MeSH, MedDRA, or SNOMED-CT, and if they have a semantic type that is part
of one of the following semantic groups: Anatomy (ANAT), Chemicals and drugs
(CHEM), Devices (DEVI), Disorders (DISO), Geographic areas (GEOG), Living
beings (LIVB), Objects (OBJC), Phenomena (PHEN), Physiology (PHYS),
Procedures (PROC). Concepts belonging to other vocabularies or semantic groups
should not be annotated. Information about the mapping of semantic types to
semantic groups can be found at http://semanticnetwork.nlm.nih.gov/SemGroups/.
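As an illustration only, the inclusion rule of guideline 1 can be sketched as a small check. The function name and the way a concept's vocabularies and semantic groups are passed in are assumptions made for this sketch, and the vocabulary labels follow the wording of these guidelines rather than the official UMLS source abbreviations.

```python
# Hypothetical sketch of the inclusion rule in guideline 1.
# Vocabulary labels follow the guideline text, not UMLS source abbreviations.
MANTRA_VOCABULARIES = {"MeSH", "MedDRA", "SNOMED-CT"}
MANTRA_SEMANTIC_GROUPS = {
    "ANAT", "CHEM", "DEVI", "DISO", "GEOG",
    "LIVB", "OBJC", "PHEN", "PHYS", "PROC",
}


def is_annotatable(vocabularies, semantic_groups):
    """Return True if the concept occurs in at least one Mantra vocabulary
    and at least one of its semantic groups is among the ten listed groups."""
    in_vocabulary = bool(MANTRA_VOCABULARIES & set(vocabularies))
    in_group = bool(MANTRA_SEMANTIC_GROUPS & set(semantic_groups))
    return in_vocabulary and in_group


# "automobile driving" (C0004379) is in MeSH but belongs to group ACTI.
print(is_annotatable({"MeSH"}, {"ACTI"}))  # False
# "pain" (C0030193) has group DISO; MeSH membership is assumed here.
print(is_annotatable({"MeSH"}, {"DISO"}))  # True
```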
2. Annotations from a silver standard corpus (SSC) derived from a number of
indexing systems will be provided as pre-annotations of concepts. A pre-annotation consists of the span of text corresponding with the concept, preferred
name, semantic type (possibly more than one), semantic group, and CUI.
Different pre-annotations for the same span of text may be provided.
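A pre-annotation as described in guideline 2 could be represented roughly as follows. This is a minimal sketch: the class, its field names, and the grouping helper are assumptions for illustration and do not describe the format actually used by the SSC or by brat.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PreAnnotation:
    """Hypothetical container for the fields listed in guideline 2."""
    start: int                      # character offset where the span begins
    end: int                        # character offset just past the span
    preferred_name: str
    semantic_types: Tuple[str, ...]  # possibly more than one
    semantic_group: str
    cui: str


def group_by_span(pre_annotations) -> Dict[Tuple[int, int], List[PreAnnotation]]:
    """Collect pre-annotations that cover exactly the same span of text."""
    grouped: Dict[Tuple[int, int], List[PreAnnotation]] = {}
    for ann in pre_annotations:
        grouped.setdefault((ann.start, ann.end), []).append(ann)
    return grouped


text = "can lead not only to pain"
start = text.index("pain")
end = start + len("pain")
candidates = [
    PreAnnotation(start, end, "pain", ("sign or symptom",), "DISO", "C0030193"),
    PreAnnotation(start, end, "pain clinics", ("manufactured object",), "OBJC", "C0242936"),
]
# Both pre-annotations share one span, so the annotator has to disambiguate.
print(len(group_by_span(candidates)[(start, end)]))  # 2
```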
3. To find further information on a span of text (pre-annotated or selected by the
annotator), annotators can link out to the UMLS Terminology Services
(https://uts.nlm.nih.gov/home.html) or to the Mantra terminology through the
Search field in the brat Edit annotation or New annotation screens by clicking the
UMLS or Mantra link, respectively.
4. Annotators have to check that a pre-annotation is correct, i.e., they should verify
that the preferred name, semantic type and group, and CUI of the pre-annotation
match the term in the text. Definitions of many (but not all) concepts can be found
in the UMLS Terminology Services. The context of the annotation can be used to
assess the correct meaning.
Examples of wrongly pre-annotated concepts (underlined):
“CMV attacks the retina …”
=> “attacks” has been pre-annotated as C1304680 (preferred term (PT) “attack”, type
“finding”, group DISO), which is not correct. The annotation should be removed.
“… in patients with normal coronary angiography”
=> “normal” (PT “skin appearance normal”, type “finding”, group DISO, C0558145)
is incorrectly annotated. The annotation should be removed.
“… may affect your ability to drive …”
=> “drive” (PT “drive”, type “mental process”, group PHYS, C0013126) is
incorrectly annotated and should be removed. Note that MeSH contains the concept
“automobile driving” (C0004379), but it belongs to type “daily or recreational
activity” (ACTI), and thus should not be annotated.
5. In case of multiple pre-annotations of the same span of text, the annotators should
try to disambiguate, using contextual information and information about the pre-annotated concepts (PT, type, group, concept definition if available). If the
difference in meaning between the concepts is not clear or the context provides
insufficient information to disambiguate, all annotations are kept. If several
annotations are applicable but one annotation is more specific than another, only
the most specific annotation should be kept.
Examples of multiple pre-annotated concepts (underlined):
“… can lead not only to pain …”
=> “pain” is pre-annotated as C0030193 (PT “pain”, type “sign or symptom”, group
DISO) and C0242936 (PT “pain clinics”, “manufactured object”, OBJC). The latter
annotation is incorrect and should be removed.
“… retina of the eye …”
=> “eye” has two annotations: C0015392 (PT “eye”) and C1280202 (PT “entire
eye”). Both concepts belong to type “body part, organ, or organ component”, group
ANAT. The distinction between the concepts is not clear and both annotations should
be kept.
“Breast-feeding.”
=> “breast-feeding” has two annotations: C1623040 (PT “breast-feeding (mother)”,
type “finding”, group DISO) and C0006147 (PT “breast feeding”, type “organism
function”, group PHYS). Since there is no context information to disambiguate
between the concepts, both annotations should be kept.
“Neupro 4 mg/24 h transdermal patch. Each patch releases …”
=> “transdermal patch” is correctly pre-annotated as C0991556 (PT “transdermal
patch”). The second occurrence of “patch” has multiple pre-annotations, including
C1305400 (“surgical patch”), C1707974 (“extended-release film”), and C0991556
(“transdermal patch”). Based on contextual information, only the last annotation is
kept.
“1 ml of solution contains 40 micrograms travoprost and 5 mg timolol”
=> “solution” has two pre-annotations: C0037633 (PT “solutions”, type “substance”,
group OBJC) and C0525069 (PT “pharmaceutical solutions”, type “biomedical or
dental material”, group CHEM). Although both annotations are applicable, the latter
is the more specific annotation, and the former should be removed.
6. When a concept is nested within another concept, annotate the most detailed
description of the concept. The general principle is to annotate the concept that is
more specific and informative. If the more specific concept is not contained in the
Mantra terminology, annotate the less specific concept or concepts. If a concept
overlaps with another concept without being nested in it, annotate both concepts.
Examples of nested or overlapping concepts:
“Exercised-induced asthma …”
=> Both “exercised-induced asthma” and “asthma” have been annotated. Since only
the more specific and informative concept should be annotated, the annotation for
“asthma” is removed.
“Two cases of subcutaneous panniculitis-like T-cell lymphoma …”
=> “subcutaneous panniculitis-like T-cell lymphoma” has been annotated, as well as
“panniculitis”, “cell”, and “lymphoma”. Only the annotation for “subcutaneous
panniculitis-like T-cell lymphoma” should be kept, as it is the most specific and
informative.
“Musculoskeletal tumors: …”
=> “tumors” has been pre-annotated (PT “neoplasms”, type “neoplastic process”,
C0027651) since the more specific concept “musculoskeletal tumors” (PT “malignant
neoplasm musculoskeletal”, “neoplastic process”, C0036210) is not part of MeSH,
MedDRA or SNOMED-CT and thus not contained in the Mantra terminology.
“This results in smooth muscle relaxation and inflow of …”
=> The concepts “smooth muscle” (C1267092) and “muscle relaxation” (C0026836)
have both been pre-annotated. Since the more specific concept “smooth muscle
relaxation” does not exist in the UMLS, both annotations are kept.
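The difference between nested and overlapping spans in guideline 6 can be made concrete with character offsets. The sketch below is only an illustration: the function names are hypothetical, and it assumes that every span passed in already corresponds to a valid concept in the Mantra terminology, so it does not cover the case where the more specific concept is missing from the terminology.

```python
def is_nested(inner, outer):
    """True if span `inner` lies completely inside a different span `outer`."""
    return inner != outer and outer[0] <= inner[0] and inner[1] <= outer[1]


def drop_nested(spans):
    """Keep only spans that are not nested in another span.
    Spans that merely overlap are all kept."""
    return [s for s in spans if not any(is_nested(s, other) for other in spans)]


def span_of(text, fragment):
    """Locate the first occurrence of `fragment` in `text` as (start, end)."""
    start = text.index(fragment)
    return (start, start + len(fragment))


# Nested: "asthma" lies inside "Exercised-induced asthma", so it is dropped.
t1 = "Exercised-induced asthma"
print(drop_nested([span_of(t1, t1), span_of(t1, "asthma")]))
# [(0, 24)]

# Overlapping: neither span contains the other, so both are kept.
t2 = "smooth muscle relaxation"
print(drop_nested([span_of(t2, "smooth muscle"), span_of(t2, "muscle relaxation")]))
# [(0, 13), (7, 24)]
```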
7. If a concept consists of two discontiguous spans of text, the annotator should mark
the related text spans (using the Add Frag. feature in the brat Edit annotation or
New annotation screens) and assign the corresponding CUI.
Examples of concepts that consist of fragmented text spans:
“Patients with renal or hepatic impairment …”
=> “renal” (PT “kidney”, group ANAT, C0022646) and “hepatic impairment” (PT
“hepatic impairment”, DISO, C0948807) have been pre-annotated. The former should
be replaced by an annotation of the fragments “renal” and “impairment” (PT “renal
impairment”, DISO, C0341697).
“… sympathetic middle cervical ganglion …”
=> The fragments “sympathetic” and “cervical ganglion” should be annotated with
the single concept C0446846 (PT “Cervical sympathetic ganglion”, ANAT); “middle
cervical ganglion” should be annotated separately, as C1281049 (PT “Entire middle
cervical ganglion”, ANAT) and C0228999 (PT “Structure of middle cervical
ganglion”, ANAT).
“… Chinese Hamster Ovary (CHO) cells.”
=> The concept C0085080 (PT “Chinese hamster ovary cell”) should be annotated
twice, first by annotation of the fragments “Chinese Hamster Ovary” and “cells”, and
second by annotation of the fragments “CHO” (without parentheses) and “cells”.
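For readers who work with the stored annotations, a fragmented annotation such as “renal … impairment” would appear in brat's standoff format with the fragment offsets separated by a semicolon. The helper below is a sketch under assumptions: the function name, the use of the semantic group as the annotation label, and the annotation ID are illustrative and are not taken from these guidelines.

```python
def brat_text_bound(ann_id, label, text, fragments):
    """Format one text-bound annotation line in brat standoff style.
    `fragments` is a list of (start, end) character offsets; discontiguous
    fragments are separated by ';' in the offset field."""
    offsets = ";".join(f"{start} {end}" for start, end in fragments)
    covered = " ".join(text[start:end] for start, end in fragments)
    return f"{ann_id}\t{label} {offsets}\t{covered}"


text = "Patients with renal or hepatic impairment"
renal_start = text.index("renal")
impairment_start = text.index("impairment")
fragments = [
    (renal_start, renal_start + len("renal")),
    (impairment_start, impairment_start + len("impairment")),
]
# The label "DISO" and the ID "T1" are assumptions for this example.
line = brat_text_bound("T1", "DISO", text, fragments)
print(repr(line))  # 'T1\tDISO 14 19;31 41\trenal impairment'
```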
8. If a concept was not pre-annotated, the annotator should indicate the boundaries of
the concept and its CUI (specified as “C” followed by seven digits in the Notes
field of the brat New annotation or Edit annotation screens). Misspelled terms are
also annotated.
Examples of missed annotations:
“… tablets are packed in unit dose blisters in packs of …”
=> “blisters” has wrongly been pre-annotated as C0005758 (PT “blister”, type
“pathologic function”, DISO) and C0344311 (PT “blistering eruption”, type “disease
or syndrome”, DISO). These annotations should be removed and the annotation
C1319688 (PT “blister – unit of product usage”, type “biomedical or dental material”,
group CHEM) should be added.
“Malignant skin tumours.”
=> “malignant” (C1306459) and “skin tumours” (C0037286) have been pre-annotated
as two separate concepts. They should be removed and the whole term should be
annotated as C0007114 (PT “malignant neoplasm of skin”).
“…, Sjorgen’s syndrome, …”
=> The term “Sjorgen’s syndrome” is misspelled and has not been pre-annotated. An
annotation with the concept C1527336 (PT “Sjogren’s syndrome”) should be added.
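Because guideline 8 asks annotators to type the CUI by hand, a simple format check can catch typing errors. The following is a minimal sketch with a hypothetical function name; it verifies only the “C” plus seven digits pattern, not whether the CUI exists in the UMLS or in the Mantra terminology.

```python
import re

# "C" followed by exactly seven ASCII digits, as specified in guideline 8.
CUI_PATTERN = re.compile(r"C[0-9]{7}")


def is_well_formed_cui(value):
    """Check the CUI format only; existence in the UMLS is not verified."""
    return CUI_PATTERN.fullmatch(value) is not None


print(is_well_formed_cui("C1527336"))  # True
print(is_well_formed_cui("C152733"))   # False (only six digits)
```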
9. The annotator should annotate a subword, i.e., a part of a word, if the subword is
contained in the Mantra terminology and the full word is not.
Examples of subword annotations:
“lumbaalzak” (Dutch for “lumbar sac”)
=> There is no concept corresponding with this term in the Mantra terminology (nor
in the UMLS). In English, the annotator should annotate “lumbar” (C0024090, PT
“Lumbar region”, ANAT). In Dutch, the subword “lumbaal” should be annotated with
the same CUI.
“Penisgewebe” (German for “penile tissues”)
=> The subword “Penis” should be annotated with two concepts, C0030851 (PT
“Penis”, ANAT) and C1280739 (PT “Entire penis”, ANAT); the subword “gewebe”
should be annotated with C0040300 (PT “Body tissue”, ANAT).
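The subword rule of guideline 9 amounts to annotating a character range inside a single token. As a rough sketch, with a hypothetical helper name and a plain case-sensitive match as a simplifying assumption, the offsets for the two examples above could be computed like this:

```python
def subword_span(word, subword):
    """Return the (start, end) offsets of `subword` within `word`,
    or None if the subword does not occur. Matching is case-sensitive."""
    start = word.find(subword)
    if start == -1:
        return None
    return (start, start + len(subword))


print(subword_span("lumbaalzak", "lumbaal"))   # (0, 7)
print(subword_span("Penisgewebe", "Penis"))    # (0, 5)
print(subword_span("Penisgewebe", "gewebe"))   # (5, 11)
```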