Review on naming convention documents
Daniel Schober, EMBL-EBI
20.08.2008

An informal review of guideline documents containing naming conventions

This review lists naming convention developments in scientific domains that are distantly related to ontology engineering (part 1), as well as some of the more prominent conventions and recommendation documents tackling ‘how to label classes’ in representational artefacts more comparable to ontologies (part 2).

Table of contents
1 Naming conventions in fields other than ontology engineering
   Relational Database curation
      Developing High Quality Data Models, EPISTLE
   Programming Languages
      The New C Standard, An Economic and Cultural Commentary
   Wikipedia category naming conventions
   Natural Language processing, NLP
      Named entity normalization, NEN
      Linguistics ontology, GOLD
      Template element construction, TEC
   Constrained Natural Languages, CNL
2. Naming conventions in ontology related domains
   The ANSI/NISO Z39.19-2005 Standard
   ISO/IEC 11179-5, Metadata registries (MDR)
   W3C HCLS
   GO Editorial style guide, GO Consortium
   IUPAC golden book, IUPAC
   Meta Content Framework Using XML, W3C
   Law and Order
   Ontologies for molecular biology and bioinformatics, Steffen Schulze-Kremer
   Guideline for creating medical terms, Barbara Heller

1 Naming conventions in fields other than ontology engineering

Relational Database curation

In database design, two remotely related tasks are ‘record linkage’ (finding heterogeneous names that refer to the same entity in different tables or data sources) and ‘data de-duplication’ (normalizing these heterogeneous names and removing redundant information). (Ref: The State of Record Linkage and Current Research Problems, William E. Winkler, U. S. Bureau of the Census, http://www.census.gov/srd/papers/pdf/rr9904.pdf ) Record linkage research is generally characterized by its synergism of statistics, computer science, and operations research, and hence is not applicable to names that human editors create ‘on the fly’.

Developing High Quality Data Models, EPISTLE
EPISTLE: European Process Industries STEP Technical Liaison Executive, Version: 2.0, Matthew West, Editor: Julian Fowler
http://www.matthew-west.org.uk/Documents/princ03.pdf

This document is centred on data models (relational database schemata). It therefore contains conventions on attributes and fields, but lacks an object-oriented view on ontological classes.
Its target audience is primarily non-biomedical; the document is mostly tailored to the business/enterprise domain. In terms of coverage it provides very few actual naming conventions (an exception is chapter ‘7.3 Naming Entity Types’) and offers more general design recommendations instead.

Programming Languages

The New C Standard, An Economic and Cultural Commentary
Derek M. Jones, 2005
http://www.coding-guidelines.com/cbook/sent787.pdf

This very detailed document is intended primarily for C programmers and therefore refers mainly to names in programming source code (where they are called ‘identifiers’). Nevertheless, it covers the general neurophysiological and cognitive basics underlying naming and human name recognition in representational artefacts very well. The title is a bit misleading, since a large fraction of its guidelines refer to naming entities in general and are transferable to representational artefacts other than programming languages. This is illustrated, for example, by its table of contents.

Wikipedia category naming conventions
http://en.wikipedia.org/wiki/Wikipedia:Naming_conventions_(categories)

There is a confusing plurality of naming convention pages on Wikipedia (e.g. http://en.wikipedia.org/wiki/Category:Wikipedia_naming_conventions ). The only one of more general interest seems to be the one on naming categories. Its approach is quite similar to ours, and many of the conventions in its General Conventions section can be mapped to our set, e.g.:

Avoid abbreviations. Example: "World War II equipment", not "WW2 equipment". However, former abbreviations that have become the official name should be used in their official form where there are no other conflicts.
Don't hard-code the category structure into names. Example: "Monarchs", not "People - Monarchs".
Choose category names that are able to stand alone, independent of the way a category is connected to other categories. Example: "Wikipedia policy precedents and examples", not "Precedents and examples" (a subcategory of "Wikipedia policies and guidelines").
Topical category names should be singular. Examples: "Law", "Civilization"

Each of these has a corresponding recommendation in our naming conventions.

The Wikipedia effort also states conventions for certain kinds of categories, e.g. lists (=instances?), which are merely confusing to the ontology developer:

“Special conventions for lists of items
If a category contains pages which are each about a kind of X or an individual X, the name of the category is Xs (plural), e.g. if a category contains pages which are each about a river and/or a kind of river, the name of the category is "rivers", and similarly for "writers". Such a category may additionally contain subcategories with similar, more restricted content. It is also possible that the category exclusively contains subcategories.”

Many of the specialized conventions given (the majority of the whole set, including those on the extra page at http://en.wikipedia.org/wiki/Category:Wikipedia_naming_conventions ) are not applicable to ontology engineering (OE), e.g.: “For geographical photo requests, the category name should be 'Wikipedia requested photographs in xxx' as in [[Category:Wikipedia requested photographs in England]].”

These conventions are a bit hard to browse, because special and general conventions are listed on the same level, and because the same conventions, e.g. on abbreviation resolution, are listed on different pages (e.g. on http://en.wikipedia.org/wiki/Wikipedia:Naming_conventions_(categories) and also on http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Stub_sorting/Naming_guidelines#Categories ). There are multiple other Wikipedia naming convention pages whose relation to the main page is not clear (e.g.
http://en.wikipedia.org/wiki/Wikipedia_talk:Naming_conventions_(categories) ). Wikipedia also has conventions that are applicable to certain ontological classes incl. their ancestors, e.g. under http://en.wikipedia.org/wiki/Wikipedia:Naming_conventions_(categories)#Special_conventions there are special conventions on people, man-made objects, countries and companies.

The technical restrictions on names stated on http://en.wikipedia.org/wiki/Wikipedia:Naming_conventions_(technical_restrictions) do not apply here.

Natural Language processing, NLP

Since NLP is concerned more with named entity recognition than with name creation, naming conventions are rather sparse in this field. The general workflow of NLP runs the other way round: it mainly parses full sentences, in which the named entities are not formal names but natural language expressions that follow no conventions (corresponding to a ‘user-preferred name’, for which we explicitly exclude the validity of our conventions). An inverse NLP parser, a ‘linguistic realizer’, would be needed here, and the approaches discussed in the discussion section on constrained natural languages, CNLs (p. 8), apply.

Named entity normalization, NEN

A related area in NLP is named entity normalization, the mapping of surface forms to unambiguous names. The conventions to be found here are limited, however, since they refer to special named entities/classes (mostly ‘Gene name’, ‘Person’ and ‘Organisation’); NEN is therefore only applicable to restricted classes, corpora and sublanguages. NEN is intended for computer processing, and its ‘conventions’ are laid out as algorithms that, due to time constraints, can hardly be applied on the fly by human editors to each term. Many NLP NEN algorithms contain rigid syntactic conventions that require a thorough knowledge of linguistics.
They usually also require access to additional lexical resources (e.g. for reducing morphological variance through word form normalization via lemmatization or stemming). (See Jijkoun V B., Khalid M A., Marx M., de Rijke M., Named Entity Normalization in User Generated Content, SIGIR 2008 Workshop on Analytics for Noisy Unstructured Text Data, Singapore, 2008, http://ilps.science.uva.nl/biblio/named-entitynormalization-user-generated-content, p. 23-30)

Linguistics ontology, GOLD

One emerging standard in NLP is the GOLD ontology (http://www.linguisticsontology.org/gold.html): “GOLD is an ontology for descriptive linguistics. It gives a formalized account of the most basic categories and relations (the "atoms") used in the scientific description of human language. First and foremost, GOLD is intended to capture the knowledge of a well-trained linguist, and can thus be viewed as an attempt to codify the general knowledge of the field.” NLP approaches usually annotate single words in a text with XML elements for linguistic structures and part of speech (POS) tags. Classes from this ontology can be used as such tags and could in this respect be seen as a ‘naming convention’. These are, however, not of interest to our field, and no conventions are given for the actual appearance of the names of the classes in this ontology.

However, if we decide in the future to give additional naming conventions on the syntax and morphology of composite names, we might use the GOLD ontology to provide an appropriate and concise terminology. The drawback would of course be that this terminology is large, that the average ontology editor is not familiar with it, and that the effort of learning it might not be justified.
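To make the idea of linguistic categories as a vocabulary for conventions on composite names more concrete, the following Python sketch wraps each word of a class name in an XML element carrying a part-of-speech tag and then checks one rule against it. It is an illustration only: the tag labels "Adjective" and "Noun", the element and attribute names, the hand-assigned tags and the example rule (a composite name should end in a noun) are all assumptions made here; they are not taken from GOLD or from any existing convention.

```python
import xml.etree.ElementTree as ET


def annotate_name(name, pos_by_word):
    """Wrap every word of a composite name in a <w> element with a POS attribute."""
    root = ET.Element("name")
    for word in name.split():
        # Words without a known tag are marked "Unknown" instead of being guessed.
        token = ET.SubElement(root, "w", {"pos": pos_by_word.get(word, "Unknown")})
        token.text = word
    return root


def ends_in_noun(annotated_name):
    """Hypothetical syntactic convention: the last word of a name must be a noun."""
    tokens = list(annotated_name)
    return bool(tokens) and tokens[-1].get("pos") == "Noun"


# Hand-assigned tags (assumption); a real setup would need a lexicon or a tagger.
HAND_ASSIGNED_TAGS = {"phosphorylated": "Adjective", "protein": "Noun"}

annotated = annotate_name("phosphorylated protein", HAND_ASSIGNED_TAGS)
xml_text = ET.tostring(annotated, encoding="unicode")
print(xml_text)
print(ends_in_noun(annotated))
```

Even this toy version shows the cost mentioned above: the editor, or a tool acting on the editor's behalf, has to supply a linguistic category for every word before any such rule can be checked.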
Template element construction, TEC

Although ‘template element construction’ (TEC, which adds descriptive information to named entity results) could remotely be viewed as the creation of some sort of ‘defined class’, its formalism and scope are too different from what we envision here.

Generally, all these other domains address domain specialists and use a specific NLP vocabulary that ontology editors are not familiar with. They are specialised and of limited coverage in so far as they only tackle certain named entities.

Constrained Natural Languages, CNL

Some aspects of what we propose here mirror features of so-called Constrained Natural Languages, CNL [34]. In particular, defined restrictions in the use of grammar and terminology can be found in CNL, and exploiting developments in this field could prove fruitful. However, we must be careful not to be seen as imposing too great a burden on ontology editors by requiring them to learn another full representation language. Constrained natural languages are not mainly concerned with the naming of single word entities, but rather refer to complete sentences [1]. The majority of our naming conventions, on the other hand, refer to the words in the lexicon used by a CNL (i.e. the so-called ‘content words’). Since they capture logical axioms in natural language, CNLs apply to the textual definitions given for each RU and could serve there as a semi-formal intermediate layer that allows for a definition-based automatic generation or verification of logical axioms and defined classes. A look at some more detailed terms from ontologies, e.g. GO, reveals that term names there get rather long in order to be explicit and context independent, and can be seen as natural language definitions themselves, e.g. GO:0000184, "nuclear-transcribed mRNA catabolic process, nonsense-mediated decay". These long names illustrate that the border beyond which a CNL could or should be applied cannot be strictly defined.
But this is a different area, and we doubt that introducing another layer of formality will speed up ontology engineering. However, controlled language tools analyze text, performing pattern recognition and string analysis tasks to determine whether a text conforms to the grammatical, terminological and syntactic rules of a CNL. These seem to be promising candidates from which to learn how semi-formal syntaxes that are in harmony with both computer and human readability can be enforced (this has in fact been done, see the reference to the validator in the paper). These tools may examine basic syntax and morphology and may also include a generation component which provides suggestions for approved alternate expressions, e.g. as described here http://www.shlrc.mq.edu.au/masters/students/raltwarg/clgeneration.htm or here http://www.ics.mq.edu.au/~rolfs/peng/context-menu-words.jpeg and as now being applied by the latest OBO Edit 2 tool.

For an original compound name “phosphor-added protein” or “phosphor-bound protein”, the system would check whether its single-word components are ‘alternative names’ for existing classes and then substitute these with the ‘preferred name’ in a newly generated name recommendation, e.g. “phosphorylated protein”. Such ‘lexical lookup’ and ‘morphologic normalisation’ can also resolve acronyms and ambiguous slang words in names.

2. Naming conventions in ontology related domains

This section tackles more concrete naming conventions in the ontology related domains of knowledge representation, artificial intelligence, object oriented programming and semantic web.
The ANSI/NISO Z39.19-2005 Standard
Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies, ISBN: 1-880124-65-3, National Information Standards Organization, NISO Press 2005, Bethesda, Maryland, U.S.A., Approved July 25, 2005 by American National Standards Institute
http://www.niso.org/standards/resources/Z39-19-2005.pdf

This is a general ‘best practice’ recommendation for all aspects of controlled vocabulary engineering. The scope is very broad, and it tries to provide guidelines for representational artefacts as diverse as ‘subject headings’ and ontologies. From its scope definition:

“This Standard presents guidelines and conventions for the contents, display, construction, testing, maintenance, and management of controlled vocabularies. It covers all aspects of constructing controlled vocabularies including extensive rules and guidelines for term selection and format, the use of compound terms, and establishing and displaying various types of relationships among terms. This Standard focuses on controlled vocabularies that are used for the representation of content objects. Controlled vocabularies covered by this Standard include lists of controlled terms, synonym rings, taxonomies, and thesauri. The guidelines apply to all four types unless noted otherwise.”

The standard is intended for knowledge organization systems (KOS) in general, so the scope is very broad:

“This Standard is primarily intended to be applied to controlled vocabularies for use with knowledge organization systems. […] The term knowledge organization systems is intended to encompass all types of schemes for organizing information and promoting knowledge management. Knowledge organization systems include classification schemes that organize materials at a general level (such as books on a shelf), subject headings that provide more detailed access, and authority files that control variant versions of key information (such as geographic names and personal names).
They also include less-traditional schemes, such as semantic networks and ontologies.”

In terms of coverage and applicability this document comes close to a generally usable recommendation for the field of ontology engineering, but in most of its recommendations it still deals with controlled vocabularies and not with ontologies. In Chapter 5.4 the standard lists the types of representational artifact it claims to be intended for: Lists, Synonym rings, Taxonomies and Thesauri. Ontologies are not mentioned there. It does not explicitly deal with ontological types and relational properties, but rather with CV terms. This is reflected in the usage of ‘broader term’ and ‘narrower term’ relations, which are more useful for lexical structuring in thesauri than in formal ontologies. As a result, too much of the information would have to be checked for its applicability to ontologies. The terminology used in this standard is so different from what is used in the OBO world that mapping it to the ‘meta’ terminology biologists are familiar with constantly distracts the reader from the content. One implication of this is that the conventions put less weight on refactoring compound terms into relations and more atomic terms. The document is also too long to expect ontology editors seeking fast practical advice to read it: of the 11 chapters, only two deal with the naming of classes. The standard is also hard to read because it constantly cross-references other, externally defined standards (e.g. 3 times on page 33 alone). For our target readers we need something more lightweight. Another issue is the notion of ‘concepts’, which violates the realist perspective underlying the OBO approach.

The interesting chapters regarding naming issues are Chapters 6.2, 6.2.1, and 6.3 to 7.4.
In Chapter ‘6.3 Term Form’ a top level ‘ontology’ of general types (Chapter 6.3.2) is introduced without clearly stating the purpose of this approach or giving an actual convention. Proper names are mentioned in Chapter ‘6.3.3 Unique Entities’, but a connection to an actual convention is hard to find. It loosely states: “Unique entities, or “classes-of-one,” are usually expressed as proper nouns.” These two issues are examples showing that this standard often lists what is done, but does not provide clear naming conventions. All in all, actual naming conventions are hard to find within this document, and where such conventions are given they are conflated with related discussions, e.g. the chapter ‘6.2 Term scope’ also discusses metadata to be associated with terms, such as scope notes and history notes.

Some recommendations are not well backed up by justifications, e.g. in Chapter 6.5.1 it is stated that count nouns should normally be expressed as plurals, e.g. use books, vertebrates, chemical reactions instead of their singular forms. It is not stated why, and this convention immediately creates the need for exceptions (see Chapter 6.5.1.1).

In some aspects the conventions given are contradictory, at least to some extent, e.g. in Chapter 6.2.1.a the standard recommends indicating homonyms via pre-term qualifiers, while in Chapter 6.5.4 it tolerates homonyms in singular and plural forms with post-term qualifiers (e.g. ‘bridge (game)’, ‘bridges (dentistry)’, ‘bridges (structures)’).

Chapter 6.6 talks about the selection of ‘preferred forms’ of terms, but has a rather blurred definition of what a preferred form means: “preferred term: One of two or more synonyms or lexical variants selected as a term for inclusion in a controlled vocabulary. See also nonpreferred term.” Basically they define it as any alternative term in a CV, whatever ‘in a CV’ means.
This definition is another example of the recurring ‘blown up’ language that describes simple things in a seemingly formal way. The definition does not say by whom a term/form is actually preferred. Although the first sentence of Chapter 6.6 states “The authority for the form selected should be recorded in the term record (see section 11.1.4)”, a quick check shows that the term records chapter makes no reference to an authority for the preferred term (I guess here they simply call it ‘term’):

“11.1.4 Term Records
An individual record should be created for every term, and optionally for every entry term, as soon as it is admitted into a controlled vocabulary. Records for entry terms may include source notes as well as the date of admission into the controlled vocabulary. For terms, the record may contain any or all of the following elements:
• term
• source(s) consulted for terms and entry terms. NOTE: This field is especially important for neologisms or unfamiliar terms; it may include citations to published sources or the names of personal authorities consulted.
• scope note
• USED FOR references – to indicate which synonyms, near synonyms, and other expressions are covered by the term.
• nondisplayable variations, e.g., common spelling errors (see section 6.6.2)
• broader terms
• narrower terms
• related terms
• locally established relationships
• category or classification number
• history note, including minimally the date added, as well as the record of changes, if any (see section 6.2.3)
See section 9.3.3 for examples of term records. Section 11.4.1 discusses field definition in controlled vocabulary management systems.”

Sometimes the way the document is structured is inconsistent as well, e.g. on page 33 the handling of trade names is discussed in the chapter on place names.
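The term record elements quoted above from section 11.1.4 can be pictured as a simple data structure, which also makes the missing authority element easy to see. The following is a minimal Python sketch and not part of the standard: the class name TermRecord, all field names, the helper lacks_authority and in particular the field preferred_form_authority are assumptions introduced here, merely to show where the authority demanded in Chapter 6.6 could be recorded.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TermRecord:
    """Illustrative term record; fields mirror the elements quoted from section 11.1.4."""
    term: str
    sources: List[str] = field(default_factory=list)    # source(s) consulted
    scope_note: Optional[str] = None
    used_for: List[str] = field(default_factory=list)   # USED FOR references
    nondisplayable_variations: List[str] = field(default_factory=list)
    broader_terms: List[str] = field(default_factory=list)
    narrower_terms: List[str] = field(default_factory=list)
    related_terms: List[str] = field(default_factory=list)
    local_relationships: List[str] = field(default_factory=list)
    classification_number: Optional[str] = None
    history_note: Optional[str] = None
    # Assumption: this field is NOT among the elements listed in 11.1.4.
    preferred_form_authority: Optional[str] = None

    def lacks_authority(self) -> bool:
        """True if no authority for the selected (preferred) form is recorded."""
        return not self.preferred_form_authority


# The term is one of the homonym examples mentioned in this review;
# all other values are made up for illustration.
record = TermRecord(
    term="bridges (dentistry)",
    scope_note="hypothetical scope note",
    history_note="hypothetical history note: date added, changes",
)

print(record.lacks_authority())
record.preferred_form_authority = "hypothetical authority, e.g. a named dictionary"
print(record.lacks_authority())
```

A record built only from the listed elements has no place for the authority; whether the ‘source(s) consulted’ element is meant to play that role is not stated in the quoted text.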
To summarize: we came to the conclusion that it is more plausible to develop a more restrictive and targeted recommendation from scratch rather than re-use from ISO what is usable for our divergent scope. Besides the core recommendations in this large document are few concerning naming, and most of these have been addressed by our convention. The chapters concerning naming are: 6 Term Choice, Scope, and Form 6.1 Choice of Terms ...................................................................................................................... 20 6.2 Scope of Terms.............................................................................................................20 6.2.1 Homographs..............................................................................................................20 6.2.2 Scope Notes................................................................................................................22 6.2.3 History Notes...............................................................................................................22 6.3 Term Form ...............................................................................................................................23 6.3.1 Single-Word vs. 
Multiword Terms...............................................................................23 6.3.2 Types of Concepts ......................................................................................................23 6.3.3 Unique Entities ............................................................................................................24 6.4 Grammatical Forms of Terms ..................................................................................................25 6.4.1 Nouns and Noun Phrases...........................................................................................25 6.4.2 Adjectives....................................................................................................................26 6.4.3 Adverbs ......................................................................................................................27 6.4.4 Initial Articles...............................................................................................................27 6.5 Nouns....................................................................................................................................28 9 20.08.2008 Review on naming convention documents Daniel Schober, EMBL-EBI 6.5.1 Count Nouns ...............................................................................................................28 6.5.2 Mass Nouns ................................................................................................................29 6.5.3 Other Types of Singular Nouns...................................................................................29 6.5.4 Coexistence of Singular and Plural Forms .................................................................29 6.6 Selecting the Preferred Form...................................................................................................30 6.6.1 
Usage..........................................................................................................................30 6.6.2 Spelling .......................................................................................................................30 6.6.3 Abbreviations, Initialisms, and Acronyms ...................................................................31 6.6.4 Neologisms, Slang, and Jargon..................................................................................31 6.6.5 Popular and Scientific Names.....................................................................................32 6.6.6 Loanwords, Translations of Loanwords, and Foreign-Language Equivalents............32 6.6.7 Proper Names.............................................................................................................33 6.7 Capitalization and Non-alphabetic Characters ........................................................34 6.7.1 Capitalization...............................................................................................................34 6.7.2 Non-alphabetic Characters .........................................................................................34 6.7.3 Romanization ..............................................................................................................36 7 Compound Terms 36 7.1 General ...................................................................................................................................36 7.2 Purpose of Guidelines on Compound Terms..............................................................36 7.2.1 Precoordinated Terms ................................................................................................37 7.2.2 Retrieval Considerations.............................................................................................37 7.3 Factors to be Considered When Establishing Compound Terms..................................37 7.4 Elements of Compound Terms 
..................................................................38 7.5 Criteria for Establishing Compound Terms.......................................................39 7.6 Criteria for Determining When Compound Terms Should be Split ........................40 7.6.1 Factors to be Considered............................................................................................40 7.6.2 Hierarchical Structure..................................................................................................40 7.7 Node Labels .............................................................................................................................41 7.8 Order of Words in Compound Terms...............................................................41 7.8.1 Cross-references from Inversions.......................................................................41 8 Relationships 42 8.1 Semantic Linking......................................................................................................................42 8.2 Equivalence Relationships ...................................................................................................... 43 8.2.1 Synonyms................................................................................................................... 44 8.2.2 Lexical Variants .......................................................................................................... 45 8.2.3 Near-Synonyms.......................................................................................................... 45 8.2.4 Generic Posting .......................................................................................................... 45 8.2.5 Cross-references to Elements of Compound Terms.................................................. 46 8.3 Hierarchical Relationships ........................................................................... 
46 8.3.1 Generic Relationships ................................................................................................ 47 8.3.2 Instance Relationships ............................................................................................... 48 8.3.3 Whole-Part Relationships ........................................................................................... 49 8.3.4 Polyhierarchical Relationships ................................................................................... 49 8.3.5 Node Labels in Hierarchies ........................................................................................ 51 8.4 Associative Relationships....................................................................................... 51 8.4.1 Relationships Between Terms Belonging to the Same Hierarchy.............................. 51 8.4.2 Relationships Between Terms Belonging to Different Hierarchies ............................ 53 8.4.3 Node Labels for Related Terms ................................................................................. 56 8.4.4 Specifying Types of Related Term References.......................................................... 57 But also refers to displaying idioms in various Formats (print, web, …): 10 20.08.2008 Review on naming convention documents Daniel Schober, EMBL-EBI 9 Displaying Controlled Vocabularies 57 9.1 General Considerations........................................................................................................... 57 9.1.1 Elements to Address .................................................................................................. 57 9.1.2 User Categories.......................................................................................................... 57 9.2 Presentation................................................................................................................... 
58 9.2.1 Displaying the Equivalence Relationship ................................................................... 58 9.2.2 Displaying Hierarchical and Associative Relationships.............................................. 60 9.2.3 Indentation.................................................................................................................. 61 9.2.4 Typography................................................................................................................. 62 9.2.5 Capitals and Lowercase Letters ................................................................................. 63 9.2.6 Filing and Sorting........................................................................................................ 63 9.3 Types of Displays ................................................................................................ 64 9.3.1 Alphabetical Displays ................................................................................................. 64 9.3.2 Permuted Displays ..................................................................................................... 65 9.3.3 Term Detail Displays .................................................................................................. 66 9.3.4 Hierarchical Displays .................................................................................................. 68 9.3.5 Graphic Displays ........................................................................................................ 73 9.4 Display Formats – Physical Form.......................................................................... 74 9.4.1 Print Format – Special Considerations....................................................................... 74 9.4.2 Screen Format – Special Considerations................................................................... 75 9.4.3 Web Format – Special Considerations....................................................................... 
79

ISO/IEC 11179-5, Metadata registries (MDR)

Part 5: Naming and identification principles, Second edition, 2005-09-01

This document was freely available in January 2006, but it has since been commercialized (it costs 40 GBP to look at the 17-page document).

The scope, as taken from the abstract:
“ISO/IEC 11179-5:2005 provides instruction for naming and identification of the following administered items: data element concept, conceptual domain, data element, and value domain. It describes the parts and structure of identification. Identification is narrowly defined to encompass only the means to establish unique identification of these administered items within a register. It describes naming in an MDR; includes principles and rules by which naming conventions can be developed; and describes example naming conventions. The naming principles and rules described in ISO/IEC 11179-5:2005 apply primarily to names of data element concepts, conceptual domains, data elements, and value domains.”

This is one of the few documents that explicitly state naming conventions (and call them that), but unfortunately it is neither very detailed nor of great coverage. It was written for the MDR (and so is rather database-centric). Like many other ISO standards, it constantly cross-refers to further external (ISO) documents, which makes it rather unreadable. Like other ISO recommendations, it tends to be too formal and complex to serve as a standalone guideline for the biomedical ontology editor. Nevertheless, it has some good examples of what semantic, syntactic and lexical conventions can look like.

Annex A contains “example naming conventions for names within an MDR registry”, but the main document does not contain actual naming recommendations. Instead, it provides a basic introduction to what naming conventions are and what types of naming conventions one could create.
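To make the difference between lexical and syntactic conventions more tangible, a minimal sketch follows. The word lists and the concrete rules are hypothetical and are not taken from ISO/IEC 11179-5; they only illustrate how such conventions, plus a check for unique names within a register, could be tested mechanically. Semantic conventions are left out, since they are not checkable in such a simple way.

```python
import re

# Hypothetical controlled word list (lexical convention); not from the standard.
ALLOWED_WORDS = {"employee", "person", "birth", "date", "last", "name", "code", "country"}
# Hypothetical set of terms that must close a name (syntactic convention).
REPRESENTATION_TERMS = {"date", "name", "code"}


def check_name(name, existing_names):
    """Return a list of convention violations for a candidate name."""
    problems = []
    words = name.lower().split()

    # Lexical: restricted character set and controlled word list.
    if not re.fullmatch(r"[A-Za-z ]+", name):
        problems.append("lexical: only letters and spaces are allowed")
    unknown = [w for w in words if w not in ALLOWED_WORDS]
    if unknown:
        problems.append("lexical: words not in controlled list: " + ", ".join(unknown))

    # Syntactic: required term order, here only "representation term comes last".
    if not words or words[-1] not in REPRESENTATION_TERMS:
        problems.append("syntactic: name must end with a representation term")

    # Uniqueness: the same name must not occur twice in the register.
    if name.lower() in {n.lower() for n in existing_names}:
        problems.append("uniqueness: name already exists in the register")

    return problems


if __name__ == "__main__":
    for candidate in ("Employee Birth Date", "Date Employee"):
        print(candidate, "->", check_name(candidate, existing_names=[]))
```

A real convention document would of course state such rules in prose; the point here is only that lexical, syntactic and uniqueness conventions are precise enough to be checked automatically.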
The standard also addresses how naming convention documents should be constructed and what they need to cover, e.g.:

“A naming convention shall cover all relevant documentation aspects. This includes, as applicable,
- the scope of the naming convention, e.g. established industry name;
- the authority that establishes names;
- semantic rules that enable meaning to be conveyed, governing the source and content of the terms used in a name, e.g. terms derived from data models, terms commonly used in the discipline, etc.;
- syntactic rules covering required term order;
- lexical rules (word form and vocabulary) covering controlled term lists, e.g. a rule citing an authority for spelling words within terms, name length, character set, language, to reduce redundancy and increase precision;
- a rule establishing whether or not names must be unique. A uniqueness rule documents how to prevent homonyms occurring within the scope of the naming convention.”

Relevant parts on naming conventions:

6.1 Names in a registry .... 4
6.2 Naming conventions .... 4
7 Development of naming conventions .... 5
7.1 Introduction .... 5
7.2 Scope principle .... 5
7.3 Authority principle .... 5
7.4 Semantic principle .... 5
7.5 Syntactic principle .... 6
7.6 Lexical principle ....
6
7.7 Uniqueness principle .... 6
Annex A Example naming conventions for names within an MDR registry .... 7
Annex B Example naming conventions for Asian languages .... 16

To give the reader a feeling for the confusing mass of terminological standards and recommendations within ISO alone, I list some of them here. This also illustrates that a review of all ISO recommendations is well beyond our reach.

ISO 704:2000 Terminology work – Principles and methods
ISO 860:1996 Terminology work – Harmonization of concepts and terms
ISO 1087-1:2000 Terminology work – Vocabulary – Part 1: Theory and application
ISO 15188:2001 Project management guidelines for terminology standardization
ISO 1087-2:2000 Terminology work – Vocabulary – Part 2: Computer applications
ISO 12620:1999 Computer applications in terminology – Data categories
ISO 16642:2003 Computer applications in terminology – Terminological

Additional ISO documents (as taken from: http://www.iso.org/iso/en/ISOOnline.frontpage):

ISO 1951:1997 Lexicographical symbols particularly for use in classified defining vocabularies
ISO 12200:1999 Computer applications in terminology - Machine-readable terminology interchange format (MARTIF) - Negotiated interchange
ISO/TR 12618:1994 Computer aids in terminology - Creation and use of terminological databases and text corpora
ISO 12620:1999 Computer applications in terminology - Data categories
ISO/TS 20225:2001 Global medical device nomenclature for the purpose of regulatory data exchange
ISO/IEC Guide 2:2004 Standardization and related activities -- General vocabulary
ISO 10241:1992 International terminology standards -- Preparation and layout
ISO 15188:2001 Project management guidelines for terminology standardization
ISO/IEC TR 19764:2005 Information technology -- Guidelines, methodology and reference criteria for
cultural and linguistic adaptability in information technology products
ISO 690-2:1997 Information and documentation -- Bibliographic references -- Part 2: Electronic documents or parts thereof
ISO 2788:1986 Documentation -- Guidelines for the establishment and development of monolingual thesauri
ISO 5127:2001 Information and documentation -- Vocabulary
ISO 5963:1985 Documentation -- Methods for examining documents, determining their subjects, and selecting indexing terms
ISO 7220:1996 Information and documentation -- Presentation of catalogues of standards
ISO 15924:2004 Information and documentation -- Codes for the representation of names of scripts
ISO/TR 21449:2004 Content Delivery and Rights Management: Functional requirements for identifiers and descriptors for use in the music, film, video, sound recording and publishing industries
ISO 8601:2004 Data elements and interchange formats -- Information interchange -- Representation of dates and times
35.020 Information technology (IT) in general, including general aspects of IT equipment
35.040 Character sets and information coding, including coding of audio, picture, multimedia and hypermedia information, IT security techniques, encryption, bar coding, electronic signatures, etc.
ISO/IEC TR 14652:2004 Information technology -- Specification method for cultural conventions

Further relevant ISO documents can be found under the ICS field 35.240.30 IT applications in information, documentation and publishing.

W3C HCLS

Harmonization of the representation of labels, descriptions and definitions of entities in biomedical ontologies, W3C Semantic Web Healthcare and Life Sciences interest group (HCLS)
http://esw.w3.org/topic/HCLS/Labels_and_Definitions
http://esw.w3.org/topic/MatthiasSamwald

This page is very short and only addresses name categories and their implementation in OWL. It is not to be used as a general guideline.
It is very limited in scope:

“Informal recommendation
Make use of rdfs:label and rdfs:comment where possible (?)
If you really need to define a new annotation property, make it a subproperty of rdfs:label or rdfs:comment.
Please be aware that making a owl:datatypeProperty a subclass of rdfs:label or rdfs:comment is NOT valid.”

The page is basically a summary of the OBI metadata discussions. It contains the following:
- Examples from biomedical ontologies of non-standard constructs to represent labels, names, descriptions and definitions of entities
- An analysis of the motivations behind the creation of these constructs
- A description of the problems that arise through a lack of harmonization (e.g. for queries and user interfaces)
- A review of the basic constructs in the RDF, RDFS and OWL vocabularies and their intended usage for labels, descriptions and definitions
- An informal recommendation for the representation of labels, descriptions and definitions, and suggestions for the harmonization of biomedical ontologies in this regard

file:///C:/Documents%20and%20Settings/schober/Desktop/OBI/Naming%20Conventions/conventions.html.htm
Contains a section “2.1. Rationales against the InterCap style”.

GO Editorial style guide, GO Consortium

http://www.geneontology.org/GO.usage.shtml

Scope (taken from the website): “The GO Style Guide introduces new users to (and reminds old users of) both the philosophy and the practicalities behind developing and maintaining GO. Its main purpose is to serve as a user manual for GO curators.”

This document is a design-principle documentation and addresses all the immediate practical needs of ontology editors. The term ‘style’ in the title refers to a quite heterogeneous set of issues, of which actual naming conventions are a rather minor part. It lists all important conventions for the construction of all types of representational units found in the OBO Format.
Some naming conventions are outlined in the first section, “General Conventions When Adding Terms” (referred to as ‘stylistic points’):

“The following stylistic points should be applied to all aspects of the ontologies.

Spelling conventions
Where there are differences in the accepted spelling between English and US usage, use the US form, e.g. polymerizing, signaling, rather than polymerising, signalling. There is a dictionary of words used in GO terms in the file GODict.DAT.

Abbreviations
Avoid abbreviations unless they're self-explanatory. Use full element names, not symbols. Use hydrogen for H+. Use copper and zinc rather than Cu and Zn. Use copper(II), copper(III), etc., rather than cuprous, cupric, etc. For biomolecules, spell out the term in full wherever practical: use fibroblast growth factor, not FGF.

Greek symbols
Spell out Greek symbols in full: e.g. alpha, beta, gamma.

Upper vs. lower case
GO terms are all lower case except where demanded by context, e.g. DNA, not dna.

Singular vs. Plural
Use the singular form of the term, except where a term is only used in the plural (e.g. caveolae).

Be Descriptive
Aim to be reasonably descriptive, even at the risk of some verbal redundancy. Remember, databases that refer to GO terms might list only the finest-level terms associated with a particular gene product. If the parent is aromatic amino acid family biosynthesis, then the child should be aromatic amino acid family biosynthesis, anthranilate pathway, not just "anthranilate pathway".

Anatomical Qualifiers
Do not use anatomical qualifiers in the cellular process and molecular function ontologies. For example, GO has the molecular function term DNA-directed DNA polymerase activity but neither "nuclear DNA polymerase" nor "mitochondrial DNA polymerase".
These terms with anatomical qualifiers are not necessary because annotators can use the cellular component ontology to attribute location to gene products, independently of process or function.”

By and large, these conventions are in harmony with what we recommend in the OBO Foundry conventions. We hope that they can be omitted from future versions of this resource and that the OBO Foundry naming conventions will be referenced there instead.

GO provides a dictionary of words in use (GODict.DAT) for building GO terms. This serves as a lexical aid to avoid synonym overload and to render terms lexically more uniform. However, this dictionary would be even more useful if it were accessible as a concordance, providing editors with the word neighbourhood (usage contexts) and usage frequencies of these recurring term morphemes.

Some syntactic recommendations (word order) are given implicitly, e.g. “If the parent is aromatic amino acid family biosynthesis, then the child should be aromatic amino acid family biosynthesis, anthranilate pathway, not just "anthranilate pathway".”, but I am not sure to what extent the word order given is useful. However, we fully agree with the main recommendation of this sentence: to be explicit in naming, so that enough descriptive information is available when one looks at an annotated data item and the full superclass hierarchy is not immediately accessible.

In general, I would regard the overall structure of this guide as a bit informal, e.g. synonyms are discussed in scattered places under different headers. The guide also mixes domain-dependent recommendations with general, domain-independent ones. It is very good at domain-dependent conventions and recommendations, e.g. it also contains rules like “If either X biosynthesis or X catabolism exists, then the parent X metabolism must also exist.”, which guide decisions on name coverage and granularity.
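Several of the GO rules quoted above are mechanical enough to be checked by a script. The following is a minimal sketch, assuming a plain list of term names as input; the function names and the tiny lookup tables are illustrative only and are not part of any GO tooling. It covers the Greek symbol rule, the US spelling rule (using only the two example pairs given in the quote), and the rule that an X metabolism parent must exist whenever X biosynthesis or X catabolism exists.

```python
# Illustrative lookup tables; only the examples named in the quoted guide are included.
GREEK_SYMBOLS = {"α": "alpha", "β": "beta", "γ": "gamma"}
UK_TO_US = {"polymerising": "polymerizing", "signalling": "signaling"}


def style_warnings(term):
    """Return style warnings for a single term name."""
    warnings = []
    for symbol, spelled in GREEK_SYMBOLS.items():
        if symbol in term:
            warnings.append(f"spell out '{symbol}' as '{spelled}'")
    for word in term.split():
        us_form = UK_TO_US.get(word.lower())
        if us_form is not None:
            warnings.append(f"use US spelling '{us_form}' instead of '{word}'")
    return warnings


def missing_metabolism_parents(terms):
    """Return the 'X metabolism' names that the rule requires but that are absent."""
    present = set(terms)
    missing = set()
    for term in terms:
        for suffix in (" biosynthesis", " catabolism"):
            if term.endswith(suffix):
                parent = term[: -len(suffix)] + " metabolism"
                if parent not in present:
                    missing.add(parent)
    return sorted(missing)


if __name__ == "__main__":
    terms = ["heme biosynthesis", "heme metabolism", "lysine catabolism"]
    print(missing_metabolism_parents(terms))
    print(style_warnings("TGF-β signalling"))
```

Rules such as "be descriptive" or the ban on anatomical qualifiers need human judgement and are deliberately not covered.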
Bearing the scope of the GO style guide in mind, all issues mentioned above are minor drawbacks and sensible compromises.

IUPAC golden book, IUPAC

http://goldbook.iupac.org/
and IUPAC blue book: http://www.acdlabs.com/iupac/nomenclature/93/r93_316.htm

The general flow for creating a name is the following (http://www.acdlabs.com/iupac/nomenclature/93/r93_317.htm):

“R-4.1 General Principles
The formation of the systematic name for an organic compound involves several steps, to be taken as far as they are applicable in the following order:
(a) from the nature of the compound, determine the type(s) of nomenclature operations (see Section R-1.2) to be used. Although the so-called "substitutive nomenclature" is emphasized in these recommendations, other kinds of names, for example, functional class names, are often given, usually as alternatives;
(b) determine the kind of characteristic group to be cited as suffix (if any) or as a functional class name. Only one kind of characteristic group (known as the principal group) can be cited as suffix or functional class name.
All substituents not so cited must be specified as prefixes;
(c) determine the parent hydride, including any appropriate nondetachable prefixes [detailed rules for choice of the principal chain, the preferred ring or ring system, the functional parent compound, or conjunctive components are described in the 1979 edition of the IUPAC Nomenclature of Organic Chemistry (see, for example, Rule C-12)];
(d) name the parent hydride and the principal characteristic group, if any, or the functional parent compound;
(e) determine infixes and/or prefixes [with the appropriate multiplying prefixes (see Table 11)], and number the structure as far as possible;
(f) name the detachable substitutive prefixes and complete the numbering of the structure, if necessary;
(g) assemble the components into a complete name, using alphabetical order for all substitutive prefixes.
In substitutive nomenclature, some characteristic groups can be denoted either as prefixes or suffixes (see Table 5), but others only as prefixes (see Table 9). Functional class names differ in that a separate word (or suffix in some languages) designating the name of a functional class is associated with a "radical" name designating the remainder of the structure.”

Tab 5:
Class                     Formula   Prefix       Suffix
Acid halides
Alcoholates, Phenolates             oxido-       -olate
Alcohols, Phenols                   hydroxy-     -ol
Aldehydes                           formyl-      -carbaldehyde
                                    oxo-         -al
Amides                              carbamoyl-   -carboxamide
…

The IUPAC nomenclature documents relate to the restricted target domain of chemical names. For this application field they are very exhaustive and highly developed. They are, however, not of great help for a broader biodomain.
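Steps (b) and (g) of the quoted IUPAC procedure can be sketched in a few lines. This is a simplified illustration under stated assumptions, not an implementation of IUPAC nomenclature: the lookup table contains only four rows from the Table 5 excerpt above, the dictionary keys and the function name are invented for this sketch, the choice of the principal group is taken as an input rather than derived, and multiplying prefixes, locants and the parent hydride are ignored.

```python
# Prefix/suffix pairs from the Table 5 excerpt; keys are hypothetical short labels.
# The alternative aldehyde row (oxo- / -al) is omitted to keep the sketch simple.
TABLE_5 = {
    "alcoholate": ("oxido-", "-olate"),
    "alcohol": ("hydroxy-", "-ol"),
    "aldehyde": ("formyl-", "-carbaldehyde"),
    "amide": ("carbamoyl-", "-carboxamide"),
}


def affixes(groups, principal):
    """Return (prefixes, suffix) for the characteristic groups of a compound.

    Only the principal group is cited as suffix (step b); all other groups
    become prefixes, listed in alphabetical order (step g).
    """
    if principal not in groups:
        raise ValueError("principal group must be one of the groups present")
    suffix = TABLE_5[principal][1]
    prefixes = sorted(TABLE_5[g][0] for g in groups if g != principal)
    return prefixes, suffix


if __name__ == "__main__":
    print(affixes(["alcohol", "amide", "aldehyde"], principal="amide"))
```

Even this toy version shows why the IUPAC rules are hard to transfer to other domains: they rely on closed, domain-specific lookup tables of the kind sketched here.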
Further documents that we looked at, but which were not included in this review (mainly due to time constraints):

Meta Content Framework Using XML, W3C

NOTE-MCF-XML, W3C, 6 June 97
http://www.w3.org/TR/NOTE-MCF-XML-970624
This document is a NOTE made available by the W3 Consortium for discussion only. This indicates no endorsement of its content, nor that the Consortium has, is, or will be allocating any resources to the issues addressed by the NOTE.

Law and Order

Zhang, Bodenreider
More on design principles; refers to semantic and structural issues, but has a small part on naming and on refactoring parts of names into relations. A section on compound word consistency is present (see “Lobe of left lung”), illustrating the conscious use of qualifiers and their mapping to, or refactoring into, explicit relations (sec. 4.2).

Ontologies for molecular biology and bioinformatics, Steffen Schulze-Kremer

http://www.bioinfo.de/isb/2002/02/0017/main.html

This paper gives valuable concept naming guidelines and states the following rules, which are supposed to make an ontology more readable:
1. use singular form in a concept name
2. use lower case letters for classes
   o instances and names should begin with capital letter
   o acronyms should be all upper case
3. observe syntax requirements of selected representation formalism
   o quotes, hyphens etc may be required or forbidden
   o unique names may be required by representation formalism
4. if there is a good English word, use it
   o otherwise concatenate not more than four words to describe the concept
5. when naming a subclass specialise superclass concept name
   o specialising text should be appended, not prepended
   o makes concept easier to recognize
6. add subclassifying criterion immediately when obvious
7. always provide aliases where known

These guidelines do not represent the main part of the paper and are merely a side note.
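Rules 2, 4 and 5 from the list above lend themselves to a simple automatic check. The sketch below is one possible reading of those rules, not code from the paper; the function name and messages are made up for illustration. Rule 1 (singular form) is not covered, because reliable plural detection needs a lexicon.

```python
def check_class_name(name, superclass_name=None):
    """Return a list of violations of rules 2, 4 and 5 for a class name."""
    problems = []
    words = name.split()

    # Rule 2: lower case for classes; acronyms may be written all upper case.
    for word in words:
        if word != word.lower() and word != word.upper():
            problems.append(f"rule 2: '{word}' mixes upper and lower case")

    # Rule 4: concatenate not more than four words.
    if len(words) > 4:
        problems.append("rule 4: more than four words")

    # Rule 5: a subclass name repeats the superclass name and appends
    # the specialising text (it must not be prepended).
    if superclass_name is not None and not name.startswith(superclass_name + " "):
        problems.append("rule 5: specialising text should be appended to the superclass name")

    return problems


if __name__ == "__main__":
    print(check_class_name("DNA polymerase"))
    print(check_class_name("tyrosine kinase", superclass_name="kinase"))
```

Note that rule 5, read literally, rejects the usual English order "tyrosine kinase" in favour of "kinase tyrosine", which shows how strongly such a syntactic convention can cut against natural usage.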
The rules are quite in harmony with what the Foundry recommends, but some are guidelines rather than conventions (e.g. 3.). Coverage is limited and the conventions are embedded in a paper of low visibility (naming is not mentioned in the keyword list for this paper). Other conventions, e.g. on instances, are not currently found in the Foundry.

Guideline for creating medical terms, Barbara Heller

http://www.onto-med.de/en/applications/ontobuild/document/om-report-no4guideline.pdf

It needs to be mentioned that the notions of concept and term in this report stem from DIN 22/1992 and are not realism-based, in so far as terms are perceived to denote concepts in the mind (idioms of thought) rather than universals in reality (page 6). The terminology used therein refers to classes as well as to relations with the word ‘concept’. The whole approach tackles thesaurus semantics rather than description logics.

The lexical guideline 3.1.1. recommends noun forms for class names. 3.1.2. is a language-dependent convention, in that it recommends upper case for nouns, as is usual in German but not in English. The hyphen usage for multi-word terms is inspired by German as well. Some conventions are unintuitively labelled and hence very hard to find: a recommendation to use singular terms is actually given under the header “3.1.3. Numbers”. Nevertheless, this report provides many valuable guidelines on definition construction, on role and relation naming, as well as on multiple term refactoring.