Developing and Using Standards for Data and Information in Science and Technology

John Rumble, Jr.
Bonnie Carroll
Gail Hodge
Laura Bartolo

PV 2005 Edinburgh Scotland

The Reason for Standards

• Economic: Savings in time, money and other resources
• Intellectual superiority of a solution
• Codification and communication of knowledge
• Description of a structure of knowledge, methodology, data or information
• Accuracy in description

Use of Standards for S&T Data and Information

• Data generation: Capturing results & conditions
• Database building: Schema; input; uniformity
• Data evaluation: Assess quality; comparison
• Database use: Retrieval; interoperability
• Data reporting: Understandability
• Data access: Locating; cross-DB use
• Data archiving: What and how stored
• Data exploitation: Input into apps; auto-retrieval
• Data visualization: Inspection and analysis

Standards

• We want them!
• Why are they so hard to get?

Barriers to Greater Progress

• Nomenclature
• Linguistic
• Socio-economic
• Technical

Nomenclature

• S&T nomenclature arises from history
  Geography, education, scientific circle, language, conceptual differences, rivalry
• How many areas of science have competing nomenclatures?

Nomenclature

Scientific knowledge evolves over time
• What was appropriate to describe a substance, system or species yesterday is obsolete today
• In science, increased knowledge is expressed as independent variables and how exactly they affect something
• The concept of a gene has changed since Gregor Mendel
• The explosion of variables used to describe it
  From the chromosomes on which it is located to the base pair sequence

Nomenclature

Scientific knowledge evolves over time
• Experimental Science: We routinely express our increase of knowledge as new independent variables (IVs)
• Observation Science: We usually catalog features instead of IVs; will change over time
• Computed Science: Virtually no effort to preserve on the basis of IVs

Linguistic

Beyond nomenclature, scientific language evolves
• Exactly as with everyday language
  New words, changes of meaning, regionalization, prefixes and suffixes; grammar
• Scientific languages follow the rules of linguistics
• Change increases at intersection of disciplines
  “Creole” languages
  Language of quantum chemistry comes from chemistry (bonding) and atomic and molecular physics
  Mixed language differs from each field

Socio-Economic

Typical scientific practice hinders standards
• Competitiveness
• Striving for uniqueness
• Constant clarification
• Reluctance to repeat past experiments
• Desire to use new techniques
• My way should be the standard way!
• Lack of economic motivation to create or use standards

Socio-Economic

“Basically physicists are too undisciplined to let anyone else tell us what to name something. It’s basically whatever name catches on.”
Gordon Kane (U. Michigan) as quoted in the New York Times

Technical

Science is moving from reductionism to complexity
• Real systems are complicated
  Contain many parts, components, items
  Large number of properties
  Larger number of independent variables
• Consider standards for describing
  6 × 10^9 people
  Countless objects in space
  10^23 to 10^28 molecules in a system
  Millions of flora and fauna species

Making Progress on Standards

• With understanding comes knowledge
• The barriers just discussed can be overcome
  Understanding the dynamic nature of S&T data and information is the key
• Possible approaches
  Modeling
  Creating tiers
  Allowing change
  Self-definition

Modeling

• Making better use of information modeling
• Formal tools are rarely used
• Too much is definitional: entities and attributes
• Relationships and dependencies are often overlooked
• Very labor intensive, especially if bringing together different points of view
• Leads to stronger standards capable of being altered over time

Creating Tiers

Not all data and information need be at the same level
• Core (prescriptive): Those items without which data and information are useless; as few as possible
• Suggestive (descriptive): Those items that, if reported, should be
  reported in a certain manner
• Other (Self-defining): Ways to report other items
• Recognizing goals of system description: equivalency and uniqueness
• Classifying independent variables: global and varying

Allowing Change

Must allow for addition of new information (metadata)
• Decomposing an independent variable into two or more components
• Adding new independent variables
• Anticipating discovery of new knowledge
• Standard developers are very reluctant to consider change

Self-Defining

• Language change and knowledge expansion over time (decades and longer) must be recognized
• Including meaning with content increases the chance that content can be interpreted correctly at a later date

Allowing Change

What is the goal of an S&T data and information standard?
• Is it to establish a correct way of doing something?
• Or is it a way to facilitate communicating a result generated at a certain time under certain circumstances?

Scientific knowledge continues to grow
• Today’s knowledge will likely become obsolete and be replaced
• We want to preserve what we saw in the world when we saw it!
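
As one way to picture the three tiers from Creating Tiers together with the self-defining idea, here is a minimal Python sketch. It is an illustration only, not something from the presentation: the item names in CORE_ITEMS and SUGGESTED_FORMATS, the Record and SelfDefinedItem classes, and the validate function are all hypothetical and chosen just to make the tiers concrete.

```python
from dataclasses import dataclass, field

# Hypothetical tier contents, invented for this illustration.
CORE_ITEMS = {"substance", "property", "value", "unit"}    # core (prescriptive)
SUGGESTED_FORMATS = {"temperature_K": float,               # suggestive (descriptive)
                     "pressure_Pa": float}


@dataclass
class SelfDefinedItem:
    """An 'other' item that carries its meaning along with its content."""
    name: str
    value: object
    definition: str  # plain-language meaning stored next to the value


@dataclass
class Record:
    core: dict
    suggested: dict = field(default_factory=dict)
    other: list = field(default_factory=list)


def validate(record: Record) -> list:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    # Core tier: every item is mandatory.
    for name in sorted(CORE_ITEMS - set(record.core)):
        problems.append(f"missing core item: {name}")
    # Suggestive tier: optional, but reported in the agreed form if present.
    for name, value in record.suggested.items():
        expected = SUGGESTED_FORMATS.get(name)
        if expected is None:
            problems.append(f"unknown suggested item: {name}")
        elif not isinstance(value, expected):
            problems.append(f"{name} should be {expected.__name__}")
    # Other tier: anything may be reported, provided it defines itself.
    for item in record.other:
        if not item.definition.strip():
            problems.append(f"self-defined item lacks a definition: {item.name}")
    return problems


if __name__ == "__main__":
    record = Record(
        core={"substance": "water", "property": "density", "value": 997.0},
        suggested={"temperature_K": "298"},
        other=[SelfDefinedItem("stir_rate", 300, "")],
    )
    for problem in validate(record):
        print(problem)
```

In this sketch a newly discovered independent variable can first be reported in the self-defining tier, with its meaning attached, without changing the core list. That is one possible reading of "allowing change"; a real standard would need its own rules for promoting items between tiers.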

Standards and Preservation

Preservation supports
• Documenting how we did science at one time
• Future scientific discovery

• Standards are critical for using the preserved record
• We must understand better how standards interact with the dynamic nature of science to make them useful