Developing and Using Standards for Data and Information in Science and Technology

John Rumble, Jr.
Bonnie Carroll
Gail Hodge
Laura Bartolo
PV 2005 Edinburgh Scotland
The Reason for Standards
• Economic: Savings in time, money and other resources
• Intellectual superiority of a solution
• Codification and communication of knowledge
• Description of a structure of knowledge, methodology, data or information
• Accuracy in description
Use of Standards for S&T Data and Information
• Data generation
  ▪ Capturing results & conditions
• Database building
  ▪ Schema; input; uniformity
• Data evaluation
  ▪ Assess quality; comparison
• Database use
  ▪ Retrieval; interoperability
• Data reporting
  ▪ Understandability
• Data access
  ▪ Locating; cross-DB use
• Data archiving
  ▪ What and how stored
• Data exploitation
  ▪ Input into apps; auto-retrieval
• Data visualization
  ▪ Inspection and analysis
Standards
• We want them!
• Why are they so hard to get?
Barriers to Greater Progress
• Nomenclature
• Linguistic
• Socio-economic
• Technical
Nomenclature
• S&T nomenclature arises from history
  ▪ Geography, education, scientific circle, language, conceptual differences, rivalry
• How many areas of science have competing nomenclatures?
Nomenclature
Scientific knowledge evolves over time
• What was appropriate to describe a substance, system or species yesterday is obsolete today
• In science, increased knowledge is expressed as independent variables and how exactly they affect something
• The concept of a gene has changed since Gregor Mendel
• The explosion of variables used to describe it
  ▪ From the chromosomes on which it is located to the base pair sequence
Nomenclature
Scientific knowledge evolves over time
• Experimental Science: We routinely express our increase of knowledge as new independent variables (IVs)
• Observation Science: We usually catalog features instead of IVs; these will change over time
• Computed Science: Virtually no effort to preserve on the basis of IVs
Linguistic
Beyond nomenclature, scientific language evolves
• Exactly as with everyday language
  ▪ New words, changes of meaning, regionalization, prefixes and suffixes; grammar
• Scientific languages follow the rules of linguistics
• Change increases at the intersection of disciplines
  ▪ “Creole” languages
  ▪ Language of quantum chemistry comes from chemistry (bonding) and atomic and molecular physics
  ▪ Mixed language differs from each field
Socio-Economic
Typical scientific practice hinders standards
• Competitiveness
• Striving for uniqueness
• Constant clarification
• Reluctance to repeat past experiments
• Desire to use new techniques
• My way should be the standard way!
• Lack of economic motivation to create or use standards
Socio-Economic
“Basically physicists are too undisciplined to let
anyone else tell us what to name something. It’s
basically whatever name catches on.”
Gordon Kane (U. Michigan) as quoted in the New York Times
Technical
Science is moving from reductionism to complexity
• Real systems are complicated
  ▪ Contain many parts, components, items
  ▪ Large number of properties
  ▪ Larger number of independent variables
• Consider standards for describing
  ▪ 6 × 10^9 people
  ▪ Countless objects in space
  ▪ 10^23 to 10^28 molecules in a system
  ▪ Millions of flora and fauna species
Making Progress on Standards
• With understanding comes knowledge
• The barriers just discussed can be overcome
  ▪ Understanding the dynamic nature of S&T data and information is the key
• Possible approaches
  ▪ Modeling
  ▪ Creating tiers
  ▪ Allowing change
  ▪ Self-definition
  ▪ Allowing change
Modeling
• Making better use of information modeling
• Formal tools are rarely used
• Too much is definitional: entities and attributes
• Relationships and dependencies are often overlooked
• Very labor intensive, especially if bringing together different points of view
• Leads to stronger standards capable of being altered over time
Creating Tiers
Not all data and information need be at the same level
• Core (Prescriptive): Those items without which data and information are useless; as few as possible
• Suggestive (Descriptive): Those items that, if reported, should be reported in a certain manner
• Other (Self-defining): Ways to report other items
• Recognizing goals of system description: equivalency and uniqueness
• Classifying independent variables: global and varying
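The three tiers can be illustrated with a small checker. This is only a sketch under stated assumptions: the field names (`substance`, `pressure_Pa`, and so on), the `check_record` function, and the rule that "Other" items carry a `definition` and a `value` are all hypothetical and are not taken from any existing standard.

```python
# Hypothetical tier definitions for an imagined property-measurement record.
CORE_FIELDS = {"substance", "property", "value", "unit"}            # Core (prescriptive)
SUGGESTED_FORMATS = {"temperature_K": float, "pressure_Pa": float}  # Suggestive (descriptive)


def check_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []

    # Core tier: every item must be present, otherwise the record is useless.
    for name in sorted(CORE_FIELDS - set(record)):
        problems.append(f"missing core item: {name}")

    # Suggestive tier: optional, but if reported it must use the agreed form.
    for name, expected in SUGGESTED_FORMATS.items():
        if name in record and not isinstance(record[name], expected):
            problems.append(f"{name} should be reported as {expected.__name__}")

    # Other tier: anything else is accepted if it carries its own definition.
    for name, item in record.items():
        if name in CORE_FIELDS or name in SUGGESTED_FORMATS:
            continue
        if not (isinstance(item, dict) and "definition" in item and "value" in item):
            problems.append(f"{name} needs a 'definition' and a 'value'")

    return problems
```

Keeping the core set small means few records are rejected outright, while the other two tiers let extra detail be reported without a revision of the standard.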
Allowing Change
Must allow for addition of new information (metadata)
• Decomposing an independent variable into two or more components
• Adding new independent variables
• Anticipating discovery of new knowledge
• Standard developers are very reluctant to consider change
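Decomposing an independent variable can be sketched as a schema upgrade that keeps the old value while making room for the new components. Everything here is assumed for illustration: the `DECOMPOSITIONS` table, the `upgrade` function, the version numbers, and the field names such as `heat_treatment` do not come from the talk or from a real standard.

```python
# Hypothetical change between version 1 and version 2 of an imagined standard:
# one coarse independent variable is split into two finer components.
DECOMPOSITIONS = {
    "heat_treatment": ("anneal_temperature_K", "anneal_time_s"),
}


def upgrade(record: dict) -> dict:
    """Return a version-2 view of a version-1 record without discarding anything."""
    upgraded = {"schema_version": 2, "legacy": {}}
    for name, value in record.items():
        if name == "schema_version":
            continue
        if name in DECOMPOSITIONS:
            # Keep what was actually reported under its original name...
            upgraded["legacy"][name] = value
            # ...and mark the new components as unknown rather than guessing them.
            for component in DECOMPOSITIONS[name]:
                upgraded[component] = None
        else:
            upgraded[name] = value
    return upgraded


# Illustrative values only.
old = {"schema_version": 1, "alloy": "sample A", "heat_treatment": "annealed"}
new = upgrade(old)
```

The design choice is that older records stay readable: the original description is preserved under `legacy`, and the new variables are explicitly empty instead of being filled in after the fact.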
Self-Defining
• Language change and knowledge expansion over time – decades and longer – must be recognized
• Including meaning with content increases the chance that content can be interpreted correctly at a later date
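One way to picture "including meaning with content" is a value that always travels with its own definition and the date of that definition. The `SelfDefinedItem` class, its fields, and the `render` helper are hypothetical names chosen for this sketch, and the sample values are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelfDefinedItem:
    name: str
    value: float
    unit: str
    definition: str   # what the name meant when the value was recorded
    defined_in: int   # year in which that definition applied


def render(item: SelfDefinedItem) -> str:
    """Format the item so the definition is never separated from the value."""
    return (f"{item.name} = {item.value} {item.unit} "
            f"({item.definition}; as defined in {item.defined_in})")


item = SelfDefinedItem(
    name="hardness",
    value=62.0,
    unit="HRC",
    definition="Rockwell C scale indentation hardness",
    defined_in=2005,
)
```

A reader decades later may no longer share the vocabulary, but the record itself still says what was meant at the time.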
Allowing Change
What is the goal of an S&T data and information standard?
• Is it to establish a correct way of doing something?
• Or is it a way to facilitate communicating a result generated at a certain time under certain circumstances?
Scientific knowledge continues to grow
• Today’s knowledge will likely become obsolete and be replaced
• We want to preserve what we saw in the world when we saw it!
Standards and Preservation
Preservation supports
• Documenting how we did science at one time
• Future scientific discovery
• Standards are critical for using the preserved record
• We must understand better how standards interact with the dynamic nature of science to make them useful