DATA SHARING ISSUES, METADATA,
ARCHIVES, AND COMPREHENSION
• Urgency
• NEESgrid (www.neesgrid.org) schedule:
– Characterize the Earthquake Engineering community's
use of data and metadata: January 2002.
– Distribute preliminary metadata standards: May 2002.
– Publish standards for data and metadata models and
representations by September 2002
(Prudhomme and Mish, 2001).
• NEES Consortium Development (www.nees.org)
– Working groups on data issues: looking for interested
volunteers
Identify/define uses of data and metadata
• To help me remember what I did last time
• To permit other researchers to duplicate test
• Real time remote PI interaction
• To allow numerical simulation
– Interactive decision making during experiment
– Years after the test
• Automated control of the experiment
• Visualization
– Research and education, sponsors
• Data search/query filter
• Artificial Intelligence, inverse/system identification
• Software sharing by common interface (OpenSees)
Use of data
• Data search/query filter
• Artificial Intelligence, inverse/system
identification
• Software sharing by common interface
(OpenSees)
Experience of geotech community
• CWRU database on element tests
• VELACS USC
• COSMOS and IRIS
• PEER structures databases UCSD, UW
• UCD cgm.engr.ucdavis.edu
Other community examples
• Atmosphere/ocean research NCAR, NOAA, Navy
– Example of flux vector interchanged between programs
– User-specific API to interface with “black box”
– CORBA – Common Object Request Broker
Architecture. A spec for an “object” that may be
accessed from many platforms (Java, Fortran, etc.)
• Fluid flow
– Visualization code runs with solver
– Open GL
– Generic flux vector
– Connection of mismatched meshes (regular and
scattered)
– Meshing experimental data with numerical data
Data use and format
• Think ahead for uses
– Needs assessment
– Format changes
– Visualization of large data sets is demanding
• What is data?
• Format
– Access tools: input and output
– Don’t store the same data twice just because it is in
a different format (calibration?)
Formats, coding
• Oracle
• Flat ASCII
• XML
What are the benefits of standardization?
• Knowledge of data format at one facility is
transferable to others.
– E.g., numerical simulation of tests at CWRU,
UCD.
– Training of experimenters may be transferable.
• User interfaces to databases may be sharable,
so each facility may not have to develop
its interfaces independently.
– Search, query, automated IO, visualization, ...
Barriers to standardization and how to
overcome them
• Need a “killer app” that assumes a standard
• The gap between Civil Engineering and
Information Technology.
“Killer App” features
• To help me remember what I did last time:
automated metadata documentation
• To permit other researchers to duplicate test
• Real time remote PI interaction: teleparticipation
• To allow numerical simulation
– Interactive decision making during experiment
– Years after the test
• Automated control of the experiment
“Killer App” features (2)
• Visualization
• Data search/query/access/filter
• Web portal - for all of the above?
Metadata Design
• Determine the structure of metadata to
optimize
– Intuitive query language
– Readable to computers and humans
– Completeness without redundancy
– Flexibility and Evolution
• Curation by NEES SI and Consortium
• Write code: XML document type definitions
Strawman metadata structure
1. Project Identifiers
2. Catalog of Materials, Objects, Sensors and
Apparatus
3. Sequence of Model Test Events and
Measurements
4. Sensor Channel Gain Lists (1)
5. Image Data
6. Control Data Files
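The six strawman sections can be pictured as the top-level children of one XML document. The sketch below is only an illustration: apart from `ModelTest` and `Catalog`, which appear in the XML fragment in these slides, every element name and the `title` attribute are hypothetical and not part of any published NEES standard.

```python
import xml.etree.ElementTree as ET

# (tag, title) pairs. Tags other than "Catalog" are hypothetical names
# chosen for this sketch; the titles are the six strawman sections.
SECTIONS = [
    ("ProjectIdentifiers", "Project Identifiers"),
    ("Catalog", "Catalog of Materials, Objects, Sensors and Apparatus"),
    ("EventSequence", "Sequence of Model Test Events and Measurements"),
    ("SensorChannelGainLists", "Sensor Channel Gain Lists"),
    ("ImageData", "Image Data"),
    ("ControlDataFiles", "Control Data Files"),
]


def build_skeleton():
    """Return an empty <ModelTest> element with one child per section."""
    root = ET.Element("ModelTest")
    for number, (tag, title) in enumerate(SECTIONS, start=1):
        child = ET.SubElement(root, tag)
        # "section" and "title" attributes are assumptions for readability.
        child.set("section", str(number))
        child.set("title", title)
    return root


skeleton = build_skeleton()
print(ET.tostring(skeleton, encoding="unicode"))
```

Keeping the section list in one table means a change to the strawman outline is made in one place, which fits the "Flexibility and Evolution" goal.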
Discussion Items
• Philosophical issues related to culture of
data sharing?
– Data producer should get first shot at
publication
– How long should we allow a data generator to
ponder before other people can have access?
– How do we publish electronic data?
– Give academic credit to data publishers
XML
<ModelTest>
  <Catalog>
    <Sensors>
      <Sensor SN="PCB3245">
        <Type>Piezoelectric Accelerometer</Type>
        <Manufacturer>PCB</Manufacturer>
        <Model>352</Model>
        <CalibrationDate>092899</CalibrationDate>
        <Sensitivity Unit="mV/g">100</Sensitivity>
        <Range>50g</Range>
        <SensorData>http://www.pcb.com/pcb3245</SensorData>
      </Sensor>
    </Sensors>
  </Catalog>
</ModelTest>
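A fragment like this is readable by people and also easy to query by program. The sketch below parses it with Python's standard `xml.etree.ElementTree`. The fragment is copied from the slide and closed with `</ModelTest>` so that it parses. The function name `sensor_summary` and the dictionary keys are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Sensor catalog fragment from the slide, closed so it is well-formed.
FRAGMENT = """
<ModelTest>
  <Catalog>
    <Sensors>
      <Sensor SN="PCB3245">
        <Type>Piezoelectric Accelerometer</Type>
        <Manufacturer>PCB</Manufacturer>
        <Model>352</Model>
        <CalibrationDate>092899</CalibrationDate>
        <Sensitivity Unit="mV/g">100</Sensitivity>
        <Range>50g</Range>
        <SensorData>http://www.pcb.com/pcb3245</SensorData>
      </Sensor>
    </Sensors>
  </Catalog>
</ModelTest>
"""


def sensor_summary(xml_text):
    """Map each sensor serial number to a few of its catalog fields."""
    root = ET.fromstring(xml_text.strip())
    summary = {}
    for sensor in root.iterfind("./Catalog/Sensors/Sensor"):
        sensitivity = sensor.find("Sensitivity")
        summary[sensor.get("SN")] = {
            "type": sensor.findtext("Type"),
            "sensitivity": float(sensitivity.text),
            "unit": sensitivity.get("Unit"),
            "datasheet": sensor.findtext("SensorData").strip(),
        }
    return summary


summary = sensor_summary(FRAGMENT)
print(summary["PCB3245"]["sensitivity"], summary["PCB3245"]["unit"])
```

The path expression `./Catalog/Sensors/Sensor` is one small example of the kind of query that a standard structure makes possible at every facility.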
There must be nice interfaces to complex data structures.
An automatic metadata generator should do most of the work.
TEDS (Transducer Electronic Data Sheets), SCEDS, and
automated geometry definition will make the job doable.
Discussion Items
• At what metadata level do we refer to other
archives instead of re-archiving?
Example:
– Accelerometer amplifier gain for each test
event: in the event archive
– Accelerometer calibration: in the test archive
– Date and method of calibration: in the facility
archive
– Cross-axis sensitivity: in the manufacturer's archive
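One way to picture "refer instead of re-archive" is a layered lookup: each item is stored at exactly one level, and a query falls through from the event archive to the manufacturer's archive until it finds the key. The sketch below uses Python's `collections.ChainMap` for that fall-through. The key names are hypothetical and the values are placeholders, except the 100 mV/g sensitivity and the `092899` calibration date, which are reused from the XML fragment in these slides.

```python
from collections import ChainMap

# Each archive stores only the items that belong at its level.
event_archive = {"amplifier_gain": 10.0}            # placeholder value
test_archive = {"sensitivity_mV_per_g": 100.0}      # from the XML fragment
facility_archive = {
    "calibration_date": "092899",                    # from the XML fragment
    "calibration_method": "(not given)",             # placeholder
}
manufacturer_archive = {
    "cross_axis_sensitivity": "(see manufacturer data sheet)",  # placeholder
}

LEVELS = ["event", "test", "facility", "manufacturer"]

# Lookups search the maps in order: most specific archive first.
metadata = ChainMap(
    event_archive, test_archive, facility_archive, manufacturer_archive
)


def source_of(key):
    """Return the name of the archive level that actually stores `key`."""
    for level, archive in zip(LEVELS, metadata.maps):
        if key in archive:
            return level
    raise KeyError(key)


print(metadata["amplifier_gain"], source_of("amplifier_gain"))
print(source_of("cross_axis_sensitivity"))
```

Because nothing is copied between levels, a correction made in the facility or manufacturer archive is seen by every test that refers to it.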
Strawman metadata hierarchy
• Section 1 of the outline in Table 1 contains
metadata associated with the research project.
• Section 2 is a catalog of physical objects used to
construct or test the model. This includes:
apparatus used to test the model, passive materials
and markers that are placed in the model, and
sensors that are used in the model tests.
Strawman metadata hierarchy
• Section 3 describes sequencing of events. A
sequence can be the measurement of the location
of an object, or an event involving activation of an
actuator or a penetrometer sounding.
• Section 4 includes the sensor-channel-gain lists;
this documents which sensors are plugged into
which amplifier channels, and also includes the
sequence in which the sensor data was recorded,
and parameters that define gains and filters.
Strawman metadata hierarchy
• Section 5 describes image data. This could
include photographs, video camera data, and/or
engineering drawings of configuration.
• Section 6 describes the data required to control the
experiment. This could determine the location of
a CPT sounding, the rate of penetration of a
penetrometer, or command files to control a
shaker.
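The sensor-channel-gain lists of Section 4 are what let archived voltages be turned back into engineering units. The sketch below assumes a linear sensor followed by an amplifier with no offset, so that recorded volts = acceleration (g) × sensitivity (V/g) × gain. The class `ChannelEntry`, its field names, the channel number, and the gain of 10 are hypothetical. The serial number and the 100 mV/g sensitivity come from the XML fragment in these slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelEntry:
    """One row of a hypothetical sensor-channel-gain list."""
    sensor_sn: str
    amplifier_channel: int
    gain: float
    sensitivity_mv_per_g: float


def volts_to_g(volts, entry):
    """Convert recorded volts to acceleration in g (linear, no offset)."""
    millivolts = volts * 1000.0
    return millivolts / (entry.sensitivity_mv_per_g * entry.gain)


# Channel number and gain are placeholders for illustration only.
SCG_LIST = [ChannelEntry("PCB3245", 1, 10.0, 100.0)]
by_channel = {entry.amplifier_channel: entry for entry in SCG_LIST}

reading_g = volts_to_g(0.5, by_channel[1])
print(reading_g)  # 0.5 V on channel 1 -> 0.5 g
```

If the gain or the sensor on a channel changes between events, only the list entry changes; the archived voltage files stay untouched.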
[Figure annotations]
• CAD of geometry and instrument location numbers
• Printable version of report (pdf) describing
experiment and automatically generated data time
histories
• Excel spreadsheets of metadata
• ASCII data files of sensor readings during about 90
simulated earthquakes (about 1 MB each)
• Excel spreadsheet describing calibration factors,
amplifier channel numbers, gains, data file format, ...
[Figure: Event BV, page 3 of pdf document. Semiautomatic plot
generation using MathCAD program, central vertical array
of accelerometer data]
[Figure: NEES Collaboratory. Diagram labels: NEESgrid;
Simulation/Experimental Facilities (Site A, Site B, Site C,
Other site 1, Other site 2); Site Council; System Integrator;
NEES Consortium Development; Earthquake Researchers;
Educators & Students; Professional Engineers; Other
Practitioners]
[Figure: UC Davis Research Network. Diagram labels: 3D
visualization machine_1 and machine_2; SGI 16 processor
parallel computer; NEES imaging; SGI image processor;
80 PC cluster; GSR_1; GSR_2; OXC; prototype OLS routers;
links to Berkeley, Santa Cruz, Sacramento, Merced;
HDTV camera; environmental monitoring]
[Figure: model configurations DKS02, DKS03, DKS04, DKS05, each
with a transverse array. Labels: dry Nevada sand (Dr ~ 100%,
Dr ~ 98%); saturated Nevada sand (Dr ~ 102%); 5 mm and 4 mm
cover sand; inside container width = 787 mm and 904 mm;
concrete basin; dimensions 1651 mm, 1762 mm, 558 mm, 474 mm,
549 mm, 245 mm; instruments A10, A13, A20, A22, A25, A28.
Legend: accelerometer, pore fluid pressure, displacement,
shaking, air hammers]