NEESgrid Data Overview
Charles Severance

Goals
Operates both as local and central (curated) repository using same toolset
Object oriented - referential integrity - data and metadata are supported together
Transparent data migration and replication, including security and metadata
Uses Grid Services throughout (OGSA/GT3)
Easily used access controls - Grid as “single sign on”
Data provenance
Data in support of research publication
Support for repeatable experiments
Data oriented research computation support
Support for workflow

Vision: data on the Grid
[Diagram labels: Data, Meta Data, Gathering, Extracting, Repository, Mapping, studies]

NEESgrid Data – Core Elements
Local Repository
Central Repository
JAVA APIs
– Run locally on the same system as a repository or over OGSA
Web Services
– NEES File Management Services
– NEES Meta Data Services
Data Viewers
– Streaming (numeric, X/Y graph)
– Stored (X/Y graph, 2-D structure, video)

Core Elements
[Diagram labels: Data Acquisition, NEESpop, API, Data/MD Ingest Tools, Local Repository, Workstation, API, Data Teamlets, NEESdata, Data tools, API, Data viewers, Data Teamlets, Central Repository]

NEESgrid Data – Technologies
Grid
– GridFTP is used for data transport
– Grid Web Services are used to ensure security and provide access control between systems over the Internet; they also provide for credential passthrough
– Grid credentials are used as part of login, providing a single sign-on framework
CHEF
– Provides a flexible mechanism for deploying GUI tools like the data viewers and data browsers.

A Simple Experimental Scenario
[Diagram labels: Developer System, Labview, Test Specimen, DAQ System, Glue, Researcher System]

A Simulation Scenario
[Diagram labels: Developer System, Simulation System (three instances), Code (three instances)]

The MOST Scenario
Part of the run-up to NEESPop 2.0
– Used Beta of NEESpop and Beta of CHEF
– Tested the data ingestion
– Tested the metadata capabilities
– Developed sample metadata
– Tested mapping capabilities
System still available at https://cee-nees.cee.uiuc.edu/chef/

MOST Data Flows
[Diagram: data flows among the Colorado (CO), UIUC, and NCSA sites. Component labels: Colorado Test Specimen, LabView DAQ, MatLab Host and Real-Time Target Control System; UIUC/Newmark Test Specimen, Sim Controller, Shore-Western, LabView DAQ, UIUC Matlab; NEESPop (1.1) with NSDS and NTCP; Incoming FTP; Ingest; Repository; NCSA Meta; NEESMost (Win XP) Matlab Computational Model; NEESPop (2.0). Legend: Site / Location, Computer, Process, Series of files, Complete file (aggregated), Wires, NFMS/NMDS, Plug In, NSDS, File I/O, DAQ]

The experiment was viewed using the standard NEES stored viewer with synchronized video and data and the ability to move back and forth

NEESpop 2.0 Alpha
[Diagram: DAQ output with columns Time, ch01, ch02 (sample rows 0.00 3 6 and 0.01 4 8), paired with Metadata of the form <experiment> <blah> <public-view> </experiment>; the same data and metadata are shown again under the label University of Minnesota]

NEES Markup Language (NEESML)
Provides an RDF-like structure capable of representing semantic information
– XML is the syntax which is used
– Logic is more “object oriented”
  • Can define objects
  • Can create objects
  • Can reference objects

RDF/XML Versus NEESML
NEESML is topologically equivalent to RDF but more straightforward to use
– A compromise between usability and functionality
– Focused on solving the problems of ingesting types and data rather than “cross-server ontology webs”
– Used to build a reference set of ingestion tools
– RDF is a moving target
Repository does not store either RDF or NEESML
– It is a
relational database tuned to store “three-tuples”

There is a layer where we develop tools that take advantage of, and begin to depend on, the “meaning” of the data – where we begin to depend on the meaning of a second.

The Slide
[Diagram: Data Ingestors, Data Viewers, and Data Mappers shown with Metadata and Data, next to a web of related terms: measurement, distance, rate, estimate, mile, numerator, unit, travel, ratio, denominator, mileage, model, car, consumption, volume, gallon, vehicle, fluid, gas, efficiency, fuel]
Where we make a viewer capable of viewing a certain type of object.
This is where we build things which make use of knowledge.
This layer will never be complete but it is a large focus of the coming months.

Looking for Data Models… Looking for Semantics…
Strategies
Take Existing Data Models and Adopt
Build tools and data models at the same time
Find existing tools that produce types of data we find interesting
Build meta data extractors for important file formats
Build converters for things like Excel

Why start with the ORST Model?
It provides good coverage of a well-understood scope
It has consensus across multiple sites
A prototype toolset exists around the model which is an important validation
http://nees.orst.edu/IT/data.model/
We can adopt the core elements and extend as necessary
http://nees.orst.edu/IT/data.model/docs/v1.3a.july16.pdf
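
As a rough illustration of the “three-tuples” idea mentioned for the repository, the sketch below flattens an object such as an o:project into (object, relation, value) rows. It is a minimal in-memory Java sketch under stated assumptions, not the NEESgrid schema or API: the class TripleSketch, its methods, and the sample identifiers (project1, experiment1) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a tiny in-memory store of (subject, relation, value) three-tuples.
// The real repository is a relational database; none of these names come from NEESgrid.
public class TripleSketch {

    /** One three-tuple: an object, a relation (property) name, and a value or object reference. */
    static final class Triple {
        final String subject;
        final String relation;
        final String value;

        Triple(String subject, String relation, String value) {
            this.subject = subject;
            this.relation = relation;
            this.value = value;
        }
    }

    private final List<Triple> triples = new ArrayList<>();

    /** Stores one three-tuple. */
    public void add(String subject, String relation, String value) {
        triples.add(new Triple(subject, relation, value));
    }

    /** Returns every value stored for the given subject and relation, in insertion order. */
    public List<String> values(String subject, String relation) {
        List<String> result = new ArrayList<>();
        for (Triple t : triples) {
            if (t.subject.equals(subject) && t.relation.equals(relation)) {
                result.add(t.value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TripleSketch store = new TripleSketch();
        // A project object with one scalar property and one reference to another object.
        store.add("project1", "type", "o:project");
        store.add("project1", "title", "Sample project");
        store.add("project1", "o:experiment", "experiment1");
        store.add("experiment1", "type", "o:experiment");
        store.add("experiment1", "title", "Sample experiment");
        System.out.println(store.values("project1", "o:experiment"));
    }
}
```

Because references between objects are stored the same way as scalar properties, a relation that allows many values (for example several experiments per project) simply becomes several rows with the same subject and relation.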

o:project
[Diagram: o:project with specialConditions, title, startDate, o:experiment[s], o:role[s], o:acknowledgement[s]; linked to o:experiment]

<type id="o:project" title="Project">
  <specialConditions title="Special Conditions"/>
  <title title="Title"/>
  <startDate title="Starting Date"/>
  <o:experiment allow="o:experiment" max="unbounded" />
  <o:role allow="o:role" max="unbounded" />
  <o:acknowledgements allow="o:acknowledgement" max="unbounded" />
</type>

<type id="o:experiment" title="Experiment">
  <status title="Status"/>
  <title title="Title"/>
  <shortDescription title="Description"/>
  <o:facility allow="o:facility" />
</type>

<type id="o:experiment" title="Experiment">
  <title title="Title"/>
  <shortDescription title="Description"/>
  <longDescription title="Description"/>
</type>

[Diagram: two versions of o:experiment, one with status, title, o:facility, shortDescription; the other with title, o:facility, shortDescription, longDescription]

Go Forward – Core Elements
Implement Access Control
Implement Replication
Investigate RDF and its relationship to NEESML
Investigate provenance – would like to adopt from another project
Investigate mapping – would like to adopt from another project

Go Forward – Tools
Evaluate the ORST interface and use it to implement experiment-based interface to meta data repository
Investigate tools to represent structural data (like SAC data)
Extend and improve viewers – publish API so that sites can extend the viewers
Improve notebook
– Single sign-on using CHEF/Grid credentials
– Integration with Metadata
– Smoother integration with CHEF
Explore automated synchronized video and data capture and after-experiment replay of synchronized video and data (ORST, UMinn)
Explore the capture of high quality still images as data (UMinn)
Investigate adopting a data-editing tool (XMLSpy)

Go Forward – Data Models
Analyze the ORST model, determine core, convert to NEESML, pre-populate repositories with types, and develop usage documentation
Form core group between SI, ES, and CS to push data model issues forward – once groundwork is better defined – we can disperse into
distributed teams
Use experiment based deployment to help us encounter new data needs over time

How to prioritize model exploration and development
Focus on the following areas:
– Areas where we have or are building tools
– Areas where we already have incoming data in some format
– Build the model through experiment based deployment - solve real problems in an open way and see if (with some adaptation) the solutions apply more broadly (i.e. Minnesota)

What is in Release 2.0?
October 7
Groovy look and feel
Local Data Repository
Repository Browser in CHEF
– Improved visually
– Configured by XML
– Can read data from repository or from URLs
– Pre-populated with sample video and data formats
– Browse
– Create objects
– Upload / download data
API documentation
NEESML User Documentation
Extensible data mapping in Java
Data Viewer in CHEF
Local Repositories prepopulated with
– SAC Data
– MOST Data
– ORST data model (subset)

NEESML
[Excerpt from the NEESML reference manual shown on the slide]

1 Introduction
NEESML is a means for populating a NEESgrid metadata repository with objects and definitions of object types. It provides a syntax for defining object types and specifying objects, the values of their properties, and the relationships between objects. The NEESgrid metadata ingestion tool accepts NEESML files and uses them to populate a repository with objects and type definitions, communicating with the repository with the NEESgrid metadata service (NMDS) interface.

NEESML can be used to share type definitions and objects between repositories, and as an interface between other applications and repositories, because applications can either be written to read and write NEESML files, or can be augmented with translation utilities that translate the data formats they can read or write to and/or from NEESML. Finally, NEESML can be used to archive metadata to files.

1.1 About this document
This document is intended as a comprehensive reference manual for the NEESML language. It does not explain how to use NEESML to represent real-world objects, but rather completely specifies the syntax and meaning of every NEESML construct. Using this document, you will be able to write NEESML documents that conform to the proper syntax, and will be able to understand syntax errors and other problems with nonconforming NEESML documents or document fragments.

For a more general introduction to NEESML, please refer to (Futrelle, 2003).

Some NEESML-specific terminology is used in this document. The first time every such term is used, it is italicized. Definitions of all special terms are given in the glossary in section 7.

NEESML is a general-purpose metadata description language. Throughout this document, examples are used that do not directly pertain to earthquake engineering. This is done merely in the interest of simplicity. All the constructs used in the examples can be applied to metadata describing earthquake engineering experiments and simulations.

1.2 Some XML “gotchas”
NEESML documents are XML documents. There are some aspects of XML that constrain the syntax of NEESML documents in important ways:
XML element and attribute names are case-sensitive. “ID” is not equivalent to “id”, and “MyTypeName” is not equivalent to “mytypename”.
Namespaces, if they are used as relation IDs or the IDs of types for which objects are created, must be declared in an enclosing element.
Some characters are not allowed in element names. It is common practice to only use alphabetic characters and dashes. Colons and other punctuation are not allowed.
For details on XML syntax, consult the XML specification (Bray, Paoli, Sperberg-McQueen & Maler, 2000) or a good XML reference or tutorial.

Table 1: Primitive types in NEESML
Name    Description                                                                          Examples
string  Text                                                                                 “Hello, world.”  “BN#493-2584x”
int     Integer                                                                              3  -2  2147483647
long    Long integer. Can exceed the size of an integer.                                     -5782347562427  9223372036854775807
double  Double precision floating point number.                                              523425.4568574636  -0.0000000435234
date    A moment in time, represented as a date and time stamp in UTC with 1ms resolution.   2002-10-27 15:40:32.048  1969-01-12 00:03:48.774

Repository Browser

Ingestor
NFMSUploadAgent.java

API Documentation

Configuring Events
<event id="oregon" desc="Oregon Large Tank Test September 8, 2003"
       host="/chef/org.nees.repo.data/retrieve-data?lfn=nacse_sample_01.txt&amp;static=yes&amp;mapping=nacse-"
       type="stored">
  <channel id="00" desc="time" unit="Seconds" url="t" />
  <channel id="01" desc="Offshore Wave Gauge" unit="" url="02" />
  <channel id="02" desc="Wave gauge at the front face of the cylinder" unit="" url="03" />
  <channel id="03" desc="Wave gauge at the back face of the cylinder" unit="" url="04" />
  <channel id="04" desc="Pressure at the front face of the cylinder" unit="" url="05" />
  <video id="01" desc="Video of cylinder" url="/chef/retrieve-data/static/nacse_sample_01.avi" />
</event>

<type id="nees:storedEvent">
  <data allow="io:file"/>
  <url/>
  <parseOptions/>
  <storedTimeChannels min="1" max="1" allow="nees:storedTimeChannel"/>
  <storedDataChannels min="0" max="unbounded" allow="nees:storedDataChannel"/>
  <storedVideoChannels min="0" max="unbounded" allow="nees:storedVideoChannel"/>
  <storedStruct2DChannels min="0" max="unbounded" allow="nees:stored2DStructureChannel"/>
  <fileChannels max="unbounded" allow="nees:fileChannel"/>
</type>

We may be able to get a patch out to switch this to NEESML and provide a simple entry tool.
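
The event configuration above is ordinary XML, so a tool can read it with the standard Java DOM API. The following is a minimal sketch, assuming a trimmed copy of the Oregon event; EventConfigReader and describeChannels are hypothetical names chosen for illustration, not part of the NEESgrid or CHEF code, and the sketch makes no claim about how the data viewer itself parses its configuration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of NEESgrid: lists the <channel> entries of an <event>.
public class EventConfigReader {

    /** Returns one "id: desc [unit] url=..." line per <channel> element under the root. */
    public static List<String> describeChannels(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList channels = doc.getDocumentElement().getElementsByTagName("channel");
            List<String> lines = new ArrayList<>();
            for (int i = 0; i < channels.getLength(); i++) {
                Element ch = (Element) channels.item(i);
                // Trim the description because some entries carry trailing spaces.
                lines.add(ch.getAttribute("id") + ": " + ch.getAttribute("desc").trim()
                        + " [" + ch.getAttribute("unit") + "] url=" + ch.getAttribute("url"));
            }
            return lines;
        } catch (Exception e) {
            // Wrap parser errors so callers do not need to handle checked exceptions.
            throw new IllegalArgumentException("Could not parse event configuration", e);
        }
    }

    public static void main(String[] args) {
        // Trimmed copy of the event from the slide (host attribute and video omitted).
        String xml =
              "<event id=\"oregon\" desc=\"Oregon Large Tank Test September 8, 2003\" type=\"stored\">"
            + "<channel id=\"00\" desc=\"time\" unit=\"Seconds\" url=\"t\" />"
            + "<channel id=\"01\" desc=\"Offshore Wave Gauge\" unit=\"\" url=\"02\" />"
            + "</event>";
        for (String line : describeChannels(xml)) {
            System.out.println(line);
        }
    }
}
```

The same loop could be repeated for the video element; it is left out here to keep the sketch short.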

Mappings and the Data Viewer
NSDS (ISO 8601 Time channel)
Column data with time recorded as a column
Column – generate time
Column – generate time – trigger filter

Sample data shown on the slide:
Channel units: g,g,in,kip
Time ATL1 ATT1
2002-11-13T15:48:55.26499 -0.006409 0.004272
2002-11-13T15:48:55.36499 -0.005798 -0.003662
100.000
0.435 0.161 -1.016 -0.981
0.430 0.161 -1.016 -0.977
0.435 0.161 -1.016 -0.977

import java.io.File;

public class NEESDataMap {
    public static boolean repoMap(File mainFile, File mappingFile, String mapping) {
        // Code here
        return false; // placeholder so the stub compiles
    }
}

Release 2.1 Data Aspects
December 2003
NEESpop
– Notebook to metadata repository connection made
– Closer integration of notebook into CHEF
– First release of experiment tool (based on ORST)
– Retool data viewers to be completely driven by Metadata objects rather than their own objects
– More fine grained access control
– Enhanced data models
Tools
– Ingestion tools released
– A limited set of pre-release video/image tools

Further releases
Release 2.2 – March 04
– Driven by your needs as we encounter them
– Perhaps some “nice to haves” from the SI team
Release 3.0 – June 04
– Very limited new functionality – maybe almost nothing new in the core components of the NEESpop