Professor Deborah McGuinness
TA – Weijing Chen
Other lectures from Professor Joanne Luciano, grad student Jim McCusker, and possibly others from http://tw.rpi.edu/web/People
CSCI 6962 - 01, 86933, CSCI 4969 - 01, 87927
ITWS 6960 - 01, 87198, ITWS 4969 - 01, 87928
Week 1, initially August 29, 2011
Moved because of Hurricane Irene to Wednesday August 31, 2011
• Class:
– CSCI 6962 - 01, 86933, CSCI 4969 - 01, 87927
– ITWS 6960 - 01, 87198, ITWS 4969 - 01, 87928
• Hours: 1pm-3:50pm Mondays (except after Columbus Day)
• Class Location: Winslow 1140
• Instructors: Deborah McGuinness; TA: Weijing Chen; guests: Joanne Luciano, Jim McCusker
• Contacts: dlm@cs.rpi.edu, chenw8@rpi.edu, jluciano@cs.rpi.edu, mccusj@rpi.edu
• Contact locations: Winslow 2104 (DLM), 2143 (JSL)
• Titanpad – this week http://twc.titanpad.com/147
• Scribe for each class – this week Weijing
• After class – scribe copies notes over to the class page
http://tw.rpi.edu/web/Courses/SemanticeScience/2011
• You will need an account on our site so that you can upload your homework and presentations – contact Patrick West, who is in class
• See http://tw.rpi.edu/web/Help/UploadLinkToMedia for uploading instructions
• It's just a matter of adding a tag to the body of the Drupal page: <document href="SemanteScience2011Assignment00.pdf" alt="Semantic eScience 2011 Assignment 00"/>
• When you save the page, you'll see an Upload link next to the title. Click on that, upload the document, and when you click "Upload" the Upload link on the page will change to a Download link.
• To upload a new version of the document go to http://tw.rpi.edu/media/submit.php
• Who are we?
• Who are you?
• Why are you here?
• What do you want to get out of the class?
• Will you make the class (on time) each week and do you have any other conflicts or issues we should know about?
In the Earth and space sciences and elsewhere, ready and open access to the vast and growing collections of cross-disciplinary digital information is the key to understanding and responding to complex Earth system phenomena that influence human survival.
We have a shared responsibility to create and implement strategies to realise the full potential of digital information and services for present and future generations.
Adama Samassekou, Convener of the UN World Summit on the Information Society
• What do you think we need to address to start to realize the vision on the previous viewgraph?
• Outline of the course
• Background
• e-Science
• Examples
• Informatics
• Semantics
• Elements of Semantic e-Science (SeS)
• What we expect
• Logistics summary
• Topics for Semantic e-Science/ Foundations:
– Semantic Methodologies
– Knowledge Representation for e-Science
– Ontology Engineering and Re-Use for e-Science
– Knowledge Integration for e-Science
– Semantic Data Integration
– Semantic Web Languages, Tools and Services
– Semantic Infrastructure and Architecture for e-Science
– Semantic Grid Middleware
– Ontology Evolution for e-Science
– Knowledge Management for e-Science
– e-Science Workflow Management
– Data life-cycle for e-Science
– Data Mining and Knowledge Discovery
People (scientists) should be able to access a global, distributed knowledge base of (scientific) data that:
• appears to be integrated
• appears to be locally available
But… data is obtained by multiple means, using various protocols, in differing vocabularies, using (sometimes unstated) assumptions, with inconsistent (or non-existent) meta-data. It may be inconsistent, incomplete, evolving, and distributed.
And… there often exist significant levels of semantic heterogeneity, large-scale data, complex data types, legacy systems, inflexible and unsustainable implementation technology…
What do we need to achieve Semantic eScience?
(in-class brainstorming exercise, 2010)
• organization, leadership, management strategies, roles and assignment of roles
• dissemination strategy
• communication of ideas
– machine level
– human level
• conflict resolution
• cross-disciplinary collaboration
• flexible, adaptable, feedback
• extensible
• ability to filter information
• usage/application of resources, optimization
• facts, knowledge (domain knowledge)
• context, domain, scope
• goals, use cases
• metadata - data to describe data
• ability to link information
• ability to understand information
• ability to capture and represent conflicting ideas
• provenance - where data come from
• trust - reliable
• ability to capture intent (humanitarian aspect / responsibility)
• credibility of information
• interesting and appealing
• standardization
• education and outreach
• methods and metrics
• criteria for evaluation
Modern information and communications technologies are creating an “interoperable” information era in which ready access to data and information can be truly universal. Open access to data and services enables us to meet the new challenges of understanding the Earth and its space environment as a complex system:
• managing and accessing large data sets
• higher space/time resolution capabilities
• rapid response requirements
• data assimilation into models
• crossing disciplinary boundaries.
[Figure: spectrum running from “Less Strategic” to “More Strategic”, annotated “SCIENTISTS TOO”. From “Why EPO (Education and Public Outreach)?”, a NASA internal report on science education, 2005]
Fox CI and X-informatics - CSIG 2008, Aug 11
• Emphasis is on Science
• Original narrative: One of the key drivers behind the search for such new scientific tools is the imminent deluge of data from new generations of scientific experiments and surveys (*). In order to exploit and explore the petabytes of scientific data that will arise from these high-throughput experiments, supercomputer simulations, sensor networks, and satellite surveys, scientists will need assistance from specialized search engines, data mining tools, and data visualization tools that make it easy to ask questions and understand answers. To create such tools, the data will need to be annotated with relevant "metadata" giving information as to provenance, content, conditions, and so on; and, in many instances, the sheer volume of data will dictate that this process be automated.
Scientists will create vast distributed digital repositories of scientific data requiring management services similar to those of more conventional digital libraries, as well as other data-specific services. The ability to search, access, move, manipulate, and mine such data will be a central requirement for this new generation of collaborative science software applications. Hey and Trefethen, 2005
• Thousand years ago: science was empirical describing natural phenomena
• Last few hundred years: theoretical branch using models, generalizations
• Last few decades: a computational branch simulating complex phenomena
• Today: data exploration (eScience) synthesizing theory, experiment and computation with advanced data management and statistics: new algorithms!
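[Slide equation, garbled in extraction; the surviving symbols are consistent with the Friedmann equation: $\left(\frac{\dot{a}}{a}\right)^2 = \frac{4\pi G\rho}{3} - K\frac{c^2}{a^2}$]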
• Scientific data doubles every year
– caused by successive generations of inexpensive sensors + exponentially faster computing
• Changes the nature of scientific computing
• Cuts across disciplines (eScience)
• It becomes increasingly harder to extract knowledge
• 20% of the world’s servers go into huge data centers by the “Big 5”
– Google, Microsoft, Yahoo, Amazon, eBay
• So it is not only the scientific data!
[Figure: log-scale plot, 1970-2000, vertical axis from 0.1 to 1000; series labeled CCDs and Glass]
• Very extended distribution of data sets: data on all scales!
• Most datasets are small, and manually maintained (Excel spreadsheets)
• Total amount of data dominated by the other end (large multi-TB archive facilities)
• Most bytes today are collected via electronic sensors
• Where are discoveries made?
– At the edges and boundaries
– Going deeper, collecting more data, using more colors….
• Metcalfe’s law
– Utility of computer networks grows as the number of possible connections: O(N^2)
• Federating data (the connections!!)
– Federation of N archives has utility O(N^2)
– Possibilities for new discoveries grow as O(N^2)
• Many examples
– Sky surveys – galaxy zoo… Very early discoveries from SDSS, 2MASS, DPOSS
– Genomics+proteomics
– Alzheimer's article in the reading
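The O(N^2) claim above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the function name `pairwise_links` is made up here, and it simply counts distinct pairs of archives as a stand-in for "possible connections".

```python
def pairwise_links(n: int) -> int:
    """Distinct pairs among n federated archives: n*(n-1)/2, which grows as O(N^2)."""
    return n * (n - 1) // 2


# Ten times more archives gives roughly a hundred times more possible pairings.
for n in (2, 10, 100):
    print(n, "archives ->", pairwise_links(n), "possible pairwise connections")
```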
FTP and GREP are not adequate
• You can GREP 1 MB in a second
• You can GREP 1 GB in a minute
• You can GREP 1 TB in 2 days
• You can GREP 1 PB in 3 years
• You can FTP 1 MB in 1 sec
• You can FTP 1 GB / min (~1 $/GB)
• … 2 days and 1K$
• … 3 years and 1M$
• Oh!, and 1PB ~4,000 disks
• At some point you need indices to limit search, and parallel data search and analysis
• This is where databases can help
• Take the analysis to the data !!
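A rough back-of-the-envelope sketch of why sequential scans stop scaling. The 10 MB/s scan rate is an assumption chosen for illustration (the slide's own figures imply somewhat different rates), and `index_probes` stands in for any balanced index that needs about log2(N) comparisons per lookup.

```python
import math

SECONDS_PER_DAY = 86_400


def scan_days(size_bytes: float, rate_bytes_per_s: float = 10e6) -> float:
    """Days needed to read size_bytes sequentially at an assumed constant rate."""
    return size_bytes / rate_bytes_per_s / SECONDS_PER_DAY


def index_probes(n_records: int) -> int:
    """Comparisons for one lookup in a balanced index over n_records entries."""
    return math.ceil(math.log2(n_records))


for label, size in (("1 GB", 1e9), ("1 TB", 1e12), ("1 PB", 1e15)):
    print(f"{label}: about {scan_days(size):.3f} days to scan")
print("index lookup over 1e9 records:", index_probes(10**9), "probes")
```

Scan time grows linearly with data size while index lookups grow logarithmically, which is the motivation for indices and for taking the analysis to the data.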
• As a result of finding out who is doing what, sharing experience/expertise, and substantial coordination: … visualization and other computing and information processing services over the …
Informatics - information science includes the science of (data and) information, the practice of information processing, and the engineering … communicate (data and) information. It also …
• Innovation will come from parts of the world other than the U.S.
• The Chinese have skipped the Internet first generation.
• Growth will occur in Asia, and continue to decrease in Western Europe.
• U.S. Industry is compulsively outsourcing abroad.
• Software is moving from forms-based applications to business processes.
• Networks are migrating to IP and optical networking technologies.
• Data curation and storage
• Federated access
• Collaboration
• New uses in High Performance Computing
• Databases
• Web servers, services (software as service)
• Wiki
• Visualization
• All discipline neutral
Semantic Web Methodology and Technology Development Process
• Establish and improve a well-defined methodology vision for Semantic Technology based application development
• Leverage controlled vocabularies, etc.
[Figure: iterative development cycle with stages labeled: Use Case; Small Team, mixed skills; Analysis; Develop model/ontology; Use Tools; Science/Expert Review & Iteration; Adopt Technology Approach; Leverage Technology Infrastructure; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy; Evaluation]
• Water Quality Portal Example from 2010
• http://inferenceweb.org/wiki/Semantic_Water_Quality_Portal
Make data and tools quickly and easily accessible to a wide audience.
Operationally, virtual observatories need to find the right balance of data/model holdings, portals and client software that researchers can use without effort or interference, as if all the materials were available on their local computer in their preferred language, i.e. appear to be local and integrated.
Likely to provide controlled vocabularies that may be used for interoperation in appropriate domains along with database interfaces for access and storage -> thus part IT, part CI, part Informatics and all about doing new science
[Figure: virtual observatory architecture with these labeled parts:
– Added value: Education, clearinghouses, other services, disciplines, etc.
– Semantic mediation layer - mid-upper-level: semantic interoperability
– VO Portal, Web Serv., VO API (added value)
– Semantic query, hypothesis and inference
– Semantic mediation layer - VSTO - low level (Mediation Layer):
  • Ontology - capturing concepts of Parameters, Instruments, Date/Time, Data Product (and … Classes
  • Maps queries to underlying data
– Metadata, schema, …; query, access and use of data
– Databases DB 1 … DB n]
Find data which represents the state of the neutral atmosphere anywhere above 100km and toward the arctic circle (above 45N) at any time of high geomagnetic activity.
– Extract information from the use-case - encode knowledge
– Translate this into a complete query for data - inference and integration of data from instruments, indices and models
Provide semantically-enabled, smart data query services via a SOAP web service for the Virtual Ionosphere-Thermosphere-Mesosphere Observatory that retrieve data, filtered by constraints on Instrument, Date-Time, and Parameter in any order and with constraints included in any combination.
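As a minimal sketch of "constraints in any order and in any combination", the function below treats every constraint as optional and applies only those supplied. It is not the observatory's actual service interface: the `Record` type, the `query` signature, and the sample records (including the parameter name `ElectronDensity`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    instrument: str
    parameter: str
    day: date


def query(records: List[Record],
          instrument: Optional[str] = None,
          parameter: Optional[str] = None,
          start: Optional[date] = None,
          end: Optional[date] = None) -> List[Record]:
    """Return records that satisfy every supplied constraint; omitted ones are ignored."""
    hits = []
    for r in records:
        if instrument is not None and r.instrument != instrument:
            continue
        if parameter is not None and r.parameter != parameter:
            continue
        if start is not None and r.day < start:
            continue
        if end is not None and r.day > end:
            continue
        hits.append(r)
    return hits


# Hypothetical holdings, for illustration only.
RECORDS = [
    Record("FPI", "NeutralTemperature", date(2003, 10, 31)),
    Record("ISR", "ElectronDensity", date(2003, 10, 31)),
    Record("FPI", "NeutralTemperature", date(2004, 1, 5)),
]

print(query(RECORDS, parameter="NeutralTemperature", end=date(2003, 12, 31)))
```

Because the constraints are keyword arguments, a caller can give them in any order, or leave any of them out.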
Inferred plot type and return required axes data
• Unified/ abstracted query workflow: Parameters, Instruments, Date-Time
• Decreased input requirements for query: in one case reducing the number of selections from eight to three
• Generates only syntactically correct queries: which could not always be ensured in previous implementations without semantics
• Semantic query support: by using background ontologies and a reasoner, our application has the opportunity to expose only coherent queries (portal and services)
• Semantic integration: in the past users had to remember (and maintain codes) to account for numerous different ways to combine and plot the data whereas now semantic mediation provides the level of sensible data integration required, and exposed as smart web services
– understanding of coordinate systems, relationships, data synthesis, transformations, etc.
– returns independent variables and related parameters
• A broader range of potential users (PhD scientists, students, professional research associates and those from outside the fields)
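The "expose only coherent queries" idea above can be sketched as a lookup that narrows the choices offered to the user. The instrument-to-parameter table below is a hypothetical stand-in for what an ontology and reasoner would supply; the names in it are assumptions, not taken from the VSTO ontology.

```python
# Hypothetical knowledge: which parameters each instrument can measure.
MEASURES = {
    "FPI": {"NeutralTemperature", "NeutralWind"},
    "ISR": {"ElectronDensity", "IonTemperature"},
}


def coherent_parameters(instrument=None):
    """Parameters worth offering once an instrument is (optionally) chosen."""
    if instrument is None:
        return set().union(*MEASURES.values())
    return set(MEASURES.get(instrument, ()))


def is_coherent(instrument, parameter):
    """True when the instrument/parameter combination can return data."""
    return parameter in MEASURES.get(instrument, ())


print(sorted(coherent_parameters("FPI")))
print(is_coherent("FPI", "ElectronDensity"))
```

Narrowing the menu this way is one route to fewer selections per query, since incoherent combinations never reach the user.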
[Figure: spectrum running from “Less Strategic” to “More Strategic”. From “Why EPO?”, a NASA internal report on science education, 2005]
Someone should be able to query a virtual observatory without having specialist knowledge.
A teacher accesses the internet, goes to an Educational Virtual Observatory and enters a search for “Aurora”.
Teacher receives four groupings of search results:
1) Educational materials: http://www.meted.ucar.edu/topics_spacewx.php and http://www.meted.ucar.edu/hao/aurora/
2) Research, data and tools: via research VOs but the search for brightness, or green/red line emission is mediated for them
3) Did you know?: Aurora is a phenomenon of the upper terrestrial atmosphere (ionosphere) also known as Northern Lights
4) Did you mean?: Aurora Borealis or Aurora Australis, etc.
• Water Quality Portal Example from 2010
• http://inferenceweb.org/wiki/Semantic_Water_Quality_Portal
• Came from hw assignment, proposed in class
• Generated papers in
– Environmental Information Management 2011
– Intl Semantic Web Conference 2011 (main conference and possibly poster session as well)
– American Geophysical Union 2011
– Plus invited presentations for water, health, etc.
• The triple: { subject - predicate - object }
Interferometer is-a optical instrument
Optical instrument has focal length
An ontology is a representation of this knowledge
• W3C is the primary (but not sole) governing organization for languages, specifications, best practices, etc.
– RDF - Resource Description Framework
– OWL 1.0 - Web Ontology Language (OWL 2.0 on the way)
• Encode the knowledge in triples in a triple-store; software is built to traverse the semantic network, which can be queried or reasoned upon
• Put semantics between/in your interfaces, i.e. between layers and components in your architecture, i.e. between ‘users’ and ‘information’ to mediate the exchange
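To make the triple idea concrete, here is a toy in-memory "triple store" holding the two example statements from this slide, with a pattern-matching query. It is a sketch in plain Python rather than a real RDF library; the identifier spellings and the `match` helper are assumptions for illustration.

```python
# Each statement is a (subject, predicate, object) tuple.
TRIPLES = {
    ("Interferometer", "is-a", "OpticalInstrument"),
    ("OpticalInstrument", "has", "FocalLength"),
}


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


print(match(TRIPLES, p="is-a"))               # every is-a statement
print(match(TRIPLES, s="OpticalInstrument"))  # everything said about one subject
```

Query languages such as SPARQL generalize this kind of pattern matching to whole graph patterns.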
• Semantic Web
– An extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation, www.semanticweb.org
– Primer: http://www.ics.forth.gr/isl/swprimer/
• Semantic Grid
– Semantic services to use the resources of many computers connected by a network to solve large scale computational/ data problems
• Provenance
– origin or source from which something comes, intention for use, who/what generated for, manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery, documented in detail sufficient to allow reproducibility.
• Service-oriented architecture
– Provision of a capability over the internet via a ‘remote-procedure-call’ using prescribed input, output and pre-conditions
• Ontology (n.d.). The Free On-line Dictionary of Computing. http://dictionary.reference.com/browse/ontology
– An explicit formal specification of how to represent the objects, concepts and other entities that are assumed to exist in some area of interest and the relationships that hold among them.
• Closed World - where complete knowledge is known (encoded), AI relied on this
• Open World - where knowledge is incomplete/ evolving, SW promotes this
• Languages
– OWL - Web Ontology Language (W3C)
– RDF - Resource Description Framework (W3C)
– OWL-S/SWSL - Web Services (W3C)
– WSMO/WSML - Web Services (EC/W3C)
– SWRL - Semantic Web Rule Language, RIF - Rule Interchange Format
– PML - Proof Markup Language
– Editors: Protégé, SWOOP, Medius, SWeDE, …
• Reasoners
– Pellet, Racer, Medius KBS, FACT++, fuzzyDL, KAON2, MSPASS, QuOnto
• Query Languages
– SPARQL, XQUERY, SeRQL, OWL-QL, RDFQuery
• Other Tools for Semantic Web
– Search: SWOOGLE swoogle.umbc.edu
– Collaboration: www.planetont.org
– Other: Jena, SeSAME/SAIL, Mulgara, Eclipse, KOWARI
– Semantic wiki: OntoWiki, SemanticMediaWiki
• Emerging Semantic Standards for Earth Science
– SWEET, VSTO, MMI, GeoSciML
http://www.w3.org/2003/Talks/1023-iswc-tbl/slide26-0.html, http://flickr.com/photos/pshab/291147522/
• Smart search
• Annotation (even simple forms), smart tagging
• Geospatial
• Implementing logic (rules), e.g. in workflows
• Data integration
• Verification …. and the list goes on
• Web services
• Web content mining with natural language parsing
• User interface development (portals)
• Semantic desktop
• Wikis - OntoWiki, SemanticMediaWiki
• Sensor Web
• Software engineering
• Explanation
[Figure: hype cycle for semantic technologies in Earth science (ES). Produced for NASA TIWG semantic web subgroup, April 2008.
Phases: Technology Trigger; Peak of Inflated Expectations; Trough of Disillusionment; Slope of Enlightenment; Plateau of Productivity.
Technologies plotted: Semantic Web Services; Semantic Wiki; Smart search, e.g. NOESIS; Rules/Logic, SWRL; Query Lang, OWL-QL; SKOS, FOAF; OWL 1.1; Natural Language; Ontologies; Query Lang, Commercial and embedded QL; Managing modular ontologies (ES and general); Query Lang, SPARQL; Tagging / annotation; Species Validators; Ontology editor, SWOOP; Mid-level ES domain ontologies, e.g. GEON; Upper level ontologies, e.g. ABC, DOLCE, SUMO; OWL 1.0; Concept map, Cmap; Protégé; RDF; DL Reasoners, e.g. Pellet, Racer; Mid-level ES domain ontologies, e.g. SWEET; Triple stores, e.g. Jena, Sesame, Mulgara, Oracle Spatial; XML.
Legend: estimated years to mainstream adoption in Earth science (< 2 years; 2-5 years; 5-10 years; > 10 years; obsolete before plateau).]
[Figure: semantic web roadmap; each row reads left to right as a progression over time]
• Improved Information Sharing -> Increased Collaboration & Interdisciplinary Science -> Acceleration of Knowledge Production -> Revolutionizing how science is done
• Geospatial semantic services established -> Geospatial semantic services proliferate -> Scientific semantic assisted services -> Autonomous inference of science results
• Some common vocabulary based product search and access -> Semantic geospatial search & inference, access -> Semantic agent-based searches -> Semantic agent-based integration
• Local processing + data exchange -> Basic data tailoring services (data as service), verification/validation -> Interoperable geospatial services (analysis as service), results explanation service -> Metadata-driven data fusion (semantic service chaining), trust
• SWEET core 1.0 based on GCMD/CF -> SWEET core 2.0 based on best practices decided from community -> SWEET 3.0 with semantic callable interfaces via standard programming languages -> Reasoners able to utilize SWEET 4.0
• RDF, OWL, OWL-S -> Geospatial reasoning, OWL-Time -> Numerical reasoning -> Scientific reasoning
April 2008
Each row reads: Current -> Near Term (0-2 yrs) -> Mid Term (2-5 yrs) -> Long Term (5+ yrs)
• Some common vocabulary based product search and access -> Semantic geospatial search & inference, access -> Semantic agent-based searches -> Semantic agent-based integration
• Some metadata and limited provenance available -> Ontologies for data mining, visualization and analysis emerging/maturing -> Common terminology captured in ontologies, crossing domains -> Provenance/annotation with ontologies in user tools
• Verification is manual with minimal tool support -> Ontologies for information quality developed -> Domain and range properties in ontologies used in tools -> Service ontologies carry quality provenance
• Services must be hardwired and service agreements established -> Services annotated with resource descriptions -> Dynamic service discovery and mediation, and data scheduling -> Semantic markup of data latency (time lags) which adapt dynamically
• Local processing + data exchange -> Basic data tailoring services (data as service), verification/validation -> Interoperable geospatial services (analysis as service), results explanation service -> Metadata-driven data fusion (semantic service chaining), trust
• Limited metadata passed to analysis applications -> Tag properties, non-jargon vocabulary for non-specialist use -> Shared terminology for the visual properties of interface objects and graph types... -> Semantic fields to describe tag key modal functions
• Access mediated by agreed standard vocabularies, hard-wired connections -> Access mediated by common ontologies -> Mediation aided by services with domain/range properties -> Key data access services are semantically mediated
Each row reads: Near Term (0-2 yrs) -> Mid Term (2-5 yrs), followed by what the step requires
• Semantic geospatial search & inference, access -> Semantic agent-based searches
-> requires agent development and vocabulary for agent characterization
• Ontologies for data mining, visualization and analysis emerging/maturing -> Common terminology captured in ontologies, crossing domains
-> requires mature (domain and data-type) ontologies with community endorsement and governance and a robust integration framework
• Ontologies for information quality developed -> Domain and range properties in ontologies used in tools
-> requires mature quality and uncertainty ontologies with domain and range properties added and populated
• Services annotated with resource descriptions -> Dynamic service discovery and mediation, and data scheduling
-> requires semantic service (ontology) registry
• Basic data tailoring services (data as service), verification/validation -> Interoperable geospatial services (analysis as service), results explanation service
-> requires service to implement v/v, new descriptions of analyses, developing explanation
• Tag properties, non-jargon vocabulary for non-specialist use -> Shared terminology for the visual properties of interface objects and graph types...
-> requires development of portal modal function vocabulary and ontology, link to domain context and data structure
• Access mediated by common ontologies -> Mediation aided by services with domain/range properties
-> requires adding properties to classes in ontologies and populating instances with expert agreement
1. Integrating Multiple Data Sources
2. Semantic Drill Down / Focused Perusal
3. Statements about Statements
4. Inference
5. Translation
6. Smart (Focused) Search
7. Smarter Search … Configuration
8. Proof and Trust
Updated material reused from “The Substance of the Web”. McGuinness and Dean. Semantic Web Applications for National Security. May, 2005. http://www.schafertmd.com/swans/agenda.html
• The Semantic Web lets us merge statements from different sources
• The RDF Graph Model allows programs to use data uniformly regardless of the source
• Figuring out where to find such data is a motivator for Semantic Web Services
[Figure: RDF graph about #Ionosphere with properties name “Terrestrial Ionosphere”, hasCoordinates #magnetic, hasLowerBoundaryValue “100”, hasLowerBoundaryUnit “km”. Different line & text colors represent different data sources.]
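A small sketch of merging statements from two sources: because every statement has the same subject-predicate-object shape, merging is a set union and the consumer does not need to know where each statement came from. Which statement belongs to which source is an assumption here (the slide distinguishes sources only by color), and `describe` is a made-up helper.

```python
# Two hypothetical sources that both describe #Ionosphere.
source_a = {
    ("#Ionosphere", "name", "Terrestrial Ionosphere"),
    ("#Ionosphere", "hasCoordinates", "#magnetic"),
}
source_b = {
    ("#Ionosphere", "hasLowerBoundaryValue", "100"),
    ("#Ionosphere", "hasLowerBoundaryUnit", "km"),
}

# Merging two graphs of triples is just a set union.
merged = source_a | source_b


def describe(graph, subject):
    """Collect every property of a subject, regardless of the original source."""
    return {p: o for s, p, o in graph if s == subject}


print(describe(merged, "#Ionosphere"))
```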
• The Semantic Web uses Uniform Resource Identifiers (URIs) to name things
• These can typically be resolved to get more information about the resource
• This essentially creates a web of data analogous to the web of text created by the World Wide Web
• Ontologies are represented using the same structure as content
– We can resolve class and property URIs to learn about the ontology
[Figure: graph of URI-named resources on the Internet: …#NeutralTemperature, ...#FPI, ...#ISR, ...#MilllstoneHill, …#EISCAT, …#Norway; edge labels: measuredby, type, operatedby, locatedIn]
• The Semantic Web allows us to make statements about statements
– Timestamps
– Provenance / Lineage
– Authoritativeness / Probability / Uncertainty
– Security classification
– …
• This is an unsung virtue of the Semantic Web
[Figure: a statement about #Aurora with hascolor Red, annotated with hasSource #Danny’s and hasDateTime 20031031]
Ontologies Workshop, APL May 26, 2006
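One way to picture "statements about statements" is to wrap each statement in a record that carries its own metadata. The sketch below assumes the figure's reading that the statement "#Aurora hascolor Red" has a source and a timestamp attached; the class and function names are hypothetical.

```python
from typing import List, NamedTuple


class Statement(NamedTuple):
    subject: str
    predicate: str
    obj: str


class Annotated(NamedTuple):
    """A statement plus statements about it (here: source and timestamp)."""
    statement: Statement
    source: str
    date_time: str


observation = Annotated(
    Statement("#Aurora", "hascolor", "Red"),
    source="#Danny's",
    date_time="20031031",
)


def from_source(annotated: List[Annotated], source: str) -> List[Statement]:
    """Keep only the statements asserted by one source."""
    return [a.statement for a in annotated if a.source == source]


print(from_source([observation], "#Danny's"))
```

The same wrapper could carry provenance, uncertainty, or a security classification.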
• The formal foundations of the Semantic Web allow us to infer additional (implicit) statements that are not explicitly made
• Unambiguous semantics allow question answerers to infer that objects are the same, objects are related, objects have certain restrictions, …
• SWRL allows us to make additional inferences beyond those provided by the ontology
[Figure: graph linking #Millstone Hill and #Interferometer, with property labels OperatesInstrument, hasInstrument, isOperatedBy, hasTypeofData, Measures, hasOperatingMode, hasMeaasuredData, and node #VerticalMeans]
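A minimal forward-chaining sketch of inferring implicit statements. Two rules are hard-coded as assumptions for illustration: `is-a` is treated as transitive, and `isOperatedBy` is treated as the inverse of `hasInstrument`. A real system would read such axioms from the ontology and hand them to a reasoner; the class name `Instrument` is hypothetical.

```python
def infer(triples):
    """Apply the two rules repeatedly until no new triples appear."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            if p == "hasInstrument":
                # Assumed inverse property pair.
                new.add((o, "isOperatedBy", s))
            if p == "is-a":
                # Transitivity: s is-a o and o is-a o2 implies s is-a o2.
                for s2, p2, o2 in inferred:
                    if p2 == "is-a" and s2 == o:
                        new.add((s, "is-a", o2))
        if new <= inferred:
            return inferred
        inferred |= new


facts = {
    ("#MillstoneHill", "hasInstrument", "#Interferometer"),
    ("#Interferometer", "is-a", "OpticalInstrument"),
    ("OpticalInstrument", "is-a", "Instrument"),
}
closure = infer(facts)
print(sorted(closure - facts))  # only the statements that were implicit
```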
• While encouraging sharing, the Semantic Web allows multiple URIs to refer to the same thing
• There are multiple levels of mapping
– Classes
– Properties
– Instances
– Ontologies
• OWL supports equivalence and specialization; SWRL allows more complex mappings
[Figure: two descriptions of #precipitation: one with name ont1:Precipitation and ont1:EduLevel VO:Scientist, the other with name ont2:Rain and ont2:EduLevel EduVO:K-12]
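A toy illustration of term translation between vocabularies. The equivalence group below is an assumption based on the figure, which shows ont1:Precipitation and ont2:Rain naming the same thing for different audiences; in OWL a mapping like this would be declared with equivalence or specialization axioms rather than a Python list.

```python
# Assumed mapping: each set groups terms treated as naming the same thing.
EQUIVALENT = [
    {"ont1:Precipitation", "ont2:Rain"},
]


def translate(term, target_prefix):
    """Return the equivalent term in the target vocabulary, or None if unmapped."""
    for group in EQUIVALENT:
        if term in group:
            for other in sorted(group):
                if other.startswith(target_prefix):
                    return other
    return None


print(translate("ont1:Precipitation", "ont2:"))
```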
• The Semantic Web associates one or more classes with each object
• We can use ontologies to enhance search by:
– Query expansion
– Sense disambiguation
– Type with restrictions
– ….
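Query expansion, the first item in the list above, can be sketched as walking a class hierarchy and searching for the narrower terms as well. The hierarchy below is hypothetical and hand-written; in practice it would come from the ontology's subclass relations.

```python
# Hypothetical subclass hierarchy: term -> narrower terms.
SUBCLASSES = {
    "Aurora": ["AuroraBorealis", "AuroraAustralis"],
    "AuroraBorealis": [],
    "AuroraAustralis": [],
}


def expand(term):
    """Return the term followed by all of its narrower terms, depth first."""
    terms = [term]
    for child in SUBCLASSES.get(term, []):
        terms.extend(expand(child))
    return terms


print(expand("Aurora"))
```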
Uses emerging web standards to enable smart web applications
Given an upper-level domain choice
•Ecology
Illustrate or list contained concepts/hierarchy
•VegetationCover, TreeRings, etc.
Retrieve some specific options from web
•Maps, tree-ring data,
•
Info: https://portal.geongrid.org:8443/gridsphere/gridsphere
• The logical foundations of the Semantic Web allow us to construct proofs that can be used to improve transparency, understanding, and trust
• Proof and Trust are ongoing research areas for the Semantic Web: e.g., see PML and Inference Web
[Figure: graph with nodes #CriticalDataset, #FlatField and #SolarPhysicsPaper and edge labels hasCalibration and hasPeerReview]
“Critical Dataset has been calibrated with a flat field program that is published in the peer reviewed literature.”
Framework for explaining reasoning tasks by storing, exchanging, combining, annotating, filtering, segmenting, comparing, and rendering proofs and proof fragments provided by multiple distributed reasoners.
• OWL-based Proof Markup Language (PML) specification as an interlingua for proof interchange
• IWExplainer for generating and presenting interactive explanations from PML proofs providing multiple dialogues and abstraction options
• IWBrowser for displaying (distributed) PML proofs
• IWBase distributed repository of proof-related meta-data such as inference engines/rules/languages/sources
• Integrated with theorem provers, text analyzers, web services, …
http://iw.rpi.edu
Inference Web Infrastructure (McGuinness et al., 2004, http://www.ksl.stanford.edu/KSL_Abstracts/KSL-04-03.html)
Framework for explaining question answering tasks by
• abstracting, storing, exchanging,
• combining, annotating, filtering, segmenting,
• comparing, and rendering proofs and proof fragments provided by question answerers.
[Figure: Inference Web architecture centered on the Proof Markup Language (PML), with Trust, Justification and Provenance parts.
Inputs: Semantic Discovery Service, OWL-S/BPEL (DAML/SNRC); CWM, N3 (NSF TAMI); JTP, KIF (DAML/NIMD); SPARK, SPARK-L (DARPA CALO); UIMA, Text Analytics (DTO NIMD Exp Aggregation); Files/WWW.
Toolkit: IWTrust (trust computation); IW Explainer/Abstractor (end-user friendly visualization); IWBrowser (expert friendly visualization); IWSearch (search engine based publishing); IWBase (provenance registration).]
Users can explore extracted entities and relationships, create new hypotheses, ask questions, browse answers and get explanations for answers.
A question
An answer
A context for explaining the answer
An abstracted explanation
(this graphical interface was done by Battelle, supported by Stanford KSL)
• Semantics are a key ingredient for progress in informatics and eScience
• A sustained involvement of key inter-disciplinary team members is very important -> leads to incentives, rewards, etc. and a balance of research and production
• This is what we will be teaching you in this class
Semantic Web Methodology and Technology Development Process
• Establish and improve a well-defined methodology vision for Semantic Technology based application development
• Leverage controlled vocabularies, etc.
[Figure: iterative development cycle with stages labeled: Use Case; Small Team, mixed skills; Analysis; Develop model/ontology; Use Tools; Science/Expert Review & Iteration; Adopt Technology Approach; Leverage Technology Infrastructure; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy; Evaluation]
• Topics for Semantic e-Science/ Foundations:
– Semantic Methodologies
– Knowledge Representation for e-Science
– Ontology Engineering and Re-Use for e-Science
– Knowledge Integration for e-Science
– Semantic Data Integration
– Semantic Web Languages, Tools and Services
– Semantic Infrastructure and Architecture for e-Science
– Semantic Grid Middleware
– Ontology Evolution for e-Science
– Knowledge Management for e-Science
– e-Science Workflow Management
– Data life-cycle for e-Science
– Data Mining and Knowledge Discovery
• Semantic Web for Health Care and Life Science
• Semantic Web for Bio-Med-informatics
• Semantic Web for System and Integrated Biology
• Semantic Web for Sun, Earth, Environment and Climate
• Semantic Web for Chemistry, Physics and Astronomy
• Semantic Web for Engineering
• Semantic Web and Digital Libraries and Scientific Publications
• Configuration and Deployment of Semantic Virtual Observatories
– Oceanography, astronomy, geology
• Ontology Merging and Validation Test-bed
• Semantic Language and Tool Use and Evaluation
• Semantic eScience Implementation Evaluation
• Semantic Collaboration Case Studies
• Semantic Application Development and Demonstration
• Reading assignments
• Assignments
– Individual
– Group
• Written assessments
• Presentation assessments
• Group assessments
• Attend class, complete assignments
• Participate
• Ask questions – be honest with yourself and others about what you do and do not know
• Work both individually and in a group
• Work constructively in group and class sessions
• Class - Monday 1-3:50pm
• Office hours – by appointment, along with a regular time to be determined and tetherless night
• This week's assignment:
– Reading - Ontologies 101*, Semantic Web, e-Science, RDFS
– Turn in a one page description of one of your favorite papers from the reading list AND WHY
• Next class (week 2 – September 12; note Labor Day):
– Foundations I: Methodologies, Knowledge Representation
• If you have a background that you think needs some extra background reading, talk to us.
• Questions?
• Information online, not “in line”
• Information on-demand, free of place or time
• Blended classroom and online experience
• Flexible schedule for working students
• Relevant and timely content
• More team collaboration
• More content from multiple sources
• Interactive content from voice, video and data
• Ability to contribute, as well as consume, content/knowledge
• Leads to virtual access…
[Figure: layers running from IT Cyber Infrastructure, through Cyber Informatics, Core Informatics, and Science Informatics (aka Xinformatics), to Science, Societal Benefit Areas]
• The data and information challenges are (almost) being identified as increasingly common
• Data and information science is becoming the ‘fourth’ column (along with theory, experiment and computation)
• Informatics is playing a key role in filling the gap between science (and the spectrum of non-expert) use and generation and the underlying cyberinfrastructure – evident due to the emergence of Xinformatics (world-wide)
• Informatics is a profession and a community activity and requires efforts in all 3 sub-areas (science, core, cyber) and must be synergistic
Scientists should be able to access a global, distributed knowledge base of scientific data that:
• appears to be integrated
• appears to be locally available
But… data is obtained by multiple means, using various protocols, in differing vocabularies, using (sometimes unstated) assumptions, with inconsistent (or non-existent) meta-data. It may be inconsistent, incomplete, evolving, and distributed.
And… there often exist significant levels of semantic heterogeneity, large-scale data, complex data types, legacy systems, inflexible and unsustainable implementation technology…