Semantics and Trust in Scientific Data Sharing

Joshua Lieberman
Traverse Technologies
[Open Geospatial Consortium, W3C, etc.]
Cardiff Workshop on
Geospatial Knowledge Infrastructures
November 2007
jlieberman/at/traversetechnologies.com
©2007 Joshua Lieberman
Tuesday, November 27, 2007
Abstract
Traditional establishment of this sort of trust has typically involved small epistemic communities and face-to-face
agreements. Branded trust and mechanical trust which have moved to the fore in commercial domains have played a
lesser role in scientific communities where context and content trust must also be considered. Despite all of the
distributed collaboration technology available to us, the face-to-face experience is still essential to semantic trust.
Experience with numerous OGC testbeds and pilot activities shows that without at least an initial in-person collaboration,
projects to develop interoperability-enhancing standards and practices are very difficult to carry off. There is some aspect
of both personal interaction and richness of cues which enables a group to develop trust not only in a shared
conceptualization, but also in shared goals, theories, models, and other components of meaning. Geospatial data is not
fundamentally different in this regard, but is especially rich in the higher viewpoint aspects which complicate sharing,
as well as other aspects such as privacy.
As the scale and virtualization of scientific collaborations grow, understanding of and support for similar trust
interactions in distributed collaborations becomes ever more important. Work done to simulate this with Web-based
social networking and Web 2.0 tagging / rating systems has mainly increased trust in movie reviews. Much of the work
has not, however, addressed the particular concerns of providers and users of scientific data, the difference between
viewpoint and opinion, and subtle differences which can arise between apparently shared scientific conceptualizations.
The rich communication cues which characterize F2F interactions are difficult to understand, let alone reproduce in
software, and require further (social science) research. More accessible F2F trust mechanisms, such as rapidly iterative,
lightweight interaction, can be approximated online through various more-or-less deficient approaches, including video
conferencing, audio conferencing, instant messaging, wikis, and of course email. Distributed tagging as exemplified
by RSS and Atom is another approach to capturing near-realtime, but potentially more formal, distributed and
asynchronous iteration of viewpoint. A pilot sponsored by GeoConnections has used a typical F2F approach to
establishing trust, but has looked at maintaining this trust through multiple publish-subscribe steps which exchange
GeoAtom entries referencing one or more published features and each other. Whatever mechanisms are eventually
developed, it is increasingly clear that some of the most challenging obstacles to infrastructure-scale geospatial data
sharing involve not just machine handshaking but some form of human handshaking as well.
Who Am I?
Principal, co-founder of small Boston consultancy with a
speciality in business strategy, application design, Web
development where geospatial interoperability “matters”. Clients
range from the US Census Bureau to satellite imaging
companies
Part-time OGC IP Team architect for testbeds & pilots (OWS-3,
OWS-4, Geospatial Semantic Web Interoperability Experiment,
CGDI WFS Interoperability Pilot, GEOSS Architecture
Implementation Pilot), member of OGC Architecture Board, co-chair of Geosemantics WG
Onetime chair of W3C Geospatial Incubator
Former environmental hydrogeochemist, solid earth petrologist
Case for Standards: OGC and OWS
• “The Open Geospatial Consortium, Inc. (OGC) is a non-profit, international, voluntary
consensus standards organization that is leading the development of standards for geospatial
and location based services”
• “OGC Web Services” (OWS) - OGC has for some time been developing specifications for a suite
of Web services (sensu lato) and associated encodings to expose geospatial content and
operations from distributed content repositories to remote clients across diverse platforms:
– GML - Geography Markup Language (an information model and XML schema) for encoding
features (geometric representations of geography)
– Web Feature Service - service providing access to collections of features
– Web Map Service - service providing access to map layers (cartographically rendered features and
images)
– Catalog Service / Web - service supporting (spatial) discovery of geospatial datasets and services
– Several other associated specifications, e.g. coordinate reference system encoding
– Many corresponding or related ISO standards, especially 191nn (TC211)
• Semantics of OGC standards are informally and syntactically expressed, and are difficult to access widely
Overview
Motivations of technical / scientific “diplomacy”
Definitions, terms, concepts
Why share? Why not share?
How has sharing been done? “Who are you”
How is sharing being done? “Where do you come from”
How could sharing be done? “Let’s share and see what we come up
with.”
Experiments in roundtrip interaction. GeoRSS and Feedback
feeds
Where to next? Ideas for further work.
Concepts / Definitions
Semantics - “meaning” of information / knowledge
Trust - Expected value / expected behavior
Scientific data - testability, reproducibility, necessity of
sharing
Semantic trust - shared understanding of information /
knowledge
Scientific trust - shared values, elements of a common
viewpoint, sharing “contract”
Sharing trust - Matching data to analysis in creative repurposing of data.
More Terms
Trust in social networks, trust from social networks
Handshake, certificate, brand, mechanical
User trust: relevant data, accessible data, quality data, continuity,
responsiveness, “what were they thinking?”
Provider trust: attribution, appropriate use, confidentiality, (spatial)
privacy, liability, competitive advantage, collaboration, “what are they
thinking?”
Epistemic communities, communities of practice, communities of
interest, universe(s) of discourse
“Expected use” versus “unanticipated use”
Web of trust + Trust of resource + Web of resources = trusted
relationships
Attribution and lineage: can data be “signed”, metadata “sticky”?
Syntactic-mechanical Interoperability Stack
Human-centric
Meaning: owl, rdf, xtm
Vocabulary: uml, xml schema, gml
Encoding: ascii, utf-8, xml
Control: tcp, http, wap
Routing: ip, dns
Transport: ethernet, wifi, gprs
Medium: e-m, light
Machine-centric
General Feature Model
(Geo)semantic interoperability stack
Human-centric
Intention: description, navigation
Perception: visual - aural - tactile
Theory: persistence, consequence
Discernment: feature, context
Application: discovery, analysis
Representation: geometry, raster
Ontology: upper, domain, foundation
Machine-centric
Networks
[Diagram of linked network types.
Nodes: Data grids; Service (oriented) networks; Knowledge graphs (+spatial: Feature networks); Trust networks; Referral networks; Social networks.
Edge labels: feeds; provides; shared by; such as; type of; component of.]
The “Many Worlds Web”
Privacy
Privacy of (location) data (e.g. Census data)
Privacy of credentials (e.g. SSN)
Privacy of relationships (e.g. employment status)
Privacy of intent (cf. research poaching)
Information content of partial privacy
Trust mechanisms
Reputation-based trust mechanisms (highly rated, this is a
good brand)
Context-based trust mechanisms (everything from MIT
CSAIL is true, everything from MIT about Stanford is
exaggerated)
Content-based trust mechanisms (statistical validity,
spatiotemporal relevance, consistency with other studies)
...The Semantic Web Trust Layer
Jeremy Carroll, Hewlett-Packard Labs, UK
Chris Bizer, Freie Universität Berlin, Germany
Joint work with
Pat Hayes, IHMC, USA
Patrick Stickler, Nokia, Finland
What to Trust
Data resources
Algorithm / transformation resources
Resource relationships, e.g. interpretations, analyses
Reputation trust is “lumpy” - sparse experts are most trusted,
but this stifles unanticipated innovation, particularly innovation
involving a paradigm (e.g. model) shift.
Some referral trust is statistical, but numbers of referrals
may not indicate relevance to a given application
Trusted services: data + algorithm + operator
Trust and Lineage
Trusted sources +
Trusted algorithm +
Trusted operator =
Trusted / transparent lineage
Needs metadata (or is it reification)?
Separate from, but “stuck to” data
Can data publication be both easy as in automatic / low
effort and “peer-reviewed” / responsive quality?
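One possible reading of "separate from, but stuck to" is a lineage record that lives outside the data yet carries a digest of the exact bytes it describes. The sketch below assumes that reading. The LineageRecord class, its field names, and the sample values are hypothetical, and a bare hash only detects change; actually "signing" data would need a digital signature on top.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageRecord:
    """Hypothetical lineage metadata: source + algorithm + operator + data digest."""
    source: str
    algorithm: str
    operator: str
    data_sha256: str

    def matches(self, data: bytes) -> bool:
        # The record "sticks" to the data only while the digest still matches.
        return hashlib.sha256(data).hexdigest() == self.data_sha256


def describe(data: bytes, source: str, algorithm: str, operator: str) -> LineageRecord:
    """Build a lineage record bound to these exact bytes."""
    return LineageRecord(source, algorithm, operator,
                         hashlib.sha256(data).hexdigest())


# Illustrative values only.
data = b"station,nitrate_mg_l\nA1,2.4\n"
record = describe(data, source="example field survey",
                  algorithm="unit conversion v1", operator="example lab")

print(record.matches(data))         # True: metadata still describes this data
print(record.matches(data + b"x"))  # False: data changed, lineage no longer applies
```

Publication stays low effort because the record is generated automatically, while a reviewer can still check that the lineage refers to the data actually in hand.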
Trust, Lies, and Metadata
Meta-data is (not necessarily) objective data about data.
Meta-data for a resource is (not necessarily) produced only
once
Meta-data must (not necessarily) have a logically defined
semantics.
Meta-data can (not always) be described by meta-data
documents.
Meta-data is (not necessarily) the digital version of library
indexing systems.
Meta-data is (not necessarily) machine-readable data about
data.
...Semantic Web Metadata for e-Learning - Some Architectural Guidelines
Mikael Nilsson, Matthias Palmér, Ambjörn Naeve
(Geo)RSS geospatial “views about data”
_feature
Diagram labels: GeoRSS Property Tags, External Information
_featureproperty: where, featuretypetag, relationshiptag, featurename, elev, floor, radius
_geometry: point, line, box, polygon
_content: atom:entry, rss:item, xhtml:span, ...
http://www.georss.org and http://www.w3.org/2005/Incubator/geo/XGR-geo/
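As a rough illustration of how these tags attach a geospatial "view" to a feed entry, the sketch below builds an Atom entry carrying GeoRSS Simple elements with Python's standard library. The function name georss_entry, the entry title and identifier, the coordinates, and the tag values are all made up for illustration; only the element names come from the slide and the GeoRSS Simple encoding. A complete Atom entry would also need elements such as updated, which are omitted here.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
GEORSS = "http://www.georss.org/georss"
ET.register_namespace("atom", ATOM)
ET.register_namespace("georss", GEORSS)


def georss_entry(title, entry_id, lat, lon, **property_tags):
    """Hypothetical helper: an Atom entry with a GeoRSS point and property tags."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM}}}id").text = entry_id
    # GeoRSS Simple writes a point as "latitude longitude".
    ET.SubElement(entry, f"{{{GEORSS}}}point").text = f"{lat} {lon}"
    # Optional property tags, e.g. featuretypetag, relationshiptag, featurename, elev.
    for tag, value in property_tags.items():
        ET.SubElement(entry, f"{{{GEORSS}}}{tag}").text = str(value)
    return entry


# Illustrative values only.
entry = georss_entry(
    "Example feedback entry", "urn:example:entry-1", 45.42, -75.7,
    featuretypetag="city", relationshiptag="is-centered-at",
    featurename="Ottawa", elev=70,
)
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```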
User Feedback & Provincial Response Loop
[Figure: feedback and response loop drawn over a map of Canada.
Actors and servers: Data User; Provincial or Local Data Custodian; Provincial Server; Geobase Server.
Step labels: Discover data issue; Create data feedback; Publish feedback; Subscribe to feedback; Evaluate feedback; Accept or Reject; Update data; Publish data update; Aggregate updates; View feedback and its update response; Close the Loop with User.
Base map: atlas.gc.ca, © 2007 Her Majesty the Queen in Right of Canada, Natural Resources Canada.]
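The loop in this figure can be approximated in a few lines of code. The sketch below is an in-memory stand-in, not the pilot's implementation: it assumes the custodian subscribes to the user's feedback feed, the user subscribes to the custodian's update feed, and every update entry references both the affected features and the feedback entry it answers. The Entry and Feed classes, the identifiers, and the accept-everything review rule are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Entry:
    """Hypothetical feed entry that references features and, optionally, another entry."""
    entry_id: str
    summary: str
    feature_ids: List[str]
    in_reply_to: Optional[str] = None


@dataclass
class Feed:
    """Minimal publish-subscribe feed: subscribers are called on every publish."""
    entries: List[Entry] = field(default_factory=list)
    subscribers: List[Callable[[Entry], None]] = field(default_factory=list)

    def subscribe(self, callback: Callable[[Entry], None]) -> None:
        self.subscribers.append(callback)

    def publish(self, entry: Entry) -> None:
        self.entries.append(entry)
        for notify in self.subscribers:
            notify(entry)


feedback_feed = Feed()  # written by data users
update_feed = Feed()    # written by data custodians


def custodian_review(feedback: Entry) -> None:
    # Evaluate feedback, accept or reject, update data, publish the update.
    # This sketch accepts everything.
    update_feed.publish(Entry(
        entry_id=f"update-for-{feedback.entry_id}",
        summary="accepted; feature corrected",
        feature_ids=feedback.feature_ids,
        in_reply_to=feedback.entry_id,
    ))


received: List[Entry] = []                   # what the data user sees come back
feedback_feed.subscribe(custodian_review)    # custodian subscribes to feedback
update_feed.subscribe(received.append)       # user subscribes to update responses

feedback_feed.publish(Entry("fb-1", "road segment misplaced", ["road-42"]))
print(received[0].in_reply_to)  # fb-1: the loop is closed with the user
```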
GEOSS Clearinghouse Architecture
Summary I
Good data are immortal but not omnipotent. Data sharing needs sharing of
quality information from provider to user (p->u) and of application information
from user to provider (u->p), among other elements; in other words, two-way
trust between user and provider.
Trust is not a scalar index, but an expectation of behavior. In a chain of
trust, each expectation needs to be for the same behavior. In a Web of trust,
not all relationships are of the same type, i.e. they do not all pertain to the
same behaviors and knowledge, so traversal is ambiguous.
Semantic technology helps machines communicate, but the ultimate goal is
helping people to communicate, so whether a given electron helps two people
solve the same problem matters in the larger scheme of interoperability.
Semantics are likely only truly shared within an epistemic community, i.e.
a group of people solving the same problems within the same theoretic
context.
Epistemic communities are formed or bridged in face-to-face meetings and
shared experiences.
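The point about chains and webs of trust can be made concrete with a small sketch. It assumes that each trust relationship is labeled with the behaviors it covers, and that a chain is usable only when every link covers the same behavior. The party names, the behaviors, and the chain_supports function are invented for illustration.

```python
from typing import Dict, List, Set, Tuple

# (truster, trustee) -> behaviors the truster expects of the trustee
Web = Dict[Tuple[str, str], Set[str]]


def chain_supports(web: Web, chain: List[str], behavior: str) -> bool:
    """True only if every consecutive link in the chain covers this behavior."""
    links = zip(chain, chain[1:])
    return all(behavior in web.get(link, set()) for link in links)


# Illustrative web of trust: the links do not all pertain to the same behavior.
web: Web = {
    ("user", "portal"): {"data quality", "attribution"},
    ("portal", "custodian"): {"data quality"},
    ("custodian", "surveyor"): {"attribution"},
}

# Every link covers "data quality", so the chain holds for that behavior.
print(chain_supports(web, ["user", "portal", "custodian"], "data quality"))  # True

# The last link covers only "attribution", so the longer chain breaks.
print(chain_supports(web, ["user", "portal", "custodian", "surveyor"],
                     "data quality"))  # False
```

A single scalar score per link would hide exactly this distinction, which is why traversal of an untyped web of trust is ambiguous.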
Summary II
F2F meetings are not really a scalable resource, esp. in the context of
infrastructure (although I am happy to be here).
Social networks on the Web may be forming epistemic communities for
some problems (rating movies), but not yet for more structured ones
(sharing scientific data across domains).
Need to understand how F2F meetings work in building epistemic
community, and semantic trust, to translate the process into non-F2F
mechanisms (e.g. wikis, IM).
Need to understand what higher-level elements of semantic trust are
manifested in ontological commitment in order to predict whether a
Web-based trust mechanism will work.
Spatial-temporal semantics are not unique, but are particularly
dependent on geocentric frames of reference which are difficult to step
back from.