Semantics and Trust in Scientific Data Sharing
Joshua Lieberman
Traverse Technologies [Open Geospatial Consortium, W3C, etc.]
Cardiff Workshop on Geospatial Knowledge Infrastructures, November 2007
jlieberman/at/traversetechnologies.com
©2007 Joshua Lieberman Tuesday, November 27, 2007

Abstract
Traditional establishment of this sort of trust has typically involved small epistemic communities and face-to-face agreements. Branded trust and mechanical trust, which have moved to the fore in commercial domains, have played a lesser role in scientific communities, where context and content trust must also be considered. Despite all of the distributed collaboration technology available to us, the face-to-face experience is still essential to semantic trust. Experience with numerous OGC testbeds and pilot activities shows that without at least an initial in-person collaboration, projects to develop interoperability-enhancing standards and practices are very difficult to carry off. Some combination of personal interaction and richness of cues enables a group to develop trust not only in a shared conceptualization, but also in shared goals, theories, models, and other components of meaning. Geospatial data is not fundamentally different in this regard, but it is especially rich in the higher-level viewpoint aspects that complicate sharing, as well as in other aspects such as privacy. As scientific collaborations grow in scale and become more virtual, understanding and supporting similar trust interactions in distributed collaborations becomes ever more important. Work to simulate this with Web-based social networking and Web 2.0 tagging / rating systems has mainly increased trust in movie reviews. Much of this work has not, however, addressed the particular concerns of providers and users of scientific data, the difference between viewpoint and opinion, or the subtle differences that can arise between apparently shared scientific conceptualizations.
The rich communication cues that characterize F2F interactions are difficult to understand, let alone reproduce in software, and require further (social science) research. More accessible F2F trust mechanisms, such as rapidly iterative, lightweight interaction, can be approximated online through various more-or-less deficient approaches, including video conferencing, audio conferencing, instant messaging, wikis, and of course email. Distributed tagging as exemplified by RSS and Atom is another approach, capturing near-realtime but potentially more formal, distributed, and asynchronous iteration of viewpoint. A pilot sponsored by GeoConnections has used a typical F2F approach to establishing trust, but has looked at maintaining this trust through multiple publish-subscribe steps that exchange GeoAtom entries referencing one or more published features and each other. Whatever mechanisms are eventually developed, it is increasingly clear that some of the most challenging obstacles to infrastructure-scale geospatial data sharing involve not just machine handshaking but some form of human handshaking as well.

Who Am I?
Principal and co-founder of a small Boston consultancy specializing in business strategy, application design, and Web development where geospatial interoperability “matters”. Clients range from the US Census Bureau to satellite imaging companies.
Part-time OGC IP Team architect for testbeds and pilots (OWS-3, OWS-4, Geospatial Semantic Web Interoperability Experiment, CGDI WFS Interoperability Pilot, GEOSS Architecture Implementation Pilot), member of the OGC Architecture Board, co-chair of the Geosemantics WG
One-time chair of the W3C Geospatial Incubator
Former environmental hydrogeochemist and solid earth petrologist

Case for Standards: OGC and OWS
• “The Open Geospatial Consortium, Inc. (OGC) is a non-profit, international, voluntary consensus standards organization that is leading the development of standards for geospatial and location based services”
• “OGC Web Services” (OWS): OGC has for some time been developing specifications for a suite of Web services (sensu lato) and associated encodings that expose geospatial content and operations from distributed content repositories to remote clients across diverse platforms:
– GML: Geography Markup Language (an information model and XML schema) for encoding features (geometric representations of geography)
– Web Feature Service: service providing access to collections of features
– Web Map Service: service providing access to map layers (cartographically rendered features and images)
– Catalog Service for the Web: service supporting (spatial) discovery of geospatial datasets and services
– Several other associated specifications, e.g. coordinate reference system encoding
– Many corresponding or related ISO standards, especially the 191nn series (ISO/TC 211)
• The semantics of OGC standards are expressed informally and syntactically, and are difficult to access widely

Overview
Motivations of technical / scientific “diplomacy”
Definitions, terms, concepts
Why share? Why not share?
How has sharing been done? “Who are you”
How is sharing being done? “Where do you come from”
How could sharing be done? “Let’s share and see what we come up with.”
Experiments in roundtrip interaction: GeoRSS and Feedback feeds
Where to next? Ideas for further work.
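The roundtrip experiments listed in the overview, like the GeoConnections pilot mentioned in the abstract, exchange Atom entries that carry GeoRSS geometry and point both at published features and at each other. The pilot's actual GeoAtom schema is not given here, so the Python sketch below is only an illustration under stated assumptions: it uses the standard Atom and GeoRSS-Simple namespaces, an Atom `related` link to reference the feature, and the Atom Threading Extension's `in-reply-to` element (RFC 4685) to reference an earlier entry. The `feedback_entry` helper, entry IDs, and feature URL are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard namespaces: Atom (RFC 4287), GeoRSS, Atom Threading Extension (RFC 4685).
ATOM = "http://www.w3.org/2005/Atom"
GEORSS = "http://www.georss.org/georss"
THR = "http://purl.org/syndication/thread/1.0"

ET.register_namespace("atom", ATOM)
ET.register_namespace("georss", GEORSS)
ET.register_namespace("thr", THR)


def feedback_entry(entry_id, title, updated, lat, lon, feature_url, reply_to=None):
    """Build a hypothetical feedback entry (not the pilot's actual schema).

    The entry locates the reported issue with a GeoRSS-Simple point,
    links to the published feature it comments on, and optionally marks
    itself as a reply to an earlier entry. A complete Atom entry would
    also need an author element (RFC 4287); it is omitted for brevity.
    """
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = updated
    # GeoRSS-Simple point: latitude then longitude, separated by a space.
    ET.SubElement(entry, f"{{{GEORSS}}}point").text = f"{lat} {lon}"
    # Reference to the published feature this entry is about.
    ET.SubElement(entry, f"{{{ATOM}}}link", rel="related", href=feature_url)
    if reply_to is not None:
        # Reference to another entry, e.g. a custodian answering a user report.
        ET.SubElement(entry, f"{{{THR}}}in-reply-to", ref=reply_to)
    return entry


if __name__ == "__main__":
    feature = "http://example.org/features/road.123"  # hypothetical feature URL
    report = feedback_entry("urn:example:feedback:1", "Road segment missing",
                            "2007-11-27T12:00:00Z", 45.42, -75.69, feature)
    response = feedback_entry("urn:example:feedback:2", "Accepted: segment added",
                              "2007-11-28T09:00:00Z", 45.42, -75.69, feature,
                              reply_to="urn:example:feedback:1")
    for e in (report, response):
        print(ET.tostring(e, encoding="unicode"))
```

Chaining entries through `in-reply-to` is one way a user's report and a custodian's response could stay linked across publish-subscribe steps; the pilot itself may have used different link relations.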
Concepts / Definitions
Semantics - “meaning” of information / knowledge
Trust - expected value / expected behavior
Scientific data - testability, reproducibility, necessity of sharing
Semantic trust - shared understanding of information / knowledge
Scientific trust - shared values, elements of a common viewpoint, a sharing “contract”
Sharing trust - matching data to analysis in creative repurposing of data

More Terms
Trust in social networks, trust from social networks
Handshake, certificate, brand, mechanical
User trust: relevant data, accessible data, quality data, continuity, responsiveness, “what were they thinking?”
Provider trust: attribution, appropriate use, confidentiality, (spatial) privacy, liability, competitive advantage, collaboration, “what are they thinking?”
Epistemic communities, communities of practice, communities of interest, universe(s) of discourse
“Expected use” versus “unanticipated use”
Web of trust + trust of resource + web of resources = trusted relationships
Attribution and lineage: can data be “signed”, metadata “sticky”?
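One way to read the question of whether data can be “signed” and metadata made “sticky” is to bind a metadata record to a digest of the exact data it describes, and then authenticate the record. The Python sketch below assumes a shared secret key and uses HMAC-SHA256 as a stand-in for a real public-key signature (such as XML Signature); the field names `data_sha256` and `signature`, the key, and the sample data are all hypothetical.

```python
import hashlib
import hmac
import json


def make_sticky_metadata(data: bytes, metadata: dict, key: bytes) -> dict:
    """Bind a metadata record to the exact bytes it describes.

    The data digest is stored inside the record, and the whole record is
    then authenticated with an HMAC (a shared-key stand-in for a real
    digital signature). Field names are hypothetical.
    """
    record = dict(metadata)
    record["data_sha256"] = hashlib.sha256(data).hexdigest()
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["signature"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return record


def verify_sticky_metadata(data: bytes, record: dict, key: bytes) -> bool:
    """Return True only if the record is unaltered and still matches the data."""
    unsigned = {k: v for k, v in record.items() if k != "signature"}
    payload = json.dumps(unsigned, sort_keys=True).encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, record.get("signature", "")):
        return False  # metadata edited after signing, or wrong key
    # Metadata is authentic; now check it still describes these exact bytes.
    return unsigned.get("data_sha256") == hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    key = b"hypothetical-shared-key"
    data = b"feature_id,name\n42,Example Creek\n"  # hypothetical sample data
    meta = {"provider": "Example Custodian", "lineage": "hypothetical survey"}
    rec = make_sticky_metadata(data, meta, key)
    print(verify_sticky_metadata(data, rec, key))
    print(verify_sticky_metadata(data + b"x", rec, key))
```

Because the digest travels inside the authenticated record, editing either the data or the metadata breaks verification, which is one narrow, mechanical sense of “sticky”; it says nothing about whether the lineage recorded is itself trustworthy.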
Syntactic-mechanical Interoperability Stack
Human-centric
Meaning / Vocabulary: owl, rdf, xtm, uml, xml schema, gml
Encoding: ascii, utf-8, xml
Control: tcp, http, wap
Routing: ip, dns
Transport: ethernet, wifi, gprs
Medium: e-m, light
Machine-centric

General Feature Model
[Figure: General Feature Model]

(Geo)semantic Interoperability Stack
Human-centric
Intention: description, navigation
Perception: visual, aural, tactile
Theory: persistence, consequence
Discernment: feature, context
Application: discovery, analysis
Representation: geometry, raster
Ontology: upper, domain, foundation
Machine-centric

Networks
[Figure: The “Many Worlds Web”. Network types shown: data grids, service (oriented) networks, knowledge graphs, feature networks (+spatial), trust networks, referral networks, and social networks, linked by relations labeled “feeds”, “provides”, “shared by”, “such as”, “type of”, and “component of”.]

Privacy
Privacy of (location) data (e.g. Census data)
Privacy of credentials (e.g. SSN)
Privacy of relationships (e.g. employment status)
Privacy of intent (cf. research poaching)
Information content of partial privacy

Trust mechanisms
Reputation-based trust mechanisms (highly rated, this is a good brand)
Context-based trust mechanisms (everything from MIT CSAIL is true, everything from MIT about Stanford is exaggerated)
Content-based trust mechanisms (statistical validity, spatiotemporal relevance, consistency with other studies)
...The Semantic Web Trust Layer, Jeremy Carroll (Hewlett-Packard Labs, UK) and Chris Bizer (Freie Universität Berlin, Germany); joint work with Pat Hayes (IHMC, USA) and Patrick Stickler (Nokia, Finland)

What to Trust
Data resources
Algorithm / transformation resources
Resource relationships, e.g.
interpretations, analyses
Reputation trust is “lumpy”: sparse experts are most trusted, but this stifles unanticipated innovation, particularly innovation involving a paradigm (e.g. model) shift.
Some referral trust is statistical, but numbers of referrals may not indicate relevance to a given application.
Trusted services: data + algorithm + operator

Trust and Lineage
Trusted sources + trusted algorithm + trusted operator = trusted / transparent lineage
Needs metadata (or is it reification?)
Separate from, but “stuck to”, the data
Can data publication be both easy (automatic / low effort) and of “peer-reviewed” / responsive quality?

Trust, Lies, and Metadata
Meta-data is (not necessarily) objective data about data.
Meta-data for a resource is (not necessarily) produced only once.
Meta-data must (not necessarily) have a logically defined semantics.
Meta-data can (not always) be described by meta-data documents.
Meta-data is (not necessarily) the digital version of library indexing systems.
Meta-data is (not necessarily) machine-readable data about data.
...Semantic Web Metadata for e-Learning - Some Architectural Guidelines, Mikael Nilsson, Matthias Palmér, Ambjörn Naeve

(Geo)RSS: geospatial “views about data”
[Figure: GeoRSS model. Abstract elements _feature, _featureproperty, _geometry, and _content; GeoRSS property tags where, featuretypetag, relationshiptag, featurename, elev, floor, radius; geometries point, line, box, polygon; external information carried in atom:entry, rss:item, xhtml:span, ...]
http://www.georss.org and http://www.w3.org/2005/Incubator/geo/XGR-geo/

User Feedback & Provincial Response Loop
[Figure: feedback workflow drawn over a base map of Canada (© 2007 Her Majesty the Queen in Right of Canada, Natural Resources Canada; atlas.gc.ca). Participants: Data User, Provincial Server, Geobase Server, Provincial or Local Data Custodian. Steps shown: discover data issue; create data feedback; publish feedback; subscribe to feedback; evaluate feedback; accept or reject; update data; publish data update; aggregate updates; close the loop with user; view feedback and its update response.]

GEOSS Clearinghouse Architecture
[Figure: GEOSS Clearinghouse Architecture]

Summary I
Good data are immortal but not omnipotent.
Data sharing needs, among other elements, sharing of quality from provider to user (p->u) and sharing of application from user to provider (u->p); in other words, two-way trust between user and provider.
Trust is not a scalar index, but an expectation of behavior. In a chain of trust, each expectation needs to be for the same behavior.
In a Web of trust, not all relationships are of the same type, i.e. they do not all pertain to the same behaviors and knowledge, so traversal is ambiguous.
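The point that a chain of trust must concern the same behavior can be made concrete with a graph whose edges are labeled by the behavior they vouch for. In this hypothetical Python sketch, a breadth-first search follows only edges carrying the requested label, so a path that mixes behavior types (for example trust in someone's movie reviews followed by trust in their data quality) does not count as a chain. The behavior labels, participant names, and the `trust_chain` function are illustrative assumptions, not part of any trust-layer specification.

```python
from collections import deque


def trust_chain(edges, source, target, behavior):
    """Find a chain source -> ... -> target using only edges for `behavior`.

    `edges` is a list of hypothetical (truster, trustee, behavior) tuples.
    Returns the shortest such chain as a list of names, or None.
    """
    # Keep only the edges that vouch for the requested behavior.
    adjacency = {}
    for truster, trustee, kind in edges:
        if kind == behavior:
            adjacency.setdefault(truster, []).append(trustee)

    # Breadth-first search over the filtered graph.
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in adjacency.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    edges = [
        ("alice", "dave", "movie_reviews"),
        ("dave", "carol", "data_quality"),
        ("carol", "erin", "data_quality"),
    ]
    # An untyped walk alice -> dave -> carol -> erin exists, but it mixes behaviors.
    print(trust_chain(edges, "alice", "erin", "data_quality"))  # None
    print(trust_chain(edges, "dave", "erin", "data_quality"))   # ['dave', 'carol', 'erin']
```

Filtering by label before searching is the simplest way to remove the ambiguity; an untyped traversal of the same edges would report alice as trusting erin's data, which is exactly the kind of spurious chain the summary warns against.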
Semantic technology helps machines communicate, but the ultimate goal is helping people to communicate, so whether a given electron helps two people solve the same problem matters in the larger scheme of interoperability.
Semantics are likely only truly shared within an epistemic community, i.e. a group of people solving the same problems within the same theoretical context.
Epistemic communities are formed or bridged through face-to-face meetings and shared experiences.

Summary II
F2F meetings are not really a scalable resource, especially in the context of infrastructure (although I am happy to be here).
Social networks on the Web may be forming epistemic communities for some problems (rating movies), but not yet for more structured ones (sharing scientific data across domains).
Need to understand how F2F meetings work in building epistemic community and semantic trust, in order to translate the process into non-F2F mechanisms (e.g. wikis, IM).
Need to understand what higher-level elements of semantic trust are manifested in ontological commitment in order to predict whether a Web-based trust mechanism will work.
Spatiotemporal semantics are not unique, but they are particularly dependent on geocentric frames of reference that are difficult to step back from.