Knowledge out there on the Web
Serge Abiteboul
2014 Abiteboul - EDBT keynote, Athenes

Personal knowledge out there on the Web
• Knowledge out there on the Web: video of the talk at the Royal Society

Organization
1. The context
2. The personal information management system
   1. The concept of Pims
   2. Pims are coming
   3. Advantages
3. From information to knowledge
4. The Webdamlog language
   1. The language in brief
   2. Probabilities
   3. Access control
5. Conclusion: some research issues

1. The context

Data explosion
• data: pictures, music, movies, reports, email, tweets, contacts, schedules…
• social interactions: opinions, annotations, recommendations…
• metadata: on photos, documents, music…
• ontologies: Alice's ontology and mapping with other ontologies
• web locations: friends' accounts on FB, Twitter, lists of blogs…
• security: credentials on various systems
• data in various organizations
  – jobs, schools, insurance, banks, taxes, medical, retirement…
• data in various vendors
  – Amazon, retailers, Netflix, Apple Store…
• data that software or hardware sensors capture, with or without our knowledge
  – web navigation, phone use, geolocation, "quantified self" measurements, contactless card readings, surveillance camera pictures…

Data dispersion
• Laptop, desktop, smartphone, tablet, car computer
• Residential boxes (tvbox), NAS, electronic vaults…
• Mail, address book, agenda, todo-lists
• Facebook, LinkedIn, Picasa, YouTube, Twitter
• Svn, Google Docs, Dropbox
• Government services
• Business services
• Also machines and systems from
  – family, friends, associations, work
• Systems even unknown to the user
  – third-party cookies

Data heterogeneity
• Type: text, relational, HTML, XML, pdf…
• Terminology/structure/ontology
• Systems: MS, Linux, iOS, Android
• Distribution
• Security protocols
• Quality: incomplete / inconsistent information

Bad news
• Limited functionalities because of the silos
  – Difficult to do global search, synchronization, task sequencing over distinct systems…
• Loss of control over the data
  – Difficult to control privacy
  – Leaks of private information
• Loss of freedom
  – Vendor lock-in

Growing resentment
• Against companies
  – Intrusive marketing, cryptic personalization and business decisions (e.g., on pricing), and automated customer service with no real channel for customers' voices
  – Creepy "big data" inferences
• Against governments
  – NSA and its European counterparts
• Asymmetry between what these systems know about a person, and what the person actually knows

Future alternatives (for normal people)
1. Continue with this increasing mess
   – Use a shrink to overcome frustration
2. Regroup all your data on the same platform
   – Google, Apple, Facebook, …, a newcomer
   – Use a shrink to overcome resentment
3. Study 2 years to become a geek
   – Geeks know how to manage their information
   – Use a shrink to survive the experience
4. And, of course, there is the Pims' way

2. The personal information management system
2.1 Introduction

The Pims
• Personal information management system
• What is a successful Web service today?
  – Some great software
  – Some machines on which it runs (and a business model)
• Separate the two facets
  – Some company provides the software
  – It runs on your machine with another business model

The Pims (1)
• The Pims runs software
  – The user chooses the code to deploy on the server
  – The software is open source, a requirement for security
• With the user's data
  – All the user's personal information
• On the user's server(s)
  – The user owns it or pays for a hosted server
  – The server may be a physical or a virtual machine
  – It may be physically located at the user's home (e.g., a tvbox) or not
  – It may run on a single machine or be distributed among several machines
  – The server is in the cloud, i.e., it can be reached from everywhere: a personal cloud

The Pims: the 2 main issues
• Security
  – Enforced by the Pims: guaranteed by the contract the user has with the Pims
    • Reasonably small piece of code; possible to verify it
  – Enforced by the services running on it: open source so that we don't need to trust the providers of these systems
  – A higher level of security than now
• The management
  – Should be epsilon-work
  – Should require little expertise
  – A company can be paid to do it (in the cloud)

2. The personal information management system
2.2 Pims are coming

It is becoming possible
• System administration is easier
  – Abstraction technologies for servers
  – Virtualization and configuration management tools
• Open source is very active
  – Open source technology is more and more available
• Price of machines is going down
  – A hosted low-cost server costs as little as 5€/month
  – Paying is no longer a barrier for a majority of people
Indeed, I am sure you have friends already doing it

Many people are working on it
• Many systems & projects
  – Lifestreams, Stuff-I've-Seen, Haystack, MyLifeBits, Connections, Seetrieve, Personal Dataspaces, or deskWeb
  – YunoHost, Amahi, ArkOS, OwnCloud or Cozy Cloud
• Some on particular aspects
  – Mailpile for mail
  – Lima for a Dropbox-like service, but at home
  – Personal NAS (network-attached storage), e.g., Synology
  – Personal data store SAMI of Samsung...
• Many more

Data disclosure movement
• Smart Disclosure in the US
• MiData in the UK
• MesInfos in France
Several large companies (network operators, banks, retailers, insurers…) have agreed to share with a panel of customers the personal data that they have about them

Big companies are interested (1)
Pre-digital companies
• E.g., hotels or banks
• Disintermediated from their customers by pure Internet players such as Google, Amazon, Booking.com, Mint
• In Pims, they can rebuild direct interaction
• The playing field is neutral
  – Unlike on the Internet where they have less data
• They can offer new services without compromising privacy

Big companies are interested (2)
Home appliances companies
• Many boxes deployed at home or in datacenters
  – Internet access provider "boxes", NAS servers, "smart" meters provided by energy vendors, home automation systems, "digital lockers"…
• Personal data spaces dedicated to specific usage
• Could evolve to become more generic
• Control of the private Internet of Things

2. The personal information management system
2.3 Advantages

Advantages
• User control over their data
  – Who has access to what, under what rules, to do what
• User empowerment
  – They freely choose services & they can leave a service
• Participation in a more "neutral" Web
  – With the "network effects", the main platforms are accumulating data/customers and distorting competition
  – The Pims bring back fairness on the Web
  – Good practices are encouraged, e.g., interoperability, portability

Advantages – New functionalities
• Single identity/login
• Semantic global search with (personal) ontology
• Synchronization/backups across services
• Access control management across services
• Task sequencing across services
• Exchange of information between "friends"
• Connected objects control, a hub for the IoT
• Personal big data analysis

3. From information to knowledge
(aka let's get a tad more technical)

Machines prefer knowledge
• Integration of data & information sources
  – It is easier to integrate knowledge than information
• Collaboration between services & devices
  – It is easier for services to collaborate using knowledge than using information
• Problem solving based on knowledge inference

Humans as well
• The users of the system are human beings
  – They want support for managing information
  – But they are not geeks
  – They don't want to program
• To facilitate the interactions between humans and machines, we should use declarative languages!

It all started with datalog
• Popular in the 90's
• Some followers in the 00's
  – A., Afrati, Atzeni, Cali, Greco, Gottlob, Milo, Sacca, Ullman…
• Recent revival
  – 2010 Oege de Moor's workshop @oxford
    • Datalog 2.0
  – 2010 Joe Hellerstein's keynote @pods
    • Datalog Redux: Experience and Conjecture
  – 2014 Frank Neven's keynote @icdt
    • Remaining CALM in declarative networking
• Now featuring: Webdamlog

Requirement 1: Distribution
• Different machines
• Different users
• We use the notion of principal here
  – family@alice(Bob)
  – agenda@Alice-iPhone(…)
  – friends@Alice-FaceBook(…)
• A principal comes with identity and privileges

Requirement 2: Privacy
• Control of who sees what in a distributed environment
• Access control
• As should be clear from the first part of the talk, this is a most important issue
Tutorial on privacy by Nicolas Anciaux, Benjamin Nguyen, Iulian Sandu Popa – Today at 2:00

Requirement 3: Probabilities
"The more I see, the less I know for sure." (John Lennon)
• We have to deal with negation
  – Elvis was not French
• With negations come contradictions
  – Elvis Presley died in 1977; The King is alive
• There are different points of view
  – Elvis's music is the best; it stinks
• Measure uncertainty with probabilities

So, what is the goal
• A datalog-style language with
  – distribution
  – access control
  – probabilities
We are lucky, there is such a language: Webdamlog

4. The Webdamlog language
(aka let's be serious)
4.1 Webdamlog in brief

Facts and rules
Facts are of the form R@p(a1,…,an)
  – p is a principal, e.g., Serge, Serge's-iPhone, Facebook/Serge, s.ab@gmail.com
Rules are of the form
  $R@$P($U) :- $R1@$P1($U1), ..., $Rn@$Pn($Un)
  – $R, $Ri are relation terms
  – $P, $Pi are peer terms
  – $U, $Ui are tuples of terms
  – Safety condition
  – Also negations: ignored here

The semantics of rules
Classification based on locality and nature of head predicates (intensional or extensional)
• Local rule at my-laptop: all predicates in the body of the rule are from my-laptop
  – Local with local intensional head: datalog
  – Local with local extensional head: database update
  – Local with non-local extensional head: messaging between peers
  – Local with non-local intensional head: view definition
• Non-local: general delegation

Local rules with local head
Intensional local head: datalog
  [at my-iphone]
  fof@my-iphone($x, $y) :- friend@my-iphone($x, $y)
  fof@my-iphone($x, $y) :- friend@my-iphone($x, $z), fof@my-iphone($z, $y)
Extensional local head: database updates
  [at my-iphone]
  believe@my-iphone("Alice", $loc) :- tell@my-iphone($p, "Alice", $loc), friend@my-iphone($p)

Local rules & non-local extensional head
Messaging between peers
  $message@$peer($name, "Happy birthday!") :- today@my-iphone($date), birthday@my-iphone($name, $message, $peer, $date)
Example
  – today@my-iphone(3/25)
  – birthday@my-iphone("Manon", "sendmail", "gmail.com", 3/25)
  – sendmail@gmail.com("Manon", "Happy birthday!")

Local rules & non-local intensional head
View definition
  boyMeetsGirl@gossip-site($girl, $boy) :- girls@my-iphone($girl, $event), boys@my-iphone($boy, $event)
• The semantics of boyMeetsGirl@gossip-site is a join of the relations girls and boys from my-iphone
• Defines a view at some other peer
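As a rough illustration of the local rules above, the following Python sketch evaluates the two fof rules by naive bottom-up iteration and computes the join in the body of the boyMeetsGirl view. It is not the Webdamlog engine: the function names, the use of Python sets for relations, and the sample facts for Bob, Carol and Dan are assumptions made for the example.

```python
# Minimal sketch, NOT the Webdamlog engine: relations are plain Python sets
# of tuples, and every name below is an illustrative assumption.

def friends_of_friends(friend):
    """Least fixpoint of the two local datalog rules:
         fof($x, $y) :- friend($x, $y)
         fof($x, $y) :- friend($x, $z), fof($z, $y)
    """
    fof = set(friend)                      # first rule: every friend pair is in fof
    while True:
        derived = {(x, y)
                   for (x, z) in friend
                   for (z2, y) in fof
                   if z == z2}             # second rule: join on $z
        if derived <= fof:                 # nothing new derived: fixpoint reached
            return fof
        fof |= derived


def boy_meets_girl(girls, boys):
    """Body of the view rule: join girls and boys on the shared $event."""
    return {(girl, boy)
            for (girl, event) in girls
            for (boy, event2) in boys
            if event == event2}


# Hypothetical sample facts.
friend = {("Alice", "Bob"), ("Bob", "Carol")}
print(sorted(friends_of_friends(friend)))

girls = {("Alice", "Julia's birthday")}
boys = {("Bob", "Julia's birthday"), ("Dan", "Other party")}
print(sorted(boy_meets_girl(girls, boys)))
```

In this sketch the view is computed in one place; in the slides the head lives at gossip-site, so the resulting tuples would be made visible at that peer rather than stored locally.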

Non-local rules
General delegation (at my-iphone):
  boyMeetsGirl@gossip-site($girl, $boy) :- girls@my-iphone($girl, $event), boys@alice-iphone($boy, $event)
Example: girls@my-iphone("Alice", "Julia's birthday")
  – my-iphone installs the following rule at alice-iphone
    boyMeetsGirl@gossip-site("Alice", $boy) :- boys@alice-iphone($boy, "Julia's birthday")
Useful to distribute work and exchange knowledge

The thesis
The Web should turn into a distributed knowledge base where peers share facts and rules, and collaborate
The language Webdamlog is a first step towards that goal
Missing
  – Probabilities
  – Access control

4. The Webdamlog language
4.2 Probabilities

Advertisement
Deduction with Contradictions in Datalog
S.A., Daniel Deutch and Victor Vianu – Tomorrow, 11:00

4. The Webdamlog language
4.3 Access control

Requirements
• Data access: users would like to control who can read and modify their information
• Data dissemination: users would like to control how their data are transferred from one participant to another
• Application control: users would like to control which applications can run on their behalf, and what information these applications can access
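Returning to the general delegation example on the non-local rules slide, the step where my-iphone binds its local atom and installs a residual rule at alice-iphone can be sketched in Python as follows. This is only a sketch under stated assumptions: atoms are modelled as (relation, peer, args) tuples, variables are strings starting with "$", and the helpers substitute, match and delegate are hypothetical names, not part of Webdamlog.

```python
# Sketch of delegation under assumed data structures (not Webdamlog's own):
# an atom is (relation, peer, args); a variable is a string starting with "$".

def substitute(atom, binding):
    """Replace bound variables in an atom by their constants."""
    rel, peer, args = atom
    return (rel, peer, tuple(binding.get(a, a) for a in args))


def match(atom_args, fact_args):
    """Return a variable binding that makes atom_args equal fact_args, or None."""
    if len(atom_args) != len(fact_args):
        return None
    binding = {}
    for a, f in zip(atom_args, fact_args):
        if a.startswith("$"):
            if binding.setdefault(a, f) != f:   # same variable, different values
                return None
        elif a != f:                            # constant mismatch
            return None
    return binding


def delegate(head, local_atom, remote_atom, local_facts):
    """For each local fact matching local_atom, build the residual rule
    (head, [remote_atom]) to be installed at the remote peer."""
    rules = []
    for fact_args in local_facts:
        binding = match(local_atom[2], fact_args)
        if binding is not None:
            rules.append((substitute(head, binding),
                          [substitute(remote_atom, binding)]))
    return rules


head = ("boyMeetsGirl", "gossip-site", ("$girl", "$boy"))
local_atom = ("girls", "my-iphone", ("$girl", "$event"))
remote_atom = ("boys", "alice-iphone", ("$boy", "$event"))
local_facts = [("Alice", "Julia's birthday")]   # girls@my-iphone, as on the slide

residual = delegate(head, local_atom, remote_atom, local_facts)
for rule_head, rule_body in residual:
    print(rule_head, ":-", rule_body)
```

With the single fact from the slide, the one residual rule has head boyMeetsGirl@gossip-site("Alice", $boy) and body boys@alice-iphone($boy, "Julia's birthday"), which is the rule the slide says is installed at alice-iphone. The sketch ignores access rights entirely.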

The general picture
• Coarse grain for extensional relations
  – read access to the relation
• Fine grain for intensional relations
  – read access to tuple t requires read access to the tuples that lead to deriving t
• Delegation controlled in a sandbox
• Focus on read privilege here

Read: default
• Extensional relations
  – if you have read privilege to the relation
• Intensional relations
  – if you have read privilege to the relation &
  – if you can read all the tuples that have been used to create this fact
  – provenance of the fact

Fine grain access control
  [at Bob] album@Alice($p, $f) :- photo@Bob($p, $f)
  [at Sue] album@Alice($p, $f) :- photo@Sue($p, $f)
  – album@Alice is intensional
  – Both Bob and Sue contribute to it
  – Peter, who has read privilege to album@Alice and photo@Bob only, does not see the photos of Sue

Paranoiac access control
  [at Bob] album@Alice($p, $f) :- photo@Bob($p, $f), friends@Bob($f)
  – Issue: you can read Bob's photos only if you have read privilege on friends@Bob, which Bob wants to keep private

Declassification
  [at Bob] photo@Alice($p, $f) :- photo@Bob($p, $f), [ hide friends@Bob($f) ]
  – Hide: blocks the provenance from friends@Bob
  – Bob declassifies this data just for the evaluation of this rule
  – You can declassify only tuples you own ↦ grant privilege

Issues with non-local rules
  [at Bob]
  message@Sue("I hate you") :- date@Alice($d)
  aliceSecret@Bob($x) :- date@Alice($d), secret@Alice($x)
Ignoring access rights, by delegation, this results in running
  [at Alice]
  message@Sue("I hate you") :- date@Alice($d)
  aliceSecret@Bob($x) :- date@Alice($d), secret@Alice($x)

Default solution: sandbox
We run the rule at Alice in a sandbox
• We use the access rights of Bob
  – So the second rule does not succeed in sending secrets
• The message specifies that this is done at Bob's request
  – So this requires authentication/signatures
Alternative: delegation without sandbox

5. Conclusion: some research issues

Explaining
• Users want to understand the information they see, the answers they are given
  – In their professional/social life
• Difficulties
  – Reasoning with a large number of facts
  – Information is often probabilistic and not public
  – Requires knowing how the information was obtained (its provenance)

Serendipity
• You may hear by chance a song that is going to totally obsess you
• A librarian may suggest that you read an article that will transform your research
This is serendipity
• A perfect search engine
• A perfect recommendation system
• A perfect computer assistant
Such systems are boring: they lack serendipity
Design programs that would introduce serendipity in our lives

Hypermnesia
Exceptionally exact or vivid memory, especially as associated with certain mental illnesses
• For a user: we cannot live knowing that any word, any move will leave a trace
• For the ecosystem: we cannot store all the data we produce – lack of storage resources
"Forgetting is Key to a Healthy Mind", Scientific American (image: Aaron Goodman)
A main issue is to select the information we choose to keep

Babel of human-machine interaction
• Each time a user interacts with a data source, does he have to use the ontology of that source?
• No!
• Instead of a user adapting to the ontologies of the N systems he uses each day
• We want the N systems to adapt to the user's ontology

Religion…science…machines
• Knowledge used to be determined by religion
• Knowledge used to be determined scientifically
• Knowledge will now be determined by machines?
• Decisions are increasingly made by machines
  – Stock market (automatic trading)
  – Fully automated factory
  – Fully automated metros
  – Death penalty (killer drones)…

to the digital world!
• We will soon be living in a world surrounded by machines that
  – acquire knowledge and decide for us
• What will we do with that technology?
• Will we become smarter?
• Will we become master or slave of the new technology?

Thank you (σας ευχαριστώ)