NoSQL: Graph Databases 1 Databases Relational NoSQL Hadoop … 2 Data are more connected: • • • • • • Text HyperText RSS Blogs Tagging RDF 3 Trend 2: Connectedness GGG Onotologies RDFa Information connectivity Folksonomies Tagging Wikis UGC Blogs Feeds Hypertext Text Documents 4 Architecture Changes Over Time 1980’s: Single Application Application DB 5 Architecture Changes Over Time 1990’s: Integration Database Antipattern Application Application Application DB 6 Architecture Changes Over Time 2000’s: SOA RESTful, hypermedia, composite apps Application Application Application DB DB DB 7 Side note: RDBMS performance Salary list Most Web apps Social Network Location-based services 8 NOSQL Not Only SQL 9 Less than 10% of the NOSQL Vendors 10 Key Value Stores • Came from a research article written by Amazon (Dynamo) – Global Distributed Hash Table • Global collection of key value pairs 11 Four NOSQL Categories 12 Key Value Stores • Most Based on Dynamo: Amazon Highly Available Key-Value Store • Data Model: – Global key-value mapping – Big scalable HashMap – Highly fault tolerant (typically) • Examples: – Redis, Riak, Voldemort 13 Key Value Stores: Pros and Cons • Pros: – Simple data model – Scalable • Cons – Poor for complex data 14 Column Family • Most Based on BigTable: Google’s Distributed Storage System for Structured Data • Data Model: – A big table, with column families • Every row can have its own schema • Helps capture more “messy” data – Map Reduce for querying/processing • Examples: – HBase, HyperTable, Cassandra 15 Column Family: Pros and Cons • Pros: – Supports Simi-Structured Data – Naturally Indexed (columns) – Scalable • Cons – Poor for interconnected data 16 Document Databases • Inspired by Lotus Notes – Collection of Key value pair collections (called Documents) 17 Document Databases • Data Model: – A collection of documents – A document is a key value collection – Index-centric, lots of map-reduce • Examples: – CouchDB, MongoDB 18 Document Databases: Pros and Cons • Pros: – Simple, powerful data model – Scalable • Cons – Poor for interconnected data – Query model limited to keys and indexes – Map reduce for larger queries 19 Graph Databases • Data Model: – Nodes and Relationships • Examples: – Neo4j, OrientDB, InfiniteGraph, AllegroGraph 20 Graph Databases: Pros and Cons • Pros: – Powerful data model, as general as RDBMS – Connected data locally indexed – Easy to query • Cons – Sharding ( lots of people working on this) • Scales UP reasonably well – Requires rewiring your brain 21 What are graphs good for? • • • • • • • • • • • Recommendations Business intelligence Social computing Geospatial Systems management Web of things Genealogy Product catalogue Web analytics Scientific computing (especially bioinformatics) Indexing your slow RDBMS 22 What is a Graph? 23 What is a Graph? • An abstract representation of a set of objects where some pairs are connected by links. Object (Vertex, Node) Link (Edge, Arc, Relationship) 24 Different Kinds of Graphs • Undirected Graph • Directed Graph • Pseudo Graph • Multi Graph • Hyper Graph 25 More Kinds of Graphs • Weighted Graph • Labeled Graph • Property Graph 26 What is a Graph Database? • A database with an explicit graph structure • Each node knows its adjacent nodes • As the number of nodes increases, the cost of a local step (or hop) remains the same • Plus an Index for lookups 27 Relational Databases 28 Graph Databases 29 30 Neo4j Released in 2007 Written in Java Uses Lucene Indexing Founder: Emil Eifrem 33 Neo4j Tips • Each entity table is represented by a label on nodes • Each row in an entity table is a node • Columns on those tables become node properties • Join tables are transformed into relationships, columns on those tables become relationship properties 34 Neo4j: Features • • • • • Data model (flexible schema) − Neo4j follows a data model named native property graph model. Here, the graph contains nodes (entities) and these nodes are connected with each other (depicted by relationships). Nodes and relationships store data in key-value pairs known as properties. ACID properties − Neo4j supports full ACID (Atomicity, Consistency, Isolation, and Durability) rules. Scalability and reliability − You can scale the database by increasing the number of reads/writes, and the volume without effecting the query processing speed and data integrity. Neo4j also provides support for replication for data safety and reliability. Cypher Query Language − Neo4j provides a powerful declarative query language known as Cypher Built-in web application − Neo4j provides a built-in Neo4j Browser web application. Using this, you can create and query your graph data. 35 Node in Neo4j 36 Relationships in Neo4j • Relationships between nodes are a key part of Neo4j. 37 Relationships in Neo4j 38 Twitter and relationships 39 Properties • Both nodes and relationships can have properties • Properties are key-value pairs where the key is a string • Property values can be either a primitive or an array of one primitive type For example String, int and int[] values are valid for properties 40 Properties 41 Paths in Neo4j • A path is one or more nodes with connecting relationships, typically retrieved as a query or traversal result. 42 The Matrix Graph Database 43 Practical Example docker pull neo4j:latest docker run --rm -p 7474:7474 -p 7687:7687 -d -env NEO4J_AUTH=neo4j/test neo4j:latest Database user: neo4j Password: test 44 Practical Example :play start :history :sysinfo :server status :clear :help commands 45 Practice Cypher is Neo4j's graph query language Not case sensitive! MATCH (n) RETURN (n) MATCH (n) RETURN n CREATE (n) MATCH (n) RETURN (n) MATCH (n) DELETE (n) 46 Practice CREATE (d: Device) CREATE (d: Device {ext: ‘ABCD’}) CREATE (d: Device {ext: ‘ABCD’, type: ‘VOIP’}) Delete a single node MATCH (n:Device {ext : ‘ABCD'}) DELETE n Delete all nodes and relationships MATCH (n) DETACH DELETE n 47 Administration SHOW DATABASES SHOW DATABASE system SHOW DEFAULT DATABASE SHOW DATABASES YIELD * RETURN count(*) as count SHOW DATABASES YIELD name, currentStatus, requestedStatus ORDER BY currentStatus WHERE name CONTAINS ‘e’ www.neo4j.com 48 Administration Non-community versions CREATE DATABASE transactions START DATABASE transactions SHOW DATABASE transactions DROP DATABASE transactions Community version only offers a single active database edit /var/lib/neo4j/conf/neo4j.conf dbms.default_database=neo4j www.neo4j.com 49 Practical Example 50 Practical Example 51 Practical Example CREATE (d:Device CREATE (d:Device CREATE (d:Device CREATE (d:Device CREATE (d:Device CREATE (d:Device {ext:'4645'} ); {ext:'3371'} ); {ext:'7201'} ); {ext:'4400'} ); {ext:'5700'} ); {ext:'6200'} ); CREATE (n:Person{name:'abc', age:34}) 52 Practical Example CREATE (d:Device {extn:'5400', type:'VOIP'} ); CREATE (d:Device {extn:'2400', type:'ANALOG'} ); CREATE (f:Faculty CREATE (f:Faculty CREATE (f:Faculty CREATE (f:Faculty CREATE (f:Faculty {name:'Agriculture'} ); {name:'Arts'} ); {name:'Management'} ); {name:'Science'} ); {name:'Vet'} ); 53 Practical Example match (d:Device {ext:'4400'} ) SET d.type='VOIP'; match (d:Device {ext:'4400'} ) REMOVE d.type; match (d:Device {ext:'4645'} ) SET d.type='ANALOG'; match (d:Device {ext:'3371'} ) SET d.type='VOIP'; match (d:Device {ext:'7201'} ) SET d.type='VOIP'; 54 Practical Example match (t:Device{extn:'5400'}) return t match (t:Device{extn:'5400'}) return t.extn match(d:Device) return (d) LIMIT 2; MATCH (d:Device) WHERE d.extn='4400' OR d.extn = '5400' RETURN d; 55 Relationships Relationships (Uni and Bi direction) MATCH (d:Device {ext:'7201'}), (f:Faculty{name:’Vet'}) CREATE (f) -[r:OWNS]-> (d); MATCH (d:Device), (f:Faculty) WHERE d.ext='3371' AND f.name='Arts' CREATE (f) -[r:FAMILY{}]-> (d); 56 Relationships MATCH (d:Device {extn:'7201'}), (f:Faculty{name:'Arts'}) CREATE (f) -[r:OWNS]-> (d); MATCH (d:Device {extn:'5400'}), (f:Faculty{name:'Science'}) CREATE (f) -[r:OWNS]-> (d); MATCH (d:Device {extn:'4400'}), (f:Faculty{name:'Management'}) CREATE (f) -[r:OWNS]-> (d); MATCH (d:Device {extn:'5700'}), (f:Faculty{name:'Management'}) CREATE (f) -[r:OWNS]-> (d); MATCH (d:Device {extn:'6200'}), (f:Faculty{name:'Vet'}) CREATE (f) -[r:OWNS]-> (d); MATCH (d:Device {extn:'2400'}), (f:Faculty{name:'Agriculture'}) CREATE (f) -[r:OWNS]-> (d); 57 Relationships MATCH (d1:Device{extn:'4645'}), (d2:Device{extn:'3371'}) datetime:'2020-07-04T19:32:24'}]-> (d2); MATCH (d1:Device{extn:'3371'}), (d2:Device{extn:'4645'}) datetime:'2020-07-04T19:45:20'}]-> (d2); MATCH (d1:Device{extn:'7201'}), (d2:Device{extn:'3371'}) datetime:'2020-07-05T19:32:24'}]-> (d2); MATCH (d1:Device{extn:'5700'}), (d2:Device{extn:'3371'}) datetime:'2020-07-05T20:32:24'}]-> (d2); MATCH (d1:Device{extn:'6200'}), (d2:Device{extn:'3371'}) datetime:'2020-07-06T19:32:24'}]-> (d2); MATCH (d1:Device{extn:'5400'}), (d2:Device{extn:'3371'}) datetime:'2020-07-07T19:32:24'}]-> (d2); MATCH (d1:Device{extn:'4645'}), (d2:Device{extn:'3371'}) datetime:'2020-07-08T19:32:24'}]-> (d2); MATCH (d1:Device{extn:'4400'}), (d2:Device{extn:'2400'}) datetime:'2020-07-09T19:32:24'}]-> (d2); MATCH (d1:Device{extn:'6200'}), (d2:Device{extn:'7201'}) datetime:'2020-07-10T19:32:24'}]-> (d2); MATCH (d1:Device{extn:'6200'}), (d2:Device{extn:'5400'}) datetime:'2020-07-11T11:32:24'}]-> (d2); CREATE (d1) -[c:CALLS{duration:23, CREATE (d1) -[c:CALLS{duration:45, CREATE (d1) -[c:CALLS{duration:200, CREATE (d1) -[c:CALLS{duration:21, CREATE (d1) -[c:CALLS{duration:10, CREATE (d1) -[c:CALLS{duration:56, CREATE (d1) -[c:CALLS{duration:123, CREATE (d1) -[c:CALLS{duration:500, CREATE (d1) -[c:CALLS{duration:100, CREATE (d1) -[c:CALLS{duration:27, 58 Delete Nodes with Relationships MATCH(d:Device) DELETE (d) Cannot delete node<0>, because it still has relationships. To delete this node, you must first delete its relationships. MATCH(d:Device) DETACH DELETE (d) 59