.NET Database Technologies: Using NoSQL databases

NoSQL: “Not only SQL”
• Alternatives to the ubiquitous relational database which may be superior in specific application scenarios
• Object-oriented databases (ODBMS)
    • They came, they saw, they... didn’t conquer, but they are still around
• NoSQL databases
    • The new kids on the block
    • General term applied to a range of different non-relational database systems
    • Largely emerging to meet the needs of large-scale Web 2.0 applications

Object-oriented databases
• ODBMSs use the same data model as object-oriented programming languages
    • No object-relational impedance mismatch, because the model is uniform
• An object database combines the features of an object-oriented language and a DBMS (language binding)
    • Treat data as objects
        • object identity
        • attributes and methods
        • relationships between objects
    • Extensible type hierarchy
        • inheritance, overloading and overriding, as well as customised types

ODBMS history
• Object Database Manifesto
    • Paper published in 1989 (Atkinson et al.)
• Some ODBMS products
    • Early 1990s: Gemstone, Objectivity
    • Late 1990s: Versant, ObjectStore, Poet, Matisse
    • 2000s: db4o, Caché
• ODMG (Object Data Management Group)
    • 1993: ODMG 1.0 standard
    • 1997: ODMG 2.0
    • 1999: ODMG 3.0, then ODMG disbanded
    • 2005: ODMG reformed, working towards new standard

ODMG
• Object Database Management Group (ODMG) founded in 1991
    • Standardisation body including all major ODBMS vendors
• Defines a standard to increase portability across different ODBMS products
• Mirrors the SQL standard for RDBMS
    • Object Model
    • Object Definition Language (ODL)
    • Object Query Language (OQL)
    • Language bindings
        • C++, Smalltalk and Java bindings

Characteristics of ODBMS
• Support complex data models with no mapping issues
• Tight integration with an object-oriented programming language (persistent programming language)
• High performance in suitable application scenarios
• Different products scale from small-footprint embedded databases (db4o) to large-scale, highly concurrent systems (e.g. Versant V/OD)

Persistence patterns and ODBMS
• Some of Fowler’s patterns are specific to the use of a relational database, e.g.
    • Data Mapper
    • Foreign Key Mapping
    • Metadata Mapping
    • Single-table Inheritance, etc.
• Some are not specific to the data storage model and are relevant when using an ODBMS, e.g.
    • Identity Map
    • Unit of Work
    • Repository
    • Lazy Loading

db4o
• Open-source object database engine
    • Now owned by Versant
    • Complements their own V/OD product
• Can be used in embedded or client-server modes
    • Embed in an application simply by including DLLs
• Native object database
    • Stores .NET (or Java) objects directly with no special requirements on classes
    • Other ODBMSs (e.g. V/OD) require classes to be marked as persistent through bytecode manipulation and also store class definitions
    • Tight integration with the application, but the trade-off is limited ad hoc querying and reporting
    • Can replicate data to a relational database if required

IObjectContainer
• The IObjectContainer interface is implemented by objects which provide access to the database
    • IObjectContainer is roughly equivalent to the EF ObjectContext
    • Acts as a Unit of Work if transparent persistence is enabled (see later)
• Can access the DB in embedded mode (direct file access) or client-server mode (local or remote)
    • An IObjectServer instance is required in client-server mode
• IObjectContainer instances are created by factory classes, e.g. Db4oEmbedded
• Queries on IObjectContainer return IObjectSet (except LINQ queries)

Viewing data and ad hoc querying
• ObjectManager Enterprise
    • Visual Studio plug-in
    • Browsing and drag-and-drop queries
• LINQPad
    • Need to include db4o DLLs and namespaces for stored classes
    • Executes LINQ queries and visualises results

db4o query APIs
• Query-by-example (QBE)
    • Very limited: no comparisons, ranges, etc.
• Simple Object Data Access (SODA)
    • Build a query by navigating the object graph and adding constraints to nodes
• Native Queries
    • Expressed completely in the programming language
    • Type-safe
    • Optimised to a SODA query at runtime if possible
• LINQ
    • .NET version only, not in Java (obviously)

Activation
• Objects are stored in the DB as an object graph
• If db4o is configured to cascade-on-activate (eager loading), retrieving one object could potentially load a large number of related objects
• A fixed activation depth limits the depth of traversal of the graph when retrieving objects
    • Default value is 5
• Related objects beyond that depth can then be activated explicitly when needed
• Lazy loading can be configured with transparent activation
• Classes need to be “instrumented” at build time by running Db4oTool.exe
    • Code is injected into the assembly so that classes implement the IActivatable interface

Update depth
• Similar considerations apply to updates
• Storing an updated object could cause unnecessary updates to related objects
• A fixed update depth limits the depth of traversal of the graph when storing objects
    • Default value is 1
• Can configure transparent persistence, which allows changes to be tracked
    • Only changed objects are updated in the database
    • Behaves like change tracking in, for example, Entity Framework
    • Unit of Work

PI (persistence ignorance)?
• Stores POCOs without any need for mapping, so yes
• Transparent activation requires that classes implement a specific interface
• But this is done at build time, so domain classes don’t need any specific code
• Has parallels with dynamic proxies in ORMs:
    • Objects are instances of the domain classes themselves, which have been modified ‘under the hood’ at build time
    • Compare with dynamic proxy classes, which derive from domain classes and are created ‘under the hood’ at run time

Further reading
• www.odbms.org
    • Resource portal
• db4o Tutorial
    • Included in product download
• The Definitive Guide to db4o (Apress)

NoSQL databases
• New breed of databases that are appearing largely in response to the limitations of existing relational databases
• Typically:
    • Support massive data storage (petabyte+)
    • Distribute storage and processing across multiple servers
• Contrast in architecture and priorities compared to relational databases
• Hence the term NoSQL
• “Not only SQL”: absence of SQL is not a requirement

NoSQL features
• Wide variety of implementations, but some features are common to many of them:
• Schema-less
• Shared-nothing architecture
• Elasticity
• Sharding and asynchronous replication
• BASE, not ACID
    • Basically Available
    • Soft state
    • Eventually consistent

MapReduce
• Algorithm for dividing a workload into units suitable for parallel processing
• Useful for queries against large sets of data: the query can be distributed to hundreds or thousands of nodes, each of which works on a subset of the target data
• The results are then merged together, ultimately yielding a single “answer” to the original query
• Example: get the total word count of a large number of documents
    • Map: calculate the word count of each document
        • Each node works on a subset of the overall data set
        • Results are emitted to intermediate storage
    • Reduce: calculate the total of the intermediate results

Brewer’s CAP theorem
• Can optimize for only two of three priorities in a distributed database:
• Consistency
    • All clients have the same view of the data
    • Requires atomicity, transaction isolation
• Availability
    • Every request received by a non-failing node must result in a response
• Partition tolerance
    • Partitions happen if certain nodes can’t communicate
    • No set of failures less than total network failure is allowed to cause the system to respond incorrectly

Implications of CAP theorem
• Any two properties can be achieved
• CP
    • If messages between nodes are lost, the system waits
    • Possible that no response is returned at all
    • No inconsistent data is returned to the client
• CA
    • No partitions; the system will always respond and data is consistent
• AP
    • A response is always returned, even if some messages between nodes are lost
    • Different nodes may have different views of the data
• Choose a database whose priorities match the application
    • http://blog.nahurst.com/visual-guide-to-nosql-systems

Using a NoSQL database in a .NET application
• Application typically makes a connection to a remote cluster
• Some (but not many) NoSQL databases are supported by native .NET clients
    • These handle the “mapping” from .NET objects to the data model
• Many NoSQL databases are accessed through a REST interface
    • Application must construct the request and handle the response format, e.g. JSON
    • Application can be written in any suitable language
• Azure Table Storage is Microsoft’s NoSQL storage for cloud-based applications
• However the data is accessed, you need to understand the data model, which will be significantly different from a typical relational database or object model

NoSQL database types and examples
• Key/value databases
    • These manage a simple value or row, indexed by a key
    • e.g. Voldemort, Vertica
• Big table databases
    • “a sparse, distributed, persistent multidimensional sorted map”
    • e.g. Google BigTable, Azure Table Storage, Amazon SimpleDB
• Document databases
    • Multi-field documents (or objects) with JSON access
    • e.g. MongoDB, RavenDB (.NET specific), CouchDB
• Graph databases
    • Manage nodes, edges, and properties
    • e.g. Neo4j, sones

MongoDB
• Scalable, high-performance, open source, document-oriented database
• Stores JSON-style (actually BSON) documents with dynamic schema
• Replication, high availability and auto-sharding
• Supports document-based queries and map/reduce
• Command line tools:
    • mongod: starts the server as a service or daemon
    • mongo: client shell
• Documents to store are defined as JSON
• Documents retrieved from a query are displayed as JSON

MongoDB and HTTP
• Admin console at http://<server name>:28017
• REST interface on the same port, http://<server name>:28017
    • Enabled by starting the server with mongod --rest
    • Server responds to RESTful HTTP requests, e.g.
        • http://127.0.0.1:28017/company/Employee/?filter_Name=Fernando
    • Response is in JSON format
    • Could be consumed by client-side code in an Ajax application

MongoDB .NET driver
• Can access documents as instances of the Document class
• Represents a document as key-value pairs
• Or, can serialize POCOs to the database format (JSON)
• Deserializes database documents to POCOs
• Supports LINQ queries
• MapReduce queries can be expressed as LINQ queries

MongoDB schema design
• Collections are essentially named groupings of documents
    • Roughly equivalent to relational database tables
• Less "normalization" than a relational schema because there are no server-side joins
• Generally, you will want one database collection for each of your top-level objects
    • Don’t create a collection for every "class"; instead, embed objects
[Diagram: relational schema vs. document schema]

Document example
• Save:
• Query:
http://www.10gen.com/video/mongosv2010/schemadesign

MongoDB in C# applications - PI?
• Up to a point
• Collection class needs an Id property of a specific type (MongoDB.Oid)
• Object model needs to be designed with the document schema in mind

Further reading
• http://nosql-database.org/
• http://www.nosqlpedia.com/
• http://www.mongodb.org/
• http://www.codeproject.com/KB/database/MongoDBCS.aspx
    • Nice code example for C# and MongoDB
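
Worked sketch: MapReduce word count

The word-count example from the MapReduce slide can be sketched in a few lines. This is only a single-process illustration in Python, not how any particular NoSQL product implements it. The function names (map_word_count, reduce_total, total_word_count) and the node_count parameter are hypothetical, chosen for this sketch. The "nodes" are simulated by slicing the document list, and a plain list stands in for intermediate storage.

```python
from functools import reduce


def map_word_count(document: str) -> int:
    """Map step: count the words in a single document."""
    return len(document.split())


def reduce_total(left: int, right: int) -> int:
    """Reduce step: combine two intermediate counts into one."""
    return left + right


def total_word_count(documents, node_count=2):
    """Simulate distributing the map step over node_count nodes.

    Assumption for illustration: each 'node' is just a slice of the
    input list, and the intermediate results live in a Python list.
    """
    # Partition: each simulated node receives a subset of the documents.
    partitions = [documents[i::node_count] for i in range(node_count)]

    # Map: every node emits one count per document it holds.
    intermediate = [
        map_word_count(doc) for partition in partitions for doc in partition
    ]

    # Reduce: merge the intermediate counts into a single answer.
    return reduce(reduce_total, intermediate, 0)


documents = ["the quick brown fox", "jumps over", "the lazy dog"]
total = total_word_count(documents)
print(total)  # 9
```

Because the reduce function here is plain addition (associative and commutative), the partial results could be combined in any order or in several stages, which is what makes this kind of query straightforward to spread across many nodes.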