NoSQL

.NET Database Technologies:
Using NoSQL databases
NoSQL – “Not only SQL”
• Alternatives to the ubiquitous relational database which
may be superior in specific application scenarios
• Object-oriented databases (ODBMS)

They came, they saw, they....

...didn’t conquer, but they are still around
• NoSQL databases

The new kids on the block

General term applied to a range of different non-relational
database systems

Largely emerging to meet the needs of large-scale Web 2.0
applications
Object-oriented databases
• ODBMSs use the same data model as object-oriented
programming languages

no object-relational impedance mismatch due to a uniform
model
• An object database combines the features of an object-
oriented language and a DBMS (language binding)


treat data as objects
  • object identity
  • attributes and methods
  • relationships between objects

extensible type hierarchy
  • inheritance, overloading and overriding as well as customised types
ODBMS history
• Object Database Manifesto

Paper published in 1989 (Atkinson et al.)
• Some ODBMS products

Early 1990s: Gemstone, Objectivity

Late 1990s: Versant, ObjectStore, Poet, Matisse

2000s: db4o, Caché
• ODMG (Object Data Management Group)

1993: ODMG 1.0 standard

1997: ODMG 2.0

1999: ODMG 3.0, then ODMG disbanded

2005: ODMG reformed, working towards new standard
ODMG
• Object Database Management Group (ODMG) founded in
1991

standardisation body including all major ODBMS vendors
• Define a standard to increase the portability of applications across
different ODBMS products
• Mirroring the SQL standard for RDBMS

Object Model

Object Definition Language (ODL)

Object Query Language (OQL)

language bindings
  • C++, Smalltalk and Java bindings
Characteristics of ODBMS
• Support complex data models with no mapping issues
• Tight integration with an object-oriented programming
language (persistent programming language)
• High performance in suitable
application scenarios
• Different products scale from
small-footprint embedded databases
(db4o) to large-scale, highly concurrent systems (e.g.
Versant V/OD)
Persistence patterns and ODBMS
• Some of Fowler’s patterns are specific to the use of a
relational database, e.g.

Data Mapper

Foreign Key Mapping

Metadata Mapping

Single-table Inheritance, etc.
• Some are not specific to the data storage model and are
relevant when using an ODBMS, e.g.

Identity Map

Unit of Work

Repository

Lazy Load
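Identity Map does not depend on how data is stored. The sketch below is a minimal Python version. `load_from_store` is a hypothetical callback that stands in for the real database read, whether the store is an ODBMS or an RDBMS.

```python
from typing import Callable, Dict, Hashable


class IdentityMap:
    """Ensures each stored object is loaded at most once per session,
    so two lookups of the same id return the same in-memory object."""

    def __init__(self, load_from_store: Callable[[Hashable], object]):
        # load_from_store is hypothetical: it stands in for the real database read
        self._load = load_from_store
        self._objects: Dict[Hashable, object] = {}
        self.loads = 0  # counts actual trips to the store, for illustration

    def get(self, object_id: Hashable) -> object:
        # Only go to the store on the first request for this id
        if object_id not in self._objects:
            self._objects[object_id] = self._load(object_id)
            self.loads += 1
        return self._objects[object_id]


if __name__ == "__main__":
    store = {1: {"name": "Fernando"}}
    session = IdentityMap(lambda oid: dict(store[oid]))
    a = session.get(1)
    b = session.get(1)
    print(a is b, session.loads)  # True 1
```

Without the map, each `get` would build a fresh copy. Two parts of the application could then edit different in-memory versions of the same stored object.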
db4o
• Open-source object-database engine

Now owned by Versant

Complements their own V/OD product
• Can be used in embedded or client-server modes

Embed in application simply by including DLLs
• Native object database

Stores .NET (or Java) objects directly with no special
requirements on classes

Other ODBMSs (e.g. V/OD) require classes to be marked as
persistent through bytecode manipulation and also store class
definitions

Tight integration with the application, at the cost of limited ad-hoc querying and reporting

Can replicate data to relational database if required
IObjectContainer
• IObjectContainer interface is implemented by objects
which provide access to database

IObjectContainer is roughly equivalent to EF ObjectContext

Unit of Work pattern if transparent persistence is enabled (see
later)
• Can access DB in embedded mode (direct file access) or
client-server mode (local or remote)

IObjectServer instance required in client-server mode
• IObjectContainer instances created by factory classes, e.g.
Db4oEmbedded
• Queries on IObjectContainer return IObjectSet (except
LINQ queries)
Viewing data and ad-hoc querying
• ObjectManager Enterprise

Visual Studio plug-in

Browsing and drag-and-drop queries
• LINQPad

Need to include db4o DLLs and namespaces for stored classes

Executes LINQ queries and visualises results
db4o query APIs
• Query-by-example (QBE)

Very limited - no comparisons, ranges, etc.
• Simple Object Data Access (SODA)

Build query by navigating graph and adding constraints to
nodes
• Native Queries

Expressed completely in programming language

Type-safe

Optimised to SODA query at runtime if possible
• LINQ

Available in the .NET version only, not in Java (obviously)
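The gap between query-by-example and native queries is easiest to see side by side. The Python below is not the db4o API: `Employee`, `query_by_example` and `native_query` are hypothetical names.

A template can only say "this field must equal this value". db4o's QBE ignores template fields left at their defaults (null, 0), and this sketch uses `None` for the same purpose. A native query, by contrast, is an ordinary predicate written in the host language.

```python
from dataclasses import dataclass, fields
from typing import Callable, Iterable, List, Optional


@dataclass
class Employee:
    name: Optional[str] = None
    salary: Optional[int] = None


def query_by_example(objects: Iterable[Employee], template: Employee) -> List[Employee]:
    """QBE-style matching: every field set on the template (non-None) must be
    equal on the candidate. There is no way to express 'salary > 30000'."""
    wanted = {f.name: getattr(template, f.name)
              for f in fields(template) if getattr(template, f.name) is not None}
    return [o for o in objects
            if all(getattr(o, k) == v for k, v in wanted.items())]


def native_query(objects: Iterable[Employee],
                 predicate: Callable[[Employee], bool]) -> List[Employee]:
    """Native-query style: the condition is ordinary code in the host language."""
    return [o for o in objects if predicate(o)]


if __name__ == "__main__":
    staff = [Employee("Fernando", 40000), Employee("Ana", 25000)]
    print([e.name for e in query_by_example(staff, Employee(name="Ana"))])   # ['Ana']
    print([e.name for e in native_query(staff, lambda e: e.salary > 30000)])  # ['Fernando']
```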
Activation
• Objects are stored in DB as an object graph
• If db4o is configured to cascade-on-activate (eager loading)
then retrieving one object could potentially load a large
number of related objects
• Fixed activation depth limits depth of traversal of graph
when retrieving objects

Default value is 5
• Can then explicitly activate related objects when needed
• Lazy loading can be configured with transparent
activation
• Classes need to be “instrumented” at build time by running
Db4oTool.exe

Code injected into assembly so that classes implement
IActivatable interface
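The sketch below models how a fixed activation depth bounds graph traversal. It is a conceptual Python model only; `Node`, `retrieve` and `activate` are hypothetical names and not db4o internals.

Objects below the depth limit are created but left "unactivated", with empty fields, until the application activates them explicitly.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: Optional[str]
    children: List["Node"] = field(default_factory=list)
    activated: bool = True


def retrieve(stored: Node, depth: int) -> Node:
    """Copy a stored graph into memory, activating only `depth` levels.
    Deeper objects become unactivated placeholders with empty fields."""
    if depth <= 0:
        return Node(name=None, activated=False)
    return Node(name=stored.name,
                children=[retrieve(c, depth - 1) for c in stored.children])


def activate(placeholder: Node, stored: Node, depth: int) -> None:
    """Explicitly activate a placeholder later, when the application needs it.
    The stored object is passed in because this sketch has no object container
    tracking identities."""
    loaded = retrieve(stored, depth)
    placeholder.name = loaded.name
    placeholder.children = loaded.children
    placeholder.activated = loaded.activated
```

With depth 2 on a chain `a -> b -> c`, `a` and `b` are populated but `c` is a placeholder. A later `activate` call fills it in.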
Update depth
• Similar considerations apply to updates
• Storing an updated object could cause unnecessary
updates to related objects
• Fixed update depth limits depth of traversal of graph
when storing objects

Default value is 1
• Can configure transparent persistence which allows
changes to be tracked

Only changed objects are updated in database

Behaves like change tracking in, for example, Entity
Framework

Unit of Work
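The sketch below shows the core idea of Unit of Work change tracking: only objects that actually changed are written on commit. It detects changes by comparing snapshots, which is a different technique from db4o's build-time instrumentation and is used here only because it is easy to follow. `UnitOfWork` is a hypothetical name, and the `store` dict stands in for the database.

```python
import copy
from typing import Dict, List


class UnitOfWork:
    """Minimal change tracking: snapshot objects when loaded and, on
    commit, write back only the ones whose state has changed."""

    def __init__(self, store: Dict[int, dict]):
        self._store = store  # hypothetical stand-in for the database
        self._tracked: Dict[int, dict] = {}
        self._snapshots: Dict[int, dict] = {}

    def load(self, object_id: int) -> dict:
        obj = copy.deepcopy(self._store[object_id])
        self._tracked[object_id] = obj
        self._snapshots[object_id] = copy.deepcopy(obj)
        return obj

    def commit(self) -> List[int]:
        """Write dirty objects to the store and return their ids."""
        written = []
        for object_id, obj in self._tracked.items():
            if obj != self._snapshots[object_id]:  # dirty check
                self._store[object_id] = copy.deepcopy(obj)
                self._snapshots[object_id] = copy.deepcopy(obj)
                written.append(object_id)
        return written
```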
Persistence ignorance (PI)?
• Stores POCOs without any need for mapping, so yes
• Transparent Activation requires that classes implement a
specific interface
• But this is done at build time so domain classes don’t need
any specific code
• Has parallels with dynamic proxies in ORMs:

Objects are instances of the domain classes themselves, which have been
modified ‘under the hood’ at build time

Compare with dynamic proxy classes, which derive from domain
classes and are created ‘under the hood’ at run time
Further reading
• www.odbms.org

Resource portal
• Db4o Tutorial

included in product download
• The Definitive Guide to db4o (Apress)
NoSQL databases
• New breed of databases that are appearing largely in
response to the limitations of existing relational databases
• Typically:

Support massive data storage (petabyte+)

Distribute storage and processing across multiple servers
• Architecture and priorities differ markedly from those of
relational databases
• Hence term NoSQL
• “Not only SQL” – absence of SQL is not a requirement
NoSQL features
• Wide variety of implementations, but some features are
common to many of them:
• Schema-less
• Shared-nothing architecture
• Elasticity
• Sharding and asynchronous replication
• BASE, not ACID

Basically Available

Soft state

Eventually consistent
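Sharding spreads keys across servers so that storage and load scale out. The sketch below shows simple hash-based routing; the function names are hypothetical.

Many products use key ranges or consistent hashing instead. Consistent hashing reduces how much data moves when shards are added, which matters for elasticity.

```python
import hashlib
from typing import Dict, List


def shard_for(key: str, shard_count: int) -> int:
    """Route a key to a shard using a stable hash. Python's built-in hash()
    is randomised per process for strings, so md5 is used instead."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def partition(keys: List[str], shard_count: int) -> Dict[int, List[str]]:
    """Group keys by the shard (server) that would own them."""
    shards: Dict[int, List[str]] = {i: [] for i in range(shard_count)}
    for key in keys:
        shards[shard_for(key, shard_count)].append(key)
    return shards
```

Because every node applies the same function, any node can work out which shard owns a key without consulting the others.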
MapReduce
• Algorithm for dividing a workload into units suitable for
parallel processing
• Useful for queries against large sets of data: the query can
be distributed to hundreds or thousands of nodes, each of which
works on a subset of the target data
• The results are then merged together, ultimately yielding
a single “answer” to the original query
• Example: get total word count of a large number of
documents


Map: calculate word count of each document
  • each node works on a subset of the overall data set
  • results emitted to intermediate storage

Reduce: calculate total of intermediate results
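The word-count example above is sketched below on a single machine. The documents are split into one chunk per simulated node, each chunk is mapped to per-document counts, and the intermediate results are reduced to one total. In a real cluster each chunk's map step would run on a different node; here they run sequentially, and the function names are hypothetical.

```python
from functools import reduce
from typing import Iterable, List


def map_phase(document: str) -> int:
    """Map: count the words in one document."""
    return len(document.split())


def reduce_phase(counts: Iterable[int]) -> int:
    """Reduce: merge the intermediate results into a single answer."""
    return reduce(lambda total, n: total + n, counts, 0)


def total_word_count(documents: List[str], nodes: int = 3) -> int:
    # Split the documents into one chunk per (simulated) node
    chunks = [documents[i::nodes] for i in range(nodes)]
    # Each chunk would be mapped on its own node; results go to intermediate storage
    intermediate = [map_phase(doc) for chunk in chunks for doc in chunk]
    return reduce_phase(intermediate)


if __name__ == "__main__":
    docs = ["the quick brown fox", "jumps over", "the lazy dog"]
    print(total_word_count(docs))  # 9
```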
Brewer’s CAP theorem
• Can optimize for only two of three priorities in a
distributed database:
• Consistency

All clients have same view of the data

Requires atomicity, transaction isolation
• Availability

Every request received by a non-failing node must result in a
response
• Partition Tolerance

Partitions happen if certain nodes can’t communicate

No set of failures less than total network failure is allowed to
cause the system to respond incorrectly
Implications of CAP theorem
• Any two properties can be achieved
• CP

If messages between nodes are lost, the system waits

Possibly no response is returned at all

No inconsistent data returned to client
• CA

No partitions, system will always respond and data is
consistent
• AP

Response always returned even if some messages between
nodes are lost

Different nodes may have different views of the data
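The CP/AP trade-off can be illustrated with one replica that loses contact with its peers. The toy model below is not any product's behaviour: `Replica` and its `mode` flag are hypothetical.

During a partition, a CP replica refuses to answer rather than risk stale data. An AP replica answers from its local, possibly out-of-date copy.

```python
from typing import Dict, Optional


class Replica:
    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.data: Dict[str, str] = {}
        self.partitioned = False  # True when this node cannot reach its peers

    def replicate(self, key: str, value: str) -> None:
        """Apply a write sent by a peer; the message is lost while partitioned."""
        if not self.partitioned:
            self.data[key] = value

    def read(self, key: str) -> Optional[str]:
        if self.partitioned and self.mode == "CP":
            # CP: give no answer rather than risk returning inconsistent data
            raise TimeoutError("partitioned: cannot confirm latest value")
        # AP (or no partition): always answer, possibly with stale data
        return self.data.get(key)
```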
Implications of CAP theorem
• Choose a database whose priorities match the application
http://blog.nahurst.com/visual-guide-to-nosql-systems
Using a NoSQL database in a .NET application
• Application typically connects to a remote cluster
• Some (but not many) NoSQL databases are supported by
native .NET clients

Handle “mapping” from .NET objects to data model
• Many NoSQL databases are accessed through a REST
interface

Application must construct request and handle response
format, e.g. JSON

Application can be written in any suitable language
• Azure Table Storage is Microsoft’s NoSQL storage for
cloud-based applications
• However the data is accessed, you need to understand the
data model, which will be significantly different from a
typical relational database or object model
NoSQL database types and examples
• Key/value Databases

These manage a simple value or row, indexed by a key

e.g. Voldemort, Redis
• BigTable-style Databases

“a sparse, distributed, persistent multidimensional sorted map”

e.g. Google BigTable, Azure Table Storage, Amazon SimpleDB
• Document Databases

Multi-field documents (or objects) with JSON access

e.g. MongoDB, RavenDB (.NET specific), CouchDB
• Graph Databases

Manage nodes, edges, and properties

e.g. Neo4j, sones
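The BigTable description quoted above ("a sparse, distributed, persistent multidimensional sorted map") can be made concrete with a toy single-machine model. It ignores distribution and persistence. `SortedSparseMap` is a hypothetical name; the "family:qualifier" column naming follows the BigTable convention.

```python
from typing import Dict, Iterator, Optional, Tuple

Cell = Tuple[str, str, int]  # (row key, "family:qualifier" column, timestamp)


class SortedSparseMap:
    """Toy model of the BigTable data model: a sparse map from
    (row, column, timestamp) to a value, scanned in key order."""

    def __init__(self) -> None:
        self._cells: Dict[Cell, str] = {}  # only cells that exist are stored (sparse)

    def put(self, row: str, column: str, timestamp: int, value: str) -> None:
        self._cells[(row, column, timestamp)] = value

    def get(self, row: str, column: str) -> Optional[str]:
        """Return the newest version of a cell, or None if it was never written."""
        versions = [(ts, v) for (r, c, ts), v in self._cells.items()
                    if r == row and c == column]
        return max(versions)[1] if versions else None

    def scan(self, start_row: str, end_row: str) -> Iterator[Tuple[Cell, str]]:
        """Yield cells with start_row <= row < end_row, sorted by key."""
        for key in sorted(self._cells):
            if start_row <= key[0] < end_row:
                yield key, self._cells[key]
```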
MongoDB
• Scalable, high-performance, open source, document-
oriented database
• Stores JSON-style (actually BSON) documents with
dynamic schema
• Replication, high-availability and auto-sharding
• Supports document-based queries and map/reduce
• Command-line tools:

mongod – starts server as a service or daemon

mongo – client shell
  • store documents defined as JSON
  • documents retrieved by a query are displayed as JSON
MongoDB and HTTP
• Admin console at http://<server name>:28017
• REST interface on the same port, http://<server name>:28017

Enabled by starting server with mongod --rest

Server responds to RESTful HTTP requests, e.g.
  • http://127.0.0.1:28017/company/Employee/?filter_Name=Fernando

Response is in JSON format

Could be consumed by client-side code in an Ajax application
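Any client that can build such a URL can query the server; no special driver is needed. The sketch below produces URLs of the form shown above, with each field filter passed as a `filter_<Field>` query parameter. `rest_query_url` is a hypothetical helper. Later MongoDB releases removed this HTTP/REST interface, so it applies to the older servers these slides describe.

```python
from urllib.parse import urlencode, urlunsplit


def rest_query_url(host: str, database: str, collection: str,
                   filters: dict, port: int = 28017) -> str:
    """Build http://<host>:<port>/<database>/<collection>/?filter_<Field>=<value>"""
    query = urlencode({f"filter_{name}": value for name, value in filters.items()})
    return urlunsplit(("http", f"{host}:{port}", f"/{database}/{collection}/", query, ""))


if __name__ == "__main__":
    print(rest_query_url("127.0.0.1", "company", "Employee", {"Name": "Fernando"}))
    # http://127.0.0.1:28017/company/Employee/?filter_Name=Fernando
```

The JSON body returned by the server can then be decoded with `json.loads`.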
MongoDB .NET driver
• Can access documents as instances of Document class
• Represents document as key-value pairs
• Or, can serialize POCOs to database format (JSON)
• Deserialize database documents to POCOs
• Supports LINQ queries
• MapReduce queries can be expressed as LINQ queries
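The two access styles above are an untyped key-value "document" and a typed object serialized to and from the database format. The sketch below shows both, using JSON text where a real driver would produce BSON. `Employee`, `Address`, `to_document` and `from_document` are hypothetical names and not part of any MongoDB driver.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Address:
    city: str


@dataclass
class Employee:
    name: str
    address: Address  # embedded sub-document, not a separate collection
    skills: List[str] = field(default_factory=list)


def to_document(employee: Employee) -> str:
    # Object -> document text (a real driver would emit BSON)
    return json.dumps(asdict(employee))


def from_document(text: str) -> Employee:
    raw = json.loads(text)  # untyped view: plain key-value pairs
    return Employee(name=raw["name"],
                    address=Address(**raw["address"]),
                    skills=list(raw["skills"]))
```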
MongoDB schema design
• Collections are essentially named groupings of documents

Roughly equivalent to relational database tables
• Less "normalization" than a relational schema because there
are no server-side joins
• Generally, you will want one database collection for each of
your top level objects

Don’t want a collection for every "class" - instead, embed objects
[Figure: relational schema vs. document schema]
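The effect of having no server-side joins is sketched below for a post and its comments. The in-memory lists stand in for tables and a collection, and all names are hypothetical. The relational shape needs a second lookup plus a client-side join, while the embedded document is fetched whole.

```python
from typing import Dict

# Relational style: comments live in their own "table" and reference the post by key
posts_table = [{"id": 1, "title": "NoSQL intro"}]
comments_table = [
    {"post_id": 1, "author": "ann", "text": "Nice"},
    {"post_id": 1, "author": "bob", "text": "+1"},
]

# Document style: the comments are embedded inside the post document
posts_collection = [
    {"_id": 1, "title": "NoSQL intro",
     "comments": [{"author": "ann", "text": "Nice"},
                  {"author": "bob", "text": "+1"}]},
]


def load_post_relational(post_id: int) -> Dict:
    post = next(p for p in posts_table if p["id"] == post_id)          # lookup 1
    comments = [c for c in comments_table if c["post_id"] == post_id]  # lookup 2
    # The join happens in the client because the server does not do it
    return {**post, "comments": [{"author": c["author"], "text": c["text"]}
                                 for c in comments]}


def load_post_document(post_id: int) -> Dict:
    # A single fetch returns the post together with its comments
    return next(p for p in posts_collection if p["_id"] == post_id)
```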
Document example
• Save: [Figure: code for saving a document]
• Query: [Figure: code for querying the saved document]
Source: http://www.10gen.com/video/mongosv2010/schemadesign
MongoDB in C# applications - PI?
• Up to a point
• Collection class needs Id property of a specific type
(MongoDB.Oid)
• Object model needs to be designed with document schema
in mind
Further reading
• http://nosql-database.org/
• http://www.nosqlpedia.com/
• http://www.mongodb.org/
• http://www.codeproject.com/KB/database/MongoDBCS.aspx

Nice code example for C# and MongoDB