MongoDB Introduction
© 2014 - Zoran Maksimovic www.agile-code.com

MongoDB is a scalable, high-performance, open source, schema-free, document-oriented database

History
• 2007 - First developed (by 10gen)
• 2009 - Became open source
• 2010 - Considered production ready (v1.4 and later)
• 2013 - MongoDB closes $150 million in funding
• 2014 - Latest stable version (v2.6)
• Today - More than $231 million in total investment since 2007
• MongoDB Inc. valued at $1.2B

NoSQL Breakdown
• NoSQL encompasses a wide variety of database technologies that were developed in response to a rise in the volume of data
• Document databases pair each key with a complex data structure known as a document (MongoDB, Couchbase Server, CouchDB)
• Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an attribute name (or "key"), together with its value (DynamoDB, Windows Azure Table Storage, Riak, Redis, LevelDB, Dynomite)
• Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets, and store columns of data together, instead of rows
• Graph stores are used to store information about networks, such as social connections. Graph stores include Neo4J and HyperGraphDB

NoSQL made by big vendors
• Oracle NoSQL Database (Key-Value store)
• Microsoft Azure Table Storage (Key-Value store)
• Google: BigTable (proprietary)
• Google: LevelDB (Open Source key-value store)
• Amazon: SimpleDB (Wide Column store)
• Amazon: DynamoDB (Key-Value store)
• Apache: HBase, …
• Basho: Riak (Key-Value store)
• Facebook: Cassandra (Wide column store)

MongoDB in a nutshell
• Document-Oriented Storage » JSON-style documents with dynamic schemas offer simplicity and power.
• Full Index Support » Index on any attribute, just like you're used to.
• Replication & High Availability » Mirror across LANs and WANs for scale and peace of mind.
• Auto-Sharding » Scale horizontally without compromising functionality.
• Querying » Rich, document-based queries.
• Fast In-Place Updates » Atomic modifiers for contention-free performance.
• Map/Reduce » Flexible aggregation and data processing.
• GridFS » Store files of any size without complicating your stack.
• MongoDB Management Service » Monitoring and backup designed for MongoDB.
• Professional Support by MongoDB » Enterprise class support, training, and consulting available.

MongoDB is a document-oriented database
• Think of "documents" as database records. No schema!
• Documents are basically just JSON objects that MongoDB stores in a binary (BSON) format

MongoDB database structure

Embedded Data Model
When to use:
• "Contains" relationships between entities.
• One-to-many relationships between entities. In these relationships the "many" or child documents always appear with, or are viewed in the context of, the "one" or parent documents.
• Retrieving data in one query.
• Data redundancy.

Document oriented database – Normalized data model
When to use:
• When embedding would result in duplication of data but would not provide sufficient read performance advantages to outweigh the implications of the duplication.
• To represent more complex many-to-many relationships.
• To model large hierarchical data sets.
• Multiple queries!

May 14, 2014 - Zoran Maksimovic www.agile-code.com

Indexing
• All indexes in MongoDB are B-Tree indexes
• Index types:
  • Single field index
  • Compound index: more than one field in the collection
  • Multikey index: index on array fields
  • Geospatial index and queries
  • Text index: index on string content, for text search
  • TTL (time to live) index: documents are removed automatically after a limited time
  • Unique index: the value of the indexed field has to be unique
  • Sparse index: stores an index entry only for documents that contain the indexed field

Security
• Authentication:
  • MongoDB's default username/password authentication
  • x509 certificate authentication
  • LDAP proxy authentication
  • Kerberos authentication
• Authorization:
  • Role-based access control

Replication
• Replication provides redundancy and increases data availability

Sharding (Horizontal scaling)
• Sharding is a method for storing data across multiple machines
• Useful when the HDD, CPU or RAM limits of a single machine are reached
• Vertical scaling vs. horizontal scaling
• Range-based vs. hash-based sharding

How to access MongoDB?
Drivers: http://docs.mongodb.org/ecosystem/drivers/downloads
Administration interfaces: http://docs.mongodb.org/ecosystem/tools/administration-interfaces

C# code example
    using MongoDB.Bson;
    using MongoDB.Driver;
    using MongoDB.Driver.Builders;

    // Connect to a local server
    var connectionString = "mongodb://localhost";
    var client = new MongoClient(connectionString);
    var server = client.GetServer();

    // Class mapped to the documents of the collection
    public class Entity
    {
        public ObjectId Id { get; set; }
        public string Name { get; set; }
    }

    var database = server.GetDatabase("test");
    var collection = database.GetCollection<Entity>("entities");

    // Insert a new entity
    // Resulting document: { _id: "13098098", Name: "Tom" }
    var entity = new Entity { Name = "Tom" };
    collection.Insert(entity);
    var id = entity.Id;

    // Retrieve
    var query = Query<Entity>.EQ(e => e.Id, id);
    entity = collection.FindOne(query);

    // Save (Update) -> sends the full content of the entity to be updated.
    // Resulting document: { _id: "13098098", Name: "Nick" }
    entity.Name = "Nick";
    collection.Save(entity);

    // Update -> sends partial content of the entity to be updated.
    // Document before the update: { _id: "13098098", Name: "Nick" }
    var update = Update<Entity>.Set(e => e.Name, "Harry");
    collection.Update(query, update);

    // Delete the entity
    collection.Remove(query);

Some of the MongoDB Shell methods
• db.inventory.find( { type: "snacks" } )
• db.inventory.find( { type: 'food', price: { $lt: 9.95 } } )
• db.inventory.insert( { _id: 10, type: "misc", item: "card", qty: 15 } )
• db.inventory.find( { type: 'food' } ).explain()

Output of explain():
    {
      "cursor": "BtreeCursor type_1",
      "isMultiKey": false,
      "n": 5,
      "nscannedObjects": 5,
      "nscanned": 5,
      "nscannedObjectsAllPlans": 5,
      "nscannedAllPlans": 5,
      "scanAndOrder": false,
      "indexOnly": false,
      "nYields": 0,
      "nChunkSkips": 0,
      "millis": 0,
      "indexBounds": { "type": [ [ "food", "food" ] ] },
      "server": "mongodb0.example.net:27017"
    }

What is missing (from the RDBMS perspective)
• No JOIN support
• No complex transaction support
• No constraint support (constraints have to be implemented at the application level)

Where/When to use?
• Main drivers:
  • Large amounts of data (Twitter: ~12TB of data per day!)
  • Easier development (according to surveys): less of the object-relational impedance mismatch problem
• In general:
  • Content Management and Delivery: serve content, as well as the associated metadata (attachments, images, binary)
  • Big Data that is too diverse, fast-changing, or massive. This includes a wide variety of apps such as genomics, clickstream analysis, customer sentiment analysis, log data collection, etc.
  • Analytics and Reporting (data warehouse)
  • Market Data Management

Problems
• Maturity!!!
• Skillset?
• Organizational change?
• What about the future?

Q&A
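
Appendix: embedded vs. normalized data model (sketch)
As a rough illustration of the two data models discussed earlier, the following plain JavaScript sketch contrasts an embedded document with a normalized (referencing) layout. It does not use MongoDB or any driver; the `addresses` field, the `userId` reference, the helper functions and the sample city values are all assumptions invented for this example. It only shows why the embedded model needs one lookup while the normalized model needs two.

```javascript
// Embedded model: the "many" side (addresses) lives inside the parent
// document, so a single read returns everything.
// Field names and values are hypothetical, for illustration only.
const embeddedUser = {
  _id: 1,
  name: "Tom",
  addresses: [
    { city: "Zurich", type: "home" },
    { city: "Geneva", type: "work" },
  ],
};

// Normalized model: addresses are separate documents that reference
// the user through a hypothetical userId field.
const users = [{ _id: 1, name: "Tom" }];
const addresses = [
  { _id: 10, userId: 1, city: "Zurich", type: "home" },
  { _id: 11, userId: 1, city: "Geneva", type: "work" },
];

// One lookup: the cities come along with the parent document.
function citiesEmbedded(user) {
  return user.addresses.map((a) => a.city);
}

// Two lookups: first the user, then the address documents that
// reference it. The application performs the "join" itself.
function citiesNormalized(userId) {
  const user = users.find((u) => u._id === userId);
  if (!user) return [];
  return addresses
    .filter((a) => a.userId === user._id)
    .map((a) => a.city);
}

console.log(citiesEmbedded(embeddedUser));
console.log(citiesNormalized(1));
```

Both functions return the same list of cities. The difference is where the relationship is resolved: inside the stored document (embedded) or in application code across two collections (normalized).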