MarkLogic Overview of Key Features

© COPYRIGHT 2014 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

MarkLogic / Enterprise NoSQL Database Platform

POWERFUL: Native JSON Store; Native XML Store; Native RDF Triple Store; Geospatial Support; Full-text Search; Flexible Indexes; Bitemporal; Real-time Alerting; Semantic Inference; Tiered Storage; Server-side JavaScript; Fully Transactional

AGILE: Scalable and Elastic; Cloud Ready (AWS); Hadoop and HDFS; REST API; SQL Support; Multi-OS Support; Schema Agnostic; Samplestack; MarkLogic Content Pump; XA Transactions; Ad-hoc Queries; Index Across Data Types

TRUSTED: Performance at scale; LDAP and Kerberos Security; Security Certifications; Monitoring and Management; Configuration Management; 24/7 Engineering Support; ACID Transactions; Flexible Replication; Customizable Backup; Customizable Failover; Point-in-time Recovery; Atomic Forests

Continuous Innovation

2003 Cerisent XQE Server 1: ACID Transactions; Text Based Search; Backup and Restore; Linux Support; Web-based Protocols HTTP and XDBC; XQuery
2004 MarkLogic Server 2: Clustering; Role-based security w/BASIC authentication; Document Collections; Enhanced Search (stemming, thesaurus, wildcard); WebDAV support; Document locking; Enhanced XDBC support
2005 MarkLogic Server 3: Advanced Search Features; Content processing (including PDF, Word, Excel, PPT); HTTP calls; Failover; Support for Linux, Windows Server, .NET
2006 MarkLogic Server 3.1: Advanced Search Features; Wildcard queries; Directories; Forward Compatibility; Support for Sun Solaris; XCC
2008 MarkLogic Server 4: Alerting; Entity Enrichment; Geospatial; Analytics (co-occurrence, value lexicons, bucketing); Modular documents; Security auditing; HA: forest-level failover
2010 MarkLogic Server 4.2: Replication; Failover; Database Rollback; Compartment Security; Search Optimizations; Search API; Information Studio; Application Builder
2011 MarkLogic Server 5: Complete Enterprise Roadmap; Database Replication; Multi-statement and distributed transactions; Point-in-time recovery; Start Hadoop Roadmap; Hadoop Connector
2012 MarkLogic 6: Accessibility; SQL/BI; Java/REST/JSON; UDFs/Analytics; mlcp; Hadoop Distributions; HDFS Tech Preview
2013 MarkLogic 7: Semantics Foundation; Next-gen Infrastructure Support; Elasticity; Tiered Storage; Continue Hadoop Roadmap; Run on HDFS
2014 MarkLogic 8: JSON Storage; Server-side JavaScript; Semantics; Bitemporal; Samplestack; Java Client API; Node.js Client API; Management API; Incremental Backup; Flexible Replication; Enhanced HTTP Server

Enterprise NoSQL Database Platform

Flexible Data Model: Store and manage JSON, XML, RDF, and Geospatial data with a document-centric, schema-agnostic database
Search and Query: Lightning fast, sophisticated, sub-second search and query across all of your data
Semantics: Store linked data as RDF and query it with SPARQL
Scalability and Elasticity: Scale to petabytes of data without over-provisioning or over-spending
ACID Transactions: Avoid data loss, data corruption, and stale reads, even at speed and scale
Certified Security: Government-grade, granular, role-based security
Hadoop Integration: Make your Hadoop better by connecting it to MarkLogic

Flexible Data Model
Store and manage JSON, XML, RDF, and Geospatial data with a document-centric, schema-agnostic database

- JSON, XML, RDF, Geospatial data, and also large binaries, all stored and managed on a single unified platform
- Document-centric and schema-agnostic for agility, reducing lost fidelity and functionality from data conversion and brittle ETL
- Use the data format that makes the most sense, keeping the data in its most readable form

SLIDE: 5
Flexible Data Model
Schema-agnostic, structure-aware

    <report>
      <title>Suspicious vehicle near airport</title>
      <date>2012-11-12Z</date>
      <type>observation/surveillance</type>
      <threat>
        <type>suspicious activity</type>
        <category>suspicious vehicle</category>
      </threat>
      <location>
        <lat>37.497075</lat>
        <long>-122.363319</long>
      </location>
      <description>A blue van with license plate ABC 123 was observed parked behind the airport sign…
        <triple><subject>IRIID</subject><predicate>isa</predicate><object>license-plate</object></triple>
        <triple><subject>IRIID</subject><predicate>value</predicate><object>ABC 123</object></triple>
      </description>
    </report>

SLIDE: 6

Search and Query
Built-in search to find answers in documents, relationships, and metadata

In MarkLogic, a search is a query, and a query is a search

JavaScript, XQuery, SPARQL

- Ingest your data as-is and rely on over 30 sophisticated indexes to get better answers from today’s data
- Lightning fast, sub-second search across hundreds of terabytes of data and billions of documents
- Powerful, agile development providing complex query capability across heterogeneous data
- Full-featured UX with full-text search, type-ahead suggestions, facets, snippeting, highlighted search terms, proximity boosting, relevance ranking, and language support

Full-text Search; Rich Query Capability; Geospatial Search; In-database MapReduce; Semantic Search

SLIDE: 7

Universal Index
Which vetted reports contain the phrase blue van?

Term                                    Term List
“blue”                                  123, 125, 129, 130, 152, 344, …
“van”                                   123, 125, 126, 129, 130, 152, …
“observed”                              125, 152, 516, 522, 765, 890, …
“blue van”                              123, 125, 129, 130, 152, 486, …
STEM “observe”                          125, 152, 516, 522, 765, 890, …
<report>                                …
<report>/<location>                     …
<threat>/<category>                     …
<type>suspicious activity</type>        …
<date>2012-11-12Z</date>                …
Collection:Vetted                       …
Role:Analyst + Action:Read              …

Document References: 125, 516, 890, …

MarkLogic indexes:
- Words
- Phrases
- Stemmed words and phrases
- Structure
- Words and phrases in the context of structure
- Values
- Collections
- Security Permissions

SLIDE: 8

Range Index
Which vetted reports containing the phrase blue van were submitted before 2014?

[Diagram: the term list from the Universal Index slide, resolving to Document References 125, 516, 890, …]

Range indexes map document IDs to values, and vice versa, in a compact in-memory representation.

SLIDE: 9

Geospatial Index
Which vetted reports about a blue van from before 2014 refer to a location near the airport?

[Diagram: the term list from the Universal Index slide, resolving to Document References 125, 516, 890, …]

The Geospatial index is like a 2D range index, with built-in query support for point, box, circle, and complex polygons.

SLIDE: 10

Triple Index
Which vetted reports about a blue van from before 2013 with this location refer to partial plate ABC?

[Diagram: the term list from the Universal Index slide, resolving to Document References 125, 516, 890, …]

The Triple index is an index of “facts” expressed as Semantic triples. It can efficiently query and join billions of “linked data” triples.

SLIDE: 11

Semantics
Enterprise triple store, document store, and database combined

- Store and query billions of facts and relationships; infer new facts
- Facts and relationships provide context for better search
- Flexible data modeling: integrate and link data from different sources
- Standards-based for ease of use and integration – RDF, SPARQL, and standard REST interfaces

SLIDE: 12

Scalability and Elasticity
Massive enterprise scalability and elasticity

- Scale horizontally in clusters on commodity hardware to hundreds of nodes, petabytes of data, and billions of documents
- Process thousands of transactions per second with distributed XA transactions
- Start small and scale up or down to meet capacity and performance demands without over-provisioning or over-spending
- Even better with Tiered Storage

[Diagram: a cluster with two E-NODEs above three D-NODEs]

SLIDE: 13
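The index slides above answer a question by looking up each term's list of document IDs and keeping only the IDs common to all of them. The following is a minimal Python sketch of that intersection step, using the term lists printed on the Universal Index slide (each one cut off where the slide shows an ellipsis). The dictionary layout and the `resolve` helper are illustrative assumptions, not MarkLogic's internal index format.

```python
# Term lists copied from the Universal Index slide. Each list is truncated
# where the slide shows an ellipsis, so results cover only the visible IDs.
TERM_LISTS = {
    "blue": [123, 125, 129, 130, 152, 344],
    "van": [123, 125, 126, 129, 130, 152],
    "observed": [125, 152, 516, 522, 765, 890],
    "blue van": [123, 125, 129, 130, 152, 486],
}


def resolve(terms, term_lists=TERM_LISTS):
    """Return the sorted document IDs that appear in every requested term list."""
    if not terms:
        return []
    # Start from the shortest list so the working set shrinks quickly.
    ordered = sorted(terms, key=lambda t: len(term_lists.get(t, [])))
    result = set(term_lists.get(ordered[0], []))
    for term in ordered[1:]:
        result &= set(term_lists.get(term, []))
    return sorted(result)


if __name__ == "__main__":
    # Documents containing the phrase "blue van" and the word "observed".
    print(resolve(["blue van", "observed"]))  # prints [125, 152]
```

Structure, collection, and permission constraints would be further entries in the same lookup table under this sketch, which is why one intersection can combine text, structure, and security in a single pass.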
ACID Transactions
Don’t settle for a BASE-ic database

- Reads and writes are durably logged to disk, and strongly isolated from other transactions
- Prevents data corruption, stale reads, and inconsistent data, all common problems with databases that settle for eventual consistency, and all unacceptable
- No performance drop-offs at scale. Production applications run tens of thousands of very complex transactions per second for tens of thousands of users
- Accomplished using MVCC (multi-version concurrency control)

SLIDE: 14

ACID Transactions Implemented Using MVCC

[Diagram: two versions of /articles/doc1.xml drawn as trees of fragments (Document, Title, Author, Year, Metadata, First Section, Section, Last Section). Each fragment carries a Creation Timestamp and a Deletion Timestamp; the values shown are 423, 628, and ∞.]

MVCC Benefits:
- ACID transactions
- Zero-latency search indexing
- High throughput
- Lock-free reads
- Serial writes
- Point-in-time query
- Fast database rollback

SLIDE: 15

Government-Grade Security
Certified, granular security for modern data governance

- Certified security – Higher security certifications than any other NoSQL database, carrying a Common Criteria Security Certification and being certified to run in classified government systems
- Granular Security – Role Based Access Control (RBAC) at the document level, and can also employ other models for cell-level security

[Diagram: Data Governance With MarkLogic: Security, Retention, Privacy, Continuity, Provenance, Compliance]

SLIDE: 16
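The MVCC slide above labels each fragment with a creation timestamp and a deletion timestamp. The sketch below shows, under stated assumptions, how such a pair of timestamps can support lock-free reads and point-in-time queries: an update never overwrites, it closes the old version and appends a new one, and a reader at timestamp T sees whichever version satisfies created <= T < deleted. The `Version` class, the `update` and `visible_at` helpers, and the sample contents are hypothetical; only the timestamps 423 and 628 and the URI come from the slide.

```python
import math
from dataclasses import dataclass


@dataclass
class Version:
    uri: str
    content: str
    created: int               # timestamp at which this version became visible
    deleted: float = math.inf  # infinity means "not yet deleted"


def update(versions, uri, content, now):
    """Write a new version instead of overwriting: close the old one, append the new."""
    for v in versions:
        if v.uri == uri and v.deleted == math.inf:
            v.deleted = now
    versions.append(Version(uri, content, created=now))


def visible_at(versions, uri, timestamp):
    """Return the content a reader at `timestamp` sees, or None if nothing is visible."""
    for v in versions:
        if v.uri == uri and v.created <= timestamp < v.deleted:
            return v.content
    return None


if __name__ == "__main__":
    store = []
    update(store, "/articles/doc1.xml", "first draft", now=423)
    update(store, "/articles/doc1.xml", "second draft", now=628)
    print(visible_at(store, "/articles/doc1.xml", 500))  # prints first draft
    print(visible_at(store, "/articles/doc1.xml", 700))  # prints second draft
    print(visible_at(store, "/articles/doc1.xml", 100))  # prints None
```

Because old versions stay in place until they are cleaned up, a reader holding timestamp 500 is unaffected by the write at 628, which is one way to read the "lock-free reads" and "point-in-time query" benefits listed on the slide.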
MarkLogic and Hadoop
Make Hadoop better by connecting it to MarkLogic

- Handle both operational and analytical workloads
- Run MarkLogic directly on HDFS, against data staged there
- Leverage the economics of HDFS by using it as a storage tier for archival data
- Connect MarkLogic to Hadoop to run large-scale MapReduce jobs for ETL, analytics, or enrichment

SLIDE: 17

MORE FEATURES

Tiered Storage
Move data to different tiers based on cost and performance trade-offs

- Use a fluid mix of flash storage, traditional local or shared disk storage, HDFS, or Amazon cloud storage
- Migrate data automatically between storage tiers without any ETL, additional software, or expensive infrastructure changes
- Optimize data availability while reducing the costs of storage
- Manage data across the information lifecycle

SLIDE: 19

High Availability and Disaster Recovery
Keep your data continuously safeguarded and available

- HA using shared-disk or local-disk failover
  – Shared-nothing architecture so there is no single point of failure
  – ACID transactions ensure full redundancy and consistency
- DR using customizable database replication
  – Point-in-time recovery with journal archiving
  – Incremental Backup consumes less storage and can be completed quickly

Database Replication: Full database replication with journal frames that enable point-in-time disaster recovery

SLIDE: 20

Cloud Deployment
Get started quickly on AWS for only 99 cents an hour

- Build a single cluster in minutes on Amazon Web Services using prepackaged AMIs
- Easily and quickly scale up or back down
- Blend on-premises, virtualized, and AWS nodes in a single cluster, and scale out without downtime
- Flexibility to move licenses across your environment as changes occur

SLIDE: 21
MARKLOGIC 8 FEATURES

MarkLogic 8 / More Powerful, Easier to Use

Developer Experience: MarkLogic 8 is more powerful than ever, but remarkably easy to use
JSON: Unified indexing and query for today’s web and SOA data
Node.js Client API: Enterprise NoSQL database for Node.js applications
Java Client API: NoSQL agility in a pure Java interface
Server-side JavaScript: JavaScript runtime inside MarkLogic using Google’s V8
Semantics: Enterprise triple store, document store, database combined
Bitemporal: Track information along two dimensions of time

JSON
Unified indexing and query for today’s web and SOA data

- Speed up development with powerful built-in search, transformation, and alerting capabilities designed for JSON
- Reduce lost fidelity and functionality from data model translations and brittle ETL
- Simplify architecture with data, metadata, and relationships managed consistently and securely together
- Ease modern, end-to-end JavaScript development

    {
      "_id": 1,
      "name": "MarkLogic",
      "supports": [
        { "datatype": "XML", "year": 2003 },
        { "datatype": "JSON", "year": 2014 }
      ]
    }

SLIDE: 24

Node.js Client API
Enterprise NoSQL database for Node.js applications

- Focus on application features rather than plumbing with out-of-the-box search, transactions, aggregates, alerting, geospatial, and more
- Move faster to production with proven reliability at scale
- Maximize performance and flexibility by bringing code to the data
- Enable modern end-to-end JavaScript development

Always open source on GitHub
Participate. Contribute. Fork it.

SLIDE: 25
Java Client API
NoSQL agility in a pure Java interface

- Faster development and less custom code with out-of-the-box data management, search, and alerting
- Pure Java query builder and conveniences for POJOs, JSON, XML, and binary I/O
- Built-in extensibility for moving performance-critical code to the database

Always open source and developed on GitHub
Participate. Contribute. Fork it.

SLIDE: 26

Server-side JavaScript
JavaScript runtime inside MarkLogic using Google’s V8

- Run code near the data for unparalleled power, efficiency
- Build applications faster from a growing pool of skills, tools
- Reduce risk with proven performance and reliability
- Decrease brittle ETL and lost fidelity and functionality from JSON data conversions
- Pair with Node.js to ease full-stack JavaScript development

[Diagram: Front End, Middle Tier, Database Layer]

SLIDE: 27

Samplestack
An end-to-end three-tiered application in Java and Node.js

- Encapsulates best practices and introduces key MarkLogic concepts
- Use sample code as a model for building applications more quickly
- Modern technology stack shows where MarkLogic fits in your environment

[Diagram: Front End, Middle Tier, Database Layer]

Participate. Contribute. Fork it.

SLIDE: 28

Semantics
Enterprise triple store, document store, database combined

- Store and query billions of facts and relationships; infer new facts
- Facts and relationships provide context for better search
- Flexible data modeling: integrate and link data from different sources
- Standards-based for ease of use and integration – RDF, SPARQL, and standard REST interfaces
- Even better with Built-in Search and Bitemporal – Triples, documents, and data combined

SLIDE: 29
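The Semantics slide above mentions inferring new facts from stored triples. The sketch below illustrates one simple way such inference can work, using the deck's own example (John lives in London, London is in England). The predicate names `livesIn` and `isIn` and the two hand-written rules are assumptions made for illustration; they do not represent MarkLogic's rule sets or its SPARQL interface.

```python
def infer(triples):
    """Forward-chain over (subject, predicate, object) triples until nothing new appears.

    Rules (both hypothetical):
      (x livesIn y) and (y isIn z)  =>  (x livesIn z)
      (x isIn y)    and (y isIn z)  =>  (x isIn z)
    """
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        # Snapshot the containment facts known at the start of this pass.
        contained = [(s, o) for (s, p, o) in facts if p == "isIn"]
        for (s, p, o) in list(facts):
            if p not in ("livesIn", "isIn"):
                continue
            for (inner, outer) in contained:
                if o == inner:
                    new_fact = (s, p, outer)
                    if new_fact not in facts:
                        facts.add(new_fact)
                        changed = True
    return facts


if __name__ == "__main__":
    stored = {
        ("John", "livesIn", "London"),
        ("London", "isIn", "England"),
    }
    inferred = infer(stored) - stored
    print(inferred)  # prints {('John', 'livesIn', 'England')}
```

The inferred triple is kept separate from the stored ones here so that it is clear which facts were loaded and which were derived.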
Bitemporal
Timing is everything

- Rewind the information “as it actually was” in combination with “as it was recorded” at some point in time
- Provides increased insight into your business and mission
- Capture evolving schema as the shape of the data changes over time, a capability whose absence has kept relational bitemporal offerings from being widely adopted
- Critical for anyone in regulated industries
- Even better because of Tiered Storage and Semantics

[Diagram: EVENT 1, EVENT 2, and EVENT 3 plotted against a Valid Time axis and a System Time axis]

Valid Time – Real-world time, information “as it actually was”
System Time – Time it was recorded to the database

SLIDE: 30

Management API
REST-based API to manage all MarkLogic capabilities

- Increase efficiency and agility by automating time-consuming repetitive tasks across production, testing, and development
- Reduce setup time and admin error by orchestrating multi-step configurations and deployments
- Fit more seamlessly into IT environments by using REST interfaces rather than CLI or proprietary APIs
- Perform automated testing and monitor performance using market tools that support REST
- Even better with Client REST API, Elasticity

SLIDE: 31

Incremental Backup
Faster backups while using less storage

- Store only changes since the previous full or incremental backup
- Consume less storage for backup copies
- Reduce backup window
- Improve availability with multiple daily backups
- Work with Log Archiving to enable fine-grained point-in-time recovery

[Diagram: a week from Sunday to Sunday, with a FULL backup on each Sunday and INCREMENTAL BACKUP (differential) on Monday through Saturday]

SLIDE: 32
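The Incremental Backup slide above describes storing only the changes since the previous full or incremental backup. A minimal sketch of that idea over an in-memory dictionary of documents follows. The function names, the use of `None` as a deletion marker, and the sample URIs are assumptions for illustration; this is not MarkLogic's backup format.

```python
def incremental(previous_state, current_state):
    """Return only what changed since the previous backup.

    New or changed documents map to their new content (assumed to be
    non-None strings); deleted documents map to None.
    """
    delta = {}
    for uri, content in current_state.items():
        if previous_state.get(uri) != content:
            delta[uri] = content
    for uri in previous_state:
        if uri not in current_state:
            delta[uri] = None
    return delta


def restore(full_backup, incrementals):
    """Rebuild the latest state by applying each incremental, oldest first."""
    state = dict(full_backup)
    for delta in incrementals:
        for uri, content in delta.items():
            if content is None:
                state.pop(uri, None)
            else:
                state[uri] = content
    return state


if __name__ == "__main__":
    sunday = {"/a.json": "v1", "/b.json": "v1"}
    monday = {"/a.json": "v2", "/b.json": "v1", "/c.json": "v1"}
    tuesday = {"/a.json": "v2", "/c.json": "v1"}

    inc_monday = incremental(sunday, monday)    # one changed, one new document
    inc_tuesday = incremental(monday, tuesday)  # one deleted document
    print(restore(sunday, [inc_monday, inc_tuesday]) == tuesday)  # prints True
```

In this sketch `/b.json` is unchanged on Monday, so it is absent from Monday's delta, which is where the storage saving comes from.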
Flexible Replication
Customizable information sharing between systems

- Enable content collaboration across numerous systems
- Support directly connected or mobile users
- Provide data that users need using simple configurable parameters or queries
- Ensure data consistency and security with simple workflows
- Even better with Bitemporal and Management API

SLIDE: 33

Enhanced HTTP Server
Simple and fast client-server interactions out-of-the-box

- Use a single interface when employing the REST API, custom HTTP, or XCC/XDBC to connect to any database
- Deliver ease of use by removing the need to create extra ports
- Simplify out-of-the-box interaction and potentially improve client/server performance
- Provide an improved and more efficient developer experience with MarkLogic

SLIDE: 34

APPENDIX

MarkLogic / Enterprise NoSQL Database Platform

POWERFUL
Better answers from today’s data: MarkLogic is built to find answers in documents, relationships, and metadata
The intelligent data layer: An intelligent data layer powers intelligent applications and makes them faster and more flexible than any alternative

AGILE
Adaptive to every environment: MarkLogic runs well everywhere, while preserving the option to change hardware, data, and scale later
Simpler data integration: MarkLogic accelerates and simplifies data sharing across silos, cutting down on ETL and making agile development possible

TRUSTED
Hardened, proven platform: MarkLogic has a proven track record of performance under all enterprise conditions
Uncompromised data resiliency: MarkLogic will keep your data safe and whole, no matter what happens in your application or at your data center

// POWERFUL / Deliver more value, build better apps

Native JSON Store: Store and manage data natively as JSON documents, speeding up development and reducing data transformation with a simplified architecture for end-to-end JavaScript development.

Native XML Store: Store and manage data natively as XML documents, a hierarchical self-describing data type that is ideal for a wide variety of applications.

Native RDF Triple Store: Store RDF triples and query them using SPARQL, providing context to your data and better search with a database that can handle a combination of documents, data, and triples.

Geospatial Support: Store geospatial data such as GML, KML, and GeoRSS and do complex queries on the data or in combination with other data types. Also integrate with ESRI ArcGIS and Google Maps for visualization.

Full-text Search: Built-in, lightning fast search and query capabilities across hundreds of billions of documents, plus a full-featured UX with type-ahead suggestions, facets, snippeting, relevance ranking, and language support.

Flexible Indexes: Rely on over 30 sophisticated, composable indexes including a universal index, range index, geospatial index, and triple index, all designed so that developers can ask harder questions and get faster responses.

Bitemporal: Handle historical data along two different timelines, making it possible to rewind the information “as it actually was” in combination with “as it was recorded” at some point in time.

Real-time Alerting: Create an unlimited number of real-time alerts by email or text using the alerting API and reverse indexes. Whenever a document is loaded that matches a specific query, you’ll know.

Semantic Inference: Work with new data that didn’t exist before. For example, if John lives in London, and London is in England, then MarkLogic can infer that “John lives in England” and then add that new fact to your semantic search.

Tiered Storage: Store and manage data in different tiers based on cost and performance trade-offs, and easily migrate between tiers without any ETL, additional software, or expensive infrastructure changes.

Server-side JavaScript: Live in JavaScript. Run JavaScript near the data for unparalleled power and efficiency with a high performance JavaScript runtime inside MarkLogic using Google’s V8.

Fully Transactional: Run complex distributed transactions across multiple documents and collections with no performance drop-offs at scale. Production applications run tens of thousands of transactions per second for tens of thousands of users.

// AGILE / Prepare for and respond to change

Scalable and Elastic: Handle petabytes of data without over-provisioning, over-spending, or experiencing downtime, inconsistency, or risk of data loss.

Cloud Ready (AWS): Use MarkLogic’s cloud templates to get up and running quickly on AWS or other cloud environments, starting with a three-node cluster or a large cluster with over a hundred nodes.

Hadoop and HDFS: Make Hadoop better by connecting it to MarkLogic and using it as part of an infrastructure to handle both operational and analytic workloads.

REST API: Configure and administer MarkLogic with a single REST-based API. This provides more programmatic control than ever before, giving DBAs the power and flexibility necessary to run a modern data center.

SQL Support: Use a relational SQL data model within MarkLogic, connecting to SQL-based tools using the ODBC driver, or execute SQL commands against relational databases using the MLSAM open-source XQuery library.

Multi-OS Support: Run MarkLogic on Windows, Linux, Solaris, OS X. MarkLogic runs easily and is easy to set up in your environment, whether in the cloud, virtualized, or on premises.

Schema Agnostic: Only use schema when you need it. Ingest all your data as-is, whether structured or unstructured, using the NoSQL document model rather than being forced to use a predefined schema.

Samplestack: Get going fast on MarkLogic with Samplestack, an end-to-end three-tiered sample application designed to show developers how to implement a reference architecture using key MarkLogic concepts and sample code.

MarkLogic Content Pump: MLCP is a command-line tool that makes it easy to quickly import or export documents and metadata from MarkLogic, or to copy from one database to another.

XA Transactions: Run distributed transactions across a cluster using the XA (eXtended Architecture) standard, which ensures ACID properties for global transaction processing.

Ad-hoc Queries: Don’t plan your queries in advance of ingesting your data. MarkLogic is designed for search and discovery so that you can run any query at any time and get real-time results.

Index Across Data Types: Use multiple indexes in concert across multiple data types, giving you the power to search and query all of your data.

// TRUSTED / Enterprise-ready for mission-critical uses

Performance at scale: Scales easily to handle hundreds of terabytes using shared-nothing architecture in which data partitions are completely independent of each other and can act independently.

LDAP and Kerberos Security: Use third party authentication from LDAP or Kerberos, making the most secure NoSQL database easier to manage.

Security Certifications: Secure your data with government-grade security. MarkLogic has certified, granular security for modern data governance and to handle the increased complexity of today’s cyber threats.

Monitoring and Management: Use the Management API for cluster management, process automation, access controls, database cloning, audit trails, and connections to third-party interfaces.

Configuration Management: View and manage the configuration settings for MarkLogic databases, forests, application servers, groups, or hosts, and easily propagate changes across the entire cluster.

24/7 Engineering Support: Rely on the 24/7, all-engineer support staff, whether you need answers fast or just want some friendly tips on saving a few milliseconds on performance.

ACID Transactions: Don’t settle for a BASE-ic database. Use ACID transactions to ensure you don’t run the risk of encountering data corruption, stale reads, and inconsistent data, all of which are unacceptable.

Flexible Replication: Enable customizable information sharing between systems, allowing for the easy and secure distribution of portions of data even across disconnected, intermittent, and latent networks.

Customizable Backup: Restore the database quickly with minimal downtime, relying on full and consistent backups, hot configuration changes, and automatic index optimization without shutting down the system.

Customizable Failover: Have confidence that your data is always available, reducing risk and avoiding interruptions with automated local- or shared-disk failover made possible with shared-nothing architecture.

Point-in-time Recovery: Roll back to a specified point in time by replaying journal archives, an additional feature to ensure disaster recovery and ease of management.

Atomic Forests: Manage data in collections of documents similar to partitions, called forests, that exist independently and enable scalability and elasticity, rebalancing, efficient operations, and easier data governance.
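Point-in-time Recovery is described above as replaying journal archives up to a chosen moment. The sketch below shows the general shape of that replay, assuming each journal entry is a `(timestamp, uri, content)` tuple and that `None` content marks a deletion. The tuple layout, the `recover` name, and the sample data are hypothetical and do not reflect MarkLogic's journal format.

```python
def recover(backup_state, journal, target_time):
    """Replay journal entries with timestamp <= target_time on top of a backup."""
    state = dict(backup_state)
    for timestamp, uri, content in sorted(journal, key=lambda entry: entry[0]):
        if timestamp > target_time:
            break  # everything after the target moment is ignored
        if content is None:
            state.pop(uri, None)
        else:
            state[uri] = content
    return state


if __name__ == "__main__":
    backup = {"/a.xml": "v1"}
    journal = [
        (10, "/a.xml", "v2"),  # update
        (20, "/b.xml", "v1"),  # insert
        (30, "/a.xml", None),  # delete
    ]
    print(recover(backup, journal, target_time=20))
    # prints {'/a.xml': 'v2', '/b.xml': 'v1'}
```

Choosing `target_time=20` stops just before the deletion at timestamp 30, which is the kind of "undo a bad change" scenario point-in-time recovery is meant for.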