
WHAT THE MARKET-LEADING DBMS VENDORS DON’T WANT YOU TO KNOW
Disruption is gathering steam
Curt Monash

Analyst since 1981




Covered DBMS since the pre-relational days
Also analytics, search, etc.
Own firm since 1987
Publicly available research



Blogs, including DBMS2 (www.dbms2.com -- the source for most of this talk)
Feed at www.monash.com/blogs.html
White papers and more at www.monash.com
Database diversity

Mike Stonebraker, PhD


Curt Monash, PhD



“One size doesn’t fit all”
“Horses for courses”
“Database diversity”
Mike and Curt

The world needs 9 to 11 different kinds of data management software
The case for grand integrated DBMS





Theoretical relational model has great advantages
Actual relational DBMS are versatile and modular
Software developers have economies of scale
Vendor consolidation theoretically saves effort and money
So does database consolidation
The case for database diversity



Different kinds of data require fundamentally different kinds of data management software
Putting all that together in one system is extremely hard
Nobody has ever done it well
Application and use cases






High-end e-commerce
100-terabyte analytics
High-volume call center
Media-heavy web startup
Simple departmental application
General enterprise or SaaS app

End-user or ISV
Data management distinctions

Fundamental



Data manipulation language
Data access method
Practical




Type of data
Type of hardware
Administrative burden
Performance stresses and metrics
Very practical
$
Major components of DBMS cost

License and maintenance


Hardware, power, facilities


Mainly for VLDB analytics
Installation and ongoing administration


Especially maintenance
Time-to-benefit is a factor too
Programming

Sometimes a differentiator
11 kinds of data management software
1. High-end OLTP/general-purpose DBMS
2. Mid-range OLTP/general-purpose DBMS
3. Row-based analytic RDBMS
4. Column- or array-based analytic RDBMS
5. Text search engines
6. XML and OO DBMS (but these may merge with search)
7. RDF and other graphical DBMS (but these may merge with relational)
8. Event/stream processing engines (aka CEP)
9. Embedded DBMS for devices
10. Sub-DBMS file managers (e.g. MapReduce/Hadoop)
11. Science DBMS
High-end OLTP/general-purpose DBMS



Oracle, DB2, MS SQL Server, et al.
Amazing throughput and scale-up
Bullet-proofing




24/7
Security certifications
Datatype extensibility
Expensive, expensive, expensive
Mid-range OLTP/general-purpose DBMS

Three main groups
Crippled high-end (“Express” editions)
ISV/VAR-focused (Progress, several nonrelational)
Open source-based (Postgres, MySQL)
Some are comparable to (or better than) the systems that ran the world in the 1990s
What does the Postgres family still lack?
Generally inexpensive
Row-based analytic RDBMS

Data warehouses should be in separate instances




But that’s not enough
Sequential vs. random reads
MPP vs. SMP
Teradata, Netezza, DATAllegro
Column- or array-based analytic RDBMS

Retrieving whole rows carries penalties



Columnar is better


I/O
Optimization
But not in all use cases
MOLAP may be superseded
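
A minimal sketch of the I/O point, assuming a toy table whose column names, byte widths, and row count are invented for illustration (none of these figures come from the talk). When a query touches one column, a row store still reads every full row, while a column store reads only the column it needs.

```python
# Toy illustration only: table shape and byte widths are assumptions.
COLUMN_WIDTHS = {"order_id": 8, "customer_id": 8, "amount": 8, "notes": 200}
NUM_ROWS = 1_000_000


def bytes_read_row_store(needed_columns):
    """A row store fetches whole rows, whichever columns the query needs."""
    row_width = sum(COLUMN_WIDTHS.values())
    return NUM_ROWS * row_width


def bytes_read_column_store(needed_columns):
    """A column store fetches only the columns the query references."""
    return NUM_ROWS * sum(COLUMN_WIDTHS[c] for c in needed_columns)


# An aggregate such as SUM(amount) touches a single 8-byte column.
row_bytes = bytes_read_row_store(["amount"])
col_bytes = bytes_read_column_store(["amount"])
print(row_bytes, col_bytes, row_bytes // col_bytes)

# A query that needs every column reads the same volume either way.
all_columns = list(COLUMN_WIDTHS)
print(bytes_read_column_store(all_columns) == bytes_read_row_store(all_columns))
```

With these assumed widths the row store reads 28 times as many bytes for the single-column aggregate. The advantage disappears when every column is needed, which is one way to read "but not in all use cases".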
Text search engines

“85% of all information is in text” …
… and 16.9% of all statistics are made up out of thin air
There really are a lot of words out there
And search interfaces are hugely important
Text search has its own data access methods
May play more nicely with columnar than row-based RDBMS
Watch integrations with other analytic datatypes
Attivio (relational, a little XML)
Mark Logic (a lot of XML)
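
The slides do not say which access methods they have in mind. The inverted index is the textbook example, so the sketch below uses it as an assumption. Documents, function names, and the whitespace tokenizer are all hypothetical. Each term maps to the set of documents containing it, and a multi-term query intersects those sets.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all_terms(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical documents, for illustration only.
docs = {
    1: "column stores cut analytic io",
    2: "text search uses an inverted index",
    3: "search engines index text",
}
index = build_inverted_index(docs)
print(search_all_terms(index, "text search"))
print(search_all_terms(index, "inverted"))
```

The lookup never scans rows or compares column values. It goes from term to posting set and intersects, a different access path from a B-tree lookup on a key.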
XML and OO DBMS

Reasons for logical XML structures





Native XML data access methods


Schema flexibility
Dressed-up text
XML is the transport format, and it’s too complex to unpack
The data came from neither an RDBMS nor a text store in the first place
Like text and object
So far mainly in niches
RDF and other graphical DBMS




“Semantic web” is overhyped …
… but the world DOES need ontology management systems
Much depends on path length
Analytic RDBMS may do the job
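
A hedged sketch of why path length matters. It assumes (an interpretation, not something the slides spell out) that relationships are stored as rows in a relational edge table, so each extra hop costs one more self-join. The triples and function names are hypothetical.

```python
# Hypothetical "is-a" triples stored as (subject, object) rows in an edge table.
EDGES = [("poodle", "dog"), ("dog", "mammal"), ("mammal", "animal")]


def hop(frontier, edges):
    """One self-join step: match frontier nodes against edge subjects."""
    return {obj for (subj, obj) in edges if subj in frontier}


def reachable_in(start, edges, hops):
    """Nodes exactly `hops` steps from `start`; one join pass per hop."""
    frontier = {start}
    for _ in range(hops):
        frontier = hop(frontier, edges)
    return frontier


print(reachable_in("poodle", EDGES, 1))
print(reachable_in("poodle", EDGES, 3))
```

Short paths need only a join or two, which an analytic RDBMS handles comfortably. Long or unbounded paths multiply the join passes, and that is where a purpose-built graph engine may earn its place.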
Event/stream processing engines

Design point = super-low latency …



… but there are other applications
Data is “executed against” queries rather than vice versa
Could be the future of BI …

… and of social networking
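
A minimal sketch of data being "executed against" queries, with hypothetical class names and event fields. The queries are registered first and stay resident. Each arriving event is pushed through all of them. This is the reverse of a DBMS, where the data stays put and queries arrive.

```python
class StandingQuery:
    """A query that stays resident and collects the events that match it."""

    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate
        self.matches = []

    def on_event(self, event):
        if self.predicate(event):
            self.matches.append(event)


def run_stream(events, queries):
    """Push each event through every registered query as it arrives."""
    for event in events:
        for query in queries:
            query.on_event(event)


# Hypothetical queries and events, for illustration only.
queries = [
    StandingQuery("big_trades", lambda e: e["qty"] >= 1000),
    StandingQuery("acme_only", lambda e: e["symbol"] == "ACME"),
]
events = [
    {"symbol": "ACME", "qty": 500},
    {"symbol": "INIT", "qty": 2000},
    {"symbol": "ACME", "qty": 1500},
]
run_stream(events, queries)
for query in queries:
    print(query.name, len(query.matches))
```

Nothing has to be stored and re-scanned before an answer appears, which is what makes the super-low-latency design point reachable.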
Embedded DBMS for devices

Products





Sybase SQL Anywhere
solidDB – focused on caching post-acquisition?
Cloudscape – vaporized?
McObject – tiny startup
Features

Load-and-forget


Zero-DBA
Small-footprint

Sometimes -- subsettable library
Matching analytic DBMS to use cases



100 TB data mart
50 TB enterprise data warehouse
5 GB – 5 TB OLTP offload
Matching OLTP/general DBMS to use cases

Market leader



Mid-range


High-end e-commerce
High-volume call center
Web startup
It depends on how locked-in you are


Simple departmental application
General enterprise or SaaS app
Clayton Christensen’s “disruption” narrative

Market leaders have many advantages, including top technology.
Followers come up with good technology too.
The leaders stay ahead by making their products ever better and more complex.
The followers sell into new or non-mainstream markets, at prices the leaders can’t match. So they dominate new markets.
Old markets turn into low-margin commodity-fests.

Unless they diversify, old leaders are doomed.




That’s what’s happening here





Much DBMS complexity is without benefit
Other complexity only benefits a few high-end customers
Data warehouse specialists exploit radically superior technology (e.g., MPP)
Open source vendors have radically different price points and business models
Open source adoption has been strongest in non-traditional markets
And the big vendors know it





Oracle is diversifying furiously
Oracle has announced a clear focus on top-end customers
IBM is obviously focused on the high end too
Oracle and (to some extent) IBM are buying alternative DBMS technologies
Microsoft and IBM aren’t dependent on the DBMS business anyway