Commodity Database Servers
Jim Gray
Microsoft Research
Gray@Microsoft.com
http://Research.Microsoft.com/~Gray/talks

Outline
• Status report on Commodity Server Performance
• Why Most VLDBs will be Multi-Media Servers
• Preview of Microsoft’s SQL Server 7

Status Report on Commodity Server Performance
• Standards:
  – TPC,
  – SpecWeb, ...
• Product benchmarks: e.g.
  – SAP,
  – PeopleSoft, …
• Both indicate that
  – NT is 18 months behind Unix-SMP performance
  – but clusters can make up the difference

TPC-C
Cluster
• IBM SP2 12x8 cpu Oracle 8.2: 57 ktpmC, 148 $/tpmC
• Predict: large & inexpensive NT cluster number this year.
SMP
• HP 9000 16 cpu, Sybase 11: 52.1 ktpmC, 82 $/tpmC
• NEC 8 cpu SQL Server: 14.9 ktpmC, 60 $/tpmC
[Chart: "Diseconomy of Scale: Big systems are Expensive", 27 $/tpmC vs 148 $/tpmC. Y axis: tpmC per k$ (0 to 40). X axis: tpmC (11007, 16101, 52117, 57053).]

TPC-D
• Performance Champions: NCR/Teradata
  – 1 TB: 32x4 node clusters
  – 300 GB: 24x4 node cluster
  – 100 GB: 8x4 cluster
• All use Teradata software on NCR WorldMark Intel-based hardware

          1,000 GB                300 GB                  100 GB
System    NCR WorldMark Server    NCR WorldMark Server    NCR WorldMark Server
(QppD)    3069                    9260                    12149
(QthD)    1205                    3117                    3912

Outline
• Status report on Commodity Server Performance
• Why Most VLDBs will be Multi-Media Servers
• Preview of Microsoft’s SQL Server 7

VLDB Reality Test
• California DMV
  – ~20 million cars, drivers, doctors, barbers, ..
  – Some drivers have moving violations
  – DMV knows about 1.5 KB on each one
  – 30 GB total.
• Microsoft: too big says DoJ
  – 40 B$ revenue (in company lifetime)
  – ~1 billion unit sales: @ 100 B = 100 GB
  – ~100 M customers: @ 1 KB = 100 GB
• Wal-Mart (no one bigger!)
  – Sells 10 B items per year
  – 100 bytes/item => 1 TB
• AT&T
  – 300 M calls per day (peak day)
  – 10 B calls per year
  – 100 B/call = 1 TB

VLDB Reality Test
• It's HARD to find 1 TB of transaction data
  – 100 M web hits/day
  – 250 B/hit
  – 1 TB/year
• It's HARD to find 1 TB of text data
  – 100 M web pages
  – 10 KB/page
  – = 1 TB
• How do they do it?
• Lots of indices?
  – No: that is only 3x
• Precomputed Aggregates?
  – Yes: OLAP benchmark
    • Start at 30 MB
    • Use 2.7 GB or 6 GB database
  – But: this is dumb
• Email?
  – Microsoft: 6 TB
  – Hotmail: 3.5 TB
  – AOL?

Data Tidal Wave
• Seagate 47 GB drive @ 3 k$
  – 100 GB penny per MB drive coming in 2000
• 10 $/GB = 10 k$/Terabyte! (in y2k)
  – Everyone can afford one
• What’s a terror bite?
  – If you sell ten billion items a year (e.g. Wal-Mart)
  – And you record 100 bytes on each one
  – Then you got a Terror Bite
• Where will the terror bytes come from?
  – Multimedia (like the TerraServer) and...

Multi Media: Very Large DBs
• Photo is 100 KB, not 100 B
  – So, photo DBs are 1,000x larger
• Examples:
  – Scanned documents
  – Photo records of products/people/places
  – Surveillance
  – Scientific monitoring

Some TerrorByte Databases
• EOS/DIS (picture of planet each week)
  – 15 PB by 2007
• Federal Reserve Clearing house: images of checks
  – 15 PB by 2006 (7 year history)
• Sloan Digital Sky Survey:
  – 40 TB raw, 2 TB cooked
• TerraServer:

Scaleup - Big Database
• Build a 1 TB SQL Server database
  – Show off Windows NT and SQL Server scalability
  – Stress test the product
• Data must be
  – 1 TB
  – Unencumbered
  – Interesting to everyone everywhere
  – And not offensive to anyone anywhere
• Loaded
  – 1.1 M place names from Encarta World Atlas
  – 1 M Sq Km from USGS (1 meter resolution)
  – 2 M Sq Km from Russian Space agency (2 m)
• Will be on web (world’s largest atlas)
• Sell images with commerce server.

TerraServer
World’s Largest PC!
• 324 disks (2.9 terabytes)
• NT EE & SQL 7.0
• 8 x 440 MHz Alpha CPUs
• Photo of the planet: USGS and Russian images
• 10 GB DRAM

Background
• Earth is 500 Tera-meters square
  – USA is 10 tm2
• 100 TM2 land in 70ºN to 70ºS
• We have pictures of 6% of it
  – 3 tsm from USGS
  – 2 tsm from Russian Space Agency
• Compress 5:1 (JPEG) to 1.5 TB.
• Slice into 10 KB chunks
• Store chunks in DB
• Navigate with
  – Encarta™ Atlas
    • globe
    • gazetteer
  – StreetsPlus™ in the USA
• Someday
  – multi-spectral image
  – of everywhere
  – once a day / hour
[Image pyramid labels: 1.8x1.2 km2 tile; 10x15 km2 thumbnail; 20x30 km2 browse image; 40x60 km2 jump image]

USGS Digital Ortho Quads (DOQ)
• US Geological Survey
• 3 TeraBytes
• Most data not yet published
• Based on a CRADA
  – TerraServer makes data available.
[Image labels: 1x1 meter; 4 TB; Continental US; New Data Coming; USGS “DOQ”]

Russian Space Agency (SovInformSputnik)
SPIN-2 (Aerial Images is Worldwide Distributor)
• 1.5 Meter Geo Rectified imagery of (almost) anywhere
• Almost equal-area projection
• De-classified satellite photos (from 200 KM)
• More data coming (1 m)
• Want to sell imagery on Internet.
• Putting 2 tm2 onto TerraServer.
[Image label: SPIN-2]

Demo
http://www.TerraServer.com
[Logos: Microsoft BackOffice; SPIN-2]

Hardware
• 1 TB Database Server
• AlphaServer 8400 4x400
• 10 GB RAM
• 324 StorageWorks disks
• 10 drive tape library (STC Timber Wolf DLT7000)
[Logo: SPIN-2]

Software
[Architecture diagram; labels: Web Client (Java Viewer, browser, HTML); The Internet; Image Server; Active Server Pages; Internet Information Server 4.0; MTS; Terra-Server Stored Procedures; Internet Info Server 4.0; Sphinx (SQL Server); Microsoft Automap ActiveX Server; Terra-Server DB; Automap Server; Terra-Server Web Site; Internet Information Server 4.0; Microsoft Site Server EE; Image Delivery Application; SQL Server 7; Image Provider Site(s)]

System Management & Maintenance
Backup and Recovery
  – STC 9717 Tape robot
  – Legato NetWorker™
  – Sphinx Backup/Restore Utility
  – Clocked at 80 MBps!!
SQL Server Enterprise Mgr
  – DBA Maintenance
  – SQL Performance Monitor

TerraServer File Group Layout
• Convert 324 disks to 28 RAID5 sets plus 28 spare drives
• Make 4 NT volumes (RAID 50), 595 GB per volume
• Build 30 20GB files on each volume
• DB is File Group of 120 files
[Diagram: volumes E:, F:, G:, H:]

Gazetteer Design
• Classic Snowflake Schema
• Fast First hint to Optimizer
[Schema diagram; tables, columns, and row counts as labeled:]
  – CountrySearch (AlternateName, CountryID, GazSrcID): 1148
  – Country (CountryID, CountryName, UNcode): 264
  – StateSearch (AlternateName, CountryID, StateID, FeatureID, GazSrcID)
  – State (StateID, CountryID, StateName): 1083
  – Place (PlaceID, ImageFlag, AlternateName, Name, CountryID, StateID, TypeID, GazSrcID, Latitude, Longitude, UGridID, ZGridID, DOQdate, SPIN2date): 1,089,897
  – PlaceGrid (ZGridID, BestPlaceName, XDistance, YDistance): 50,000,000
  – FeatureType (TypeID, Description): 13
  – GazetteerSource (GazSrcID, Description): 1
  (The diagram also shows a count of 3776, apparently for StateSearch.)

Image Data Design
• Image pyramid stored in DBMS (250 M recs)
[Schema diagram; tables and columns as labeled:]
  – OriginalMetaData (OrigMetaID, SrcID, ImageSource, Agency, SourcePhotoID, SourcePhotoDate, SourceDEMDate, MetaDataDate, ProductionSystem, ProductionDate, DataFileSize, Compression, HeaderBytes, … 80 other fields)
  – ImageMeta (ImgMetaID, OrigMetaID, ImgStatus, ImgDate, ImgTypeID, JumpPixHeight, JumpPixWidth, BrowsePixHeight, BrowsePixWidth, ThumbPixWidth, ThumbPixHeight, CutCol, CutRow, MidLat, MidLong, NELat, NELong, NWLat, NWLong, SELat, SELong, SWLat, SWLong, UGridID, UTMZone, XUtmID, YUtmID, XGridID, YGridID, ZGridID)
  – TileMeta (ImgMetaID, OrigMetaID, SrcID, ImgStatus, ImgDate, ImgTypeID, TilePixHeight, TilePixWidth, CutCol, CutRow, MidLat, MidLong, NELat, NELong, NWLat, NWLong, SELat, SELong, SWLat, SWLong, UGridID, UTMZone, XUtmID, YUtmID, XGridID, YGridID, ZGridID)
  – ImgType (ImgTypeID, ImgFileDesc, ImgFileExt, MimeStr)
  – ImgSource (SrcID, SrcName, SrcTblName, SrcDescription, GridSysID, ImgTypeID)
  – Jump, Browse, Thumb, Tile (each: UGridID, ZGridID, ZTileGridID, ImgData, ImgDate, ImgTypeID, ImgMetaID, SrcID, EncryptKey, File Name)
  – Pick, Log, UGridHits (labels shown: Name, Description, Link, PickDate, URL, Time, <extensive list of action parameters>, URL, UGridID, ZTileGridID, count)
  Row counts shown in the diagram: 650 k SPIN2, 2 M USGS (twice); .65 M SPIN2, 1.5 M USGS (three times); 16 M SPIN2, 96 M USGS (twice)

Image Delivery and Load
[Data flow diagram; labels: DLT Tape; “tar”; NT DoJob; \Drop’N’; LoadMgr DB; Wait 4 Load; Backup; LoadMgr; ESA; Alpha Server 4100 (x2); 100mbit EtherSwitch; 60 4.3 GB Drives; ImgCutter; \Drop’N’; \Images; Enterprise Storage Array; STC DLT Tape Library; 108 9.1 GB Drives (x3); Alpha Server 8400]
Load steps:
  10: ImgCutter
  20: Partition
  30: ThumbImg
  40: BrowseImg
  45: JumpImg
  50: TileImg
  55: Meta Data
  60: Tile Meta
  70: Img Meta
  80: Update Place

SQL 7 Testimonial
• We started using it March 4 1997
  – SQL 7 Pre-Alpha
  – SQL 7 Alpha
  – SQL 7 Beta 1
  – SQL 7 Beta
• Loaded the DB twice
  – (we made application mistakes)
• Now doing it “right”
• Reliability: Great! SQL 7 never lost data
• Ease of use: Great!
• Functionality: Great!

Outline
• Status report on Commodity Server Performance
• Why Most VLDBs will be Multi-Media Servers
• Preview of Microsoft’s SQL Server 7

SQL 7: Easy & Functional
[Section tabs: Easy; Scalability; Data Warehousing]
• Dynamic self management
• Multi-site management
• Alert/response management
• Job scheduling and execution
• Scriptable management
• Profiling/tuning tools
• Fully Unicode
• English Language Query
• Integrated text search engine

Made It Easier!
(fewer knobs)
• Desktop & Workgroups
  – Auto Configure Engine / Dynamic Disk/memory
  – Reduce Learning Curve, Increase Productivity
  – Self-Managing SQLAgent, Wizards, “Task Pads”
• Large Organizations
  – Deploy/manage hundreds of SQL Servers
  – Lower TCO for Large Environments
  – Multi-Server Operations / “Lights-out” Environment

Multi-Site Management
• Admin servers from one place
• Automate simple stuff
• Wizards for common stuff
• Manage arrays of servers
  – operations, security, …
  – Replication
  – Import/export
• Interface is scriptable
  – COM object model
  – Script with Java, VB, ...
• Scheduling and Multi-step jobs

DBA and Developer Tools
• Built-in GUI
  – data/schema design
  – data query & edit
  – integrated with programming tools
• SQL Server Profiler
  – Selected server events and trace criteria
  – “Capture” output to screen or replay
• SQL Server Expert
  – Analyzes actual server usage history
  – Makes recommendations to improve performance
  – Recommends Index design
  – Recommends operations procedures

Wizards and GUIs
• Wizards galore (over 50 at last count)
• MS Access as a query interface
• Built-in data access tools (integrated with tools)
• Graphical show plan

Many New Wizards...
• Create a Database
• Scheduled Backup
• Create a Maintenance Plan
• Create a Scheduled Job
• Create an Alert
• Security Wizard
• Import Data to SQL Server
• Export Data From SQL Server
• Clustering (Wolfpack)
• Index Tuning Wizard
• Web Assistant
• Register Servers
• Configure Replication
• Create Publication
• Create Pull Subscription
• Create Push Subscription
• Replication Partitioning
• Create an Index
• Create a Stored Procedure
• Create a View
• More to come...

Distributed Management Objects (SQL-DMO)
• COM Interfaces for administering SQL Server
  – Embedded Administration (no UI)
• All Administration Functions Supported
  – Server, Database Configurations, Settings
  – Object Creation, Security, Replication, Scripting, ..
  – 40+ Objects, 1000+ properties and methods
• Integration Interface for ISV Administration
  – E.g., Baan using DMO for Scripted App Install
• Scripting Via VBA and JScript + DCOM

DMO: Object Model (Overview)
[Object hierarchy diagram; labels:]
  SQL Server
    – SQLAgent: Jobs, Tasks, Alerts, Operators
    – Databases: Users, DB Options, Transaction Logs, Publications, FileGroups, Files
      • Table: Columns, Indexes, Keys (PK/FK), Triggers
      • View, Stored Procs, Rules, Defaults
    – Logins, Configurations, Linked Servers, Remote Login

DMO Scripting
• Backup a Database

    ' Create Server Object and Backup Object
    Set MyServer = CreateObject("SQLDMO.SQLServer")
    Set MyBackup = CreateObject("SQLDMO.Backup")

    MyServer.Name = "MSSALES"        ' Identify Server
    MyServer.LoginSecure = True      ' Windows NT Auth
    MyServer.Connect                 ' Connect

    MyBackup.Database = "SALESII"    ' Database to backup
    ' Backup Location and name of the Backup File
    MyBackup.Files = "\\MyServer\Backups\" _
        + MyBackup.Database + ".bak"
    MyBackup.SQLBackup MyServer      ' Back it Up

    MyServer.Disconnect              ' We're Done!

Scalability
[Section tabs: Scalability; Data Warehousing; Easy]
• Win9x/NTW version
• Dynamic row-level locking
• Improved query optimizer
• Intra-query parallelism
• 64-bit support
• Replication
• Distributed query
• High Availability Clusters

Scale Down to Windows 95-98
• Full function (same as NTW)
• Self managing
• Many tools
• Integration with Next MS Access
• Great for embedded apps

Replication
• Transactional and Merge
• Remote update
• ODBC and OLE DB subscribers
• Wizards
• Performance
[Diagram labels: OS 390 DB2; Publisher; 2PC, RPC; Distributor; DB2; VSAM; CICS; Subscriber (x4); Updating Subscriber (immediate updates)]

Parallel Query
SMP & Disk Parallelism
[Diagram: Disks; 50,000 rows; Local Agg. (4 x 50 rows); Global Agg.; Result 50 rows. Computes # of emp. per group and total inc. per group.]
• Plus Distributed
• Plus Hash Join (fanciest on the planet)
• Plus Optimized Partitioned views

Distributed Heterogeneous Queries
• Data Fusion / Integration
• Join spread sheets, databases, directories, Text DBs etc.
• Any source that exposes OLE DB interfaces
• SQL Server as gateway, even on the desktop
[Diagram: SQL 7.0 Query Processor connected to Directory Service; Database (DB2, VSAM, Oracle, …); Spreadsheet; Photos; Mail; Maps; Documents and the Web]

Utilities
The Key to LARGE Databases
• Backup
  – Fuzzy
  – Parallel
  – Incremental
  – Restartable
• Recovery
  – Fast
  – File granularity
• Reorganize
  – shrinks file
  – reclusters file
• Auto-repair

Data Warehousing
[Section tabs: Easy; Scalability; Data Warehousing]
• Warehousing Framework
• Visual data modeler
• Microsoft repository
• Data transformation services (DTS)
• Plato & Dcube - Multi Dimensional Data Cubes
• English query 2.0
• Built-in text-index engine

Key Microsoft Data Warehouse Programs
• Data Warehouse Framework (DWF)
  – Process -- for building, using and managing
  – Pipeline -- for metadata flow
  – Protocols -- to integrate components
• Data Warehouse Alliance (DWA)
  – Partners -- ISVs pledged to the framework and its parts
  – Products -- complete spectrum from Microsoft and third-parties

Microsoft Data Warehousing Framework
[Framework diagram; phases: Building, Using, Managing. Labels:]
  – Data Warehouse Design (logical/physical schema*, data flow**)
  – Data Mart Design** (Cubes/Star schema)
  – Operational Data (OLE-DB**)
  – Data Transformations (DTS**)
  – Data Marts (SQL Server** & OLAP Server**)
  – End-User Tools (Excel**, Access, English Query)
  – Microsoft Repository** (Persistent Shared Meta-Data)
  – Meta-data labels: DB Schema**; Transformation**; Scheduling; OLAP
  – Data Warehouse Management (Console*, Scheduling**, Events**, Topology*)
  Legend: ** available in SQL Server 7 (* partially); arrows show Data Flow and Meta-Data Flow

Alliance for Data Warehousing
• Technical and marketing relationship
• Supports SQL Server storage engine
• Third-party products tested with BackOffice
DW Build: BMC; Data Mirror; Execusoft; Informatica; Microsoft; Platinum Technology; Praxis; Prism; Sagent; SAS; Sterling; V-Mark
DW Access: Andyne; Business Objects; Cognos; IQ Software; Microsoft; NCR Data Mining; Pilot; Platinum Technology; Sagent; SAS; Seagate; Wall Data

DW Alliance Milestones
• 9/96 - Launched with 8
  founding members
• 3/97 - Design review
• 1/97 - 6/97 - Expanded to 21 members
• 7/97 - Repository design review
  – Team development of shared metadata
• 9/97 - OLE DB for OLAP API specification
• 1H’98 - Integration development with Sphinx DTS and Replication APIs

Microsoft Repository
• Based on joint Sterling/Microsoft design (Shipped 97Q2)
• Wide distribution: VB, Visual Studio and Third-Parties
• Designed with over 60 vendors
• Extended to support DB schema, transformations, OLAP
  – Key element of the DW Framework
• UML is abstract model
• Everything viewable in UML terms
[Model diagram: UML, UMX, CDE, DTM, COM, GEN, DBM, SQL, OCL]
Legend:
  UML  Unified Modeling Language
  UMX  UML Extensions
  CDE  Component Descriptions
  COM  Component Object Model
  DTM  Data Type Model
  GEN  Generic
  DBM  Database Model
  SQL  Microsoft SQL Server
  OCL  Oracle

Repository & Data Warehousing
• Common infrastructure -- the meta-data pipeline
• Supports interoperability between data warehousing tools and products
• Process:
  – Initial spec developed with 12 vendors
  – Gathering feedback now
  – Final spec review in Redmond, 2/98

Data Transformation
• Workflow system manages Data Pump
  – Pre-defined transforms using the DTS GUI
  – Procedural VB Script, JavaScript, VBA, any COM
• Multi-stream in, Multi-stream out
[Diagram labels: Repository; Metadata; Transforms; Oracle > SQL Server; IDTSDataPump; IUnknown; Transformation Objects; ActiveX Scripts; Data Pump; SQLAgent Multiserver Operations]

    ' Transform(): map the source credit rating to a risk label; skip other rows
    Function Example()
        If DTSSource("CreditRating") = "1" Then
            DTSDestination("Risk") = "Good"
        ElseIf DTSSource("CreditRating") = "2" Then
            DTSDestination("Risk") = "Average"
        ElseIf DTSSource("CreditRating") = "3" Then
            DTSDestination("Risk") = "Bad"
        Else
            Example = DTS_SkipRow
        End If
    End Function

Transformations
• Data quality and validation
  – Missing values, scrubbing, exception handling
• Data integration
  – Heterogeneous query, join keys, elim.
    dups
• Transforms
  – Combine/decompose multiple columns to one
• Aggregation
• Central metadata
  – Business rules, data lineage

Flexible Architecture
• Debates between MOLAP and ROLAP vendors obscure customer needs
• Plato is the product that best supports MOLAP, ROLAP and Hybrid and offers the most seamless integration of all three
• Users & apps only see cubes
[Diagram labels: ROLAP; Hybrid; MOLAP; User View; MD Cache; Persistent Store; Data load; Data access]

Plato and Dcube and HOLAP
[Diagram labels: Source table (RED, WHITE, BLUE); aggregations By Year, By Make, By Make & Year, By Color & Year, By Color, Sum; Partition 1 (Europe), Partition 2 (USA), Partition 3 (Asia); ROLAP; Designer; “Plato” server; SQL; MD SQL; Dcube; Client app; User 1; User 2]

How Plato Handles Data Explosion
• Aggregation Wizard finds the aggregations that feed the most other aggregations
[Diagram labels: Product Family; Product; Quarter; Month; Fact Table]

How Plato Handles Data Explosion
• Aggregation Wizard finds the “80-20” rule in the data
  – The 20 percent of all possible pre-aggregations that provide 80 percent of the performance gain
  – Analyzes level counts for each dimension and parent-child ratios for each level
• Independent of OLAP data model

OLE DB For OLAP
• OLE DB extensions to access MD data
  – Part of OLE DB 2.0
• One new object: Dataset
• Enhancements to existing objects
• Heavily leverages OLE DB

OLE DB For OLAP Objects And Interfaces
[Object diagram; labels: CoCreateInstance; Enumerator; Data source; Session; Command; Dataset; Flattened Rowset; Range Rowset; Schema Rowsets]

English Query

OBJECT RELATIONAL
The Next Great DBMS Wave
• All the DB vendors are adding objects
• Microsoft is adding DBs to Objects
• Integration with COM+
• Gives user-defined types and objects
• Plug-ins will be Billion dollar industry
  – Blades for SQL Server razor

Outline
• Status report on Commodity Server Performance
• Why Most VLDBs will be Multi-Media Servers
• Preview of Microsoft’s SQL Server 7
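
Worked sizing arithmetic (sketch)
The "VLDB Reality Test" and "Multi Media: Very Large DBs" slides rest on one multiplication: record count times bytes per record. The sketch below simply replays that arithmetic with the record counts and sizes quoted on those slides. Decimal units (1 KB = 1,000 bytes), the helper names db_size and human, and the last entry (a 100 KB photo for each of 10 B items) are assumptions added for illustration; they are not from the talk.

```python
# Back-of-envelope database sizes: records x bytes per record.
# Assumption: decimal units, since the slides round freely.
KB, GB, TB, PB = 10**3, 10**9, 10**12, 10**15


def db_size(records: int, bytes_per_record: int) -> int:
    """Total bytes for a table of fixed-size records."""
    return records * bytes_per_record


def human(n_bytes: int) -> str:
    """Format a byte count using the largest unit that fits (GB, TB, or PB)."""
    for unit, size in (("PB", PB), ("TB", TB), ("GB", GB)):
        if n_bytes >= size:
            return f"{n_bytes / size:g} {unit}"
    return f"{n_bytes} B"


estimates = {
    # Figures quoted on the slides
    "California DMV (20 M records @ 1.5 KB)": db_size(20_000_000, 1_500),
    "Microsoft unit sales (1 B @ 100 B)": db_size(1_000_000_000, 100),
    "Wal-Mart items per year (10 B @ 100 B)": db_size(10_000_000_000, 100),
    "Text web pages (100 M @ 10 KB)": db_size(100_000_000, 10 * KB),
    # Hypothetical: the same 10 B items, each with a 100 KB photo
    "10 B items with a 100 KB photo each": db_size(10_000_000_000, 100 * KB),
}

for name, size in estimates.items():
    print(f"{name}: {human(size)}")
```

The hypothetical last row shows the deck's point in numbers: swapping a 100 B record for a 100 KB photo multiplies the total by 1,000.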