Global Maksimum Data&Information Technologies

Not x2
Oracle Exadata v2 Fast Track
Hüsnü Şensoy
Global Maksimum Data & Information Tech
Founder, VLDB Expert

Agenda
• Why do we need Exadata v2?
• Exadata Hardware
• Exadata Software
• Better to show rather than talk.
• Conclusion

Who am I?
• Data & Information expert on VLDB environments
  • DWH
  • Data Mining
  • Inference Systems
  • Data Archiving Solutions
  • Niche Storage Technologies
  • Recovery Strategies & Solutions
  • HA Systems
• Oracle ACED in the BI field
  • The only one in Turkey
  • Still the youngest one in the whole community
• DBA of the Year 2009
  • The 7th, and still the youngest, worldwide
  • The only one in Turkey
• Member of the Oracle CAB for 12g DWH development
• Worldwide presenter at Oracle conferences and user group events
• Before the end of the year
  • HrOUG, two weeks from now
    • Optimized Analytical Processing Capabilities of 11g Release 2
    • Database Consolidation Best Practices
    • ACED Session with
      • Jose Senegačnik
      • Denes Kubiček
  • UKOUG in December
    • Optimized Analytical Processing Capabilities of 11g Release 2

Global Maksimum Data & Information Technologies and Oracle Exadata v2
• The only company in Turkey with implementation experience of IB-interconnected RAC 11g on Linux x86-64.
• The only company in Turkey with substantial consultancy experience on Exadata v2 (more than 120 TB of conventional system data)
  • Physical & Architecture Design
  • Migration
  • Performance Optimization
  • Backup & Recovery Architecture Design
• Trains customers, Oracle partners, and Oracle employees all over Europe
• Strong joint relationship with Oracle Platinum Partners, the Oracle Development Team Head Office, and IB technology leaders.
• X-Migrator service provider for high-capacity customers.

Oracle Exadata v2
Don’t think of Exadata as yet another product sold by SALES guys.
• As a customer, take it as an effortless solution for hardware/software integration.
• As an engineer, take it as an elegant solution to the so-called unsolvable I/O problem of Oracle databases.

Who needs Exadata v2?
• Engineers
  • To learn that «The mechanic with a hammer thinks that all problems are nails»
• Customers
  • Shorter setup time
• Non-Exadata Customers
  • More stable Oracle releases
• Oracle
  • Easier to manage and standardize its code repository

Oracle Exadata v2 Hardware
The best thing about Exadata is that there is nothing magical in its hardware.
• A few Sun Fire X4170 x86-64 servers.
• A few Sun Fire X4275 x86-64 servers.
• A few IB switches.

Exadata v2 X-Ray
• Sun Datacenter 36-port Managed QDR IB Switches
• Exadata Storage Servers
• Sun Fire™ X4170 Oracle Database Server
• KVM IP Console Switch
• Rackmount KMM Keyboard with TFT monitor
• 42U
• 48-port Gigabit Ethernet Switch

Interconnect Network Hardware
IB Switches
• 3 x 36-port managed switches as opposed to Exadata v1 (2+1).
• 2 “leaf” and 1 “spine” switches
  • The spine switch is only available for Full Rack because it is for connecting multiple full racks side by side.
• A subnet manager running on one switch discovers the topology of the network.
HCA
• Each node (RAC & Storage Cell) has a PCIe x8 40 Gbit HCA with two ports
• Active-Standby Intracard Bonding.

RAC Node
Sun Fire X4170 Server
• 2 socket Quad Core 2.53 GHz
  • 2 Hyper-Threads
  • So, CPU_COUNT=16 (2 sockets x 4 cores x 2 threads)
• 18 DDR3 DIMM Slots
  • 72 GB@800 MHz (2x3x3x4 GB)
• 4 10/100/1000Base-T Ethernet ports
  • NET0 : Management
  • NET1 : Public Network
  • NET2 : Public Network
  • NET3 : -
• PCIe PES24T6G2 Switch x8

Storage Node
Sun Fire X4275 Server
• 2 socket Quad Core 2.53 GHz
• 6 DDR3 DIMM Slots
  • 24 GB@1066 MHz (2x3x1x4 GB)
• HDD Storage
  • 12 x 3.5-inch 600 GB 15K RPM SAS disks, or
  • 12 x 3.5-inch 2 TB 7.2K RPM SATA disks
• 4 Sun Flash Accelerator F20 PCIe Cards

Soft Storage Node
CELLSRV
• Multithreaded block server
  • Buffer cache reads
  • Smart scans
• Performs I/O Resource Management (IORM)
• Gathers operational statistics
• Communicates over iDB with the clients.
MS
• OC4J application
• Provides functionality for
  • Cell management
  • Cell administration
  • Alert generation
RS
• First process to come alive in the storage cell.
• Works as a hang analyzer for CELLSRV and MS

HDD Sequential Read Performance
[Chart] Disks: 600 GB 15K RPM SAS; 2 TB 7.2K RPM SATA. Values: 204 MB/s, 122 MB/s, 144 MB/s, 90 MB/s

HDD Random Read Performance
[Chart] Disks: 600 GB 15K RPM SAS; 2 TB 7.2K RPM SATA. Values: 175 IOPS @ 2KB, 380 IOPS @ 2KB, 79 IOPS @ 2KB, 182 IOPS @ 2KB

F20 PCIe Card
• Not a SATA/SAS SSD drive but an x8 PCIe device providing a SATA/SAS interface.
• 4 Solid State Flash Disk Modules (FMod), 24 GB each
• 256 MB Cache
• SuperCap Power Reserve (Energy Storage Module, ESM) provides write-back operation mode.
  • ESM should be enabled for optimal write performance
  • Should be replaced every two years.
  • Can be monitored using various tools like ILOM
• Embedded SAS/SATA configuration will expose 16 (4 cards x 4 FMod) Linux devices.
  • /dev/sdn
• 4K sector boundary for FMods
• Each FMod consists of several NAND modules; best performance is reached with multithreading (32+ threads per FMod, etc.)

Performance of F20
F20 PCIe Card (4 FMod)
• Sequential
  • Read: 1.1 GB/s
  • Max Write: 567 MB/s (~145K IOPS @ 4K)
• Random @ 4K
  • Read: 101K IOPS
  • Write Peak: 88K IOPS
  • Write Average: 37K IOPS
Random Write Performance Degeneration
• As the flash cache gets full (sustained write), write performance degrades due to Write Amplification.
  • Wear Leveling
  • SLC Update Mechanism : Delete + Write
  • Garbage Collector
• That’s why you are advised not to put files that demand real-time performance on flash cards
  • Online Redo Logs

Aggregate Capacity
Capacity            Quarter Rack   Half Rack   Full Rack
Raw HDD    SAS      21 TB          50 TB       100 TB
Raw HDD    SATA     72 TB          168 TB      336 TB
Raw Flash           1.1 TB         2.6 TB      5.3 TB
User Data  SAS      6 TB           14 TB       28 TB
User Data  SATA     21 TB          50 TB       100 TB

Performance               Quarter Rack   Half Rack   Full Rack
HDD Throughput  SAS       4.5 GB/s       10.5 GB/s   21 GB/s
HDD Throughput  SATA      2.5 GB/s       6 GB/s      12 GB/s
Flash Throughput          11 GB/s        25 GB/s     50 GB/s
Flash IOPS                225,000        500,000     1,000,000

Oracle Exadata v2 Software
Exadata hardware alone is almost enough to beat any other hardware configuration that can run Oracle Database. But why stop there when it is possible to do more with
• Smart Scan
• Storage Indexes
• I/O Resource Manager
• EHCC

Soft Components of Exadata v2
Open Soft Pieces
1. Oracle Enterprise Linux 5.3
2. Oracle defined set of RPMs
3. Oracle OFED (bug fixed version)
Oracle Exadata Storage Software
• Smart Scan
• Smart Flash Cache
• HCC
• Storage Index
• IO Resource Manager (IORM)
• Oracle Exadata Bundle Patches
Common Soft Pieces
• Oracle RDBMS 11.2.0.1
  • Pruning
  • Parallel
  • Hash Join
  • Encrypted Data
  • Data Mining
  • Partitioning
  • Bloom Filtering
  • Pairwise/Semi-pairwise Join
  • Compression
  • HCC
  • DBFS
• Oracle Grid Infrastructure 11.2.0.1
  • ASM
  • Clusterware
• Oracle Exadata Bundle Patches
• iDB

Smart Scan
Smart Scan was initially designed for column and row filtering based on projections and predicates. But this was just the seed idea. Today Smart Scan can do
• Projection (column) filtering
• Predicate (row) filtering
  SELECT * FROM v$sqlfn_metadata WHERE offloadable = 'YES';
• Preparation of bloom filters for joins
• Smart Incremental backup
• Scan on encrypted data
• Smart File Creation
  • RMAN Restore
  • Tablespace Creation
  • File Grow
• Scoring for Data Mining
  • All data mining scoring functions are offloaded

Smart Scan OFF. Why?
• CELL_OFFLOAD_PROCESSING = FALSE
• The table or partition is small.
• CBO doesn’t choose to use direct path read.
• ROWDEPENDENCIES is enabled or ROWSCN is fetched.
• Rows are fetched in rowid order.
• CREATE INDEX ... NOSORT
• LOB or LONG fetch
• Scan on a flashback table
• Cell-based decryption is disabled.
• Tablespace is not completely on Exadata
• More than 255 columns are queried.
• Predicate evaluation on a virtual column.
• For dirty blocks

Storage Index
• Smart Scan is about saving RAC node CPUs during I/O processing, but the storage index is about saving the processors of the Exadata storage cells.
• Anyhow, if we accept that T = E+W, decreasing E in any layer will decrease T. This means faster queries or more queries within the same period.
• The storage index is not something first used in Exadata. It is borrowed from the Netezza ZoneMap.
• Oracle’s SI is in memory
• It filters the data down to a superset of the actual result set.
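
The superset filtering described above can be sketched in a few lines. This is only an illustration of the zone-map idea (a minimum and maximum kept per storage region), not Oracle's actual implementation: the names `RegionSummary`, `build_summaries`, `scan_less_than` and the tiny `REGION_SIZE` are all hypothetical, and the sample column is made up.

```python
from dataclasses import dataclass
from typing import List, Tuple

REGION_SIZE = 4  # rows per region; hypothetical, tiny value chosen for readability


@dataclass
class RegionSummary:
    start: int  # index of the first row in the region
    end: int    # one past the last row in the region
    lo: int     # minimum column value seen in the region
    hi: int     # maximum column value seen in the region


def build_summaries(values: List[int], region_size: int = REGION_SIZE) -> List[RegionSummary]:
    """Keep one in-memory (min, max) summary per fixed-size region of the column."""
    summaries = []
    for start in range(0, len(values), region_size):
        chunk = values[start:start + region_size]
        summaries.append(RegionSummary(start, start + len(chunk), min(chunk), max(chunk)))
    return summaries


def scan_less_than(values: List[int], summaries: List[RegionSummary], bound: int) -> Tuple[List[int], int]:
    """Evaluate `value < bound`, reading only regions that may contain a match."""
    result, regions_read = [], 0
    for s in summaries:
        if s.lo >= bound:
            continue  # no row here can satisfy the predicate, so the read is skipped
        regions_read += 1
        # The surviving region is only a superset of the matches: filter row by row.
        result.extend(v for v in values[s.start:s.end] if v < bound)
    return result, regions_read


# Made-up column B, queried with the predicate B < 2.
column_b = [1, 5, 3, 4, 3, 10, 7, 9, 5, 10, 6, 8, 9, 10, 9, 9]
summaries = build_summaries(column_b)
matches, regions_read = scan_less_than(column_b, summaries, 2)
print(matches, regions_read, len(summaries))  # [1] 1 4
```

Only one of the four regions is read, and the rows inside it are still checked individually, which is why the summary can only ever narrow the scan to a superset of the result.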

First Execution
select A,B,C from T1 where B<2;
[Diagram labels: CELLSRV; AU, AU; Smart Scan; RDSoRDMA]

Next Executions
select A,B,C from T1 where B<2;
[Diagram labels: Storage Index with entries B: 1/5, B: 3/10, B: 5/10, B: 9/10, B: 3/10, B: 2/10; iDB; AU, AU; Smart Scan; CELLSRV]

More Storage Index Information
• The storage index may not have been built by CELLSRV yet.
• Storage Regions are not created on all columns.
  • CELLSRV picks out suitable columns to be indexed.
  • Column types should be suitable (byte-level comparison should match type-level comparison)
  • NLS types are not allowed.
Tips
• Keep an eye on the cell physical IO bytes saved by storage index statistic in V$SYSSTAT or V$SESSTAT
• Remember that, in order to fully utilize storage indexes, data should be physically clustered on the highly queried column
  • Think of it as: which column would you index if you could? Then modify your ETL accordingly.
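
The clustering tip can be illustrated with a small experiment. This is a toy model under stated assumptions, not a measurement of Exadata: the helper `regions_read`, the region size of 4 rows, and the 64-row column are all hypothetical. It counts how many regions a min/max filter cannot skip for the predicate `value < 8`, once with the rows physically sorted on the column and once with the same rows shuffled.

```python
import random
from typing import List


def regions_read(values: List[int], bound: int, region_size: int = 4) -> int:
    """Count regions whose minimum is below `bound`, i.e. regions that must be read."""
    count = 0
    for start in range(0, len(values), region_size):
        if min(values[start:start + region_size]) < bound:
            count += 1
    return count


rng = random.Random(0)        # fixed seed so the shuffled layout is repeatable
shuffled = list(range(64))    # 64 distinct values, 16 regions of 4 rows
rng.shuffle(shuffled)
clustered = sorted(shuffled)  # same rows, loaded in column order

print("clustered:", regions_read(clustered, 8), "of 16 regions read")
print("shuffled: ", regions_read(shuffled, 8), "of 16 regions read")
```

With the clustered layout the eight matching rows sit in the first two regions, so only those two are read. With a shuffled layout the same eight rows are typically spread over more regions, and every region that holds even one of them has to be read.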