Bringing Value of Big Data to Business: SAP's Integrated Strategy [1]
Group 6 - Ziqi Fan, Sheng Chen

SAP's Integrated Big Data Strategy
• SAP is attempting to create an integrated approach that lets companies perform all of the following operations in one environment:
– analytics;
– making big data operational;
– supporting applications for high-resolution management.

Architecture Vision of SAP's Integrated Big Data

SAP HANA [2]
• SAP HANA, an in-memory database, is the key to SAP's integrated strategy.
• HANA DB takes advantage of the low cost of main memory (RAM), the data-processing power of multicore processors, and the fast data access of solid-state drives relative to traditional hard drives to deliver better performance for both analytical and transactional applications.

SAP HANA [2]
• It offers a multi-engine query-processing environment that supports relational data as well as graph and text processing for semi-structured and unstructured data management within the same system.
• HANA DB is 100% ACID compliant.

Main-Memory DB Query Optimization [3]
• Logical optimization
– Much the same as in a conventional database.
• Physical optimization
– Goal: minimize execution costs with respect to a given cost model.
– Quite different from a conventional database, because I/O is no longer the dominant cost factor.
• A "simple" cost model: $T = T_{Mem} + T_{CPU}$

Main-Memory DB Query Optimization
• CPU cost: $T_{CPU} = c_0 + c_1 \cdot n + c_2 \cdot m$
– $c_0$: fixed startup costs
– $c_1$: per-tuple cost of processing input tuples
– $c_2$: per-tuple cost of producing output tuples
– $n$: number of input tuples
– $m$: number of output tuples

Main-Memory DB Query Optimization
• Memory access cost, summed over the cache levels $i$: $T_{Mem} = \sum_{i} \left( M_i^s \cdot l_i^s + M_i^r \cdot l_i^r \right)$
– $M_i^s$: number of cache misses at level $i$ for sequential access
– $M_i^r$: number of cache misses at level $i$ for random access
– $l_i^s$: cache latency of level $i$ for sequential access
– $l_i^r$: cache latency of level $i$ for random access
• Estimating $M_i^s$ and $M_i^r$ is very difficult!
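To make the cost model concrete, the sketch below simply evaluates $T = T_{Mem} + T_{CPU}$ for given parameters. It is a minimal illustration, not Manegold's implementation: all parameter values (the constants $c_0, c_1, c_2$, the latencies, the 8-byte tuple width, the 64- and 128-byte cache lines) are hypothetical. The helper `seq_scan_misses` is also our own simplifying assumption for a single sequential traversal (every cache line is loaded exactly once). It deliberately sidesteps the hard part the slide points out, namely estimating $M_i^s$ and $M_i^r$ for realistic access patterns.

```python
import math
from dataclasses import dataclass


@dataclass
class CacheLevel:
    """Cost-model parameters for one cache level i (values are illustrative)."""
    name: str
    misses_seq: float   # M_i^s: misses at level i caused by sequential access
    misses_rand: float  # M_i^r: misses at level i caused by random access
    lat_seq: float      # l_i^s: latency per sequential miss (e.g. CPU cycles)
    lat_rand: float     # l_i^r: latency per random miss (e.g. CPU cycles)


def cpu_cost(c0: float, c1: float, c2: float, n: int, m: int) -> float:
    """T_CPU = c0 + c1 * n + c2 * m."""
    return c0 + c1 * n + c2 * m


def mem_cost(levels: list[CacheLevel]) -> float:
    """T_Mem = sum over levels i of (M_i^s * l_i^s + M_i^r * l_i^r)."""
    return sum(lv.misses_seq * lv.lat_seq + lv.misses_rand * lv.lat_rand
               for lv in levels)


def total_cost(c0: float, c1: float, c2: float, n: int, m: int,
               levels: list[CacheLevel]) -> float:
    """T = T_Mem + T_CPU."""
    return mem_cost(levels) + cpu_cost(c0, c1, c2, n, m)


def seq_scan_misses(n: int, width: int, line_size: int) -> int:
    """Hypothetical helper (simplifying assumption): one sequential traversal
    of n tuples of `width` bytes loads each cache line exactly once."""
    return math.ceil(n * width / line_size)


if __name__ == "__main__":
    n, m = 1000, 100  # input and output tuple counts (illustrative)
    levels = [
        # Sequential scan only, so no random-access misses at either level.
        CacheLevel("L1", seq_scan_misses(n, 8, 64), 0, 2, 10),
        CacheLevel("L2", seq_scan_misses(n, 8, 128), 0, 20, 100),
    ]
    print(total_cost(c0=100, c1=2, c2=1, n=n, m=m, levels=levels))  # prints 3710
```

With these made-up numbers the memory term (1510) is a large share of the total, which illustrates the slide's point: once I/O disappears, cache misses rather than disk accesses become the cost that physical optimization has to model.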
Main-Memory DB Query Optimization
• Basic access patterns
– single sequential traversal
– repetitive sequential traversal
– single random traversal
– random access
– etc.
• Compound access patterns
– nested-loop join
– hash join
– etc.

References
• [1] Dan Woods, "Bringing Value of Big Data to Business: SAP's Integrated Strategy", Forbes, 01/05/2012. http://www.forbes.com/sites/danwoods/2012/01/05/bringing-value-of-big-data-to-business-saps-integrated-strategy/
• [2] Wikipedia, "SAP HANA". http://en.wikipedia.org/wiki/SAP_HANA
• [3] S. Manegold, "Understanding, Modeling, and Improving Main-Memory Database Performance", SIKS Dissertation Series No. 2002-17, ISBN 90 6196 5179, pp. 71-104.