Presented by Marie-Gisele Assigue Hon Shea Thursday, March 31st 2011 A relational database software for reporting, data analysis and Business Intelligence Meaning it has to analyze terabytes of data VectorWise recently set a new record on the TPC-H benchmark Financial services like banks, wall street Desirable to query historical data as well as current positions Data volume is simply too large to store cost effectively in memory VectorWise delivers in-memory performance with data stored on-disk Social media: for example for advertisement E-commerce Performance of database suffers as the amount of historical data grows VectorWise is able to deliver good performance even when analyzing large amount of data SIMD(SINGLE INSTRUCTION MULTIPLE DATA) Using large CPU cache as execution memory SIMD instructions allows the same operation performed on multiple data simultaneously Traditional databases process data one tuple at a time Vectorwise processes vector of hundreds of element at once Size of vector is tuned to fit into cache HARDWARE ACCELERATED STRING-BASED OPERATIONS Supported by Intel Xeon processor Speeds up operations like: Selections on strings using wild card matching Aggregations on string-based values Joins or sorts using string keys Up to 2 – 4 times faster Use of COLUMN-BASED storage For data warehouse databases, most queries retrieve many rows Row-based storage would generate a lot of unnecessary I/O Column-based storage is generally accepted as a superior storage model for this type of workload VectorWise’s Hybrid Column Store By default data is stored column by column For tables that are indexed on more than one column, indexed columns are stored together in the same block This data storage model is known as PAX (Partition Attributes Across) PAX delivers better cache performance Guarantees ACID properties Supports multi-version read consistency Use of POSITIONAL DELTA TREES(PDTs) Small inserts, updates or deletes is expensive in column-based database (as opposed to large bulk data load operations) PDT is an in-memory structure that stores the position and the change (delta) at that position PDTs use a configurable amount of memory. Once the memory pool is exhausted, updates are written to disk DATA COMPRESSION - VectorWise compresses data on a column-bycolumn basis using these any one of these algorithm: RLE(Run Length Encoding, PFOR(Patched Frame Of Reference or delta encoding on top of PFOR)) - For instance the VectorWise Innovated use of data compression in order to improve performance by allocating a portion of physical memory for a memory-bases disk buffer called the CBM(Column Buffer Manager). The data is automatically prfetched from disk and stored in the CBM. STORAGE INDEXES storage indexes in extreme cases can provide the same benefit as data partitioning does for other databases w/o the overhead of multiple database object or maintaining a partitioning strategy. - VectorWise automatically maintains a storage index per column storing minimum and maximum values for the data block. - Very efficient in determining whether a database block is a candidate block for a particular query. PARALLEL EXECUTION Parallel execution provides the greatest performance improvements in DSS (Decision support system) and data warehousing environments. The VectorWise engine is able to sustain a large amount of concurrent queries efficiently on a multi-core system Ex.of Parallel Execution Server Connections and Buffers New record set by Ingres for the TPC-H benchmark at the 100GB scale factor is an astounding 3.4 times faster than the old mark. New record of 251,561 QphH (Queries per hour) for 100 GB of data was set by Ingres's VectorWise database running on one HP ProLiant DL380 G7. Enables you to a workload on a server Can lower the cost instantly by better utilizing your hardware (dynamic). Achieve extremely fast performance for typical data warehouse and data mart workload. http://www.itwire.com/business-it-news INGRES VectorWise Whitepaper, a technical whitepaper http://uhesse.wordpress.com http://kerryosborne.oracle-guy.com/2010/08/oracleexadata-storage-indexes/ http://www.openexpo.ch/fileadmin/documents/201 0Bern/Slides/25_OlafLaber.pdf http://www.wikipedia.org Ailamaki, Anastassia. A Storage Model to Bridge the Processor/Memory Speed Gap. Carnegie Mello University, 2001 ANY QUESTION?