dbis I NSTITUT FÜR I NFORMATIK HUMBOLDT− UNIVERSITÄT ZU BERLIN Query Task Model (QTM): Modeling Query Execution with Tasks Steffen Zeuch and Johann-Christoph Freytag 1 Motivation ✤ ✤ dbis I NSTITUT FÜR I NFORMATIK HUMBOLDT− UNIVERSITÄT ZU BERLIN Different DBMS execute the same QEP using different schedules ✤ Run-time execution not query optimization ✤ No uniform scheduling format ✤ Query execution in different DBMS are not comparable Major differences between DBMS: ✤ Chunk Size: Size of operator’s input Execution model vs.comparable run-time schedulerto HowScheduling to makeStrategy: different schedules explain why one schedule performs better than another ? ✤ 2 Outline dbis I NSTITUT FÜR I NFORMATIK HUMBOLDT− UNIVERSITÄT ZU BERLIN 1.Parallel Query Execution 2.QTM: Query Task Model 3.Evaluation 4.Outlook 3 dbis Chunk Size I NSTITUT FÜR I NFORMATIK HUMBOLDT− UNIVERSITÄT ZU BERLIN Tupleat-a-time Selection t1 t2 t3 t4 t5 t6 Columnat-a-time Bufferat-a-time t1 t1,t2,t3 t4, t5, t6 t1,t2, t3 Chunk Size DBMS 1 Tuple System R, MySQL, (PostgreSQL) “Fit into Cache” Monet X100, DB2 with BLU Fix number of tuples Hyper Fix Block Size C-Store Column MonetDB MIL 4 dbis Scheduling Strategie I NSTITUT FÜR I NFORMATIK HUMBOLDT− UNIVERSITÄT ZU BERLIN Hash Probe (R) Hash Probe (S) Hash Build Hash Build Selection R S T 5 Volcano Execution Model (Open-Next-Close Iterator) Tuple dbis I NSTITUT FÜR I NFORMATIK HUMBOLDT− UNIVERSITÄT ZU BERLIN Hash Probe (R) Next Hash Probe (S) Tuple Hash Build Hash Build Next Selection Next Tuple R S T 6 dbis (Run-time) Scheduler Hash Probe (R) Hash Probe (S) Selection T Time I NSTITUT FÜR I NFORMATIK HUMBOLDT− UNIVERSITÄT ZU BERLIN Spatial Locality Temporal Locality Prob_R(t2) Prob_R(t2) Prob_R(t1) Prob_S(t2) Prob_S(t2) Sel(t2) Prob_S(t1) Prob_R(t1) Sel(t2) Prob_S(t1) Sel(t1) Sel(t1) Further Optimiziation Criteria: I/O, NUMA or Memory Usage 7 dbis Dynamic Load Balancing ⋈ I NSTITUT FÜR I NFORMATIK HUMBOLDT− UNIVERSITÄT ZU BERLIN T2 T1 ⋈ T3 T4 T5 σ σ R S T CPU1 CPU2 T1 T2 T3 T4 T5 8 dbis Chunk Size DBMS Landscape Column-at-a time Buffer-at-a time Tuple-at-a time I NSTITUT FÜR I NFORMATIK HUMBOLDT− UNIVERSITÄT ZU BERLIN MonetDB MIL MonetDB X100 DB2 DB2 BLU PostgreSQL StagedDB Hyper System R MySQL PostgreSQL Volcano Execution Model (Run-time) Scheduler SAP HANA Dynamic Load Balancing Scheduling Strategy 9 Outline dbis I NSTITUT FÜR I NFORMATIK HUMBOLDT− UNIVERSITÄT ZU BERLIN 1.Parallel Query Execution 2.QTM: Query Task Model 3.Evaluation 4.Outlook 10 QTM: Query Task Model dbis I NSTITUT FÜR I NFORMATIK HUMBOLDT− UNIVERSITÄT ZU BERLIN Idea: A model that describes parallel query execution with tasks QEP: Queue of tasks Task: Encapsulate a piece of work on some data Goals: Open a design space for DBMS schedules Make main aspects of query scheduling comparable: Execution order, degree of parallelism and thread coordination, and partitioning 11 dbis Query Task Model Work Data I NSTITUT FÜR I NFORMATIK HUMBOLDT− UNIVERSITÄT ZU BERLIN Task Queue T1 T2 T3 Processing Strategies Table t1 t2 t3 Data Queue t1 t2 t3 12 QTM Transformation: Input QEP Hardware Architecture dbis I NSTITUT FÜR I NFORMATIK HUMBOLDT− UNIVERSITÄT ZU BERLIN Table Format 13 QTM Transformation QEP Choosing Hash Join dbis I NSTITUT FÜR I NFORMATIK HUMBOLDT− UNIVERSITÄT ZU BERLIN Max. Pipelines + Dependency Graph 14 QTM: Task Configuration Max. Pipelines + Dependency Graph dbis I NSTITUT FÜR I NFORMATIK HUMBOLDT− UNIVERSITÄT ZU BERLIN Task Configurations (Task Blueprints) 15 dbis QTM: Tasks I NSTITUT FÜR I NFORMATIK HUMBOLDT− UNIVERSITÄT ZU BERLIN Instantiation Task Configuration (Task Blueprints) Set of Tasks (TC Instantiation) 16 QTM: Implementation dbis I NSTITUT FÜR I NFORMATIK HUMBOLDT− UNIVERSITÄT ZU BERLIN Compile-time Run-time 17 Outline dbis I NSTITUT FÜR I NFORMATIK HUMBOLDT− UNIVERSITÄT ZU BERLIN 1.Parallel Query Execution 2.QTM: Query Task Model 3.Evaluation 4.Outlook 18 Evaluation: Scenario dbis I NSTITUT FÜR I NFORMATIK HUMBOLDT− UNIVERSITÄT ZU BERLIN Schedule Workload Tuples per Relation 30M Selection < 25M S1 Values 0,1,2 … S2 Values 0,2,4,… S3 Values 0,4,8,… 19 dbis Evaluation: Configuration Schedule I NSTITUT FÜR I NFORMATIK HUMBOLDT− UNIVERSITÄT ZU BERLIN Buffer Size 1 Tasks per Op 30M Total Tasks 1 30M 150M 1 30M 150M 4 7.5M 22.5M 5) Buf – L1 2,048 14,649 43,947 6) Buf – L2 16,384 1,832 5,496 7) Buf – L3 491,520 62 186 8) Op - Mat 7.5M 4 20 9) Op - Seq 7.5M 4 20 1) Tup – Pipe 2) Tup – Mat 3) Tup – Seq 4) Buf - CL 90M 20 Evaluation: Runtimes dbis I NSTITUT FÜR I NFORMATIK HUMBOLDT− UNIVERSITÄT ZU BERLIN 21 Evaluation: Sampling Data-related Misses dbis I NSTITUT FÜR I NFORMATIK HUMBOLDT− UNIVERSITÄT ZU BERLIN Instruction-related Misses 22 Evaluation: Miss Distribution dbis I NSTITUT FÜR I NFORMATIK HUMBOLDT− UNIVERSITÄT ZU BERLIN 23 Evaluation: Scalability dbis I NSTITUT FÜR I NFORMATIK HUMBOLDT− UNIVERSITÄT ZU BERLIN 24 Evaluation: Insights ✤ ✤ ✤ HUMBOLDT− UNIVERSITÄT ZU BERLIN Tradeoff between data and instruction cache performance ✤ ✤ dbis I NSTITUT FÜR I NFORMATIK Sweet spot: Largest private cache size vs. slightly larger buffer Medium sized tasks are data-efficient: ✤ Pros: Buffer fits entirely into cache, high data locality ✤ Cons: High number of tasks and instructions Large tasks are instruction-efficient: ✤ Pros: Decrease number of instructions and tasks, high instruction locality ✤ Cons: More data cache misses if cache size is exceeded QTM: Cache-performance can be adjusted by buffer size 25 Outline dbis I NSTITUT FÜR I NFORMATIK HUMBOLDT− UNIVERSITÄT ZU BERLIN 1.Parallel Query Execution 2.QTM: Query Task Model 3.Evaluation 4.Outlook 26 dbis Outlook ✤ ✤ I NSTITUT FÜR I NFORMATIK HUMBOLDT− UNIVERSITÄT ZU BERLIN Contributions: ✤ QTM: A model for parallel query execution using tasks ✤ Open a design space for DBMS schedules ✤ Make different schedules present in different DBMS comparable Future Work: Thanks! ✤ Cost Model ✤ Transformation process for an arbitrary QEP 27