EXECUTION PLANS
By Nimesh Shah, Amit Bhawnani

Outline
- What is execution plan
- How are execution plans created
- How to get an execution plan
  - Graphical
  - Text

What is execution plan
The execution plan is created by the optimizer and used to execute a statement. Once execution of a statement has started, the plan is followed step by step to retrieve the required result. It is an explanation of the steps to perform during statement execution.
An execution plan is composed of primitive operations. Examples of primitive operations are: reading a table completely, using an index, performing a nested loop join or a hash join.

How are execution plans created
Parse: - The first phase parses the SQL query for syntax and creates a query processor tree, which defines the logical steps needed to execute the SQL. The component that resolves the names in the query and produces this tree is called the 'algebrizer'.
Optimize: - The next step is to find an optimized way of executing the query processor tree defined by the 'algebrizer'. This task is done by the 'optimizer'. The optimizer takes data statistics such as the number of rows, the number of distinct values in those rows, and whether the table spans more than one page. In other words, it takes information about the data itself (data about data). Using these statistics and the query processor tree, it prepares a cost-based plan that accounts for resources such as CPU and I/O. The optimizer generates and evaluates many candidate plans using the data statistics, the query processor tree, CPU cost, I/O cost and so on, and chooses the best one. The optimizer arrives at an estimated plan and then tries to find a matching actual execution plan in the cache. The estimated plan is what comes out of the optimizer; the actual plan is the one generated once the query is actually executed.
Execute: - The final step is to execute the plan sent by the optimizer.

How are execution plans created (contd.)
The creation of an execution plan takes time.
Not every execution option is explored; often a "good enough" execution plan is generated and sent to the database engine for execution.
The execution plan is estimated, and may change when the T-SQL is actually executed by the database engine.
Execution plans are usually cached (in the plan cache) so that they can be reused if an identical (or parameterized) query is submitted for execution again.
Reusing a cached execution plan saves time because a new plan does not have to be created each time the same query is re-executed.

How to get an execution plan
SQL Server has multiple ways to get execution plans. The two most important methods are:
Graphical - The graphical representation of SQL Server execution plans is easily accessible in Management Studio but is hard to share, mainly because detailed information for an individual operation is only visible while the mouse is over that operation ("hover").
Text - The tabular (text) execution plan is hard to read but easy to copy. The table includes all the information in one shot.

Graphical Execution plan

Interpreting Graphical Execution Plans
You read a graphical execution plan from right to left and top to bottom.
Icons (operators) - The icons in a graphical execution plan are operators, each representing one of the actions and decisions that can make up an execution plan.
Arrows - An arrow between two operators represents data being passed between them. The thickness of the arrow reflects the amount of data being passed; thicker means more rows.
Costs (per operator) - Below each icon a number is displayed as a percentage. This number represents the cost of that operator relative to the whole query.

Tooltips
Each of the icons and arrows has an associated pop-up window, called a ToolTip, which you can access by hovering your mouse pointer over the icon or arrow.
Physical Operation - Lists the physical operation being performed for the node, such as a Clustered Index Scan, Index Seek, Aggregate, Hash or Nested Loop Join, and so on.
Logical Operation - Lists the logical operation that corresponds to the physical operation, such as the logical operation of a union being physically performed as a merge join.
Estimated I/O Cost - Indicates the estimated relative I/O cost for the operation. Preferably, this value should be as low as possible.
Estimated CPU Cost - Lists the estimated relative CPU cost for the operation.
Estimated Number of Executions - Lists the estimated number of times this operation will be executed.
Estimated Operator Cost - Indicates the estimated cost to execute the physical operation. For best performance, you want this value as low as possible.

Tooltips (contd.)
Estimated Number of Rows - Lists the estimated number of rows to be output by the operation and passed on to the parent operation.
Estimated Row Size - Indicates the estimated average size of the rows being passed through the operator.
Estimated Subtree Cost - Lists the estimated cumulative cost of this operation and all child operations preceding it in the same subtree.
Object - Indicates which database object is being accessed by the operation performed by the current node.
Predicate - Indicates the search predicate specified for the object in the original query.
Seek Predicates - Indicates the search predicate used in the seek against the index when an index seek is performed.
Output List - Indicates which columns of data are returned by the operation.
Ordered - Indicates whether the rows are retrieved via an index in sorted order.

Logical and Physical Operators
Each operator implements a single basic operation, such as:
- Scanning data from a table
- Seeking data in a table
- Aggregating data
- Sorting
- Joining two data sets
- Etc.
In total, there are 79 different operators that can be included in an execution plan.
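
To make the difference between a scanning operator and a seeking operator concrete, here is a minimal Python sketch. It is only an illustration, not how SQL Server implements these operators: the in-memory `ROWS` list, the `KEYS` list standing in for an ordered index, and the `scan` and `seek` functions are all hypothetical names invented for this example.

```python
from bisect import bisect_left

# Hypothetical table: 500 rows kept sorted by id (ids 0, 2, 4, ... 998).
ROWS = [(i, f"name{i}") for i in range(0, 1000, 2)]
# Hypothetical ordered index: just the sorted key values, one per row.
KEYS = [row[0] for row in ROWS]


def scan(rows, wanted_id):
    """Scan: examine every row and keep the ones that match."""
    examined = 0
    matches = []
    for row in rows:
        examined += 1
        if row[0] == wanted_id:
            matches.append(row)
    return matches, examined


def seek(rows, keys, wanted_id):
    """Seek: binary-search the ordered keys and read only matching rows."""
    pos = bisect_left(keys, wanted_id)
    matches = []
    while pos < len(keys) and keys[pos] == wanted_id:
        matches.append(rows[pos])
        pos += 1
    return matches


scan_result, rows_examined = scan(ROWS, 498)
seek_result = seek(ROWS, KEYS, 498)
```

Both calls return the same row, but the scan touches all 500 rows while the seek goes almost directly to the match. That gap is the reason a seek is usually preferred when only a few rows are needed.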

Table Scan
Seeing a table scan often indicates a problem that needs to be addressed.
A table scan means that every row in the table had to be examined to see if it met the query criteria, which can mean slow performance if there are a large number of rows.
A table scan indicates there is no clustered index on the table, and the table is a heap.
In most cases, you will want to add a clustered index to every table, as it has the potential to boost the performance of the query.

Clustered Index Scan
A clustered index scan is a scan of all the rows of a table that has a clustered index.
Like a table scan, a clustered index scan can be slow and use up lots of server resources.
Clustered index scans should generally be avoided (although they are better than table scans).
On the other hand, when tables are small or many rows are returned, a clustered index scan might be the fastest way to return data.

Clustered Index Seek
If there is an available and useful index, and the WHERE clause is sargable, the query optimizer can usually identify the rows to be returned very quickly and return them without having to scan each row of the table.
Ideally, for best query performance, clustered index seeks should be used as often as feasible. Consider them the "gold standard" for returning data.

Nonclustered Index Scan
Every entry in the non-clustered index is scanned, and all rows that match the WHERE clause are returned.
As with all scans, it can be slow and require extra I/O resources.
Generally, non-clustered index scans should be avoided.

Nonclustered Index Seek
A non-clustered index is used to identify the row(s) to be returned, so not every row needs to be scanned (this assumes a sargable WHERE clause).
This is generally much faster than a non-clustered index scan.
Like clustered index seeks, non-clustered index seeks are generally a good thing.
One exception is when bookmark (RID or Key) lookups occur as part of the non-clustered index seek; performance may then lag if many rows are returned.

RID Lookup / Key Lookup
A RID/Key Lookup is generally an indicator of a performance issue.
A RID Lookup is a form of bookmark lookup on a table without a clustered index (a heap).
A Key Lookup is a form of bookmark lookup on a table with a clustered index.
While lookups are often faster than most "scans," this is often not the case if many rows have to be returned.
Generally, RID Lookups should be eliminated by adding an appropriate clustered index and, if necessary, a covering index or an index with included columns.

Joins (Loop, Merge, Hash)
The nested loop join compares each row from one table (the "outer table") to each row from the other table (the "inner table"), looking for rows that satisfy the join predicate.
The merge join works by simultaneously reading and comparing the two sorted inputs one row at a time. At each step, it compares the next row from each input. If the rows are equal, it outputs a joined row and continues. If the rows are not equal, it discards the lesser of the two rows and continues.
The hash join algorithm executes in two phases known as the "build" and "probe" phases. During the build phase, it reads all rows from the first input (often called the left or build input), hashes the rows on the equijoin keys, and builds an in-memory hash table. During the probe phase, it reads all rows from the second input (often called the right or probe input), hashes these rows on the same equijoin keys, and probes for matching rows in the hash table.

Text execution plan

References
http://msdn.microsoft.com/en-us/library/ms175913.aspx
www.bradmcgehee.com
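
Supplement to "Joins (Loop, Merge, Hash)": the three join algorithms described on that slide can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not SQL Server's implementation. Rows are plain tuples, the function names and the `customers` / `orders` sample data are hypothetical, and the merge join assumes both inputs are already sorted on the join key and that keys in the left input are unique.

```python
from collections import defaultdict


def nested_loop_join(outer, inner, outer_key, inner_key):
    """Compare each outer row to each inner row; keep pairs that match."""
    result = []
    for o in outer:
        for i in inner:
            if o[outer_key] == i[inner_key]:
                result.append((o, i))
    return result


def merge_join(left, right, left_key, right_key):
    """Walk two sorted inputs together (assumes unique keys on the left)."""
    result = []
    li = ri = 0
    while li < len(left) and ri < len(right):
        lk, rk = left[li][left_key], right[ri][right_key]
        if lk == rk:
            result.append((left[li], right[ri]))
            ri += 1  # the same left row may match the next right row too
        elif lk < rk:
            li += 1  # discard the lesser (left) row
        else:
            ri += 1  # discard the lesser (right) row
    return result


def hash_join(build, probe, build_key, probe_key):
    """Build a hash table on the first input, then probe it with the second."""
    table = defaultdict(list)
    for b in build:  # build phase
        table[b[build_key]].append(b)
    result = []
    for p in probe:  # probe phase
        for b in table.get(p[probe_key], []):
            result.append((b, p))
    return result


# Hypothetical sample data: (customer_id, name) and (order_id, customer_id).
customers = [(1, "Ann"), (2, "Bob"), (3, "Cy")]
orders = [(10, 1), (11, 1), (12, 3)]  # already sorted by customer_id

loop_rows = nested_loop_join(customers, orders, 0, 1)
merge_rows = merge_join(customers, orders, 0, 1)
hash_rows = hash_join(customers, orders, 0, 1)
```

On this small input all three functions produce the same joined pairs. They differ in how much work they do: the nested loop compares every pair of rows, the merge join makes a single pass over each sorted input, and the hash join makes one pass over each input plus the cost of the hash table.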