Common Analysis Services Multidimensional Design Mistakes and How to Avoid Them Chris Webb www.crossjoin.co.uk Who Am I? • Chris Webb – – • UK-based consultant and trainer: – – • www.crossjoin.co.uk www.technitrain.com Co-author of several books: – – – – • • chris@crossjoin.co.uk Twitter @Technitrain MDX Solutions Expert Cube Development with SSAS 2008 Analysis Services 2012: The BISM Tabular Model Power Query for Power BI and Excel SQL Server MVP Blogger: http://cwebbbi.wordpress.com Agenda • • • • • • • • • • • Why good cube design is a Good Thing Using built-in best practices in BIDS ETL in your DSV User-unfriendly names Unnecessary attributes Parent/child pain One cube or many? Over-reliance on MDX Unused and/or unprocessed aggregations Non_Empty_Behavior Cell Security Why Good Design is Important! • As if you needed reasons…? • Good design = good performance = faster initial development = easy further development = simple maintenance • This is not an exhaustive list, but a selection of design problems and mistakes I’ve seen on consultancy engagements Best Practices in SSDT-BI • Don’t ignore the blue squiggly lines in SSDT! – They sometimes make useful recommendations about what you’re doing • Actively dismissing them, with comments, is a useful addition to documentation • As always, official ‘best practices’ aren’t always best practices in all situations Common Design Mistakes • Three questions need to be asked: – What’s the problem? – What bad things will happen as a result? – What can I do to fix it (especially after I’ve gone into production)? • This is not a name-and-shame session! Problem: ETL in your DSV • It’s very likely, when you are working in SSAS, that you need changes to the underlying relational structures and data – Eg you need a new column in a table • You then have two options: – Go back to the relational database and/or ETL and make the change – Hack something together in the DSV using named queries and named calculations • The DSV is the easy option, but… Consequences: ETL in your DSV • It could slow down processing performance – No way to influence the SQL that SSAS generates – Expensive calculations/joins are better done once then persisted in the warehouse; you may need to process more than once • It makes maintenance much harder – DSV UI is not great for writing SQL – Your DBA or warehouse developer certainly won’t be looking at it Fix: ETL in your DSV • Bite the bullet and either: – Do the necessary work in the underlying tables or ETL packages – Create a layer of views instead of using named queries and calculations • Use the Replace Table With option to point the table in the DSV at your new view/table • No impact on the rest of the cube! Problem: Unfriendly Names • Cubes, dimensions and hierarchies need to have user-friendly names • However names are often user-unfriendly – Unchanged from what the wizard suggests, or – Use some kind of database naming convention • Designing a cube is like designing a UI • Who wants a dimension called something like “Dim Product”….? Consequences: Unfriendly Names • Unfriendly names put users off using the cube – These are the names that users will see in their reports, so they must be ‘report ready’ – Users need to understand what they’re selecting • Also encourage users to export data out of cube to ‘fix’ the names – And so you end up with stale data, multiple versions of the truth etc etc etc Fix: Unfriendly Names • You can rename objects easily, but: – This can break calculations on the cube – It can also break existing queries and reports, which will need rewriting/rebuilding – IDs will not change, which makes working with XMLA confusing • You should agree the naming of objects with end users before you build them! Problem: Unnecessary Attributes • Wizards often generate attributes on dimensions that users don’t want or need • Classic example is an attribute built from a surrogate key column – Who wants to show a surrogate key in a report? Consequences: Unnecessary Attributes • The more attributes you have: – The more cluttered and less useable your UI – The slower your dimension processing – The harder it is to come up with an effective aggregation design Fix: Unnecessary Attributes • Delete any attributes that your users will never use • Merge attributes based on key and name columns into a single attribute • Set AttributeHierarchyEnabled to false for ‘property’ attributes like email addresses • Remember that deleting attributes that are used in reports or calculations can cause more problems Problem: Parent Child Hierarchies • Parent Child hierarchies are the only way to model hierarchies where you don’t know the number of levels in advance • They are also very flexible, leading some people to use them more often than they should Consequences: Parent Child • Parent Child hierarchies can lead to slow query performance – No aggregations can be built at levels inside the hierarchy – Slow anyway • They can also be a nightmare for – Scoping advanced MDX calculations – Dimension security Fix: Parent Child • If you know, or can assume, the maximum depth of your hierarchy, there’s an alternative • Normal user hierarchies can be made ‘Ragged’ with the HideMemberIf property – Hides members if their parent has no name, or the same name as them • Still has performance issues, but less than parent/child • You can use the BIDS Helper “parent/child naturaliser” to convert the underlying relational table to a level-based structure Problem: One Cube or Many? • When you have multiple fact tables do you create: – One cube with multiple measure groups? – Multiple cubes with one measure group? • Each has its own pros and cons that need to be understood Consequences: One Cube • Monster cubes containing everything can be intimidating and confusing for users • Also tricky to develop, maintain and test – Often changing one thing breaks another – Making changes may take the whole cube offline • Securing individual measure groups is a pain • If there are few common dimensions between measure groups and many calculations, query performance can suffer Consequences: Multiple Cubes • If you need to analyse data from many cubes in one query, options are very limited • A single cube is the only way to go if you do need to do this • Even if you don’t think you need to do it now, you probably will do in the future! Fix: One Cube to Multiple • If you have Enterprise Edition, Perspectives can help overcome usability issues • Linked measure groups/dimensions can also be used to split out more cubes for security purposes • If you have one cube, you probably don’t want to split it up though Fix: Multiple Cubes to One • Start again from scratch! • LookUpCube() is really bad for performance • Linked measure groups and dimensions have their own problems: – Duplicate MDX code – Structural changes require linked dimensions to be deleted and recreated Problem: Over-reliance on MDX • As with the DSV, it can be tempting to use MDX calculations instead of making structural changes to cubes and dimensions • A simple example is to create a ‘grouping’ calculated member instead of creating a new attribute • Other examples include pivoting measures into a dimension, or doing m2m in MDX Consequences: Over-reliance on MDX • MDX should always be your last resort: • Pure MDX calculations are always going to be the slowest option for query performance • They are also the least-easily maintainable part of a cube • The more complex calculations you have, the more difficult it is to make other calculations work Fix: Over-reliance on MDX • Redesigning your cube is a radical option but can pay big dividends in terms of performance • Risks breaking existing reports and queries but your users may be ok with this to get more speed Problem: Unused Aggregations • Aggregations are the most important SSAS feature for performance • Most people know they need to build some and run the Aggregation Design Wizard… • …but don’t know whether they’re being used or not Consequences: Unused Aggregations • Slow queries! • If you haven’t built the right aggregations, then your queries won’t get any performance benefit • You’ll waste time processing these aggregations, and waste disk space storing them Fix: Unused Aggregations • Design some aggregations! • Rerun the Aggregation Design Wizard and set the Aggregation Usage property appropriately • Perform Usage-Based Optimisation • Design aggregations manually for queries that are still slow and could benefit from aggregations Problem: Unprocessed Aggregations • Even if you’ve designed aggregations that are useful for your queries, you need to ensure they’re processed • Running a Process Update on a dimension will drop all Flexible aggregations Consequences: Unprocessed Aggregations • Slow queries! (Again) Fix: Unprocessed Aggregations • Run a Process Default or a Process Index on your cube after you have run a Process Update on any dimensions • Note that this will result in: – Longer processing times overall – More disk space used • But it will at least mean that your queries run faster Problem: Non_Empty_Behavior • The Non_Empty_Behavior property was introduced in SSAS 2000 as a performance hint • It was very important in SSAS 2000/2005 • It says: If Regular Measure A is Null Then Calculated Measure B will be Null • But it is usually set incorrectly • And from SSAS 2008 on it is mostly unnecessary SQLDay 2013 Consequences: Non_Empty_Behavior • Setting Non_Empty_Behavior incorrectly can – At best, make no difference to your performance – At worst, result in incorrect query results SQLDay 2013 Fix: Non_Empty_Behavior • Don’t set it! • Rewrite your calculations to check for null values: IIF( ISEMPTY(MEASURES.A), NULL, <CALCULATION> ) SQLDay 2013 Problem: Cell Security • Cell security allows you to secure individual cells in a cube • At first sight, it is the only option: it does things dimension security cannot do… SQLDay 2013 Consequences: Cell Security • Setting cell security correctly can be very difficult • Read and read contingent cell security can be even more confusing • Performance is often very bad with cell security: – Forces cell-by-cell mode – Forces Formula Engine caching to query scope SQLDay 2013 Fix: Cell Security • The answer is to use Dimension Security instead • But how…? • Create a new dimension that contains the combinations of members that you want to secure • Then use Dimension Security on it • Doesn’t work if you need to secure measures SQLDay 2013 Thanks!