DATA MINING PROJECT
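These steps assume a Calendar table already exists in the model with a single date column. If you are working through this section on its own, a minimal Calendar table can be generated as a DAX calculated table; the sketch below is an assumption, with the date range chosen to roughly match the sample data used later:

Calendar = CALENDAR(DATE(2012, 1, 1), DATE(2016, 12, 31))  // returns a one-column table of contiguous dates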
Right click on the column and ‘Rename’ this column “Date”.
On the 'Modeling' tab, click 'New Column' to create a calculated column in the new table, using this expression:
DateKey = FORMAT('Calendar'[Date], "YYYYMMDD")
Right click on the new column called "DateKey" and select 'Hide In Report View'.
In the 'Modeling' tab, under 'Formatting' and 'Data Type', change the "DateKey" column's data type to "Whole Number".
Note: A warning will inform you that changing the data type will affect how the data is stored and may therefore have an impact elsewhere. This is worth noting, but it should not affect your current tasks.
Now that a Calendar table has been created, you can switch to the 'Relationships' perspective and drag Calendar[DateKey] onto Orders[DateKey] to create their relationship.
You can extend the Calendar by creating additional calculated columns using the following table of expressions:
Note: Please read through the table and try to see the differences in the DAX. To save time, select just a few columns to add to your model; this is only a proof of concept. Also note that you can add additional columns at any time. Realistically you would try to do as little modeling as required, so there is no need to create fields that may be redundant and unnecessary.
Year
Formula: Year = YEAR('Calendar'[Date])
Comment: Extracts the year from the date/time.

Year2
Formula: Year2 = FORMAT('Calendar'[Date], "YYYY")
Comment: Alternative - isolates the year from the date/time as text.

Year3
Formula: Year3 = LEFT('Calendar'[DateKey], 4)
Comment: Alternative - extracts the first 4 characters from the left of a text datatype.

Month
Formula: Month = MONTH('Calendar'[Date])
Comment: Extracts the month number from the date/time.

Month2
Formula: Month2 = MID('Calendar'[DateKey], 5, 2)
Comment: Alternative - extracts characters from text based on a specific starting point.

Month Name
Formula: Month Name = FORMAT('Calendar'[Date], "MMMM")
Comment: Extracts the month name from the date/time.

Week Number
Formula: Week Number = "Week " & WEEKNUM('Calendar'[Date])
Comment: Shows the number of the week in the year.

Week Number2
Formula: Week Number2 = "W" & FORMAT(WEEKNUM('Calendar'[Date]), "00")
Comment: Alternative - shows the week number with a leading zero for the first nine weeks.

Day Of Month
Formula: Day Of Month = DAY('Calendar'[Date])
Comment: Displays the day number within the month.

Day Of Month2
Formula: Day Of Month2 = FORMAT(DAY('Calendar'[Date]), "00")
Comment: Alternative - displays the day number with a leading zero for the first nine days.

Day Of Week
Formula: Day Of Week = WEEKDAY('Calendar'[Date], 2)
Comment: Displays the number of the day of the week, with the week starting on Monday.

Week Day Name
Formula: Week Day Name = FORMAT('Calendar'[Date], "dddd")
Comment: Displays the name of the weekday.

Week Day Name2
Formula: Week Day Name2 = FORMAT('Calendar'[Date], "ddd")
Comment: Alternative - displays the weekday name as a three-letter abbreviation.

Weekday/Weekend
Formula: Weekday/Weekend = IF('Calendar'[Day Of Week] <= 5, "Weekday", "Weekend")
Comment: Calculates whether the day is a weekday or a weekend day; relies on 'Calendar'[Day Of Week].

ISO Date
Formula: ISO Date = [Year] & [Month2] & [Day Of Month2]
Comment: Displays the date in the ISO (internationally recognised) format YYYYMMDD.

Full Date
Formula: Full Date = [Day Of Month2] & " " & [Month Name] & " " & [Year]
Comment: Displays the full date with spaces.

Full Date2
Formula: Full Date2 = FORMAT('Calendar'[Date], "DD MMMM YYYY")
Comment: Alternative - uses a user-defined date/time format via the FORMAT function.

Quarter
Formula: Quarter = "Quarter " & ROUNDUP(MONTH('Calendar'[Date])/3, 0)
Comment: Displays the quarter of the year for the date.

Quarter Abbr.
Formula: Quarter Abbr. = "Qtr " & ROUNDUP(MONTH('Calendar'[Date])/3, 0)
Comment: Displays the quarter as the abbreviation "Qtr" plus the quarter number.

Quarter Year
Formula: Quarter Year = [Year] & " Qtr" & ROUNDUP(MONTH('Calendar'[Date])/3, 0)
Comment: Shows the year and the quarter abbreviation.

Current Year
Formula: Current Year = IF(YEAR('Calendar'[Date]) = YEAR(TODAY()), TRUE(), FALSE())
Comment: Tests whether the date is in the current year.

Current Year Month
Formula: Current Year Month = IF(YEAR('Calendar'[Date]) = YEAR(TODAY()), IF(MONTH('Calendar'[Date]) = MONTH(TODAY()), TRUE(), FALSE()), FALSE())
Comment: Tests whether the date is in the current year and the current month.

Current LTM
Formula: Current LTM = IF('Calendar'[Date] > DATE(YEAR(TODAY())-1, MONTH(TODAY()), 1)-1 && 'Calendar'[Date] <= DATE(YEAR(TODAY()), MONTH(TODAY()), 1)-1, TRUE(), FALSE())
Comment: Tests whether the date falls between the last day of last month a year ago and the last day of last month this year. Note: LTM stands for Last Twelve Months.

LTM Group
Formula: LTM Group = CONCATENATE("LTM Group: ", IF(MONTH('Calendar'[Date]) > MONTH(TODAY()-DAY(TODAY())), (YEAR(TODAY()-DAY(TODAY())) - YEAR('Calendar'[Date]))*12, (YEAR(TODAY()-DAY(TODAY())) - YEAR('Calendar'[Date]) + 1)*12))
Comment: Groups dates together into twelve-month periods. Note: LTM stands for Last Twelve Months.
Please read methods 7A and 7B below only as alternatives to the above.
7A) Creating a Date Table via External Sources
An alternative to manually creating a calendar table is to import an existing table. This is no different from importing any other data source or file: simply click 'Get Data', import the table, then link it in via relationships.
Note: You do not need to follow this section.
This method will only work if you have a database to connect to and that database includes a Date table. You will additionally need to modify the T-SQL statement accordingly.
Click 'Get Data', navigate to 'SQL Database' and open the 'Advanced Options'.
Note: this allows you to write a SQL query manually.
Enter the following syntax:
select * from dbo.DimDate
where FullDate >= '2015-01-01'
and FullDate < '2018-01-01'
After inserting the new table, navigate to the 'Relationships' perspective to create the relevant relationships.
Note: Ensure the Cross Filter Direction is set to 'Both'. This is the most common, default direction; it means that, for filtering purposes, both tables are treated as if they're a single table. More information on cross-filter direction can be found in the Power BI documentation.
7B) Extracting Date Table from Columns or Measures
A second alternative to the above methods is to extend an existing table with date information extracted from existing data using DAX's text and format functions. This is most likely used in scenarios that need a 'quick and dirty' result, saving us from having to import a separate date table.
An example would be to turn Orders[DateKey] into three calculated columns, Days, Month and Year, via:
Days = RIGHT(Orders[DateKey], 2)
Month = MID(Orders[DateKey], 5, 2)
Year = LEFT(Orders[DateKey], 4)
Or first breaking Orders[DateKey] down into sections and then concatenating the string back together using a combination of DAX functions:
OrderDate = CONCATENATE(CONCATENATE(CONCATENATE(CONCATENATE(RIGHT(Orders[DateKey], 2), "/"), MID(Orders[DateKey], 5, 2)), "/"), LEFT(Orders[DateKey], 4))
Note: This method is not strictly necessary for manipulating dates, because of the vast range of time intelligence functions in DAX; however, the same principles apply well to text, e.g. extracting details from an email address, such as treating addresses ending in ".co.uk" as "UK" customers.
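As a small illustration of that last point, a calculated column along the following lines could flag UK customers from an email address; this is only a sketch, and the Customers table and [Email] column are assumed names rather than part of the sample model:

// Flags rows whose email address ends in ".co.uk" as UK customers
Customer Region = IF(RIGHT(Customers[Email], 6) = ".co.uk", "UK", "Other")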
8) Filtering
Generally BI commonly looks at numbers (i.e. facts – elements that can be measured) against
descriptions (i.e. dimensions – elements that provide context); for example, looking at the total ‘Orders
Value’ by multiple ‘Market Names’. Due to this nature it is essential to enable users to filter facts by
descriptions to provide context specific data.
In Power BI there are a number of filtering features. It is important to note each method and know how these are individually used, but also how they can be combined to drive an interactive report.
DAX Filters - filters programmed into calculations
Slicers - filters that are visually represented on the report page
Visual level filters - filters that sit off the page and only affect a specific visual
Page level filters - filters that sit off the page and affect all visuals on the report page
Report level filters - filters that sit off the page and affect all visuals on all report pages
8A) DAX Filters:
DAX can be used to create calculations that take a filter context into account; DAX can therefore create filtered elements, such as measures that evaluate the order value for a specific year. To do this, on the ribbon click the 'New Measure' dropdown and select 'New Measure' from the list. In the formula bar that appears, insert the following DAX script.
2014 Orders = CALCULATE(SUM(Orders[OrderValue]),
'Calendar'[Year]=2014, Orders[LostReasonKey]=-2)
Note: the CALCULATE function runs an expression (here, the sum of the order value) where specific filter contexts apply, i.e. filtering down to the given 'Year' and then further filtering to only show "LostReasonKey = -2", which means the order was "Not lost".
This context illustrates the importance of filtering the data. As seen above, the aggregate of 'OrderValue' as the standard column from the 'Orders' table is redundant without filtering out the lost orders and providing a time context.
Repeat these steps so that you have measures for both '2014 Orders' and '2015 Orders' (see the sketch below).
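For reference, the '2015 Orders' measure is simply the same pattern with the year changed (a sketch, assuming the same 'not lost' filter is still wanted):

2015 Orders = CALCULATE(SUM(Orders[OrderValue]), 'Calendar'[Year]=2015, Orders[LostReasonKey]=-2)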
The two new measures should now be available in the field list. If the measures were not created in the 'Orders' table, find them and then move them onto the Orders table. This is done by clicking on the measure in the Fields pane and then, while the element is highlighted, clicking on the 'Modeling' ribbon tab and, under the 'Properties' section, changing the 'Home Table' through the dropdown.
Note: Measures can be moved between tables as they are calculated entities and therefore not dependent on any table. Ideally measures should be kept on a table that represents a logical relationship; for example, keeping the filtered 2015 Orders and 2014 Orders measures in the 'Orders' table.
Similar DAX filtering can also be applied when creating calculated columns or tables.
Add a new page to the report. Drag the '2015 Orders' measure out into a table, then from the 'Product' table add 'ProductGroupDesc' to the table. Now the order value is split into the relevant Product Groups. This shows that although measures are aggregates, they can still be split by attributes.
8B) Slicers:
Slicers are selected from the Visualizations pane as a type of visual. Like their counterparts, these visuals are added to the canvas and provide an interaction point for users. These on-canvas filters allow anyone to segment the data by particular values.
Note: Slicers are one of the few visuals that can't have a visual level filter applied to them; however, a page or report level filter does affect slicers. If you must filter a slicer, a workaround may be to use a hidden slicer to filter the user-facing slicer, or to use a filtered calculated column rather than the original, comprehensive column.
As two separate visuals, insert one slicer from the 'Calendar' table on 'Date' and another on 'Month Name'.
Although they are the same type of visual, these two slicers are presented and interact differently. This is because 'Date' is a 'Date/Time' datatype and 'Month Name' is a 'Text' datatype. This highlights why it's important to check every column and its datatype when initially inserting the table into the model.
Also note the 'Date' slicer is presenting "01/02/2012" to "31/12/2016", so the table showing the measure '2015 Orders' is populated. If we change the range to exclude 2015 dates, such as "01/01/2016" to "31/12/2016", no data will be returned for the measure. This is because the measure is already filtered through DAX, so the new slicers only add extra conditions on top of the calculation.
Note: The form of the slicer is dependent on the datatype of the field selected, e.g. text columns will show up as check-box style slicers, whereas a date/time column will display a timeline-style slicer and a date picker.
Note: A 'Search' box can also be added to slicers across text fields. This is very useful for filters such as 'product code', as some shop floor and telesales users will know specific codes.
8C) Visual, Page and Report Level Filters:
These three levels of filtering are off-the-page filters. They essentially function like slicers but appear on a collapsible pane. The three versions do as their names suggest, filtering either a single visual, all the visuals on the page, or all visuals on all pages of a report.
Simply drag-and-drop 'Sector' from the 'Markets' table onto the 'Page Level Filters' area under the 'Filters' section on the Visualizations pane.
A good example of these filters being used is when creating an annual report. There is no need to filter each calculation or visual; we just apply a Report Level Filter to the pack, ensuring all the numbers reflect the year in question.
An example where these filters are not such a great fit is an ad-hoc report where the visuals are not all running on the same context; a page or report level filter can then become confusing for users.
Note: In most ad-hoc style reports designers apply filters as they develop. This approach is okay for a power user; however, it is not ideal for the wider business users. Filters should be grouped together, i.e. utilize page or report level filters as much as possible. This is because modifications to the visuals then become centralized and there is no need to check each visual individually. Planning is key here; it's easy to tell which reports are well planned just based on this simple concept.
As standard, the user can also choose to add 'Advanced Filtering' options to the fields. These allow for conditional filtering, much like the filtering previously seen in the likes of Excel.
On a visual level, however, this is taken further still. As the visual is isolated, this allows for greater control over its filtering. Here we also see additional options such as the 'Top N' filter. Previously DAX was needed to calculate this, however it is now a standard feature in the tool.
Note: The 'Top N' filter was voted for by the Power BI community as a commonly requested feature and thus developed into the tool.
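For context, this is a hedged sketch of how a top-N result used to be produced in DAX before the built-in filter existed; it reuses the Orders and Product tables from earlier, and the choice of "top 5 product groups by order value" is an assumption for illustration:

// TOPN returns the five product groups with the highest order value;
// CALCULATE then restricts the sum to just those groups.
Top 5 Product Group Orders =
CALCULATE(
    SUM(Orders[OrderValue]),
    TOPN(5, VALUES('Product'[ProductGroupDesc]), CALCULATE(SUM(Orders[OrderValue])))
)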
10) Tooltips
As users hover over visuals, tooltips open a temporary window to display relevant information. Due to the small display area, tooltips are limited to displaying aggregated numbers, first/last values or the count/distinct count of the records. DAX can provide a workaround for this limitation, as calculations can convert rows of data into a delimiter-separated string to form a measure. Having collapsed the rows into a single string, the measure can display the full information within a tooltip.
To do this a variety of DAX calculations could be used; let's examine some of these to see the differences and learn more about the arrangement of DAX.
The simple solution would be to collapse all the rows into a single string.
Note: DAX measure to display multiple values split by the delimiter ", "
Tooltip1 = CALCULATE(CONCATENATEX(VALUES('Stock'[Subcategory]), 'Stock'[Subcategory], ", "))
The weakness of the above method is that when the number of values to display is very high, we lose detail, or the tooltip may not display at all. The calculation below limits the numbers returned and displays the first 3 values. If additional values are present, it also follows the statement with " and more…".
Note: This measure currently displays 3 values and a note; however, it could be modified to display any set number, or even a variable number.
Tooltip2 =
VAR itemcount = DISTINCTCOUNT(StockItem[Color])
RETURN
IF(itemcount > 3,
    CONCATENATEX(TOPN(3, VALUES(StockItem[Color])), StockItem[Color], ",") & " and more…",
    CALCULATE(CONCATENATEX(VALUES(StockItem[Color]), StockItem[Color], ", ")))
To advance on the last step, the DAX can be further improved to show the top 3 values rather than the first 3 values; this is more likely to be required in the business world. This is a great example of the power of DAX, and it also shows how users can layer logic to advance their calculations: advanced logic can be built up by layering simple functions.
Note: In DAX, "VAR" sets a variable; variables optimize performance and readability, but there is no need to worry about such advanced elements here. As long as you get the right numbers, you're in a good position.
Tooltip3 =
VAR SubcategoriesCount = DISTINCTCOUNT('Stock'[Subcategory])
RETURN
IF(SubcategoriesCount >= 3,
    CALCULATE(CONCATENATEX(TOPN(3, VALUES('Stock'[Subcategory])), 'Stock'[Subcategory], ", ")) & " and more…",
    CALCULATE(CONCATENATEX(VALUES('Stock'[Subcategory]), 'Stock'[Subcategory], ", ")))
11) Row Level Security
Row level security in Power BI is configured in Power BI Desktop at the data model level and then published. Through the Power BI Service, the logged-in user is then identified through Active Directory and aligned with the roles and rules, so that they are only presented with the permitted data.
Rule - the DAX formula that limits the data visibility on each table, e.g. Orders[Region] = "UK"
Role - the user group that the rules affect, e.g. "UK Market"
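Note: rules can also be made dynamic. A common pattern, not used in this exercise and shown only as a hedged sketch (the Users table and [Email] column are assumed), filters a table down to the signed-in user:

// Rule on an assumed Users table: each user only sees their own rows
[Email] = USERPRINCIPALNAME()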
Under the ‘Modeling’ tab click on ‘Manage Roles’. This wizard will assist you with creating the roles.
In this example, we need three roles based on the table EBSType; these should be defined as:
Role: AirCare
Table: EBSType
Rule: [EBSType] = "AirCare"
Role: Non EBS
Table: EBSType
Rule: [EBSType] = "Non EBS"
Role: Total Solutions
Table: EBSType
Rule: [EBSType] = "Total Solutions"
Note: The checkmark in the top right validates the DAX expression for syntax errors.
Also note, at a later stage the members for these groups are added through the Power BI Service.
To test these roles out we can create a new page by clicking the yellow "+" tab on the bottom left side of the canvas, then creating a table with EBSType[EBSType], Orders[Count] and Orders[OrderValue].
Under the 'Modeling' tab click on 'View As Roles', select the role "AirCare", then click 'OK'. You should now see that the orders table is limited to only show lines associated with "AirCare". Click "Stop Viewing" when you have finished your testing.
Note: Multiple roles can be selected to be previewed at one time.
Assuming we have finalised the report, we press the 'Publish' button to send the model and report to the Power BI Service (go to powerbi.com). In the Service, in the 'Navigation' pane on the left, hover over the new dataset we just published, click the ellipsis "…" button to open the options, then select 'Security'.
Note: As you can see, the ellipsis "…" option allows for important configurations for each dataset. The ellipses are also available on Reports and Dashboards with various settings. To better understand these options, experiment with them.
Here we add users to the roles.
Note: Refer to the section 'Creating a Demo environment' to see how to set up an environment with test users. If you've already done this, or you are using a pre-configured setup, you will see users pop up as you start to type in names or email addresses.
Limitations: Row Level Security is only available on data imported into the model via Power BI Desktop, or for DirectQuery sources other than SQL Server Analysis Services.
Also, Row Level Security for Q&A and Cortana is not yet available but is on the road map.
CRM and AX7 carry forward their security settings into Power BI. You will however need to thoroughly research your specific versions and setups; eBECS can help if required.
What is DAX?
DAX is a collection of functions, operators, and constants that can be used in a formula, or expression, to
calculate and return one or more values. Stated more simply, DAX helps you create new information from
data already in your model.
Why is DAX so important?
You can create reports that show valuable insights without using any DAX formulas at all. But what if you need to analyse growth percentage across product categories and for different date ranges? Or you need to calculate year-over-year growth compared to market trends? DAX formulas provide this capability and many other important capabilities as well.
Learning how to create effective DAX formulas will help you get the most out of your data. When you get the information you need, you can begin to solve business problems that affect your bottom line.
DAX Expressions vs. Excel Formulas
In Excel, we reference cells or arrays. DAX, however, is much more like relational data, so we reference only tables, columns, or columns that have been filtered. In Excel, we have fewer data types than are available to us in DAX. Some of the data types that you have in Excel will be implicitly converted to the data types supported within DAX.
DAX Syntax
DAX expressions contain values, operators and functions.
Expressions always begin with "=". Operators have a precedence order; however, this can be modified by using parentheses "( )".
DAX Operators
Operators are of four types:
Arithmetic - addition "+", subtraction "-", multiplication "*", division "/", exponentiation "^".
Comparison - equal "=", greater/less than ">"/"<", greater/less than or equal ">="/"<=", not equal "<>".
Text concatenation - ampersand "&", e.g. [First Name] & [Last Name] & " from " & [Country]
Logical - combine expressions into a single unit:
Logical 'AND' "&&", i.e. if both expressions evaluate to true then true, otherwise false. e.g. (([Cost] > 125.00) && ([State] = "TX"))
Logical 'OR' "||", i.e. if either expression evaluates to true then true, otherwise false. e.g. (([Cost] > 125.00) || ([State] = "TX"))
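As a small illustration, these operators can be combined in a single calculated column; this is only a sketch reusing the hypothetical [Cost] and [State] columns from the examples above:

// Combines a comparison, a logical AND and text concatenation
Cost Check = IF(([Cost] > 125.00) && ([State] = "TX"), "Review " & [State], "OK")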
DAX Datatypes
DAX has a variety of datatypes. Conversion occurs on import, and users can change or set datatypes afterwards. This is then enhanced via data formats, enabling additional abilities when using these values, e.g. setting the 'calendar year' column to data type DateTime and format "yyyy" so that only the year is shown.
Most functions need parameters to be passed in, and these parameters mostly need to be of a certain datatype. In most cases DAX will perform an implicit cast to transform the value into a suitable format. However, if this fails an error is returned.
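A tiny example of that implicit casting, shown as a sketch with a made-up name:

// DAX implicitly casts the text "2" to a number before adding, so this evaluates to 5
Implicit Cast Example = "2" + 3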
DAX datatypes:
Whole Number.
Text.
N/A.
Decimal Number.
Currency.
Date.
True/False (i.e. Boolean).
Table (a unique datatype used by functions, e.g. when using the SUMX function).
DAX Errors
There are two types of errors that occur when creating DAX expressions:
Syntactical errors (the easiest to deal with), e.g. a forgotten comma or an unclosed parenthesis.
Semantic errors (more difficult), e.g. referring to a non-existent column, function or table.
DAX Functions
Mathematical functions in DAX are very much like their siblings in Excel.
The most commonly utilized logical functions are the IF and OR statements.
Information functions often test values as an input to another function, and are commonly nested inside other functions.
DAX References
To build on these functions, please refer to the DAX function reference, which provides detailed information including syntax, parameters, return values, and examples for each of the 200+ DAX functions.
DAX Evaluation Context
Calculated columns look at every row throughout the entire table using a row context.
Measures (a.k.a. metrics) utilize individual cells within the table, so the evaluation context of a measure depends on the function, and the functions declare which individual cells are used.
Note: Measures are preferred over calculated columns wherever possible, as they compute faster and allow the evaluation context to be modified through filters or the CALCULATE function.
Evaluation context across multiple tables does not automatically follow the relationships between tables. E.g. if we tried =Table#[Column#]+Table##[Column##] to create a calculated column from two different tables, an error would be generated, even with established relationships between the tables. The workaround for this is to use the RELATED function; the correct syntax in this case would be =Table#[Column#]+RELATED(Table##[Column##]). The RELATED function passes the row context through to the related table for each row of the calculated column.
The ALL function removes filter context from either the table or the table column passed in. It can be thought of as "remove all filters from a table, or from a table and columns". E.g. syntax layout: ALL( {<table> | <column>[, <column>[, …]]} )
The FILTER function performs a Boolean evaluation for each row of the table passed in. E.g. syntax layout: FILTER( <Table>, <Filter> )
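A hedged sketch combining the two functions, reusing the Orders table from earlier (the measure names and the 10,000 threshold are made up for illustration):

// Share of the unfiltered total: ALL removes the filter context from Orders
% of All Orders = DIVIDE(SUM(Orders[OrderValue]), CALCULATE(SUM(Orders[OrderValue]), ALL(Orders)))

// FILTER keeps only the rows that pass the Boolean test
Large Orders Value = CALCULATE(SUM(Orders[OrderValue]), FILTER(Orders, Orders[OrderValue] > 10000))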
Measures vs. Columns vs. Tables
DAX can output three key results:
Calculated Measures
Calculated Columns
Calculated Tables
When using DAX, a key foundation you must understand is when to use measures, columns and tables.
Calculated Measures return a value, which can be a number, date or text.
Note: Generally, best practice dictates that when results are numeric and therefore work with aggregation, we should use measures. These are processed on the fly and thus take up fewer resources (although cases may differ).
e.g. to calculate gross profit in £ via "Gross Profit = SUMX(Sales, Sales[Revenue] - Sales[Cost of Goods Sold])"
Calculated Columns return an additional value for each row of the table.
Note: Generally, best practice dictates using columns when the results are unique for each row and cannot be aggregated. This is generally the case where the outcome is a date or text, or needs conditional formatting. As these columns are calculated and stored on the table, they take up more resources than measures.
e.g. to extract the month name out of a full date we first extract the month number and then translate it into text via "Month Name = SWITCH([Month Number],
1, "Jan",
2, "Feb",
3, "Mar", …, BLANK())"
Calculated Tables return a table of rows based on a new calculation or an extract from a related table.
Note: Generally these are used when we need to restructure data, group it, or extract lists from existing datasets.
e.g. to create a list of customers from historical purchase records, we could extract each distinct customer code via "Customers = DISTINCT('Table'[Customer Code])"
Measures, Columns and Tables working together
In this example we have a table with transactional values and their corresponding dates; to gain better granularity on our date filtering we can create a calendar table to link into the original transactional table.
Note: here we need to generate a table consisting of rows with unique values based on the transaction dates recorded in our original dataset.
First we create two calculated measures to find the earliest and latest dates present via “Earliest
Date = MIN('Table'[Date])” and “Latest Date = MAX('Table'[Date])” on the transactional table.
Second we create a calculated table that also considers the measures to define the start and end
date range for the function that generates the table via “Calendar = CALENDAR([Earliest Date],
[Latest Date])”.
Then we connect the two tables using the [Date] columns and hide the transactional [Date] column. Subsequently, we also hide the two measures from report view in the transactional table.
Finally, we can create calculated columns to break down the Calendar[Date] column into columns such as [Year], via "Year = YEAR('Calendar'[Date])" (see the sketch below).
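Pulling those steps together, a minimal sketch using the names above would look like this:

// Measures on the transactional table
Earliest Date = MIN('Table'[Date])
Latest Date = MAX('Table'[Date])

// Calculated table spanning the transactional date range
Calendar = CALENDAR([Earliest Date], [Latest Date])

// Calculated column on the new Calendar table
Year = YEAR('Calendar'[Date])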
Aggregators vs. Iterators
I.e. "SUM()" vs "SUMX()". First, let's start with what both of these functions are.
SUM is an aggregator and SUMX is an iterator. They can both end up giving you the same result, but they do it in very different ways. In short, SUM() operates over a single column of data to give you the result (the aggregation of the single column). SUMX(), on the other hand, is capable of working across multiple columns in a table. It will iterate through a table, one row at a time, complete a calculation (like Quantity x Price Per Unit) and then add up the total of all of the row-level calculations to get the grand total.
e.g. "SUMX()"
If your Sales table contains a column for Quantity and another column for "Price Per Unit", then you will necessarily need to multiply Quantity by the "Price Per Unit" in order to get Total Sales. It is no good adding up the total quantity SUM(Quantity) and multiplying it by the average price AVERAGE(Price Per Unit), as this will give the wrong answer. If your data is structured in this way, then you simply must use SUMX() via "Total Sales = SUMX(Sales, Sales[Qty] * Sales[Price Per Unit])". Note: You can always spot an iterator function, as it always has a table as its first input parameter. This is the table that is iterated over by the function.
e.g. "SUM()"
If your data contains a single column with the extended Total Sales for each line item, then you can use SUM() to add up the values. There is no need for an iterator in this example, because it is just a simple calculation across a single column. Note, however, that you 'could' still use SUMX() and it will give you the same answer via "Total Sales Alternate = SUMX(Sales, Sales[Total Sales])".
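For completeness, the plain aggregator version of that measure is a one-liner over the same assumed Sales table:

// Aggregates the pre-calculated line totals in a single column
Total Sales Simple = SUM(Sales[Total Sales])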
Totals Don’t Add Up!
There is another use case when you will need to use SUMX that is less obvious. But when you encounter
this problem, you will need to use an iterator to solve it.
In this example, the data shows 10 customers' shopping behaviour over a 14-day period. Each customer shops on a different number of days (anywhere from 5 to 9 days in this example). The column "Average Visits Per Day 1" calculates how many times each customer shopped on average (for the days they actually shopped). Customer 10001 shopped on average 1.3 times each day, and customer 10002 only came in once for each day shopped.
But do you spot the problem?
The grand total of 6.8 doesn't make any sense. The average across all customers is not 6.8 (it is actually 1.4). The problem is that the grand total is calculated as 95 (total visits) divided by 14 (total days). Aggregating the visits and the count of days BEFORE calculating the average means that you end up losing the uniqueness in the customer-level detail.
This problem would not occur if you used an iterator, because an iterator calculates each line one at a time. The aggregator, on the other hand, adds up all the numbers BEFORE the calculation, effectively losing the line-level detail required to do the correct calculation.
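A hedged sketch of the two approaches, assuming a Visits table with one row per customer visit (the table and column names are made up to match the scenario described):

// Aggregator style: total visit rows divided by total distinct days.
// Works per customer, but the grand total collapses to 95 / 14 = 6.8.
Average Visits Per Day 1 = DIVIDE(COUNTROWS(Visits), DISTINCTCOUNT(Visits[Date]))

// Iterator style: compute the ratio per customer, then average those results,
// so the grand total keeps the customer-level detail (about 1.4).
Average Visits Per Day 2 =
AVERAGEX(
    VALUES(Visits[CustomerID]),
    CALCULATE(DIVIDE(COUNTROWS(Visits), DISTINCTCOUNT(Visits[Date])))
)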
Lookups in DAX
When you define a calculated column, you are writing a DAX expression that will be executed in a row
context. Since USERELATIONSHIP requires a CALCULATE to be used and the CALCULATE applies a context
transition when executed within a row context, obtaining the expected behavior is not easy.
Apply USERELATIONSHIP to RELATED
If you create a calculated column in FactInternetSales, you might want to use RELATED choosing the
relationship to use. Unfortunately, this is not possible. For example, if you want to denormalize the day name
of week of the order date, you write:
FactInternetSales[DayOrder] = RELATED ( DimDate[EnglishDayNameOfWeek] )
But what if you want to obtain the day name of week of the due date? You cannot use CALCULATE and
RELATED together, so you have to use this syntax instead:
FactInternetSales[DayDue] =
CALCULATE (
    CALCULATE (
        VALUES ( DimDate[EnglishDayNameOfWeek] ),
        FactInternetSales
    ),
    USERELATIONSHIP ( DimDate[DateKey], FactInternetSales[DueDateKey] ),
    ALL ( DimDate )
)
Two CALCULATE calls are required in this case: the outermost CALCULATE applies the USERELATIONSHIP to the innermost CALCULATE, and the ALL ( DimDate ) filter removes the existing filter that would be generated by the context transition. The innermost CALCULATE applies FactInternetSales as the filter condition and, thanks to the active USERELATIONSHIP, its filter propagates to the lookup DimDate table using the DueDateKey relationship instead of the OrderDateKey one.
Even though this syntax works, I strongly discourage you from using it, because it is hard to understand and it is easy to write wrong DAX code here. A better approach is to use LOOKUPVALUE instead, which does not require the relationship at all.
FactInternetSales[DayDue] =
LOOKUPVALUE (
DimDate[EnglishDayNameOfWeek],
DimDate[DateKey],
FactInternetSales[DueDateKey] )
Test your understanding: use File SourceA
This source looks at basic financial data to keep things simple. Use your new Power BI and DAX skills to create the required calculations. Note: the outcomes are also already in the file, so check that you have reached the same figures.
SourceA
Get Data > "SourceA"
Then try to create the following:
Q1 - Create ‘Gross Sales’ Calculated Column
[Units Sold] * [Sales Price]
Q2 - Create ‘Net Sales’ Measure
( [Units Sold] * [Sale Price] ) - [Discounts]
Q3 – ‘Profit’ Measure
=[Sales] - [COGS]
Q4 - Create a Calendar/Date Table
Q4a - Split the Date into Day, Month, Year using the appropriate functions
=DAY([Date])
=MONTH([Date]) <OR> =FORMAT([Date], "MMM")
=YEAR([Date]) <OR> =FORMAT([Date], "YYYY")
Q4b - Replace the Month Number with Month Names
=SWITCH([MonthNum],
1, "Jan",
2, "Feb",
3, "Mar",
4, "Apr",
5, "May",
6, "Jun",
7, "Jul",
8, "Aug",
9, "Sep",
10, "Oct",
11, "Nov",
12, "Dec",
BLANK())
Q4c - Insert the Quarters as a Calculated Column
=CONCATENATE("Q", ROUNDUP(MONTH([Date])/3, 0))
Q4d - Insert the Week numbers as a Calculated Column
=CONCATENATE("Week ", RIGHT(CONCATENATE("0", WEEKNUM([Date])), 2))
Q4e - Grab the system date as a measure
=TODAY()
Creating a Demo environment
A demo environment can be created to test most Power BI features and experiment with the service, apart from the gateways, which will need further setup and configuration.
The demo environment can be created by following these steps:
In a search engine look for "Free Trial: Microsoft Dynamics CRM".
Sign up for a free 30-day trial.
This will be the main admin account, so you can also test the admin features.
In the Office 365 Admin Centre you can create a few test users.
Ensure you enable the Product Licenses for Power BI on these user accounts.
Finally, go into Power BI online, log in using these accounts and upgrade them to the free 60-day trial of Power BI Pro.
In the Power BI Service, a workgroup can then be created to include the members added.
Data Mashup (Advanced)
This exercise looks at combining data from a range of flat files, then exploring the data through intermediate to advanced calculations. Through this exercise we shall explore DAX, M queries and R.
1) Get Data
Note: use Folder "FX"
In a new Power BI file, go to 'Get Data', 'File', 'Folder', 'Browse', then select "FX Samples\Data Sources".
Note: Four options are presented at this point:
'Edit' will open the 'Query Editor' with the list of details for the binary files (i.e. .xlsx).
'Load' will import the list of details for the binary files (i.e. .xlsx) into the data model.
'Combine & Load' will open the data model and automatically set up the binary files as appended/combined data sources, as long as the files' columns structurally match.
'Combine & Edit' will open the 'Query Editor' and automatically set up the binary files as appended/combined data sources, as long as the files' columns structurally match.
Select 'Combine & Edit', then in the 'select the object to be extracted from each file' dialog select "Sheet1".
Note: this step would allow the selection of various sheets/objects if these were present in the file. In this case there is only one sheet available in the data files.
Note: in the 'Queries' area, on the left-hand pane, notice the various queries generated by selecting this option. Folders have also been generated here to organize the new elements. Under the 'Other Queries' folder, the query 'Data Sources' is the result of the combine effort. Also note, a new column 'Source.Name' has been generated to help identify the source for each row.
Note: The next step is to clean up the files. Here there are two options:
Edit/clean the result of the query - under the folder 'Other Queries', start adding steps onto the 'Data Sources' query.
Edit/clean each file before the query, systematically - use the folder 'Transform File from Data Sources' and its sample query to programmatically invoke transformations on each file.
Note: There are four elements generated here:
Sample File Parameters - hosts the location of each file in the M query for the combine function.
Sample File - illustrates which one of the files from the folder is presented as the sample file.
Transform Sample File from Data Sources - the sample file that the steps are added to.
Transform File from Data Sources - the function that is the result of the steps added to the sample file.
Select 'Transform Sample File from Data Sources', then from the ribbon, under the 'Query' section, select 'Advanced Editor'. In the editor, paste the following M syntax.
let
Source = Excel.Workbook(#"Sample File Parameter1", null, true),
Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
#"Promoted Headers" = Table.PromoteHeaders(Sheet1_Sheet, [PromoteAllScalars=true]),
#"Replace ""null"" with Blank" = Table.ReplaceValue(#"Promoted Headers",null,"",Replacer.ReplaceValue,{"GRG", "Gambling", "Company Number", "Company Name", "Client Type", "Deal Number", "Option Leg Number", "Linked Deal Number", "Original Deal
Number", "Department Code", "Department Name", "Trade Date", "Input Date", "Value Date", "Client Group Code", "Client Group Name", "Client Code", "Client Name", "Client Contact", "Client Country ID", "Client Country Code", "Client Country",
"Industry Code", "Industry", "Client Size Category Group", "Client Size Category", "Client Size Sub-Category", "First Trade Date", "First Trade Year", "First Trade Month", "First Trade Deal Number", "Client Registration Date", "FCE Contract Type",
"FCE Sub Contract Type", "GRP Contract Type Code", "Contract Type", "Contract Type Group", "Broker Code", "Broker Name", "GRP Method Code", "GRP Medium Code", "GRP Lead Source Code", "Main Source", "Sub Source", "Partner Type", "Partner Group
Code", "Partner Group Name", "Partner Code", "Partner Name", "Reason for Trade", "Dealer ID", "Dealer Name", "Salesman ID", "Salesman Name", "Profit Currency", "Buy Currency Code", "Buy Currency", "Buy Amount", "Sell Currency Code", "Sell
Currency", "Sell Amount", "Option Notional Currency", "TX Fee Nominal", "TX Fee Currency", "28 Day Profit Marker", "Buy / Sell", "Client Buy (Profit Ccy)", "Client Sell (Profit Ccy)", "Broker Buy (Profit Ccy)", "Broker Sell (Profit Ccy)",
"Option Notional (Notional Ccy)", "Option Premium - Bank Side (Profit Ccy)", "Option Premium - Client Side (Profit Ccy)", "Category Fee (Profit Ccy)", "Cross Refrence Fee (Profit Ccy)", "TX Fee (TX Fee Ccy)", "Partner Commission (Profit Ccy)",
"Average FX Rate (Profit Ccy)", "Average FX Rate (Option Notional Ccy)", "Average FX Rate (TX Ccy)", "Notional", "Turnover", "Cost of Sales", "Total Trading Revenue", "Category Fee", "Cross Refrence Fee", "TX Fee", "Partner Commission", "Net
Profit"}),
#"Replace ""NULL"" with Blank" = Table.ReplaceValue(#"Replace ""null"" with Blank","NULL","",Replacer.ReplaceValue,{"GRG", "Gambling", "Company Number", "Company Name", "Client Type", "Deal Number", "Option Leg Number", "Linked Deal Number",
"Original Deal Number", "Department Code", "Department Name", "Trade Date", "Input Date", "Value Date", "Client Group Code", "Client Group Name", "Client Code", "Client Name", "Client Contact", "Client Country ID", "Client Country Code", "Client
Country", "Industry Code", "Industry", "Client Size Category Group", "Client Size Category", "Client Size Sub-Category", "First Trade Date", "First Trade Year", "First Trade Month", "First Trade Deal Number", "Client Registration Date", "FCE
Contract Type", "FCE Sub Contract Type", "GRP Contract Type Code", "Contract Type", "Contract Type Group", "Broker Code", "Broker Name", "GRP Method Code", "GRP Medium Code", "GRP Lead Source Code", "Main Source", "Sub Source", "Partner Type",
"Partner Group Code", "Partner Group Name", "Partner Code", "Partner Name", "Reason for Trade", "Dealer ID", "Dealer Name", "Salesman ID", "Salesman Name", "Profit Currency", "Buy Currency Code", "Buy Currency", "Buy Amount", "Sell Currency Code",
"Sell Currency", "Sell Amount", "Option Notional Currency", "TX Fee Nominal", "TX Fee Currency", "28 Day Profit Marker", "Buy / Sell", "Client Buy (Profit Ccy)", "Client Sell (Profit Ccy)", "Broker Buy (Profit Ccy)", "Broker Sell (Profit Ccy)",
"Option Notional (Notional Ccy)", "Option Premium - Bank Side (Profit Ccy)", "Option Premium - Client Side (Profit Ccy)", "Category Fee (Profit Ccy)", "Cross Refrence Fee (Profit Ccy)", "TX Fee (TX Fee Ccy)", "Partner Commission (Profit Ccy)",
"Average FX Rate (Profit Ccy)", "Average FX Rate (Option Notional Ccy)", "Average FX Rate (TX Ccy)", "Notional", "Turnover", "Cost of Sales", "Total Trading Revenue", "Category Fee", "Cross Refrence Fee", "TX Fee", "Partner Commission", "Net
Profit"}),
#"Replace ""n/a"" with Blank" = Table.ReplaceValue(#"Replace ""NULL"" with Blank","n/a","",Replacer.ReplaceValue,{"GRG", "Gambling", "Company Number", "Company Name", "Client Type", "Deal Number", "Option Leg Number", "Linked Deal Number",
"Original Deal Number", "Department Code", "Department Name", "Trade Date", "Input Date", "Value Date", "Client Group Code", "Client Group Name", "Client Code", "Client Name", "Client Contact", "Client Country ID", "Client Country Code", "Client
Country", "Industry Code", "Industry", "Client Size Category Group", "Client Size Category", "Client Size Sub-Category", "First Trade Date", "First Trade Year", "First Trade Month", "First Trade Deal Number", "Client Registration Date", "FCE
Contract Type", "FCE Sub Contract Type", "GRP Contract Type Code", "Contract Type", "Contract Type Group", "Broker Code", "Broker Name", "GRP Method Code", "GRP Medium Code", "GRP Lead Source Code", "Main Source", "Sub Source", "Partner Type",
"Partner Group Code", "Partner Group Name", "Partner Code", "Partner Name", "Reason for Trade", "Dealer ID", "Dealer Name", "Salesman ID", "Salesman Name", "Profit Currency", "Buy Currency Code", "Buy Currency", "Buy Amount", "Sell Currency Code",
"Sell Currency", "Sell Amount", "Option Notional Currency", "TX Fee Nominal", "TX Fee Currency", "28 Day Profit Marker", "Buy / Sell", "Client Buy (Profit Ccy)", "Client Sell (Profit Ccy)", "Broker Buy (Profit Ccy)", "Broker Sell (Profit Ccy)",
"Option Notional (Notional Ccy)", "Option Premium - Bank Side (Profit Ccy)", "Option Premium - Client Side (Profit Ccy)", "Category Fee (Profit Ccy)", "Cross Refrence Fee (Profit Ccy)", "TX Fee (TX Fee Ccy)", "Partner Commission (Profit Ccy)",
"Average FX Rate (Profit Ccy)", "Average FX Rate (Option Notional Ccy)", "Average FX Rate (TX Ccy)", "Notional", "Turnover", "Cost of Sales", "Total Trading Revenue", "Category Fee", "Cross Refrence Fee", "TX Fee", "Partner Commission", "Net
Profit"}),
#"Replace ""0"" with blank on text columns" = Table.ReplaceValue(#"Replace ""n/a"" with Blank",0,"",Replacer.ReplaceValue,{"GRG", "Gambling", "Company Number", "Company Name", "Client Type", "Deal Number", "Option Leg Number", "Linked Deal
Number", "Original Deal Number", "Department Code", "Department Name", "Trade Date", "Input Date", "Value Date", "Client Group Code", "Client Group Name", "Client Code", "Client Name", "Client Contact", "Client Country ID", "Client Country Code",
"Client Country", "Industry Code", "Industry", "Client Size Category Group", "Client Size Category", "Client Size Sub-Category", "First Trade Date", "First Trade Year", "First Trade Month", "First Trade Deal Number", "Client Registration Date",
"FCE Contract Type", "FCE Sub Contract Type", "GRP Contract Type Code", "Contract Type", "Contract Type Group", "Broker Code", "Broker Name", "GRP Method Code", "GRP Medium Code", "GRP Lead Source Code", "Main Source", "Sub Source", "Partner
Type", "Partner Group Code", "Partner Group Name", "Partner Code", "Partner Name", "Reason for Trade", "Dealer ID", "Dealer Name", "Salesman ID", "Salesman Name", "Profit Currency", "Buy Currency Code", "Buy Currency", "Sell Currency Code", "Sell
Currency"}),
#"Replace Blank with ""0"" on num cols" = Table.ReplaceValue(#"Replace ""0"" with blank on text columns","",0,Replacer.ReplaceValue,{"Notional", "Turnover", "Cost of Sales", "Total Trading Revenue", "Category Fee", "Cross Refrence Fee", "TX
Fee", "Partner Commission", "Net Profit"}),
#"Changed Type" = Table.TransformColumnTypes(#"Replace Blank with ""0"" on num cols",{{"GRG", type text}, {"Gambling", type text}, {"Company Number", type text}, {"Company Name", type text}, {"Trade Date", type date}, {"Input Date", type
date}, {"Value Date", type date}, {"First Trade Date", type date}, {"Client Registration Date", type date}, {"Buy Amount", type number}, {"Sell Amount", type number}, {"Option Notional Currency", type number}, {"TX Fee Nominal", type number}, {"TX
Fee Currency", type number}, {"Notional", type number}, {"Turnover", type number}, {"Cost of Sales", type number}, {"Total Trading Revenue", type number}, {"Category Fee", type number}, {"Cross Refrence Fee", type number}, {"TX Fee", type number},
{"Partner Commission", type number}, {"Net Profit", type number}, {"Client Type", type text}, {"Deal Number", type text}, {"Option Leg Number", type text}, {"Linked Deal Number", type text}, {"Original Deal Number", type text}, {"Department Code",
type text}, {"Department Name", type text}, {"Client Code", type text}, {"Client Name", type text}, {"Client Contact", type text}, {"Client Country ID", type text}, {"Client Country Code", type text}, {"Client Country", type text}, {"Industry
Code", type text}, {"Industry", type text}, {"Client Size Category Group", type text}, {"Client Size Category", type text}, {"First Trade Deal Number", type text}, {"FCE Contract Type", type text}, {"FCE Sub Contract Type", type text}, {"GRP
Contract Type Code", type text}, {"Contract Type", type text}, {"Contract Type Group", type text}, {"Broker Code", type text}, {"Broker Name", type text}, {"GRP Method Code", type text}, {"GRP Medium Code", type text}, {"GRP Lead Source Code",
type text}, {"Main Source", type text}, {"Sub Source", type text}, {"Partner Type", type text}, {"Partner Group Code", type text}, {"Partner Group Name", type text}, {"Partner Code", type text}, {"Partner Name", type text}, {"Reason for Trade",
type text}, {"Dealer Name", type text}, {"Salesman Name", type text}}),
#"Removed Other Columns" = Table.SelectColumns(#"Changed Type",{"GRG", "Gambling", "Company Name", "Client Type", "Deal Number", "Original Deal Number", "Department Name", "Trade Date", "Client Group Name", "Client Code", "Client Name",
"Client Contact", "Client Country", "Industry", "Client Size Category Group", "First Trade Date", "First Trade Year", "First Trade Month", "Client Registration Date", "Contract Type", "Contract Type Group", "Broker Name", "Main Source", "Sub
Source", "Partner Type", "Partner Group Code", "Partner Group Name", "Partner Name", "Reason for Trade", "Dealer Name", "Salesman Name", "28 Day Profit Marker", "Notional", "Turnover", "Cost of Sales", "Total Trading Revenue", "Category Fee",
"Cross Refrence Fee", "TX Fee", "Partner Commission", "Net Profit"})
in
#"Removed Other Columns"
Note: All the features in the Query Editor result in 'M' syntax. Here we've used an existing script to recreate steps previously achieved. Altering the M syntax can also help take the query to a more advanced level; however, it can also break the query if not executed correctly. Be careful, as this does not have Auto Save or an Undo feature.
Generally, it is advised to use the wizard to generate the M rather than manually inputting code, although as you can now see, manual input sometimes has advantages.
At this point the steps have been applied to all the files, and the files have then been combined together to form 'Data Sources'. Select the outcome from 'Other Queries'. You will see the error message "The column 'Company Number' of the table wasn't found".
To resolve this, on the 'Applied Steps' pane select the last step, "Changed Type", and delete it. Then select all the columns and, from the 'Transform' tab, section 'Any Column', select 'Detect Data Type'.
Note: The error is caused because the function tried to apply data types to columns from the source; however, as we removed some of the columns in our transformations, we needed to overwrite this step to ensure it only looks for columns that still exist.