18/03/2025

Only theoretical questions from the first part of the course will be asked. Examples: What is a primary key? What happens if you do a vertical lookup in Excel without using a primary key? Stuff like that.

We are going to separate the data source from the analysis file: until today we were opening the basic dataset, which now becomes our external data source, where our data resides. We are going to connect to that file from another file. From now on, data could be anywhere in two senses: it can be in any kind of structure (Excel file, database, website) AND in any location (our computer, an external server).

Download the basic dataset and open an empty Excel file. From the technical point of view, we are going to activate POWER PIVOT:
o Top left "File" —> Options —> Add-ins —> COM Add-ins, Go —> Power Pivot for Excel, OK

With this add-in we can use databases within Excel. We are going to use 2 databases: Power Query and Power Pivot. However, we will only use Power Pivot in depth (Power Query will just be a temporary database through which the data flows in).

We want to overcome all the limitations that Excel has shown until now. Excel is a flexible environment and pretty easy to use, but it has a few drawbacks:
1. The amount of data it can host (about 1 million rows, but it struggles from 100k rows).
2. It's an unreliable data source (unreliable storage of data), because it's not a database.
3. The pivot tables we create in Excel are bound to a single table (this makes it difficult to deal with corporate data).
4. When we created calculated fields (now we will call them measures), we didn't have any language (now we have DAX, a structured language).

To overcome all of these limitations, we will use databases within Excel. Data in corporations are in databases (professional containers of data, not Excel files). A storage database is a database whose purpose is storing the data in a reliable way (you can access the data with different rights, etc.).
These storage databases are located on servers, which are machines that anybody within the organisation can reach remotely. Within the databases there are tables containing the data we are interested in analysing (sales data, shipping, purchases data, etc.).

First, we need to connect to the server where the database resides (with the tables containing the data we are interested in): this process is called connecting. Once we are connected, we extract a copy of these tables: this process is called importing (we extract the tables and then transform them, because data are not clean, and for legal reasons you also have to keep the old "dirty" version). This first phase is called ETL: extract, transform and load (the new "clean" version). We load the new version for DM (data modelling); we say "load" because we load the data into a different database. This second database is again a storage database, but a temporary one in which we host the extracted data to clean them (we are still in storage technology).

The third step is an analysis database (still a database, but with a different technology: not designed to store the data, but to analyse it). Here we connect the tables via relationships to avoid vertical lookups, and we write columns and measures (the evolution of calculated fields). The next step (still part of step 3) is called visualisation: we create some kind of report. Finally, there is a fourth step (which we will not go through) called sharing: once I connect to the data, take the data I want, clean it, load it into an analysis database and create a report on top of it, I need to share this report (make it available).

THESE STEPS ARE CALLED BUSINESS INTELLIGENCE (BI).

The ETL will be done by Power Query (a storage database hosted in Excel). For the analysis we will use Power Pivot (an analysis database hosted in Excel). For the visualisation part we will still use a pivot table or a pivot chart.
For the sharing part Excel offers nothing: it needs a cloud system.

Now, starting from a blank Excel file:
o "Power Pivot" —> Manage (a new window opens) —> (we are going to pretend that the basic dataset is our database).
o "Data" —> "Get Data" —> "From File" (Excel Workbook) —> Import (it shows you the content) —> flag "Select multiple items", then select everything (although normally we don't do that) —> we have the option to transform the data (click it, because we cannot modify the data in the data source) —> you are in Power Query (the header is no longer row 1, because this is a storage database) —> "Close & Load To…" (pretending we have transformed the data) —> "Only Create Connection" (we don't want to load the data into a classic Excel table, because we need to load it into Power Pivot) —> "Add this data to the Data Model" —> "OK".

The first part is now finished: we loaded the data into an analysis database. The data is not visible here because we need to open the Power Pivot environment. Close "Queries & Connections". Go to "Power Pivot" —> Manage (now we have opened the analysis database, filled with our data). Now we save the file and start the data modelling part:
o "Power Pivot" —> Manage —> change "Data View" into "Diagram View".

Now we want to create a pivot table. If we try the usual way (minimise the window to go back to classic Excel: Insert, PivotTable), Excel doesn't find anything because the data is not there (before, we were in a table in Excel, while now there is nothing) —> we need to create the pivot table from Power Pivot:
o Go back to Power Pivot —> "PivotTable" —> "New Worksheet" —> "OK".

Now we have our pivot table (it's fundamentally different from the classic one, although it looks the same, because in the field list we have tables). Now select the fields out of the sales table (pretend it's a classic pivot table). To see the sales amount by plant, select both (rows: prod plant, values: sales amt).
If I want to see the revenues by sales area, I select it from the customer table (in rows) and deselect prod plant. We see that we now have an issue (this does not work). Let's recall how Excel calculated the number for prod plant: the first step is filtering prod plant for the value in that row (e.g. the filter value PLT01 keeps 10 rows). This column belongs to the sales table, and the calculation done after the filter is still on sales amt (same table —> that's why everything works). When I select sales area we are in a different table, which is why the filter does not work: we are filtering the customer table, but the calculation is on the sales table, so the sales table is not filtered (and we do not want to do a vertical lookup).

Now we have to connect the two tables (create a relationship to make everything work): go to Manage. When we did the vertical lookup we used the customer number, and now we create a connection between these two fields:
o Click on customer no in the customer table, keep the button pressed, drag it onto customer no in the sales table and release it when they overlap (this is enough to make the pivot table work).

No vertical lookup, just a connection between these two fields (now we can browse by any column in the customer table). Now suppose we want to filter based on the product series: if we try to connect the product no, we can't (Power Pivot checks whether there is a primary key on at least one side, and if there isn't, it doesn't allow you to create the relationship). In the connection with the customer table we have a 1 on the customer side (the primary key of that table) and a star on the other side (the foreign key in the sales table). Now we need to generate the primary key in the product table and the foreign key in the sales table. To generate these columns we need to write code (a DAX formula).
o To create a column in Power Pivot, go to "Data View" —> select the product table at the bottom —> double-click on "Add Column" and name it (Primary Key), press Enter, and now write the code of the column (remember that you cannot address rows here, i.e. no cell references). We can use the field names to concatenate columns: table name[column name] & (the concatenation symbol) table name[column name]. In our case:
    = product[prod no] & product[prod plant]
Press Enter (no need to double-click to propagate the calculation: it is automatically applied to the entire column).
o Now go to the sales table and generate the foreign key: add a new column "Product Foreign Key" and write its code (keep the same concatenation, but use the table name "sales"):
    = sales[prod no] & sales[prod plant]
o Now go to the Diagram View and connect the primary key to the foreign key.

Now in the pivot table we can browse the numbers also by product. We are still missing the calendar: create a connection between the two tables. We want the calendar to reflect the invoice date (although, depending on your role, you might prefer the shipping date, etc.), so we connect invoice date to date (the primary key is found). Now we can browse any column from any table without vertical lookups. Put yearmonth in rows and sales amt and order quantity in values.

Now we want to calculate the WEIGHTED AVERAGE PRICE:
o In classic Excel we used a calculated field (click on the pivot table, Home, Insert —> here it's disabled).
o Here we have a real database underneath, so we are going to write a measure (the evolution of a calculated field): Power Pivot —> Measures —> New Measure.
o Assign the measure to a table (sales; you can change it later) and give it a name (Weighted Average Price). If we try to write sales[sales amt]/sales[order quantity] as we did in Excel, it does not work here (we cannot divide two columns, while in Excel a sum was added automatically). To make it work:
    = SUM ( sales[sales amt] ) / SUM ( sales[order quantity] )
—> "Check formula" (it works) —> "OK".
Notice that you have to add decimals (format the measure); in the field list it appears with an fx icon because it's not a column. If I then want to see the DAX code of the measure, I go back to Power Pivot —> Measures —> Manage Measures.

We need to create a measure also for the first two calculations (sales amt and order quantity), for two reasons:
1. if I want to simulate an increase/decrease by a certain percentage in one or both of them, I need a formula I can modify;
2. the measures built on top of them (e.g. the average price) must react to that simulation (see below).
o Let's create a new measure called Revenues: = SUM ( sales[sales amt] ), and a new measure called Quantity: = SUM ( sales[order quantity] ).
o If the quantity decreases (e.g. multiplied by 0.8), the revenues should not change, but the average price should (that's the second reason why we are changing the measure).
o Edit the Quantity formula by adding " * 0,8 " —> only the quantity changed, not the unit price: that's because we also need to change the code of the price (replacing sales[order quantity] with [Quantity]). Remember: for columns include the table name; for measures omit the table name.
o When we substitute Revenues in the weighted average price, we use the function DIVIDE ( [Revenues]; [Quantity] ) (with Italian regional settings, DAX in Excel typically uses ; as the argument separator and , as the decimal separator).

11/04/2025

Recap of what we have done in the Power Pivot environment (today we'll go into Power BI): now we use databases embedded in Excel.
The first difference from what we were doing before is that we didn't open the basic dataset: we connected to it (it has become our data source). Then there are different steps: extracting the data I need —> cleaning the data —> loading them into a database suitable for analysis (data modelling). We did the first part with Power Query in Excel (ETL is the general concept and Power Query is the practical solution, a storage database hosted in Excel). Once the data is what I want, I load it into an analysis database (from the practical point of view this is Power Pivot, but the technology is called a TABULAR, or COLUMNAR, DATABASE: it works on columns, while the storage database works on rows, which makes it more suitable for vertical operations such as grouping, summing and aggregating columns). In the data modelling phase we also created relationships (between tables, to avoid vertical lookups) and measures (indicators, KPIs), and we can create columns if needed. In Power BI we can also create tables, while in Power Pivot we cannot. The third phase is data visualisation (we use a pivot table), and the fourth phase is sharing (Excel offers nothing for this). These four phases are called BUSINESS INTELLIGENCE.

Going back to the file. We connect to the file:
o Power Query —> Data —> Get Data —> Launch Power Query Editor —> Data Source Settings (and connect to the basic dataset).

Here we are in the visualisation step. For data modelling we need to go into Power Pivot (activate it from File —> Options —> Add-ins —> COM Add-ins —> Go —> flag Power Pivot for Excel).
o Manage —> we can see the data we loaded after cleaning (tabular).

We can see the data here in two ways: Data View or Diagram View (the model view, where we see the relationships we have created). From here we created a pivot table (from within Power Pivot, so that it can connect to the data). We manage a Power Pivot table the same way we would manage a normal pivot table: rows, columns, values (here we put measures written in DAX).
There are other typical aggregations that we do in analysis (until now we have summed the values of a column with SUM, or divided two numbers with DIVIDE). Let's duplicate this sheet:
o Right-click on the tab at the bottom —> Move or Copy —> flag Create a copy —> OK.

Remove everything from values and leave only Revenues, split by yearmonth in rows.

The business problem is the following: I need to compare the average revenue per customer through time, across countries or whatever. My KPI is Average Revenue per Customer. The first problem is what we mean by "number of customers": to count the customers I should use the sales table, but there we have duplicates. To count the customers correctly, we take the customer numbers from the sales table and count each customer only once. In the pivot table: Measure —> New Measure, select the sales table and call it "Nr Active Customers":
    = DISTINCTCOUNT ( Sales[Cust No] )
Remember that customers buy in different periods: that's why this measure is not ADDITIVE (the total is not the sum of the customers per month). DISTINCT creates a list of customers, i.e. a table, but we need to count them, which is why we use DISTINCTCOUNT (this is the difference between table functions and scalar functions).

Now create the Average Revenue per Customer measure:
    = DIVIDE ( [Revenues]; [Nr Active Customers] )

Now remove the month and put Country in rows. You can compare the revenues of two countries, but it's not that useful because, for example, Italy and Germany have different populations: to compare them you have to normalise them, and we can use AVG Revenue per Customer (not perfect: the ideal would be to compare revenues against population).

The third operation we want to do is related to this example. Go to the tab at the bottom —> Move or Copy —> flag Create a copy —> OK. Put Salesarea in rows and leave only Revenues flagged.
Suppose that the indicator is now different and I need to calculate the AVG Revenue per Sales Line (a sales line is one transaction related to a single product; an order is an aggregation of sales lines). We have to use an entire table as input. Note that using COUNT for the number of rows is wrong: COUNT is based on a column and counts the cells that are not blank. This can be acceptable in some cases, but it's conceptually WRONG, because we don't care whether a cell is blank or not: we want to count rows. We use:
o "Nr Sales Rows": = COUNTROWS ( Sales )
o "AVG Revenue per Sales Line": = DIVIDE ( [Revenues]; [Nr Sales Rows] )

For the number of orders, I do a DISTINCTCOUNT of the sales orders. Duplicate the whole sheet again and put "YearMonth" in rows.

FOR THE FINAL TEST. Suppose that you have a shop in an outlet, Friday is a holiday and you want to stay closed on Saturday and Sunday as well: if you are closed you don't sell, so you want to estimate how much you will lose by staying closed those three days. Calculate the AVG Revenue per Day. You need to calculate the number of days on which you are open: think about a proxy. If I look at the days in the sales table, I can consider the invoice date and, since I can sell multiple times on the same day, I have to do a DISTINCTCOUNT. This is not perfect, but it's a good estimate (also because, if there were so many days without sales that the measure became irrelevant, I would simply close the shop). So, to solve the problem:
o "Nr Selling Days": = DISTINCTCOUNT ( Sales[Invoice Date] )
o "AVG Revenue per Day": = DIVIDE ( [Revenues]; [Nr Selling Days] )

NOW FROM EXCEL WE GO TO POWER BI (download it on your computer if needed). First of all you want to connect to the data: Get Data works as in Excel, it is exactly Power Query. Get Data —> select "Excel" —> connect to the advanced dataset. In the final exam the dataset will be either the basic or the advanced one.
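Going back to the Excel exercises above, the number of orders and the outlet estimate can be sketched with the same functions. The notes only say that the number of orders is a DISTINCTCOUNT of the sales orders, so the column name Sales[Sales Order] is an assumption (use whichever column holds the order number), and all three measure names are hypothetical. The last measure assumes that the three closed days would have been average selling days. Each definition is a separate measure written in the "Name = expression" form with commas; in the Power Pivot dialog you type the name in its own box and, with Italian regional settings, use ; instead of ,.

```dax
-- Hypothetical column: Sales[Sales Order] holds the order number of each sales line
Nr Orders = DISTINCTCOUNT ( Sales[Sales Order] )

-- Average revenue per order (an order groups several sales lines)
AVG Revenue per Order = DIVIDE ( [Revenues], [Nr Orders] )

-- Outlet exercise: revenue lost by staying closed Friday, Saturday and Sunday,
-- assuming each of those days would have been an average selling day
Estimated Lost Revenue = [AVG Revenue per Day] * 3
```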
We get the usual set of tables: Sales (the facts) and, as dimensions, Calendar, Customer, Product and SalesTerritory (in this dataset, the country is not a property of the customer). We have "Transform Data"; we don't have to worry about the load options (all the selections we made in Power Pivot), because here there is no Excel-table option, only the Power BI data model. Click "Transform Data" to open Power Query. It's exactly the same environment we have seen before; we could clean the data, but we don't have time, so we simply load it: "Close & Apply". Now we are loading the data into the tabular database, exactly as we did in Excel.

Once the data is loaded, to see it I have to go to "Table View" (the equivalent of Data View in Power Pivot), where you can select the table on the right; below it there is the "Model View" (the equivalent of Diagram View in Power Pivot, where we see that Power BI has already created some relationships for us). Power BI has established a one-to-many relationship from the SalesTerritory table, using its primary key, to Sales; the same for Product (ProductKey —> ProductKey) and Customer (CustomerKey —> CustomerKey). Only the Calendar is not connected. So we have a data model like the one in Excel Power Pivot: relationships, DAX, etc. The first selector is called Report View.

Create a measure:
o Go to "Sales", three dots, "New measure".
    Revenues =
    SUM (
        Sales[SalesAmount]
    )

Select Revenues and the visual type. In "Visualizations", under Build visual, select Matrix. If we want to see the revenues by color, put Color in rows (like in any pivot). We could also choose a bar chart instead of the matrix (but let's keep the matrix). To generate a new visual, click outside the current one and choose another icon (otherwise, if you don't click outside, you modify the current visual). Create a clustered bar chart with quantity by country: write the measure Quantity and slice it by country.
    Quantity =
    SUM (
        Sales[OrderQuantity]
    )

There is interactivity between visuals (if you select an element in one visual, the others change accordingly; if you click it again, everything goes back to normal). Basically, anything that is not a table, a bar chart, a line chart or a few other things, don't use it for business purposes. Another visual you can use is the card. New measure:
    Nr Active Customers =
    DISTINCTCOUNT ( Sales[CustomerKey] )

To show something through time, use a line chart: date on the x axis and revenues on the y axis. HOWEVER, there is no connection between the Sales and Calendar tables, so we have to create the relationship: go to Model View and connect Date in Calendar to OrderDate in Sales. Now the line chart works.

08/05/2025

Put Power BI in English: File —> Opzioni e impostazioni —> Opzioni —> Impostazioni internazionali (Options and settings —> Options —> Regional settings) —> select English at the top, close and reopen.

Click on the table with "Color | Revenues", press Ctrl+C to copy it, click the + at the bottom to create a new page and press Ctrl+V to paste the visual. Copy and paste also the "EnglishOccupation: Clerical, etc." visual. Sync the visuals (so what you do here is mirrored on the other page too). If you want to see the code of the measure Revenues, click on it and you will see it:
    Revenues =
    SUM ( Sales[SalesAmount] )
Same for Nr Active Customers (to see the result, just drag and drop it onto the visual):
    Nr Active Customers =
    DISTINCTCOUNT ( Sales[CustomerKey] )
We made a Quantity measure:
    Quantity =
    SUM ( Sales[OrderQuantity] )

Recap. To create the measure AVG Revenue per Customer (a combination of existing measures), go to the table you want to create it in (Sales), three dots, New measure (note: to start a new line press Shift+Enter):
    AVG Revenue per cust =
    DIVIDE ( [Revenues], [Nr Active Customers] )
In the formula editor, a guide line connects the opening and the closing parenthesis of the function. Now the visual shows just Revenues (Values) split by Color (Rows).
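Why DIVIDE rather than the / operator? A small sketch on the same two measures, where the names ending in "Slash" and "Zero" are hypothetical and only added for comparison with AVG Revenue per cust. With /, DAX returns Infinity when a non-zero numerator is divided by 0 or by a blank, while DIVIDE returns BLANK, which a matrix simply shows as an empty cell; the optional third argument of DIVIDE sets a different fallback value. With these two particular measures a zero denominator is unlikely, but the same habit protects every percentage built later.

```dax
-- Plain division: returns Infinity if [Nr Active Customers] is 0 or blank
-- while [Revenues] is not
AVG Revenue per cust Slash =
[Revenues] / [Nr Active Customers]

-- Optional third argument of DIVIDE: the value to return instead of BLANK
-- when the denominator is 0 or blank
AVG Revenue per cust Zero =
DIVIDE ( [Revenues], [Nr Active Customers], 0 )
```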
We want to use a measure called Nr Sales Rows: it counts the number of rows of the Sales table, and we use COUNTROWS. New measure:
    Nr Sales Rows =
    COUNTROWS ( Sales )
This measure needs a table as input. Now select it in the visual and remove Revenues.

Business problem. Top management wants to know the percentage of sales rows coming from a certain kind of customer, e.g. customers with EnglishOccupation = Clerical. We need a function that takes a table as input and provides a table as output. Up to now we only used scalar functions (functions whose output is a single value, e.g. SUM, DIVIDE, DISTINCTCOUNT, COUNTROWS). We now define, for the first time, a table function: a function whose output is a set of values, i.e. a table (and whose input is a table). Obviously, if I apply a condition (a filter), the output table has fewer rows than the input table; it can only have the same number of rows when the condition is true for every row, but it can never be bigger. This function is FILTER ( Table, Condition ).

We cannot see EnglishOccupation from Sales because it's not in the Sales table: it's a property of the Customer (see Model View). We can tell Power BI that we want to check a condition on a column that is not in the table we are in, but in a table connected one-to-many to it. We cannot do the opposite: we can only look up the table with the primary key, i.e. from the many side we search in the one side (RELATED follows the relationship from the foreign key to the primary key). New measure:
    Nr Sales Rows Clerical =
    COUNTROWS (
        FILTER (
            Sales,
            RELATED ( Customer[EnglishOccupation] ) = "Clerical"
        )
    )
Now add the new measure to the visual. New measure:
    Nr Sales Rows Clerical Pct =
    DIVIDE (
        [Nr Sales Rows Clerical],
        [Nr Sales Rows]
    )
On the visual: Nr Sales Rows split by color.
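A quick way to check FILTER and RELATED is to count the complement as well. The two measures below are hypothetical additions, not from the lesson: the first keeps the rows whose customer occupation is anything other than Clerical. Since a blank occupation also satisfies <> "Clerical", Nr Sales Rows Clerical plus Nr Sales Rows Not Clerical should give back Nr Sales Rows for every color, which is a handy sanity check on the relationship between Sales and Customer.

```dax
-- Rows whose related customer is NOT clerical (blank occupations included)
Nr Sales Rows Not Clerical =
COUNTROWS (
    FILTER (
        Sales,
        RELATED ( Customer[EnglishOccupation] ) <> "Clerical"
    )
)

-- Sanity check: should be 0 on every color that has sales
Check Clerical Split =
[Nr Sales Rows Clerical] + [Nr Sales Rows Not Clerical] - [Nr Sales Rows]
```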
This time the problem is the following: I still want a percentage, but not between a reduction of the rows and all the rows. This time we want to calculate, for the number of rows of each color, the percentage that this number represents out of the total (e.g. the percentage of the total Nr Sales Rows represented by Black, by Blue, etc.). We already have the numerator (Nr Sales Rows); the problem this time is the denominator: we need the total. We need to create a measure that might look incorrect: it takes a table as input and outputs a table that is bigger than the filtered one. If I have a table function that takes a filtered table and returns the table ignoring the filter, then I can do it. This is what we call a technical measure (a measure that the customer should not see, because it looks wrong). New measure:
    ALL Nr Sales Rows =
    COUNTROWS ( ALL ( Sales ) )
COUNTROWS is a scalar function; ALL ( Sales ) contains all the rows, about 60,000 (ALL returns the entire table without filters, including the rows that the current filter would exclude).
    Nr Sales Rows Pct =
    DIVIDE (
        [Nr Sales Rows],
        [ALL Nr Sales Rows]
    )
All three of these calculations are important for indicators: many indicators are percentages, so we need to be able to calculate both numerator and denominator. The measures still work if I split them by country instead of by color. With the previous measures, the total of Nr Sales Rows appears as the value in every row of ALL Nr Sales Rows. If I now want the total of Nr Sales Rows Clerical as the value in every row, I create a new measure:
    ALL Nr Sales Rows Clerical =
    COUNTROWS (
        FILTER (
            ALL ( Sales ),
            RELATED ( Customer[EnglishOccupation] ) = "Clerical"
        )
    )
First we expand the table, ignoring the filter; then we reduce that table to Clerical. There are two possible ways of thinking:
● start from Nr Sales Rows Clerical and add an ALL around Sales;
● start from COUNTROWS ( ALL ( Sales ) ), then restrict it by adding FILTER in between and the RELATED condition after.

The 4 calculations you need to know:
o Nr Sales Rows: in this first behaviour we are not protecting the Sales table anywhere, we are completely passive to any filter —> COUNTROWS
o Nr Sales Rows Clerical: in this second behaviour we are still passive to the filters (Sales is not protected), but we select a subset (passive with a selection) —> COUNTROWS + FILTER
o ALL Nr Sales Rows: now we completely protect the Sales table (we ignore every filter and always expose the total number of rows) —> COUNTROWS + ALL
o ALL Nr Sales Rows Clerical: we ignore the filters, but apply a selection on top of that —> COUNTROWS + FILTER + ALL

16/05/2025

We calculated Nr Sales Rows Clerical in order to know the impact of a particular kind of customer on Nr Sales Rows. We defined table functions for the first time, and we said that a measure must be the result of a scalar function: a measure needs to be a single value. Scalar functions take many inputs and return a single result; in the end a measure needs to be the result of a function like that, but this doesn't mean that you cannot use a table function within the code of a measure. SUM is essentially "syntax sugar": it works only for simple calculations on a single column, and internally SUM ( Sales[SalesAmount] ) is the same as SUMX ( Sales, Sales[SalesAmount] ). Duplicate the page and show Revenues and Nr Sales Rows by color. Notice that COUNTROWS needs a table as input, while SUM needs a column: this is why we cannot use FILTER inside SUM.
Actually, we should use SUMX ( Table, Expression ), which means: go row by row over a table, calculate the expression in every row, and aggregate the results with a sum. With this new formula we can also apply a filter. Let's rewrite Revenues:
    Revenues =
    SUMX (
        Sales,
        Sales[SalesAmount]
    )
    -- which is the same as writing SUM ( Sales[SalesAmount] )
We can make the new formula more meaningful by supposing that we don't have the SalesAmount column, but only quantity and unit price:
    Revenues =
    SUMX (
        Sales,
        Sales[UnitPrice] * Sales[OrderQuantity]
    )
To apply the filter for Clerical:
    Revenues Clerical =
    SUMX (
        FILTER (
            Sales,
            RELATED ( Customer[EnglishOccupation] ) = "Clerical"
        ),
        Sales[UnitPrice] * Sales[OrderQuantity]
    )
The whole FILTER ( … ) expression, the first argument of SUMX, is our table (with the filter applied).
    ALL Revenues =
    SUMX (
        ALL ( Sales ),
        Sales[UnitPrice] * Sales[OrderQuantity]
    )

    ALL Revenues Clerical =
    SUMX (
        FILTER (
            ALL ( Sales ),
            RELATED ( Customer[EnglishOccupation] ) = "Clerical"
        ),
        Sales[UnitPrice] * Sales[OrderQuantity]
    )
Remember that the KPI, the measure, is one thing, while the grouping (e.g. color, country, etc.) is something else and independent. Now show Nr Sales Rows by Country and type a new measure that produces the same result without using COUNTROWS:
    Nr Sales Rows =
    SUMX (
        Sales,
        1
    )
A function whose name ends in X (e.g. MINX, AVERAGEX, MAXX, SUMX) goes row by row, calculates something in each row and returns what the name says (MIN, AVERAGE, MAX, SUM): these functions are called ITERATORS, with the syntax ( Table, Expression ). There is another function that goes row by row over a table, FILTER: FILTER is also an iterator but, unlike the others, it's not a scalar function, it's a table function (in fact it does not aggregate anything itself, but you can do a SUMX etc. over a FILTER).
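The same iterator pattern works with the other X functions. As a sketch (both measure names are hypothetical), AVERAGEX computes the revenue of each sales line and then averages those values, which is another way to get the AVG Revenue per Sales Line seen in Excel; wrapping the table in FILTER restricts it to clerical customers exactly as in Revenues Clerical. The first measure should agree with DIVIDE ( [Revenues], [Nr Sales Rows] ) as long as Revenues is the SUMX ( Sales, Sales[UnitPrice] * Sales[OrderQuantity] ) version and no line has a blank price or quantity, because AVERAGEX skips blank results while COUNTROWS counts every row.

```dax
-- Iterator: for each row of Sales compute price * quantity, then average the results
AVG Revenue per Sales Line X =
AVERAGEX (
    Sales,
    Sales[UnitPrice] * Sales[OrderQuantity]
)

-- Same idea restricted to clerical customers: FILTER builds the table, AVERAGEX iterates it
AVG Revenue per Sales Line Clerical =
AVERAGEX (
    FILTER (
        Sales,
        RELATED ( Customer[EnglishOccupation] ) = "Clerical"
    ),
    Sales[UnitPrice] * Sales[OrderQuantity]
)
```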
When we need to consider more than one condition, we can use AND and OR: two different operators that work on propositions, i.e. statements that can be either true or false (there are also statements that are undecidable). DIVIDE is a function, / is an operator. For AND use &&, for OR use ||. It's better to use the operators in these two cases, because they make it easier to combine more than two propositions (the AND and OR functions accept exactly two arguments, so with more conditions they have to be nested).
    MAX Revenues Clerical =
    MAXX (
        FILTER (
            Sales,
            RELATED ( Customer[EnglishOccupation] ) = "Clerical"
        ),
        Sales[UnitPrice] * Sales[OrderQuantity]
    )
Note: if you think the formula is not working because you get the same result everywhere, try splitting it by color, country, etc.
    Revenues Clerical OR Professional =
    SUMX (
        FILTER (
            Sales,
            OR (
                RELATED ( Customer[EnglishOccupation] ) = "Clerical",
                RELATED ( Customer[EnglishOccupation] ) = "Professional"
            )
        ),
        Sales[UnitPrice] * Sales[OrderQuantity]
    )
We have to think in "AND logic" and "OR logic": here, if I used AND, it would mean that a customer has both occupations, while I just need one of the two (the result with OR will be bigger). The suggestion is to "kill the OR" function and use the operator:
    Revenues Clerical OR Professional =
    SUMX (
        FILTER (
            Sales,
            RELATED ( Customer[EnglishOccupation] ) = "Clerical"
                || RELATED ( Customer[EnglishOccupation] ) = "Professional"
        ),
        Sales[UnitPrice] * Sales[OrderQuantity]
    )

The same with AND, first with the function and then with the operator:
    Revenues Clerical 2003 =
    SUMX (
        FILTER (
            Sales,
            AND (
                RELATED ( Customer[EnglishOccupation] ) = "Clerical",
                RELATED ( Calendar[CalendarYear] ) = 2003
            )
        ),
        Sales[UnitPrice] * Sales[OrderQuantity]
    )

    Revenues Clerical 2003 =
    SUMX (
        FILTER (
            Sales,
            RELATED ( Customer[EnglishOccupation] ) = "Clerical"
                && RELATED ( Calendar[CalendarYear] ) = 2003
        ),
        Sales[UnitPrice] * Sales[OrderQuantity]
    )

And with OR between occupation and year, again function first and operator second:
    Revenues Clerical OR 2003 =
    SUMX (
        FILTER (
            Sales,
            OR (
                RELATED ( Customer[EnglishOccupation] ) = "Clerical",
                RELATED ( Calendar[CalendarYear] ) = 2003
            )
        ),
        Sales[UnitPrice] * Sales[OrderQuantity]
    )

    Revenues Clerical OR 2003 =
    SUMX (
        FILTER (
            Sales,
            RELATED ( Customer[EnglishOccupation] ) = "Clerical"
                || RELATED ( Calendar[CalendarYear] ) = 2003
        ),
        Sales[UnitPrice] * Sales[OrderQuantity]
    )

If I write "Revenues 2003 OR Condition1 AND Condition2", the AND part is evaluated first and then the OR ("Revenues 2003 OR (Condition1 AND Condition2)"). If you want to do the OR first and then combine it with the rest via AND, you are forced to use parentheses: "Revenues (2003 OR Condition1) AND Condition2".

21/05/2025

What happens when you mix AND and OR in the same calculation: the precedence of the operators matters.
If we want to specify country, occupation and year:
    Revenues Germany AND Clerical OR 2003 =
    SUMX (
        FILTER (
            Sales,
            RELATED ( SalesTerritory[Country] ) = "Germany"
                && RELATED ( Customer[EnglishOccupation] ) = "Clerical"
                || RELATED ( Calendar[CalendarYear] ) = 2003
        ),
        Sales[UnitPrice] * Sales[OrderQuantity]
    )
Revenues Germany AND Clerical OR 2003 is the same as Revenues (Germany AND Clerical) OR 2003. Remember: if you use round parentheses, you impose priority:
    Revenues Germany AND (Clerical OR 2003) =
    SUMX (
        FILTER (
            Sales,
            RELATED ( SalesTerritory[Country] ) = "Germany"
                && (
                    RELATED ( Customer[EnglishOccupation] ) = "Clerical"
                        || RELATED ( Calendar[CalendarYear] ) = 2003
                )
        ),
        Sales[UnitPrice] * Sales[OrderQuantity]
    )

Writing DAX code in a measure is different from writing it in a column: depending on the object you are creating, you have to understand a few things. Let's split our matrix by ProductKey and focus on Revenues. If I copy the code of Revenues, go to the Product table ("Table View" —> "Product" on the right) and paste it into a "New column" called "Product Revenues", I see that the result is not the same (every row shows the same value: the total revenues of the whole Sales table). If I now select the Sales table and create a new column there to classify the price as low, medium or high:
    Price category =
    IF (
        Sales[UnitPrice] < 100, "Low",
        IF (
            Sales[UnitPrice] < 300, "Medium",
            "High"
        )
    )
If I try to write this code in a measure, it does not work: if I go to "Report View", create a new measure and paste this code, I get an error. This shows that the two objects, measures and columns, follow different rules in DAX: the first problem is that the same DAX code of a measure, used in a column, produces a different result; the second problem is that the code of a column does not work in a measure.
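The comparison Sales[UnitPrice] < 100 does work inside a measure once it sits inside an iterator, because the iterator provides the row context that the measure lacks. A sketch with a hypothetical measure name, reusing the Price category threshold: FILTER evaluates the condition one row of Sales at a time, and SUMX adds up the revenue of the surviving rows. With the new column, the condition could equally be written as Sales[Price category] = "Low", since inside FILTER a column is compared row by row.

```dax
-- FILTER creates a row context on Sales, so UnitPrice is a single value in each row
Revenues Low Price =
SUMX (
    FILTER (
        Sales,
        Sales[UnitPrice] < 100
    ),
    Sales[UnitPrice] * Sales[OrderQuantity]
)
```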
The reason is this: in the Sales table, my code compares a whole column (UnitPrice) with a single number (100, 300). This works because the code is evaluated row by row. This is a ROW CONTEXT: the DAX code is evaluated one row at a time. So when you write the code of a column, you can compare a column with a value, because the column is calculated row by row (in practice you are comparing one value with one value). Every time you create a column in a table, its DAX code is evaluated row by row over that table.

A measure, however, is not evaluated row by row, for two reasons. First, we are not in a table but in a report. Second, a measure has no values stored in a table (if it did, it would be a column), so a measure has no row context. What a measure has is a FILTER CONTEXT: the report filters the tables, and the measure is calculated on what remains. In measures you consider filters, but you don't go row by row. In columns there is no filter, because there is no report there (filtering only happens on the reporting side). Only a column can generate a grouping, and therefore a filter, because a column has a value in every row of a table.

If we want the Revenues code to work in a column of the Product table, we have to apply the filter ourselves:

Product Revenues =
SUMX (
    FILTER (
        Sales,
        Sales[ProductKey] = 'Product'[ProductKey]
    ),
    Sales[UnitPrice] * Sales[OrderQuantity]
)

Iterators (such as SUMX) create a row context on a table; without them a measure would only have a filter context. So you can create a row context inside a measure. In a column, the row context is already there but there is no filter context (you can add one to a column, but that is more advanced). Remember that a column is a value for every row of a table.

What is the difference between measures and columns?
The main characteristic of a column is that it is a set of values, one for each row of a table. A column costs memory: the more columns you create, the more memory you need to store them. A measure, on the other hand, takes up no memory; it is just a piece of code and has no stored values. The cost of a measure is CPU (computing power used when the report is calculated), so a badly written measure on a big table will make the computer heat up. A column always uses resources, whatever you plan to do with it, while a measure you end up not using wastes nothing. A measure is also called a KPI (key performance indicator). Remember that you can filter only by a column, not by a measure, because a measure has no values. Columns are needed to create the contexts in which measures are calculated (e.g. months, unit price, etc.).

23/05/2025 The problem with some of the code so far is that it is too complex, too long, repetitive and hard to maintain. This is why we use CALCULATE to add and remove filters. CALCULATE is a function that injects filters before calculating something. I don't need to rewrite SUMX every time: I can call an existing measure by its name and only specify the filter to inject. At the exam you have to use the previous methods; also using this method earns additional marks.

Revenues Clerical CALCULATE =
CALCULATE (
    [Revenues],
    Customer[EnglishOccupation] = "Clerical"
)

Revenues Clerical =
SUMX (
    FILTER (
        Sales,
        RELATED(Customer[EnglishOccupation]) = "Clerical"
    ),
    Sales[UnitPrice] * Sales[OrderQuantity]
)

These two writings are equivalent (the second one is mandatory in the exam). The new method is shorter, easier to read, not repetitive and much easier to maintain: for example, if you need to change something in Revenues, you only change the original measure.
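CALCULATE can be pictured as three steps: copy the current filter context, add the injected filters, then evaluate the measure in that new context.

The Python model below rests on a few assumptions. A filter context is modelled as a dict mapping column to allowed value. The customer occupation is already attached to each sale row, standing in for RELATED. The names `revenues`, `calculate` and the data are hypothetical, not a Power Pivot API. The model compares the long SUMX/FILTER form with the CALCULATE form in one report cell.

```python
# Hypothetical Sales rows; "Occupation" stands in for RELATED(Customer[EnglishOccupation]).
sales = [
    {"Occupation": "Clerical", "Colour": "Red", "UnitPrice": 10.0, "OrderQuantity": 3},
    {"Occupation": "Professional", "Colour": "Red", "UnitPrice": 20.0, "OrderQuantity": 1},
    {"Occupation": "Clerical", "Colour": "Blue", "UnitPrice": 5.0, "OrderQuantity": 2},
]


def visible_rows(filter_context):
    """Rows of Sales that pass every filter in the context (filters are ANDed)."""
    return [r for r in sales
            if all(r[col] == value for col, value in filter_context.items())]


def revenues(filter_context):
    """Base measure [Revenues]: UnitPrice * OrderQuantity summed over visible rows."""
    return sum(r["UnitPrice"] * r["OrderQuantity"] for r in visible_rows(filter_context))


def revenues_clerical_sumx(filter_context):
    """Long form: SUMX over FILTER(Sales, occupation = "Clerical")."""
    rows = [r for r in visible_rows(filter_context) if r["Occupation"] == "Clerical"]
    return sum(r["UnitPrice"] * r["OrderQuantity"] for r in rows)


def calculate(measure, filter_context, **injected):
    """Toy CALCULATE: copy the context, inject the filters (an injected filter
    replaces an existing one on the same column), then evaluate the measure."""
    return measure({**filter_context, **injected})


cell = {"Colour": "Red"}  # one cell of a report sliced by colour
print(revenues_clerical_sumx(cell))                      # 30.0
print(calculate(revenues, cell, Occupation="Clerical"))  # 30.0
```

The maintenance argument shows up in the design: `calculate` reuses `revenues` instead of repeating its formula. Changing how revenues are computed therefore updates every measure built on it.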
CALCULATE first injects the filters and then calculates the measure; we can inject one or more filters and also remove them.

ALL Revenues =
SUMX (
    ALL ( Sales ),
    Sales[UnitPrice] * Sales[OrderQuantity]
)

This becomes (in this case we remove filters):

ALL Revenues CALCULATE =
CALCULATE (
    [Revenues],
    REMOVEFILTERS ( 'Product'[Colour] )
)

The two give the same numbers only while the report is filtered by 'Product'[Colour] alone. REMOVEFILTERS ( 'Product'[Colour] ) removes just that filter, whereas ALL ( Sales ) ignores every filter. We write first what we want to calculate, even though it is actually the last thing CALCULATE runs, and the filters come after it. We can also remove and add filters at the same time. Example: calculate the total revenues for clerical customers but show it everywhere.

ALL Revenues Clerical =
SUMX (
    FILTER (
        ALL ( Sales ),
        RELATED(Customer[EnglishOccupation]) = "Clerical"
    ),
    Sales[UnitPrice] * Sales[OrderQuantity]
)

This can become:

ALL Revenues Clerical CALCULATE =
CALCULATE (
    [Revenues],
    REMOVEFILTERS ( 'Product'[Colour] ),
    Customer[EnglishOccupation] = "Clerical"
)

Here you can swap the order of the two injected arguments (the removal and the clerical filter). One might think the order matters if I remove all the filters (REMOVEFILTERS ( )) and then inject the filter on occupation. It still doesn't, because CALCULATE always removes filters first and then injects the new ones. Always remember that adding filters to a measure that ignores all filters (ALL) won't work. That's why you have to call the original measure (e.g. [Revenues]).

Revenues Clerical 2003 =
SUMX (
    FILTER (
        Sales,
        RELATED(Customer[EnglishOccupation]) = "Clerical"
            && RELATED(Calendar[CalendarYear]) = 2003
    ),
    Sales[UnitPrice] * Sales[OrderQuantity]
)

It becomes:

Revenues Clerical 2003 CALCULATE =
CALCULATE (
    [Revenues],
    Customer[EnglishOccupation] = "Clerical",
    Calendar[CalendarYear] = 2003
)

CALCULATE automatically combines its filters with AND, so there is no need to specify anything.
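The ordering rule above (removals first, then injected filters) and the trap of filtering a measure that already ignores all filters can be checked in a small Python model.

This is a sketch under assumptions. The tuple-based `calculate` is hypothetical and only mirrors what the notes describe. `("remove", column)` plays the role of REMOVEFILTERS on one column, `("remove_all",)` plays REMOVEFILTERS ( ), and `("filter", column, value)` is an injected filter. The data is invented.

```python
# Hypothetical Sales rows; Occupation/Colour/Year stand in for related columns.
sales = [
    {"Occupation": "Clerical", "Colour": "Red", "Year": 2003, "Amount": 30.0},
    {"Occupation": "Clerical", "Colour": "Blue", "Year": 2004, "Amount": 10.0},
    {"Occupation": "Professional", "Colour": "Red", "Year": 2003, "Amount": 20.0},
]


def revenues(ctx):
    """Base measure: sum Amount over the rows that pass every filter in ctx."""
    return sum(r["Amount"] for r in sales
               if all(r[col] == value for col, value in ctx.items()))


def all_revenues(ctx):
    """Like SUMX(ALL(Sales), ...): ignores whatever filter context it receives."""
    return sum(r["Amount"] for r in sales)


def calculate(measure, ctx, *arguments):
    """Toy CALCULATE. All removals run before any filter is injected,
    so the order of the arguments does not change the result."""
    new_ctx = dict(ctx)
    for arg in arguments:
        if arg[0] == "remove":
            new_ctx.pop(arg[1], None)
        elif arg[0] == "remove_all":
            new_ctx.clear()
    for arg in arguments:
        if arg[0] == "filter":
            new_ctx[arg[1]] = arg[2]
    return measure(new_ctx)


cell = {"Colour": "Blue", "Year": 2004}  # one cell of a report

# Drop the colour filter and add the occupation filter, in either order.
a = calculate(revenues, cell, ("remove", "Colour"), ("filter", "Occupation", "Clerical"))
b = calculate(revenues, cell, ("filter", "Occupation", "Clerical"), ("remove", "Colour"))
print(a, b)  # 10.0 10.0  (the Year = 2004 filter is still active)

# REMOVEFILTERS ( ) plus the occupation filter, in either order.
c = calculate(revenues, cell, ("remove_all",), ("filter", "Occupation", "Clerical"))
d = calculate(revenues, cell, ("filter", "Occupation", "Clerical"), ("remove_all",))
print(c, d)  # 40.0 40.0

# Injecting a filter into a measure that ignores all filters has no effect.
print(calculate(all_revenues, cell, ("filter", "Occupation", "Clerical")))  # 60.0
```

If `calculate` processed the arguments strictly left to right, `d` would come out as 60.0: the clerical filter would be wiped out by the later removal. Applying removals first is what makes the order irrelevant.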
An alternative approach reuses the clerical measure:

Revenues Clerical 2003 CALCULATE =
CALCULATE (
    [Revenues Clerical CALCULATE],
    Calendar[CalendarYear] = 2003
)

At the exam, measures will use either AND or OR, not both together. OR has a further restriction: it must be on columns of the same table. The exam will say explicitly whether or not to write the measure with CALCULATE.

Revenues Clerical OR 2003 =
SUMX (
    FILTER (
        Sales,
        RELATED(Customer[EnglishOccupation]) = "Clerical"
            || RELATED(Calendar[CalendarYear]) = 2003
    ),
    Sales[UnitPrice] * Sales[OrderQuantity]
)

If I try to write “Revenues Clerical OR 2003” with CALCULATE, I get an error, because the two columns are not in the same table. With two columns of the same table it works:

Revenues Clerical OR Bachelor CALCULATE =
CALCULATE (
    [Revenues],
    Customer[EnglishOccupation] = "Clerical" || Customer[EnglishEducation] = "Bachelor"
)

Here the OR condition is written inside a single filter argument. Written as separate arguments, the two conditions would automatically be combined with AND.

The FINAL TEST won't involve any paper. You can choose whether to use Power Pivot or Power BI to solve the problems, and you will send back a Word file with the answers. Everyone will receive their own dataset by email (the usual dataset but with different numbers); download the Word file and send it back. The test will last around 90 minutes; arrive 10 minutes early. The measures are the same as those done in class, just with different columns and numbers, so know how to replicate them. Expect 5 or 6 measures to create, maybe 1 column, and two open questions (if you write a lot without saying anything, points will be deducted). Examples: What is a primary key? What happens in a vertical lookup in Excel if you don't have a primary key? What is a measure? What is a column? Know how to explain the concepts. No cleaning and no Power Query: just load the tables and solve the problems. You can take a snapshot and copy-paste into Word (both for the DAX code and the Word file). No classic Excel exercises, only theory from that part.
No graphics. Probably something basic about Power Pivot.
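The CALCULATE rule for OR can be sketched in Python: the OR goes inside a single filter argument and may only use columns of one table. Separate filter arguments are combined with AND.

This is a toy model, not how the engine works. The "Table.Column" prefixes and the `filter_arg` check are hypothetical devices that imitate the "same table" restriction. The rows are invented.

```python
# Hypothetical Sales rows with related attributes attached; each key is
# prefixed with the table the column comes from.
sales = [
    {"Customer.Occupation": "Clerical", "Customer.Education": "High School",
     "Calendar.Year": 2002, "Amount": 30.0},
    {"Customer.Occupation": "Manual", "Customer.Education": "Bachelor",
     "Calendar.Year": 2003, "Amount": 20.0},
    {"Customer.Occupation": "Clerical", "Customer.Education": "Bachelor",
     "Calendar.Year": 2003, "Amount": 10.0},
]


def filter_arg(columns, predicate):
    """One CALCULATE filter argument; it may only read columns of one table."""
    tables = {c.split(".")[0] for c in columns}
    if len(tables) > 1:
        raise ValueError(f"filter reads columns from several tables: {sorted(tables)}")
    return predicate


def calculate_revenues(*filter_args):
    """Separate filter arguments are combined with AND."""
    return sum(r["Amount"] for r in sales if all(f(r) for f in filter_args))


# OR inside ONE argument, same table: Clerical OR Bachelor.
clerical_or_bachelor = calculate_revenues(
    filter_arg(["Customer.Occupation", "Customer.Education"],
               lambda r: r["Customer.Occupation"] == "Clerical"
               or r["Customer.Education"] == "Bachelor"))

# The same two conditions as TWO arguments become Clerical AND Bachelor.
clerical_and_bachelor = calculate_revenues(
    filter_arg(["Customer.Occupation"], lambda r: r["Customer.Occupation"] == "Clerical"),
    filter_arg(["Customer.Education"], lambda r: r["Customer.Education"] == "Bachelor"))

print(clerical_or_bachelor)   # 60.0
print(clerical_and_bachelor)  # 10.0

# Clerical OR 2003 mixes Customer and Calendar columns, so it is rejected.
try:
    filter_arg(["Customer.Occupation", "Calendar.Year"], lambda r: True)
except ValueError as err:
    print(err)
```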