Uploaded by krrrtpaa

BI Course: Data Analysis, Power BI, Excel

advertisement
No. Topics for Week 1 & 2:
1 Data Analysis Terminology
2 Worksheet Formulas
3 PivotTables
4 Dynamic Spilled Arrays
5 Power Query
6 Power Pivot
7 Power BI Desktop
8 Power BI Online Service
9 Dataflow (Online Power Query)
Topics, 01-DAMEMPT-Start.xlsx
Page 1 of 28
BI 348 Highline Class
Field Names
name the
Variables
Column = Field
Data Types
Foreign Key
Primary Key
Text
Number
Many-To-One Relationship
Field Names in first row
Date
Units
Product
6/19/2025
288 175AP
Many *
1 One ProductID
100CR
Product
Apple
Crate Price
Cost
35
15.4
Records in rows
10/17/2024
132 100CR
175AP
Orange
29.75
17.255
Records in rows
Records in rows
11/13/2024
3/31/2025
10/15/2024
132 100CR
132 100CR
180 100CR
255YN
QA430
BL579
Kiwi
Banana
Cherry
22.5
17.5
43.7
12.6
11.725
25.346
10/25/2025
1/15/2024
8/30/2025
11/24/2024
3/22/2024
7/9/2025
5/26/2025
8/7/2025
10/23/2025
10/12/2024
9/3/2024
7/1/2025
9/24/2025
9/9/2024
1/3/2025
5/10/2024
9/21/2025
4/29/2024
156 100CR
168 BL579
204 255YN
180 100CR
276 QA430
108 BL579
96 QA430
192 100CR
156 BL579
156 BL579
84 255YN
228 175AP
240 175AP
72 BL579
216 100CR
84 175AP
96 175AP
216 100CR
Bits of Data:
1/15/2024
168
BL579
Data & Tables, 01-DAMEMPT-Start.xlsx
Table
Fact Table
Dimension (Lookup) Table
Page 2 of 28
BI 348 Highline Class
Invoice Total Sales =
Less Detail,
Bigger Grain
Invoice Sales Shipping Invoice
Date
Number RepID Costs($) Discount($)
1/1/2017 125447
9
98.7
144.18
1/2/2017 125448
28
26.25
73.06
1/3/2017 125450
4
207.55
437.62
1/4/2017 125451
15
262.15
542.26
1/4/2017 125452
23
159.25
381.63
Grain, 01-DAMEMPT-Start.xlsx
Year Grain is Bigger than Month Grain
Year Grain has Less Detail than Month Grain
Line Items Sales =
More Detail,
Smaller Grain
Year
Invoice Product
Unit
Number ID
Quantity Price($)
125447 LS-900
21
22.36
125447 TC-500
88
14.97
125447 OK-800
35
12.32
125448 DQ-100
53
27.57
125450 SC-1100
25
48.75
125450 TC-500
34
16.22
125450 TS-300
200
13.03
125451 AC-1000
223
11.68
125451 LS-900
224
12.58
125452 TM-600
38
18.82
125452 IY-700
238
13.03
Page 3 of 28
Sales($)
2023
78,139
2024
58,066
Year
Month
2023 Jan
Sales($)
7,684
2023 Feb
2023 Mar
2023 Apr
2023 May
2023 Jun
2023 Jul
2023 Aug
2023 Sep
2023 Oct
2023 Nov
2023 Dec
2024 Jan
2024 Feb
2024 Mar
2024 Apr
2024 May
2024 Jun
2024 Jul
2024 Aug
2024 Sep
2024 Oct
2024 Nov
2024 Dec
8,255
3,286
8,572
2,829
7,296
5,731
5,610
8,218
6,507
12,862
1,289
12,683
4,900
1,999
1,720
1,343
2,840
7,475
10,813
1,249
580
9,355
3,109
BI 348 Highline Class
SQL Server Database
Relational Database
Much of the data in
the world is stored in
SQL databases
SQL Server Database, 01-DAMEMPT-Start.xlsx
Page 4 of 28
BI 348 Highline Class
Columnar database in the Power Pivot and Power BI Data Model and Sematic Model compresses data into a smaller
more efficient structure than just storing rows of data.
Date
ProductID
3/19/2021
4/8/2021
4/12/2021
4/12/2021
4/22/2021
5/12/2021
5/24/2021
6/11/2021
6/11/2021
6/17/2021
7/5/2021
7/18/2021
7/20/2021
7/24/2021
8/6/2021
8/18/2021
9/10/2021
10/2/2021
10/2/2021
10/7/2021
10/7/2021
Columnar Database, 01-DAMEMPT-Start.xlsx
SalesRepID
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
Units
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
80
5
88
70
6
5
1
92
91
209
91
66
110
4
94
2
57
4
1
86
90
Records in original table:
# Columns:
Total cells with data:
606
4
2424
Total cells in columnar database:
Count
Count
621
Count
Count
426
4
4
Date
3/19/2021
4/8/2021
4/12/2021
4/22/2021
5/12/2021
5/24/2021
6/11/2021
6/17/2021
7/5/2021
7/18/2021
7/20/2021
ProductID
2
3
4
1
SalesRepID
4
2
3
1
Page 5 of 28
187
Units
80
5
88
70
6
1
92
91
209
66
110
BI 348 Highline Class
Source data
Data destination
On-premises file path
Online source data
Delimiter
Structure Or Schema
Data Analysis
Source data in files.
The original location of the data, like in a text file, an Excel file of a database.
The location where the data is loaded, such as in an Excel or a Power BI Desktop file or an online
source like a Power BI workspace.
A hard coded source data file path in the data destination, such as an Excel or Power BI Desktop file.
On-premises file paths can cause errors when the data destination file is moved and the connection to the
source data is lost.
Online source data can solve the problem of On-premises file and folder paths.
Web sites, SQL Server databases and Power BI Online are examples of online sources that stay connected to
the source data when the data destination file is moved.
Is a character that separated bits of data, such as a comma, tab and other characters.
The rules or structure for tables, data files and databases.
Converts data into useful information to gain insight and make decisions.
Text Files:
Csv (Comma Separated Values)
Table Structure/Schema: field names & data are separated by commas (delimiter).
Systems understand that the data is stored with a table structure.
Txt (Tab Separated Values)
Table Structure/Schema: field names & data are separated by Tabs (delimiter).
Systems understand that the data is stored with a table structure.
DataSources, 01-DAMEMPT-Start.xlsx
Page 6 of 28
BI 348 Highline Class
Xml (eXtensible Markup Language)
Table Structure/Schema:
A "Sales" table container contains fields.
Transaction record containers contains fields.
Various field containers contain data.
Systems understand that the data is stored with a table structure.
Json (JavaScript Object Notation)
Record Structure/Schema:
All records are housed in square brackets [ ].
Individual records are housed in curly brackets { }.
Delimiter is colon :.
There is no table schema. Json files store records, not tables.
Systems understand that the data is stored with a record structure.
After you import records, you must convert records into a table.
Excel files
Worksheets
Record Structure/Schema:
Data is stored in worksheet and is understood
by systems to be records of data.
After you import records, you must convert records into a table.
Excel Tables
Table Structure/Schema:
Data is stored in an Excel Table object and is understood by systems to be a table of data.
Data is stored in an Excel Table object and is understood
by systems to be a table of data.
DataSources, 01-DAMEMPT-Start.xlsx
Page 7 of 28
BI 348 Highline Class
Source data from on-premises files.
Data destination is this Power BI Desktop file.
On-Premises file path:
Path is broken if destination file is moved.
(Easy to fix, but annoying)
Online Source like SQL Database:
Data destination is this Power BI Desktop file.
Online SQL Server database
Online data source
Path is NOT broken if destination file is moved.
DataSources, 01-DAMEMPT-Start.xlsx
Page 8 of 28
BI 348 Highline Class
When you hard coded a file path into a query, it means that if you move that source
data file to a different location, or you e-mail the file that contains the query to a
colleague, the connection to the source data is broken and you will receive the
following error: Data Source Not Found. When you hard code a file or folder path into
a query, the path is called an on-premises path. As you can imagine, on-premises
paths cause a lot of trouble if you are sharing files or moving files around. An on-line
data source, like an SQL server database, Dataflow or Power BI do not have this
problem because the data is stored online in a location that does not move. Multiple
credentialed people can have access to “a single source of truth” where there are no
on-premises paths and no conflicts with multiple versions of the same file.
Nevertheless, not all data is stored online, and on-premises paths are common. The
good news is that if you know where the source data file is, it is easy to redirect the
query to the new location. There are at least three ways to change the on-premises
path:
•In the Source step for almost any query, you can edit the on-premises path in
the formula bar.
•You can click the gear icon in the Source query step to open the source data
dialog box. Many data sources such as Csv, Excel, Sql Databases, Web sites
and more allow you to use the gear icon in the source query step to edit the
connection details.
•If you have used the same file or folder in multiple queries, it is most efficient
to edit the path universally in the Data Source Settings dialog box. There are
multiple ways to open this dialog box in Excel and Power BI. If you are in the
Power Query Editor:
o In the Excel Power Query Editor, in the Home tab, Data Sources group,
click the Data source settings button.
o In the Power BI Desktop Power Query Editor, in the Home tab, Data
source group, click the Transform data dropdown and then click Data
source settings.
o In the Dataflow Power Query Editor, in the Home tab, Data sources
group, click the Manage connections button.
Data Source Settings - 01-DAMEMPT-Start.xlsx
Page 9 of 28
BI 348 Highline Class
Data Analysis (Data Analytics, Analytics, Business Intelligence, Data Science):
Define:
Converts draw data into useful, actionable information to gain insight and make decisions.
Information can be in the form of: reports, visuals, dashboards, and other forms.
Data analysis allows you to make data-driven decisions, which tend to be more accurate & help to achieve goals more consistently.
Business Intelligence:
Same definition, but the process is performed within the context of business data and business decision making.
High Level Definition:
Data
Insightful,
Actionable
Information
Data analysis process:
1. Determine what questions need answers & what decisions need to be made. Everything else in the process is dictated by these questions and decisions.
2. Where is the data? How much data? What is the structure of the data?
3. Which MS tool to use? (Almost always starts with Power Query).
4. Clean, transform and shape the data into a table or model that is best suited to answer questions and make decisions,
5. Build final model with measures, metrics, relationships, and other features.
6. Create useful information: reports, visuals and dashboards.
7. Refresh when new data arrives.
8. Change and update model as necessary.
Define DA and BI, 01-DAMEMPT-Start.xlsx
Page 10 of 28
BI 348 Highline Class
Examples:
Business context:
Data
Useful information: metrics to see how customers sales perform over months, sales reps and products.
Sports contexts:
Data
Decide who becomes NBA 2024 MVP.
Rank PPG
Define DA and BI, 01-DAMEMPT-Start.xlsx
PLAYER
TEAM
1 Luka Doncic
DAL
2 Giannis Antetokounmpo
MIL
3 Shai Gilgeous-Alexander
OKC
4 Kevin Durant
PHX
5 Jalen Brunson
NYK
6 Jayson Tatum
BOS
7 Devin Booker
PHX
8 Stephen Curry
GSW
9 De'Aaron Fox
SAC
50 Collin Sexton
UTA
GP
MIN
63
67
70
66
67
67
59
66
64
73
Page 11 of 28
Ave PPG FGM FGA
37.4
33.9
11.5
35.1
30.6
11.5
34.4
30.4
10.8
37.2
27.6
10.1
35
27.4
9.8
35.8
27.3
9.2
35.8
27.2
9.6
32.7
26.5
8.8
35.6
26.5
9.7
26.3
18.6
6.5
FG%
23.7
18.7
20
19.2
20.5
19.5
19.5
19.6
20.8
13.2
48.7
61.4
54
52.9
47.7
47.3
49.2
44.7
46.6
49.4
BI 348 Highline Class
Educational content
Data
Decision: Which students get scholarships? History or Sociology majors are eligible.
Oracle Database
Student
Start Date
Major
Coats, Saharra
9/29/2020 Sociology
Emmons, Christi
7/14/2018 Accounting
Lear, Vania
9/3/2020 Chemistry
Washington, T
11/21/2019 History
Mohamed, Abdi
1/28/2021 Business
Nga, Luong
7/7/2020 Physics
Mims, Chantel
4/12/2020 History
Rouse, Sioux
6/30/2020 Chemistry
Simone, Alanna
8/2/2019 Physics
Thornburg, Tyrone
12/27/2019 Sociology
Credits
GPA
45
135
45
90
23
45
70
40
60
75
1.7
2.3
3
3.1
1.6
2.4
4
2.4
3.5
3.9
Eligible?
TRUE
FALSE
FALSE
TRUE
FALSE
FALSE
TRUE
FALSE
FALSE
TRUE
Major:
History
Sociology
Personal Budgeting context:
Data keep in Excel worksheet:
Insight:
Date
Amount
Expenses
1/3/24 Weekend Trip
1/3/24 Medical
1/8/24 Utilities
1/10/24 Ikea
1/16/24 Car Insurance
1/19/24 Movies
1/23/24 Groceries
1/26/24 Garden
2/1/24 Gas
2/9/24 Utilities
2/19/24 Gas
2/27/24 Gas
2/27/24 Car Insurance
3/2/24 Groceries
3/3/24 Gas
Define DA and BI, 01-DAMEMPT-Start.xlsx
What are the top 5 expenses so far this year?
Top
182.1
409.54
136.37
259.63
102.5
43.69
35.2
12.75
40
92.31
48.2
55.8
102.5
102.52
50
5
Date
Expenses
Amount
1/3/2024 Medical
409.54
3/28/2024 B-day for for Mom
378.02
1/10/2024 Ikea
259.63
1/3/2024 Weekend Trip
182.1
4/1/2024 Groceries
140.3
Page 12 of 28
BI 348 Highline Class
Data modeling
Converting source data into a data structure that allows you to create the useful information.
Tools to perform data modeling
Worksheet formulas
Power Query M Code formulas
Data Model DAX formulas
Others too
Query
A question that we ask of the raw data, or an action taken to shape data, like:
Import Csv file, append tables, or group line item sales to calculate invoice total sales.
Data modeling tasks:
Cleaning data
ISO Date
Into =>
20240522
Proper Date
5/22/2024
Formula:
=TEXT(B20,"0000-00-00")+0
Or other methods in Power Query
Transforming data
Data Modeling, 01-DAMEMPT-Start.xlsx
Page 13 of 28
BI 348 Highline Class
5) Star schema data model
A model with a fact table surrounded by dimension tables, relationships, pre-made measures,
and is constructed to be user friendly.
The Data Model in Power Pivot and Power BI are specifically designed to work efficiently with a start schema data model.
*Semantic model in Power BI just means that you upload a model like this, but because it is stored online,
People that are assigned access to the model have: A Single Source of Truth
5) Steps in build a data model:
1) Connect to data sources
2) Clean and Transform the data
3) Load to Preferred Location:
Worksheet
PivotTable Cache
Data Model in Power Pivot
Data Model in Power BI Desktop
Workspace in Power BI Online
4) Create Relationships (if necessary)
5) Using formulas build metrics and measures
6) Create user friendly reporting area, for example hiding columns
Data Modeling, 01-DAMEMPT-Start.xlsx
Page 14 of 28
BI 348 Highline Class
Summary Report
Detailed numbers with labels, which often are metrics
that help gauge performance or help make some
decision.
Visuals
Present quantitative values to get a quick impression,
see patterns and trends more quickly
than reports or tables.
Dashboards
Reports and visualizations in one location to monitor activity as new data arrives.
Dashboard can be created in Excel, Power BI Desktop, or Power BI Online
Useful Info, 01-DAMEMPT-Start.xlsx
Page 15 of 28
BI 348 Highline Class
Contains the worksheet, Power Query, M Code, Power Pivot, DAX .
Worksheet formulas instantly update when the source data changes. No
Worksheet Formulas
refresh needed.
You are not confined to structured data such as tables and columns, you
easily reference, cells, ranges, columns, or tables.
You have freedom to incorporate any part of the worksheet and any of the
many features.
Must lock references, copy formulas, edit all cells with formulas and many
formulas such as filtering a list are difficult.
Legacy worksheet formulas:
Benefits over Legacy Worksheet Formulas: 1) Offer new array functions such
as UNIQUE, SORT and GROUPBY, 2) Usually do not need to lock references,
Dynamic spilled array formulas
3) usually do not need to manually copy formula, 4) editing is only done in
top cell of array.
Case-sensitive function based language, called M Code, to connect to any
Power Query
data source and make any transformation.
Allow you to work with and transform data structures such as tables, files,
columns of files, tables, records or lists.
Load to worksheet, PivotTable Cache, Data Model, Connection Only, or
workspace.
User interface can write almost all the code for you.
Excel
M Code formulas
PivotTables
Power Pivot
DAX formulas
Power BI Desktop
Power BI Online Service
Workspace
Dataflow
Tools, 01-DAMEMPT-Start.xlsx
Memorizes all steps and allows you to go back and change or edit any step.
Unparalled functional language to work with data to transform and shape
into a form that is best suited for the desired analysis output.
For summing, counting and calculating percentages, there is no other
calculation tool that is as fast and easy to use as the standard
PivotTable.
Can store millions of rows of data, has great formula advantages with
DAX and Relationships, and you can have multiple tables in the
reporting area.
Works with columnar databases to allow calculations across big data.
DAX can generate tables of data at any grain internally in the formulas,
and thereby reduce the complexities of calculations.
Has all the Data Model advantages, plus it has better visualization
capabilities, visuals are interactive, and reports, visuals and
dashboards are easy to share.
Advantages of sematic models, reports, visuals, dashboards,
Dataflow and workspaces where colleagues can collaborate and
connect to a single source of data truth.
Area in Power BI Online, where you can assign organizational emails
access to the workspace so you can share and collaborate with
workspace objects such as:
Reports, Workbooks, Dashboards, Semantic Models, Dataflow.
Power Query Online that simultaneously can connect to and
transform data, and serve as a single source of data truth.
Page 16 of 28
BI 348 Highline Class
No Data Analysis Task
1) Source data in worksheet.
2) Use worksheet formulas to build data model.
3) PivotTable to create year/month sales report.
4) Spilled Array Formulas for product sales report.
Why use Tools?
Why worksheet formulas for data modeling and reports? Data is already in
worksheet, we don't have a lot of data, and calculations we need to make are easy
to do with formulas and PivotTables.
Why PivotTable? For counting, adding, averaging, % calculations, and year/month
totals, PivotTables are easier and faster to use than other tools.
Why worksheet formulas? Solution instantly updates when source data changes.
You can use cells, ranges, columns and tables. PivotTables, M Code and DAX can't
do either.
Why GROUPBY or PIVOTBY? Easier than any other tool, even PivotTable. Solution
5) GROUPBY function for product sales report.
instantly updates when source data changes. Only works in worksheet.
Why Power Query? Can work with data structures like tables, files and databases,
Txt file, Excel file, Power Query, Merge / Join feature to and can shape data better than other tool. The functions in M Code for dealing with
perform lookup. Then Table.Group function to create data are unparalleled in the worksheet and DAX. The Merge feature allows you to
6) region report.
perform lookup (similar to XLOOKUP and Relationships).
When you connect to a file, the file path is hard coded into M Code formula. If you
move the source data file and the destination file to the same new location, the
data connection is valid. But if you move the source data file to a new location, you
can break the connection. To re-connect, use the Data Source Settings option in
There are multiple ways to open this dialog box in Excel and Power BI. If you are in
the Power Query Editor:
In the Excel Power Query Editor, in the Home tab, Data Sources group, click the
Data source settings button.
In the Power BI Desktop Power Query Editor, in the Home tab, Data source group,
click the Transform data dropdown and then click Data source settings.
In the Dataflow Power Query Editor, in the Home tab, Data sources group, click the
7) Changing on-premises path.
Manage connections button.
DataTasksForVideo, 01-DAMEMPT-Start.xlsx
Page 17 of 28
BI 348 Highline Class
No Data Analysis Task
Duplicate Query, Merge / Join feature to perform
lookup, calculated columns in Power Query, load
8) table of data to PivotTable cache.
Why use Tools?
Calculated columns in PQ allow to build full table before loading to PivotTable
cache. Loading a table directly to a PivotTable cache prevents the table from being
stored both in the worksheet and in the PivotTable cache.
Data Model and DAX can handle big data and can create tables at different grains
internally in formulas more easily than other tools. Average Daily Sales is the
example we do to illustrate tables with specified grains in DAX formulas.
Relationships help to reduce complexities in formulas, for example the RELATED
function for exact match lookup. Relationships also allow you have multiple tables
in the reporting area of a PivotTable or Power BI.
Filter Context helps formula calculate over big data more efficiently by filtering the
large fact table down to just the rows that contain the conditions for the
Json file, Excel file, Power Query, Data Model, DAX to calculation. . Row Context allows formula to see the values in each row of a table,
create reports with SUMX and AVERAGEX. Data Model or an iterator function like SUMX. Context Transition is when the row context in a
Relationships to 1) preform lookup, 2) drag fields from function like AVERAGEX gets converted to filter context to help reduce the number
dimension tables into reports and visuals to slice and of rows that the fact table has to iterate so that the calculation can be performed
9) dice.
efficiently and accurately.
This is most easily done with DAX formulas because they can determine the grain
Calculating average daily units sold by customer and of a table internally in the formula. Context Transition is when all available row
10) product.
context gets converted to filter context.
Xml file, Excel file, Power BI Desktop to build
11) interactive visuals and reports.
Power BI has interactive visuals and reports are easy to share.
Publish report and semantic model to workspace in
Power BI Online. Connect to the model in the
Power BI Online allows collaboration in workspaces and can provide semantic
12) workspace in an Excel file.
models and dataflows as "single source of truth".
Dataflow (Power Query Online) to get and transform
data and then make it available as a single source of Dataflow is like getting two tools for the price of one: 1) get and transform data, 2)
13) truth to external tools.
Dataflows that serve as a "single source of truth" data source.
DataTasksForVideo, 01-DAMEMPT-Start.xlsx
Page 18 of 28
BI 348 Highline Class
No Data Analysis Task
1) Source data in worksheet.
Why use Tools?
4) Spilled Array Formulas for product sales report.
Why worksheet formulas for data modeling and reports? Data is already in
worksheet, we don't have a lot of data, and calculations we need to make are easy
to do with formulas and PivotTables.
Why PivotTable? For counting, adding, averaging, % calculations, and year/month
totals, PivotTables are easier and faster to use than other tools.
Why worksheet formulas? Solution instantly updates when source data changes.
You can use cells, ranges, columns and tables. PivotTables, M Code and DAX can't
do either.
5) GROUPBY function for product sales report.
Why GROUPBY or PIVOTBY? Easier than any other tool, even PivotTable. Solution
instantly updates when source data changes. Only works in worksheet.
2) Use worksheet formulas to build data model.
3) PivotTable to create year/month sales report.
Examples 1 to 5, 01-DAMEMPT-Start.xlsx
Page 19 of 28
BI 348 Highline Class
No Data Analysis Task
Why use Tools?
Why Power Query? Can work with data structures like tables, files and databases,
Txt file, Excel file, Power Query, Merge / Join feature to and can shape data better than other tool. The functions in M Code for dealing with
perform lookup. Then Table.Group function to create data are unparalleled in the worksheet and DAX. The Merge feature allows you to
6) region report.
perform lookup (similar to XLOOKUP and Relationships).
When you connect to a file, the file path is hard coded into M Code formula. If you
move the source data file and the destination file to the same new location, the
data connection is valid. But if you move the source data file to a new location, you
can break the connection. To re-connect, use the Data Source Settings option in
There are multiple ways to open this dialog box in Excel and Power BI. If you are in
the Power Query Editor:
In the Excel Power Query Editor, in the Home tab, Data Sources group, click the
Data source settings button.
In the Power BI Desktop Power Query Editor, in the Home tab, Data source group,
click the Transform data dropdown and then click Data source settings.
In the Dataflow Power Query Editor, in the Home tab, Data sources group, click the
7) Changing on-premises path.
Manage connections button.
6) Report created by Power Query:
Region Ave. Units Sold Standard Deviation
West
156.23
84.03
MidWest
161.67
83.95
East
169.31
82.97
7) On-premises file path error:
Examples 6 & 7, 01-DAMEMPT-Start.xlsx
Page 20 of 28
BI 348 Highline Class
No Data Analysis Task
Duplicate Query, Merge / Join feature to perform
lookup, calculated columns in Power Query, load
8) table of data to PivotTable cache.
Why use Tools?
Calculated columns in PQ allow to build full table before loading to PivotTable
cache. Loading a table directly to a PivotTable cache prevents the table from being
stored both in the worksheet and in the PivotTable cache.
8) Load Power Query data to PivotTable:
Table
loaded
to
behind-the
scenes
PivotTable
Cache
No Data Analysis Task
Why use Tools?
Data Model and DAX can handle big data and can create tables at different grains
internally in formulas more easily than other tools. Average Daily Sales is the
example we do to illustrate tables with specified grains in DAX formulas.
Relationships help to reduce complexities in formulas, for example the RELATED
function for exact match lookup. Relationships also allow you have multiple tables
in the reporting area of a PivotTable or Power BI.
Filter Context helps formula calculate over big data more efficiently by filtering the
large fact table down to just the rows that contain the conditions for the
Json file, Excel file, Power Query, Data Model, DAX to calculation. . Row Context allows formula to see the values in each row of a table,
create reports with SUMX and AVERAGEX. Data Model or an iterator function like SUMX. Context Transition is when the row context in a
Relationships to 1) preform lookup, 2) drag fields
function like AVERAGEX gets converted to filter context to help reduce the number
from dimension tables into reports and visuals to
of rows that the fact table has to iterate so that the calculation can be performed
9) slice and dice.
efficiently and accurately.
This is most easily done with DAX formulas because they can determine the grain
Calculating average daily units sold by customer and of a table internally in the formula. Context Transition is when all available row
10) product.
context gets converted to filter context.
9) Power Pivot's Data Model - DAX Measures & Relationships:
Data Model PivotTable:
10) Average is made at day grain:
Product
TotalSales($)
Average Daily Sales
Apple
1,260,420.00
6,463.69
Banana
548,940.00
3,248.17
Cherry
1,635,079.20
8,054.58
Kiwi
811,890.00
4,460.93
Orange
912,849.00
5,246.26
Grand Total
5,169,178.20
9,484.73
Example 8 to 10, 01-DAMEMPT-Start.xlsx
Page 21 of 28
BI 348 Highline Class
No Data Analysis Task
11) Xml file, Excel file, Power BI Desktop to build interactive visuals and reports.
Why use Tools?
Power BI has interactive visuals and reports are easy to share.
11) Build Power BI Data Model and interactive visuals:
Desktop
No Data Analysis Task
Why use Tools?
Publish report and semantic model to workspace in Power BI Online. Connect to the Power BI Online allows collaboration in workspaces and can provide semantic
12) model in the workspace in an Excel file.
models and dataflows as "single source of truth".
12) Publish to Workspace:
Desktop
Example 11 & 12, 01-DAMEMPT-Start.xlsx
Online
Service
Page 22 of 28
BI 348 Highline Class
Data Model gets published as a Semantic Model (Single Source of Truth):
Report gets published as Report that anyone in workspace an consume:
Connect to semantic model from Excel:
Connect to semantic model from Power BI Desktop:
Example 11 & 12, 01-DAMEMPT-Start.xlsx
Page 23 of 28
BI 348 Highline Class
No Data Analysis Task
Dataflow (Power Query Online) to get and transform data
and then make it available as a single source of truth to
13) external tools.
Why use Tools?
Dataflow is like getting two tools for the price of one: 1) get and
transform data, 2) Dataflows that serve as a "single source of
truth" data source.
13) Dataflow in Power BI Online Workspace:
Online
Service
(Online Power Query)
1) Get, transform data, save to workspace:
2) "Single source of truth". Workspace Dataflows are data source for external tools like Excel and Power BI Desktop:
Excel connects to Dataflows:
Example 13, 01-DAMEMPT-Start.xlsx
Power BI Desktop connects to Dataflows:
Page 24 of 28
BI 348 Highline Class
When we use Dataflow to upload files, the files are stored in OneDrive. Here are the OneDrive status icons that you need to be aware of:
Icon
Icon Name
Blue cloud
Shared with People
Indicates
that the file is only available online. Online-only files don’t take up space on your computer. The
Description
file doesn’t download to your device until you open it. You can’t open online-only files when your device isn’t
connected to the Internet.
Always keep on this device
Indicates the file or folder has been shared with other people.
When you open an online-only file (blue cloud), it downloads to your device and becomes a locally
available file. If you need more space, you can change the file back to online only: right-click the file and
select “Free up space.” With Storage Sense turned on, these files will become online-only files after the
time period you've selected.
These files download to your device and take up space, but they’re always there for you even when you’re
offline. OneDrive is just keeping a copy online as backup.
Padlock
File or folder has settings which prevent it from syncing.
New File
File or folder is new. You'll see this when using OneDrive.com online.
Red X
File or folder cannot be synced. You'll see this in File Explorer or on the OneDrive notification area icons.
Learn More
Click the OneDrive icon in the notification area to learn more about the problem.
Gray OneDrive icon
Pasue
Means you're not signed in, or OneDrive setup hasn't completed.
The paused symbol over the OneDrive icon means your files are not currently syncing. To resume syncing,
select the OneDrive icon in the notification area, select More and then Resume syncing.
Sync pending
The circular arrows over the OneDrive notification icons signify that sync is in progress. This includes when
you are uploading files, or OneDrive is syncing new files from the cloud to your PC.
Account blocked
Account is blocked.
Warning
Your account needs attention. Select the icon to see the warning message displayed in the activity center.
Signed into both work or school and a personal account. The blue one is for your work or school account,
the white one is for your personal account.
Online, Locally Downloaded
Two Cloud Icon
OneDriveIcons - 01-DAMEMPT-Start.xlsx
Page 25 of 28
BI 348 - Highline Data Analysis Class
Data and table terms:
Variable
Data
A value that can change, like a product name or an amount of a sale.
Variables can be quantitative (number), categorical (text), Boolean (T/F), or other.
Values collected for one or more variables and kept together for reference or analysis.
Data stored in its smallest form.
Data:
22 3/29/2024 Product
Not data: 22,3/29/2024, Product
Data Type
Field
Field Name
Record
Table
Fact Table
Dimension (Lookup) Table
Primary Key
Foreign Key
One-To-Many Relationship
Note:
Grain or Granularity
Data is not information. Information is created from data.
Declared type of data for a column such as: number, text or logical.
Safeguards for the column of data and helps to assure data consistency and accuracy for reports and visuals.
Column that is used to collect data for a variable. Column should have declared data type and must have field
name at top of column.
In the world of databases, columns are called fields. In other arenas, such as Microsoft power tools, fields are
often called columns.
Name at top of field that accurately describes the data. Field names are used in reports and visuals to indicate
which variables to summarize
Synonyms = column name = header name.
A row in table that contains related data for each field for a given observation, such as a sales transaction,
scientific observation or employee data.
A collection of one or more columns, with field names in first row, records of data are in subsequent
rows, and data types for each column.
Data must be contained in a table to easily be analyzed.
A great amount of data is not stored in tables, very often, the job of the analysts is to transform the unstructured
data into a table structure.
A table that contains the facts to summarize or measure, like sales, units, amount of time, sports statistics, and
other facts.
The facts are the measurements of activities, like amount of sales, number of points scored in a game
or length of time at a web site.
Fact tables are sometimes very large and can have thousands, millions or billions of rows of data.
A table that contains a field with unique list of entities, called a primary key column, and subsequent columns
with attributes for the entity.
An entity like Product ID would have attributes like product name, price, cots and product weight.
Dimension tables are also called lookup tables.
The entity field in a dimension table that contains a unique list of items and is used to assure that
there are no duplicate records in the dimension table.
When a primary key is used in a fact table it is called a foreign key.
When the primary key from a dimension table is connected to a foreign key in a fact table, the primary
is called the one-side and the foreign key is called the many-side.
A one-to-many relationship helps to make lookup formulas easy and allows dimension table
attributes to filter reports and visuals.
The terms "fact" and "dimension" go back to 1960s when the General Mills company and Dartmouth University
used them to name their tables of data.
Then in the 1970s the data research companies AC Nielsen & IRI company used the terms.
However, the terms were popularized by Ralph Kimball in the 1980s with his extensive writing about data
warehousing and business intelligence.
The level of detail stored in a table, or the size of the number.
A table of invoice total sales amounts has a larger grain and less detail than a table of line items sales amounts.
A table of line items sales amounts has a smaller grain and more detail than a table of invoice total sales
amounts.
The grain of the 2023 total sales amount is bigger and has less detail than the grain of the May, 2024 sales
amount.
Terms - 01-DAMEMPT-Start.xlsx
Page 26 of 28
BI 348: DAMEwMPT
Database
Relational Database
Text Files
Source data
Data destination
On-premises file path
Online source data
Delimiter
Structure Or Schema
Data Analysis
Business Intelligence
Data analysis process
Query
Data modeling
Cleaning data
Transforming data
M Code (Data Mashup)
A data model
Star schema data model
Terms - 01-DAMEMPT-Start.xlsx
The grain of the May, 2024 sales amount is smaller and has more detail than grain of the 2023 total sales
amount.
This concept is important because if you have two fact tables at different grains, you often have to create a
method to convert the two fact tables into one.
Location where most data in the world is stored.
A database that follows strict rules for storing related tables of data with no redundancy.
A common vehicle to transfer tables of data from one system to another system using delimiters such as
comma and tab.
The original location of the data, like in a text file, an Excel file of a database.
The location where the data is loaded, such as in an Excel or a Power BI Desktop file or an online
source like a Power BI workspace.
A hard coded source data file path in the data destination, such as an Excel or Power BI Desktop file.
On-premises file paths can cause errors when the data destination file is moved and the connection to the
source data is lost.
Online source data can solve the problem of On-premises file and folder paths.
Web sites, SQL Server databases and Power BI Online are examples of online sources that stay connected to
the source data when the data destination file is moved.
Is a character that separated bits of data, such as a comma, tab and other characters.
The rules or structure for tables, data files and databases.
Converts data into useful information to gain insight and make decisions.
Information can be in the form of: reports, visuals, dashboards, and other forms.
Data analysis allows you to make data-driven decisions, which tend to be more accurate & help to achieve
goals more consistently
Synonyms: Data Analytics, Analytics, Business Intelligence, Data Science.
The definition is the same as data analysis, but the process is performed within the context of
business data and business decision making.
1. Determine what questions need answers and what decisions need to be made. Everything else in the
process is dictated by these questions and decisions.
2. Where is the data? How much data? What is the structure of the data?
3. Which MS tool to use? (Almost always starts with Power Query).
4. Clean, transform and shape the data into a table or model that is best suited to answer questions and make
decisions,
5. Build final model with measures, metrics, relationships, and other features.
6. Create useful information: reports, visuals and dashboards.
7. Refresh when new data arrives.
8. Change and update model as necessary.
A question that we ask of the raw data, like import Csv file, append tables, or group line item sales to
calculate invoice total sales.
Involves all the steps necessary to get and transform the data from the original data sources into a data
structure that allows you to create the useful information required.
Data modeling can be done with many tools such as: Power Query, worksheet formulas, DAX formulas,
Relationships and many other Excel and Power BI features.
Involves fixing individual bites of data, such as extracting text from a larger text string, converting text dates
to proper dates or rounding numbers.
Involves structural changes to source data or tables of data, such as converting Json files to proper tables,
merging tables to add columns, or converting two fact tables into one table.
Case-sensitive function language in Power Query that allows you to work with data and data structures to
clean, transform and shaped the data into the required structure.
Is the final data structure that allows you to create the useful information required is called a data model and
becomes the intermediate step between the source data and the reports and visuals.
A model with a fact table surrounded by dimension tables, relationships between the tables (usually
one-to-many), pre-made measures, and is constructed to be user friendly.
The Data Model in Power Pivot and Power BI are specifically designed to work efficiently with a start schema
data model.
Page 27 of 28
BI 348: DAMEwMPT
Semantic model in Power BI
The Data Model
Columnar database
DAX (Data Analysis eXpressions)
Measures
One-to-many Relationship
Load data
Summary Report
Visuals
Dashboards
It is the same as a Data Model in Power Pivot and Power BI Desktop, and is usually a Star schema data model.
The difference is that the model is stored online, contains data security features and serves as a single source
of online data truth.
Is the location in Power Pivot and Power BI Desktop where the data is stored in a columnar database
and the DAX formulas, relationships and other model features are added.
In the Data Model, behind the scenes, there is a RAM memory Columnar Database that compresses and
stores the data, and which allows the DAX formulas to work efficiently with big data.
Functional language used in the Data Model to work efficiently with big data and can internally create tables at
different grains to reduce the complexity of many formulas.
Metrics that help gauge performance or help make some decision.
Primary key of dimension table is connected to the foreign key in the fact table to simplify lookup formulas and
allow dimension table fields in the report area to filter reports, visuals and databases.
Involves deciding where to load the data, such as loading to the worksheet, a PivotTable Cache, the Data
Model, or a Power BI workspace.
Reports often contain detailed numbers with labels, and the numbers are almost always metrics that help
gauge performance or help make some decision.
Reports almost always contains conditional calculations and therefore it is very helpful if you are fluent with
logical tests.
Present quantitative values in a visual way to get a quick impression and see patterns and trends more quickly
than reports or tables .
Examples of visuals: column, bar, line, scatter, map, waterfall charts, Pictures, Conditional Formatting, and
more.
Contain summary reports, visualizations, and other elements in one location so that we can monitor the activity
as new data arrives.
Just like a dashboard in a car, a dashboard should present information that is required for making
good decisions
Contains
the worksheet, Power Query, M Code, Power Pivot, DAX .
Excel
Worksheet formulas instantly update when the source data changes. No refresh needed.
Worksheet Formulas
You are not confined to structured data such as tables and columns, you easily reference, cells, ranges,
columns, or tables.
You have freedom to incorporate any part of the worksheet and any of the many features.
Must lock references, copy formulas, edit all cells with formulas and many formulas such as filtering a list are
Legacy worksheet formulas:
difficult.
Do NOT need to lock references or copy formulas, edit is done in top cell of array only, and formulas for filtering
Dynamic spilled array formulas
are simple.
Case-sensitive function based language to connect to any data source and make any transformation.
Power Query
Allow you to work with and transform data structures such as tables, files, columns of files, tables, records or
lists.
Load to worksheet, PivotTable Cache, Data Model, or workspace.
User interface can write almost all the code for you.
Memorizes all steps and allows you to go back and change or edit any step.
PivotTables
Power Pivot
Power BI Desktop
Power BI Online Service
Workspace
Dataflow
Terms - 01-DAMEMPT-Start.xlsx
For summing, counting and calculating percentages, there is no other calculation tool that is as fast
and easy to use as the standard PivotTable.
Can store millions of rows of data, has great formula advantages with DAX and Relationships, and you
can have multiple tables in the reporting area.
Has all the Data Model advantages, plus it has interactive visualizations.
Advantages of sematic models, reports, visuals, dashboards, Dataflow and workspaces where
colleagues can collaborate and connect to a single source of data truth.
Area in Power BI Online, where you can assign organizational emails access to the workspace so you
can share and collaborate with workspace objects such as:
Reports, Workbooks, Dashboards, Semantic Models, Dataflow.
Power Query Online that simultaneously can connect to and transform data, and serve as a single
source of data truth.
Page 28 of 28
BI 348: DAMEwMPT
Download