BI 348 Highline Class (01-DAMEMPT-Start.xlsx)

No.  Topics for Week 1 & 2:
1    Data Analysis Terminology
2    Worksheet Formulas
3    PivotTables
4    Dynamic Spilled Arrays
5    Power Query
6    Power Pivot
7    Power BI Desktop
8    Power BI Online Service
9    Dataflow (Online Power Query)

Data & Tables

Column = Field. Field Names name the Variables and go in the first row. Records are in rows. Data Types shown: Text, Number.

Fact Table (the Product column is the Foreign Key, the Many side):
Date        Units  Product
6/19/2025   288    175AP
10/17/2024  132    100CR
11/13/2024  132    100CR
3/31/2025   132    100CR
10/15/2024  180    100CR
10/25/2025  156    100CR
1/15/2024   168    BL579
8/30/2025   204    255YN
11/24/2024  180    100CR
3/22/2024   276    QA430
7/9/2025    108    BL579
5/26/2025   96     QA430
8/7/2025    192    100CR
10/23/2025  156    BL579
10/12/2024  156    BL579
9/3/2024    84     255YN
7/1/2025    228    175AP
9/24/2025   240    175AP
9/9/2024    72     BL579
1/3/2025    216    100CR
5/10/2024   84     175AP
9/21/2025   96     175AP
4/29/2024   216    100CR

Dimension (Lookup) Table (the ProductID column is the Primary Key, the One side):
ProductID  Product      Price  Cost
100CR      Apple Crate  35     15.4
175AP      Orange       29.75  17.255
255YN      Kiwi         22.5   12.6
QA430      Banana       17.5   11.725
BL579      Cherry       43.7   25.346

Many-To-One Relationship: Many (*) on the fact table side, One (1) on the dimension table side.

Bits of Data: 1/15/2024 | 168 | BL579

Grain

Invoice Total Sales = Less Detail, Bigger Grain
Invoice Date  Invoice Number  Sales RepID  Shipping Costs($)  Discount($)
1/1/2017      125447          9            98.7               144.18
1/2/2017      125448          28           26.25              73.06
1/3/2017      125450          4            207.55             437.62
1/4/2017      125451          15           262.15             542.26
1/4/2017      125452          23           159.25             381.63

Line Items Sales = More Detail, Smaller Grain
Invoice Number  Product ID  Quantity  Unit Price($)
125447          LS-900      21        22.36
125447          TC-500      88        14.97
125447          OK-800      35        12.32
125448          DQ-100      53        27.57
125450          SC-1100     25        48.75
125450          TC-500      34        16.22
125450          TS-300      200       13.03
125451          AC-1000     223       11.68
125451          LS-900      224       12.58
125452          TM-600      38        18.82
125452          IY-700      238       13.03

Year Grain is Bigger than Month Grain. Year Grain has Less Detail than Month Grain.

Year  Sales($)
2023  78,139
2024  58,066

Year  Month  Sales($)
2023  Jan    7,684
2023  Feb    8,255
2023  Mar    3,286
2023  Apr    8,572
2023  May    2,829
2023  Jun    7,296
2023  Jul    5,731
2023  Aug    5,610
2023  Sep    8,218
2023  Oct    6,507
2023  Nov    12,862
2023  Dec    1,289
2024  Jan    12,683
2024  Feb    4,900
2024  Mar    1,999
2024  Apr    1,720
2024  May    1,343
2024  Jun    2,840
2024  Jul    7,475
2024  Aug    10,813
2024  Sep    1,249
2024  Oct    580
2024  Nov    9,355
2024  Dec    3,109

SQL Server Database

Relational Database. Much of the data in the world is stored in SQL databases.

Columnar Database

The columnar database in the Power Pivot and Power BI Data Model and Semantic Model compresses data into a smaller, more efficient structure than just storing rows of data.

Sample of the original table (first 21 records):
Date        ProductID  SalesRepID  Units
3/19/2021   2          4           80
4/8/2021    2          4           5
4/12/2021   2          4           88
4/12/2021   2          4           70
4/22/2021   2          4           6
5/12/2021   2          4           5
5/24/2021   2          4           1
6/11/2021   2          4           92
6/11/2021   2          4           91
6/17/2021   2          4           209
7/5/2021    2          4           91
7/18/2021   2          4           66
7/20/2021   2          4           110
7/24/2021   2          4           4
8/6/2021    2          4           94
8/18/2021   2          4           2
9/10/2021   2          4           57
10/2/2021   2          4           4
10/2/2021   2          4           1
10/7/2021   2          4           86
10/7/2021   2          4           90

Records in original table: 606
# Columns: 4
Total cells with data: 2424
Total cells in columnar database: 621

Columnar database, unique values stored per column (sample):
Column      Count  Unique values (sample)
Date        426    3/19/2021, 4/8/2021, 4/12/2021, 4/22/2021, 5/12/2021, 5/24/2021, 6/11/2021, 6/17/2021, 7/5/2021, 7/18/2021, 7/20/2021
ProductID   4      2, 3, 4, 1
SalesRepID  4      4, 2, 3, 1
Units       187    80, 5, 88, 70, 6, 1, 92, 91, 209, 66, 110

Source data in files

Terms defined here, in order: Source data, Data destination, On-premises file path, Online source data, Delimiter, Structure or Schema, Data Analysis.

Source data: The original location of the data, such as a text file, an Excel file or a database.
Data destination: The location where the data is loaded, such as an Excel or Power BI Desktop file, or an online location like a Power BI workspace.
On-premises file path: A hard-coded source data file path stored in the data destination, such as an Excel or Power BI Desktop file. On-premises file paths can cause errors when the data destination file is moved and the connection to the source data is lost.
Online source data: Online source data can solve the problem of on-premises file and folder paths. Web sites, SQL Server databases and Power BI Online are examples of online sources that stay connected to the source data when the data destination file is moved.
Delimiter: A character that separates bits of data, such as a comma, a tab or another character.
Structure or Schema: The rules or structure for tables, data files and databases.
Data Analysis: Converts data into useful information to gain insight and make decisions.

Text Files:

Csv (Comma Separated Values)
Table Structure/Schema: Field names and data are separated by commas (the delimiter). Systems understand that the data is stored with a table structure.

Txt (Tab Separated Values)
Table Structure/Schema: Field names and data are separated by tabs (the delimiter). Systems understand that the data is stored with a table structure.

Xml (eXtensible Markup Language)
Table Structure/Schema: A "Sales" table container contains fields. Transaction record containers contain fields. Various field containers contain data. Systems understand that the data is stored with a table structure.

Json (JavaScript Object Notation)
Record Structure/Schema: All records are housed in square brackets [ ]. Individual records are housed in curly brackets { }. The delimiter is a colon (:). There is no table schema. Json files store records, not tables. Systems understand that the data is stored with a record structure. After you import records, you must convert the records into a table.

Excel files

Worksheets
Record Structure/Schema: Data is stored in a worksheet and is understood by systems to be records of data. After you import records, you must convert the records into a table.
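The record-to-table conversion noted for Json files and worksheets can be sketched in a few lines. This is only an illustration in Python, not the Power Query steps used in class. The JSON text, its field layout, and the helper name `records_to_table` are assumptions made for the example; the sample values are borrowed from the Data & Tables page.

```python
import json

# Hypothetical JSON text: all records sit inside [ ], each record inside { },
# and a colon separates each field name from its value.
raw = """[
  {"Date": "1/15/2024", "Units": 168, "Product": "BL579"},
  {"Date": "10/17/2024", "Units": 132, "Product": "100CR"}
]"""


def records_to_table(records):
    """Turn a list of records (dicts) into field names plus rows of values."""
    field_names = []
    for record in records:
        for name in record:
            if name not in field_names:
                field_names.append(name)  # keep first-seen field order
    # A record that lacks a field gets None in that column.
    rows = [[record.get(name) for name in field_names] for record in records]
    return field_names, rows


records = json.loads(raw)                      # records, not yet a table
field_names, rows = records_to_table(records)  # field names in the first row, records below
```

The result has the table structure described earlier: one list of field names and one row per record.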
Excel Tables
Table Structure/Schema: Data is stored in an Excel Table object and is understood by systems to be a table of data.

Source data from on-premises files. Data destination is this Power BI Desktop file.
On-premises file path: Path is broken if the destination file is moved. (Easy to fix, but annoying.)

Online source like a SQL database. Data destination is this Power BI Desktop file.
Online SQL Server database (online data source): Path is NOT broken if the destination file is moved.

When you hard code a file path into a query, it means that if you move the source data file to a different location, or you e-mail the file that contains the query to a colleague, the connection to the source data is broken and you will receive the following error: Data Source Not Found. When you hard code a file or folder path into a query, the path is called an on-premises path. As you can imagine, on-premises paths cause a lot of trouble if you are sharing files or moving files around. An online data source, like a SQL Server database, a Dataflow or Power BI, does not have this problem because the data is stored online in a location that does not move. Multiple credentialed people can have access to "a single source of truth" where there are no on-premises paths and no conflicts with multiple versions of the same file.

Nevertheless, not all data is stored online, and on-premises paths are common. The good news is that if you know where the source data file is, it is easy to redirect the query to the new location. There are at least three ways to change the on-premises path:
•In the Source step for almost any query, you can edit the on-premises path in the formula bar.
•You can click the gear icon in the Source query step to open the source data dialog box.
Many data sources such as Csv, Excel, SQL databases, Web sites and more allow you to use the gear icon in the Source query step to edit the connection details.
•If you have used the same file or folder in multiple queries, it is most efficient to edit the path universally in the Data Source Settings dialog box. There are multiple ways to open this dialog box in Excel and Power BI. If you are in the Power Query Editor:
  o In the Excel Power Query Editor, in the Home tab, Data Sources group, click the Data source settings button.
  o In the Power BI Desktop Power Query Editor, in the Home tab, Data source group, click the Transform data dropdown and then click Data source settings.
  o In the Dataflow Power Query Editor, in the Home tab, Data sources group, click the Manage connections button.

Data Analysis (Data Analytics, Analytics, Business Intelligence, Data Science):
Define: Converts raw data into useful, actionable information to gain insight and make decisions. Information can be in the form of reports, visuals, dashboards, and other forms. Data analysis allows you to make data-driven decisions, which tend to be more accurate and help to achieve goals more consistently.

Business Intelligence: Same definition, but the process is performed within the context of business data and business decision making.

High Level Definition: Data → Insightful, Actionable Information

Data analysis process:
1. Determine what questions need answers and what decisions need to be made. Everything else in the process is dictated by these questions and decisions.
2. Where is the data? How much data? What is the structure of the data?
3. Which MS tool to use? (Almost always starts with Power Query.)
4. Clean, transform and shape the data into a table or model that is best suited to answer questions and make decisions.
5. Build final model with measures, metrics, relationships, and other features.
6.
Create useful information: reports, visuals and dashboards.
7. Refresh when new data arrives.
8. Change and update model as necessary.

Examples:

Business context: Data → Useful information: metrics to see how customer sales perform over months, sales reps and products.

Sports context: Data → Decide who becomes NBA 2024 MVP.
PPG Rank  PLAYER                   TEAM  GP  MIN   Ave PPG  FGM   FGA   FG%
1         Luka Doncic              DAL   63  37.4  33.9     11.5  23.7  48.7
2         Giannis Antetokounmpo    MIL   67  35.1  30.6     11.5  18.7  61.4
3         Shai Gilgeous-Alexander  OKC   70  34.4  30.4     10.8  20    54
4         Kevin Durant             PHX   66  37.2  27.6     10.1  19.2  52.9
5         Jalen Brunson            NYK   67  35    27.4     9.8   20.5  47.7
6         Jayson Tatum             BOS   67  35.8  27.3     9.2   19.5  47.3
7         Devin Booker             PHX   59  35.8  27.2     9.6   19.5  49.2
8         Stephen Curry            GSW   66  32.7  26.5     8.8   19.6  44.7
9         De'Aaron Fox             SAC   64  35.6  26.5     9.7   20.8  46.6
50        Collin Sexton            UTA   73  26.3  18.6     6.5   13.2  49.4

Educational context: Data → Decision: Which students get scholarships? History or Sociology majors are eligible. Source: Oracle Database.
Eligible majors: History, Sociology
Student            Start Date  Major       Credits  GPA  Eligible?
Coats, Saharra     9/29/2020   Sociology   45       1.7  TRUE
Emmons, Christi    7/14/2018   Accounting  135      2.3  FALSE
Lear, Vania        9/3/2020    Chemistry   45       3    FALSE
Washington, T      11/21/2019  History     90       3.1  TRUE
Mohamed, Abdi      1/28/2021   Business    23       1.6  FALSE
Nga, Luong         7/7/2020    Physics     45       2.4  FALSE
Mims, Chantel      4/12/2020   History     70       4    TRUE
Rouse, Sioux       6/30/2020   Chemistry   40       2.4  FALSE
Simone, Alanna     8/2/2019    Physics     60       3.5  FALSE
Thornburg, Tyrone  12/27/2019  Sociology   75       3.9  TRUE

Personal budgeting context: Data kept in an Excel worksheet:
Date     Expenses       Amount
1/3/24   Weekend Trip   182.1
1/3/24   Medical        409.54
1/8/24   Utilities      136.37
1/10/24  Ikea           259.63
1/16/24  Car Insurance  102.5
1/19/24  Movies         43.69
1/23/24  Groceries      35.2
1/26/24  Garden         12.75
2/1/24   Gas            40
2/9/24   Utilities      92.31
2/19/24  Gas            48.2
2/27/24  Gas            55.8
2/27/24  Car Insurance  102.5
3/2/24   Groceries      102.52
3/3/24   Gas            50

Insight: What are the top 5 expenses so far this year?
Top 5:
Date       Expenses       Amount
1/3/2024   Medical        409.54
3/28/2024  B-day for Mom  378.02
1/10/2024  Ikea           259.63
1/3/2024   Weekend Trip   182.1
4/1/2024   Groceries      140.3

Data modeling: Converting source data into a data structure that allows you to create the useful information.

Tools to perform data modeling: Worksheet formulas; Power Query M Code formulas; Data Model DAX formulas; others too.

Query: A question that we ask of the raw data, or an action taken to shape data, like: import a Csv file, append tables, or group line item sales to calculate invoice total sales.

Data modeling tasks:
Cleaning data: ISO Date 20240522 => Proper Date 5/22/2024. Formula: =TEXT(B20,"0000-00-00")+0, or other methods in Power Query.
Transforming data

5) Star schema data model: A model with a fact table surrounded by dimension tables, with relationships and pre-made measures, constructed to be user friendly. The Data Model in Power Pivot and Power BI is specifically designed to work efficiently with a star schema data model.
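As a rough sketch of why a fact table plus a dimension table is convenient, the example below looks up each fact row's product attributes through the key and totals a measure by product name. Python is used only as a neutral illustration; in the Data Model the relationship does this lookup for you. The rows are sample values from the Data & Tables page (row pairing follows my reading of the flattened table), and the Units * Price measure and the function name `sales_by_product` are assumptions, not something the handout defines.

```python
# Dimension (lookup) table: one entry per ProductID (primary key, the one side).
products = {
    "175AP": {"Product": "Orange", "Price": 29.75},
    "255YN": {"Product": "Kiwi", "Price": 22.5},
    "BL579": {"Product": "Cherry", "Price": 43.7},
}

# Fact table: many rows per product; the last field is the foreign key.
sales = [
    ("6/19/2025", 288, "175AP"),
    ("7/1/2025", 228, "175AP"),
    ("8/30/2025", 204, "255YN"),
    ("1/15/2024", 168, "BL579"),
]


def sales_by_product(fact_rows, dimension):
    """Total Units * Price per product name (the measure itself is an assumption)."""
    totals = {}
    for _date, units, product_id in fact_rows:
        attributes = dimension[product_id]  # exact-match lookup on the key
        name = attributes["Product"]
        totals[name] = totals.get(name, 0.0) + units * attributes["Price"]
    return totals


report = sales_by_product(sales, products)
```

The fact rows never store the product name or price; they carry only the key, and the dimension table supplies the attributes used to label and compute the report.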
*Semantic model in Power BI just means that you upload a model like this, but because it is stored online, people who are assigned access to the model have a Single Source of Truth.

5) Steps to build a data model:
1) Connect to data sources
2) Clean and transform the data
3) Load to preferred location: Worksheet; PivotTable Cache; Data Model in Power Pivot; Data Model in Power BI Desktop; Workspace in Power BI Online
4) Create relationships (if necessary)
5) Using formulas, build metrics and measures
6) Create a user-friendly reporting area, for example by hiding columns

Summary Report: Detailed numbers with labels, which often are metrics that help gauge performance or help make some decision.
Visuals: Present quantitative values to get a quick impression and to see patterns and trends more quickly than with reports or tables.
Dashboards: Reports and visualizations in one location to monitor activity as new data arrives. Dashboards can be created in Excel, Power BI Desktop, or Power BI Online.

Tools:

Excel: Contains the worksheet, Power Query, M Code, Power Pivot, DAX.

Worksheet Formulas: Worksheet formulas instantly update when the source data changes. No refresh needed. You are not confined to structured data such as tables and columns; you can easily reference cells, ranges, columns, or tables. You have freedom to incorporate any part of the worksheet and any of the many features.

Legacy worksheet formulas: You must lock references, copy formulas and edit all cells with formulas, and many formulas, such as filtering a list, are difficult.

Dynamic spilled array formulas: Benefits over legacy worksheet formulas: 1) offer new array functions such as UNIQUE, SORT and GROUPBY, 2) usually do not need to lock references, 3) usually do not need to manually copy the formula, 4) editing is only done in the top cell of the array.
M Code formulas (Power Query): Case-sensitive, function-based language, called M Code, to connect to any Power Query data source and make any transformation. Allows you to work with and transform data structures such as tables, files, columns of files, tables, records or lists. Load to worksheet, PivotTable Cache, Data Model, Connection Only, or workspace. The user interface can write almost all the code for you. Memorizes all steps and allows you to go back and change or edit any step. Unparalleled functional language to work with data, to transform and shape it into a form that is best suited for the desired analysis output.

PivotTables: For summing, counting and calculating percentages, there is no other calculation tool that is as fast and easy to use as the standard PivotTable.

Power Pivot: Can store millions of rows of data, has great formula advantages with DAX and Relationships, and you can have multiple tables in the reporting area.

DAX formulas: Work with columnar databases to allow calculations across big data. DAX can generate tables of data at any grain internally in the formulas, and thereby reduce the complexities of calculations.

Power BI Desktop: Has all the Data Model advantages, plus it has better visualization capabilities, visuals are interactive, and reports, visuals and dashboards are easy to share.

Power BI Online Service: Advantages of semantic models, reports, visuals, dashboards, Dataflow and workspaces where colleagues can collaborate and connect to a single source of data truth.

Workspace: Area in Power BI Online where you can assign organizational e-mails access to the workspace so you can share and collaborate with workspace objects such as: Reports, Workbooks, Dashboards, Semantic Models, Dataflow.

Dataflow: Power Query Online that simultaneously can connect to and transform data, and serve as a single source of data truth.

No. | Data Analysis Task | Why use Tools?
1) Source data in worksheet.
2) Use worksheet formulas to build data model.
3) PivotTable to create year/month sales report.
4) Spilled Array Formulas for product sales report.
Why use Tools (tasks 1 to 4)?
Why worksheet formulas for data modeling and reports? Data is already in the worksheet, we don't have a lot of data, and the calculations we need to make are easy to do with formulas and PivotTables.
Why PivotTable? For counting, adding, averaging, % calculations, and year/month totals, PivotTables are easier and faster to use than other tools.
Why worksheet formulas? Solution instantly updates when source data changes. You can use cells, ranges, columns and tables. PivotTables, M Code and DAX can't do either.

5) GROUPBY function for product sales report.
Why GROUPBY or PIVOTBY? Easier than any other tool, even PivotTable. Solution instantly updates when source data changes. Only works in worksheet.

6) Txt file, Excel file, Power Query, Merge / Join feature to perform lookup. Then Table.Group function to create region report.
Why Power Query? Can work with data structures like tables, files and databases, and can shape data better than other tools. The functions in M Code for dealing with data are unparalleled in the worksheet and DAX. The Merge feature allows you to perform lookup (similar to XLOOKUP and Relationships).

7) Changing on-premises path.
Why? When you connect to a file, the file path is hard coded into the M Code formula. If you move the source data file and the destination file to the same new location, the data connection is valid. But if you move the source data file to a new location, you can break the connection. To re-connect, use the Data Source Settings option. There are multiple ways to open this dialog box in Excel and Power BI. If you are in the Power Query Editor:
In the Excel Power Query Editor, in the Home tab, Data Sources group, click the Data source settings button.
In the Power BI Desktop Power Query Editor, in the Home tab, Data source group, click the Transform data dropdown and then click Data source settings.
In the Dataflow Power Query Editor, in the Home tab, Data sources group, click the Manage connections button.

8) Duplicate Query, Merge / Join feature to perform lookup, calculated columns in Power Query, load table of data to PivotTable cache.
Why? Calculated columns in Power Query allow you to build the full table before loading to the PivotTable cache. Loading a table directly to a PivotTable cache prevents the table from being stored both in the worksheet and in the PivotTable cache.

9) Json file, Excel file, Power Query, Data Model, DAX to create reports with SUMX and AVERAGEX. Data Model Relationships to 1) perform lookup, 2) drag fields from dimension tables into reports and visuals to slice and dice.
Why? Data Model and DAX can handle big data and can create tables at different grains internally in formulas more easily than other tools. Average Daily Sales is the example we do to illustrate tables with specified grains in DAX formulas. Relationships help to reduce complexities in formulas, for example the RELATED function for exact match lookup. Relationships also allow you to have multiple tables in the reporting area of a PivotTable or Power BI. Filter Context helps formulas calculate over big data more efficiently by filtering the large fact table down to just the rows that contain the conditions for the calculation. Row Context allows a formula to see the values in each row of a table, for example in an iterator function like SUMX. Context Transition is when the row context in a function like AVERAGEX gets converted to filter context to help reduce the number of rows that the fact table has to iterate so that the calculation can be performed efficiently and accurately.

10) Calculating average daily units sold by customer and product.
Why? This is most easily done with DAX formulas because they can determine the grain of a table internally in the formula. Context Transition is when all available row context gets converted to filter context.

11) Xml file, Excel file, Power BI Desktop to build interactive visuals and reports.
Why? Power BI has interactive visuals, and reports are easy to share.

12) Publish report and semantic model to workspace in Power BI Online. Connect to the model in the workspace in an Excel file.
Why? Power BI Online allows collaboration in workspaces and can provide semantic models and dataflows as a "single source of truth".

13) Dataflow (Power Query Online) to get and transform data and then make it available as a single source of truth to external tools.
Why? Dataflow is like getting two tools for the price of one: 1) get and transform data, 2) Dataflows that serve as a "single source of truth" data source.

Examples:

6) Report created by Power Query:
Region   Ave. Units Sold  Standard Deviation
West     156.23           84.03
MidWest  161.67           83.95
East     169.31           82.97

7) On-premises file path error:
8) Load Power Query data to PivotTable: Table loaded to behind-the-scenes PivotTable cache.
9) Power Pivot's Data Model - DAX Measures & Relationships: Data Model PivotTable.

10) Average is made at day grain:
Product      TotalSales($)  Average Daily Sales
Apple        1,260,420.00   6,463.69
Banana       548,940.00     3,248.17
Cherry       1,635,079.20   8,054.58
Kiwi         811,890.00     4,460.93
Orange       912,849.00     5,246.26
Grand Total  5,169,178.20   9,484.73

11) Build Power BI Data Model and interactive visuals (Power BI Desktop).

12) Publish to Workspace (from Power BI Desktop to the Power BI Online Service):
Data Model gets published as a Semantic Model (Single Source of Truth).
Report gets published as a Report that anyone in the workspace can consume.
Connect to semantic model from Excel.
Connect to semantic model from Power BI Desktop.

13) Dataflow in Power BI Online Workspace (Online Service, Online Power Query):
1) Get, transform data, save to workspace.
2) "Single source of truth".
Workspace Dataflows are a data source for external tools like Excel and Power BI Desktop. Excel connects to Dataflows. Power BI Desktop connects to Dataflows.

When we use Dataflow to upload files, the files are stored in OneDrive. Here are the OneDrive status icons that you need to be aware of:

Icon Name: Description
Blue cloud: Indicates that the file is only available online. Online-only files don't take up space on your computer. The file doesn't download to your device until you open it. You can't open online-only files when your device isn't connected to the Internet.
Shared with People: Indicates the file or folder has been shared with other people.
Online, Locally Downloaded: When you open an online-only file (blue cloud), it downloads to your device and becomes a locally available file. If you need more space, you can change the file back to online only: right-click the file and select "Free up space." With Storage Sense turned on, these files will become online-only files after the time period you've selected.
Always keep on this device: These files download to your device and take up space, but they're always there for you even when you're offline. OneDrive is just keeping a copy online as backup.
Padlock: File or folder has settings which prevent it from syncing.
New File: File or folder is new. You'll see this when using OneDrive.com online.
Red X: File or folder cannot be synced. You'll see this in File Explorer or on the OneDrive notification area icons. Click the OneDrive icon in the notification area to learn more about the problem.
Gray OneDrive icon: Means you're not signed in, or OneDrive setup hasn't completed.
Pause: The paused symbol over the OneDrive icon means your files are not currently syncing. To resume syncing, select the OneDrive icon in the notification area, select More and then Resume syncing.
Sync pending: The circular arrows over the OneDrive notification icons signify that sync is in progress. This includes when you are uploading files, or when OneDrive is syncing new files from the cloud to your PC.
Account blocked: Account is blocked.
Warning: Your account needs attention. Select the icon to see the warning message displayed in the activity center.
Two Cloud Icon: Signed into both a work or school account and a personal account. The blue one is for your work or school account, the white one is for your personal account.

BI 348 - Highline Data Analysis Class

Data and table terms:

Variable: A value that can change, like a product name or an amount of a sale. Variables can be quantitative (number), categorical (text), Boolean (T/F), or other.

Data: Values collected for one or more variables and kept together for reference or analysis. Data is stored in its smallest form.
Data: 22 | 3/29/2024 | Product
Not data: 22,3/29/2024, Product
Data is not information. Information is created from data.

Data Type: Declared type of data for a column, such as number, text or logical. Data types safeguard the column of data and help to assure data consistency and accuracy for reports and visuals.

Field: Column that is used to collect data for a variable. The column should have a declared data type and must have a field name at the top of the column. In the world of databases, columns are called fields. In other arenas, such as Microsoft power tools, fields are often called columns.

Field Name: Name at the top of a field that accurately describes the data. Field names are used in reports and visuals to indicate which variables to summarize. Synonyms: column name, header name.

Record: A row in a table that contains related data for each field for a given observation, such as a sales transaction, scientific observation or employee data.

Table: A collection of one or more columns, with field names in the first row, records of data in subsequent rows, and a data type for each column. Data must be contained in a table to be analyzed easily. A great amount of data is not stored in tables; very often, the job of the analyst is to transform the unstructured data into a table structure.

Fact Table: A table that contains the facts to summarize or measure, like sales, units, amount of time, sports statistics, and other facts. The facts are the measurements of activities, like amount of sales, number of points scored in a game or length of time at a web site. Fact tables are sometimes very large and can have thousands, millions or billions of rows of data.

Dimension (Lookup) Table: A table that contains a field with a unique list of entities, called a primary key column, and subsequent columns with attributes for the entity. An entity like Product ID would have attributes like product name, price, cost and product weight. Dimension tables are also called lookup tables.

Primary Key: The entity field in a dimension table that contains a unique list of items and is used to assure that there are no duplicate records in the dimension table.

Foreign Key: When a primary key is used in a fact table it is called a foreign key.

One-To-Many Relationship: When the primary key from a dimension table is connected to a foreign key in a fact table, the primary key is called the one-side and the foreign key is called the many-side. A one-to-many relationship helps to make lookup formulas easy and allows dimension table attributes to filter reports and visuals.

Note: The terms "fact" and "dimension" go back to the 1960s, when the General Mills company and Dartmouth University used them to name their tables of data. Then in the 1970s the data research companies AC Nielsen and IRI used the terms. However, the terms were popularized by Ralph Kimball in the 1980s with his extensive writing about data warehousing and business intelligence.

Grain or Granularity: The level of detail stored in a table, or the size of the number.
A table of invoice total sales amounts has a larger grain and less detail than a table of line items sales amounts. A table of line items sales amounts has a smaller grain and more detail than a table of invoice total sales amounts. The grain of the 2023 total sales amount is bigger and has less detail than the grain of the May, 2024 sales amount. Terms - 01-DAMEMPT-Start.xlsx Page 26 of 28 BI 348: DAMEwMPT Database Relational Database Text Files Source data Data destination On-premises file path Online source data Delimiter Structure Or Schema Data Analysis Business Intelligence Data analysis process Query Data modeling Cleaning data Transforming data M Code (Data Mashup) A data model Star schema data model Terms - 01-DAMEMPT-Start.xlsx The grain of the May, 2024 sales amount is smaller and has more detail than grain of the 2023 total sales amount. This concept is important because if you have two fact tables at different grains, you often have to create a method to convert the two fact tables into one. Location where most data in the world is stored. A database that follows strict rules for storing related tables of data with no redundancy. A common vehicle to transfer tables of data from one system to another system using delimiters such as comma and tab. The original location of the data, like in a text file, an Excel file of a database. The location where the data is loaded, such as in an Excel or a Power BI Desktop file or an online source like a Power BI workspace. A hard coded source data file path in the data destination, such as an Excel or Power BI Desktop file. On-premises file paths can cause errors when the data destination file is moved and the connection to the source data is lost. Online source data can solve the problem of On-premises file and folder paths. Web sites, SQL Server databases and Power BI Online are examples of online sources that stay connected to the source data when the data destination file is moved. 
Is a character that separated bits of data, such as a comma, tab and other characters. The rules or structure for tables, data files and databases. Converts data into useful information to gain insight and make decisions. Information can be in the form of: reports, visuals, dashboards, and other forms. Data analysis allows you to make data-driven decisions, which tend to be more accurate & help to achieve goals more consistently Synonyms: Data Analytics, Analytics, Business Intelligence, Data Science. The definition is the same as data analysis, but the process is performed within the context of business data and business decision making. 1. Determine what questions need answers and what decisions need to be made. Everything else in the process is dictated by these questions and decisions. 2. Where is the data? How much data? What is the structure of the data? 3. Which MS tool to use? (Almost always starts with Power Query). 4. Clean, transform and shape the data into a table or model that is best suited to answer questions and make decisions, 5. Build final model with measures, metrics, relationships, and other features. 6. Create useful information: reports, visuals and dashboards. 7. Refresh when new data arrives. 8. Change and update model as necessary. A question that we ask of the raw data, like import Csv file, append tables, or group line item sales to calculate invoice total sales. Involves all the steps necessary to get and transform the data from the original data sources into a data structure that allows you to create the useful information required. Data modeling can be done with many tools such as: Power Query, worksheet formulas, DAX formulas, Relationships and many other Excel and Power BI features. Involves fixing individual bites of data, such as extracting text from a larger text string, converting text dates to proper dates or rounding numbers. 
Involves structural changes to source data or tables of data, such as converting Json files to proper tables, merging tables to add columns, or converting two fact tables into one table. Case-sensitive function language in Power Query that allows you to work with data and data structures to clean, transform and shaped the data into the required structure. Is the final data structure that allows you to create the useful information required is called a data model and becomes the intermediate step between the source data and the reports and visuals. A model with a fact table surrounded by dimension tables, relationships between the tables (usually one-to-many), pre-made measures, and is constructed to be user friendly. The Data Model in Power Pivot and Power BI are specifically designed to work efficiently with a start schema data model. Page 27 of 28 BI 348: DAMEwMPT Semantic model in Power BI The Data Model Columnar database DAX (Data Analysis eXpressions) Measures One-to-many Relationship Load data Summary Report Visuals Dashboards It is the same as a Data Model in Power Pivot and Power BI Desktop, and is usually a Star schema data model. The difference is that the model is stored online, contains data security features and serves as a single source of online data truth. Is the location in Power Pivot and Power BI Desktop where the data is stored in a columnar database and the DAX formulas, relationships and other model features are added. In the Data Model, behind the scenes, there is a RAM memory Columnar Database that compresses and stores the data, and which allows the DAX formulas to work efficiently with big data. Functional language used in the Data Model to work efficiently with big data and can internally create tables at different grains to reduce the complexity of many formulas. Metrics that help gauge performance or help make some decision. 
Primary key of dimension table is connected to the foreign key in the fact table to simplify lookup formulas and allow dimension table fields in the report area to filter reports, visuals and databases. Involves deciding where to load the data, such as loading to the worksheet, a PivotTable Cache, the Data Model, or a Power BI workspace. Reports often contain detailed numbers with labels, and the numbers are almost always metrics that help gauge performance or help make some decision. Reports almost always contains conditional calculations and therefore it is very helpful if you are fluent with logical tests. Present quantitative values in a visual way to get a quick impression and see patterns and trends more quickly than reports or tables . Examples of visuals: column, bar, line, scatter, map, waterfall charts, Pictures, Conditional Formatting, and more. Contain summary reports, visualizations, and other elements in one location so that we can monitor the activity as new data arrives. Just like a dashboard in a car, a dashboard should present information that is required for making good decisions Contains the worksheet, Power Query, M Code, Power Pivot, DAX . Excel Worksheet formulas instantly update when the source data changes. No refresh needed. Worksheet Formulas You are not confined to structured data such as tables and columns, you easily reference, cells, ranges, columns, or tables. You have freedom to incorporate any part of the worksheet and any of the many features. Must lock references, copy formulas, edit all cells with formulas and many formulas such as filtering a list are Legacy worksheet formulas: difficult. Do NOT need to lock references or copy formulas, edit is done in top cell of array only, and formulas for filtering Dynamic spilled array formulas are simple. Case-sensitive function based language to connect to any data source and make any transformation. 
Power Query Allow you to work with and transform data structures such as tables, files, columns of files, tables, records or lists. Load to worksheet, PivotTable Cache, Data Model, or workspace. User interface can write almost all the code for you. Memorizes all steps and allows you to go back and change or edit any step. PivotTables Power Pivot Power BI Desktop Power BI Online Service Workspace Dataflow Terms - 01-DAMEMPT-Start.xlsx For summing, counting and calculating percentages, there is no other calculation tool that is as fast and easy to use as the standard PivotTable. Can store millions of rows of data, has great formula advantages with DAX and Relationships, and you can have multiple tables in the reporting area. Has all the Data Model advantages, plus it has interactive visualizations. Advantages of sematic models, reports, visuals, dashboards, Dataflow and workspaces where colleagues can collaborate and connect to a single source of data truth. Area in Power BI Online, where you can assign organizational emails access to the workspace so you can share and collaborate with workspace objects such as: Reports, Workbooks, Dashboards, Semantic Models, Dataflow. Power Query Online that simultaneously can connect to and transform data, and serve as a single source of data truth. Page 28 of 28 BI 348: DAMEwMPT
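Example (one-to-many relationship): The primary key, foreign key and one-to-many relationship terms can be illustrated outside of Excel. The sketch below is written in Python only to show the idea; it is not how Power Pivot or Power BI implement relationships. The product IDs, names, prices and the function name revenue_by_product are hypothetical sample values chosen for the illustration.

```python
# Dimension (lookup) table: one row per product.
# The dictionary key plays the role of the primary key (ProductID),
# so duplicate product records cannot exist.
# All values below are hypothetical sample data.
products = {
    "P1": {"Product": "Apple", "Price": 35.0},
    "P2": {"Product": "Orange", "Price": 30.0},
}

# Fact table: many rows can point at the same product.
# ProductID here is the foreign key (the many-side).
sales = [
    {"Date": "2024-01-15", "Units": 10, "ProductID": "P1"},
    {"Date": "2024-02-03", "Units": 4, "ProductID": "P2"},
    {"Date": "2024-03-22", "Units": 6, "ProductID": "P1"},
]


def revenue_by_product(sales, products):
    """Look up each sale's product attributes and total revenue by product name."""
    totals = {}
    for row in sales:
        dim = products[row["ProductID"]]  # the lookup across the relationship
        name = dim["Product"]
        totals[name] = totals.get(name, 0.0) + row["Units"] * dim["Price"]
    return totals


print(revenue_by_product(sales, products))
```

Because each foreign key matches exactly one primary key, the lookup is a single step, and a dimension attribute such as the product name can be used to group or filter the facts.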
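Example (grain): The grain term, and the query example of grouping line item sales to calculate invoice total sales, can be sketched as a small roll-up. This is a minimal Python illustration under assumed data: the invoice numbers, quantities and prices (stored in whole cents to avoid rounding surprises) are hypothetical, and the function name invoice_totals is made up for the example.

```python
from collections import defaultdict

# Line-item fact table (smaller grain, more detail).
# Columns: invoice number, product ID, quantity, unit price in cents.
# All rows are hypothetical sample data.
line_items = [
    (1001, "A", 2, 500),
    (1001, "B", 1, 250),
    (1002, "A", 3, 500),
]


def invoice_totals(line_items):
    """Group line items by invoice number and sum quantity * unit price."""
    totals = defaultdict(int)
    for invoice, _product_id, quantity, unit_price_cents in line_items:
        totals[invoice] += quantity * unit_price_cents
    return dict(totals)


# Invoice-total table (bigger grain, less detail): one row per invoice.
print(invoice_totals(line_items))
```

The roll-up only goes one way: invoice totals can be calculated from line items, but line items cannot be recovered from invoice totals. That is one reason two fact tables at different grains usually need a conversion step before they can be combined.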
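Example (delimiter and cleaning data): The delimiter and cleaning data terms can be shown with the "Not data" string from the Data entry, 22,3/29/2024, Product. The Python sketch below splits the string on a comma delimiter and then converts each bit of data to a proper type. It assumes the fields arrive in the order units, date, product and that dates use month/day/year; the function names split_record and clean_record are hypothetical.

```python
import csv
import io
from datetime import date, datetime


def split_record(line, delimiter=","):
    """Split one delimited line of text into its individual bits of data."""
    return next(csv.reader(io.StringIO(line), delimiter=delimiter))


def clean_record(fields):
    """Convert text fields to proper types: number, date, trimmed text.

    Assumes the order units, date (month/day/year), product.
    """
    units = int(fields[0].strip())
    sale_date = datetime.strptime(fields[1].strip(), "%m/%d/%Y").date()
    product = fields[2].strip()
    return units, sale_date, product


raw = "22,3/29/2024, Product"
print(clean_record(split_record(raw)))
```

Splitting on the delimiter is the structural step; converting the text date to a real date and trimming the extra space are the cleaning steps.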
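Example (columnar database): The columnar database entry says the data is compressed column by column. One common way to do that is to store each distinct value once and replace the column with small integer codes. The Python sketch below shows that general idea only; it is an assumption-level illustration, not a description of the actual engine inside Power Pivot or Power BI. The sample column and the function name dictionary_encode are hypothetical.

```python
def dictionary_encode(column):
    """Store each distinct value once and replace the column with integer codes."""
    dictionary = []   # distinct values, in order of first appearance
    index_of = {}     # value -> position in the dictionary
    codes = []        # one small integer per original row
    for value in column:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index_of[value])
    return dictionary, codes


# Hypothetical SalesRepID column with many repeated values.
sales_rep_ids = [2, 2, 2, 3, 2, 3]
dictionary, codes = dictionary_encode(sales_rep_ids)
print(dictionary, codes)

# The original column can be rebuilt from the two pieces.
rebuilt = [dictionary[code] for code in codes]
```

The fewer distinct values a column has, the smaller its dictionary, which is why columns such as a product ID or sales rep ID tend to compress well.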