Introduction to Career Skills in Data Analytics

Defining data analysis and roles in data analysis
One of the challenges we face as we decide to pursue data as a career choice is the
fact that there are many different paths and specializations. Let's define some of
those roles, and then discuss the common skills that are shared among all. The most
universal role is that of the data worker. This person consumes data regularly, works
with data often, performs some data manipulation, and presents that data as part
of their everyday work. Let's take Sally as an example. She works in a business
unit, not necessarily the IT department, and each week she prepares a report for
her manager. She prepares the data for the reports, and her reports are the
same as last week's; the only difference is new data. You see, most data workers have
limited access to all the different systems through the backend. They likely receive
data from people who have access to the databases. Data workers like Sally may even
export the data out of a system into CSV or Excel files, and the process of their data
work begins. A data analyst goes further. Generally, they have a little more
access to data, model the data, and stay connected to the data, so they can simply
refresh the reports and begin the analysis and presentation of the data. The
data analyst will handle a lot of ad hoc requests, especially if they're
efficient. They will likely have more than just Excel to work with and are likely
considered a guru or a wizard in their department. The data worker and the data
analyst are what I consider the most common roles. Most people are
some form of data worker, and they dive into data analysis more than they even
know. The common skills of all data professionals are gathering data, manipulating
it to meet requirements, and then reporting the outcomes in some way. Data
engineers have the special skill of being able to design and build data sets, whereas
data workers and data analysts work with what is already built and model the
data as needed. You will find a lot of people in crossover roles where sometimes they
play data engineers, and sometimes they act as a data analyst. One could argue on
the top of the hierarchy of data roles are the data architect, and the data scientist. A
data architect is a creator of architecture, no different than an architect that designs
a building. The data architect designs data systems. The importance of the
architecture really can't be understated as all roles including the data scientist needs
this architecture. What most see as the literal top of the hierarchy is the data
scientist. And I believe this is likely due to the fact that most companies have their
data architecture in place, and now it's time to take that data, and put it to use. This
is where the data scientist comes into play. A data scientist will likely have all the
common skills of the data analyst, the data engineer and the data architect. They'll
also have deeper skills in coding, statistics and math. It's okay to not know where
you'll end on your journey, but I think it's important to start. You can begin either as
a data worker or recognizing you already are one. You can increase your skills as a
data analyst, and then as you grow deeper in your experience with data, you'll
discover where you want to be. In all roles, you'll gain a deeper understanding of
data, and it's okay to find a place and stay there.
Developing data fluency
Is your organization data literate, data fluent, or none of the above? Let's break these
terms down. Data literate means that you can read data, converse about it, and
understand it. Let me give you an example, your bank account. It's all about your
finances, right? And if you really look at it, it's just all your data through
transactions. Can you read your balance? Can you tell me when something is there
that shouldn't be? And if it is, can you call the bank and explain it? That means you're
literate about your banking data. Now to the meaning of fluent. Fluent means you
can create something with it that shows skills outside of just being able to read it
and use it. We know people who speak other languages. They are either literate or
fluent. Someone who is literate can, again, pick up on the common things with the
language and speak in simple sentences, but a person who is fluent can carry on
conversations and author stories in that language. Just like these terms apply to
language, they apply to data. Let's go back to our banking example. If you are fluent
with data, you can then turn your banking data from last year into insights that will
allow you to build a budgeting system and a finance tracking system. To really build
your data skills, you must begin to think about how that data skill applies to your
everyday life. If you are data fluent, and at work someone hands you
information, and it's the first time you've ever seen it, you will have an approach that
lets you learn that data, and you will have questions that seem natural for you to
ask. Approach is everything, and building an approach or identifying it can start
today, right now even. Start thinking about every time you have a new data set in
front of you, what do you do? That's your approach. If your approach is to stare at it
and wonder, well, that tells you where to begin. Now there are degrees of data
literacy and data fluency that are appropriate in the workplace. And I would argue
that everyone should be data literate: able to read, speak, listen to, and
understand the data, or at least the data that applies to them. That could be time sheets
or even paychecks. An organization that only has a small percentage of data fluent
people means they do not have enough people to do the exploring and building
that might just be the tool that takes their company to the next level. Becoming data
literate and then transitioning to data fluent can be a game changer in your
career. You can go from reading and basic understanding to producing insight and
data tools for your organization or maybe for yourself.
Understanding how data governance impacts the data analyst
Have you ever asked permission to gain access to data and been denied? Have you
ever asked for permissions and just been given global admin when all you needed
was read permissions? If so, then you've been a part of the data governance of the
organization, or the lack of it. Data governance is a framework that incorporates
strategies to create solid quality data, enable accountability, and provide
transparency to the data in the organization. Data governance has processes,
procedures, and people at various levels of the organization. It's meant to control
every aspect of the data in the organization. Data governance can support quality of
data, accountability, trust, and compliance. There is some form of data governance in
every organization at every size and level. If you work in a regulated industry, then
data governance will likely be more mature than other industries. I have worked in
almost every size industry, regulated or not regulated, and I'm either at the mercy of
the data governance or protecting myself from the lack of it. Here are some common
components of data governance that directly impact the data analyst. Access to
information, how you can access it. There is typically a chain of command and the
data analyst is rarely meant to be at the top of it. If you need access to information,
there is someone, like your manager, from whom you will request permission to gain
access to it. Once they hear the request, they will typically instruct you to
contact the next person responsible, or they will contact them on your behalf. I once
requested access to the back end of a system from my manager. He then sent the
request to the technology department who then between the two parties agreed I
could have it. Little did I know it would go to a third person to implement it and
notify me. The third person was the person in the cube to my left. I ate lunch with
him every day. It was very controlled, and I did not understand it at the time, but now
I have an appreciation of it. As a data analyst, we seek the source of truth, the golden
record, and data governance is a part of providing that. We want to make sure
there's an identifiable truth and that we can trust what we're working with. When we
do not have at least two or three of these components to work with, we'll deal with
challenges. For example, you may have been given more access than you need, and
it might leave you wondering which data set you could really trust. Master data
management is also a key component of the data governance framework. Making
sure that the data we all need is complete, accurate, and meets the business
rules. This is one area where organizations that do not have a strong data
governance plan or strategy will have suffering data analysts. You may find yourself
always correcting something as simple as product names that have been entered
incorrectly but are literally the same product. You might be constantly correcting
customer address information. I'm always telling organizations that regardless of
regulations, they have a data governance plan in place, whether they documented it
or not. As a data analyst, determining the data governance plan at your organization
will help you to know who to talk to, when to talk to them, and how to adequately
follow the process of all things that relate to the life cycle of data at the organization.
Understanding the importance of data quality
As a little girl, I got sick. I mean really sick. My mother immediately took me to the
doctor, and they did an x-ray because I had a headache so bad, I'd been sick for two
days. The x-ray showed nothing, but the physical signs of the illness, and the blood
work were enough for the doctor to send me to the ER. A day or two later, they did
another x-ray, and when they did, they discovered why I was so sick. You see, I had
a bacterial infection that was unfortunately on its way to my brain. I was hospitalized
for 11 days, and then given the right types of treatments to prevent it getting
worse, and treatments to help get it better. What does this have to do with data
quality? Well, the first x-ray showed nothing, and what they actually discovered is
that their machine was broken. Would I have gotten better faster if that first x-ray
showed them what the second x-ray did? We'll never know. We can't go back in
time. Quality data is data that can be trusted to produce accurate insights so
decisions can be made. In my situation, had they waited even longer to do the
second x-ray or even sent me home, I would not be here today. Not all data decisions
are life or death, but they can have terrible consequences for businesses if data
quality is not an everyday part of the culture. It is important for us to all remember as
data professionals that people are using data to make decisions, and bad data can
mean bad decisions with profound consequences. There are data quality
dimensions that you can be aware of as a data analyst. This isn't a complete list of
everything you will find for data quality, but here are the four major hallmarks of
quality data, complete, consistent, valid and accurate. Completeness of data. Do we
have all the data that's needed? Is any of it missing? Is it all usable? Consistency. Is
this data in other systems, and is the information consistent across all of them? In
other words, does the same record in the production system match what we sent to the
invoicing system? Validity. Does the data meet the requirements of what we are
attempting to do with it? And is it in the right format in which we need to do
it? Accuracy. Is it accurate? This is a big one. Is this information accurate? And in my
case, it was not. I think it's important that we know quality can be measured, and we
can determine if it's complete, consistent, valid and accurate. And if it's not
100%, well, we need to know that. Again, some data means life or death. So data
quality at the highest rate is important.
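The four dimensions above can be turned into simple, measurable checks. Here is a minimal sketch; the record, field names, and rules are hypothetical, and a real accuracy check would also need a trusted source to compare against:

```python
from datetime import date

# Hypothetical record from an invoicing export; all field names and
# values here are made up purely to illustrate the quality dimensions.
record = {"customer": "Acme Co", "amount": 250.0, "invoice_date": "2024-03-15"}
production_copy = {"customer": "Acme Co", "amount": 250.0, "invoice_date": "2024-03-15"}

# Completeness: do we have all the data that's needed, or is any missing?
required_fields = ("customer", "amount", "invoice_date")
complete = all(record.get(f) not in (None, "") for f in required_fields)

# Consistency: does the same record match across systems?
consistent = record == production_copy

# Validity: is the data in the format the task requires?
try:
    date.fromisoformat(record["invoice_date"])  # must parse as a date
    valid = record["amount"] >= 0               # amounts can't be negative
except ValueError:
    valid = False

print(complete, consistent, valid)  # True True True
```

Accuracy, the fourth dimension, is the one a script alone can't confirm; it requires validating against a source of truth.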
What is BI and the value to business?
Have you ever heard the phrase, "Our company makes data-driven decisions"? Well, of
course they do. And they are making data-driven decisions all the time. It could just
be bad decisions because the data is bad. Data-driven decisions happen all the
time. "We need more money in the account, so someone get those salespeople
motivated." That is a data-driven decision and action. The problem is this is a single
data point at a crisis moment. I tend to ask people if they want to be data
informed instead. Data and business intelligence let you have both information and the
ability to make intelligent business decisions. For example, with the correct data and
an understanding of the process and the business goals defined with a solid set of
KPIs, or key performance indicators, a business can see a downward trend in sales before
it becomes a problem. This allows the business an opportunity to course correct to
attempt to prevent a crisis moment. For business intelligence to be practical, it
requires you to store the data that's important to the business and all its
processes. You can't just focus on one number like our earlier example. Just knowing
one number and that you have to hit it, means that you understand the
goal. However, it's not all the information you need. All the other data that impacts
that goal needs to be analyzed against the business rules. Fortunately, we have business
intelligence tools, but they are only tools to build business intelligence with. The tools do
not provide it by themselves, just as a hammer requires nails and someone to use
it to build something. Businesses need to define the metrics that help them track the
overall health of the organization. Again, these metrics are KPIs. Let me make it
practical. And I'll use health as an example. If you know your heart rate as an adult is
supposed to be anywhere between 60 to 100 beats per minute and you watch it
every day and suddenly you see it spike and stay elevated, it would indicate that
something is happening to make your heart beat faster. With a bit more
information like tracking what you eat or drink, you can analyze this data and you
notice that it's elevated when you drink a certain type of drink and it stays elevated
for a couple of hours and then goes back down. To make an adjustment, you can
stop drinking that drink or reduce the amount of it. Whatever the adjustment is, you
make it, you then analyze your heart rate to determine the adjustment to see if it
made a difference. When you apply this concept to the overall health of
business then you can easily determine what the heart rate of the business is. And
what are the items that impact it. This allows you to start to define the metrics that
help you monitor the health of the business and provide business intelligence.
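The heart-rate KPI idea above can be sketched in a few lines. The 60-100 bpm range comes from the example; the daily readings and the simple threshold alert are illustrative assumptions:

```python
# KPI monitoring sketch using the resting-heart-rate example:
# flag any daily reading outside the normal 60-100 bpm range.
NORMAL_RANGE = (60, 100)
daily_bpm = [72, 70, 74, 71, 118, 121, 73]  # sample readings (made up)

# Collect (day, reading) pairs that fall outside the healthy range.
alerts = [(day, bpm) for day, bpm in enumerate(daily_bpm, start=1)
          if not (NORMAL_RANGE[0] <= bpm <= NORMAL_RANGE[1])]

print(alerts)  # [(5, 118), (6, 121)] -- two days worth investigating
```

A business KPI like daily sales works the same way: define the healthy range, watch the trend, and investigate when readings stay outside it.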
How are business analytics and BI different?
I started running about four years ago and it made me realize that business
intelligence, business analytics, and even data analytics are really three individual
things with lots of overlap. Let me break it down. I was preparing for my first half
marathon. I needed the data to tell me how fast I was running so I could improve my
speed. That speed for my mile was my business intelligence. For this example, it
represents a single number. Business analysis focuses on all the numbers that would
allow me to get faster over time by again, analyzing the data and creating more of
it. For me, every time I made a run, I was tracking that information. Data analysis is
where we capture and analyze the actual data. We can analyze the historical
data while we keep capturing new data as it grows every day. To use all these concepts
together, I was comparing every run to the last run. However, initially I just started
capturing a run to establish a baseline and I just added more runs and more
miles. The business intelligence was telling me how fast on average I was running a
mile or the timing of certain miles, like, my 5K speed versus just my single mile. I had
a goal that I wanted to attain. So, using business analysis skills, I applied this to my
running. I would also use other values, like where I ran, what time I was running, to
determine my future outcomes. For example, I discovered I needed to change my
shoes. I changed my shoes, I ran a little bit faster and I hurt a lot less. I also
discovered that if I picked more familiar routes, that I might run a little bit faster than
on a route that was new. I was using these pieces of information to adjust my
routine so I could see a faster speed over time. So, business intelligence tells
us where we are on any given day for any process that we use data to study. My
example is running. But it could easily be applied to business metrics, like, sales or
production. And business analytics helps us to see the trends and predict future
outcomes which are critical to businesses. We need both business intelligence and
business analytics and we use data analysis to determine where we are and how to
reach our end goals or the desired outcomes. Think of it this way. Business
intelligence can tell you how you're performing today and business and data
analysis can tell you how you can potentially perform in the future.
How data can provide intelligence to the organization
Call me strange, but I have a relationship with data. To me, data is living. I do realize
it's an inanimate object, but I do think it's an intelligent object. Data cannot only
provide information like this month's sales, but it can also communicate with
software to provide automation. My first experience with data at this level was with
a regulated industry. They were required to provide information at different time
points, and then they were required to report on all the things they did to meet those
times. Mail came into the office, people went to retrieve their mail for the contracts
they supported three times a day. They also printed information and scanned it as
needed by walking to and from the printer. Let's just look at the data around the
printer, and the scanner process alone. Look at the data as a person walks, there is a
distance and that distance takes time. And what if they stop and talk? That's more
time. Think about the amount of time at the printer, and what if two people walk up
at the same time? Now there is a time the person is there, and the time the other
person is waiting. When they walk back to the cube, they begin the real work. When
a person has everything ready to go, they walk back to get the mail and deliver the
mail that's ready to go out. This occurred multiple times a day, by different
people, all the time. Okay, now let us multiply that by 10 people doing the same job
all day long and then let's multiply that by 260 business days. Now business
intelligence says that today that's X number of hours spent in transit, business
analysis says if we put a printer at their desk, we save X. This is just one way data can
help support improvements to the process. We often think about data in the form of
a field, or columns on a spreadsheet. Just with one key date, we can create other
dates to trigger other events. Technology will allow us to create information
automatically, the very same information that a human would have to figure out, and
then we can have the human verify the information, saving time and being more
accurate. The most effective data analysts develop skills and a relationship with
data. It's important to start learning to see it as a living thing that can help us refine
current processes like walking to a printer and automate processes like creating
more data.
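The printer-transit arithmetic above can be sketched as a quick back-of-the-envelope calculation. The 10 people, three daily trips, and 260 business days come from the narration; the minutes per trip is a hypothetical assumption:

```python
# Back-of-the-envelope cost of walking to the printer/mail room.
minutes_per_trip = 4    # walk there, wait, walk back (assumed)
trips_per_day = 3       # mail runs per person, per the narration
people = 10
business_days = 260

hours_per_year = minutes_per_trip * trips_per_day * people * business_days / 60
print(f"{hours_per_year:,.0f} hours per year spent in transit")  # 520 hours
```

That 520-hour figure is the "business intelligence says today that's X hours" number; business analysis then asks what a printer at each desk would save.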
Understanding the value of data-driven decision-making
If I told you that we have an opportunity to purchase a product, and that it was going
to make a million dollars in the first year, you would get excited, right? That one
number sounds amazing to people. Some people would immediately look at that
number and begin to act; that is a data-driven decision. In our scenario, our company
thought the product at cost was a steal of a deal, and they bought it all with the
hopes that a million in revenue would produce a large profit. The only problem is
that million dollar number is only the top line and in no way reflects the impact to
the bottom line. When people get a single number in their mind, they can miss the
other more important numbers, and it can have damaging consequences. Let's take
our million-dollar project and break it down. The company bought the product at
cost, and the company will sell the product at a list price, and the difference between
that cost and the list price is the margin. All the numbers here, cost, list and margin,
matter but of the three, the margin matters the most. It's important to remember
that a million in revenue does not equal a million in profit. Our profit is made by the
margin. Some people see that margin and get excited, but if you stop there, you are
in trouble. You must account for the items that eat the margin because that eats the
profit. When you use data to inform your decision making, you must use the top-down and bottom-up approaches together. If you're an experienced person with these
types of scenarios, you may have already figured out where this is headed. What do
we need to do to produce the distribution of this product? We'll keep it
simple. When companies sell products, someone has to sell them and there's a cost
to that. And even if it's an online sales model, there are people involved in
maintaining the information to make that happen. Let's just say for every $1 of the
product, it costs 10 cents of that dollar to pay for the sales process. Then there are
other costs: cost to store the product, cost to package the materials, cost to deliver
the product. There's cost in infrastructure. Cost to automate sales processes. Payroll
costs for people to maintain systems, answer phones, and ensure that delivery is
met. If you really dig in, you realize quickly that if everything that is required to make
the million in revenue eats up the entire margin, or you sold it at the wrong price, or
you get hit with unexpected costs like increase in delivery, increase in storage or
changes to tax, you are sunk. And if you can't sell it and you can't hold onto it, that
million dollars no longer looks like a gold mine. So being data
informed can ensure you're profitable by revealing that the million dollars is really not a million
dollars after all, but potentially a total loss, which is maybe why it was a steal of a deal in the first
place.
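The cost, list, and margin arithmetic from this scenario can be sketched as a quick calculation. The 10-cents-per-dollar sales cost comes from the example; the purchase cost and the storage and delivery figures are hypothetical placeholders:

```python
# Top line vs. bottom line for the million-dollar example.
revenue = 1_000_000          # the exciting "top line" number
cost_of_goods = 700_000      # what the company paid (hypothetical)
margin = revenue - cost_of_goods  # profit before the costs that eat it

# Costs that "eat the margin"
sales_cost = revenue * 0.10  # 10 cents of every dollar pays for sales
storage_cost = 120_000       # hypothetical warehousing cost
delivery_cost = 90_000       # hypothetical delivery cost

profit = margin - sales_cost - storage_cost - delivery_cost
print(f"Margin: ${margin:,.0f}")  # Margin: $300,000
print(f"Profit: ${profit:,.0f}")  # Profit: $-10,000 -- a loss
```

With these illustrative numbers, a million in revenue leaves a $300,000 margin that the operating costs fully consume, which is exactly the trap the narration describes.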
Questioning techniques to collect the right data
Have you ever heard of analysis paralysis? It's where overthinking a problem
stops you from moving forward. It's a real thing for some people. It's likely due to
stress and anxiety related to making the wrong decision or not knowing exactly what
to do next. Building an approach and thinking through standard questions and
critical thinking with active listening should help you. Technical skills or hard skills are
one thing for the analyst, but the soft skills matter just as much. And if you're stuck,
no hard skill matters. To be fair, more exposure to real problems, and to solving
them with data solutions, will help you build your approach, but you can slowly start
building your questioning now. There are some common questions you might
ask for every data-related project and the questions might be more specific based
on actual problems and the data that you have at hand. Our scenario is that we have
five of our top products. They're being purchased all the time, but the company is
losing money. First, you need to understand that there is data in everything, in it and
around it. This will help you start to consider the questions. Our task as the analyst
is to try to determine why, if the sales are moving, we are losing money. There are some
basic questions that you should ask about each of the five products. Have these
products ever been profitable? If they were profitable in the past, at what point in
time? What is different about this point in time versus that point in time? Did the
wholesale cost change? Did the list price change? Did the cost of storing or
delivering the product change? Any of these answers will lead you further into data
analysis. When we start with these basic questions and begin to answer them, then
it will lead to more questions. As an example, let's say that in our initial
questioning, we determine that neither the wholesale cost nor the list price has changed
in the last three years. The cost to deliver has not changed enough to drive an
impact. The cost of storing the products has been steadily increasing. The next round
of questions begins. Is it only these five products that are impacted by the steady
increase in storage cost? And what we discover is that it's not just impacting these five
products but all the products. The company just started to realize it in these five
products. What can we do to reduce the storage costs? What type of increase can
we justify on the products without overpricing the product? Both these questions
lead to very different datasets within the organization and then each round of
questions and answers leads to more questions. The goal here is to remember you
must start asking questions and then remember they rarely stop. They just drive
further investigation. The greatest part of the question process is that the end result
is discovery and recommendations that are made to improve outcomes.
Discovering and interpreting existing data
Have you really thought about how much data is around a person? There's more
than you may think. There's data like date of birth, names, race, and ethnicity. There's
work data like employee ID, job title, hire date, or department. These data points are
the items we think about when we work with data related to people, right? Some of
this data is a single fixed value, like birthday; it doesn't change. Then
there are other items like job title, which might change when you get a new
promotion at work. There's also real-time data always occurring, like heart rate, blood
sugar, blood pressure, and even temperature. There's also geographical data, like
location. Imagine social data as well: what brands we follow, what brands we
purchase, how often we have food delivered versus going out to eat. Data is always
happening. The challenge we face as data analysts is there's a lot of potential data
and not all of it is actually available to us. We also find a lot of the same data is
redundant and in some cases can even be incomplete or inaccurate. All of us are
seeking the single source of truth from the data that we work with. We actually want
it to be accurate when we report on the data. Let me give you some
examples. Companies have several different software packages that are used to
handle different types of information. And they're often disconnected. There's
people management software for HR type information, which is employee data. We
have our marketing and sales management data. That's maybe in a couple of
different systems and it handles not only staff information in regards to sales, but
also customer information. There is also software that kicks in when a customer
goes from being in conversations with our sales team to purchasing from the
company. That data flows from purchasing to the warehouse. There's also data that
flows to the accounting team to handle transactions that support reporting like
profit and loss. What this means is that data flows through the organization at
different times. Systems are often disconnected so finding which systems have the
most accurate information is one of the first challenges. The only way to really know
is to begin the investigation and question along the way. We sometimes hit
roadblocks due to permissions and the sensitivity of data. For example, the data you
might need to confirm your values is stored in the accounting software and only the
accounting team has access to that data. Just because you can't directly access
it doesn't mean you're done. You can provide them the values and those teams will
work to help you validate. In reality, whether systems are connected or not, they
should hold the same record of information. If your sales team reports that there's a
hundred thousand dollars set to invoice this month, then the accounting software
should reflect a hundred thousand dollars worth of invoices. When they don't
balance out, you have to figure out where the breakdown has occurred. As a data
analyst, you need to be thoughtful of the type of data you might find. And then you
have to find the data you do have access to and develop strategies to validate your
reports. Just remember data shows up in everything but it's our job to bring it
together accurately.
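The balance check described above, where the sales system and the accounting system should hold the same record of information, can be sketched as a simple reconciliation. The $100,000 figure comes from the example; the individual invoice amounts are illustrative:

```python
# Reconciliation sketch: does the sales pipeline total match the
# invoices recorded in the accounting system?
sales_pipeline_total = 100_000                   # reported by the sales team
accounting_invoices = [42_500, 31_000, 26_500]   # illustrative amounts

accounting_total = sum(accounting_invoices)
difference = sales_pipeline_total - accounting_total

if difference == 0:
    print("Systems balance.")
else:
    print(f"Breakdown of ${difference:,} to investigate.")
```

When the difference is not zero, that gap is exactly the "where has the breakdown occurred" question the analyst must chase down.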
Data sources and structures
We hear about data all the time, right? But what does that really mean? Let's start
with the basics. Data has a value: your birthday, for example, is a value like November
20th of some year, so it would be recorded as 11/20 of that year. Data has a type, like
birthday. It's a date data type. And data has a field name, like DOB, for Date of
Birth. When we put these fields together, like First Name, Last Name and Date of
Birth, we're creating a record. People use records and spreadsheets all the time, but
they don't really think of the sheet as a table, but it actually is. It's just a table called
Sheet One. And when fields are combined in a database, they're stored in
tables. They still have names, values, and data types. And when we fill in this
information for a person, we're creating a record. Tables are a great way to capture
multiple types of data in a structured way. This way of storing data is way more
flexible than the spreadsheet environment. There are also other types of
systems that collect and store data for the analysts to use for their reporting
requirements. This varies of course by company, but you can expect to find
spreadsheets, databases or even data warehouses. Data warehouses really are data
systems that have the refined tables from our production systems, like the
purchasing system, for example. A customer-dedicated software system might have
a database with hundreds of tables and details, but only certain tables and fields are
needed for reporting. These fields get cleaned up by data warehousing
professionals and brought into the warehouse for storage and safekeeping. It is a
valuable source of nicely structured data that has been vetted for the analysts to
begin their reporting projects. Structured data that fits neatly into tables and feeds
a beautifully designed warehouse is amazing, but not all data is structured. This is
where systems like data lakes help organizations capture and store data
before it's actually refined for reporting needs. Data warehouses, data lakes,
and even data lake houses are very interesting. And if you're into designing
databases or designing data solutions, you may find you want to explore these skills
further. Data analysts will tap into these systems for the data. They don't necessarily
create them. As a data analyst, you will find yourself working with various systems
and file types. At the start of your career, you can expect a lot of spreadsheets and
CSV files as you work your way up to working with data stored in larger data
systems. And don't worry, no matter the level, most data professionals love a good
spreadsheet when it's used for analysis and not for storing data.
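The fields-make-a-record idea from the start of this section can be sketched in code. This is a minimal illustration; the names and birth date are invented, mirroring the narration's First Name, Last Name, and DOB fields:

```python
from dataclasses import dataclass
from datetime import date

# A record is just a set of named, typed fields filled in for one entity,
# like one row in a table. Field names mirror the narration.
@dataclass
class PersonRecord:
    first_name: str   # text field
    last_name: str    # text field
    dob: date         # the DOB field, a date data type

# Filling in the fields creates a record.
row = PersonRecord(first_name="Sally", last_name="Smith", dob=date(1990, 11, 20))
print(row.dob.strftime("%m/%d/%Y"))  # 11/20/1990
```

A database table is simply many such records stored together, each column carrying the same field name and data type for every row.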
Describing data best practices
Do you have an approach to data? Have you ever really thought about
it? I know after years of working with data for projects or ad hoc reporting, that I've
built a pretty defined approach to every data set that I work with. There are just some
things that I do with every data set. The process may be a little bit different based on
the software that I'm working with, but in this example, I'm using Microsoft
Excel. This transactions file has actually been exported from a software that we use
to analyze our transactions. Normally, when we're working on an ad hoc report or a
project, we have an expectation of what we're going to deliver. But to show you that
this approach will work with any data set, I don't have an end goal in mind. I just
want to learn about this data set. If I take a little time upfront to learn more about
this data set, I'll be better off when I start trying to meet the end goal of the
project. Excel will sort, filter and perform data commands on what it sees as the data
set. And that's a key point, what Excel sees as the data set. So the very first thing I
want to do is confirm that the data that I'm working with in the transactions list, is
entirely recognized by Excel as a data set, meaning that there are no breaks in the
data. I do this by using one of my most favorite shortcuts. It will select all the data
that Excel sees in the range. To do this shortcut, I just simply do Ctrl+A. That's not
enough though, because this is a lot of data. It looks like it picked it all up. But if I
zoom out, I notice pretty quickly that I have a broken data set. You see all of column
Z is empty. So that means Excel will only sort and filter everything to the left. In order
to fix this data set, I can right-click column Z and delete it. Okay, let's do that shortcut
again. I'll do Ctrl+A, and now I have a fully intact data set that Excel will
recognize. This makes it easier for me to sort, filter, and do all sorts of data
commands. Okay, let me do Ctrl+Home to go up to A1. Before I go any further, one
of the very first things I'll do in working with the data set, is I'll make a copy of it. So
I'm going to take my mouse and put it on the bottom of the transactions list here on
the transaction sheet tab. I'll hold my Ctrl key, and then I'll drag and drop it one step
to the right. Now, it's important, I'm going to let go of my mouse first, and then let
go of Ctrl. That makes a copy. Okay, I'll rename it to working. Copy. That way, if I
mess up, I can always go back to the original transactions list. Okay. Let's take a
deeper look at this data. When I see fields named ID, like transaction ID, this is
database language for key fields. Okay, let's see how many of those we have. So I'm
going to hit Select All, which selects the entire sheet, and double-click in between
the A and the B column headers, and this auto-fits all of the columns. So I'm looking at
transaction ID. I have product IDs. I have reference order ID. So these are key
fields and it automatically makes me wonder, are there duplicates in this data set? So
let me highlight the transaction ID because that's what I really need to be unique. So
I highlight transaction ID, and I want to spot the duplicates before I deal with them if
they exist or not. I'll go to Conditional Formatting. I'll choose Highlight Cells Rules, and
I'll choose Duplicate Values. I'll go ahead and make them light red fill, and click OK. As I
look at the data, I immediately see some duplicated data. That means
that I have duplicates in this data set. So if I were to total it up or count the records, I
would get an inflated amount of information. Okay, so I need to address these
duplicates. Let me do Ctrl+Home to go back up to A1. It's easy to deal with
duplicates when you know what fields to choose. What makes this a duplicate
transaction, is the fact that the transaction ID is duplicated. I see them all highlighted
in red. It's a little bit more obvious now that we know that duplicates exist, but in a
sea of data, it can be hard to find them. Okay, let's go remove the duplicates. Now
this command will actually remove them, but that's okay, I have my copy here. I'll go
to data. I'll choose Remove Duplicates. I'll choose Unselect All for this example. And
I'll choose transaction ID. I'll go ahead and click okay. It tells me that it found a ton
of duplicates, and that it's only going to leave me 1,228 records that are
unique. Perfect. I'll go ahead and click okay. Now I have a data set with integrity, no
blank rows, no blank columns. I know that I don't have duplicates because I've
removed them, and I have a working copy so that I can continue to explore this
data. This is in no way, a comprehensive list of approaches. These are just techniques
that when you start working with Excel data, you might want to do them on every
data set.
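The same flag-then-remove routine translates directly to code. Here's a minimal pandas sketch of the approach, using made-up transaction IDs and amounts (the column names and values are assumptions for illustration, not the course file):

```python
import pandas as pd

# Hypothetical transactions data; column names and values are illustrative.
df = pd.DataFrame({
    "TransactionID": [101, 102, 102, 103, 104, 104],
    "Amount": [250.0, 99.5, 99.5, 410.0, 75.0, 75.0],
})

# Flag every row whose key field repeats (like conditional formatting
# with Highlight Cells Rules > Duplicate Values).
dupes = df[df.duplicated(subset="TransactionID", keep=False)]
print(f"{len(dupes)} rows share a TransactionID")

# Remove duplicates on the key field, keeping the first occurrence
# (like Data > Remove Duplicates with only Transaction ID checked).
clean = df.drop_duplicates(subset="TransactionID", keep="first")
print(f"{len(clean)} unique transactions remain")
```

Just like in Excel, flagging first lets you inspect the duplicates before the removal step actually deletes rows, and working on a copy of the DataFrame preserves the original.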
Assessing and adapting the data for transformation
[Instructor] Have you ever heard of data profiling? It's where we create a high-level
profile of the characteristics of the data that we're working with. We should apply
this approach to every data set. The greatest thing about profiling data is that when
we use this approach, we get to learn about the data we're working with at a high
level. Profiling helps to inform us on some pretty valuable items. It tells us how much
data we have in this set. It can also tell us what the totals, counts, or averages of any
number may be. This helps us validate our numbers later. It can also inform us about
the data cleaning we will need to complete when we get ready to transform our
data. I have some sales order data here, and I want to profile this data to help me
get started working towards a report on sales orders. I'll first start by profiling the
amount of data. I want to take a look at the record counts. How many records do I
have in this data set? To do this, I can click on column A, and use the auto calculate
feature on the bottom right-hand side of my screen. Now I have all of the auto
calculate functions turned on. To do that, I just right-click the auto calculate area and
then I can select each one of the options that I need. Okay, great. So when I look at
this, I can see there's a count and a numerical count. So count will count everything I
have highlighted and the numerical count will only count the numbers. So if I look
at this record set, I actually have 3,500 records that represent the sales orders. We
can also use sum and average. Let's take a look at how much money is actually
represented in this record set based on total due. I'll highlight column L. And this
tells me I have approximately $33,700,000 worth of money represented in the total
due column. It also tells me that my average is $9,633. Let's look at the average of
the subtotal. This is the money before tax and freight. So the average subtotal in
this data set is $8,581. And the total is around 30 million. This tells me that if I see
numbers like 60 million or 66 million, I have a problem in my data. So knowing how
much it would total is important for validation later. Data profiling is so easy to
do, but this is just the starting point of what you'll learn to profile your
data. Remember, it will also help us inform our data cleaning. Take a look at columns,
B, C, and D with me. These are order dates, but they look like zeros. If I click on B2, I
can see that there is actually a date included. I just can't see it based on the
formatting. Also for the purposes of my reporting, I don't need those
timestamps. They're all set to midnight anyway. So this informs me that on my data
cleaning process, I'll need to address the dates. There are additional profiling
options that we will uncover as we explore deeper into our data and with other
tools, but anyone with a data set and Excel can use these options to profile their
data.
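Outside of Excel's auto calculate bar, the same profile comes from a few aggregate calls. A small pandas sketch, using a stand-in for the sales order data (the figures here are invented, not the $33.7 million set in the lesson):

```python
import pandas as pd

# Small stand-in for the sales order data; values are illustrative only.
orders = pd.DataFrame({
    "SalesOrderID": [1, 2, 3, 4],
    "SubTotal":  [8000.0, 9000.0, 8500.0, 8800.0],   # money before tax/freight
    "TotalDue":  [9600.0, 10700.0, 10100.0, 10400.0],
})

# How much data do we have? (Excel: COUNT on a highlighted column)
print("records:", len(orders))

# Totals and averages to validate our numbers later
# (Excel: Sum and Average in the auto calculate area).
print("total due sum:", orders["TotalDue"].sum())
print("total due avg:", orders["TotalDue"].mean())
print("subtotal avg:", orders["SubTotal"].mean())
```

Capturing these baseline totals up front gives you the same sanity check described above: if a later report shows roughly double the expected total, you know something duplicated along the way.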
Understanding the rules of the data
We hear about business requirements in the world of business all the time. They
control what we are doing on any given project. Part of meeting the business
requirements is the business rules. It is important when working with any data that
you understand the rules around the data that you're working with. These rules can
inform you when to expect data, what you can do with data that meets certain
criteria, and also explain what needs to happen in the transformation of data. Let's
work through some examples of business rules and how they can impact our
data. Let's get started with just understanding what we mean by rules. Business rules
can be as simple as a definition: is a contact a salesperson, a customer, or a
prospect? It could be as simple as a business rule that defines that a prospect becomes a
customer once they actually place an order. These rules also control the flow of
data. So if in our system, we have a sales order record, that means that the order has
occurred. It means that that prospect and the potential sale made it to a certain stage
of the process. Then the business can use this to easily distinguish a potential sale
from an actual sale. This is an example of a simple business rule, and this rule can
also be used to then convert a prospect to a customer using data. Some rules can be
a bit more specific and have a technical requirement. We have some sales order
data. This sales order data is going to be prepared to go into a new system that
provides additional reporting about our sales orders. This information will go to our
production team. So the business requirement is that we need to prepare the data
to go into the new system. Now we have the data that we want to transfer to another
system for reporting purposes. It has a specific template, and we must use this
data from our system to match that data specification of where it's going. We've
been provided this technical requirements document for our data. Let's take a quick
read through that. First of all, it tells us that the sales order ID must be converted to
a text data type, but it must not contain any letters. All of the date fields should not
include time stamps. We also have to have a main account GL number. And that main
account GL number holds a four-digit code for accounting and the last two digits to
specify the category. Also, we see that territory ID and comment fields need to be
removed. And the final step is to save our data in a CSV or comma-separated value
file so that we can import it into the new reporting system. So now that we have our
technical requirements, let's take a look at the data. Okay, so the business role in our
technical spec said that sales order ID and sales order number need to be text data
types. So I can look at sales order number and see pretty quickly it's a number data
type. I know that because it's right-aligned in the field. I can see the sales order
number is already a text data type. It's aligned left, but it doesn't meet the
requirements because it contains two letters, S and O for sales order. I'll take a look
at my dates. I can clearly see they include time stamps, so part of my technical
requirement will be to clean this data to meet the rules, which would be only dates
and no timestamps. Our specification also said we had to have a main account GL
number, and this is a four-digit code for accounting and the last two digits specify
the category. But when I look at the data, I don't see a main account GL
number. However, because I know the business rules of the account number for
these records, I know that that main account GL number could actually be created
from the account number. I also see we do have columns that they said to not
include, which would be the territory ID and the comments. When working with any
new data project, you want to make sure you consider the rules of the organization
in regard to their definitions for data. You also need to account for the flow of
data and any specific technical requirements.
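The rule checks described in this spec can be expressed as a few lines of validation code. This is a hedged sketch: the field names and sample values below are assumptions for illustration, not the actual course files.

```python
import re

# Illustrative sample values; names and formats are assumed, not from the
# real data set.
sales_order_number = "SO43659"
order_date = "2024-05-01 00:00:00"
account_number = "4010-15-03"   # assumed: GL code - account - category

# Rule: sales order number must be text containing no letters,
# so strip the "SO" prefix (and any other letters).
cleaned_number = re.sub(r"[A-Za-z]", "", sales_order_number)
assert cleaned_number.isdigit()

# Rule: date fields must not include timestamps.
date_only = order_date.split(" ")[0]

# Rule: the main account GL number can be derived from the account
# number; the last segment specifies the category.
gl_number, acct, category = account_number.split("-")
print(cleaned_number, date_only, gl_number, category)
```

Encoding each business rule as an explicit check like this makes it obvious which records fail the spec before you try to import them into the new system.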
Tips on preparing the data in Excel
[Instructor] In the life of every data analyst, you reach a point where it's time to
prepare the data. This is the part where we clean and transform our data to meet the
requirements, and if you haven't heard, we do this a lot. You've profiled your
data, you've reviewed all the business rules, and now it's time to dig in and actually
get started. I want to work with my sales order data to prepare it for a template to
import it into a new system. I typically start with a new blank workbook and I'll use
Power Query to connect to my data, and then I'll do my data transformations
there. I'll go to my data tab, I'll choose get data, and I'll choose from file. There are
several connections here, but because my data is an export that's stored in an Excel
workbook, I can choose from file and from workbook. Okay, I'll navigate to my data
for template, I'll double click it, and this is what is establishing the
connection between my Excel file and Power Query. I'll choose sales orders, and then
I have two options. I can go ahead and load the data to this spreadsheet, or I can
choose transform. Because I know I have transformations to make, I'll go ahead and
choose transform data. Okay, so I'm connected to my data, and I can see my sales
order query. I see my query settings and my applied steps. First off, I want to show
you that it promoted my headers. Now, what that actually means is it took the first
row of information that it saw from my spreadsheet and made that my column
headers. And then it changed the type. What that means is that it looked at the
second row of information, which was actually the first row of values, and tried to
determine what the data types would be based on the values that it sees. Okay, for
example, sales order ID. It has numbers. So it automatically translated that as a
number. Order date has a date and a time, so it automatically made that date and
time. Okay, as part of my requirements, I know that I have to change sales order ID to
be text. So I'll hit the one, two, three, and change it to a text data type. It's asking me
do I want to replace the current step or add a new step? I don't want to change how
it read every single data type, so I'll go ahead and add a new step, and then on the
right hand side, you see my applied steps has a new step where I changed the sales
order ID to text. We also know from our technical requirements that we have a
sales order number. It is also supposed to be text, but it cannot contain any
letters. So I need to remove the S and the O from the front of the data. So what I'll
do is I'll highlight that whole column and I can right click and choose replace values. I
can also just select a single field and choose replace values. I can highlight the whole
column and choose replace values up top. Okay, so I'll choose replace values. No
matter what step I choose, the outcome will be the same. So I want to find all of the
SOs in this column, and I want to replace them with nothing because I want just the
number. I'll take a look at the advanced options. It's asking me do I want to match
the entire cell contents or replace using special characters? Neither of these
apply. Okay. I'll choose okay, and then immediately, I see sales order number. It's still
text, which is appropriate for my requirements, but it no longer contains the S and
the O. On the right hand side in my applied steps, I see replaced value. And if I
needed to change anything, I could hit the little gear shape, and that takes me right
back into my steps. Okay, I'll choose cancel there. Because Power Query keeps all of
our steps, it's similar to what people do with recording macros or coding VBA for
data cleaning. Except we're not having to code or record. We're just actually
performing the actions and it's keeping up with it. Let me show you what I mean. So,
let me click on navigation. Notice that the first row contains my column headers. So
now when I choose the next step, it shows me that it promoted those headers. Then
it changed all the data types based on the data that it sees. And then I started my
first step which was changing the sales order ID, and notice I still see the SO until I
choose replaced value. That means if my data changes, I can update my data
source, and it will reapply all the same steps. Okay. Let's go ahead and change the
data types for dates. I don't need the timestamp, and also notice they're all set to
midnight anyway. I'll go ahead and hit the dropdown and choose date, choose date
again, and then date again. So now I have my dates in order. Perfect. Let's go ahead
and work with parsing text. So, first of all, we have an account number. This account
number actually really needs to be referred to as the main account GL. So I'll go
ahead and double click account number and change it to main account GL. Now
each piece of this main account GL actually represents another field of data that I
need. So, I need to actually parse this text. I need to split it apart and I'll use what's
called a delimiter to do that. Notice there's a dash in between each section. The first
thing I'll do is duplicate this field, and that will throw it all the way to the right. That
way I can keep the main account GL and then also create the three new fields. I'll
right click, I'll choose split column, and I'll choose delimiter. Notice there's several
options here. I'll choose delimiter. My delimiter is a dash, although I have multiple
options here. Okay, so custom dash, and I do want to split it at each occurrence of
the delimiter. All right, I'll go ahead and click okay. Let me scroll over. And I have my
three new fields built from the main account GL. Let's go ahead and name these. The
first one should be labeled GL number. This field will be called account
number, 'cause that's what it represents. And then this last number here is called
category. Okay, perfect. Now, if you look up top, you see what's called M for
mashup. This is the language that's keeping all of my steps. If you want to see all of
those steps, you can go to the advanced editor, and this is its recording of everything
we're completing. Okay. I'll go ahead and close that advanced editor. I also need to
remove columns. Now, I can actually right click any column and choose remove. I
can keep the columns that I want and then right click and tell it to remove all other
columns, or I can go to choose columns up top and then just deselect the ones I do
not need. So I do not need territory ID or comments for my final file. I'll go ahead
and do okay. Now that all my transformations are made, I can go ahead and close
and load this data to my sheet. It tells me that I have 3,500 rows loaded. This is
perfect. Okay, great, tells me where my data sources are and the time of my last
refresh. These are basic steps that anyone can perform to clean up columns, convert
data types, and break text apart.
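Those same Power Query steps, converting types, replacing values, splitting on a delimiter, removing columns, and saving to CSV, can be sketched in pandas. The column names and sample rows below are assumptions standing in for the Excel export:

```python
import pandas as pd

# Stand-in rows; the real file comes from an Excel export.
df = pd.DataFrame({
    "SalesOrderID": [43659, 43660],
    "SalesOrderNumber": ["SO43659", "SO43660"],
    "OrderDate": pd.to_datetime(["2024-05-01 00:00:00",
                                 "2024-05-02 00:00:00"]),
    "AccountNumber": ["4030-98-01", "4030-98-02"],
    "TerritoryID": [5, 5],
    "Comment": ["", ""],
})

df["SalesOrderID"] = df["SalesOrderID"].astype(str)   # number -> text type
df["SalesOrderNumber"] = df["SalesOrderNumber"].str.replace(
    "SO", "", regex=False)                            # replace values: drop SO
df["OrderDate"] = df["OrderDate"].dt.date             # strip the timestamps
df = df.rename(columns={"AccountNumber": "MainAccountGL"})

# Split the GL into three new fields at each dash delimiter.
parts = df["MainAccountGL"].str.split("-", expand=True)
df[["GLNumber", "AccountNumber", "Category"]] = parts

# Remove the columns the spec said to exclude, then save as CSV.
df = df.drop(columns=["TerritoryID", "Comment"])
df.to_csv("sales_orders_clean.csv", index=False)
```

As with Power Query's applied steps, keeping the transformations in a script means that when the source data changes, you just rerun it and every step reapplies.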
Transforming data in Excel with Power Query
[Instructor] We've been tasked to look at how long it takes for our supplier
transactions to go from the transaction date, to the finalization date. We want to see
if there's any suppliers that may take a little bit longer for any given
transaction. What we really hope to find, is that most of our transactions are under
three days. We have all the data we need, but we don't have all the calculations we
need to perform the analysis. So let's get started with a few transformations, and
building out the calculations we need. Okay, I'll go to queries and connections, and
I'll choose to edit my suppliers query. There are a few transformations that normally
would've required me to create functions in Excel, but because I'm using Power
Query, I can perform them without writing a single formula. Let me start
by showing you supplier name. For our purposes, we need all the supplier names to
be in uppercase. I can easily transform this column to uppercase. Also, we need the
transaction date, but we also need the transaction year. I'll right-click transaction
date, I'll duplicate that column, and then I'll transform this to just show the year. I can
do that by right-clicking, transform, I can choose year, and then choose year. Okay,
I'll go ahead and name that, transaction year. And then I'll just go ahead, and move
it over by my transaction date. I have two amounts here. I have the amount excluding
tax, and the actual tax amount. What I really need is the total amount. So I'm going
to create my first formula. I'll go to add column, I'll choose custom column, I'll name
it total amount. And then using my available columns on the right hand side, I'll
scroll, I'll double-click amount, excluding tax, I'll add the plus sign, I'll double-click
tax amount. It tells me that I have no syntax errors, and I can click OK. I'll go ahead
and adjust this to be a currency data type. I only need the total amount, so I'll go
ahead and right-click amount excluding tax, and choose remove. And then I can also
remove my tax amount. Now we want to look at the number of days that have
elapsed between the transaction date, and the finalization date. Let's go add another
column. I'll go to custom column, I'll name this days, I'll choose transaction
date, minus, finalization date, and click OK. Using this method will return the number
of days, but because the transaction date was before the finalization date, it's
showing as a negative number. It also doesn't really look like a number. It looks like
a timestamp. What I'll do is go ahead and change it to a whole number. And what
I'm really looking for is the absolute value. So again, I'll right-click, transform, and
choose absolute value. Now I have all of the information I need, except I don't have
the field that tells me if it's over or under three days. I'll use a conditional column. I'll
tell it to look at the days, and then provide me text, that says over three days, or
under. I'll go to conditional column, I'll name this over under, I'll choose days. And if
it's greater than, or equal to, three days, I want it to say three days or more. For
anything that's two days or less, I want it to say two days or less. This is a logical
function that looks at the days, and then gives me a value if it's true, or a value if it's
false. If I were doing this in Excel, it's similar to an IF function. I'll go ahead and click
OK. Now I'm prepared to start my analysis. I'll go to home, I'll choose close and
load. Now I see I have all of my extra columns that I've added, and my supplier name
is automatically capitalized. This is fantastic. I'm ready to start looking at my supplier
transactions, to determine if they're over or under three days. Now that our data is
prepared, we can answer a few common questions on the production days. We'll
start by inserting a pivot. I'll do insert, pivot table. It's going to use my supplier's
range, and it'll be on a new worksheet. Perfect. I'll drag my over and under to
rows, I'll go ahead and drag my supplier transaction ID to values. And because it's a
number, it will automatically sum it. I'll go ahead and change that to a count. Click
OK. Just looking at the numbers, I can tell that most of the transactions have been
three days or more. Let me do one more quick analysis step. I can right-click, choose
Show Values As, and tell it to show me the percentage of the grand total. This high-level
detail tells us that 69% of our transactions really are taking three days or more to
produce; only about 31% are two days or less. Okay. We need to do
some more analysis. Transforming data can mean a lot of different small
techniques applied to the data as you work to get to your analysis.
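To make the chain of transformations concrete, here is the same uppercase, year, total amount, elapsed days, and conditional-column logic sketched in pandas, with invented supplier rows standing in for the course data:

```python
import pandas as pd

# Illustrative supplier transactions; names and values are assumptions.
tx = pd.DataFrame({
    "SupplierName": ["Contoso Ltd", "Fabrikam Inc"],
    "TransactionDate": pd.to_datetime(["2024-03-01", "2024-03-10"]),
    "FinalizationDate": pd.to_datetime(["2024-03-05", "2024-03-11"]),
    "AmountExcludingTax": [100.0, 200.0],
    "TaxAmount": [8.0, 16.0],
})

tx["SupplierName"] = tx["SupplierName"].str.upper()      # UPPERCASE transform
tx["TransactionYear"] = tx["TransactionDate"].dt.year    # duplicated date -> year
tx["TotalAmount"] = tx["AmountExcludingTax"] + tx["TaxAmount"]

# Days elapsed; take the absolute value so direction doesn't matter.
tx["Days"] = (tx["FinalizationDate"] - tx["TransactionDate"]).dt.days.abs()

# Conditional column, like Power Query's over/under three days.
tx["OverUnder"] = tx["Days"].apply(
    lambda d: "three days or more" if d >= 3 else "two days or less")

# Pivot-style summary: each bucket as a percentage of the grand total.
summary = tx["OverUnder"].value_counts(normalize=True) * 100
print(summary)
```

The `value_counts` summary at the end plays the role of the pivot table with the count shown as percentage of grand total.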
Transforming data in SQL
[Instructor] If you are a data analyst, at some point you will encounter or hear SQL
or SEQUEL. Let's start with the basics. SQL stands for Structured Query
Language. Structured Query Language is vast. It's not unlike any other
language. SQL, often pronounced "sequel," is a computer language that works with data and the
relationships between them. Microsoft SQL Server is a relational
database management system developed by Microsoft with the primary function of
storing and retrieving data, although it does so much more. It was developed
over 30 years ago and it does a lot of different things with data. And it's important
to understand you don't need to know them all. As a data analyst, you do need to
know some basic queries. A basic query allows you to select data from the
database. There are two required statements for a SELECT. You must know what you
want and where you want it from. This is the SELECT and the FROM statement. The
SELECT will list all the fields from the table, and the FROM actually lists the table
name. If I want to filter data, then I'll use the WHERE statement. And if I want to sort
data, I can use the ORDER BY statement. WHERE and ORDER BY are not required to
be in the statement. However, when they are used together, they are required to be
in the right order. You have to filter the data before you sort it. Let me show you how
to run some basic SEQUEL statements. I'm using SQL Server Express and I'm
working with Microsoft SQL Server Management Studio on the Wide World Importer
Sample Database. I'm going to run the supplier transactions. I'll right click, I'll select
top 1000 rows. This generates a basic SQL statement. It selects all of the fields from
Wide World Importers. Okay, it's also a filter for the top 1000 records. I'll go ahead
and remove that statement and execute it again. If you look on the bottom right
hand side, this tells me I have 2438 supplier transactions. To add more meaning to
this data, I actually need to add another table. And this brings us to working with
joins. When you have data in multiple tables, you leverage joins to control what
data shows in the results. Okay, I'm going to highlight my select statement. I'll right
click it and go to the design query and editor. Even though I can code all these
statements, it is easier to work inside a GUI, a graphical user interface, especially if
you're at the beginning. Okay. So I'm going to size my table here so I can see
everything. All right, I'll right click and go add a table. And I want to add the
suppliers. Because these tables have an established relationship in the database
design, they're automatically joined. They're joined by the supplier ID being in both
tables. I can also see that it's a key shape with a one to many, meaning I have one
supplier listed and they may be attached to many transactions. When I hover over
the diamond shape, it shows me the inner join, but I can also see that in the
statement here. Okay, perfect. Now let me add the supplier name. Now it will
automatically throw the supplier name at the last part of the select statement. But if
I want to put it at the beginning, I can just drag it up. Okay, I'll click OK, and then I'll
execute my statement. An inner join works by looking at both tables to find a
match. And what that means with these two tables is that if I have a supplier
name and that supplier has a transaction record, they will show in the results. This is
showing me 2438 records where I have a supplier and a transaction. This is
perfect. The only issue I have with this data is if I wanted to report on suppliers that
we have in our system, regardless of their transactions, I have to adjust the join
type. All right, I'll highlight my statement, I'll go to the design view. The diamond
shape is where I can control the joins. I'll right click this and tell it to show me all rows
from suppliers. And this will create an outer join. Whether it's left or right is determined
by how the database sees the tables. So I've told it to show me all
suppliers, regardless of their transactions. I'll click OK. And if you'll notice in the
statement, it's a right outer join. It sees the supplier table on the right of the
data. Okay, I'll go ahead and execute. I now see that I have more records. I have 2,444
records. This means I do have suppliers listed in our data set that do not have
transaction records. Let's scroll to the bottom and see what that looks like. Starting
with Nod Publishers, I see the first set of suppliers that do not have transaction
records. They're easy to spot because the transaction fields all say null in each
row. That's because there are no supplier transactions for these final suppliers. If I
want to see if there are supplier transactions that do not have a supplier, then I can
adjust the join type. I can tell it to give me a left outer join. Now I could go to the
design view and adjust this, or I can just type left outer join here in my
statement. And then I can execute. Notice there are no transactions without a
supplier. That's because there's a relationship between
these tables that will not allow you to put a transaction in without a valid supplier. But
again, you could easily have suppliers that do not have transactions yet. Because join
types do impact the data we have in our results set, you always need to critically
think through what you're trying to achieve with your data and know that you might
need to adjust the join type.
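You can see the inner-versus-outer difference without a SQL Server install. This sketch uses Python's built-in sqlite3 with tiny stand-in tables mimicking the Wide World Importers suppliers and transactions (the rows are invented for illustration):

```python
import sqlite3

# Tiny in-memory stand-in for the two Wide World Importers tables.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Suppliers (
        SupplierID INTEGER PRIMARY KEY, SupplierName TEXT);
    CREATE TABLE SupplierTransactions (
        TransactionID INTEGER PRIMARY KEY,
        SupplierID INTEGER REFERENCES Suppliers(SupplierID),
        Amount REAL);
    INSERT INTO Suppliers VALUES
        (1, 'Contoso'), (2, 'Fabrikam'), (3, 'Nod Publishers');
    INSERT INTO SupplierTransactions VALUES
        (10, 1, 500.0), (11, 1, 250.0), (12, 2, 900.0);
""")

# INNER JOIN: only suppliers that have a matching transaction.
inner = con.execute("""
    SELECT s.SupplierName, t.Amount
    FROM Suppliers s
    INNER JOIN SupplierTransactions t ON s.SupplierID = t.SupplierID
""").fetchall()

# LEFT OUTER JOIN: all suppliers, with NULL where no transaction exists.
outer = con.execute("""
    SELECT s.SupplierName, t.Amount
    FROM Suppliers s
    LEFT OUTER JOIN SupplierTransactions t ON s.SupplierID = t.SupplierID
""").fetchall()

print(len(inner), len(outer))
```

The outer join returns one extra row, the supplier with no transactions, whose transaction fields come back as NULL, exactly the pattern described above.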
Transforming data in Power BI
[Instructor] Power BI offers two core functions for the data analyst: Transforming the
data, as well as presenting the data. We want to analyze the sales of our products to
note our top 10 products. We will eventually visualize this data for an executive
meeting or a future dashboard. The opening screen of Power BI desktop is a blank
page ready for visualization. This happens after you've connected to your data. If you
notice in my Field pane on the far right, I've connected to the tables I need to analyze
the top products. It's the Order Details and the Products. We'll do the
transformations on Product and Order tables in Power Query. I'll get to show you the
Group By function, so I can total the orders. As well as the Query function, where I
merge the orders and the products together using the Merge Queries
option. Because I've connected to my data, I can just go to the Transform option and
begin my data cleanup. I'll start with Products. The only information I need for the
top product analysis is the product ID and the product name. So I'll click Product
ID. Hold my Control key and choose Product Name. I'll right-click and Remove Other
Columns. This just leaves me with the two that I need. Because this information might
present better with the product name being all uppercase, I'll go ahead and right-click and transform it to be uppercase. I'm now ready to move on to Order
Details. Order Details gives me the order ID, the product ID, the unit price of that
product, how much was ordered, and the discount. One of the very first things I need
to create is the function that gives me the total amount after the applied
discount. Okay, I'll choose Add Column. I'll do a Custom Column. I'll do Total Order
Amount. In this statement, I'm going to create the subtotal, calculate the
discount, and then deduct them from each other. Again, mathematically, you could
do this multiple ways. Okay. I'll click OK. And now I have my quantity times my unit
price minus the discount amount. I want this to be a fixed decimal number. Fantastic,
I now have my Total Order Amount. Now I'll use the Merge function to create the
query that merges the Products table to the Order Details table. I'll start by clicking
on Products. I'll go to my Home tab. And I have the option for Merge Queries. We
have two options here: Merge Queries or Merge Queries as New. Merge queries, if I
select it, will allow me to merge data directly into Products. Merge Queries as New
will give me a third object to work with. For the example in what I'm creating today
for analysis, I just need to do a Merge Query. I can merge that data directly into
Products. Now I have the Merge screen and I have Products. And I want to merge it
with Order Details. And the common field between the two is the Product ID. So I
want to make sure that I've highlighted those. The join types, just like any other data
set, are here in the Merge. If you look at the bottom of the screen, it says Join
Kind. And if you notice, there are 75 records that match 77 rows from the first
table. That means I actually have two products with no records. Meaning, they
haven't been ordered. And that's okay. We're looking at the top products, so
obviously, all of them wouldn't have been ordered. I'll go ahead and hit that drop
down. I have Left Outer, which would show me what it's showing now, all products,
whether or not they have orders. Right Outer, which would show me all order details regardless
of the match to the product. A Full Outer, meaning, if I have products and order
details that don't have records matching, it would show all rows from both. An Inner
join, which is what I need here, shows me just products with orders. You also have Left
Anti and Right Anti. This would show you just the null values. So if I were to choose
Left Anti, it would only list the two products that didn't have order details. For my
top analysis, I need Inner. I'll go ahead and click OK. And now I can expand my
table. I'll hit my Expand here. I don't need to use the original column name as a
prefix, but that's a preference. I don't need all of the columns. I really just need the
Total Order Amount. I can go ahead and click OK. And now I see the Product Name
and the Total Order Amount. Now I'm ready to group them up. This will allow me to
use the Group By function, and total by each product. Okay, I'll go to my Transform
tab. I'll go to Group By. Okay, I want to group by the product name. And I want to
get a total... by summing up... the actual total order amount. This will take each
individual line item and total it up by product. Giving me the total orders. I'll go
ahead and click OK. Now I see each product name and how much was
ordered. Okay. When I go back into my visualization, I really only need to see
Products. So I'll go ahead and tell Order Details not to load. I'm not using it in any
visualizations, so it's okay for me to continue here. All right, I'll go ahead and go to
Home. And I'm ready to apply this data set... to my visualization page. So now on my
Fields list, I see my Products. I'll visualize this in a table, so I'll choose Table. And then
I'll drag my Product Name... and my Total to the Values. Okay, I'll go ahead and size
this out so I can see it. Now, right now, this represents every single product. And
we're trying to get to the top 10. I'll go to the Product Name on the filters. I'll tell it
to do an advanced Top N filter. Where top is 10. I'm going to base that on the
total, so I'll drag that Total to the By value. And then I'll apply that filter. After
applying that filter, I see the top 10 products. Let's go ahead and sort it. Just by
clicking that Total header. These techniques and joins show you exactly how
powerful Power BI and data can be when you establish cleaning routines for basic
presentations of data.
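The merge, group, and Top N steps walked through above can be sketched outside of Power Query as well. Here is a minimal pure-Python analogue; the table and field names mirror the Northwind-style data used in the demo, but the sample values are assumptions for illustration:

```python
# Sketch of the Power Query steps: compute a discounted total,
# inner-join Order Details to Products, group by product, take Top N.
products = [
    {"ProductID": 1, "ProductName": "chai"},
    {"ProductID": 2, "ProductName": "chang"},
    {"ProductID": 3, "ProductName": "syrup"},   # never ordered -> dropped by inner join
]
order_details = [
    {"ProductID": 1, "UnitPrice": 18.0, "Quantity": 10, "Discount": 0.1},
    {"ProductID": 1, "UnitPrice": 18.0, "Quantity": 5,  "Discount": 0.0},
    {"ProductID": 2, "UnitPrice": 19.0, "Quantity": 2,  "Discount": 0.0},
]

# Total Order Amount: the subtotal minus the discount amount
for row in order_details:
    subtotal = row["UnitPrice"] * row["Quantity"]
    row["TotalOrderAmount"] = subtotal - subtotal * row["Discount"]

# Inner join on ProductID, then group by product name (uppercased, as in the demo)
names = {p["ProductID"]: p["ProductName"].upper() for p in products}
totals = {}
for row in order_details:
    if row["ProductID"] in names:          # the inner join keeps matching rows only
        name = names[row["ProductID"]]
        totals[name] = totals.get(name, 0.0) + row["TotalOrderAmount"]

# Top N (here N = 2), sorted descending by total
top = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:2]
print(top)
```

Note how a product with no order details simply never appears in the grouped result, just like the two unmatched products in the inner join above.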
Common cleaning and transformation
When building your cleaning and transformation toolbox, there's some common
cleaning and transformation items you will use. Others will be more specific to the
needs of the data you work with. Let's start with general cleaning. Spaces are invisible
to the eye, but in fact, they're characters. And when a field has extra spaces, you will
want to clean those by removing them. There are leading spaces which are
spaces that are at the front of the field. There are trailing spaces which are at the end
of the field. When we want to remove either leading or trailing spaces, then we can
use functions like trim or clean. The act of breaking out text is referred to as parsing
text. And we can do this with any type of delimiter and every program handles this
a little bit differently, but the outcome is the same. A space can also serve as a
delimiter, since the spaces between words are characters too. Imagine first name and last
name. In the case we want to have both last and first in their own individual columns
for sorting, as an example, we will use the space to break those columns. This is not
the only time we parse text using delimiters. You might break apart text fields based
on things like a dash or even a comma. We use things like text-to-columns, split by
delimiter and functions like left, right and mid to work with parsing text. We don't
only break apart text. There's also times when we need to combine text fields
together. This is commonly known as concatenate or concat. We also replace text
with valid text. For example, if someone enters an abbreviation of a state in the
United States, but we want the full state spelled out, we might replace that text with
the valid response. It could be a misspelling that we're correcting. There are several
methods for replacing invalid data with valid data. We also change the case of
text. Example would be maybe we need everything to be in uppercase or
lowercase or even corrected to proper case. There are functions to do each of these
commands, and again, they might differ between programs, but the outcome will be
the same. These are very simple commands to perform in any data program. You
may find that you'll also remove duplicates from a dataset and this can be done with
commands like remove duplicates or using distinct keywords in query statements. We
also transform data types to be appropriate for what we need to do with the
data. You may have date fields that are stored as text, but to work with date-related
functions, you need to convert it to an actual date data type. The same goes for
numbers. If you need to work with a mathematical function, then the value of the
field must be a number data type. These are just a few of the basic commands that
we use for cleaning and transformation of data and some of the first ones to
understand and master.
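These common cleaning steps differ a bit between programs, as mentioned above, but a short sketch in Python's built-in string tools shows the same outcomes. The sample values are invented for illustration:

```python
# Common cleaning steps: trim, parse on a delimiter, concatenate, change case,
# remove duplicates, and convert a text field to a real data type.
raw = "  Smith, John  "

trimmed = raw.strip()                                  # remove leading/trailing spaces (like TRIM)
last, first = [p.strip() for p in trimmed.split(",")]  # parse on a comma delimiter
full = " ".join([first, last])                         # concatenate back together
print(full.upper(), full.lower(), full.title())        # change case

# Remove duplicates while keeping order (like Remove Duplicates / DISTINCT),
# normalizing case first so "tx" and "TX" count as the same value
states = ["TX", "tx", "CA", "TX"]
unique = list(dict.fromkeys(s.upper() for s in states))

# Convert a date stored as text into an actual date data type
from datetime import date
d = date.fromisoformat("2024-03-01")
print(unique, d.year)
```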
Using built-in functions
[Instructor] There are a lot of people who don't enter into the field of data because
they're intimidated by math. It's important to recognize that one of the powers of
these tools is that they perform all types of math, from basic to complex mathematical
computations for us. We don't have to manually create every function we need. The
tool provides us a lot of calculations. For example, in this Power BI dashboard, let's
take a look at the fields, look at Quantity and UnitPrice and even Discount. Do you
see how they have the sigma shape? It's because it recognizes them as numbers, and
this means it will automatically aggregate them for us and summarize them. Let me
show you what I mean. I'll go ahead and add a table, and I'll go ahead and expand
orders. I want to bring in the order ID. I also want to take a look at the product
name. Now let me expand this. I want to be able to see as I add fields in. I'll go ahead
and bring in the quantity, and then I'll bring in the unit price. Do you see how it
automatically totals the quantity and the unit price? This doesn't make sense to
me. The unit price is just the price and the quantity, well, that's the quantity that was
ordered for that order ID. So what I'll do is I'll right-click on the quantity and tell it to
not summarize. I'll also go to unit price and tell it not to summarize. I think I would
prefer to see unit price over quantity. So what I'll do is I'll just drag that order and
change them, perfect. I do want to see the subtotal, and one thing I'm finding here
is that I don't have it. I'll build that in my model. I'll go to Transform Data. I want to
add it to the order details. I'll go to Add Column, and I'll choose Custom. Okay, I'll
go ahead and call it SubTotal, and here I'm using the function builder. I'm going
to go ahead and say UnitPrice, by double-clicking, multiplied by Quantity. It tells me I
have no syntax errors, which is great. I'll click OK, and now I have my new
subtotal. Notice that my default is A, B, C and 1, 2, 3, alphanumeric. I'm going to go
ahead and change that to a fixed decimal number, all right? I'll go to Home, Close &
Apply, and then I'll bring my subtotal into my table. Now, in this case, I do want this
number to total. This makes perfect sense to do that. This is my amount before I
apply a discount. Okay, let's take a look at something else Power BI does for us and,
in this case, it keeps us from having to write functions. Notice the order date. It's
actually got a date icon, and when I hit the little expand, it has a date hierarchy. That's
because Power BI assumes that I will probably want to work with year, quarter,
month or day. Let me drag my date hierarchy into my model, and I'll put it up by the
order ID. Notice, automatically, I get the four individual fields. There are times I do
want this and times I don't. In this case, I just want to see the order date. So what I'll
do is I'll actually right-click the order date hierarchy and tell it just to show me the
order date. If you work as a data analyst, you probably work with pivots and
matrix. Remember, that's rows, columns, and summary values. Here, I'm going to add
a matrix, and I'm going to look at values based on the shipping country. I'll go ahead
and add Ship Country to the rows, and I'll grab my subtotal and add it to my
values. This lets me see every single country, and it automatically summarizes its
subtotal. Now, if I wanted it to be an average, I could right-click and choose
Average. If I wanted to show the max of any particular subtotal in a country, I could
choose Max. Again, I'm not completing this math. I'm just choosing the right
options. I'll go ahead and choose Sum. Another powerful feature of Power BI is the
ability to use quick measures. I'll go ahead and click on Quick measures. These are
actually measures that are written in DAX. They're freely available for me to use. I can
go ahead and hit the dropdown, and I can see options like Aggregate per
category, giving me average, variance, max or min. I have different filter
scenarios, different time intelligence scenarios, like year-to-date totals, year-over-year change. I also see totals, like running total. Let's do that. I'll choose the running
total. I want to work with my subtotal, and I want that running total to be based on
the different country. So I'll go to my orders. I'll choose my ship country. I'll go ahead
and leave it as ascending, and I can hover over each one of these options to learn
more about it. All right, I'll go ahead and click OK. Now I see the DAX behind this
particular calculation, and on the right-hand side, do you notice how I have my
subtotal running total? And it has a little calculator shape. Let me go change this to
read RT_SubTotal. Okay, and then I want to actually go put this into my matrix, which
I'll do by just dragging it underneath my subtotal there. So the running total works by
adding each value to the total so far. I started with approximately 8,000, so my
starting running total is approximately 8,000, and when I go to the next value,
Austria at about 134,000, it adds that 8,000 to the 134,000 and gives me the 142,000. That's the
running total. One of the really great things is that I can actually add more
variation to this and change my running total. For example, I want to see a running
total across the years. So I can actually drag Year into my columns, and then my
running total starts over again for each year. These are just simple examples of
some of the power of the built-in functionality in Power BI. Just remember, with
great power comes great responsibility. That really sounds like the beginning of a
superhero movie. I tell people all the time, anyone can make numbers show up, but
that does not make them correct. If I could offer you a piece of advice, really
think through what you're trying to accomplish with the numbers, consider what
functions you might need and then also read about what's available. The more
experience you have, the easier the research will be for you but don't worry, you'll
always be studying something new.
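The running-total quick measure described above is generated in DAX, but the arithmetic itself is simple to sketch. Here is a minimal Python analogue using the same approximate figures as the demo (the country list and values are illustrative assumptions):

```python
# Sketch of a running total like the Quick Measure: each row's total
# is added to the total accumulated so far, in ascending order.
from itertools import accumulate

subtotals_by_country = [
    ("Argentina", 8000.0),     # starting value, approximately 8,000
    ("Austria",   134000.0),   # next value; 8,000 + 134,000 = 142,000
    ("Belgium",   35000.0),
]
values = [v for _, v in subtotals_by_country]
running = list(accumulate(values))   # cumulative sums: 8000, 142000, 177000
for (country, _), rt in zip(subtotals_by_country, running):
    print(country, rt)
```

Restarting the running total per year, as in the demo, would amount to grouping the rows by year first and accumulating within each group.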
Relational databases
Have you ever really thought about how systems store data? I bet if you're a new
analyst, you have not gotten that deep into the idea of how data is stored, but you
just know that it is stored. Relational databases have been around for a while and
you will hear people talk about SQL databases or SQL scripts or statements. A data
analyst doesn't have to be fluent to be effective. You can do a lot with just being
somewhat literate with SQL. This is a key area that you can further study if you're
interested. RDBMS stands for Relational Database Management Systems and server
technology, like Microsoft SQL Server, can store these databases. There are
others. Even something as simple as an Access database has relationships and
relational data. We need to go back one step and discuss structured data. When you
work with a spreadsheet that has column headings and data values, then you're
actually working with structured data. This data has a field name, we see it in the
column headings, and then it has a data type and a value. When we build relational
databases, we build structured data sets that are stored in the form of tables. These
tables then become connected through a relationship between key fields. These key
fields are unique identifiers that help control the data that can and cannot go into a
table. When structured data is defined and then stored into tables and then the
tables are related, this creates a relational database. These relational databases are
used to hold information and we as data analysts use this structured and stored data
to build reports, visuals, and analyze data. One thing that is important to note is that
you as the data analyst must understand the structure that is used to store the
data does not always make it easy for reporting. Why? The rules for effective
storage are different from the rules used to combine data for reporting. They are two
very distinct roles and functions, even if they work with the same data. As an analyst,
you do not have to know how to design large-scale data systems, but you will want
to understand some database design techniques so that it makes understanding
someone else's design easier.
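To make the idea of related tables and key fields concrete, here is a small sketch using Python's built-in sqlite3 module. The table and column names are invented for illustration, not taken from any particular system:

```python
# Two related tables with a key field, using the sqlite3 module
# from the Python standard library (an RDBMS in miniature).
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT)")
con.execute("""CREATE TABLE order_details (
    order_id INTEGER REFERENCES orders(order_id),  -- key field relating the tables
    product  TEXT,
    quantity INTEGER)""")
con.execute("INSERT INTO orders VALUES (1, 'Sally')")
con.executemany("INSERT INTO order_details VALUES (?, ?, ?)",
                [(1, 'chai', 10), (1, 'chang', 2)])

# A join across the relationship: one order, many order details
rows = con.execute("""SELECT o.customer, d.product, d.quantity
                      FROM orders o
                      JOIN order_details d ON o.order_id = d.order_id
                      ORDER BY d.product""").fetchall()
print(rows)
```

The `ORDER BY` is there only so the output order is predictable; the join itself is what expresses the relationship between the key fields.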
Modeling data for Power BI
We will work with different data in different data sets or tables to do analysis and
visualization. When we have multiple tables that we're working with, we'll want to
model our data to get the most out of it. When you have an entity relationship
diagram where the tables and relationships are shown in a model, you're actually
seeing the model of the data. Now I've already connected data to my Power BI
Desktop and on the right hand side, you see my fields list. I have different tables of
information required for my reporting. It appears that I'm ready to go but I need to
go one step further. These data sets are meant to be joined together. There are
several ways to join and model data in Power Query for Power BI. When we perform
merge queries, for example, we're actually establishing a join but we can also go to
the modeling section and model this data from the very beginning and this allows
the data to communicate through the joins, meaning if I reference an order, it knows
what product and what order details are related to that order. In looking at the
diagram, we see that there are some joins already established. Power BI as a
convenience tries to join the data automatically; this is called auto detect. You
should always confirm that the
relationships that it establishes for you are correct. Remember, it's easy to model
data when you know what data is related to each other. Let's look at the orders table
and the order details. These are joined together by the order ID. Also notice we have
a '1' and a '*' or star symbol. This shows us the cardinality of this relationship, it's a
one to many, meaning we have one order and many order details, not unlike when
you place an order and buy multiple things, you have one order record and then the
different line items and quantities for the products that you purchased. Let's look at
the products information and the order details. These are joined by the product
ID and again, it's a one to many relationship. There are other relationships when we
refer to cardinality, there's one to many, many to one, one to one and many to
many. One to one means that there is only one record tied to one record between
the two tables. One to many and many to one, like our examples here mean that we
have one record in one table that's tied to many records in another table. I do have
a join that needs to exist but doesn't. Take a look at the employees, you see how
there's no line to any other table, this means that the model doesn't know how the
employees relate. I'll use the employee ID and drag it to employee ID, this establishes
my relationship. I can go ahead and look at the properties of this relationship. I'll
right click the line and go to properties. This shows me the orders table, which is the
many side and the employees table, which is the one side and I see the cardinality is
many to one. I'll go ahead and click OK. To manage all the relationships, I can go to
manage relationships up top and work with each one of them. Okay, let's see the
model at work. I'll go to report and I'll begin to build a basic visual. I'll start by just
adding a table. I'll go ahead and bring in the company name from customers. I'll
bring in the last name from employees. Okay, I'll collapse those so I can see. From
order details, I'll actually go ahead and bring in the order ID. I'll bring in the order
date hierarchy. I just want to actually show the order date so I'll right click that and
just show the order date and then I'll bring in the product. I actually want to put the
product in between the order ID and the order date and then I'll also bring in from
order details, the unit price and the quantity and then I'll bring in my total after
discount. Because I've modeled my data, I know that I have the correct
company listed with the correct last name of the salesperson with the appropriate
order ID and the order details for each one of their orders. Because we've modeled
this data together, we can now explore the data using all the features that help us
visually without having to create various merge queries to accomplish the joins.
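The cardinality terms above follow directly from whether the key values repeat on each side of the relationship. A small hypothetical helper makes that rule explicit (the sample key lists are invented to mirror the orders and order details example):

```python
# Determining cardinality from the key fields: a side is the "one" side
# exactly when its key values are unique.
def cardinality(left_keys, right_keys):
    left_unique = len(left_keys) == len(set(left_keys))
    right_unique = len(right_keys) == len(set(right_keys))
    if left_unique and right_unique:
        return "one-to-one"
    if left_unique:
        return "one-to-many"
    if right_unique:
        return "many-to-one"
    return "many-to-many"

orders        = [1, 2, 3]          # OrderID: unique, one row per order
order_details = [1, 1, 2, 3, 3]   # OrderID repeats, one row per line item
print(cardinality(orders, order_details))
```

This is also a handy sanity check before trusting an auto-detected relationship: if the side you expected to be the "one" side has duplicate keys, the model is not what you think it is.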
Master data management
Have you ever been working with data and noticed that customer addresses all
reference the region in different ways? In some countries, we have states,
provinces, or districts. And when they're used in the data and entered by different
people, they may reference the full state, province, or district name or they may list
the abbreviation. Data like our customer and their address information would be
considered master data. We want everyone in the organization who works with this
data to have the same consistent list of information. When an organization takes the
time to design rules around the master data, this will also inform all the data analysts
of what types of transformations apply. Using tools like Power Query, either in Excel
or Power BI, we can easily make these corrections and save these steps so that as
new data comes into our reports, it will conform to the standards. Master data is not
just address information, though. It could be project names or product names. If we
call a project something different, then it makes it difficult for the data analyst to
report on this information with ease. There are tools that exist to support large
scale organizations with master data management. But I would argue no matter the
size of your organization, if you do not have a plan in place, the analyst will be
dealing with it all the time. So as master data management aims to keep a clean,
complete, and accurate list of master data for the organization, if you don't have
master data management, then you will need to develop a plan to keep a
nice, consistent list of data when you report. Let's take products, as an example. Two
companies have merged. They sell the exact same products, but in both
companies, they're not called the same name. As a data analyst, you can use a
table that holds every possible name and the correct name so that when you report,
you can leverage joins to give yourself a master table of information. When a new
name pops up, you'll have to address it in your master table, but it's better to have
that table than to not have it. Your data set being clean and complete is one of the
most important parts of any project. Just remember that all of your data skills can
apply to many types of data scenarios, not just the analysis or the presentation part
of the job.
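The merged-companies products example above amounts to a lookup against a mapping table. A minimal sketch, with invented product names standing in for the two companies' naming differences:

```python
# A mapping table from every known alias to the one master product name.
alias_to_master = {
    "Choc Bar":        "Chocolate Bar",   # company A's name
    "Chocolate Barre": "Chocolate Bar",   # company B's name
    "Chocolate Bar":   "Chocolate Bar",   # already the master name
}

incoming = ["Choc Bar", "Chocolate Barre", "Widget X"]
cleaned, unmatched = [], []
for name in incoming:
    if name in alias_to_master:
        cleaned.append(alias_to_master[name])
    else:
        unmatched.append(name)   # new names surface here for the analyst to address
print(cleaned, unmatched)
```

Collecting the unmatched names, rather than silently dropping them, is what lets you grow the master table as new aliases pop up.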
Unstructured data
Did you know that there is way more unstructured data in the world than
structured? As a matter of fact, did you know we use structured data to produce even
more unstructured data? Data that neatly fits into tables or spreadsheets is
structured data, and unstructured data is literally everything else. When we post
videos, take pictures, create PDFs of bills for our clients, we are contributing to the
vast, constantly growing amount of unstructured data. The minute we had the ability to walk
around with a PC, video camera, still camera and social media outlets in our hands
like our mobile devices, the world of data exploded. Let's just take an image for
example. This is unstructured data. You must look at the image to understand what
the image is representing. Same thing for a video. You have to watch it, and it's an
immense amount of data. With that said, there's also semi-structured data, which is
a mix of both structured data, and unstructured data. Let's say you receive an
image of the cutest cat ever on the beach via a text from your best friend. When you
see this image, you see the cutest cat ever or at least someone's opinion that is the
cutest cat ever, and you see the beach. A data professional sees much, much
more. What I see when I look at the cutest cat ever picture is much more than a cat
on the beach. I see the time of day, the weather, the location, the type of cat, the color
of the cat, even the age of the cat. I also see the image type like PNG, and what's the
image size, as well as the dimensions, and what's the quality of the image. Don't
forget. We mentioned that we received this from someone, that's data. We received
it at a certain time, and that's data. Did I mention when the picture was taken, and
by who? I mean this list can keep going. Just think. It went from being the cutest cat
ever to a lot of data really fast. Now, imagine people posting their favorite images on
their social feeds. Multiple times per minute, and then others are sharing that
image or they like it, or they look at it and move on. That's also data. Unstructured
data requires our brain to review and provide context, and structured data fits neatly
into designs. And semi-structured is everything in-between structured and
unstructured. Depending on the organization you work with, and what they do as
their product or service will determine what tools, and software you need to work
with with their data. Just knowing that there are different types of data like
structured and unstructured can help you explore the roles in data. Data is not going
anywhere. It is only growing. And just think, there was a time when the name didn't
exist. It should keep us all motivated for what's coming next.
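The cat-photo example above is really semi-structured data: structured metadata wrapped around unstructured content. A small sketch, with entirely hypothetical field names and values:

```python
# Semi-structured data: structured fields (format, dimensions, sender, time)
# surrounding an unstructured payload only a human or model can interpret.
import json

photo = {
    "file": "cat_on_beach.png",          # hypothetical file name
    "format": "PNG",
    "dimensions": [4032, 3024],          # width, height in pixels
    "taken_at": "2024-06-01T17:42:00",
    "sender": "best_friend",
    "pixels": "<binary image data>",     # the unstructured part
}
record = json.dumps(photo)               # serialize for storage or transfer
parsed = json.loads(record)              # the structured fields query cleanly
print(parsed["format"], parsed["dimensions"][0])
```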
Visualization methods and best practices
I read a post lately about how the person designed this beautiful dashboard and no
one was using it. This left the data professional perplexed and frustrated. I get
that. But it immediately made me start thinking of why. Why, if it is so great, are the
users not using it? Well, have you ever heard of beauty is in the eye of the
beholder? It could be as simple as the data analyst designed something that only the
data analyst can use. Looks great, but no one else understands it. It could be that the
data analyst just loaded this beautiful dashboard, sent a link, and said, "Here's your
great, new dashboard." Best practice number one. For a moment, be the person
you're designing for. If you want to see what this feels like, imagine driving a 10 or
15 year old vehicle and then go sit in the newest car on the lot with the most
features. The dashboard will likely take you more than a minute to translate before
you drive it. Now, imagine that someone hands you the keys and says, "Take it for a
spin. It's amazing." What do you do? Depending on who you are, you might freeze, go
with it, or get out. In the same scenario, imagine that the car salesman came out and
explained the differences between your car dashboard and this new dashboard, or
at least hit the high points with you. What would you do then if they directed you
where to look to make you feel more comfortable for going for a drive? Always take
time to document and provide a little bit of training on your visuals. Be
consistent. Use the same color for the same item all the way through. If my brain
says, "This product is blue on this stacked bar," then every time I see a reference to
this product, it will be blue. And then when I see the same blue in a new visual and
believe it's the same item, only to realize it's a totally different product, I get stuck
on why it isn't the same. And the data is no longer showing me anything. Don't overcomplicate
to show off your fancy visualization skills. I do understand that people want to use advanced
visuals to show their skills. But the point of the dashboard has nothing to do with
your skills, but providing information. If you provide valuable insight through correct
visuals and layout, they will believe you to be a visual magician, and they will not
care that you presented it in simplified visuals. Be sure to title, label, and add tooltips
appropriately. People should be able to read a title for context, be able to easily read
the labels, and hover over to get additional insight, not just see the same thing the
visual already shows. Remember that a picture is worth a thousand words. And if we
could all make decisions by consuming thousands of lines of data, we wouldn't need
visuals. Not all data visualization is a chart or graph. Make appropriate use of cards
for high level totals and other aggregate functions. And remember, a table, matrix or
pivot is also a visual presentation of data, and some people prefer that matrix to a
chart. So it never hurts to give them both to meet the needs of the audience. Always
remember that your visuals will be used to provide information. So make sure that it
does it in a way that people can quickly understand and make decisions.
Creating reports to visualize your data over pages
[Instructor] Not all data is best consumed using a dashboard. Yes, dashboards
provide valuable capabilities, but some reports can be valuable in different
formats. When we have line-item reports, that type of display will produce several
pages, and a dashboard representation of that data may not be the most user
friendly. There are tools like Power BI Report
Builder that allow us to build what are called paginated reports. Paginated reports
allow you to connect to data, not unlike dashboards. In fact, before the popularity of
the dashboard most of the reports were paginated reports. Although some people
think they are a thing of the past I think it's important to remember what determines
the style of your report is the need for how that data is best visualized and how it's
going to be consumed. If it's going to be delivered via PDF, or even printed for a
meeting. In our role we've been asked to update an existing report that's currently a
line item report and is many, many pages long. This report would simply benefit from
some groups and summaries. Let's go to Report Builder and redesign this sales order
meeting records report. This report is connected to AdventureWorks 2019, which is
a popular sample database. And I have a data set here called sales records. And when I
expand that, I see all of the fields that are available to me. But if I want to look at the
underlying query I can right click and go to query. This lets me look at the different
fields that are being used in the actual report. It'll also let me take a look at the
relationships, which again, multiple tables means I need relationships. And it shows
me the join type, which is inner. Now there are some fields that I need that are not
in the data set. I can go right click and go to the data set properties. And this lets me
work with the individual fields. I can go in and add a calculation, which I've already
done here for order date. Let's go take a look at that function. This actually allows
me to format that order date value in a date format that's a short date, which is
perfect. I'll go ahead and click okay. Click okay again. This report does provide
valuable information, but again, that multiple lines is not effective for the
meeting. So we're going to replace it with a matrix. And even though we'll have
line items, we'll just have fewer, and it will become more meaningful for the
meeting. The matrix is just like a pivot in Excel. It has rows, columns, and summary
values. So we want to look at a simple subtotal for the sales people for each
product. So we'll go to insert, choose matrix, we'll click on insert matrix and then we
can click in the body of our report. We'll drag name for product name to the
rows. We'll put the last name of our salesperson in the columns and then we'll use
the total due for our summary field. In Report Builder we build in the design view, but
to see the data we have to run the report. I'll choose run. Now this report actually
shows the high level subtotal for each salesperson across the top, and then also
shows each breakdown of each product down the left hand side. We can go to the
very last page and see that we went from 3,000 plus pages to 6 pages. Fantastic. Let's
go back to the design view. We can make a few adjustments. I want to make our
product column a little bit wider. Let's preview it again. Definitely getting a little bit
better. We'll go to our page set up. Because it's wide we'll make it landscape. We
definitely want to adjust our margins to be smaller. And then when we're ready, we
can export our report into various formats. But we can also publish these
reports. Paginated reports can provide valuable reporting when your data expands
over many pages. And remember it can easily be published, PDF'd, or printed.
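The matrix built in Report Builder above is the same rows/columns/summary-values pivot idea, which can be sketched in a few lines of Python. The products, salesperson names, and amounts here are invented for illustration:

```python
# A matrix/pivot sketch: products as rows, salespeople as columns,
# and the summed total due as the summary value in each cell.
sales = [
    ("Road Bike", "Blankenship", 1200.0),
    ("Road Bike", "Sergio",       800.0),
    ("Helmet",    "Blankenship",   60.0),
    ("Road Bike", "Blankenship",  300.0),
]

matrix = {}
for product, person, total_due in sales:
    cell = matrix.setdefault(product, {})
    cell[person] = cell.get(person, 0.0) + total_due   # sum per row/column cell
print(matrix["Road Bike"]["Blankenship"])
```

Collapsing thousands of line items into one cell per product and salesperson is exactly why the paginated report shrank from 3,000-plus pages to six.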
Creating a dashboard for reporting
[Instructor] Dashboards can provide valuable insight into different data
scenarios. And the scenarios are created by us, the users. Dashboards can be
built where they show key performance indicators. And for us, it's the sales
performance in the countries that we're interested in. Here, the size of the
dot represents the amount of sales in the country. So if I click on North America, I
will see on the left-hand side all the products. And because these headings are
sortable, I can interact and sort the total after discount, bringing the highest
products ordered to the top. But how does that look in different countries? So for
example, if I choose Sweden, will it also be wines? As soon as I click Sweden, I see
that the bratwurst takes the lead. I click off my map, and it brings all of my sales back
to the top where I see wines and bratwurst and even peanut butter cups are the top
of my sales. I also noticed that I have some formatting issues here, so I think I'll go
ahead and address those now. If I go to order details, I can choose total after
discount. I'll go ahead and make that two decimal places. Perfect. Okay. So we want
to create a dashboard for our sales managers. They need to be able to work through
several scenarios of the data. One of the key things they ask for are the order
details. I'll go ahead and add a page here. And we'll name it sales orders. I'll start by
adding a table and then adding various fields to it. So I'll start with adding the
company name of the customer. I also want to bring in from the orders table the
order ID. Go ahead and size it out just a little bit, so I can see it populate. I also want
to bring in product name. I want to bring in the quantity. Because that's a number
field, it automatically tries to sum it up. I'm going to right-click it and tell it not to
summarize. I want to bring in the unit price. I also do not want to summarize it. And
then I want to bring in that total. Okay, great. I have all the basic information my
sales managers will need. One of the other requests that they had was to be able to
see how the sales people handle different customers. Are they just working with one
customer? Or are they working with a lot of different customers? We can visualize
this data using a stacked bar. So I'll click the stacked bar. I'll bring in the last name
from employees. I'll go to customers, make that my legend. And since we're looking
at their sales volume, I'll go back and grab that total there. Perfect. Now I can clearly
see that there's a nice spread of how we help our customers here. We have several
sales people, and they serve several of the same customers. We can see the total of
all sales at the bottom of the table. But it would be nice if we could see that across
the top. I'll go ahead and decrease the size of my stacked bar. And I'll introduce a
card into the mix. I'll bring it up here. I'm not going to size too much. And I'll bring
in that total after discount. That information really stands out across the top. Really
easy to see it. Okay. So let's see how this interacts. Right now, we see all sales, all
order details. There's lots of them. We see our scroll there. But what if I just want to
focus on Sergio? I can click Sergio, and now I'm seeing only his records. What about
Blankenship? Choose Blankenship. I can see the total for Blankenship, and I can see
all of Blankenship's sales. So this is just one way these visuals interact with each
other to filter other information. Okay, I'll go ahead and click in the corner there and
remove that. There are times though that we want to see other filters. Like, what
about the year? What about the country? What about the customer? Again, we can
see that multiple customers are served by our sales people. So I'll go ahead and
create some slicers here. Set that sort back to company. I'll choose my slicer. And go
ahead and sort of size it. I'll save that intricate sizing for last. The very first type of
filter I want to create onto the dashboard will be the order date. I'll grab my order
date hierarchy and drag that into the field list. Now, I don't really need quarter or
day, just month and year is fine. And I really don't want to take up the screen
space, so I'll go ahead and make this a dropdown. And that will be the same for all
of my slicers so that I can be consistent. Okay. I'll go ahead and add another
slicer. This slicer will be for last name. Put that into my fields list there. And again,
adjust it to a dropdown. If I want to focus on a particular group of customers, then I
can actually go add that slicer. I'll put in that company name. Again, that's a healthy
list, so I'll make it a dropdown. I'll go ahead and size it just a little bit. And if I want
to create a country dropdown list, I can do that as well. Create my last slicer
here. Place it where I think it's going to go. Go ahead and do some basic sizing here
to fit it all in. Don't want them to overlap. All right, perfect. Now I need to use the
ship country. I don't want to use the customer's country, but where they're shipping
the information, so I'll drag that ship country to field. All right. And then again, to be
consistent, I'll make it a dropdown list. Okay. Now size my card over. All right. Let's
watch our dashboard interact. So I want to see 2022 sales. And I only want to look at
Brazil. Okay. Let's look at Germany. Excellent. I'm seeing valuable insight. Imagine
that we've gotten all this data collected, and we need to use it, maybe we need to
email it to these particular customers. Let me show you a very valuable feature. Now
that I have these filters set, I can actually export this data. This actually creates a
spreadsheet of my filtered scenario. This provides a ton of value for the
user. Exporting the data out in this way provides valuable access to information. The
sales managers merely need to create the scenarios. Then they can work with this
data or even copy and paste it to use it into emails to share information with
customers when they're not going to ever have access to our
dashboards. Dashboards are amazing. And when designed effectively, they can
provide a lot of value. Remember, effective being the keyword. When we take data
from reading it line by line on multiple pages to being interactive, we're giving
people the ability to question the data, create different scenarios, and act on the
insight.
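Outside of Power BI, the filter-then-export workflow walked through above is only a few lines of code. Here is a minimal Python sketch, with invented field names, that applies slicer-style criteria and writes the filtered scenario to CSV the way the export feature does:

```python
import csv
import io

# Hypothetical order records standing in for the dashboard's data model;
# the field names here are invented for illustration.
orders = [
    {"order_id": 1, "ship_country": "Brazil",  "year": 2022, "total": 120.50},
    {"order_id": 2, "ship_country": "Germany", "year": 2022, "total": 310.00},
    {"order_id": 3, "ship_country": "Brazil",  "year": 2021, "total": 75.25},
]

def filter_orders(rows, year=None, ship_country=None):
    """Apply slicer-style criteria; None means that slicer is cleared."""
    return [
        r for r in rows
        if (year is None or r["year"] == year)
        and (ship_country is None or r["ship_country"] == ship_country)
    ]

def export_scenario(rows, out):
    """Write the filtered scenario out as CSV, like the dashboard's export."""
    writer = csv.DictWriter(out, fieldnames=["order_id", "ship_country", "year", "total"])
    writer.writeheader()
    writer.writerows(rows)

# "2022 sales, Brazil only" -- the scenario built with the slicers.
scenario = filter_orders(orders, year=2022, ship_country="Brazil")
buffer = io.StringIO()
export_scenario(scenario, buffer)
```

The point is the same as in the dashboard: the user only declares the scenario; the filtering and the export are mechanical.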
Gathering requirements for visualizations
We have all heard the stories where the entrepreneur designs their life-changing
app on a napkin and then moves on to greatness. Well, guess what? You can apply
the same approach to your visuals. Maybe it's not a napkin, but I can tell you from
my experience, even starting with your customer and a napkin is better than
guessing at the visual representation of the data on your own. People never know
what they want in a dashboard or report until they can see what you see, and if that's
all in your head, well no one is a mind reader. The best way to express your ideas is
to create a mockup of the dashboard. Just lay out different objects, like a
table, matrix, or stacked chart, and add a few filters on the image; this will help everyone
get on the same page about the design. And if it's multiple pages with
navigation, then wireframing helps communicate the navigation of information
before you build it. Wireframing allows you to build out a skeleton of the pages, it
doesn't have to be designed with all the colors and final graphics, it's just a
sketch. The mockup might have a little more visual styling than the wireframe, but
even just a few minutes of investing time into these together will reduce tons of back
and forth on the design process. There are many ways to produce mockups and
wireframes, we can thank all the software developers and UX designers for these sets
of tools. If you are newer, it might be hard to envision the visuals needed because
you may still be trying to determine the right visual for the data. You can look for
inspiration through samples that you can find available in the software, like Power BI
has a whole set of dashboards you can play around with to get started. In addition
to getting on the same page about the look of the dashboard, we must consider
other requirements. Be sure you're documenting these in every meeting and then
following up with notes to all stakeholders afterwards. A few items to always address
are, what type of filters do we need on the data? That way you're not bringing in
more than what's needed. For example, a 100-year-old company doesn't need
100 years of data in the dashboard. I would call this a hard filter; that's because you
handle this type of filter at the data level. What type of filters are needed for the
consumer? Which is the user of the dashboard. What might they search and
filter? These are soft filters and they're meant to be interactive. Common filters might
be years and dates. If it's dedicated to products, it will likely have a product filter. And
if it's dedicated to customers, it will have some customer filters. Never fail to find out
who this dashboard actually is for, and also determine if they have the permissions
to the data and the correct licensing to use the dashboard. Visualization is as much
an art as it is a science, and these requirements are pretty standard to every type of
visualization project. And you'll discover there are many more, but if you start with
these, you'll be designing better dashboards right from the beginning.
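The hard-versus-soft filter distinction can be sketched in code: a hard filter is applied once when the data is loaded, while soft filters are applied per interaction on top of that result. The field names and cutoff below are invented for illustration:

```python
# Hypothetical raw records, including decades of history nobody asked for.
all_orders = [
    {"year": 1998, "country": "Brazil",  "total": 50.0},
    {"year": 2021, "country": "Brazil",  "total": 75.0},
    {"year": 2022, "country": "Germany", "total": 310.0},
]

def load_data(rows, min_year=2020):
    """Hard filter, handled at the data level: a 100-year-old company
    doesn't need 100 years of data in the dashboard."""
    return [r for r in rows if r["year"] >= min_year]

def soft_filter(rows, country=None):
    """Soft filter: interactive, chosen by the consumer of the dashboard."""
    return [r for r in rows if country is None or r["country"] == country]

dashboard_data = load_data(all_orders)        # applied once, at load time
view = soft_filter(dashboard_data, "Brazil")  # applied per interaction
```

Deciding which filters are hard and which are soft during requirements gathering keeps the data set small and the interactivity meaningful.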
Presenting data challenges effectively to others
There are some moments in meetings where I realize no matter how simple I make
what I say next, all of a sudden people are going to be staring through me, trying
to figure out what in the world I'm talking about. That's tough. It shouldn't be like
that in every meeting. And if it is that way for you every time you talk about
data, then you need to focus on communication skills. Again, a very important soft
skill for a data professional. I find that talking leadership through the process can
really help their understanding of what we're working with on a data project. Here's
an example. The data team has been tasked with studying a scenario that will have a
major impact on the organization, and it's imperative that we get this right. It's a high
stakes project. They've provided us all the access to the data, all the questions they
need answered, and we have our approach and we're ready to go. In the first several
passes of the project, we realize one of the key pieces of information that we need
for the study has not been collected consistently. And there appear to be major
gaps over time in the data we do have, and it really makes us question if we can
trust the data. What do you do facing this scenario? I can tell you very easily what
not to do. Do not wait to communicate the challenges and make sure you're
prepared to discuss them. Here are a few ways you can address this situation. Be sure
to let the right person on the team know what data appears to be missing. People
make mistakes. It could have been a bad file or even a missing file. Also communicate
about what you see in the data you do have. This gives you an opportunity to
confirm that they understand about the gaps you're finding in the data, this way
there's no big surprise. And by the way, they may have a very sound reason for those
gaps. You may just not know about it. This is part of the learning curve of any new
data set. There are other scenarios. The organization is hoping that the data team will
be able to show something very positive with the data, and you found the exact
opposite to be true. This is truly a challenging scenario and not a fun one to face. So
what do you do? When I find myself in this situation where there is a totally different
understanding of the data reality versus the actual reality, I start by confirming that
I'm not missing something. I double check everything. I confirm that I've not
introduced an error in any way. If I find that this is the truth of the data, then I turn
to the person in leadership and discuss my findings to get further insight into what I
may be missing and get guidance from them on the next steps for me to
take. Remember, we don't have access to all the data or even all the
knowledge. Turning to your leadership is the legitimate next step. If you discover no
errors, you have done all that you can, and the truth isn't going to be exactly
what they planned. Having some communication skills on how to deliver information
might be your next step. Remember, data is used to inform a business for
improvement and sometimes delivering the results can be hard. As a data
professional, just make sure you have thoroughly checked all your results, follow the
chain of command of information, and by all means, communicate with your team.
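The "gaps over time" problem described above is worth checking for programmatically before you raise it in a meeting. Here is a minimal sketch that finds missing months in a sequence of (year, month) observations; the data is invented:

```python
def missing_months(observed, start, end):
    """Return the (year, month) pairs in [start, end] with no observation."""
    have = set(observed)
    gaps = []
    year, month = start
    while (year, month) <= end:
        if (year, month) not in have:
            gaps.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return gaps

# Collected data that silently skips March and April 2022.
data = [(2022, 1), (2022, 2), (2022, 5), (2022, 6)]
gaps = missing_months(data, start=(2022, 1), end=(2022, 6))
```

Walking into the conversation with the exact list of gaps, rather than a vague feeling that something is missing, makes the communication far easier.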
Finalizing dashboards
[Instructor] Visualization tools give us so many features, including some that we
really need to pay attention to, like automatically creating titles and built-in tool
tips. These features are so nice, but they don't always really make sense to the users
that are not involved in the back end of the data. Changing titles should be an overall
part of your process. And when you're ready to finalize your dashboard, it should be
one of the final things you check. You can change them at any time, but you certainly
want to make time for it. Let's look at our SalesManagerDashboard here. There's
definitely a few titles we can change to make things more meaningful. For example,
we have TotalAfterDiscount by Last Name and Company Name, and really what this
does is shows each salesperson and the total for each of their customers. Also,
there's a couple of other little things that are not too meaningful, like this company
name in the legend. It's really small, and there's a lot of different customers
here. Okay, so I'll choose that option. I'll go to my format Visual, and I can look at
the Y and the X axis. So first of all, I'll turn the Title off on the Y axis. And you'll
notice that the last name here on the left disappears. I'll go to the X axis and I'll turn
the Title off here and it will disappear from the bottom. And then I really don't think
I need a legend for this. There's other ways I can work with that information. So I'll
turn the Legend off. Okay, now I'll go to General and I'll go to Title. Right now, the
Title is turned on, but it's not really meaningful. So let's do Total By Salesperson For
Each Customer. And I'll go ahead and center align this. Perfect. I'm going to bring it
down just a little bit. And then I have my card up top. Let's go ahead and expand
that. If I make it just a little bit bigger, I can see that it has a TotalAfterDiscount. Okay,
that's called its category label. I've got that selected. I'll go to its format and I'll turn
off that Category label. Okay, I'll work with this callout value. First of all, it's really
big. So I'll go ahead and make it a size 30, make it a little bit smaller. But I want to
change the way it displays, like I want the whole number there. I can go ahead and
choose that. I'll go ahead and leave it for Auto 'cause these numbers get large when
I remove the filters. And I'll go to General and then I can turn on its Title and have to
supply that title. And we'll do Total Sales here. And again, we can make that just a
tad bit bigger, and then let's center it. Okay, perfect. Now there's no question that
that's the Total Sales and then underneath that is the Total By Salesperson For Each
Customer. Okay, also notice that we have different slicers across the top. This is the
perfect opportunity to provide some instructions. So I'll go ahead and click on Year
and Month, I'll go to the format, I'll go to my Slicer settings. I want to leave it as
Multi-select because I want people to be able to select multiple criteria, but I also
like the Select all option, so I'll turn that on. Can take a look at the header. And then
notice the title text. Here, I can change this to read Select Year and Month. Just the
word select tells people, hey, this is something I can select. I'll go to a Last
Name. Because I was on that area, it will automatically update. Okay, and I can do
Salesperson. Company name is fine, but I'll go ahead and put Select Company
Name. Now, I want to be consistent. So I'll go back to my Salesperson and tell it to
be Select Salesperson. Looks much better. And then I'll go to my ShipCountry, and
I'll change to Select Shipping Country. Now because we have two countries, the
country that the customer's in and the shipping country, it is probably important to
specify. Now, these changes are minimal, but they've already made a big
difference. Okay, let's go over to our table here. Let's go to General. Its Title is turned
off, so let's turn it on, and let's call those Sales Order Records. Fantastic.
Adding dashboard filters
[Narrator] One thing I've noticed is that we have these filters. Let me go ahead and
clear them. And I'll clear this country filter. The dashboard, when it opens, it's actually
going to look like this. And if people start to make changes we might want to give
them the ability to go back to this original view. We can do this by adding a
bookmark, I'll choose add bookmark. And I'm going to rename that as clear. And let
me show you how this works. So I'll go ahead and choose Sergio and it updates to
show me Sergio's sales, just perfect. And then if I choose the bookmark, it clears it
back. If I select 2021 and Control-select these salespeople, and then I
choose clear, I go back to the original state. This is really, really great. This could be
very handy for your end users, gives them the ability to clear all their filters and go
back to the original state, but they may not know how to navigate to
bookmarks. Let's go add a button onto our dashboard. I'll go to buttons. I'll go ahead
and add a blank button. I'll go ahead and move it over here to the right. Okay, I
want to change it to a pill shape. So it'll look more like a button. I'll go to my style
settings here and I'll turn on the text. Need the text to say clear filters and don't need
the icon. Let's go back to that text and make it centered. Let's go ahead and make it
black. We'll go to our style here and let's turn the fill of the button on and let's make
that sort of a darker gray color. Perfect. So I have my clear filters button created. Now
I need to apply my action. I'll tell it because I chose a bookmark to go to the clear
bookmark. Okay, let's go ahead and close our bookmark pane and format pane. What
I'll do now is I'll go ahead and select a few of my sales people. I'm holding my control
key. I'll go ahead and say, let's see for 2022. And then what I need to do is clear my
filters. I can just Control-click, and I go back to the original state and notice all my
filters are cleared.
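Conceptually, the clear bookmark just captures the dashboard's default filter state and restores it on demand. A minimal sketch of that idea, with invented state fields:

```python
import copy

# The dashboard's default, unfiltered state; the field names are invented.
DEFAULT_STATE = {"year": None, "salespeople": set(), "country": None}

class Dashboard:
    def __init__(self):
        self.filters = copy.deepcopy(DEFAULT_STATE)

    def select(self, **criteria):
        """A user clicking slicers and visuals updates the filter state."""
        self.filters.update(criteria)

    def clear(self):
        """The 'clear' bookmark: restore the original state on demand."""
        self.filters = copy.deepcopy(DEFAULT_STATE)

dash = Dashboard()
dash.select(year=2022, salespeople={"Sergio", "Blankenship"})
dash.clear()  # back to the view the dashboard opened with
```

The deep copies matter: the saved default must be independent of whatever the user mutates, which is exactly why the bookmark reliably returns you to the original view.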
Modifying dashboard tooltips
[Instructor] Let's hover over some of the information in our stacked bar. One thing I
want you to notice is that we have some pre-built tool tips, which is great. Gives us
a lot of information but it may not be all the information we'd like to have. So let's
go ahead and choose that stacked bar. Let's go back to our visualizations and take a
look at tool tips. By default, it'll bring in the tool tips based on what information has
been supplied to the visual. This is why we see last name, company name, and total
after discount. Okay, let's just go ahead and name that total after discount to total
amount. And let's change this last name here to salesperson. Whether you rename
last name is a matter of preference; I think we'll just do salesperson. Let me bring quantity to
the tool tips. Now I do want to see a total quantity. Okay, so I'll go ahead and make
sure that's set to sum, which is perfect. And then I want to count how many orders
they actually placed. And I want to do a distinct count of the order ID. And I'll name
this total order count. Now when I hover over, I can see the salesperson, the
company name, the total amount, the quantity of what was ordered and the total
order count. Okay, really don't need that quantity because again, that's related to
each individual line item so I'll go ahead and take that out. I'll go ahead and put this
total after discount in again. And I want to change that to an average. And then I'll
do average of order amounts. Perfect. This gives me a lot of information just by
simply changing a few things in the tool tips and naming things appropriately. One
last thing as you finalize your dashboard is sometimes people want to have a little
bit more background, different formats. I want to change it up to look a little bit
more than just solid white. When you're in Power BI, you can actually go in and
change a lot. For example, go to view. Let's go change this dashboard to a dark
background. There are several different options here for you to choose from. You can
just point and click until you find the one you like. You can also create your own
custom themes. Okay, let's do that black background; I like that dark
background. One final step is the mobile layout. I'm in the page view, and I'll go to mobile
layout. This is how this Power BI dashboard will look on this page when people visit
it. I'm going to go ahead and bring my card to the top. And again, it's a responsive
design. So even if these look big, they'll work themselves out. I'll go ahead and move
my stacked bar here. And then I'll bring in my sales orders. That way, if someone
consumes this dashboard, this is how it'll look in that mobile environment. Okay, very
last thing we want to do is bring in the filters. Want those to be at the top. And again,
I'll just keep sizing. Go ahead and put this one here. I'll do two slicers per section. So
now I have the mobile layout covered as well as the page view. Okay, let me go
ahead and go out of the mobile layout. Check and make sure everything is
labeled. Also check and make sure things are functional, so I'll go ahead and click on
Sergio, Jeffers. Perfect. And then I'll choose to clear my filters and hold my Control
key and clear my filters. Now I'm ready to save and publish my dashboard. There are
certainly more items that can be adjusted and tweaked with these dashboards, but
at a minimum, this is a great start.
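The three tooltip aggregates built above (a sum, a distinct count of order IDs, and an average) map directly onto a few lines of Python. This sketch uses invented order-detail lines; the renaming of fields mirrors the renaming done in the tooltip pane:

```python
from statistics import mean

# Hypothetical order-detail lines; order_id repeats across line items,
# which is why the distinct count matters.
lines = [
    {"order_id": 10, "salesperson": "Blankenship", "total_after_discount": 100.0},
    {"order_id": 10, "salesperson": "Blankenship", "total_after_discount": 50.0},
    {"order_id": 11, "salesperson": "Sergio",      "total_after_discount": 200.0},
]

def tooltip_stats(rows):
    """The three tooltip aggregates: sum, distinct order count, average."""
    totals = [r["total_after_discount"] for r in rows]
    return {
        "total_amount": sum(totals),                              # sum
        "total_order_count": len({r["order_id"] for r in rows}),  # distinct count
        "average_order_amount": mean(totals),                     # average
    }

stats = tooltip_stats(lines)
```

Note that a plain count of rows would report three orders here; the distinct count correctly reports two, which is the same reason the dashboard uses a distinct count of order ID.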
Data workers
If you use spreadsheets every day and you create valuable insights for people
through various presentations or reporting, you are a data worker. But you're not
likely called that by your job title. You may have a job title that represents a
department or the people you support, but you're not titled data worker. You just
are one. I would also consider you a data worker if you find yourself exporting data
out of systems, building some form of report or presentation weekly or monthly. You
may also receive data from someone in another department, like IT, who has access
to more data than you. You may frequently visit the company's data warehouse, or
data system, to gain information for your reporting purposes. Data workers also
work with functions and do some aggregate functions with the data. You may use
some logical functions like an if. If you're able to search for functions and find the ones
that are relevant to your data work, you are likely a data worker. I believe that there
are far more data workers than our organizations realize, and if you're in this role,
guess what? You're a great resource. And one of the first places an organization can
turn to, to upskill in data. If you're looking for areas of growth, then make sure
you're using tools in Excel like Power Query and other analysis techniques like
PivotTables and basic visualizations. If you have more than average skill with
these, you might be more than a data worker already. You can also build skills like
PowerPoint because this is another way we visualize data for meetings and
presentations. Documentation is a critical competency for any data role, so being a
wizard at Microsoft Word doesn't hurt. Remember, like every other tool, it's powerful
and often because we use it every day, we don't believe we need to explore
training. Trust me, you should. For the soft skills, you'll want to focus on effective
presentations and communication skills. Having these skills makes you more than
suitable for roles that require advanced skills in Excel and doing basic analysis.
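The logical and aggregate functions mentioned above have direct analogues outside the spreadsheet. A minimal sketch, with invented sales figures and an invented threshold:

```python
# Spreadsheet-style logic outside the spreadsheet; the threshold and the
# sales figures are invented for illustration.
sales = [1200, 450, 980, 310]

# The equivalent of IF(amount > 500, "large", "small") for each row.
labels = ["large" if amount > 500 else "small" for amount in sales]

# The aggregate functions a data worker reaches for in a weekly report.
report = {
    "sum": sum(sales),
    "average": sum(sales) / len(sales),
    "max": max(sales),
}
```

The skill being described is the same in either tool: knowing which function answers the question, then applying it to the data at hand.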
Data analysts
I have spent years trying to define data analysts to people. I've come up with several
ways to try to define this role and the skills. It's important to know that most do not
have a job title that contains the words data analyst, but if you have a data
department, then they are likely to be called data analysts. Not all organizations have
a dedicated data department. So you might be called an operations analyst or a
marketing analyst. Your title likely has analyst in it. There are also varying levels
of data analyst, and you can be a data analyst, and not know it. Or be performing the
skills of an analyst, and have no idea that you are. A data analyst will have a deeper
understanding of data systems and have more knowledge about database designs
than a data worker. A data analyst will find they have a little more access to see
tables, and views of the databases. They probably have some basic SQL querying
skills and may write SQL statements to gain access to data all the time. This varies by
organization, and access levels. A data analyst will have a better than average
understanding of the data governance plan because if you're a data analyst, you are
going to be working under the policies, and procedures that are established. Data
analysts that are a few years in are likely to understand more about what questions
to ask, and research in general. Data analysts understand how to clean data, and
transform it to meet the requirements of the project. Data analysts also know how
to create functions of varying types like conditional statements, logical
statements. Data analysts work with statistics, and most certainly at the beginning of
their career, basic stats and aggregate functions, and certainly have learned how to
connect data in a way that they can just refresh their data, and update their visuals
and reports. If you're looking for areas of growth, then you can go a little bit
deeper into statistics. It's a must. Note that I said a little deeper, not a full
statistician, which is another role entirely. You'll find the data sets you are developing
might be used for different statistical tests. So it is important to have a basic
knowledge. You can never have enough experience writing functions, and you
definitely want to be able to write if functions, aggregate functions and simple
lookups. You must understand joins, and how they impact data sets. And for the soft
skills, active listening, data storytelling, and critical thinking. If you're realizing that
you're a data analyst, then you might relate to being called a wizard at work.
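The point above about joins and how they impact data sets can be made concrete with a few lines of SQL. This sketch uses Python's built-in sqlite3 module with two invented tables; notice how an inner join drops the customer with no orders while a left join keeps it:

```python
import sqlite3

# An in-memory database with two invented tables for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
cur.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)")
cur.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "Alfreds"), (2, "Berglund"), (3, "Chop-suey")])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(10, 1, 120.0), (11, 1, 75.0), (12, 2, 300.0)])

# Inner join: only customers with at least one order survive.
inner = cur.execute(
    "SELECT c.name, o.total FROM customers c "
    "JOIN orders o ON o.customer_id = c.id").fetchall()

# Left join: every customer survives; missing orders come back as NULL.
left = cur.execute(
    "SELECT c.name, o.total FROM customers c "
    "LEFT JOIN orders o ON o.customer_id = c.id").fetchall()
```

The row counts differ (three versus four here), which is exactly the kind of impact on a data set that an analyst needs to anticipate before handing results to anyone.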
Data engineers
It is one thing to refine and add to a data set. It's an entirely different skill to be able
to build data sets. I personally believe what most people consider, as a data analyst
in their organization, may be performing data engineering tasks more than analysis
tasks. The crossover between analyst and engineering skills is
real. They share a lot of common foundational skills. A data engineer is someone
who fully understands how to look at the data sets, knows how to refine them into
smaller, more sensible sets for people to use. You may receive data from someone
who is engineering that data from a set of queries, and then providing it to you or
others. A data engineer also is likely to have more access to data, which is why they're
sending it to you in the first place. They also understand security and privacy of data
through the overall data governance strategy. Data engineers can transition to data
architect, which covers more systems, more server and more security strategies for
systems across all of the organization. If you want to grow further in this role, you
will certainly need to understand more about structured and unstructured data and
how to convert it to usable data sets. You'll want to understand the design
methodologies of relational database systems and you will need to understand how
to design databases. You'll also want the shared skills of communication, effective
presentations, critical thinking, and active listening. These skills will be used to learn
how to take hundreds of tables to define them into usable tables for other processes
using ETL or ELT, which is extract, transform, and load or extract, load, and
transform. This is how data goes from a production system to a data warehouse, as
an example. I believe there is a lot of opportunity for data analysts to pursue this role
as they grow deeper in their understanding of data and infrastructure.
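The ETL flow described above can be sketched in a few lines. This is a toy pipeline with invented field names, not a production pattern: extract raw CSV text, transform it (typecast and drop bad rows), and load it into an in-memory SQLite "warehouse":

```python
import csv
import io
import sqlite3

# Invented raw extract; one row has a value that won't typecast.
RAW = """order_id,total
10,120.50
11,not-a-number
12,300.00
"""

def extract(text):
    """Extract: read the raw rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: typecast fields and drop rows that fail."""
    clean = []
    for row in rows:
        try:
            clean.append((int(row["order_id"]), float(row["total"])))
        except ValueError:
            continue  # a real pipeline would log the rejected row
    return clean

def load(rows):
    """Load: write the clean rows into the warehouse table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE warehouse_orders (order_id INTEGER, total REAL)")
    conn.executemany("INSERT INTO warehouse_orders VALUES (?, ?)", rows)
    return conn

warehouse = load(transform(extract(RAW)))
```

ELT simply reorders the last two steps: load the raw rows first, then transform them inside the warehouse with SQL.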
Data scientists
People often pursue data with the hopes of becoming a data scientist. And I believe
it's important to know that not all data professionals grow into data scientists, nor
do we need all analysts or engineers to turn into data scientists. Data scientists will
likely have all the skills of the analyst engineer and they will have likely worked in
those roles. However, a data scientist will have a heavier requirement for skills in
coding, mathematics, and statistics. A data scientist will be instrumental in
developing tools and instruments that provide valuable insight to the organization,
but they can't do it alone without all the other roles. Or, well, maybe they can perform
the tasks, but when you don't have all the other roles, the data scientist must perform
them. Data scientists, or data science teams composed of all the disciplines, will
interpret large sets. They'll likely build machine learning models. They'll present
outcomes and make suggestions as a portion of what they do. They'll likely be
leaders in the data science team. They'll provide support and strategy to the overall
data governance plan. If you want to further your skills in this area, you should
consider gaining a better understanding of programmatic thinking. You'll want to
dive deeper into learning code and maybe start with something like Python. If you
have some stats experience, or not, you will definitely want to grow in this
area. Remember, one of the key differences between data scientists and all other
roles is heavier math, coding, and stats. It's also important to remember that for most
organizations, having a data scientist and not having all the other roles means that
that data scientist is having to perform all those roles before they get to the data
science. This is where having a team of multi-discipline people serving all the roles
might just be your next play.
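Starting with Python, as suggested above, can be as simple as fitting a line to data by hand, which is the smallest taste of the statistics-plus-code combination that defines the role. This sketch implements ordinary least squares for one variable; the data points are invented:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
            / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Perfectly linear toy data generated from y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

Real machine learning work reaches for libraries rather than hand-rolled math, but understanding what the library computes is part of the heavier math and stats requirement mentioned above.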