IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Welcome Introduction Welcome to the IMA Data Analytics & Visualization Fundamentals Certificate course. We are at a dynamic time in the profession as new technologies are emerging and creating a lasting impact on the way we do business. The Digital Age is upon us, and bringing with it are challenges, but also opportunities. As finance and accounting professionals, we are in a unique position to harness the power of these technological advancements and expand our roles of value steward into creating value. When you're ready click to get started. Course Overview This course is based on IMA’s thought leadership found in its research and publications, along with contributions from various leaders across the organization. You’ll hear straight from industry experts who share insights into their proficiency and perspectives on topics such as the future of the profession, emerging technologies, and data analytics and visualization. Through four modules, you’ll be introduced to the impact of technology and analytics on the management accounting profession and delve into understanding and applying data analytics and visualization through a case-based scenario. The four modules are: Module 1 - Becoming Data-Driven will set the stage by introducing some of the changes that are happening in finance and accounting, how business processes will be impacted, and what skills you will need to succeed. You’ll also be introduced to concepts of data science, data analytics, and data governance as an overall foundation of the course. Module 2 - Visualizing the Present & Predicting the Future will discuss how to harness the power of data through analytics and effectively communicate the data with visualizations. All of this will be depicted using a fictitious case scenario that describes a dilemma in which data analytics and Page | 1 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script effective data visualization is necessary to resolve it. Throughout the module, as the case progresses, a video series will elaborate on key visualization concepts and techniques. In Module 3 - Applying Data Analytics and Visualization, you’ll learn how to apply some data analytics and visualization concepts covered in prior modules to solve issues presented in the case-study scenario. After that, you’ll be in the driver’s seat as you take your newly acquired knowledge on a test drive. You’ll navigate through various exercises relating to the scenario described in the Huskie Motor case study and apply the concepts to solve problems. This practical approach to data analytics and visualization will prepare you for applying this knowledge to your own organization. In Module 4 - Conclusion and Final Assessment, we’ll wrap up the course with a summary and provide you with some valuable resources that will enable you to expand your learnings and dig deeper into these topics. You will then be presented with a final assessment that will test your knowledge of all topics discussed throughout the course. Data analytics can be performed using a multitude of software available in the market. In this course, you’ll be introduced to many of these software tools. However, in order to perform the requisite analytics, you are required to have access to Microsoft Excel. It will be necessary for you to have working knowledge of basic functions, including pivot tables, in Microsoft Excel. Acknowledgements IMA would like to give thanks and recognition to Ann Dzuranin, Dean’s Distinguished Professor of Analytics at Northern Illinois University and co-author of Huskie Motor Corporation: Visualizing the Present and Predicting the Future case study, and Daniel Smith, the head of innovation and founder of TheoryLane LLC, for their expertise and contributions to this course. IMA would also like to acknowledge the other valued speakers and authors of the publications and resources used in this course. For more information on all of the contributors, refer to the biographies document within the Resources link above. Course Requirements Knowledge check questions are included throughout each module. The purpose of these knowledge check questions is to assess your grasp of the material. You’ll have two attempts for each question. You must complete all knowledge check questions before taking the final assessment. Page | 2 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script To earn your Data Analytics and Data Visualization Fundamentals Certificate and NASBA CPE credit, you are required to pass the Final Assessment in Module 4 with a 70% or higher score, which is based on content from all preceding modules. After passing the final assessment, you can access and print your professional certificate, NASBA CPE Certificate and digital badge. The digital badge can be uploaded to your LinkedIn profile as well as shared on social media. You have one year from the time you enroll in the course to complete it. You do not have to complete the course in one sitting. If you exit the course before you finish, your current location will be remembered, and you can pick up from where you left off when you return to the course. Course Navigation You can use the Main Menu as your guide as you navigate through the course. You may use the Script tab to view the full script to follow along with the narration. This course is designed to allow you to work at your own pace, either in one sitting, or by completing some of the course, returning later and continuing right from where you left off. By clicking on the three horizontal line icon in the upper left corner of the screen, you will be able to open and close the course outline. Here, you may jump to any aspect of the course outline by clicking on it. To close this outline, simply click the icon again. On the bottom right-hand corner of your screen, there are two arrows. In this course, we’ll refer to the arrow pointing to the right as the “next” button, and the arrow pointing to the left of the “back” button. Now, click the next button to continue. Course Resources The Resources link, located to the top left of the screen, includes many of the materials you will need for this course including: A course script The course contributors’ bios Spreadsheets to download And other helpful materials You may print the course script provided so that you can take notes as you proceed through the Page | 3 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script course. The spreadsheets found in the Resources link are to be used to complete and answer the exercises in Module 3. You’ll need to download these files before you begin Module 3. When you’re ready, click the NEXT button to learn more about this course. Course Learning Objectives Let’s take a moment to review the learning objectives for the I M A Data Analytics & Visualization Fundamentals CertificateTM course. At the conclusion of the course, you should be able to: Recognize the impact of technology and analytics on the accounting profession. Demonstrate how data analytics can influence organizational strategy. Identify ways data visualizations effectively enable appropriate business decisions These are the overall learning objectives for this course. Each module will begin with a set of unique goals that align with these overall course learning objectives. Course Roadmap As we previously outlined, this course consists of four modules. The beginning of each module starts with the module goals. These goals outline the topics that will be discussed in that module. Each module is divided into lessons. You will see a menu screen as you begin each lesson. The menu screen will list the sections that are included in that lesson. We highly recommend that you go through this course in sequential order for an optimal learning experience. Click Module 1 to proceed to Module 1 Lesson 1. Module 1: Becoming Data‐Driven Introduction Welcome to Module 1, Becoming Data-Driven! This module will introduce changes to the accounting and finance profession that are on the rise caused by advancements in technology, and discuss the potential lasting impact of these emerging technologies. When you're ready click to get started. Page | 4 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Module 1 Goals Upon completing Module 1, learners will be able to: Recognize the impact of technological advancements on the finance and accounting profession. Describe the typical life cycle of data. Explain the impact data governance has on business and its stakeholders. Define types of data analytics and how each progressively creates value within an organization. Module Menu Module 1 consists of two lessons, Lesson 1: Changes to the Profession and Lesson 2: Making Sense of Data. After completing the lessons, we’ll wrap up the module with some concluding thoughts. Click Lesson 1 to proceed. Lesson 1: Changes to the Profession Section 1: Technology is Changing As technology transforms the management accounting profession at an ever-increasing rate, are you concerned about having the necessary skills to further your career? Do you want to enhance your skill set but don’t know where to start? Through this lesson, we will define the essential knowledge and skills needed to adapt to the modern workplace and set the stage for how you will be able to develop these skills. Video: The Future of the Profession Now, let’s hear from IMA’s President and CEO, Jeff Thomson as he discusses his thoughts on the future of the finance and accounting profession. Video Transcript (Est. time: 13:28 min): Hello, I’m Jeff Thomson, president and CEO of IMA, the Institute of Management Accountants. And I’m proud and honored to be speaking with you about the future of the accountant in business, especially in a digital age. When you think about the challenge of digitalization of the value chain, it really is a challenge, but it’s also an opportunity. So think robotics process Page | 5 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script automation, think automation in general. Think about machine learning, cognitive computing. The challenge of this course is to how to harness all of these technologies to add value. But a challenge also relates specifically to our profession. Will this automation result in fewer jobs, displaced jobs? I would argue that it will result in different types of jobs. Jobs that will add more value of foresight and insight to the organization, but only if we commit to upskilling in areas such as data analytics and data visualization. So let’s talk briefly about the CFO team today and how it’s evolved over time. And as many of you know, I was a CFO in industry during this evolution of the role expansion of the CFO and the CFO team. So in the past, the CFO team had a very, very critical fiduciary responsibility and that was safeguarding of assets, what we would call value stewardship, protecting value, internal controls, accurately and fairly reporting on the financial condition of the corporation. But over time, that role has expanded to include those table stakes plus additional responsibilities in the area of value creation, merger and acquisition activity, financial planning and analysis, and much more. So the evolution has resulted in greater expectations for the CFO and his or her team. In fact, the CFO is often referred to as the chief futures officer or the chief value officer. In many cases, the CFO of large and small companies have responsibility for operations, for strategy, for IT, and for more. So with this expectation, with this challenge comes the opportunity, to add greater influence and greater relevance as individuals and as professionals in accounting. So technology and analytics is sweeping us, and moving at such an incredible pace that you almost can’t keep up with all of the technology, but really we must all embrace technology. Millennials are digital natives. It comes very, very naturally. But even in the middle of our careers, as we see some jobs being lost or displaced, it’s important that we learn new skills and new capabilities, especially in data analytics, data visualization, data governance, and even storytelling. When we’re trying to influence the organization on a business case or a new merger or a new product or service, we have to tell a story. We have to tell a story beyond the numbers that’s futuristic, inspiring, and leads to great outcomes. But the other aspect of learning and embracing technology is not just the Millennials and those mid-career in terms of upskilling and staying current, and harnessing technology, but it’s also tone at the top. It’s critically important that those at the top embrace analytics as the new science of winning in the market, of being a competitive differentiator, to know your consumers even better than they know themselves in terms of needs and wants and desires of future purchases. So it’s very important that senior leaders embrace technology, not necessarily to become experts in programming and coding, but to understand the value that analytics and technology can bring to a differentiated value proposition to support training, courseware, education, and more so, many would say that it’s crunch time for finance and accounting. You know, finance talent Page | 6 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script models are evolving quickly with a premium placed on data scientists, business analysts, and storytellers. Everyone’s talking about it, the big four, Downmarket, McKinsey, Accenture, etcetera, etcetera. And so this is the trend we’re seeing to embrace technology, but to also step up and upscale in the area of analytics so that we can add more foresight and insight. And the softer skills of us as humans-empathy, active listening, professional judgment, professional skepticism-really is not replaceable by a robot. So some key messages, automation’s been around for a long time. What’s different now? Look, we’ve been through multiple industrial revolutions. So why is automation the talk of the town, so to speak, whether you’re a consumer or a professional. Well simple, the reason it’s different this time is because we’ve programmed robots to be much smarter to think, to synthesize, to react, to adapt. However, keep in mind that it’s human beings doing the programming and harnessing the computing and cognitive power of the machine and not vice versa. Second key message, look, we have a choice here. We’re at a turning point in our profession. We could choose to be more relevant and influential or less relevant and influential. Think about it this way. As automation over time replaces or displaces more routine, repetitive tasks, it is a reality that some of our jobs, some of our tasks will go away, especially in auditing and transaction processing. That’s at one end of the spectrum. At the other end of the spectrum, if we say, you know what? This advanced analytics and providing insight and foresight is not what I was trained to do as an accountant. It’s really not for me. I’m going to outsource the analytics and the deeper thinking, the deeper judgmental capabilities to a consultant, to a machine, to whatever. Well, my question to you is what’s left? What’s left of our great profession. So this is a call to action for us to upskill in strategy management, data science, data analytics, and more. It’s a business imperative critical to the future of our profession. So nothing less than our relevance and influence is at stake. And when you think about the rate of change of technology, I mean just two years ago, who knew what blockchain is or was. Today we’re talking about more and more use cases of blockchain in financial services and insurance. And more, five years ago we weren’t even aware of what robotics process automation is all about. And now there’s hundreds of use cases in organizations large and small to create greater operational efficiencies. And then the final point, closer to home as your CEO, CEO of IMA, is we are absolutely committed to being leaders in preparing you for a great future. By the way, that future is now. So I’ve referred to data science several times, and it could be a little bit intimidating. Change is hard. Upskilling is necessary, but data science is really the overlap, the intersection of three broad domains that I’d like to describe to you in turn. The first is business context. You know, whether you are an FPA professional, financial planning and analysis, a statistician, an Page | 7 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script econometrician, regardless of your level of analytical horsepower, you have to understand the business question being asked. What is the problem we’re trying to solve? What new discoveries are we trying to seek? As we mine through data and understand data, what are the opportunities to learn something else to put us at a great competitive position? So business context, understanding how customers behave, understand how value and cash flows in the organization within your industry vertical is very, very important. To ask the right questions and to seek answers in the data as a data explorer, business context, critically important. The second element of data science is the technical modeling, some level of statistics, some level of knowledge of forecasting tools and techniques. And this can be a little bit intimidating and overwhelming, but it also is very manageable. If you focus on the problem at hand and the opportunities to be seized, I’m not suggesting that every finance and accounting professional be a Ph.D. statistician. But I am suggesting that the more data science, the more statistics you know and can apply, the stronger you will be in this brave new world of digitalization. There was actually a book written called exploratory data analysis written by a famous statistician. And that very much is what statistics is all about. It’s not just reporting averages and standard deviations, it’s to kind of seek new possibilities by mining through large sets of data and doing forward-looking analysis. In fact, according to Glassdoor, data scientist is the No. 1-ranked job in terms of satisfaction and entrylevel pay. So we need to infuse more and more of that at a reasonable pace, at a reasonable level into our profession. And then the third component of data science is the actual applications. It could be Excel, advanced analytics and data mining and Excel. It could be open source software like R, visualization, like Tableau, Python, Power BI. So learning these tools and application capabilities is also critically important. I made reference earlier to a book that while it’s over 12 years old is timeless and it’s called Competing on Analytics, The New Science of Winning. It’s one of my favorite books because the title speaks volumes for what we must be doing in a competitive age. When you think about winning in the market, you think about apps and things that face consumers, not things that are kind of in the background but make no mistake about it in a incredibly treacherous competitive environment with geopolitics at play, nontraditional competition analytics is and can be the new science of winning. But you know, analytics should not be equated with information technology. It is the human and organizational aspects of analytical competition that truly differentiates. You know, there is a language barrier though when we think about analytics, data scientists, IT professionals, finance and accounting. We’ve got to get over it. We’ve got to solve it. We’ve got to become data translators. Finance and accounting professionals need to learn and understand more about IT. IT and information systems professionals need to learn and understand more about the language of finance and accounting. Because guess what? At the end of the supply chain, consumers and shareholders are expecting that we’re doing that translation. So there’s plenty of opportunities to upskill in these areas and to speak the language of the machine and speak the language of business and who better than management accountants, CMAs, CFO teams to do that translation to create great, great outcomes. How to prepare for the future of work. Page | 8 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script In summary, look, there is no simple or prescriptive solutions. The environment is moving too quickly and too rapidly. Technology is evolving very, very quickly and rapidly, and we must not only keep pace, we must stay ahead. So as we embrace this new future, which is here today, we need a combination of technological know-how, problem solving, and critical thinking as well as the human skills, the soft skills as perseverance, collaboration, and empathy. We are absolutely committed to preparing you today and in the near and future, whether it’s five years out or 10 years out, that’s our obligation to you. Thank you. Innovation and Change These emerging technologies continue to drive innovation and change throughout the business landscape. While these specific trends can seem abstract, accounting and finance professionals do need to have an understanding of what these technologies are. Arguably more important than any specific technical knowledge, however, is the ability to leverage these innovations to become strategic partners. Section 3: The Changing Role of the Management Accountant As management accountants adapt to new technologies, they are well-suited to increasingly assume the role of facilitator, using the breadth and depth of their knowledge to find opportunities for improvement and craft a shared vision for product stakeholders. Management Accountants' Skills Management accountants already possess most of the skills needed to implement advanced analytics: They have a holistic view of business, they intuitively understand the interrelation between financial and strategic business decisions, and they have strong written and verbal communication skills. When management accountants can combine these skills with technological knowledge and increased aptitude, they have the power to add further value to their organizations. In this course, we’ll explore how to build on these on these skills by understanding some of the current technology trends. These trends lead to the creation of data that can be analyzed for value-added, strategic decision-making. Section 2: Keeping Up With the Trends As a management accountant in the digital age, it’s important to have a fundamental working knowledge of technological trends. Those include robotic process automation, artificial intelligence, and blockchain. Page | 9 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Important Technology Trends Defined Let’s take a moment to define each of these. Robotic Process Automation (RPA) is an advanced and intelligent form of process automation leveraging software tools (or software robots) to effectively and efficiently perform tasks traditionally performed by humans. Artificial Intelligence (AI) is a field of computer science concerned with empowering machines to think, behave, and act like human beings to draw conclusions and make judgements. Examples include speech recognition, machine learning, and natural language processing. Blockchain is a decentralized and distributed ledger of encrypted information. Uses for blockchain include cryptocurrencies, smart contracts, and transparent proof of work. Management accountants can leverage emerging technologies such as AI and RPA to streamline processes and decrease the amount of time spent on repetitive tasks. They can instead focus their time on strategy and decision making. Leveraging Technology For Leadership Once accounting and finance professionals are armed with the knowledge of which technology trends are currently generating attention and attracting investment, there are several approaches and tactics they should embrace as they leverage technology to elevate themselves as strategic thinkers and leaders. Click each icon to learn more. Communicate Implications Be able to communicate the implications of technology for the organization and its strategic priorities. Not every new technology platform or tool is going to be equally helpful for every organization. Regardless of whether it’s RPA, AI, or some other emerging technology, every tool has its own set of pros and cons. Taking an objective perspective on technology and articulating the pros and cons of these tools are critical steps for effective leaders to take in an increasingly digital world. Link Stakeholders Link different stakeholders across the company into the conversation. One of the most common pitfalls that can trip up even the most sophisticated and well-prepared organizations is when important dialogue occurs in silos. Discussing technology tools and options with only the accounting and finance team isn’t going to drive change and might very well lead to project failure. True leaders and strategic management teams understand that involving as many participants from different departments as possible creates a more diverse team, better solutions, and more robust use cases. Page | 10 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Identify and Present Easy Wins Identify and present easy wins. Having a conceptual understanding of, and a vision for, how technologies will eventually transform a profession, subset, or field is a great start but doesn’t usually generate actionable ideas for professionals to implement. After communicating the potential of a technology or process and linking in a variety of stakeholder groups, presenting goals that are achievable in the short term and documenting first steps toward achieving them will get the ball rolling within the organization, increase the likelihood of stakeholder buy-in, and help to create enthusiasm for future projects that are likely to be more complicated. Section 4: The Big Picture of Data The best way to begin is to learn the terminology surrounding data. The first term is “architecture,” and just like when designing a building, the architecture is the overall structure of whatever it is we’re trying to create. First term is “architecture,” and just like when designing a building, the architecture is the overall structure of whatever it is we’re trying to create. The next term is “solutions” because the only 100% consistent aspect of any analytics project is that we are trying to solve some problem or answer a question. “Solution architecture” may take many forms from a simple report performing calculations once a week to an automated artificial intelligence (AI) ultra-high-frequency trading application. “Patterns” are templates or guidelines that solution architects will reuse frequently. Architect a Solution In our new data-focused terminology, we hope to architect a solution using an established pattern. There are different types of solutions that may use different patterns and/or expertise. At the highest level (a.k.a. “least detailed” or “10,000-foot view”), we typically group solutions into domains. Just as a building architecture may require domain solutions in landscaping, interior design, plumbing, and more, an analytics solution architecture may include domains such as data storage, data creation, and different business specializations. To architect effective analytics solutions, we require knowledge and skills in multiple domains of business data. High‐Level Domain Design Pattern Next, we use the following high-level domain design pattern: data creation, data storage, and business domain(s). The business domain may include a single business domain, such Page | 11 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script as marketing, or many, such as marketing, supply chain, and finance. Data Governance, a set of processes and policies that help an organization manage and secure its data, which we will discuss in later lessons, is an integral part of every domain. Section 6: Data Driven Culture There are four essential elements in establishing a data-driven organization. These include datasavvy people, quality data, appropriate tools, and processes and incentives that support analytical decision making. Organizations attempting to adopt leading-edge analytics often face challenges in each of these dimensions. The result is an inability to effectively support managerial decision making through the use of analytical technologies. Much of the focus on implementation of advanced analytics has been on the tangible elements of a successful data-driven organization (people, data, and tools). Less attention has been paid to the fourth factor -- organizational intent. An organization committed to the goal of being datadriven will work to develop the people, data, and tools needed to accomplish that objective. Resolve to Be Data‐Driven Becoming a data-driven organization requires creating structures, processes, and incentives to support analytical decision making. It requires the organization to resolve to be data-driven and define what it hopes to accomplish through the use of Big Data and analytics. The top leadership of the organization needs to describe how analytics will shape the business’s performance. Six Factors For a Data‐Driven Culture While many organizations are striving to implement a data-driven culture, success isn’t assured. Achieving this goal requires that certain elements be present. There are six key factors for successfully establishing a data-driven organizational culture. Click each icon to learn more. Having the right tone at the top. Setting the right tone at the top is critical for most organizational initiatives, and this includes developing a data-driven culture. In most organizations, executives are championing the use of leading-edge analytics, although in some companies the initiative is being led from the bottom up, with various departments being first to embrace it. In addition to championing the use of leading-edge analytics, executives in a data-driven Page | 12 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script organization must consider ethics in the context of technology and analytics. While ethics is not covered in this course, it is important for the executive team to incorporate ethics policies for managing technologies. Having strategies for the effective use of technology. The ability to use leading-edge analytic techniques effectively is important for a variety of reasons. Companies whose decision making is reactive to the competition are less likely to have developed strategies for the effective use of techniques and technologies. Being reactive instead of proactive implies that these organizations lack the ability to predict trends or to turn customer data into useful insights that can be used to enhance the organization’s business. Having a commitment to collecting and using data from both internal and external sources to support analytics efforts. To harness the potential of leading-edge analytics, organizations need to utilize a wide variety of data sources. This is especially true when it comes to strategy development and execution. In this regard, about half of organizations use data from both internal and external sources. Of concern is that the other half of organizations are only using internal data, only using data to validate strategy post-execution, or (in a few cases) not using data at all! Using a wide variety of data sources yields better insights. Organizations that truly want to derive value from their data must be comfortable with complexity and remain flexible enough to respond to what the data tells them. Using both monetary and nonmonetary rewards to promote analytical decision making. Slightly more than half of organizations use incentives to promote analytical decision making. These can be monetary, nonmonetary, or both. Yet nearly half of organizations aren’t doing so. This may be a mistake: The use of incentives is key to conveying the importance of developing enhanced analytics capabilities throughout an organization. Those that do believe in the importance of developing such capabilities are more likely to create the appropriate culture by providing incentives to their employees. Having a willingness to adequately provide resources to the analytics efforts. Organizations often are facing resource challenges concerning the development of enhanced analytics capabilities. By far, the most frequently cited challenge is the ability to find staff with the necessary skill set. The next most common resource challenge is budget. A third challenge, related to the previous two, is a lack of staffing resources and competing priorities. Clearly, these four essential elements needed for companies to develop advanced analytical capabilities are Page | 13 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script interrelated: data- savvy people, quality data, state-of-the-art tools, and organizational intent. Alignment of analytics efforts throughout the organization. Responsibility for analytics can reside in various parts of an organization. It has been argued that CFOs should “own” analytics as they are regarded as impartial “guardians of the truth.” Most companies seem to agree, with finance being an owner (although often not the sole owner) of analytics. Other popular options include analytics being owned by IT, a dedicated analytics group, or operations, or having each department independently maintaining its own analytics capabilities. Of course, these options aren’t mutually exclusive, with a variety of possible combinations, the most popular being analytics jointly owned by finance and IT. The Benefits are Clear The benefits of implementing a data-driven culture are clear. Organizations possessing such cultures more effectively perform key business processes such as strategy formulation and performance evaluation. In implementing such a culture, establishing processes and incentives that support analytical decision making (i.e., organizational intent) is critical. When deciding to venture along the path of implementing leading-edge analytics, evaluate the extent to which the six factors discussed are present in your organization. By ensuring that they are, you can improve the chances of successful implementation and achieving the competitive benefits that come with being data driven. Section 5: Business Domain Pattern What does the interrelation of data creation, storage, and business domains look like? Consider a traditional business selling widgets directly to customers. Data is created by customer activity such as purchasing widgets or browsing websites. Data integration moves data generating application and into data storage. Raw data transactions are cleaned and shaped with other data into an aggregate data store. Aggregate data is used for descriptive, diagnostic, predictive, and prescriptive analytics by various business domains. The business domains use the analysis to make business decisions on how to sell more widgets. Business activity resulting from business decisions influences customer activity. The entire point of this cycle is for the business to better influence the customer. This could be through product design, marketing, support, warranty, shipping, inventory, etc. The business exists to make decisions that result in value to its customers translated into profit. Therefore, the Page | 14 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script goal of an analytics solution architect is to accelerate the cycle of customer activity to business activity. https://sfmagazine.com/post-entry/january-2019-leverage-technology-to-become-a-strategicpartner/ Data Science This is data science: the intersection of data, analytics, and business decision making. The current state of data science is to accelerate the data-to-decision process. The Management Accountants’ understanding of data science better facilitates the data-todecision process for their organizations. In this course, we will focus on the Statistics & Analytics component of data science. Enhancing Analytical Capabilities Is Critical to Success The Digital Age is upon us, and it brings with it challenges and opportunities for businesses. Management accountants have the opportunity, and need, to develop their data and technological skills so they can use advanced analytics and glean new insights from their data for their organizations. The collection, assessment, interpretation, and use of data are enabling companies to create new business models and make existing ones more efficient. Most organizations now believe that enhancing their digital and analytical capabilities is critical to their continued success and survival and are leaning on management accountants to take the lead in facilitating this strategic initiative. As management accountants develop their skills and complete their repertoire of strategic business competencies, they can share their holistic view of business and the interrelation between financial and strategic business decisions to communicate the importance of creating and establishing a data-driven organization. Lesson 2: Making Sense of Data Section 1: Introduction to Data Big Data is top of mind for many finance leaders. The question is: How can we leverage all Page | 15 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script this data to drive business success? How can we can turn data into meaningful action? The progression includes identifying key information from the data, turning that information into knowledge, creating valuable insight from that knowledge, and taking action. In this lesson, we will further define the progression and life cycle of data and introduce how and where data analytics fits in. Phases of the Data Life Cycle Data is an essential part of any enterprise. An enterprise that’s agile and innovative requires an understanding of data as it flows through the organization, interacts within various departments, and transforms itself. Though an international standard for the data life cycle doesn’t exist, the following phases, in order, are identified as typical during data life cycle management. Click each image in the order shown to learn more. Capture. Business is surrounded by data, but an enterprise needs to capture it in order to make use of it. Data capture occurs in three major distinct ways: · Data Entry. Manual or automated entry of data into the data warehouse to create new data values. · Data Acquisition. Acquiring or transferring data from an already existing data source or data warehouse. · Connected Devices. Internet of Things (or IoT), or the interconnection of computing devices enabling them to send and receive data, has will continue to transform the way data is captured by making it real time and continuous as devices listen to and interact with the environment and each other. These devices capture and transmit the data so it can be stored. Qualify. Have you ever wondered why month-/year-end close processes are prone to errors or why reconciliations take such a long time? Inaccurate or incomplete data may lead to major problems later in the data life cycle. These problems may include critical business processes being held up, bad decision making, or final reports running afoul of compliance because of erroneous data values. In this phase, data is assessed for its quality and completeness using a set of predefined rules. The Capture and Qualify phases are traditionally seen as under the purview of the IT team, which sets up the system architecture. But management accountants, with their knowledge of accounting processes and the way data will be utilized in later phases, have the ability and responsibility to envision the framework of the system in partnership with IT. Page | 16 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Transform. The advent of Big Data has led to enterprises being able to capture a seemingly infinite amount of data. Couple this situation with IoT, and soon you can be drowning in data. Thus, enterprises have to transform, synthesize, and simplify the data so it can be utilized by functional departments. This phase is commonly called “analytical modeling” in the financial world. A certain level of functional expertise is required at this phase as data from different sources is linked together to find the intrinsic value hidden beneath. Utilize. The final aim of data is to help enterprises make good business decisions, and, in some cases, data itself is the final product or service of the enterprise. Either way, the true value of data is unlocked in this phase, and the previous efforts made in data capture, qualification, and transformation finally bear fruit. Management accountants act as business partners during the Transform and Utilize phases. As business partners, we need to translate the data values into business stories that help enterprise leadership understand the magnitude of their decisions and their long-term impact. Report. This phase relates to external reporting. Internal management reporting for decision making is realized during the Utilize phase in the data life cycle. External reporting could involve quarterly/yearly financial reports, financial data sent to other vendors for bids, and other compliance reports. Reporting of data is a key phase of the data life cycle that is ripe for automation. Since rules, definitions, and requirements for these reports either rarely change or have slight changes year after year, automated processes designed by management accountants help to create and publish these reports in a more efficient manner. Archive. This is the beginning of the end for the data that the enterprise has spent a considerable amount of time and resources on to unlock its value. Data archiving is the transfer of data from an active stage to a passive stage so that it can be retrieved and reutilized as needed. Purge. The final phase of the data life cycle is the removal of the data (and any copies) from the enterprise. It occurs in the data archive and is sometimes accompanied by a communication both inside and outside the enterprise. Financial compliance rules within an enterprise or those imposed by regulatory bodies normally drive the Archive and Purge phases. And management accountants act as custodians to ensure that these compliance rules are followed within the enterprise for financial data. Page | 17 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Section 2: Introduction to Data Governance As users of systems, technology, and data, finance professionals are invested in the integrity of our information and its sources. IT and data governance can provide the structure and rules to ensure data accuracy and availability while managing the associated risks. Every organization with shared data is concerned with data integrity. Data governance, a specific sub-element of IT governance, parallels the capabilities of corporate governance and IT governance at the data level. This can be as simple as a set of rules specifying what data (for example system fields) is to be entered by whom, when, and from what source, to as complex as you want (as an example multiple levels of data entry, audit, and control structures). In this lesson, we will identify how organizations can build trust in data by implementing a data- driven culture with effective and efficient data governance. Data Governance and the Data Life Cycle Data governance helps an enterprise administer the data as it flows through the various phases of the data life cycle discussed in section 1 of this lesson. During the Capture phase, enterprises need to identify the capture points for the data and define the data that will be captured. As data enters the Qualify phase, the rules of data governance act as a check to ensure that inaccurate data is identified, assessed for completeness, and secured. At the Transform and Utilize phase, focus shifts toward adherence to transformation rules and the legal utilization of the data according to regulatory standards for decision-making purposes. As the Reporting phase is all about showcasing data to external parties, data governance lists the steps to take when inaccurate data is reported outside the enterprise. Archiving data relies on a set of rules that define what occurs, as well as when and how. And in the Page | 18 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Purge phase, it’s critical to set a purge schedule for the data as per the retention period requirements. Why Data Governance is Needed Effective and efficient data governance can facilitate powerful analytics and decision making across an organization. Demand for advanced analytics is increasing, and, as previously discussed in the course, so, too, is the expectation that management accountants will perform the analytics. But how often have we heard of an analysis resulting in a business decision that later proved problematic because of faulty data? As technology makes it easier for employees to access data, write reports, and conduct their own analysis, data governance becomes an even more important safeguard to ensure the integrity of underlying data. As businesses become more data-driven, data governance provides the foundation for growth into predictive modeling and automation. Data‐Driven Culture When asked, 99 percent of leaders of large organizations say they want a data-driven culture to maximize the value of data through analytics. As we also previously identified, they aim to make business decisions faster and more accurately through automation. Why emphasize culture? The limitation for achieving analytics maturity isn’t usually related to data or technology but, rather, people’s reluctance to use data and technology to answer business questions-in other words, using data analytics rather than intuition as a driver for business decisions. A shift is needed toward a culture that trusts that these data-driven decisions will be effective. Governance Attributes For this analytics process to function effectively, the data inputs (“raw materials”) must be consistent and reliable for the information outputs (“finished goods”) to be relevant and comparable. Relevant, reliable, comparable, and consistent are the four desired attributes of accounting information. Effective governance means that data used in decision making is of consistent quality and from reliable sources. Efficient governance leverages connectivity and technology to enable the comparison of data from many different sources and to deliver relevant analysis. Governance Problems Page | 19 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Unfortunately, only about one-third of data-driven culture initiatives succeed in larger firms. Often the reasons for failure stem from insufficient data governance. If an organization suggests its data quality is insufficient to use for decision making, that signals ineffective data governance. If the data can’t be accessed, that’s a symptom of inefficient data governance. Click each icon to learn more. Ineffectiveness. “I think you forgot to adjust your dates for time zone…”; “I’ve heard ‘churn’ defined three different ways today…”; “We can’t do that analysis; we don’t have good data….” All common phrases for emerging data-driven cultures who struggle with effective data governance. Issues with data and analyses, from little mistakes in calculations to instability in data sets, all build to create a culture of mistrust in data and, by extension, mistrust in any decision based on that data. The root cause of these issues is often ineffective data governance. Inefficiency. Certain phrases signal inefficient data governance: “Will you email me that database extract?”; “Why can’t I access that reporting database? There’s no way for me to get approval?”; “It’s going to take more than a week to access?”; “I’ll just get someone to build out a new data platform for my department.” Inefficient data governance is usually more difficult to resolve than ineffectiveness. Sometimes it’s necessary due to regulations, e.g., open-source language restrictions or the EU General Data Protection Regulation (GDPR); in other cases, it’s a symptom of an organization failing to commit sufficient resources to data governance. Often the easiest way to govern a data set is to block access. But blocking access can create inefficiencies: While no access means no one can compromise the data, it also means no one can use it to improve the business. Blocked Access. Often the easiest way to govern a data set is to block access. But blocking access can create inefficiencies. While no access means no one can compromise the data, it also means no one can use it to improve the business. Nonconnected Data Sharing is also inefficient. If users are constantly getting data from File Transfer Protocol (FTP) or email, then it will be difficult for them to create a report that updates automatically and impossible to automate decisions. Commitment to analytics governance means giving people access to data in a way that facilitates analytics maturity to create further value, even though that takes time. Principles for Implementation Page | 20 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Efficient and effective data governance provides clear ownership and standards for data and data processes to ensure data quality. Although many approaches exist for data governance implementation, most share the following principles. Click each icon to learn more. Accountability. There must be clearly defined ownership of, and accountability for, different types of data. Interconnected data managed throughout the organization requires consistent practices in order to maintain its effectiveness and value. In most organizations, data oversight doesn’t reside within one department. Human resources is the keeper of employee-related data, for example, while accounting maintains financial data. Shared governance brings consistency by establishing organization-wide policies and procedures. Standardization. Data is an asset and must be protected like one. Clear policies on access, definitions, privacy, and security standards are needed. The committee must define the policies, and each department head must ensure adherence. This approach will ensure that the organization is in compliance with regulations, such as GDPR (General Data Protection Regulation). Quality. Analysis is a critical tool for decision making and is only as good as the data upon which it relies. The quality of data should be managed from the time it’s captured. Good data governance includes defining one set of data-quality standards for the organization and establishing consistency in how that data quality is measured and recorded. We live in a fast-paced world where management accountants are asked to provide insight and foresight through analytics, often on short notice. Data governance helps ensure that our data is readily accessible and accurate. That’s especially true in situations where data is spread throughout disparate systems and departments. Often the analysis we’re asked to perform relies on data that we don’t oversee. By coordinating with other data owners in the organization, we can protect the integrity of data and spend more time on value-added analysis than on scrubbing data. Committee of Sponsoring Organizations' Standards for data and data processes to ensure data quality are outlined in various frameworks. The Committee of Sponsoring Organizations’ (or COSO’s) mission is to provide thought leadership through the development of comprehensive frameworks and guidance on enterprise risk management, internal control and fraud deterrence designed to improve organizational Page | 21 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script performance and governance and to reduce the extent of fraud in organizations. The COSO Internal Control - Integrated Framework, is a globally recognized framework for developing and accessing internal controls. The purpose of the COSO Internal Control – Integrated Framework is to help management better control the organization and to provide a board of directors’ with an added ability to oversee internal control. Internal control enables an organization to deal more effectively with changing economic and competitive environments, leadership, priorities, and evolving business models. The framework identifies five components that are of comprised of 17 principles. For more information, you can download the framework from the Resources link above. The COSO Internal Control ‐‐ Integrated Framework As a high-level overview of the COSO Internal Control - Integrated Framework, we’ll refer to the COSO cube and two of its principles that can be applied to data governance. These principles reside within the Control Activities and Information & Communication components of the cube. Click each icon to learn more about these principles and how they apply to data governance. Principle 11 states “The organization selects and develops general control activities over technology to support the achievement of objectives.” Ensuring an acceptable system of internal control over systems and processes will protect stakeholders and the data integrity itself. Management must carefully select appropriate control activities over the technology infrastructure that help ensure data completeness, accuracy, and availability of technology processing. Principle 13 states “The organization obtains or generates and uses relevant, quality information to support the functioning of internal control.” To achieve this, information systems should capture internal and external sources of data, and process and transform relevant data into information that is timely, current, accurate, complete, accessible, protected, verifiable and retained. Information must be reviewed to assess its relevance in supporting the internal control components. Although we pointed out two principles from two different components within this COSO Internal Control – Integrated Framework, factors relating technology must be considered throughout all five Page | 22 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script components and seventeen principles to provide reasonable assurance of appropriate controls. An effective system of internal control will demonstrate that all components are present and functioning to the achievement of objectives throughout the entire entity structure. Section 3: Introduction to Data Analytics Data analytics is the science of examining data with the purpose of creating actionable insight. It gives you the ability to react quickly to an increasingly complex, volatile, and competitive environment. Most organizations know that enhancing their analytical capabilities is critical to their success and survival-helping them gain a competitive advantage or helping them maintain their current market position by helping managers and organizational leaders make better business decisions. Data Analytics and Strategic Management Data analytics must be anchored in the entire strategic management process: strategic analysis, strategy formulation, strategy execution, and strategy evaluation. It is all about quantifying business issues and making decisions with more accurate and factbased data. Business analytics and business intelligence use data mining -- examining large databases to generate information and extract patterns, statistics, and modeling software to support datadriven business decision-making. Data Analysis Models There are many data analytics models available including regression analysis, classification analysis, customer segmentation, market basket analysis, and others. Let’s take a moment to define some of these models. Regression analysis is a statistical model used for obtaining an equation that best fits a set of data and is used to show relationships between variables. Classification analysis attempts to find variables that are related to a categorical (often binary) variable. Customer segmentation, also known as clustering, groups customers into similar clusters, based on the values of their variables. This method is similar to classification except that there are not fixed groups. The purpose of clustering is to discover the number of groups and their characteristics, based entirely on data. Page | 23 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Market basket analysis tries to find products that customers purchase together in the same “market basket.” In a supermarket setting this knowledge can help a manager position or price various products in the store. In banking or other retail settings, it can help managers to crosssell (sell a product to a customer already purchasing a related product) or up-sell (sell a more expensive product than a customer originally intended to purchase). It is important to note that Artificial Intelligence and machine learning also play an important role in performing data analytics. Now you will be presented with an overview of how data analytics can be used to create value in an organization. organization. Video: Value Creation Through Data Analytics Video Transcript (Est. time: 3:44 min): Technological advances in gathering and processing data are increasing at an unprecedented speed. Data analytics includes the extraction and analysis of data using quantitative and qualitative techniques to gain insights, improve predictions, and support decision making. For management accountants, the ability to exploit this data through meaningful data analytics is a critical component of the accounting and finance profession. In their evolving roles, management accountants will be required to acquire enhanced skills in data mining, analysis, and effective communication through data visualization. As business partners, management accountants become storytellers, providing relevant hindsight, insight, and foresight to their organization, effectively turning information into intelligence through data analytics. Let’s take a few moments to review the key aspects of data analytics and their contribution to the value creation within an organization. Value is added to an organization as the sophistication level of the analytics being performed becomes more robust. We begin by leveraging historical data of an organization to provide hindsight into what happened. This type of analytics is known as descriptive analytics. Many accountants leverage tools in Excel, such as pivot tables and graphing to perform descriptive analytics. There are also many other tools in the market being used to perform this type of analytics. Page | 24 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Although understanding what happened is very important, further hindsight explaining why things happened is even more valuable. The type of analytics that help us understand why things happened is referred to as diagnostic analytics. Tools often used to perform diagnostic analytics may include Excel’s customer segmentation, what-if analysis, and multi-variable regression. Here again, other software products in the market are also being used. Up to now, we’ve defined analytics that provide hindsight. What if we could gain insight into what is likely to happen going forward? Predictive analytics adds value to an organization by attaining this type of insight into what is likely to happen. Accountants can access Excel tools like exponential trend smoothing, and solver to perform predictive analytics. As software sophistication continuously improves, the market will continue to expand upon product offerings that provide this type of insight. Taking the value added through analytics even further, Prescriptive analytics help organizations attain foresight needed to decide what an organization should do or what actions should be taken to create added value. Applying some of the predictive analytic tools from Excel may also help in performing prescriptive analytics. Finally, Adaptive analytics help further gain value by providing additional foresight into how machine learning can help. As organizations continue to innovate new technologies with artificial intelligence, more adaptive analytics will be applied. Advances in technology and analytics is rapidly changing the role of the management accountant, how their work is done, and which types of accounting functions are becoming obsolete. Embracing the changes brought on by artificial intelligence and further technological achievements, will enable management accountants to play a more proactive role in providing insight, transparency, and foresight as valued business partners within their organization. Understanding what it takes to perform relevant data mining, value added data analytics and effective communication through data visualization are critical competencies associated with the future of the accounting and finance profession. Page | 25 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script ABC Electronics Data Analytics You’ve just heard about how management accountants use data analytics to make informed strategic decisions and increase company value. Data analytics provide hindsight, insight, and foresight for helping companies achieve their goals. Let’s learn about how the five types of data analytics can be used to tackle a declining sales problem using a fictional scenario. Let’s find out: How the sales decline was identified and why it was problematic. What they learned about reasons for the sales decline. What options they had to choose from for addressing the problem. Which options they chose to meet their goals. ABC Electronics – Sales Decline Data Mining ABC Electronics is a retailer of household electronics products including televisions, refrigerators, washers and dryers, and more. Through data mining, they discovered a decline in Brand C television sales. ABC Electronics has an interactive dashboard that provides insightful visualization for analysis and decision making. Take a moment to review ABC’s sales dashboard for their line of televisions. Then click next to continue. Data Analytics Phases Use the interactive analytics value chart to learn more about how ABC Electronics made the most of data mining and modeling to address the declining sales. Working from hindsight to foresight, click on each analytics type to discover ABC Electronics data revelations and decisions. Descriptive Analytics Descriptive analytics define a business problem. Raw data was cleaned, transformed, and summarized. ABC then used this data to create their television sales dashboard. Page | 26 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script The dashboard’s drill-down capabilities for descriptive information is how the Brand C sales decline was identified. The dashboard revealed overall television sales was up 5% but Brand C sales decreased 2%. Diagnostic Analytics Diagnostic analytics provides possible reasons for the business problem. It tries to find correlations of the product’s activity using data. In the case of ABC Electronics, Inc., diagnostic analytics tries to find possible reasons for the sudden decline in the sales of Brand C televisions by searching for relationships between data attributes and product activity. For ABC Electronics, attributes are factors that could have an impact on sales include: Time of year (seasonality) New comparable TV from competitor Product reviews Product price Social media mentions Statistical modeling looks at the relationship between the attributes and sales. Visually, the more linear the relationship, the better the results. What did ABC’s analysis find? Reasons related to the attributes are: 1. The decline happened in July and August, a typical slow sales period. 2. The competitor’s product sells for less. 3. Customer reviews gave average ratings Predictive Analytics Predictive data analytics can provide insight about what might happen given the current circumstances. This is done through building analytical models, such as regression or market basket analysis, to compare with actual results. Based on their analysis, ABC Electronics’s prediction is that Brand C sales will continue to decline Page | 27 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script due to the attributes of new competitor product, pricing, and customer reviews. This predicted decline will result in leaving ABC with twice as much inventory as desired. Is that what happened during the next sales cycle? The results from actuals in the next reporting period matched the prediction. Brand C sales declined 2%. With the prediction proven correct, the model will be adjusted by assigning higher weighted scores to the attributes associated with the decline in sales. Prescriptive Analytics As we just learned, predictive analytics are a company’s attempt to foresee how changes they make might address a business need. Taking it a step further, prescriptive analytics uses complex modelling to suggest actions to take in order to achieve possible outcomes. The accuracy of possible outcomes depends on two important factors: data quality and model quality. With the prediction of twice as much Brand C inventory than needed, what do you think ABC’s best options are? Two possible actions were identified: 1. Offer a 30 % discount which predicts sales will improve 10%. 2. Offer a package discount for purchasing a soundbar with the television because analytics showed customers frequently buy a soundbar when buying the television. Adaptive Analytics Adaptive analytics is another resource to provide foresight about what might happen given what has happened so far. Machine learning models incorporate actual results and continuously adjust based on new data received. Adaptive analytics collect Big Data into one central repository. Included in Big Data is information related to sales, marketing, email, websites, and content management systems. Machine Learning models use this data to make more accurate predictions. Analytics at this stage are useful in validating the accuracy of predictions. For example, if a prediction recommends a change in marketing strategy, adaptive data analytics tells whether the change is working and how to adjust the model if it isn’t. Page | 28 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Adaptive analytics can also help find unrelated trends such as customers who recently purchased homes and bought a television are also likely to buy five additional electronic products. What did ABC Electronics do? ABC electronics used adaptive modeling to build customer profiles based on customer behavior data collected from web ads, emails, text messages, etc. Using the information from these specific customer profiles, ABC Electronics will plan to offer personalized promotions such as pricing discounts and package deals to increase sales and effectively reduce inventory. ABC will continue to adjust their analytical models as new actuals data is received. With continuous monitoring and adjusting, ABC will be equipped to make the best use of the data available for making effective strategic decisions. Analytics Value at Your Organization Time and more data will tell if ABC Electronics is able to reduce the excess Brand C inventory. While thinking about your organization and data analytics, what information is already gathered that can tell a story about a business need? Or what information is not gathered that could be helpful? How can the types of data analytics help drive decisions in your organization? Data Analytics Competencies According to IMA’s Management Accounting Competency Framework, within the “Technology & Analytics” domain, data analytics competencies encompass extracting, transforming, and analyzing data to gain insights, improve predictions, and support decision making. While the required level of competency may vary, at a minimum it must include knowing what types of analytical models are available and to what business problems they can be applied. Beyond that, important, perhaps essential, skills include the ability to transform raw, unstructured data into a form more appropriate for analysis (data wrangling), the ability to mine large data sets to reveal patterns and provide insights, and the ability to interpret results, draw insights, and make recommendations based on analysis. At the conclusion of this module, we will begin to work with data sets and apply analytical models to begin making business decisions. For more information on the IMA Management Accounting Competency Framework, refer to “Management Accounting Competencies” under Resources on your player. Page | 29 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Concluding Thoughts No Longer a Human Endeavor Accounting is no longer solely a human endeavor. Similar to what has already occurred in manufacturing, transportation, and medical industries, robots are now performing previously people-driven accounting and finance tasks, including transaction matching, variance analysis, and reconciliations. Exponential Growth Technology is transforming the management accounting profession at an accelerated rate, but management accountants should not be concerned about job security or possessing the skills to adapt to their new roles in the digital age. RPA, AI, blockchain, and any other technological advancement that contributes to the exponential growth of available data can and should be properly leveraged to make strategic business decisions. Strategic Business Partner When financial data is available within minutes, management accountants have the time to serve as true strategic business partners and analyze and visualize the data so organizations can respond more quickly to the marketplace, capitalize on innovation opportunities, ensure continuous integrity, and, most importantly, uphold stakeholder and consumer confidence. In the next module, we will focus on the development and practical application of data analytics and data visualization for accounting and finance professionals. Module 1 Wrap‐up You have completed Module 1 of this course and should now be able to: Recognize the impact of technological advancements on the finance and accounting profession. Describe the typical life cycle of data. Explain the impact data governance has on business and its stakeholders. Define the various types of data analytics and how each progressively creates value within an organization. Page | 30 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Module 2: Visualizing the Present & Predicting the Future Course Roadmap Click Next to proceed to Module 2, Visualizing the Present & Predicting the Future. Introduction Welcome to Module 2, Visualizing the Present & Predicting the Future. The goal of this module is to gain a deeper understanding of data analytics and data visualization through the fictitious scenario depicted in the case study, Huskie Motors Corporation: Visualizing the Present & Predicting the Future. The case study is available in the Resources link of this course. You are not required to read the case study ahead of time, however, it is available to download if you prefer to print it and follow along. When you're ready click to get started. Module 2 Goals Upon completing these lessons, learners will be able to: Define data visualization. Describe how data visualization can impact the way data is communicated. Identify various data visualization tools and their different uses. Recognize the importance of choosing the right visualizations based on your audience. In Module 2, Knowledge Check questions are dispersed intermittently throughout the lessons. There are a total of six knowledge check questions among the three lessons. Be sure to answer all of the six questions in this module before moving ahead to the next module. Module Menu Module 2 consists of three lessons: Page | 31 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Lesson 1: Case Study Introduction Lesson 2: Huskie Motors Operations, and Lesson 3: The Data Dilemma. Click Lesson 1 to proceed. Lesson 1: Case Study Introduction Section 1: Introduction to the Case Study In order to effectively work with, analyze, and make business decisions from the abundance of data made available through automation, we have made it clear that solid data governance is vital to the success of an organization’s strategy. To establish a practical understanding of data analytics and data visualization, we will now examine the opportunities and challenges created by Big Data and demonstrate how management accountants can apply new competencies to the various data processes of an organization. We’ll now begin the case study Huskie Motor Corporation: Visualizing the Present and Predicting the Future. Before we begin, it’s important to note that Huskie Motor Corporation and all of the characters that are introduced are fictional and have no relationship to an actual organization. Big Data Organizations create and collect massive amounts of data as a result of their day-to-day operations. Frequently referred to as Big Data, it represents an important asset for the organization. Big Data presents both opportunities and challenges for accounting professionals, who are expected to know how this data is created, collected, stored, and accessed. As the custodians of the organization’s assets, accountants are expected to understand and implement controls over the storage and use of the organization’s data. Big Data and the Management Accountant Page | 32 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script As business professionals, accountants are expected to know how to use this vast source of data to make better business decisions and identify potential risks. Understanding how to use Big Data to formulate and solve business problems provides an opportunity for the accounting professional to become a forward-thinking strategic partner in the organization. The challenge for accountants is to develop the skill set needed to extract value from Big Data through advanced analytics. Data Visualization One skill that is becoming increasingly important for analysis of large data sets is data visualization. Data visualization is the process of displaying data to provide insights that will support better decisions. Gartner’s 2017 Magic Quadrant for Business Intelligence and Analytics Platforms states that “the visual-based exploration paradigm has become mainstream.” Gartner identifies three platforms as leaders in visualization software: Tableau, Microsoft, and Qlik. All three products provide relatively easy-to-use data visualization tools. In this lesson, you’ll also learn about some of the other platforms that are available in the market. As new software is introduced to the market, there may other data visualization tools that are leading in the industry. Throughout this module, as you progress through the Huskie Motors case study, you’ll be taken through a series of videos that discuss data visualization topics such as knowing your audience, selecting the best visualization in context of the business question, various visualization software, and much more. Dan Smith, accounting professional turned data scientist and leader in data analytics, will take you through these important concepts on data visualization. Video 1: Data Visualization ‐ Why Visualization Now let’s hear from Dan Smith where he further introduces the concept of data visualization, discusses why visualizations are important, and emphasizes the role and function of data visualization in analyzing Big Data. Video Transcript (Est. time: 9:34 min): In this series of videos. We’ll review the tools often associated with creating data visualizations and provide a little context of when you would use what tool or what visualization in what scenario. As the case study progresses, you’ll see situations where you need to understand for whom you are creating. The visualization for that is understanding your audience, and understanding your audience means, does this Page | 33 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script audience need to explore the data and find their own answers to their own business questions, or does your audience simply need to be informed of the answer, or do they want to report only providing a repeatable answer to a well-defined question? We’ll use these concepts of exploratory, informative, and reporting audience needs as a framework to talk about all the visualizations concepts associated with those visualizations. We’ll also dive into the source of the data and where your data visualization tools live. The data environment concepts, again, that’s both for the data visualization tools and how the audience receives the information provided by those visualization tools, the visualizations that you create. Finally at the end of the case study, we’ll discuss the future of data analytics and the role of the management accountant in data analysis. With that out of the way, let’s talk about why visualizations are important. The formal definition of visual analytics is defined as the creation and study of the visual representation of information or the process of displaying data to support decision making, but what does that mean? What the creation and study of a visual representation of information, quote unquote, is really saying is that we’re simply taking data and using the data, the raw data to create information using a visualization. Now there’s an important distinction between data and information. Data are the raw materials for the creation of information data. Our numbers, rows in a table there, transactions before anyone sees them. Information is processed. Data, data, which is processed in such a way to make some type of decision or gaining knowledge about the data or what that data represents. And what about the need for visualizations? Why are they so important? In modern analysis, visualizations are closely associated with how humans consume information, how humans take data and process it as information. You see, humans naturally use visual indicators to understand their environment. Way back at the beginning of humanity we would use visual cues to know if a plant was ripe or a saber-tooth tiger was about to attack us. Well, understanding a bar chart or a line chart is not exactly the same as understanding the threat of a wild animal visibility into the state of our environment. Specifically the business environment is nonetheless very important for the modern human, and because humans are so well adapted to processing massive amounts of information visually, we can use visualizations to communicate complex ideas and interactions between groups of data using visual representation of those and ideas and interactions. Using visualizations is much more efficient than if we had to communicate that Page | 34 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script information through text or speech alone. For example, imagine explaining double-entry accounting using only your words, no pictures, T tables, ledgers, or whiteboards. I remember my first accounting professor talked about debits and credits being on the right hand and the left hand side of a, of a, a T table over the general ledger. He even stood on the table and had debit and credit written on his shoes to reinforce that visual concept. You see, he was being a great professor because he understood that we had to have a shared visual context of what was being explained. Without visualization, without something tangible, you cannot know what the other person is visualizing in their mind. It’s very difficult to be on the same page to share a vision if you don’t have something to see in the first place. So visualizations, in other words, create shared context between the audience receiving the information and you the individual that’s communicating this context to them. Speaking of shared context, let’s do a quick overview of the visualizations that we’re going to be covering. Now, you may be unfamiliar with some of the terms that I use in this quick summary, factors, groups, aggregates, distribution trends, etcetera, or unfamiliar with the visualizations themselves. Don’t worry too much about that. Just make a note. We’ll define all of these later, either in the videos, in the course content, or through reference material which I’ll provide to you. So let’s get started on this quick summary bar charts for comparing single values or aggregate values between a small number of groups, a small number of groups that all share a common numeric baseline. Bar charts are very effective when comparing the composition of groups or factors within a larger group. A pie chart for a single larger group or a stacked bar chart if you have many larger groups. Are the visualizations typically used? When we display trends over time, we use lines or line graphs when we have two separate fields, two separate sets of values, and we want to explore the relationship between them that’s facilitated by scatterplots or principal coordinates plots. When we look for the relationship between many groups within those two values, so multiple factors within a set of two values, we use a colored size or shape scatterplot or we may use something like a heat map to indicate high vs. low values within a table. This is known as a heat map just because it looks like there’s hot and cold areas on your table. You’d say a way of getting a quick visual indicator of hot spots while at the same time providing raw numerical context. Going onto more advanced visualizations. When we want to look at statistical measures or Page | 35 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script traditionally statistical measures like average mean median values, these are known as measures of central tendency. We need to understand how the data is distributed around that central tendency, so we use visualizations like histograms to see the distribution of data around an average mean, median so that we can see how spread out the data may be. If we want to compare multiple groups and their central tendencies and distributions, we would use a box whisker or a candlestick plot. When we have a lot of information that we want to communicate and perhaps we need to add an interactive component to it, allowing people to explore that information, we would use a dashboard. A dashboard is a combination of visualizations oftentimes with either filters which allow us to modify the underlying data that populates the visualizations or, that is, filter the data. Either filters or we can include a drill-down component where you have one visualization interactively modify another visualization. I would be able to click on one so that I could see more details about that visualization. That was a very fast summary. We defined visualization as the creation and study of the visual representation of information. We learned visualization is important because it assists with informing an audience and the exploration of data. Finally, we introduced many visualization techniques. Again, don’t worry if you’re unfamiliar with some of the visuals or the language I mentioned. We’ll go into more detail later and also there may be some visualizations that you really like or that you’re really interested in that I didn’t mention, like a Sankey diagram, waterfall chart, or ribbon chart. We’ll dive into the core visualization concepts used across all of these visualizations, so the underlying shapes and lines, the mechanisms of how they’re communicating information. This will allow us to understand the fundamentals of all visualizations, visual analytics, everything from simple bar charts to complex infographics that you find online on websites. I look forward to this journey, individual analytics with you. Next up, we’ll be talking about the tools used to create visualizations. Video 2: Data Visualization ‐ Tools of the Trade While the case study identifies three platforms as leaders in visualization software, there are many others to be aware of. Dan will now discuss the various visualization tools, and the strengths and weaknesses of each. Page | 36 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Video Transcript (Est. time: 13:37 min): Let’s begin our discussion on visual analytics by discussing visual analytics tools. In this video we’re going to talk about what, exactly is a visualization tool and why are they becoming more and more prevalent in business. We’ll cover the general concepts applied by visualization tools so you can begin to feel more comfortable using a wide range of tools as opposed to feeling like you have to specialize in just one tool over another. And finally we’ll outline the general categories of types of visualization tools, the spreadsheet- focused ones, the ones that concentrate on visualizations first, the workflow-centric tools, and the code-based tools. In the case study, they talk about how the visual-based exploration paradigm has become mainstream. Now that’s a very consultanty way of saying that people now expect to be able to connect and explore their data in a visual platform. Well, what do I mean by visual platform? Well, literally anything that can connect to a data source and modify the data in a way that displays a visualization can be called a data platform. Okay, fine. So what do I mean by data source, and what do I mean by modify data? Well, we’ll get to all of that towards the end of the case study when we’re talking about the data and data environment concepts. But for now, understand that particularly when it comes to Big Data, you’re not going to be able to pull all the data into a single spreadsheet because the data is simply too large. But more on that later. Let’s focus for now on the tools themselves. So what, what do I mean when I say tool? Because any of the tools, Tableau, Power BI, Click, Excel, Jupyter Notebooks, R Studio, RapidMiner Nine, the list goes on and on. Any of these tools are able to create pretty much the same visual analytics. They just have their own way of doing it. And that’s because all these tools apply the same fundamental concepts. And I’m not talking about just visualization concepts, I’m talking about the highest-level analytics concepts. You see we tend to use words like tools, concepts, and technology interchangeably, but technology is the application of concepts using a tool. Visual analytics tools apply analytics concepts to create useful information. Competency in a tool is knowing how to build a bar chart in Tableau or in Power BI. Competency in the concept is knowing when to use a bar chart or should you use a line chart or should you use something else instead? How does the data need to be shaped so that you can do that? How do I connect to the data? There’s a bunch of underlying concepts that make these tools easy. And if you think about it, when you hire a contractor to oversee the construction of your house, for example, would you care what kind of hammer they used? Probably not. The tool isn’t as important as the person’s capabilities. Just like when you first learned to be a management accountant, did you have to learn Excel first? I doubt it. I Page | 37 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script don’t remember there being questions on Excel when I passed the CMA. Building an analytics solution, visualization or otherwise, can be viewed through the same lens. Our concern is how we are applying the visualization concepts so that they can be understood. Now, this may seem like an unimportant distinction. Why am I going on and on about this? Because it’s so important that you understand the difference between a tool and the concept the tool is applying. That’s how you can see me jumping from Power BI to Tableau to RapidMiner to all these different tools. It’s not because I’m a genius. No, I’m no smarter than anyone else. It’s just I know the underlying concepts so I can easily jump from one to another vs. having to learn specifically how to do something in one tool. We’ve talked in depth about why concepts are important. What are the concepts? Same analytics concepts used in data science. You connect to the data, you transform that data and information, and you represent the data in an understandable way. The specific concepts will be covered in the visualization concepts video, and the data concepts and connecting to data concepts will be covered in the data and data environments concepts video. But at a high level we connect, we transform, we represent. So what are the categories of these tools? Well, and this is purely my personal experience. I feel there are roughly four general categories to which visualization tools fall under. The categories that I found are a spreadsheet-focused, things like Excel, Domo, the, a visualization first, the ones where they focus on you being able to create a visualization. Those are the famous ones like Tableau and Power BI. Then you have ones that are centered more on the workflow, workflow process management utilities like RapidMiner and Nine. Finally, code-centric tools like Jupyter Notebooks, Zeplin. Even our studio notebooks like let’s dive into each one of these. So first are our spreadsheet focus tools and Excel is the bread and butter of accounting. Many visualization features have been added to Excel. Excel itself is not a terrible visualization platform, particularly if you’re using pivot tables and pivot charts instead of making new tabs and new data sets for each visualization. Pivot charts actually work a lot like the other visualization tools, so if you’re using those, you’re probably comfortable with the visualization focus tools already. Because of Excel’s market dominance, many apps have tried to mimic or extend Excel’s functionality. After all if you can’t beat them, join them. Apps like Domo, Pitch that you can do advanced analytics within Excel, but there’s a lot of operations happening on the back end. I’ll put up a little information in the course content to links to some of the platforms. But Page | 38 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script I’d wait until you’re finished with all of these videos before you start exploring them so that you can better understand the data and data environment concepts. I hear people say that I need to see my data as a common complaint on the other tool categories, particularly for people that started with Excel. I mentioned before, though, that sometimes you have terabytes of data. You have so much data, you can’t bring it into your computer. It’s just not practical to see all of the data. It won’t fit onto your computer. And even if it could, there’ so much of it that seeing each individual row isn’t going to provide that much value to you. There are plenty of ways where you can click on a visualization, a component of visualization, and see what the details are about that point. That’s called drilling down, which is a component in dashboards. And we’ll talk about that in depth in a future video. For now, sticking to the tools themselves, one of the primary ways of creating a dashboard, particularly interactive dashboards, are the visualizationfocused tools. These visualization-focused tools might be more likely called focused platforms since they tend to encompass both the transformation of data, data workflows, and the visualizations themselves, however, they’re visualization first and they tend to abstract a lot of those data transformations. So if I say abstract, it means that the tool is determining how to perform an operation, not the user, which may not always be ideal, particularly if you need to communicate how that tool is doing the transformation. Since you can’t see the back-end operation with visualization tools, though, as their name implies, they can look gorgeous. Those gorgeous visuals abstract a lot of complex operations down to relatively simple steps, so the good-looking visuals can be created quickly because of all that abstraction, you can create interactive data, connected dashboards. Those dashboards can be made into reoccurring data, connected reports. They can extract large data source queries from applications like Hive, Spark, Big Query, Redshift, etcetera, for those exploratory operations. The major players in visualization-focused platforms are Tableau, Power BI, Click, lesser-known players like Burst, Pentaho, Spotfire, and if you don’t have access to one through your workplace and you want to learn, both Tableau and Power BI have free versions, Tableau Public and Power BI Free License, respectively. In terms of the capabilities of these platforms, there’s pros and cons to all of them. If you want to use AR or create some of the more advanced visualizations, Power BI is considered by some to be a bit better, however, they use an underlying language called Dax, D-A-X, in order to create functions. So if you need to create custom functions or custom operations within the platform, there’s a pretty steep learning curve. Page | 39 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Whereas building custom calculations in Tableau, it’s generally easier because they have a more abstracted, a SQL language that makes it easier to create custom calculations. But that level of abstraction can make it hard to extend the platform beyond making a custom calculation. So due to the difficulty in exploring the workflow and the different operations used to create a visualization or an analysis, tools that are focused more on the workflow, the transformations that are done to your data set. So if you look at things like RapidMiner 9, they’re great for exploratory data mining and for creating visuals of the data exploration. All you really need to know is search the name of the concept you want to apply and you’ll see a module for it. They can be a little confusing for beginners and they often require a lot of local processing, but these are a good middle ground between the abstraction present in visualization tools and the more detailed operations found in code focus tools. So speaking of the code focus tools, things like Jupyter Notebooks, our notebooks is that plan. These are visualization tools second, code management tools first. They are fantastic for exploratory data analysis because you have a clear record of every transformation that happened to that data. You have a clear record of everything that’s happened because it’s right there in code. In the notebooks, you can write your research notes and your findings directly in the notebook and have the code recompile and be completely reproducible on demand. However, because you need to know the fundamentals of coding and that’s beyond the scope of this course, we won’t be covering code-focused tools in these videos. However, I know you are a data scientist and I know you already know the basics of coding even if you don’t think so. I highly recommend you watch some introductory videos on BASIC, Python, or R to learn the concepts. So in conclusion, in this video we covered what is a visualization tool, how did they apply the same analytics concepts, but focused on different aspects of the analytics workflow and or/user experience. The analytics concepts they apply are connecting to data, transforming the data into information, and representing the information in a usable format. Visualization tools may focus on spreadsheet data, the visualizations themselves, the workflow to connect, transform, and represent data and information or the underlying code behind all those data operations. The different tools all have advantages and disadvantages, and you’ll be able to accomplish much the same in all of them with varying degrees of difficulty and effectiveness. What’s more important is how you use the analytics concepts to provide information on solved business questions as well as explore what they do to find new business questions to solve. We’ll introduce concepts around communicating results in your various audiences for Page | 40 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script visualizations in our next video. Section 2: Huskie Motor Corporation: Background Huskie Motor Corporation (HMC) is an automobile manufacturing company with production and sales throughout the world. Automobile manufacturing and sales is a complex and highly competitive business. Let’s explore the Huskie Motors background in detail to learn key business information about the organization. Huskie Motor Corporation Although the automotive industry has a broad global reach, only 15 companies produce 88% of the world’s vehicles. HMC is a new and a smaller player in the automotive manufacturing market. If it is to survive, the company must fully understand its markets, customer base, and costs to keep profit margins positive. It has some very popular brands and high customer satisfaction-both are critical assets at this stage of the game. HMC: Miranda Albany Miranda Albany was hired at HMC as a senior cost analyst three years ago, when the company first began operations after a spin-off from Blue Diamond Automotive, a large auto manufacturing company. Recently promoted to assistant controller, Miranda is anxious to make a good impression on her boss. Communication with HMC Executives Although Miranda is sure that the data she has collected can help her management team make better decisions, she does not have the time, or expertise, to figure out how to organize or use the data effectively. Miranda communicates with HMC’s executive team on a weekly basis to convey vital information regarding marketing strategies, sales targets, and production needs. Yet she feels that her information is often “lost in translation,” as the executive team struggles to digest the numbers. Miranda believes that data visualization may be a crucial component in helping her effectively connect with HMC executives. Page | 41 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Video 3: Data Visualization – Know Your Audience Miranda’s struggle to effectively communicate to her executive team highlights the importance of understanding the audience’s needs, and how visualizations answer business questions. Dan will discuss how to select visualizations based on your audience. Video Transcript (Est. time: 13:34 min): In this video we introduce the concept of understanding your audience interest. That is a business question and how it determines the types of visualizations you should use. We may create visualizations for audience interest of exploring data or perhaps informing the audience or simply reporting a value. In our previous video, we introduced high-level analytics concepts. In this video we overlay those concepts of connecting to data, transforming data, and representing the information created by transforming the data, we overlay those concepts with the data analytics process introduced in the case study as well as the audience needs. Furthermore, the audience needs, the audience interest for visualizations also overlaps with analytical maturity concepts. The analytical maturity concept stages are descriptive, diagnostic, and predictive. Finally, we spend a little time on what exactly do we mean when we say business question and what does that have to do with the audience interest. So first let’s talk about the audience. It’s easy to assume that visualizations and the concepts associated with visualizations are one size fits all. We will soon learn in an upcoming video that there are rules for what visualizations work best in a given scenario. So it’s only natural to assume that those scenarios are based on the underlying data or information only. Who’s receiving that visualization should be irrelevant, right? And sometimes there are boilerplate best practices. Sometimes there are universal best practices for visualizations. However in most cases, context matters. The audience matters. For example, if the visualization explains information, we discovered in data that is an informative visualization. The visualization would be very different than the numerous visualizations we would have used to explore the data in order to discover that new information that we are now informing someone else about. An informative visualization is simple. It's to the point. And exploratory visualization is usually quite complex. Page | 42 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Understanding which visualization to use and for whom is an important concept when it comes to visual storytelling. So visual storytelling is what you’re trying to get across. The idea of visual storytelling means that your audience is receiving the information that they need in order to best use that information. So somebody who would need a more exploratory type of visualization or even an interactive dashboard would be someone like an analyst, a manager, a consultant, somebody who really understands that data and needs to understand the entire process of how some insight came about to really have by and to really understand what’s happening. Alternatively, someone like an executive, a VP, a director wouldn’t really care that much about how you reached some conclusion. They would just want to know what that conclusion was. You would still need the underlying evidence of how you reached the conclusion, of course, but it’s just the facts. Tell me what matters. Those are simple informative visualizations, and when the exploration has informed people so much that they now are repeatedly asking the same question, now that becomes something more along the lines of a report, very similar to informative but now we’re talking about something that’s reoccurring, something that’s about stability. So these would be, for even an external audience, would be a reporting-focused audience, but often it would still be the same people as exploratory and informative, just a very, very, very, simple type of visualization. And if you notice that I talked about that progression, that general process of taking data and creating information for an audience, well that’s similar to the data analytics process that we outlined in the case study. You’ll see all these steps identifying the business question connecting to data, cleaning it, etcetera. All of those involve exploratory visualizations. Once we’ve gotten the data and explored it, cleaned it, explored it again, and believe we have an indication of the answer to the business question. Well, then we use more informative visuals. Informative visuals that clearly explain the answer that we’ve determined through our exploratory effort. Many times the information that we’ve obtained isn’t a one-off answer. The information can provide an ongoing solution. For example, many of the financial metrics which we’ve all learned in earning our CMAs or our basic finance courses when we first started our job as a financial analyst. Those metrics, those measures, like price-to-equity ratio, like CAPM, like earnings before interest, debt and taxes, even debt-to-equity ratio. These are not metrics or KPI, that, KPI that have always existed. Somebody originally discovered that they are an indicator of business performance so whether informative visualizations answer a question that is asked repeatedly, often that informative Page | 43 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script visualization will become a report. A report again is a visualization, a reporting visualization. Even a table is a visualization, needs to be extremely precise and communicates the answer to a business question over and over and over again reporting. Like financial reporting and visual reporting these are very rules-driven, it’s very datadriven and requires more capability around consistency and the inference structure of the data environment rather than necessarily the competency of how effective the visualization may be at communicating the report. The effectiveness of the visualization has already been demonstrated in the exploratory and informative steps and conceptually the idea of reporting informative, exploratory, etcetera, that also aligns with the concepts of predictive capability, descriptive, informative, the steps of analytic maturity in analytics maturity is a business theory indicating that as an organization becomes more capable of using their data to make informed business decisions, i.e., analytics, they will gain more value from their data. The steps in analytics maturity are descriptive, diagnostic, and predictive. Many of you will have also seen this before with prescriptive at the end, beyond the scope of this video but it’s out there when you look it up. Going back descriptive analytics, descriptive analytics merely communicates some historical event and leaves interpretation wholly up to the end user. Just like a financial report. Consistency and speed are typically the most important aspects of descriptive analytics. Diagnostic analytics, this goes a step further than descriptive. It’s not just describing some attribute of data, it informs the audience with additional information either to help the audience reaching more informed conclusion or to inform the audience about the analytics conclusion. As you can see, this is similar to informative visualizations. You will come across this in financial analytics. When they’re taking a metric found in a diagnostic measure and they’re adding additional information to provide context as to why that metric may have gone up or down. Predictive analytics uses statistical models to forecast new information. Given historical information such as changes in a KPI and the additional data around it, it’s using that to provide a forecast of what could be a future value. Obviously, financial forecasting is a good proxy or is a good example of predictive analytics. Now, let’s talk about business questions. A few times I’ve said understanding the business question or defining the business question, what does that mean exactly? Pretty much what it sounds like. It’s the question we need to answer to help achieve our objectives, the objectives usually solving some problem or fixing a situation. But as we know, a simple definition, a simple question, can hide a great deal of complexity. A business question can be interpreted as what problem are we trying to solve? And in Page | 44 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script many cases, though, if we can define the problem, somebody has already figured out how to solve it. Identifying the business question is by far the hardest part of this entire process. Case in point, I once had a client in the course of six months, this client rejected two different visual analytics contractors I had placed at their firm. I personally went to that client at my company's expense, shadowed the third contractor for a week, and during that time I found that while the client had asked for a highly technical resource, somebody that was proficient in the visualization platform, they weren’t really defining their business questions. The client wasn’t defining what they needed out of the visualization platform. Their process of defining their business questions was just a kind of have a roundtable discussion with the contractor about what their problems were in general. The meeting was more of an airing of grievances rather than a proper request requirements-gathering session. This was my mistake. I was young and I thought if somebody was asking for specific skills, for example, a Tableau developer, then they may only need those skills. I looked only at the problem as it was defined, not the root problem itself. The lesson here is that just because somebody communicates their problem does not mean they are communicating the problem. The question that needs answering, this is the real skill of an analyst, is identifying what question needs answering, not just the problem as it’s stated. Trying to solve a problem vs. determining the actual problem which needs solving is the difference between a junior developer and a senior analyst. It’s the difference between a purely technical skill and a broader awareness of the business. It’s the difference between being reactive and proactive. That may seem obvious to you but that’s because you’re probably one of those senior resources, you’ve already accomplished the hard part. In this course it’s assumed the root business question is already defined. It’s like how introductory physics assumes everything is in a vacuum. Assuming the root business question is defined is a necessary assumption for you to learn the technical competencies to answer said business question. And many of you are taking this course because you feel the need to increase your technical analytic skills. Those skills are important. They are the price for entry. Take it from me, management accountants are probably the best people I’ve ever met at understanding the real problem, at defining the actual question. So use my hard-earned experience, if you’ve already mastered the hard part of this process, and that’s understanding the bigger picture, the bigger needs of the business, you’re already well on your way to being a data scientist, a Page | 45 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script data analyst, whatever you want to call yourself. In conclusion, in this video we’re beginning to categorize our visualizations based on the intended audience for said visualizations. We may create visualizations for exploring data, informing our audience, or simply reporting a value. The audience for visualizations overlaps with analytical maturity concepts. Exploratory visuals are often used in predictive and diagnostic analytics, while informative visualizations are primarily in descriptive and diagnostic analytics. Finally, because of the goal of analytics is ultimately to either answer a business question for your audience or identify new business questions to answer, in which case you are often the audience, we discussed what a business question actually means. In our next video, we’ll dive deeper into those visualization concepts we keep talking about. Stay tuned. Miranda Hires a Consulting Firm To help her utilize the massive amounts of data at her disposal, Miranda has interviewed consulting firms that specialize in information technology (IT) and data engineering. Ultimately, Miranda chose D & A Consulting because of its automotive industry expertise and its focus on data analytics and visualization. Video 4: Data Visualization – Visualization Concepts D&A Consulting’s expertise in the automotive industry will enable them to choose the best visualizations to effectively communicate their findings. Now we’ll hear from Dan as he discusses some basic visualization concepts and the various ways data can be visualized. Video Transcript (Est. time: 6:32 min): Now that we have some awareness around how our visualizations will be received by different audiences, let’s move our discussion towards the visualizations themselves. This video concentrates on visualization theory, namely how shapes, lines, and points are used to display information. Shapes represent a single value or groups of values. Lines represent change or connection. Points represent the intersection of two or more values. Page | 46 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Just as even the most advanced statistics are still based on addition, multiplication, and division, even the most complex visualization is formed through some combination of shapes, lines, and points, and in this video I’ll be white boarding, because white boarding is how we often start defining questions or creating visualizations. In practice it’s good for us to see some of those real-world principles in action, and yes, that means that [we’ll] basically be drawing shapes in this video, and that may make visual theory seem a little childish at first, but I assure you we are not in elementary school again because we will quickly move from basic shapes to much more advanced concepts. Let’s get started. We’ll begin our discussion on visualizations with a little theory, followed by more practical examples in the upcoming videos. Generally we have shapes like bars, blocks, or circles, sometimes separated into segments. These shapes tend to be single or single aggregate value and aggregate value as in sum or total. The shapes, the pattern of data within the category, or the relationship between the categories is not important. We’ll talk more about what that means when we cover distributions and relationships respectively. Next, we have lines. Lines tend to represent a change or difference between values most often associated with a change over time. Lines are visually effective at showing that one point is lower or higher than another when on the same axis. It’s also useful to reinforce size differences or position differences when shapes alone may be ambiguous due to their ability to indicate direction or trend. Lines are useful to simplify complex patterns as seen in forecasting and regression. Due to their ability to show difference and changes between shapes, they’re useful for things like data distributions as we see in histograms. For points, the position of a point in a visualization represents the intersection of two values, which is why points are most often used in scatterplots. That level of granularity is useful for recognizing larger patterns within the category or group, as well as between the categories or values. As I mentioned, you’ll see lines used to simplify trends or relationship within a collection of points like with a scatterplot and a regression line. In fact, you may see shapes or colors used to distinguish patterns between groups of points. You can see that we mix the concepts of different shapes and lines and colors within visualizations, so as a gentle introduction to some of the more complex visualizations, let’s look at how points, lines, and shapes are used together to visualize more complex patterns within and between groups of data. Visualizing patterns, the data itself is called looking at the distribution of your data. Distribution visualization such as histograms and Page | 47 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script box whisker plots combine both lines and shapes. There are even points showing outliers at times, and when I say distribution of your data or frequency of data, it means how often a given value appears within a given data element, or a group, or category, or column. It can be the sale price of cars, age of customers, time between purchases, etcetera. The frequency distribution or shape of data gives us more insight into the underlying data which would not be apparent if we were only given the average or median data value. The boxes in a box whisker plot represent the middle 50 percentile of your data with a line that represents the median or the middle value of your data. Now you’ll also notice that there’s a line above and below that isn’t a line so much as it is a T shape or whisker. Those whiskers represent typically the top and bottom 25% of your data, but you may also see lines between box whisker plots to show how different one box whisker is to another, so you get a sense of how much overlap there are between the two. And keep in mind, histograms and box whisker plots represent the same thing, the shape or distribution of your data. You can look at box whisker plots as a histogram simply turned on its side. Alright, in this video we covered how simple geometric objects can be combined to represent complex information. Shapes indicated some aggregation of values. Lines showed change over time and/or trends, points and intersection of values and shapes, lines and points all combined show distribution of values with a data set. Moving forward, we’ll see how the data visualization tools apply the concepts within the context of your case study and look in detail on specific visualization types, both informative and exploratory. HMC Staff Miranda asked Megan Martinez, a senior staff accountant at HMC, and Adam Green, a staff accountant at HMC, to work with D & A on the project. HMC: Megan Martinez Megan has been with HMC for two years and has recently relocated to the corporate headquarters in Dearborn, Mich. Megan’s corporate transition is almost complete, and she is anxious to move forward in her current position. HMC: Adam Green Page | 48 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Adam is a young, aggressive employee who began with the company eight months ago, since he graduated college. He has a good sense of judgment and is eager to make a good impression on upper management. Miranda believes the two employees will provide a good mix of experience, dedication, and teamwork. Section 3: D&A Consulting Group: Background D & A Consulting was started by Doug Chan and Arlo Paxton five years ago. Doug and Arlo have been friends since college, having graduated with accounting degrees from the same university 15 years ago. Although they initially went to work at different accounting firms, they both followed similar career paths. After becoming managers at their respective firms, Doug and Arlo decided that their real passion was in teaching clients how to use data to make better business decisions. They started their own consulting firm with one primary focus: helping clients better understand their businesses via the use of data analytics and data visualization. Video 5: Data Visualization – Visualization Selection D&A Consulting’s primary focus is to help clients understand their business better by using data analytics and visualization. Dan Smith will now discuss how to determine the best visualizations in the context of the business question. Video Transcript (Est. time: 5:20 min): In this video, we’re going to look at the different types of visualizations and how to select a given visualization for a given business problem, and provide a simple flowchart to aid you in visualization selection. All visualizations have a function. You may simply be comparing two different collections of objects such as total revenue vs. total expenses. You may be looking at the pattern of a single value over time, such as profit over time, or you may be exploring a complex relationship between many different elements such as total revenue vs. expenses by country by month. Regardless, all visualization functions are ultimately used to answer some form of business question, and you can use a simple decision tree once you know what types of questions you need to solve. So let’s dive deeper into defining these patterns in visualization functions. The pattern that we see in most, if not all, visualization decision tree diagrams, it starts with a question of are we trying to visualize a comparison, composition, relationship, or distribution? Page | 49 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script And are we looking at a single point in time or a change over time, are more informative. Visualizations are the comparative and composition visuals. Comparison is when we want to compare differences between our groups as a whole, so the aggregates of profit between countries or between even sales channels, something along those lines. The standard bar charts are typically associated with comparisons and are probably the most common visualization you’ll see. So many of you may already have experience using them when comparing categories, factors, or groups. Again, those terms tend to be used interchangeably. When, when comparing those categories over time, particularly when you have many categories, it’s almost always a good idea to use a line chart. We do often see bar charts used when it’s a small number of time periods like in financial statements from year to year, but if we have a lot like weeks or days, a line chart is far more effective at representing those more granular changes. Composition, visuals on the other hand, these are typically shapes or combinations of shapes as well. They show the categories within a larger group, think population per city, within a state or sales per region within a company. You’ll also see composition visuals associated with pie charts. We can talk about those. They have their use, but we’ll dive into that more in the details of our informative visualizations. Let’s move on to the relationship and distribution visuals. Relationship visualizations tend to focus on how the individual values of one group change given the values of another. The last visualization family are distributions. These are scatterplots, histograms, and box whisker plots we already mentioned. We’ll dive a lot deeper into the relationship and distribution visualizations when we talk about exploratory data, but let’s talk about theory again. Why do we use these visualizations when we do? Who decided these rules? Well, we did. It’s simply the way that humans process information. Decades ago, a famous study by Cleveland et al tested many, many people with a series of visualizations including pie charts, bar charts, line charts, etcetera, all the things we just talked about. The study found a few general trends in the way people interpret information visually. Visualizations that share a baseline that start from the same point which are only viewing a single distance or direction, for example, a bar chart, which is ultimately representing the distance of that bar from a baseline. Those tend to be the least confusing, confusing visuals. While people struggle to interpret the differences between area that is one circle bigger than the other and even more so, they struggle with interpreting difference between color, shades of color in particular. So Cleveland has a hierarchy of which visualization to use, when. We will dive deeper into that in the upcoming videos. Page | 50 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script In conclusion, in this video, we saw how taking the basic concepts of visualizations and how we can use that to find the best visualization for most business questions. Using a simple flowchart, we reviewed how research established using these visualizations resulted in more interpretability and less human error. In our next two videos, we’ll take an in-depth look at the informative visualizations, line, pie charts, as well as the exploratory visualizations, scattered tree bubble and histograms. Stay tuned. D&A's Business Purpose Though data analytics is not a new concept in the business world, the amount of data available and the number of sources from which it can be captured has skyrocketed. Driven by lower storage costs and more “user-friendly” analysis, software businesses have vastly increased the amount of data they collect and store. Yet finding the talent needed to transform that data into useful insights is what businesses find challenging. D&A Consulting helps companies fill that void. D&A Consulting Doug and Arlo are excited about the opportunity to work with HMC. They have assigned their automotive industry expert, Kevin Lydon, as the project lead, along with a D & A new hire, Jan Morrison. Kevin has been with D & A nearly as long as the company has been in existence. He and Doug had met each other on a consulting project where Kevin was working as an IT engineer. As project lead, Kevin is enthusiastic about mentoring Jan on her first assignment. He is equally enthusiastic about the potential for improvement at HMC. He reassures Jan that this client will be a great opportunity for her to test her data analytics and data visualization skills. HMC: Miranda Albany, Continued She has advocated a “data-driven” strategy for decision making at the company by capturing a vast number of product-specific details relevant for both production and marketing. The problem is that the company has grown so quickly that Miranda is having a difficult time keeping up with the massive amounts of data that continue to accumulate. To further complicate matters, there is a growing need for reporting detail and in-depth analysis of product lines given the availability of additional data. Video 6: Data Visualization – Informative Visualizations Page | 51 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Next, to further understand how to communicate ideas and discover business opportunities through data visualization within analytics, Dan will now go through informative visualizations and how they can ultimately tell a new story for the business. Video Transcript (Est. time: 9:46 min): This video covers visualizations more focused on informative and reporting functions, namely bar, pie, and line graphs. In our visualization selection flowchart, bar, pie, and line focused visuals are associated with comparative composition and time series visualizations, respectively. Before we begin exploring these informative and reporting visualizations, recall that different audiences require different types of information communication, so our first thought with visualizations should be to consider if our task is exploratory, informative, or repeated reporting. In your case study, they mentioned the data analytics process. Determining the correct visualization begins the same way. We need to first understand the business question before we can determine the correct visualization. However, you may find there is no clear business question or you are the only one who is supposed to determine the business question, but for now let,s start with the assumption that we have discovered a clear objective and the question we need answered is well defined. Clear, well-defined question with quantifiable answers means that we should approach our visualization from an informative or a reporting perspective. And most well-defined business questions involve comparing the performance of one group to another. So let’s look very quickly at the theory of why we use the visualizations we do for comparative visualization. We talked about Cleveland before. In a further analysis using the concepts that Cleveland uncovered, researchers found clear differences in people’s ability to interpret comparative visualizations. The diagram on your screen shows distributions of errors based on the perceived differences in comparative visuals. Looking at distance from a common baseline. That's the first grouping of perceived errors, the lowest amount. Next we saw difference in length or radius such as we would see with a, with a pie chart. And the final one with the most error were visualizations that had a difference in shape without a common baseline. So these are things like bubble plots or tree, tree graphs. And this reinforces the concept that we should use bar charts for comparative visuals whenever possible. And this is to reduce errors. Whereas bubble and tree map visuals may look cool, but they’re not ideal for business critical reports. It helps us to find some rules of thumb for our visualizations. So now we can finally move into some examples of visualizations, such as if our business question is what percent each region contributed to total sales. Now, because there are Page | 52 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script only three regions, we could look for a pie or bar chart. We could pick either one, but if the question was how much did each country contribute because there are more than four, only a bar chart is really appropriate here unless we do some type of additional grouping. Bar charts and pie charts are used when comparing single or cumulative values between groups. As a general rule, use a bar chart over a pie chart, particularly when you’re talking about actual values. We probably want to know the absolute value or actual value of a given country in these examples rather than the percentage of total. If we were comparing one country to another, then potentially the pie chart would be useful, but we want to know the actual values so a bar chart makes more sense. So some rules of thumb, you always want to use a bar chart. If you have more than five groups, if you have more than 10 groups, consider combining some of the groups or creating a detailed visualization that we’ll talk about in dashboarding. So now let’s look at when we are focused not just on comparison but changing from one group to another. If the business question is “How much did each region/country of their revenue or contribution margin change from year to year?”, it is possible to use a stacked bar chart or a hundred percent stacked bar if it’s percent contribution. However, given the time series nature of the business question and just the gross quantity of them, it becomes very difficult. It’s been shown in all the research that we’ve listed that lines connecting points over time are more effective at communicating change. Therefore, a visualization with connecting lines is preferred. So you can also say and suggest that of an area chart. I’ve heard some people say this as well. Area charts don’t work very well with negative values. So you’d see a much better representation of a change over time with the line. If the business question is comparing group contribution to a whole over time, because this is a compound question, that is, what is the total amount contributed by group and how much did that amount change over time? My preference here is to use two separate visuals, a bar chart to show total amount and align charge. This showing percent change of total. In this situation it’s okay to use a stacked bar but more as a visual cue to our audience. We need to consider the groups making up our total and this is so they have a better frame of reference for the associated percent of total visualization. Also notice I placed the first visual above the second. In the case, the second visual is difficult to interpret without context from the first when visual interpretation depends on another visual. While some cultures may read left to right and others read right to left, most if not all read top to bottom. So put your simplest, highestlevel visual on top and the details on the bottom. Page | 53 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Moving onto more reporting-scoped business questions. What if you need to know how much of this year’s variable cost budget has been used year to date? More generally, how is a given KPI performing against its target value? Now, for the visualization alone, some would only want to report the value using something complicated like a gauge. People as soon as it gets into a KPI, they want to stick a gauge on there. However, in my opinion, the next question when we’re talking about any KPI with a performance target is almost always, well what was it doing last period? What about last year? So why not just start with a time series graph and insert a line for the target budget. This way you know both how it’s performing now and if it’s above or below that target. Whatever data manipulation you have to do to insert a line will be exactly the same as with a gauge or other visualization, plus you’re already answering questions about historical performance. Now as a final piece of advice when talking about informative vs. reporting visualizations, when the business question is established and requires repeated refreshing-a refresh just means updating the data behind the visualization-this is almost certainly a report rather than an informative analysis. Look for keywords like KPI, SLA, metric, defined business rules. When these are spoken about the needs of the visualization, your organization may have a dedicated reporting team. This reporting team may specialize in maintaining such reports. If you are not part of that group, consider reaching out to them to maintain the informative analysis that you now want to turn into a report. Personally, I would leave that up to the professionals. All right. We covered a lot in a brief period of time. Some key takeaways from this video. Bar graphs work well when comparing values between a few groups on the same scale. Pie charts may be okay when showing percent of total for a few groups; however, a bar chart can show the same information answered both as a comparative and composition visual. Use line charts for change over time. Don’t be afraid to combine visualizations if you need to tell a more complicated story. And finally, your organization may have a data and reporting specialist to help standardize your information into a more consistent report. Our next video, we’ll move into more complicated exploratory analysis and the associated relationship and distribution visualization. Video 7: Data Visualization – Exploratory Visuals As a next step, we’ll now hear from Dan as he discusses best practices and tips for using exploratory visuals, relating back to the business question, so one can learn about the data and find new stories to tell. Page | 54 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Video Transcript (Est. time: 13:04 min): Now let’s move on to exploratory visualizations. In this section, we will look at more advanced visualizations such as scatterplots, tree maps, bubble charts, principal coordinates, plots, histogram, and box whisker plots, and let’s start with some financial and sales metrics. What if we want to know what is the relationship of material costs to after-tax profits? Well, as a starting point, even the most advanced analyst will use scatterplots to start visualizing patterns. Here you can see the material cost vs. the after-tax profits with a point for each VIN number, so that’s what did it cost to make the car in terms of raw materials and how much were we able to sell that car for. This demonstrates how scatterplots show us the intersection of two related series of values. Think about two values in the same row, but different columns. In Excel, you have a row of data, you have your column, and then you have another column. That’s the same way you create a scatterplot. In a pivot chart, we may be able to observe a general upward or downward trend in data points between the two factors, or we may be able to observe groups in the data set. In the example on your screen, you can see Bose, you can see the relationship between material costs and after-tax profits, as you expect. Generally, we see profits decrease as material costs increase, that big downward slice in the middle. However, there are two major exceptions. There is a group in the upper right with really high material cost, but also high after-tax profits. And then you see a group in the lower left with negative profits but also unusually, extremely low material costs. So how do we examine this? Well, within the scatterplot we can start adding colors or other shapes to start trying to pick out patterns. We can look at a model and see that the CHARE model, C- H-A-R-E, constitutes the entirety of that upper-right-hand cluster. So we can imagine that this is the luxury brand with high margins and high material costs. Unfortunately, the lower left is a little more ambiguous. That looks like there's a bunch of other models within that grouping. So you see a scatterplot may not always tell enough of a story. In fact, it rarely does. So let’s look at some other factors that might be influencing these unusual patterns in groupings of a cost vs. profit. We saw there were business questions related to channel, so let’s explore viewing multiple channels against models against their profits. Let’s explore viewing groups in tree maps. These are similar to pie charts but with squares instead of slices. So I guess it’s more of like a brownie or a lasagna chart that I don’t know. If you look at seeing after-tax earnings by channel one, two, three, and model, that’s a lot of information. There are some groups here and there that may stand out. Um, but it’s not that descriptive. Similar to a bubble chart, which a bubble is sized by the value. And then Page | 55 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script colors delineate the different channels or models. So for each slice you get a bubble. And the value is the size of that. I’ll be honest with you. These charts aren’t very useful by themselves. They’re good for exploring things in a dashboard because you can mark them. However, by themselves they don’t typically tell much of a story. It’s often more effective to use a bar chart for this type of stand-alone visualization. But when exploring in a dashboard, they can be rather effective. More on that later. So as strange as it sounds coming from me for even an exploratory visualization, but also one that can be very effectively moved from exploratory to informative, I feel a table with a heat map is a better visualization for this scenario, and often in many scenarios, the example on your screen, you just see a simple, well, basically, a pivot table. It’s your sales channel by model, by the average after-tax profits. This represents the same thing you saw on all those other diagrams. Just it’s more condensed. You can still see the same information based on the color, but you also see a value. You see a raw number. This is tangible and in fact, speaking of exploratory visualizations and an example of how you would combine these so that you can explore but also inform, you can create a simple dashboard of a drill down, and we’ll describe what a drill down means in the dashboarding section, where you can look at your scatterplot and see how the different clusters or values within that scatterplot relate to another visualization. Like in this case, the table diagram that we created enough about dashboarding and groupings in scatterplots. Let’s move on to say a business question about does an increase in sales for one sales channel take away sales in another? So does an increase in sales for one sales channel take away sales in another. This is a common question in the real world, and it’s an extremely complex question. This question is probably better served by machine learning and some complex modeling, but for the sake of visualization, we only want to compare the percentage of sales in one channel vs. these percentage of sales in another and how that changes over the course of months. We want to look at it month to month in this example because we are exploring the relationship between two groups with multiple occurrences. In this case, the differential of percentage sales in one channel vs. the another. For a given month, we can use a special variation of a line chart, a principle components plot, so the specific criteria, it has to be the same scale. Two groups, some relation, and this tells us how one moves vs. the other. In fact, there’s some additional research on this showing that a principle components plot is better interpreted in highly ambiguous scenarios. Where you’re trying to show there is not much of the relationship vs. places where there’s a strong relationship. A scatterplot is well interpreted on the fringes. A principle components plot helps people not make false assumptions of relationships. Scatterplots are very easy to do Page | 56 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script that. If you draw a line through it, you kind of draw a picture in your head. It’s very hard to make a fake assumption or false assumption of relationship when there isn’t one with a principle components plot. Off my soapbox for principal components plots and now onto my soapbox for measures of central tendency. Now, in many cases I’ve seen people use bar charts or even pie charts to communicate average value between groups. But an average isn’t just a single value. It isn’t a value that, that means a single thing. So like some total sales, I can sum up my sales. There’s no misinterpretation about what that value means. It’s a bunch of numbers put together as a total average or arithmetic mean is a measure of central tendency. It’s describing a group of values average only suggest the central value, but it doesn’t tell the whole story. Average can be skewed, meaning the same average can represent vastly different underlying values. Look at a normal distribution vs. a skewed distribution. I’m going to show you my process of looking at a distribution for a few different factors as they relate to contribution margin. And when we look at the frequency of our contribution margin, we see a lot of hills. It looks like a mountain range. We see when we see a multipeaked distribution like this, we refer to it as multimodal. Multimodal means that there are some values that occur very frequently. Multimodal suggests there are multiple categories within this trust distribution that there are groups, so we may see effects similar to what we saw in that scatterplot. We may see clustering groupings of things that vary by country, region, sales channel, etcetera. Our distribution may not be comprised of a single group of data or information. There may be multiple blobs of, of information, of groups that we need to tease out. You can see that some of these patterns emerge when we overlay different categories in contribution margin distribution, but first I need to take a step back and talk about what is a distribution. What do I mean by distribution? You can see a histogram, but what does that really talk about? Let’s, let’s simplify the scenario and let’s say we have two car models, A and B. Let’s say for model A, we have contribution margin of 100, 95, 90, 87, 85 or whatever. It’s on your screen as model B. We have similar contribution margins and this vastly simplified example, we can see the average for model A is around 83. The average from model B is around 73. However, if we add some additional values and model A, say something that’s well below the, the existing average, we’re in model B, we pick just two values that are similar to the average. Our new average is completely different. If we just look at the average for model A vs. model B in the scenario, we would think that model B had a far higher contribution model than model A. Whereas it’s only these outlying values that are driving things, driving Page | 57 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script things down for model A, which is one of the primary reasons why histograms and box whisker plots are so important. What we’re looking at when we see a histogram are not the numbers themselves. It’s a frequency distribution of frequency of how often those numbers occur within given groups. We call those groups bins. In this example, let’s say we have bins of 190, 80, and so on and we can look at the frequency count in model A and model B, we can see that there is an outlier. Two outliers in this case, way down at the bottom for those values of 10 and five histograms are so important for an exploratory analysis because we see these patterns. We can see when there’s outliers, we can see when there’s different patterns in data. And the example I gave it with some outlying value that could have been from, I [don’t know], a different country, it could have been from some new sales initiative. Uh, it could have been from who knows where, but there’s definitely something going on that wasn’t represented with just the data that we saw. And finally, when you see yourself clicking through a bunch of different histograms or a bunch of different scatterplots or a bunch of different stuff and just exploring, exploring, there are other visualizations that you can use, visualization specifically designed to tease out these patterns. One of the more powerful visualizations for this. In fact, an analytics technique is a decision or regression tree. However, that is a little beyond the scope of this course. Exploratory visuals can be a lot of fun. You know, I love them personally. They help you see your data in a brand-new way. We learned a lot, talked about histograms, talked about, uh, scatterplots, understanding how these exploratory visualizations are about looking at the data itself to find new patterns, and we even did a little dashboarding. And in our next session, we’ll cover the underlying dashboarding concepts to help you become the best you can at data exploration and informed visualizations. Lesson 2: Huskie Motors Operations Section 1: Huskie Motor Corporation: Operations HMC is currently selling in 15 countries in three regions: North America, South America, and Europe. Table 1 provides a breakdown of the countries within each region. Huskie Motor Corporation: Operations, Continued Page | 58 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Automobile manufacturing and sales is a complex and highly competitive business. Though the automotive industry has a broad global reach, only 15 countries produce 88% of the world’s vehicles. HMC Models HMC is a fairly new and small player in the automotive manufacturing market. HMC currently offers three brands: Apechete, Jackson, and Tatra. Each brand has several models as detailed in Table 2. The models available fall within seven segments of vehicle types: compact, sub-compact, full-size, mid-size, luxury, minivan, and sports utility. HMC's Manufacturing Options HMC offers several series for each model for a total of 34 different series. A breakdown of the available series offered by model is provided. See Table 3. Each model is available in various body styles, engines, drive configurations, transmissions, trim, color, and seat types. Since various engine and transmission builds, see Table 4, come from one division and finishing is done in another division, these options are described in different tables. Click each icon to learn more. Packages and Options Like many automotive manufacturers, HMC offers a variety of packages and options that buyers can add to their vehicles. Packages include a specific set of bundled options, or buyers can choose individual options separately. Table 6 provides a list of the six packages along with the detail of products and services contained in each package. Table 7 provides information regarding options that may be purchased ala carte. At least one option, however, is contained in each available package. Click each icon to learn more. HMC Data Movement Recall the data life cycle from the previous module and consider how Huskie Motor Corporation’s data ages. To again explain how HMC gets and uses all their data, let’s take a look at a short video to further illustrate how data is created and stored in business. Video 7: Movement of Data Page | 59 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script To conclude the topic of data movement, we will now move to a presentation from Dan Smith where he further illustrates how data is created and stored in business. Video Transcript (Est. time: 3:00 min): First, we’re going to look at data creation and storage. And I’ll start way back in 1967 when Dr. Melvin Conway identified that any system designed by an organization will match the communication structures of that organization. What that means in plain English is that a business will create processes and solutions based around how they are structuredmarketing, supply chain, accounting. They’ll all produce their own applications and typically their own data as well. And who were the first people to start storing and recording data all the way back in Sumeria, the accountants. In fact, I would argue that our current data architecture is largely founded on accounting practices. Accountants use transactional data created by customers and business operations produce reports that are then used to inform stakeholder decisions. So what does this look like? Well, there’s three primary domains, business data, the creation, storage of data, and the business decisions. Now, how is it, where does this data live? Well the customer activity generates transactional or raw data, that data is stored in a transactional database, which is then and then shaped into a reporting data set. The reporting data is used to make business decisions, and the resulting activity from the business decisions create more transactional data. Now that we’ve seen how that data is created, let’s look at how it gets from one place to another. You can see the data movement cycle over here on the right of your slide. When that data is first created, the information messages are typically transferred through a bunch of layers in applications, so load balancers, temporary cashes, firewalls, etcetera. The process of getting activity data into your data system is typically called integration and it moves data into a transactional data store. Now, once it’s in its raw form or sometimes as part of integration, the data is shaped or cleaned into a form usable by people in analytic applications. This is generally called ETL, extract transform load, and the data output is stored in the reporting database, data warehouse, or data lake. The transformation of data into business-usable information is called analytics. We will talk about a lot about this in the next module. Although Guy said, simply put, analytics is what facilitates data informed decision making. Page | 60 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script And finally, business decisions attempt to influence customer activity. Making this one little arrow right here, influencers in the upper-left-hand corner of the business data creation and movement cycle. The entire reason that business exists and the entire reason that the data analytics cycle exists to help the business make more informed business decisions. So let’s look at this process in action. Lesson 3: The Data Dilemma Section 1: Data Dilemma Prior to calling Miranda at HMC, Kevin meets with Jan to go over some basics regarding the project. Kevin uses a five-step model as a data analytics framework. As a first step, he explains this model to Jan, because he believes it will help her understand his process. Data Analytics Process Kevin: “So, Jan, this should be an interesting client and project. I know you are a bit nervous about your lack of experience with data analytics design, but I have been studying this stuff for a long time, both from an applied perspective as well as a theoretical perspective. There is a great book on this subject, Data Analytics, by Warren Stippich and Bradley Preber. In this book, the authors describe a five-step approach to the data analytics process: Step 1: Define the question Step 2: Obtain the data, Step 3: Clean and normalize the data Step 4: Analyze the data and understand the results, and Step 5: Communicate the results. I think that understanding and following these steps will help you tremendously as you find your footing on this engagement. Our client, HMC, would like to have the project finished within a fourto six-week time frame. How do you feel about that?” Page | 61 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script The Data Dilemma Jan: “That seems like a pretty short window, Kevin. Do you think we can finish within that time frame?” Kevin: “Well, I do not think we can set a realistic time frame without knowing exactly what our challenges are and what the data looks like. Remember, we have to be able to trust our numbers. We do not know yet whether the data is clean. We will have to do some validity testing first. Remember, follow the steps, think it through, and keep a calm head.” The Data Dilemma, Continued Jan: “What exactly is our goal for HMC?” Kevin: “Basically, HMC captures very detailed transaction-specific data. For example, HMC has extensive cost data, marketing information, and plan data for each car sold. The amount of data it has is overwhelming for both the HMC management team and the executive board. Our job is to help Miranda and her team figure out how to use the data to better understand costs and profitability by vehicle model. They also need detailed, relevant feedback regarding sales volume and sales location for planning purposes. Miranda would like to be able to predict sales at least three quarters out so that the management team can better plan production schedules.” The Data Dilemma, Continued Kevin: “An equally important goal is to help them understand the benefits of data visualization and give them some ideas about how to present the data to their executive board. As you know, Jan, data visualization software, such as Tableau and Qlik, allows us to turn large volumes of raw data from various sources into easily comprehensible graphical representations of information. Data can be accessed live or extracted from some other source. Data can also be presented in summary fashion while preserving the underlying detail, which can be instantaneously viewed as desired. This type of technology could be instrumental in helping HMC improve and maintain its competitive advantage.” The Data Dilemma, Continued Jan: “How does the management team store the data, and how will we be able to integrate it with the visualization software?” Page | 62 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Kevin: “I believe that the plan and forecast data are on Excel spreadsheets. The actual sales data will likely come from its enterprise resource planning (ERP) system. We will need to confirm that at our initial client meeting. We should be able to get transaction-specific information at the vehicle identification number (VIN) level, since the VIN is a unique identifier. I am not sure how much information is being collected at the individual vehicle level, but that could be a rich data source for us if we can get it.” The Data Dilemma, Continued Jan: “Is the data they have ready to be used, or will we need to verify and organize it?” Kevin: “We will see just how clean it is once we start pulling it in. We will initially build the analysis using a representative sample of HMC’s data. The sample represents about 25% of the total company data. The four- to six-week time frame should be feasible if we receive clean data. Our contact at HMC, Miranda, is assigning two accountants from the controller’s department to pull the sample and help clean the data. Once the client approves of the proposed analytics and the dashboard, we can roll it out using all the data.” Data Sampling and Analysis Jan: “Will using a sample be enough to convince the client of the value of the analysis?” Kevin: “Absolutely. The sample allows us to determine which type of analyses we can provide, by looking not only at the data, but at the format of the data as well. We can use the sample data to create demonstration dashboards for the client. The analyses we perform with the sample can then be recreated for the full data set.” The Data Dilemma, Continued Jan: “I am not sure I understand what you mean by a ‘dashboard.’” Kevin: “Basically, a dashboard is a screen that consolidates visualizations, graphs, charts, and so on to concisely display the metrics and key performance indicators for a business. Summarized data can come from a variety of sources and can even be presented in real time.” Page | 63 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Jan: “I cannot wait to get started! This is going to be a tremendous learning opportunity for me.” Video 8: Data Visualization – Dashboarding As Kevin helped Jan understand the purpose of a dashboard, we will again turn to Dan Smith as he further defines dashboarding and introduces concepts and best practices when using dashboards. Video Transcript (Est. time: 18:22 min): With the popularity of data-connected visualization tools like Tableau and Power BI, data visualization is becoming almost synonymous with data dashboarding. Well, some, myself included, would dispute that association. It’s still very important that we include dashboarding when learning about data visualization. In this video, we will define dashboarding, explore dashboarding concepts, and recommend some dashboarding best practices. But first, what is dashboarding? When someone thinks of a dashboard, they usually think of a car or other device full of gauges, numbers, meters, things that tell us the immediate state and relevant information required to understand how something is doing related to data and information. Dashboarding can be defined as an information management tool used to organize and display information typically with an interactive element and connected to an underlying data source for rapid updating. And let’s, let’s think about why in your case study a dashboard seemed like such an appealing prospect for Huskie Motors and consider also if it’s the right choice for the overarching problem statement and the requirements of our stakeholders. In our case study we’re asked to determine how data will be represented for several different stakeholders. Let’s explore how we could use a dashboard to display this information, the dashboarding concepts in each, and if we should use a dashboard to display this information. Our first task is related to the dashboard concept of displaying KPI and metrics. We’re tasked with displaying the overall performance analytics, a question such as how Huskie Motor Corp. is performing globally, how are the various brands performing in various channels, and what are the least profitable models. Here we can apply a core concept of dashboarding displaying KPIs and metrics. I mentioned before that lines are very effective for displaying KPIs. Presumably there are a set of KPIs, key performance indicators, that they can use to define performance. I would assume these were defined by somebody, and we can include these in a dashboard that was connected to a data source that had Page | 64 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script these KPIs defined. And let’s think of our audience for these high-level performance metrics. Anyone who, who is interested at metrics at this level are likely going to be executives, senior vice presidents, maybe senior directors, depending on the organizational structure and title responsibility. In other words, metrics at this level are foreign audience unlikely to have the time or inclination to go digging around their own data? They have trusted advisors to give them guidance on the pertinent pieces of information. In other words, these are informative visualizations, probably more reporting than anything, so a dashboard may be appealing and a dashboard and the purest definition of data connected and immediate results, but it’s not going to be an interactive dashboard or if it is interactive, it’ll be for someone reporting results with the dashboard, not for exploration. By our reporting results, I mean sitting in a presentation and clicking on things in a predefined script, so the dashboard will probably be for a presentation, not something that somebody is going to be regularly accessing online. Our next concept, filtering and drilling down the next series of questions. In your case study, the financial analytics here, we’re looking at prices per model and how they changed over time or which model has the most variability in variable cost, etcetera. While these are well-defined questions, there is a degree of slicing and dicing exploration available. If someone wants to look at different packages or regions or country or models, sometimes there will be complicated questions which may be hard to represent in a single visualization. Therefore, the financial analytics questions have more opportunity for exploration. So a more interactive dashboard is probably useful in this scenario. And by the way, by slicing and dicing, I mean a dashboard where a user may wish to view the data in a different way than represented by a static dashboard or table. For example, they may wish to highlight an area of one visualization to learn more about the data point contained in that visual. That would be a drilling-down operation that’s often used in scatterplots and other complicated visuals like a map and tree charts to give more information about a given data point or points. That would be like a dicing of data. And filters are often used to change the dashboard. In contrast to the drill down, filters typically exclude all the data for portions of the dashboard. They exclude the underlying data. A filter is exactly the same as a filter in a pivot chart or pivot table. And a pivot table is also an excellent example of slicing and dicing data. You just pull in different elements and you see data in a different aggregation or context than you would, and it’s an original form. And fundamentally, if there is a demand for data exploration, just a bit of caution, you will never be able to satisfy all the data exploration to band with simple visualizations, Page | 65 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script you have to include some interactivity. So a dashboard makes a lot of sense for the questions from the finance team as a, for the operational analytics questions, those are much more complex. They’re asking what are the top values overall in multiple groups. Now, asking what are the top sellers over is easy. You can just sort it. But when you start comparing the top sellers from one group to a top seller in another group, your analysis becomes much more difficult because you’re not just ordering a column in a table. You have to select the top individuals or regions or models or whatever it is from a group and adding to that complexity, the calculation for the number of model options. How many days are the various models on the lot sale? Uh, you have the potential for a very complicated dashboard. The questions ask for by the operational analytics folks. Those will likely require custom calculations and/or data transformations. Uh, we’ll talk about those concepts shortly. But this is something when people start asking those types of questions that you would probably want to call in a data expert, to help you create that type of dashboard and those visualizations forecasting our final set of options where we look at the forecast. I know I'm giving a lot of warnings here. Primarily because this is an introductory to visualizations course. So I put out cautions when people are, what they think are engagement, where they’re building a visualization, but it’s actually a mathematics data problem, when we start talking about forecasting, this is another one of those yellow flags. So as a caveat, I would not, as a warning, I would not suggest a business, particularly one the size of Huskie Motor Corporation make a significant business decision based solely on the forecast that is built into a utility of a visualization tool. Not to get into the details, but there’s a lot of additional information that can be included in that type of mathematical model. That being said, the built-in forecasting utilities can still be informative for identifying patterns over time in your data and creating a degree of confidence in those types of pattern predictions. In a forecast, we are able to show confidence intervals above and below the predicted values. This is the case of a visualization so we can visualize our confidence intervals and that’s similar to a box whisker plot where you have the predicted values 80% of the time vs. the actual 80% of the values. The high/low markers are equivalent to the whiskers and a box whisker plot if you will. Moving on from forecast, let’s talk about connecting to data. Well, you always hear the term data-connected dashboard. Let’s expand the dashboard conversation to introduce what a dashboard is doing, a dashboard is doing under the hood. The first concept is connecting to data. A dashboard works best when the dashboard can update as its Page | 66 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script underlying data set. Also updates, we’ll cover this more in our data platform video, but for now understand the visualization. A tool does not and in most cases should not change the underlying data. So here we can connect to an online Google sheet. Here we can connect to an O-data online data source or a JSON data source. So these are web data sources. They’re separate. Contrast that to something like Excel where Excel houses the data. In many cases, especially the cases we’re accustomed to working with, Excel is the data you can type in and modify the data. In many cases you have to modify the data to create a visualization. So you end up with a lot of disconnected data tables, pivot tables, and pivot charts are more like other visualization tools as long as you don’t touch and edit the underlying data. In fact, I would say if you can use a pivot table effectively, you can probably use most of the other visualization tools with minimal training, aggregates, and values. This concept is probably the most important concept when building a dashboard. We’re at least using a visualization-focused dashboarding tool and that is the concept of categorical values and numerical values. Tableau calls them dimensions and measures, Power BI columns, measures or aggregates, and measures in a pivot table or called rows or columns and vs. values whenever you want to call them. Your categories are what determines the number of slices in a pie chart, whereas measures determines the size of each slice category determines the number of bars. Measure is the height of the bar. Again, categories or factors. These are the same concepts as when you use an Excel table column to make separate rows in a pivot table measures the same concept as values in a pivot table or some mathematical operation. Typically, sum is applied to the values corresponding to the pivot table. Row values in the data video, I’ll talk about more about this concept. Even including the SQL structured query language aggregation code that’s being created by the visualization platform. That sounds a little scary. I’ll walk you through it. It’s not going to be as bad as it sounds. Calculated values, all dashboarding tools let you create custom calculations for aggregating your values. These can get pretty complicated. In this example, I’m relatively simple when I’m creating a new column to create an address and I’m merging a bunch of other columns concatenating them as what that’s called to create an address value for geocoding. Yeah, that’s a simple example. And the hand calculated values are one of the hardest aspects of learning a dashboard tool because calculated values, in particular, calculated values like rolling average or cumulative totals, tend to either obscure or be very specific to that platform to get the exact calculation I use to create the value you see on the screen and honestly calculated values, especially the nuances of them. They aren’t Page | 67 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script something you need to understand for this class. But as you learn dashboarding and because it’s an important element of visualization, keep in mind that the measures you see visualized in the dashboarding tool are calculated not only using the underlying data but they will also change based on the visualization, visualization used and the categories used in the visualization. Yeah, I know, I’m sure that didn’t make any sense. It takes a whole class so you don’t need to worry too much about this concept. I’m telling it to you so you’re aware of it. This is a potential stumbling block, and it may be something that you struggle with cause I struggled with it when I started. So as you’re learning and, and honestly in general, it’s a good idea to have a, your data team how, validate your calculations or have predefined calculations, even have a data set. If you have a particularly complex calculation you want to put into a visualization tool, have some data, people will create a new data set for the analysis instead of relying on the built-in dashboard capabilities. Alright. Moving on. Dashboarding best practices. The first best practice is to know you do not always need a dashboard. Yeah. Think about it. How many of you had a boss that would not look at a presentation unless it was shown to them at a meeting? You could write an email saying, next week's winning lottery numbers. We’re in the attachment on that email and they still wouldn’t open it. It’s reasonable to assume if they will not click an attachment to open a presentation, they are not going to click a link in your email, log into our reporting server, assume they remember the username, and then navigate to the dashboard themselves in a dashboard. You want to keep, infer it. Keep your informative visuals at two or three major points at most, so you’ll be tempted to cram everything on one dashboard. Look at that information density. Oh, you’ll say, and no one will know what you mean. Or maybe you will. Somebody will be confused, and you just go up to them and you say, Oh, you just need to, you know, click this. Then this, then this, so highlight that. Pick this filter and voilà. Here’s your answer. No, and informative dashboard allows exploration, but it should not require it. The line chart dashboards are great examples. They may seem simple, and they answer a question very simply and succinctly, but there’s little room for misinterpretation and you can still drill down and filter. If you want to compare that to say like the geographic dashboard that we saw earlier. It looks cool, don’t get me wrong, but it doesn’t say much without digging around in it and it doesn’t even follow some best practices about high-level visualizations basically, form over function. We should be doing function over form, my next best practice. Make your own darn dashboard. I’ve seen people spend weeks working with consultants Page | 68 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script to get a dashboard, and they’re still not happy. I’ve seen that time and time again, and it’s because there’s always another question. That’s the whole point of exploration. You want to find new questions. So if you find yourself working hours and hours with a consultant or contractor or someone to build the dashboard, consider it spending that time watching free training videos, all the platforms provide them, that way you can build the dashboard you want. And if there are still complex data operations, like joining tables, rolling time series aggregations, work with a data expert or reporting expert to create a new data set or fix that one calculation rather than trying to read your mind or predict the future about what questions you’re gonna ask because you don’t even know what they are yet, it’s really not that hard. I promise they’re there. They’re easy to make. [inaudible] That’s what they’re designed for. Next best practice. Keep data operations to a minimum. Just because you can do complex calculations doesn’t mean you should. Dashboards and tools like Tableau, Power BI, even Excel, can be unclear how a value is calculated. When I was talking about that before in the calculated values. Then when you throw in different terminology, what they’ll call it, partition window table calculations, those are calculations that are aggregated over several categories or dimensions within the visualization. It gets confusing and isn’t transparent. So again, leave data to the experts. A data team-approved data set with verified calculations that’ll ensure proper governance. It will ensure procedures are followed. Give you the peace of mind that you’re not making a small and opaque calculation error that could lead to $1 million incorrect decision. In conclusion, in this video, we use dashboarding as an information management tool used to organize and display information. Typically, a dashboard has an interactive element, and it’s connected to an underlying data source for rapid updating. In our next video, we’ll give you a little more visibility into the underlying data environment so you’re able to communicate more effectively with your data stakeholders and you are better able to understand what’s happening under the hood of popular visualization tools. 2.64 Section 2: The Data Dilemma, Continued After reviewing the project proposal and meeting with Jan, Kevin calls Miranda to introduce himself and confirm their initial meeting the following week. Page | 69 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Kevin: “Hi, Miranda. This is Kevin from D & A Consulting. I just wanted to touch base with you to set up our initial meeting.” Miranda: “Hi, Kevin. I am really looking forward to your team’s visit. We need help as soon as possible. I have so much data coming in that I barely have time to look at it, much less analyze it. Also, our data is located in various databases, spreadsheets, and our ERP system, making it difficult to integrate and fully utilize. We have data visualization software available on our server, but we have not really tapped into its capabilities just yet. I am told that we can easily pull together large amounts of data from various sources with user-friendly, adaptable output using dashboards. Is that something you can help us with?” The Data Dilemma, Continued Kevin: “Absolutely. What kind of data visualization software do you have?” Miranda: “Tableau. It is supposed to be very user-friendly. I am hoping you can get a dashboard up and running in a couple weeks.” HMC Dashboard Needs Kevin: “Do you know what you want on the dashboard?” Miranda: “I know that we want to be able to see profitability by brand and model, since solid profit margins are crucial if we want to stay in business. We also want to keep a pulse on sales by country and region. Ultimately, we want to do a better job of planning our production schedule, but as you know, this requires up-to-date information on many moving parts.” The Data Dilemma, Continued Kevin: “I cannot say whether a couple of weeks will be enough time until we take a look at the data and map out the specific decisions you hope the dashboard will enable you to make. I will have a better idea after our initial meeting next week.” Page | 70 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Miranda: “Okay, fair enough. We’ll see you on Monday at 1 p.m.” HMC and D&A Meeting The following week, Kevin and Jan meet with Miranda and her team for their initial meeting. Kevin: “So, let’s get down to business. Let’s talk about the dashboard. Help me understand what questions you are trying to answer? Or what story are you trying to tell? Who are the users going to be? Will different users need different dashboards?” The Data Dilemma, Continued Miranda: “I have a list of questions that I would like to be able to answer using our data. There are several areas where I believe we can gain greater insight from looking at our data. Specifically, we need overall performance analytics to tell us how we are performing globally. We need to know which of our models is profitable, where we are selling well, and, perhaps, how sales channels are driving sales volume. It is also important that we are able to see information from a top-level perspective with the ability to drill down into the detail. The profitability information is crucial for our executive team: chief executive officer (CEO), chief financial officer (CFO), chief operations officer (COO), and chief marketing officer (CMO). This team is ultimately responsible for the direction of HMC and profitability for shareholders. “I would like for our management accounting group and financial reporting group, including the CFO, to have a financial analytics dashboard that would give them information about contribution margins, total costs, and sales volume. Both groups will also need to monitor changes in costs and contribution margins for all of the models we offer. These are the most important metrics and, therefore, the ones we need to work on first. We will expand our analysis for the management accounting group to include other efficiency measures as soon as we can get these initial dashboards up and running. “Our CMO and the sales team will need operations analytics to help them understand turnover and demand. They will need to know which models are selling and how long it takes to sell them. They will also need to understand how packages and options impact sales. They will need to have a handle on which of our package/option offerings are popular and which ones are profitable. Page | 71 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script “Finally, our budgeting and production teams will need to utilize forecast analytics to predict sales and margins at least three quarters in advance. “Do you think it is possible to make the dashboards ‘real time’?” HMC Data Kevin: “I think it is possible, but first we need to understand the data that you are currently collecting. What can you tell me about the extent and magnitude of your data?” Miranda: “Well, we track details down to the VIN level in our ERP system. So, we have a lot of data. For each vehicle sold, we know the sales amount, marketing expense, and all variable and fixed costs. We also track a lot of nonfinancial data.” 2.71 The Data Dilemma, Continued Kevin: “What kind of nonfinancial data?” Miranda: “Well, there is vehicle data such as brand, model, model year, series, segment, body style, drive configuration, engine type, and transmission type. We also have the detail for each vehicle sold as far as color, trim, and so on, as well as any package or options purchased.” Jan: “Wow! I can see why the data set is so large!” Miranda: “That is just the vehicle information. We also capture the region and country of sale, the number of days that any given car was on the lot prior to sale, and the type of marketing campaign in place at the time of sale. In addition to that, we track sales channel information.” Page | 72 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script The Data Dilemma, Continued Jan: “Can you give me a little more detail about what you mean by sales channel information?” Miranda: “Sure. We identify sales using three sales channel dimensions. Sales channel 1 identifies whether the sale was made through our dealers, fleet, or retail operations. Sales channel 2 identifies the type of customer account. We have commercial accounts, employee/partner programs, government accounts, nonemployee accounts, and rental accounts. The third sales channel identifies whether the sale was cash, financing, or lease.” Kevin: “Okay, so ideally we will want to use both the financial and nonfinancial data in our analyses.” Miranda: “Exactly! Other potentially informative data sources that we have are social media platforms like Facebook, Twitter, and Instagram. We have just started collecting data from these sources about our vehicles, but we are still trying to figure out the best way to analyze it.” The Data Dilemma, Continued Kevin: “Eventually we can bring that data into the analysis and dashboard as well, but for now let’s stick to the data we already have.” “So, the actual sales data is in your ERP system. Where is the budget data?” Miranda: “We keep track of the budget data in an Excel spreadsheet we call ‘plan data.’” Video 9: Data Visualization – Data Ecosystems In the case, it is evident that HMC data comes from and is stored in a variety of places, making it difficult to organize and analyze it. In Dan’s next video, he will go through the concept of data ecosystems and walk us through the process of aggregating data to see how data is connected Page | 73 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script and transformed so it can ultimately be visually presented. Video Transcript (Est. time: 10:22 min): As you become more familiar with creating data visualizations, you will quickly understand if you haven’t already, that data is just as important as the visualizations themselves. In fact, data’s probably more important than the visualizations. Furthermore, having taught hundreds of people, how do you use visualization tools and even more advanced analytics? I know that understanding how the data operations and data connections are enabled by those tools, that provides a more conceptually understanding. Understanding the data concepts allows the students to use their skills across multiple tools rather than just the one they’re currently learning. So I’m going to let you in on a big secret. Those complicated calculated values and dimensions that I described in the previous video. Those are just SQL structured query language or sequel, they’re just SQL aggregations. They’re so complicated because they have to figure out how to abstract relatively simple code, but it’s still using a visualization to create code. This is a slide that I use when I teach people data concepts. For many of you, it may look too technical because it shows a little code. That’s a common reaction. Please just bear with me. I hope to show you that much of the SQL, much of the structured query language generated by dashboarding and visualization tools, that code is similar in function to an ordinary pivot table. And by extension a pivot chart is just like a dashboarding tool. So the example that you see on your screen is just a table from a database. I’m glad that it is a different data set because it shows you that these concepts extend across any data. So to be consistent, though, with the language in the case study, the two tables here, these are the difference in what they mean when they say normalized. That is, there is a unique row for each individual data value. We see the same types of coverage and education repeated multiple times and our normalized table in this example, if you need context, it doesn’t really matter what the data pieces are when you’re talking about code. But here we’re just talking about the coverage and education for different insurance premiums for folks. And if our task here is to show what the sum of claims for each type of coverage and we do not want to know the sum of claims for coverage where education does not equal graduate, if that’s the business question, we would create an aggregate table with one row for each type of coverage and its associated total claim amount. If this was in Excel, the simplest solution would be a pivot table. We put coverage for each row and move our claim amount to the aggregate value. It even defaults to some of that value filtering the education. We put education into our filters and select all except graduate. Page | 74 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Our SQL statement is exactly the same as the pivot table. It reads like a sentence select coverage, some claim amount from normalized table where education does not equal graduate group by coverage. Looking at the corresponding colored squares between our pivot table and the SQL statement, we can see that selecting a field or column and placing it in your group by statement is the same as putting it in a row. Meaning that we have a value or group or row for each of the associated values within our coverage column. One row for basic, one for extended, and one for premium for some claim amount. The green box in our SQL statement makes the same calculation as our aggregate value. Some claim amount in the pivot table, some sum, the claim amount, and finally where education does not equal graduate in the purple box is the same as the filter in the pivot table. Although you can’t see it seeing the side-by-side comparison like this. I would argue that the SQL statement is actually clearer. It’s more transparent, it’s more understandable in what the data transformations and filtering are actually doing. It's rather than the uncertain filtering and source for the pivot table, we don't, we don’t know what the sources for that pivot table by looking at it, we have to click and sort through all sorts of check boxes for the filter. We may not even know which exact calculation is being performed on the value. So remember when I talked multiple times that abstraction was both a strength and weakness of visualization tools. This is a common example. So you like the hidden filter. In this case, the education filter in the pivot table may lead to mistakes. If someone was unaware of that filtering logic, it’s easy to assume that the calculation, your aggregate table would have all the education in it. Whereas with the SQL statement, you see it right away. And to reinforce the point, just to bring it up again, the pivot table that you see there, if this was connected to a database, if we were using power query, this is exactly what it’s doing. If we had an ODBC connection, open database connection to a database, which we connected a pivot table to, it would create this SQL statement. That’s all these things are doing. They’re abstracting the creation of SQL code. When it’s very simple, there’s not a lot of harm. It’s much faster to use the visualization tools, but when it gets complicated, it’s typically faster and better governed to have the SQL statement. But either way, understanding these basic SQL concepts makes it very easy to understand how to use the different tools. So that’s on the transformation side. Let’s talk about the data connection side, the separation of concerns. Why? Why is it so important that we pass this code to a data source anyway? We’ve used Excel for decades where we just transform the data in the Page | 75 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script spreadsheet. Why all of a sudden is that not okay? Well, there’s a few reasons. I already mentioned the data is often too big to fit in a single spreadsheet, but you know, memory is pretty big now and most data sets that we use still aren’t more than a few gigs. So now why is everyone saying we shouldn’t pull our data into Excel? First it boils down to governance. Accountants are at their core data managers. We respect financial data and as you know in any general ledger, you don’t erase information. You don’t erase a transaction, you reverse a transaction. You want to keep that data unchanged. If you’re creating a financial statement in a spreadsheet and you just backspace that number, you’ve just changed it. You’ve deleted information, you’ve deleted data, the ability to easily audit all the transformations to the data as we created that information and the opportunity to share that, create an information rapidly, securely into a wide audience. That’s why you keep the data separate from the visualization tool and the transformations we need to be able to audit it. We need to have a consistent mechanism of transforming it and we need to leave that data unchanged. With Excel, it’s extremely hard to ensure that data pooled into a spreadsheet remains unchanged. Even with all its cells locked down, it’s still possible to copy that information to a new sheet, make a new report, and then distribute all that information. If all that data remains in the original sheet, there’s no assurance. Somebody who’s not gonna update the original report as well or the other report as well. You have two versions of the truth, so when all the data stays in one database, instead of getting shipped around through email attachments, the analysis, we’ll always update any changes to the database or any updates I should say to the database. We’ll also update any analysis and if there’s an error, we can either update an error in the data or update an error in that singular analysis sheet that we have that everyone else is just connecting to. So see that wasn’t that hard. Right. Okay. Yeah, that’s a little difficult. I know. I know. It’s a lot to take in. Unfortunately. It’s just impossible to talk about data visualization without talking about data. However, it’s my strong belief that a conceptual understanding, just the simple conceptual understanding. If you watch this video a few more times and just really get your head around those basic SQL statements and the concept of keeping data separate from the transformations to that data, you are going to have such an easy time picking up all the different data visualization tools. It’s kind of that light switch that gives you the skill set to, to just understand what these things are trying to accomplish as opposed to just feeling like you’re clicking something and just memorizing a process rather than learning a solution. If you can connect to a data source in Tableau, you can do it in Power BI. If you understand that the dimensions in Tableau are the same as a group Page | 76 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script by statement in SQL, and if you understand that the measures in Tableau are the same as some or select average in SQL, then you’re going to understand that in Power BI as well, you can understand it in any platform. The Data Dilemma, Continued Kevin: “Okay, if you give Jan and me a sample, meaning at least 25% of the most recent data, we will make sure that the data is ‘clean’ by running some data validity tests, and then we can start putting together some analytics. Once we have a better feel for the data, we can give you a better estimate of how long it will take to develop the dashboards.” Miranda: “Great! My team will make sure you get all the data you need, and they will be available to help as well.” Video 10: Data Visualization – Data Wrap‐up We’ll now move to the last video in Dan Smith’s Data Visualization video series where he summarizes key takeaways pertaining to data, data analytics, and data visualizations. Video Transcript (Est. time: 3:12 min): Congratulations if you’re watching this, it means you’ve completed all the videos on data analytics, visualizations. We covered a lot of material over the course of these videos. We started with why visualizations are important for analytics. Following that, we looked at some of the tools of the trade, the leaders of creation software and platform for creating data visualizations. Next, we looked at the different types of audiences and how visual analytics answers the different types of business questions that members of that audience may have. Exploratory visuals for audiences looking to get a better grip, a better understanding of data, and find new ways in which they can use that data to help the business informative visualizations for the higher-level questions or for demonstrable proof of how those visualizations of how those exploratory visualizations can help the business. And finally, reporting visualizations to keep track of ongoing activities and answer the repeated Page | 77 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script business questions which we’ve already demonstrated can help the business. We looked at the low-level concepts of visualizations such as how shapes can be used to represent an aggregate value such as a sum, how align represents changes over time, and how a point represents the intersection of two or more values such as in a Cartesian plot, a scatterplot. We saw how combining those low-level concepts of shapes, lines, and points can create much more complex visualizations like histograms and tree charts, bubble charts, ribbon charts, etcetera. We walk through a flowchart which allowed us to select the correct visualization based on our business question, such as are we looking at a comparison of groups or a composition of the whole? Are we looking at time series or a single point in time or do we have multiple factors which we want to include in our visualization? From there, we took a closer look at the types of visualizations that we use for exploratory, informative in reporting business questions and audiences. Then we looked at some of the best practices on how a dashboard can provide our end user, our audience with data connected visualizations and add an exploratory component to those visualizations. And finally, we explore the data component of data analytics visualizations, looking at some of the data operations performed by data visualization tools and the data environment in which those tools live. I hope you all enjoyed learning these data visualization concepts as much as I have enjoyed teaching them. Again, thank you so much for watching. I'm Daniel Smith. Goodbye. HMC Conclusion You have now completed the case study, Huskie Motor Corporation: Visualizing the Present and Predicting the Future and finished the series of 10 videos with Dan Smith. Module 2 Wrap‐up You have completed Module 2 of this course and should now be able to: Page | 78 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Define data visualization. Describe how data visualization can impact the way data is communicated. Identify various data visualization tools and their different uses. Recognize the importance of choosing the right visualizations based on your audience. Module 3: Applying Data Analytics and Visualization Course Roadmap Click Next to proceed to Module 3, Data Analytics and Visualization Practicum. Introduction Welcome to Module 3, Applying Data Analytics and Visualization. The goals of this module are: To explore the data analytics and visualizations discussed in the Huskie Motors case study, and To apply acquired knowledge of data analytics and data visualization to complete scenariobased exercises. In the previous module, you were taken through a scenario of Huskie Motors Corporation’s dilemma of the abundance of data available to them and the need to use it to make business decisions. In this module, you’ll begin in Lesson 1 with a tutorial of how some of the Huskie Motors analytics can be performed. Through a series of five videos, you will be walked through how the data can be loaded, transformed, and analyzed, and ultimately visualized for effective communication and decision-making. Following each video, there will be a set of knowledge check questions that you’ll be required to answer before moving on. After that, in Lesson 2, you will have the opportunity to apply the knowledge gained by performing in-depth, scenario-based exercises on data analytics and data visualization. Before you begin Lesson 2, you’ll need to access the data files that correspond to Module 3 from the Resources link. You can download them directly onto your computer. Read the question presented in each exercise, use the data files to figure out the best answer, and then come back to the question and select your answer. You are allowed 2 attempts per question. If you still do not get it correct, you will be able to click the ‘Show Answer’ button to view the correct answer. All exercises in this lesson are based on the Huskie Motors case study reviewed in Page | 79 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Module 2. When you are ready, click next to continue. Module Menu Module 3 consists of two lessons, Lesson 1: The Data Analytics of Huskie Motors Corporation, and Lesson 2: Data Analytics & Visualization Practicum Click Lesson 1 to proceed. Lesson 1: The Data Analytics of Huskie Motors Corporation Huskie Motors Data Analytics Dr. Ann Dzuranin, co-author of the Huskie Motor Corporation case study, will present a series of videos showing how to perform some of the data analytics required by Miranda and her team in the case study. Dr Dzuranin will take you step by step, using Microsoft Excel, through a demonstration of how to work with the large amounts of financial and non-financial data that the Huskie Motor Corporation is tasked to analyze. In the video series, you will see how to: Identify, extract, load, and verify the data; Analyze the data to identify the most profitable brands; Calculate profit margins, contribution margins, cost trends over time and more. Huskie Motors Data Analytics: Video 1 Let’s now take a look at a video on identifying the issue and the data, extracting and loading the Page | 80 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script data, and then verifying the data. Video Transcript (Est. time: 16:58 min): Hello, welcome to the Huskie Motors data analytics Video 1, ”Extract, Load, and Verify Data.” Evaluating source data. I am Ann Dzuranin, and I will be discussing the material in this video. In video one, the issue that we have identified is that we need to load Huskie motorcars 2015 sales transaction data and confirm gross sales by region. The data that we’ve identified we need is the 2015 sales transaction data. To extract that data, we contacted the client and asked them to provide us with that data. The client has provided us with a file, and the file is in a .TXT format, so it is in text file format. This is not an unusual way that data will be downloaded and I should say extracted from databases, for example. We need to take that data and import it into our evaluation or analytic software, in this case is going to be importing that text file into Microsoft Excel. Once we have it in Excel, we’re going to have to verify that all the data that the client had told us was in that file is actually there and has actually loaded correctly into our Excel file. We’ll confirm that by comparing what we’ve loaded into our Excel file to check figures that were provided by the client. So let’s get started. First thing that we need to do is take a look at the text file that the client has sent to us. So if you notice here we have what’s called the HMC client data file. When you try to open that, it will open up a notepad and in this format it’s not really very useful for us to work with. So instead we’re going to close that. We’re going to go ahead and open up Excel and open a new workbook. Once I’m in Excel, now I go to the data tab and I go over here where it says, all the way to the left, where it says get data. You click on that get data, and notice you have a lot of choices from which you can bring data into Excel. We know that we have a text file and so we are going to click on ”From Text\CSV” file format. This will take us to our computer where we need to then go to our files and find where we had saved that client data. Go ahead and import that. Now if you have 2000 Excel, 2016, or a newer [version] you will have the option to actually transform the data before you load it into your Excel spreadsheet. This can be a really useful tool to use and I’ll show you on how this works. So if I look at my data right now, I can click on, I can kind of page through it. It only shows me the first 200 rows just so I get a feel for what’s come over. If I click on transform data, what that will do is, essentially Excel will look at each of the data points and identify what type of data that is. So for each column, for example, we would identify the VIN number as a text. ABC stands for a text Page | 81 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script and that makes sense, right? Because it’s not a continuous number. The same thing for brand model. Notice model year is just a number and that makes sense also because we actually have, that’s really a category. It’s not the year when it was sold. Same thing for seat and then series is a category segment, body style, etcetera. And so as we go through these, you know, look and see if everything kind of makes sense, and it does make sense to me, you know, month, day, and year. We don’t need those to be deep fields because we have a sold date field here. Again, as we go through it makes sense that sales volume is a number. It makes sense that the options and things like that are categorical or text fields. And if we want to be sure though that all of our financial information is accurate. So if we haven’t been through this, we haven’t been through this file before. Um, this file is essentially sales for all of 2015 for Huskie Motor Car Company. And it starts with VIN numbers who each row is a specific transaction, right? So it represents a sale of a unique vehicle to an into a, uh, an individual. And all of the information that’s in here, um, essentially describes everything about the vehicle that was sold. If we scroll all the way to the left, we will eventually get to our financial data. And I wanted to point out a couple of things that happen when we click on that transform, bringing over this text file into an Excel. Um, we see that we have, um, currency is for gross sales and marketing, net sales, labor tooling materials with and look at option costs. Again, the option costs. And we can see that Excel has identified this as a text field, which initially doesn’t make a lot of sense. Same thing for packaged cost, but notice what’s happening because the data set when it was downloaded from the client’s database, but if there was a Z, instead of putting zeros, it put dashes. And when it brings it over as a currency field with a dash in it instead of a zero, Excel doesn’t know what to do with that. And so what Excel does is it says, no, this is not currency because clearly we have text fields within here. We can in this transform mode change this data type to be currency. And what will happen is we have the choice to either add another column. So essentially duplicate this as the new column type or I, we can replace it. So I’m going to go ahead and click replace so you can see what happens. So notice now that wherever there had been blanks or dollar signs with dashes, the word error comes up. If I do the same thing for package costs and change that to currency, I will get the same issue and we’ll talk about that in another, in just a minute. I just want to quickly cruise through, the rest of these looks like tariffs, also was identified not as currency. And so we’re going to go ahead and change that data type to currency and say, okay, and everything else looks alright. So now that I’ve done this, I can go up here and say close and load. And what will happen is all of those places where it said error, I’m going to get a notification over here, a query that says, hey look, you’ve got 72 errors. Page | 82 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script What you would do at this point is you would click on that and you would look and see what the errors are. So it just brings over the rows where there are errors. And if you scroll across, you will eventually find where the error is. So remember it was an option, costs and package costs as well as tariffs. If they didn’t add an option, it should be zero. If they didn’t add a special package, it would also be zero. So what you would do at this point is you would actually go talk to the client. You would confirm whether or not these should be zeros or whether they should actually have dollar amounts in them. For our purposes today, we are just going to go ahead and load this anyway and because now what’ll happen is it will just come through and it will just be instead of an error when it loads into my Excel spreadsheet, it will just show up as a blank. Okay. All right. So we have our data in Excel now and now the next thing we had wanted to do, I’ll take you quickly back to the PowerPoint. What we want to do is check to see if our file came over correctly. So what we want to do is confirm the number of transactions are 1,308 that the gross sales total, $38,675,912, that the total number of blue cars sold was 340 that our first sale happened on January 17, 2015, and then our last sale happened on December 20, 2015. We can use pivot tables as a kind of quick and convenient way to do a lot of these verifications. So for example, if I click anywhere in my worksheet and click on pivot table, it being Excel will highlight my entire table range. I’m going to go ahead and put this in a new worksheet since this has a lot in it already. Now when I get to this worksheet, the first thing I wanted to do, remember, was confirmed the total number of transactions. So a really quick way to do that would be to say, okay, I could just count for that values, the number of VIN numbers in that column, because each transaction is a unique VIN number. And so when I do that, I see that I get account of 1,308, which if we glance quickly back is exactly what we want. So it looks like the number of transactions is correct. The next thing I would want to do is confirm sales. So what I can do for that is I can continue to work in this same pivot table if I wanted to, but for an audit trail I’m going to go ahead and do each confirmation on a different page. And I’m going to call this one count of transactions. And then I’ll go ahead back to my first sheet and I’m going to actually rename this so that we know exactly what it is. And this is 2015 sales. And now I’m going to go ahead and insert another pivot table on a new worksheet. And for this one I want to look at total gross sales. So I’m going to find my gross sales and I get my, some of gross sales. I can change the format of that by going into the values, clicking on the down arrow and picking value field settings. And then number format. I’m gonna go ahead and Page | 83 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script change that to currency with zero decimal places. So it’s easier to read. Click OK. So $38,675,192 and if we look back at our check figures, that is accurate. The next thing we can using a pivot table is how many blue cars were sold. So we’re going to go ahead back into our sales transactions. I’m going to go ahead and click anywhere in there. Insert pivot table. Okay. And I am going to find color passed it. We’d go color. So these are the various colors that they offer for their cars. And remember I want it to count how many they I had of each color. Essentially what I really want I accidentally, that went into rows. I’m going to drag it over to values and notice that I get a count of the VIN numbers for all of the cars I could if I wanted to also bring color as a filter and then only choose blue. And you can see that my account of blue cars sold is 340. I can go back to my PowerPoint and confirm that that is accurate. Now to confirm the first day of sale on the last day of sale. Probably the easiest way to do that in the sales table is to make sure that your sold date column is sorted from oldest to newest, meaning that the first one should be the first sale of the year and the last one should be the last sale of the year. So the 12/20/2015 was our last sale 1/17/2015 was our first, and that checks out with what we have over here. So what we’ve done is we have gone through and we have confirmed the total number of transactions by counting the number of VINs that we sold. We confirm the total gross sales of $38,675,192 by creating a pivot table that summed gross sales. We confirmed the number of blue cars sold was 340 by creating a pivot table where we had color as the rows and then we filtered it for only blue. And then finally we went and looked at our data, made sure that our sold column was sorted from oldest to newest, and confirm that the first sale date was the 17th of January and that the last sale was made on the 20th of December. Remember when we first talked about what we were, our issue was today, one of the things we wanted to identify was the total gross sales by region. And so after we verified the data, we needed to provide that, those results. So we verified our data, we feel comfortable that the data that the client gave us is complete, and accurate, and we went ahead and totaled gross sales. I did this by using, again, another pivot table. So if we want to pop quickly back into our Excel spreadsheet that we just created and we want to insert a new pivot table here, I can pick region and another way that you can quickly find what you’re looking for, you can start to type in the name of what you’re looking for. So I know I’m looking for gross sales. I just typed in G-R and took me right to gross sales. And again I click that down arrow, changed my field settings so that my number format is currency with zero decimal places and it’s okay. And then we can see Europe, North America, and Page | 84 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script South America and our total, which agrees with what we have in our visual here. So the key takeaways. First, we had to identify your issue, which we did. We were looking to verify our data and then, um, show the results for sales by region. We identify we were using 2015 sales data. We extracted it via the client and loaded it into our Excel database. They, we were given a text file by the client. We imported that into Excel. We looked at how the transformation options are available in 2016 version of Excel and… and higher. We talked about the issues of you have blank or missing data. We then verify that we had the data. We were supposed to have confirmed our results and provided the summary of sales by region. So thank you for watching Video 1, “Extract, Load, and Verify Data.” Huskie Motors Data Analytics: Video 2 In the next video, we’ll see how to transform and analyze the data in order to evaluate Huskie Motor Corporation’s financial performance by brand. Video Transcript (Est. time: 12:31 min): Hello, welcome to the Huskie Motors data analytics Video 2: “Transform and Analyze Evaluating Brand Performance.” In this video, we are going to identify a business issue facing Huskie Motors. We will identify the data we need to evaluate that issue. We’re going to transform the data by creating a measure that’s going to help us address the issue. We will then analyze the data and then communicate our results. In this video we’re going to be evaluating the HMC financial performance by brand, and the data that we’re going to use for that are the 2015 through 2016 sales transactions. The transformation that we’re going to do is to create a measure that’s going to help us evaluate financial brand performance, and we’re going to use the profit margin ratio calculation as that measure. Once we create that ratio, we’re going to go ahead and analyze the data by preparing a pivot table to summarize profit margin by brand, and that will allow us to communicate the results as to which brand has the highest profit margin ratio for 2015 to 2016. As a review, I’d like to cover what the profit margin ratio calculation is. So we take for a profit margin ratio, we take operating income and we divide that by sales. There are probably other measures that you might be thinking would be also good measures of Page | 85 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script financial brand performance, but the reason we’re choosing profit margin ratio is because that helps us eliminate any type of volume differences between the brands. So the profit margin ratio allows us to break down by brand what their profit margin ratio is regardless of how many cars they sell of each brand. So let’s go ahead and open up the Excel spreadsheet that we’re going to be working with in this video. This is the Huskie Motors sales transactions from 2015 to 2016, and if you’re not already familiar with the data, I’ll give you a brief overview of it. Each row in this data set represents a single sale of a vehicle. And so you can see that the VIN number should be unique in each row of this data. There are approximately 2,674 transactions that we’re going to be working with. The other information we have are typical things that you would think about if you were buying a car: the brand name of the car, the model of that car, the model year, perhaps there’s a series number for that vehicle. Is it a full-size car or a compact? For example, what is the body style? And then we have sold date information. We also have, sales volume will be one for each transaction since you can only sell unique vehicle once, drive configurations, engine, etcetera. If we continue to page through or scroll over, I should say, we will also see that we have a lot of information about options that can be added to it. We have the region in which the car was sold, the country in which it was sold, how many days it was on the lot, etcetera. So if we’ve continued to scroll over, we will eventually get the financial information, and that starts with the column of gross sales. I do want to point out that there is also a column called net sales. And what that column represents is the difference between the gross sale amount and any variable marketing expenses that were incurred. So these variable marketing expenses are based on specials and deals that may be happening at the time that car was sold. So cash rebates, low-interest financing, special lease terms, those are types of variable marketing campaigns being used. And then this number represents what that cost was. We're going to go keep scrolling to the end of our financial data, and you can see I already created a column for profit margin ratio and we’re going to recreate that so that I can walk you through how I did that. It’s fairly straightforward so I’m just going to type in profit margin ratio. So we said that the calculation is operating income divided by sales. So the equivalent in this data set is going to be our net revenue, and we’re going to take net revenue instead of after-tax revenue. The reason we’re going to do that is after-tax revenue is driven by the tax rate that is in the specific countries in which the cars are being sold. And so we don’t want to muddy the waters by also having to account for differences in tax rates. So we’re going to go ahead with net revenue, and we’re going to Page | 86 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script divide that by gross sales. Now the reason we’re using gross sales instead of net sales is because gross sales is a better measure of sales. Since net sales is really just net of a marketing expense. Now if this was a retail company and net sales represented the difference between gross sales and sales returns, well then you would want the net sales, but that’s not the case here. So we go ahead and we take our net revenue divided by our gross sales, and we get our profit margin. Now I would recommend going and changing this so that it is a little bit easier to read so you can change up to percentage. I actually probably would add another decimal place to that. Now we can use the autofill feature by just double-clicking in the corner here and that will autofill all the way down to the end of our data. What our profit margin ratio is. I do want to point out a word of caution on autofill. Autofill will only work if there is a corresponding data point in the column immediately to the left of where you’re filling. If there was a blank, for example, somewhere in this profit margin ratio column then the autofill would stop there. So what's a good way you can tell if you’ve got everything? Well, if you notice down here after I did autofill, it gave me a count. So it gave me a count of how many rows it autofilled that formula to. And we know from looking at the number of rows we had, right in the beginning there that we have 2,674 so it looks like it autofilled correctly. All right, so now we have our profit margin ratio, and what we want to do now is create a pivot table so that we can evaluate which model has the highest profit margin ratio for sales from ’15 to ’16. So what we can do if you click anywhere in your worksheet and go up to insert pivot table, notice that Excel will automatically find the whole parameter of all of your data. So you just have to be clicked within a cell for that to work. We’re going to go ahead and put this in a new worksheet because we already have a lot in this worksheet. So click OK. And now notice that you have your fields here that you can start to drop. And you’ve got it pretty much got your canvas here that you can start to drop in what you want in your pivot table. So we’re interested in profit margin ratio by brand. So if I click on brand over here, you can see that it drops in the Apechete, the Jackson, and the Tatra. We also want profit margin. So if we look down to the bottom, you can see, and these go in the order in which you have your columns in your worksheet. Notice I have two profit margin ratios because remember I had already created one and then we created a second one as a demonstration. It doesn’t matter which one we use. I’ll go ahead and take the second one we just created. So now notice what we get. It defaulted to give me the sum of profit margin. And that doesn’t really make a lot of sense, right? We don’t want to add up all the profit margin ratios. That really doesn’t give us any kind of number that makes any sense for us to evaluate. What we really want to do is look at the average. So if I click on the down arrow and I go to value field settings, you can see here where you get choices of how you want to summarize those, the fields. So in this case we were going to look at the Page | 87 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script average of the profit margin, and I’m just going to click OK. Now if I want to change this so that it’s in a nicer format and it has the percentages, instead of, the way it looks right now, I can go back to value field settings and click number format. And here if I pick percentage, I’m just going to use one decimal place and then I’m going to click OK, click OK again. And now I have a much neater-looking table here. Grand total doesn’t necessarily make a lot of sense, either. So if you want to take out grand total, you can just click on it. And then go to remove grand total. We can also change what we want to call this if we wanted to call it something different as well. So now you can see that we have created a pivot table of the average profit margin, and as we can see, the Apechete is performing the best followed by the Jackson and then by the Tatra. Now before we call it a day, let’s think about what we talked about in the very beginning that there might be other measures that you may have been tempted to use to evaluate performance. Let’s just say for sake of argument that we want to also look at net revenue. So if I click on net revenue and I’m looking now at total net revenue, some of net revenue, and I’m going to go ahead and change the format on that. So that’s a little bit easier to read. So we can see putting in net revenue that the Apechete also has the highest sum of net revenues. So the highest total net revenue followed by the Jackson and then the Tatra. But notice that the Tatra has very close revenue to the Jackson, and yet if we look at which one is more profitable, we can see that it’s the Jackson of those two and that the Tatra actually has a negative profit margin ratio. So if we go back to our slides, essentially what we have is the results here on this slide shows you the results that we just calculated in the pivot table where the Apechete has the highest profit margin followed by the Jackson and then the Tatra. So what are our key takeaways? Well, what we did in this video is we identified the issue of what is the financial performance of the brands. Can we identify which brand is performing best in the years ’15 to ’16, we took the data, which was the sales data from 2015 and ’16. And we transformed that data by creating a measure of profit margin ratio. We then took that profit margin ratio and conducted a pivot table analysis, which helped us to identify that the Apechete was in fact the best-performing brand. So thank you for watching Video 2: “Transform and Analyze. Page | 88 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Huskie Motors Data Analytics: Video 3 Now that you saw how some data analytics are performed using Microsoft Excel, we’ll move on to some exploratory data visualization using Tableau to understand the Huskie Motor Corporation’s variable costs. Video Transcript (Est. time: 22:20 min): Hello, welcome to Huskie Motors Data Analytics Video 3, “Exploratory Data Visualization,” understanding variable costs. I am Ann Dzuranin, and I will be discussing the material in this video. What we’re going to cover in Video 3 is first to identify the issue that we’re going to examine, which in our case for this video is going to be evaluating Huskie Motor cars’s variable cost. We have to identify what our key questions are. So what are we going to explore in regard to Huskie Motor cars’s variable cost. Specifically, we’re going to be examining which models have the highest variable cost, which costs comprise the largest percentage of total variable costs, and how have variable costs changed over time. The data that we’re going to be using is the data we’ve been working with in the other two videos, the 2015 to 2016 sales transaction data for Huskie Motors. We’re going to explore the data in two ways. We’re going to use Excel, and we’re going to use pivot tables and charts to help us investigate the identified questions. And then we’re also going to use a visualization software called Tableau to investigate the identified questions. So let’s go ahead and get started. The first thing we’re going to do is get our Excel file opened up. So this is the sales data for Huskie Motors. It’s the same data we worked with in the last couple of videos. We’re going to be using this data to create three visualizations. We’re going to explore it using both pivot tables and some pivot charts. What we want to do is identify model variable cost. So this is our goal, to create this data, pivot table, and this visualization. So that’s our first step. So let’s go ahead and start a new worksheet, and we will go ahead and do that. So we want to know by model what total variable costs are. To do that, we can go back to our sales data and click anywhere in the data sheet and go to insert pivot table. So Excel will automatically highlight all the area, and create the table that you’re going to be using. We’re going to go ahead and say to put it in a new Page | 89 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script worksheet. So we get to our new worksheet. And now we have our canvas here to work with to develop our pivot table. And from that pivot table we will develop a pivot chart. So we know that we want to look at variable costs by model. So let’s go ahead and bring model down to rows and then let’s bring years sold into columns. We’ve pretty much set up how we want our rows and columns to look. We want each model listed as an individual row. And then we want to look at the cost in 2015 and 2016. So now what I need to do is grab that cost information and bring it over to values. So I’m going to go ahead and scroll down until I come to total variable cost. And I’m going to bring that down to values. It automatically creates a sum, and that’s okay. We’re interested in sum this time, but we do want to make those fields, the number format look a little bit better so we can better read these numbers. So I’m going to go ahead and choose currency, and we don’t need decimal places for this analysis, and I’ll hit OK. And then once I hit OK again, it’ll format my numbers. It looks much easier to understand. So pivot tables are really useful. We’ve summarized all of our total costs by model for each year, but you know, it does get to be a little confusing to now try to look through here and make sense of, okay, which one is highest, which one is lowest, which ones changed the most? So often visualizing the data is going to enable you to get to those answers a lot more quickly. So we can go ahead and work with what we have and we can actually create a pivot chart if we would like to. So notice I’m in pivot table analyze and I have this choice over here for pivot chart. If I click on that, it’s going to open up a box for me here to decide what type of chart that I want. I’m okay with the column chart for now. So we’re going to go ahead and click OK. And now we have our column chart. So as you can see, I’m going to close our field tables over here. As you can see, we now can much more quickly identify which model in 2015 had the highest total variable costs. Just by visually looking. We can see that it’s Jespie and we hover on that point, we can see how much that is in total for that year. And then if we wanted to see which one had the highest for 2016, we can again visually pick that out pretty quickly, and it is the model called Pebble. We can also see visually which ones are going up or down, right? And which ones have stayed the same. So very quickly we can get to the information that we would like to understand better just by taking that pivot table and changing into a pivot chart. That’s the power of using data visualization for exploratory analysis. Page | 90 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script The other question that we wanted to answer was which costs comprise the largest percentage of total variable cost. So for that we want to look at the model variable cost, but we want to look at it broken down instead of by model. We want it broken down by the type of cost. So we know from the data set that our costs, that go into our total variable costs include labor, materials, overhead, freight, warranty, package cost, and option cost. So let’s go ahead and create a pivot table with those and then we can use that to create this pivot chart. So again, I’m going to go back to the sales data set and I’m going to click anywhere within my worksheet and go to insert pivot table. Now you might be thinking, why don’t I just start with a pivot chart? And you could do that. You could just go right to pivot chart, but it often is, it’s good to have the numbers in one place and then you can manipulate the chart from those numbers vs. continually having to change the chart. I’m going to go ahead and click OK to put this on a new worksheet. All right, so remember now we want to look at the breakdown, by year of variable costs of the components of variable costs. So what I would like to do is find each of my components and bring those over. So let’s start with, let’s see, we said that it was labor, materials, option costs, package cost, freight, overhead, and warranty are all in my variable cost. So notice now it’s just putting a sum here. I also wanted it to be by year as well, right? So, what I would want to do then is say, maybe I want these as rows. So I just drag that from columns, the values from column to rows. What that did, this value represents all of the values that I’ve put over here. And then I’m going to take years sold, which I believe is up here somewhere. Right? And bring that over to my columns. So now I have all of my variable costs components and how much they are for each of the two years and how much they comprise of grand total. Now we could go back into each one of our fields here and go ahead and change the format of those. Or we can change the format here. The benefit of doing it within the values is then it stays within that. If you change your pivot chart around, add things or move things, if we just change it here, the way we would do that, if we were, in any spreadsheet on the home sheet, in the number section, obviously that will work but again, if we change or add a column to this, that column would not be formatted at the same way. Okay. So we’ve got all of our components. And again, now how do we want to visualize this to be able to see what comprises most of the variable costs. I think even just from the grand total, we can see that that’s going to be materials which makes sense in the production of cars. But let’s go ahead and click in your table, anywhere in your pivot table, on your top ribbon pick pivot table analyze, which should be highlighted once you’re in that pivot table. And go ahead and pick pivot chart. And again, we have a clustered, column is the default Page | 91 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script that Excel is suggesting for us. We could choose another one. If we would like, it doesn’t really always give us the best choices. So typically what I would do is stick with the clustered bar. And so then you can see what we’ve done is we’ve brought this, I’ll bring this over here and close our fields. So again, you know, we can kind of confirm quickly that materials does make up the majority of our variable cost, and we can see followed really closely by labor and option costs, warranty costs. We can also see some, a little bit of a trend on which ones are going up and which ones are going down. Okay. So now that we’ve, taken a look at this in Excel, why don’t we think about how we might look at this in a different, well actually before we even move from Excel, let’s go ahead and do that comparison. The year-to-year comparison. We just want to see, how things- our third question was how have variable costs changed over time? So there’s a couple of ways you could do that, right? We can look at it in total. And so let’s just go ahead and create a new pivot table that looks at how variable costs have changed from ’15 to ’16, let’s say. So let’s go ahead and again, click anywhere in our data, go to insert pivot table, and now we can see that we can go ahead and hit a new worksheet. So what do we want to know here? Here we wanted to know by years sold, don’t want to sum the years sold. What is our total variable cost? Okay. So, if I wanted to go ahead and change this so that it was formatted in currency, I can go ahead and do that. I’ll do zero decimal places. So now we have basically our total variable costs and we can see what it is each year. Go ahead and click OK, so it changes the format. What if I wanted to create a pivot chart of this so that I could compare year to year. So again, click anywhere in your pivot table and then click on pivot chart and you will get your option here for your pivot chart. And so we can see that, very quickly that total variable costs have gone up from 2015 to 2016. I want to point out a couple of things in the default visualizations for Excel. Notice that we’re not starting at zero. That can always be a very confusing type of visualization to create. So, this would not be considered best practice for visualization. But again, we’re just using this for exploratory. We’re not using it to explain or communicate our results yet. We’re just using this as a way for us to better understand what’s happening. So it’s okay for that reason. So we can see that we’ve got an increase in total variable costs, but what drives variable costs, sales volume, right? So we really need to compare this to sales volume so we can go ahead and create another visualization that gives us sales volumes. So let’s go ahead, back to our data again, click anywhere in the data, click on insert pivot Page | 92 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script table. Now in this case, I don’t want to put it in a new worksheet because I want it in the worksheet I was just working in so I can compare the two. So I’m going to click on existing worksheet location and then click in that location box. And what you need to do there is go back to the sheet that we were on, that had that new visualization and it was this one and click on a cell in there for it to go to and say, okay. All right, so now we’ve got another option here for another pivot table. What I would like to do here again is look at sales volume now by year. I’m going to go ahead and pick my year sold. I want a sum of my years sold. And let’s go ahead and look at sales volume. Now we had a sales volume number, but there’s another way that we can do that. We can actually take the VIN number into values and it’ll automatically do a count for us. Okay, so it’s counting the VIN numbers for us. So now we have the sales volume. I’m going to go ahead and click within that pivot table and create a pivot chart and click OK. So just briefly looking at this, and again, our scales are different though, right? But we can see that at least we know that sales volume went up. In addition, right to our total variable costs. But wouldn’t it be nice if we could see all of this in one visualization. I’m going to show you how to use a visualization software to create an image like that and also to get a better understanding of what your total variable costs might be. So I’m going to go ahead and use a program called Tableau. Now I’m going to open up the Tableau file that I’ve already started for this. So in this Tableau worksheet this looks very similar as you can tell to a pivot table, right? We’ve got areas where we can drag our rows, areas where we can drag our columns to create our visualization. The power of a visualization software is that generally it has many more visualizations to choose from. And in addition, you can manipulate things, a lot more easily. So let me show you as an example. So one of the things we want to see was total variable costs. We’re going to see a breakdown of total variable costs, right? So if I were to take my individual cost, I could do that by again just clicking on each of those items. Now in this particular program, it will alphabetize the data vs. having it in the order it was in the worksheet. There’s pros and cons to that I suppose. So we know that we want, freight, and if you double-click, it’ll bring it over. We want labor. We want materials, we want option cost, overhead, packaged costs, and warranty. Now notice we get something that we’re not really sure if this is exactly the visual we would like to look at. We also want to be able to look at it, by year as well, right? So if I drag my sold date up to my filter and pick years, then I can choose which years I want. In this case I Page | 93 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script want both. So I'm going to hit OK. I can also drag that over to be in columns, let’s say. So that then I’ve got, all of my information in here that I would like, now I can just go to the show me button and pick a visualization that I would like to see. So what I’ve done is I’ve picked this column chart and I can see in 2015 which items made up the majority of my costs. And in 2016 which items made up the majority of my costs. We knew from our previous analysis that it was going to be materials, but again, very quickly, visually we can see that. Now, back to our question of, well, what if I wanted to see, 2015 total variable costs and compared to volume. So let’s go ahead and start a new sheet and let’s think about how we would want to do that. Well, we know that we would want to look at total variable cost. We also know we would like to look at it by year. And we also know, that we would be interested in seeing what our total sales volume was as well. So we can see I have total variable costs and I have sales volume, and it’s not really all that much more meaningful to me except I can see they’re both going up. But if I wanted to, I could, take my sales volume number and I can use that as a dual access. So if I click within this second visualization, I can pick dual access. Now what this does is allows me to look at sales volume and variable costs. These are two different measurement scales, but it puts them on the same plane so that I can see if my variable costs are rising at the same rate as my volume. And so after doing that, I can feel pretty comfortable, right? That they are moving in sync, which is what we would expect variable costs to do. So again, just to summarize, this was our total variable cost by modeling year. This was the one that we prepared in Excel, and again confirming that we had the Jespie was the highest. Jespie was the highest in 2015, and Pebble was the highest in 2016. We then looked at a breakdown of variable cost in Excel so that we could see which of the variable costs were contributing the most of the total. And that clearly came out as some of the material costs. We also split it by years that we could see which ones are increasing or decreasing. And then we wanted to look at the change over time and we did it initially in Excel, but then decided that really to evaluate whether or not your variable costs are rising at the same rate as your volume, you need to see both together. And so we used the visualization software Tableau to take both sales volume and variable costs for both years. And using a dual access, we were able to plot them both together in the same graph. For our key takeaways today, we identified our issue, which was to evaluate Huskie Motors’ variable cost. We identified the questions, specific questions, which models had the highest variable cost. It was the Jespie in ’15, and the Pebble in ’16. Which costs Page | 94 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script comprise the largest percentage of total variable costs. And that was materials for both years, changed over time. And although they’ve increased from ’15 to ’16, they’ve done so in sync with sales volume. And then we explore the data in both Excel and Tableau. So thank you for watching Video 3, “Exploratory Data Visualization.” Huskie Motors Data Analytics: Video 4 In the next video, we’ll perform some statistical analysis to evaluate total variable costs and prepare a histogram. Video Transcript (Est. time: 23:02 min): Hello, welcome to Huskie Motors data analytics video for statistical analysis, “Understanding Variable Costs.” I am Ann Dzuranin, and I will be discussing the material in this video. So let’s go over what we’re going to cover in the video. The first thing we’ll do is identify the issue. We’ll then identify the data we need, we’ll analyze the data and then evaluate our results. So the issue that we’re tackling today is the use of descriptive statistics to evaluate our total variable cost. The data that we’re going to use is the 2015-2016 sales transactions. We’re going to prepare descriptive statistics and prepare a histogram of variable cost as well. And then we’ll summarize our key findings. So let’s go ahead and get started. We’ll look at our Excel file that we’re going to be working with today and that’s the sales data from 2015 to 2016. And what we’re trying to understand better are our total variable cost. And we’re going to do that by preparing some descriptive statistics and a histogram of the distribution as well. So this is what our goal is going to be to create these descriptives and to create a histogram that gives us an idea of what our distribution looks like. So what we’re gonna do first is create this, these descriptive statistics. Each of these statistics can tell us something really important about our data. So let’s just go ahead and get started in and recreate this. If we go to the 2015-’16 sales data, and I’m going to scroll over until I get to my total variable cost, that’s what I’m analyzing in this video, total variable cost. So I’m going to go ahead and copy this column of data and put it in a separate worksheet so that I don’t have it, all in one large spreadsheet. Go ahead and Page | 95 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script copy that and put it into a new sheet. So this is the data that we’re going to be working with in this video. A total variable cost. Each row represents a specific car that was sold and the variable costs associated with that vehicle. Now descriptive statistics can be done using the data analysis tool in Excel. So if I click on data in the toolbar, I can see in my ribbon up here that in my data ribbon, that there is a section called analysis and I have a tool called data analysis. It’s possible that you don’t have that in your Excel. It’s possible you need to add it. So if you don’t have it and you need to add it, what you’re going to do is click on file and then go all the way down to options. Once you’re in the options screen, you want to go down to add-ins. Once you’re in add-ins you want to go to the bottom where it says manage Excel add-ins and click go. Now you should see a pop-up box here that gives you the ability to add, some analysis things to your data ribbon. So make sure you have analysis tool pack checked, that’s the first box. And then click OK. It should pop up right in your data tab. If it doesn’t, then just refresh. Go out of your data tab and go back to your data tab and it should be there. So once you have the data analysis tool, go ahead and I’m going to click on that and show you what’s in here. So this data analysis tool essentially summarizes all of the statistical functions that Excel has available for to use. We’re going to be working with today two of these, we’re going to work with the descriptive statistics tool and then also the histogram tool. So go ahead and click on descriptive statistics and click okay. And now you get, a box here for you to fill in the information to calculate the descriptive statistics. So your input range is going to be the range of data that you would like to, get the descriptives on. So I always like to take the column name because then it returns that in my results. I’m going to click on the first row and I am going to highlight all the data in that row. Because I picked the label, I’m going to make sure I click labels in first row. If you don’t do that, then Excel is going to tell you there’s non-numeric data in the range. Because there are words at the top. We’re going to go ahead and keep that in this worksheet. So I’m going to click on output range and then click in that box and I’m just going to go ahead and scroll back up to the top here where I want to put my results. Then I’m gonna click summary statistics. Otherwise I won’t get any printout at all. Once I’ve done all that, I can click okay. And now what you see is a report of variable cost. One thing that I like to do, because sometimes it can get confusing, there’s many decimal places and no commas. I at least like to go to my formatting here for numbers and click common style, just makes it easier to see the, the statistics, and read the numbers more easily. Okay. So, what I’d like to do is kind of point out what are the important, results in this variable Page | 96 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script cost our most important descriptives. So if we look at these, I’m going to go ahead and highlight them, as I talk about them. So the mean is the first one I want to talk about. We all understand what a mean is, right? It’s a sum of all the observations divided by the number of observations that we have. But one thing to keep in mind with a mean is that it also can be influenced by outliers. So you have, you know, one or a couple of very high numbers, it’s going to pull that mean toward that high number. Conversely, if you have a few, extremely low numbers compared to the rest of yours, it’s going to pull the mean down. So you always want to keep that in mind. The other measure of location that we’re going to talk about is the median. So the median, essentially, is what is determined by taking all of the values arranged from smallest to largest. And then the median is that middle value. So the median is not influenced by outliers. And that’s an important thing to keep in mind. If there are outliers in your data, then you probably want to go with the median instead of the mean. The mode is also in here, I wouldn’t exactly highlight that as, as something that’s important for this data set use of the mode is, it’s essentially the observation that occurs most frequently. It’s really best used with small data sets that have a small number of unique values and not really useful if you have a data set that has a lot of repeating values. So those are the measures of location that that we’re going to be, using or interpreting in our data. Measures of dispersion are also important. And so measures of dispersion include the standard deviation and the variance. So standard deviation and variance and range would also be considered, to be measures of dispersion. Of these three the one that’s going to be most frequently used is going to be the standard deviation. The standard deviation is an indication of how the observations in our data set are spread out from the mean. So if we have a low standard deviation, then that would imply that the observations are all close to the mean. If we have a large or high standard deviation, that would indicate that the observations are more spread out, we get to the standard deviation by calculating the variance. So the variance is the average of the squared deviations from the mean. But the reason that we generally use the standard deviation is the standard deviation, it’s more practical to use because it’s the same measure dimension as the data. So in other words, 5,573.89 is 5,573 and 89 cents of variable cost. Whereas the sample variance really is not that interpretable. The range is also an indication of dispersion. Obviously the larger the range, the more possibility you have for your data being dispersed. However, it’s really not a good measure; as good a measure as using standard deviation. The other things that we can, glean from our descriptive statistics are the measures for the coefficients for Kurtosis and skewness and what these measures do, I'm going to Page | 97 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script highlight them in different colors. What these measures tell us skewness and kurtosis and skewness tell us is the shape of our data. So skewness is an indication of how symmetrical our data is, how, what’s the symmetry or lack of symmetry of the data. So if we have data that is symmetrical, then that would mean that is all spread around the mean. If we have data that is not symmetrical, it’s either going to be positively skewed or negatively skewed. And with that suggest is that there’s more data at one end or the other of the distribution, kurtosis is a measure of how peaked our data is. So what we mean by that is we look at, and when we look at the histogram, we would have seen this. That gives us an idea if we have one area of highly peaked data and then all the rest is, spread from there. So it gives us an indication of whether or not there’s a somewhat flat dispersion or a wide degree of dispersion. That compared to whether or not there is a peaked amount of dispersion. Okay. So, the other interesting thing to look at here is if we look at the minimum, the maximum, these are going to be useful for when we’re trying to determine what our bins are going to be or what our groupings are going to be for our histogram. The minimum is the smallest observation in the data, and the maximum is the largest observation. And then finally count as an important thing to look at because it tells us very quickly whether or not we have all of the data observations that we are supposed to have. So we know, that we are supposed to have 2,674 observations, and we do have that. So it gives us some confidence that we’ve got all of our data. All right, so now we’ve done, we have these descriptives. What we can tell is that the average or the mean variable cost for a vehicle is $17,841, the median is $15,516. We can also tell if there’s a standard deviation from that mean of about $5,573. So in determining whether or not a standard deviation is large or small is relative to the amount of your mean. So I would say that if we’ve got a mean of $17,841 and our standard deviation is $5,573, there’s a pretty good amount of deviation from the mean. So what we want to do next is we want to go ahead and do a histogram of our total variable cost to get a feel for where we have buckets of data, of observations. The way you want to do a histogram, generally you don’t want to have more than eight bins in a histogram. We’re going to go ahead and create our bins over here, which is essentially the columns of data that we’re going to have on the column bar chart. So to create bands, you generally want to start close to where the minimum is and then increase it by equal amounts until you get to the maximum. So that’s where the minimum/maximum come in handy. I’m going to go ahead and start my bins at 9,000 and then I’m going to increase those by Page | 98 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script 5,000 for each bin. So if I put in 9,000 and then 14, which is the increase, I can just drag that down until I get up 39,000. These are the bins that I’m going to use to create my histogram and the groupings of variable cost observations. So let’s go ahead and create that histogram. If we go back to our data tab, and we click on data analysis, and then we click on histogram and click OK. What Excel is asking us to do is tell us, okay, what is it you want me to put in your histogram? So that’s our variable costs. Again, I’m going to pick the label and I'm going to pick all the data in there. The Bin range is going to be those bins that I just established. So I want to see my groupings. I’m starting at 9,000 and going up by 5,000. And I’m going to put the output range here on this worksheet so that we can see it right next to what we’ve created. Now down here is where you’re gonna want to chart your output so you actually do get the histogram. Go ahead and click OK. Now you can see what I get. I get my bins repeated and then the frequency, how often I see an observation that is in that range. So 9,000, that essentially is that minimum one of 8,912. Notice that I’ve got the highest amount in the bin between 14 and 19,000, which makes sense because our mean is 17. Our median is 15. So it makes sense that this would be the highest, the highest amount of observation. So it’s basically saying that variable costs between 14 and 19,000 occurred 889 times. So it gives us a really good indication of where, we have our, most of our observations distributed. It also gives us maybe a hint that perhaps this is an outlier. We might want to go back and check and be sure that data is accurate. And then when we’ve got only three that are above the 39,000, we might want to make sure that data is accurate as well. So I’m going to go ahead and call this sheet descriptives. So our goal though is to understand how total variable costs are, per model. And so we know now that we’ve got a pretty significant standard deviation, we know what our overall mean is for all of our vehicles that we sold, but we really want to get a little bit more detailed. So we really want to understand better what’s happening by each model. So to do that, we can create a pivot table. And what we’re going to do is go back to our data and again, as we did and in other videos, we’re going to click anywhere in that data, go up to insert pivot table, and we’re going to go ahead and put that on a new worksheet. And then click OK. So what I’m interested in is looking at by model what my total variable costs are. But I want to think of it in the same way that I just did my, other variable cost analysis. I want to look at the average or the mean and the standard deviation. So if I click on total variable cost, it brings that down into my values as a sum. And I’m not interested in sum right now, I really want to understand better what’s happening by looking at the average. I’m going to click on average, and I’m going to change my number format so it’s easier to Page | 99 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script read to currency with no decimal places and click OK. So now I have my average. I also want to look at the standard deviation. So I’m gonna drag that variable costs down again. Again, it’s going to give me a sum, and I don’t want it to be a sum, I want to change that to be the standard deviation. So I have that option there. Again, change the number format to currency with no decimal places. And click OK and click OK again. So now I can see by model what the average variable cost is and what the standard deviation is. This is helpful information because it can help me to determine if I have a specific or a group of models that maybe have a much higher standard deviation than others indicating that there’s a very large variance in what’s happening with my total variable cost. That would help me take that to that next step, where I want to really start diving into that model and looking at the individual variable costs that are going into those models to see what’s happening there. Perhaps there’s a correlation between the types of options, that the buyer puts in and the variable cost. So there’s still a lot of research to be done, but this will help me at least narrow it down to where I should start my investigation. So although it’s all great in a table, it really would be useful to see this in a visualization. So again, if I click anywhere in my pivot table and go to pivot table, analyze and click on pivot chart, I will get the option to create a chart out of that data. So notice that Excel is going to give me the option to do a clustered column, which doesn’t really help me all that much. Obviously I can see by looking at which models have the a higher standard deviation, but they might also have a higher average total variable cost. Another useful way to look at this would be to use a combo chart. The default that it’s giving me is bars for average, total variable and align for standard. Let’s look at this as to, as a stacked column and then click OK. So I’m going to make this bigger so we can see it a little bit better. So now we can see by this visual, we can right away see the total of this or the total average variable costs and then we can see the standard deviation of that. So again, we could see this one, they were side by side as well, but we can clearly see that the Chare has the highest variable costs as well as the highest variance, the highest standard deviation. So we may want to go back and look at the Chare model. Look at the Island model and perhaps look a little more deeply into, either Pebble or the Jespie. This is just one step in our analysis journey, but it definitely helps us to narrow it down. To summarize what we’ve done in going through, we identified our issue, our data, and we analyzed it. This was the analysis of our descriptive statistics. This was our histogram where we were able to see that. In our descriptives, we see that we have a mean of 17,000 in variable costs per model. The median was 15,000, but we have a pretty large standard Page | 100 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script deviation, as well. It’s about, a little more than a third of the mean. So we then took that information that we wanted to see. Well, let’s see what our distribution looks like, right? Let’s see what our dispersion looks like. And we can see the majority of the observations are between the 14 and the 24, which coincides with what our descriptive statistics told us. And then we took it one step further and we wanted to see by model what the descriptives looked like. And, so we really wanted to get an understanding of which models seem to have the highest variable cost and combined with the highest standard deviation. And we can see that the models that stand out to us are the Chare, the Island, and the Pebble. The ones with the highest standard deviation is definitely the Chare. So our next step will likely to be to go back and look at the models that seem to have the highest variable cost, average variable cost and the highest standard deviation, and see if we can dig down a little bit deeper to see what costs within those variable costs are perhaps driving that or maybe some correlations with what options the buyer bought. So our key takeaways, we identified that we were looking at trying to better understand total variable cost using the 2015 to ’16 sales data. We did descriptive statistics on total variable costs, and then we broke that down by model, so that we could evaluate which specific model had the highest average variable costs and the largest standard deviation. So thank you for watching this video for statistical analysis. Huskie Motors Data Analytics: Video 5 In the final video of this data analytics video series, Dr. Dzuranin will show you how to use Tableau to bring it all together to create a dashboard with key performance indicators (KPIs) of the Huskie Motor Corporation. Video Transcript (Est. time: 24:41 min): Hello, welcome to Huskie Motors data analytics Video 5, “Explanatory Data Visualization,” creating decision useful dashboards with key performance indicators. I’m Ann Dzuranin, and I will be discussing the material in this video. What we will cover in this video is we’ll first identify what the issue is. And the issue is that we want to create decision-useful dashboards with key performance indicators. So what we need to do is identify the dashboards that are going to be needed and then the relevant key performance Page | 101 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script indicators that we would like to have on those dashboards. So when reading through the Huskie Motors case, we’ve identified that the Huskie Motors management is interested in understanding and being able to quickly evaluate region performance, brand performance, model performance, and overall performance. As we discussed in Video 2, deciding on performance measurements can, there can be a variety of those, but what we’ve determined is that, we’ve determined that the KPIs that we’re most interested in are going to be gross sales, sales volume, net revenue, and profit margin. The data we’re going to use to create the dashboards is going to be the 2015 to 2016 sales transactions for Huskie Motors. We’ll use data visualization software. In this video I’ll be demonstrating the software Tableau to analyze the data and evaluate that with the visualizations that we create. We’ll then communicate our results by preparing interactive dashboards, allowing management to be able to filter the dashboards to see down to the most granular level of information that they’re interested in. So let’s go ahead and get started. As I mentioned, the software I’m going to use to demonstrate the visualizations in this video is called Tableau. Any visualization software can be used to do something like this. But we’re going to work with Tableau today. I’ve already loaded Tableau data into the Tableau file and that’s that 2015-’16 actual sales data. So we can go right to the first worksheet and start to create visualizations that we then use in our dashboards. So we identified that one of the things we’re interested in is gross sales. So lets just look at gross sales right now by region. So I can go over and I can quickly pull region into my columns and then I’ll find gross sales and drag that into rows. So now you can see I’ve got Europe, North America, and South America, my three regions, and I have gross sales, and this represents total gross sales, but I actually want to look at gross sales separate 2015 and 2016, so the other thing that I need to bring over is going to be the sold date. So I bring over the sold date into columns I’m going to get then the 20152016 data. Tableau, like a lot of visualization software packages, will give you a default that they think is best. That’s not always the case. I want to look at this in column, bars so that I can kind of look and see visually from one year to the next what’s happening with my data. I can quickly change the name of my sheets so that I remember what I’m working with, and we’re going to go ahead and call this gross sales by region. The other thing I want to think about is how is it that I want to be able to view this data? Am I interested in region and year, just years, when I have a color scheme that’s identified years as two different colors. Page | 102 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script So ’15 is blue and ’16 is this orange color. That may be good if I’m just concerned about year-to-year comparisons. I’m actually concerned about region to region and then within that year. So I actually would prefer to have region as my color. So what I’m going to do is take region and drag that into my colormark. Now I have colored my regions. So now any of the visualizations that I use if I choose to use region is my color scheme then I can quickly identify, you know, orange is North America, red is South America, blue is Europe. I could also, if I wanted to, see the exact number of sales instead of having to look across at the axis, I could also just drag gross sales into the label and that’ll put it on top of all of my bars. Now that’s great, except it’s a little bit large. So I can format that number first of all, so that it’s currency. And then I can also change the display unit. So then I just have it say in millions. So now we can see very quickly what our sales are. We could actually even remove this gross sale access over here if we wanted to as well to take away some of the additional ink on the page that we may not need. I also might be interested in actually seeing this with the highest gross sale, region first and then filtered from there. So what I can do is go to region and I can sort this, and I can sort this based on the field, which would be the sum of gross sales. And I’ll do it in descending order so that it’s from highest to lowest. So now I can quickly see that North America, even though I could tell that from the other version of it, at least this way I can see them in order as to which is contributing the most to sales, gross sales by region. So the power of doing visualizations is not only to be able to change the type of chart that you’re using, but also to be able to make it interactive. And the best way to make something interactive is to be sure that you have the information in here down to the most granular region. So, for example, right now I have gross sales by region. I know eventually I’m going to want to look at gross sales by brand and by model. So what I can do is be sure that I have those in here as filters. So if I drag brand to filters right now, I want all of them in there and I’m going to click OK. If I drag model to filters again, I'll choose all right now and click oK. So what this allows me to do is if I only wanted to see how the brand Tatra was doing in all the regions, I could just pick Tatra and click OK. And now I have my visualization has changed and I can just see how the Tatra brand is doing. So that can help to make something really powerful. And so this way, if I want to just quickly use my visualization in a dashboard, now we’ll be able to filter down all the way to the granular level for what we’re looking at right now is model. So this is a demonstration of how we actually work with visualizations. Let’s go ahead and look in a completed Tableau file that has put, where we’ve put together Page | 103 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script the dashboards for the areas that we identified that we would like to have dashboards ready to help us make decisions. So I’m going to switch over to the completed Tableau file and we’ll start again with region performance. So what I’m going to do is put this in view, a display mode. So this is a visualization that was created for region performance. So remember we said we’re interested in, at revenue, we’re interested in sales volume, we’re interested in profit margin. So the visualizations that were created for this particular, um, dashboard show us net revenue by region, the increase and decrease in sales volume by region, and then our profit margin by region. So very quickly in this dashboard I can see that I’ve got profit margin, North America is decreasing. That could be driven and probably is largely driven by my decrease in sales volume. I have a slight decrease in, um, South America from ’15 to ’16. So that one I can see, although I have an increase in net revenue, my profit margin is going down a bit. Now my sales volume line up. So this could be some, somewhat due to perhaps increased expenses that’s driving my net revenue down. And then when I look at my Europe, I can see that Europe actually is improving. So my profit margin went from 10% to 13.6% from ’15 to ’16. And I could see my net revenue reflects that. A large increase as well. And I can see that my sales volume is up there as well. I keep in mind you can also use this to just, um, filter based on one, um, aspect of it. So I just clicked on Europe, and then I have my other visualizations changed to only show me Europe. So if I don’t want to look at what else is happening, if I’m just interested in looking at Europe today, I just go in, I click on Europe, and that’s the only ones that pop up for me. If I click on that again, it’ll bring them all back. And the same is true if I just clicked on North America or I just clicked on South America. So this is our dashboard that we’re going to use for region performance. Now we also said another dashboard we’re interested in is brand performance. So I’m going to show you a brand performance dashboard. So this dashboard, again, we’re looking at profit margin, net revenue, and sales volume. I could also have brought gross sales over into the visualization. Um, but you probably want to keep your visualizations to three and maybe four visuals within the dashboard. Makes it easier for the decision maker to kind of zero in on what they’re interested in. So now I’m looking at it by brand and I can see my brand Apechete is decreasing on profit margin. Jackson has increased and my Tatra, now this is a problem, ’cause I can see that that brand actually has a negative profit margin. It’s increased a little bit from ’15 to ’16, but something is going on here that I really need to look further into. If I look at brand performance, net revenue, again, I can see that the Apechete, even though net revenue has gone up a bit, our profit margin is going down. When I look at my Jackson, my net revenue has gone up, as has my profit margin. And Page | 104 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script when I look at the Tatra, my net revenues going up a bit, but you know, that’s still in the negative area here. So I really need to start thinking about what’s going on with the Tatra. Um, and another thing that you can do in visualizations is you can create what’s called a dual axis. So I’m interested in looking at how sales volume has changed from one year to the next within my brands. And I can see, I can look at sales volume and I could have another chart that just looks at the sales volume change, or I can put the two together. So that gives me kind of an indication of the magnitude of the increase or decrease in my sales volume. So if we look at the Apechete, for example, we had a sales volume in 2015 of 371 and that increased to 395. That represents a 6.47% increase. If I look at what happened in Jackson, Jackson went up just a bit at 2.54%, and then I can see that sales volume for Tatra is pretty high overall. But it did have an increase, and that increase was about 4.2%. It may have helped a little bit of what’s happening over here. But I still have some things I’ll really need to think about what that Tatra. The other dashboard that we said we’re interested in are the other decision-making tools we’re interested in using is something that can help us understand what’s happening with the model itself. And so we created a dashboard here that combines both brand and model. And what’s interesting about this is so there’s three types of visualizations that we’ve used here. Again, looking at profit margin only. And so I’ve looked at profit margin ratio by brand and model. And what I’ve done here is created a chart that if I look at the darkest blue, that’s the highest profit margin, median profit margin ratio, and the darkest red is going to be the lowest. So if I look at the Apechete right away, I see that I’m all in the blue, right? So I’ve got all positive profit margins. When I look at the Jackson, I can see I’ve got a couple of brands, a couple of models that I’m a little bit concerned about because they are showing a negative profit margin, right? So they’re in losses. And when I look at the Tatra, I can see that I’ve got some really strong models here, but I’ve got two models that are really dragging down what’s happening overall at the Tatra. If I want to see just one model and see how that profit margin is impacted in what region those are in, for example. So let’s say I really want to see what’s happening with this, Jespie. And so if I click on 2015 for Jespie. I can see really quickly how that’s affected in my regions, right? So even though my profit margin is negative in 2015, you know what, let’s just look at 2016 instead, even though my profit margin is negative I’m still doing okay in South America with that one. But you know, I’m barely, I’m barely breaking, you know, positive margin in Europe, and I’ve got a huge negative profit margin ratio in North America. So if I wanted to see which models were doing best, so that would be the advantage. The advantages at 36.3% profit margin, clearly, it’s driving a lot of North America. It’s doing well in Europe, although not nearly as well. And it’s doing well in South America as well, although again, definitely performing Page | 105 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script best in North America. So what’s interesting about being able to do this is, again, you can quickly filter, and see how everything changes. Right now, I’m looking at both years. I do have a filter up here where I can change that and I can only look at ’16. So that helps to, allow me to very quickly just look at what’s happening in one year. And so if I look at just that one year, then I can see again, the Mortimer is actually doing worse than Jespie in that year. And I can also quickly see, that the one that’s doing best is still the advantage. So this is a nice way to bring together both brand and model performance, to be able to understand better what’s happening within our regions, our brands, and our models, and from year to year. Now, the last dashboard that I’m going to illustrate for you is that overall performance dashboard. So this dashboard, what we want to do is really be able to make this very interactive for the user. So they’re able to look at overall performance and then drill that down by model or by brand. So what we’ve brought into this visualization is net revenue by region, profit margin by region, and then growth sales by country so we can get a better understanding of what’s happening within those regions. So the regions are still color-coded as they were in my example where we’ve got North America as this orange color, Europe is blue, and the red is South America. So if we look at this visualization on the bottom, this is what’s called a tree map and what this represents, the larger the area, right, the higher the amount of sales. And the color represents the region. And then I have added the detail here where you can see what that number is in millions. So we can see just by looking at this that the U.S. comprises right North America I should say, comprises the largest portion of all of our sales or half of it, looks about half of our sales. And within North America, the U.S. is by far the largest. If we look in South America, we’re actually fairly close between Columbia, Venezuela, and Brazil. Then followed by Chile, Argentina, and Bolivia. And when we look in Europe, we can see that we’ve got the U.K. and France are equal, Germany, followed by Spain and then Poland and Sweden, which are the same. So this is all well and good. I can see how this is all happening by country. If I just wanted to click on the U.S. and see what the U.S. net revenue is and what the U.S. profit margins are, I just click in U.S. and then I can just look at what’s happening there. If I’m interested, for example, in just how France is performing, I click on France. I can see what my gross sales are. I can see what my net revenue is year to year, and I can see what my profit margin is as well. If I wanted to use this down and go down to an even more granular level, remember we were concerned about what was happening with the Tatra. So if I picked just Tatra, now my visualizations all change and it just shows me the results for just that brand. So it’s that brand, but all the models within that brand and I can see that that Page | 106 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script brand does not do very well in Europe and in South America, although this revenue is high and the sales are high in the U.S., we are running at a loss as far as profit margin. So we really need to think about kind of what’s happening there as well. So, then if I just click on the U.S. and that will again show me what’s happening just within the U.S. for the Tatra. Click back on there and bring it all back. We know that the Tatra has some really high-performing brands and some really poorly performing brands. So if I go and look at, for example, we knew that the, I think it was the Jespie was not doing so well. We can see that this Jespie is what’s driving a lot of the negative profit margin, right? Because we can see we’ve got a loss year to year for that model and we can see also what’s happening over here in North America, right, with the loss, although we can see that it is performing better in Europe in ’15, but declining in ’16 and improving in South America. So if I look down and wanted to see, you know, well, hey, where is it doing well in South America, I go down here and I can see that really Argentina is where I have sales. And again, this is all relative, this profit margin is relative to what’s happening in Argentina in South America. If I just clicked on Argentina, I can see that Argentina has had an increase in sales of the Tetra in net revenue I should say from year to year. And I can see that it’s had a very large increase in profit margin as well. So being able to make this dashboard interactive, and I’ll go ahead and change this back now to all brands and all models. When I have everything in here, it gives me that overall performance evaluation. And then if I want to really dig into what’s driving some of my more concerning issues, for example, my declining profit margin, then I can dig down deeper by looking at what’s happening with each of the brands. And that automatically changes all my visualizations, making it a very powerful way to explain what’s happening with your operations. So we’re going to go back and kind of summarize what we’ve done here. And what we’ve done is gone through and created dashboards for each of the areas that we want to evaluate performance and identify the KPIs using the 2015-16 data and using those visualizations to prepare that interactive dashboard. We put together first region performance where we summarized what was happening in our region by looking at net revenue, increase or decrease in sales volume, and then mean profit margin, median profit margin by region. We then delved a little bit deeper by looking at what was happening with our brands. We have three brands in our business. We’ve got the Apechete, the Jackson, and the Tatra. And we looked again at what was happening with profit margin, revenue, and then sales volume. And for this one, instead of just looking at the increase or decrease, we looked at volume as well as what the percentage change was from year to year in each of the brands. We took that one step further and looked at Page | 107 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script brand and model performance. And there we created a visualization where we could look at profit margin by growth, brand, and model. And we could make this, and we made this interactive so that we could look at it either by both years, which is what this static view is by both years. Or we could have changed and looked at one year at a time. And we could look and see very quickly which profit margins are somewhat concerning. So we’ve got the Jespie, the Mortimer, and the Crux, and these were for both years. And then we can look at it by region over here. And we saw that we could look at just ’16. We could look at just one particular brand or model, and this was color-coded so that the more the lower the profit margin, if it was negative, it was in the red. And if it was positive it was in the blue, and the darker the color, the more extreme either way. And then we fit, we wrapped it up by looking at overall performance where we made this interactive so that we could, if we wanted to, look at it just overall. So this all models, all brands, all models for all areas. We looked a little bit deeper to see within the regions how the countries are performing. And then we showed how we could change all of the visualizations at the same time by simply going through and picking different models and brand combinations so that we could see how those were doing. So the takeaways, again, we identified our issue, which was to create useful dashboards for decision making. We identified the KPIs we were interested [in] for that. We use the data from 2015 and ’16 to analyze that data and communicated our results in a variety of interactive dashboards that would allow the decision maker to change those dashboards to help identify whatever areas they were interested in examining further. So thank you for watching Video 5, “Explanatory Data Visualization.” Huskie Motors Conclusion This concludes the Huskie Motors data analytics video series. Up next, you will have the opportunity to perform some data analytics yourself, as you attempt the exercises based on the Huskie Motors case study. You’ll now need to access the data files from the Resources tab and save them to your computer. You’ll use these files to perform the analytics required for the exercises in the following lesson. Please note: To perform the exercises, you will be asked to load raw data into Microsoft Excel. Based on the version of Excel you are using, please refer to the following links for guidance Page | 108 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script on loading raw data: For Office 365: https://support.office.com/en-us/article/data-import-and-analysis-options3ea52160-08bc-45ac-acd9-bc4a11bcc2a2 For Excel 2010 – 2016: https://support.office.com/en-us/article/text-import-wizard-c5b02af6fda1-4440-899f-f78bafe41857#ID0EAAEAAA=Office_2010_-_Office_2016 Module 3 Wrap‐up You have now completed Module 3, Applying Data Analytics and Visualization. In this module, you were able to Explore the data analytics and visualizations described in the Huskie Motors case study, and Apply this knowledge to complete the exercises. You will now advance to the final module of this course, Module 4 Conclusion and Final Assessment Module 4: Conclusion and Final Assessment Course Roadmap Click Next to proceed to Module 4, Conclusion and Final Assessment. Summary Welcome to Module 4, Conclusion and Final Assessment! In this module, we will conclude the Data Analytics & Visualization Fundamentals Certificate course by reviewing the overall course learning objectives, and then moving to the Final Assessment. Course Objectives Review Let’s revisit the learning objectives for this course. In the preceding modules, you learned to: Recognize the impact of technology and analytics on the accounting profession. Demonstrate how data analytics can influence organizational strategy. Identify ways data visualizations effectively enable appropriate business decisions. Page | 109 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script If you are unsure of your understanding of any of these objectives, please review the modules before attempting the final assessment. We Are Here to Support You IMA is here to support you as advancements in technology and analytics transform the finance and accounting profession. As mentioned by Jeff Thomson, IMA President and CEO, in Module 1, IMA is committed to preparing you today for changes that are happening now as well as those that are coming in the near future. We offer many online courses, podcasts, webinars, and live events with relevant topics in the realm of technology and analytics, and you can expect to see more added. This compilation of resources is curated by IMA to help you navigate through this dynamic time and prepare you for success. Click below to navigate to IMA’s Technology & Analytics Center to access the resources and recommendations to further educate yourself on these topics. Final Assessment Introduction Next, you will progress to the Final Assessment. Remember to click the “Submit” button, located at the top-right of your screen, after choosing your answer. If you pass the Final Assessment with a score of 70% or higher, you will earn your Data Analytics and Visualization Fundamentals Certificate, CPE, as well as a digital badge that you can upload to your professional profile on social media. Good luck! Course Wrap‐Up To access your professional certificate, digital badge, and NASBE CPE certificate, go to the “My Transcript” tab in IMA’s Learning Management System. To download the certificates, click on “Certificates.” To download your Digital Badge, click on “Digital Badge.” We need your feedback! Please click the link displayed to complete a brief survey on your learning experience. If you haven’t already taken the survey, please click the link below. Thank You Thank you for taking the IMA Data Analytics & Visualization Fundamentals CertificateTM course. Page | 110 IMA Data Analytics & Visualization Fundamentals CertificateTM – Course Script Please contact us if you have questions about this course or if you are interested in other IMA courses. Page | 111