Data Visualization, 1e
Chapter 1: Introduction

Camm, Cochran, Fry & Ohlmann, Data Visualization - Exploring and Explaining with Data, 1st Edition. © 2021 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.

Chapter Objectives
After completing this chapter, you will be able to:
LO 1.1 Define analytics and describe the different types of analytics
LO 1.2 Describe the different types of data and give an example of each
LO 1.3 Describe various examples of data visualization used in practice
LO 1.4 Identify the various charts defined in this chapter

Data Visualization
Data visualization is the graphical representation of data and information using displays such as charts, graphs, and maps.

1.1 Analytics
Analytics is the scientific process of transforming data into insights for making better decisions.

1.1 Developments in Analytics
Three developments have spurred the explosive growth in the use of analytics for decision making:
• Technological advances – incredible amounts of data from scanner technology, e-commerce, social networks, sensors, and personal electronic devices such as cell phones.
• Methodological developments – faster algorithms can handle and explore massive amounts of data for data visualization, machine learning, optimization, and simulation.
• Explosion in computing power and storage capability – better computational hardware, parallel computing, and cloud computing enable businesses to solve larger problems faster and with greater accuracy.

1.1 A Categorization of Analytical Methods
• Descriptive analytics – encompasses a set of analytical tools that describe what has happened in the past.
  • Examples: data queries, standard reports, descriptive statistics, data visualization, cluster analysis, and what-if spreadsheet models.
• Predictive analytics – consists of analytical tools that use mathematical models constructed from past data to predict the future or ascertain the impact of one variable on another.
  • Examples: linear regression, time series analysis, and predictive data mining.
• Prescriptive analytics – includes mathematical or logical models that indicate the best course of action to take in decision making.
  • Examples: optimization, simulation, and decision analysis.

1.2 Why Visualize Data?
We create data visualizations for two reasons:
1. Exploring data
2. Communicating/explaining a message

1.2 Data Visualization for Exploration: Patterns
Data visualization is a powerful tool to identify patterns.
[Figure: Column Chart of Zoo Attendance by Month. Horizontal axis: Month (Jan through Dec); vertical axis scaled from 0 to 25,000.]

1.2 Data Visualization for Exploration: Relationships
Data visualization is a powerful tool to better understand the relationship between variables.
[Figure: two panels, "Linear Regression of Anscombe's Data Set 1" and "Linear Regression of Anscombe's Data Set 2". Axes: X (0 to 16) and Y (0 to 12). Both panels show the same fitted line, y = 0.5x + 3.00, with R² = 0.67.]

1.2 Data Visualization for Explanation
Data visualization is also important for explaining relationships found in data and for explaining the results of predictive and prescriptive models.

Types of Data

1.3 Quantitative vs Categorical Data
Quantitative data
Data for which numerical values indicate magnitude.
Arithmetic operations, such as addition, subtraction, multiplication, and division, can be performed on quantitative data.
Examples: Share Price ($) and Volume.

Categorical data
Data for which labels or names identify categories of like items. Arithmetic operations cannot be performed on categorical data.
Examples: Company, Symbol, and Industry.

1.3 Cross-Sectional vs Time Series Data
Cross-sectional data
Data collected from several entities over the same time frame.
Time series data
Data collected over several time periods.
• Business and economic publications frequently include graphs of time series data.
• Graphs help analysts understand what happened in the past, identify trends over time, and project future levels for the time series.
[Figure: Dow Jones Index values from January 2010 to April 2020]

1.3 Big Data
• There is no universally accepted definition of big data.
• A working definition of big data:
  • Any set of data too large or too complex to be handled by a desktop computer.
• The four Vs of big data:
  • Volume - the amount of data generated
  • Velocity - the speed at which the data are generated
  • Variety - the diversity in types and structures of data generated
  • Veracity - the reliability of the data generated

1.3 Word Cloud
A word cloud is a visual display that contains the key terms of a document. The size of each word is proportional to the frequency with which the word appears in the document. Word clouds are frequently used to summarize the words in large sets of text data.
Source: https://milady.cengage.com/blog/usingword-clouds-classroom

1.4 Data Visualization for Accounting
[Figure: A clustered column chart showing Benford’s Law versus Tucker Software’s Accounts Payable Entries]
Benford’s Law (the First-Digit Law) states that the proportion of observations in which the first digit is 1 through 9, respectively, follows given probabilities. Benford's Law may help detect fraud: if the first digits of the numbers in a data set do not conform to Benford's Law, a fraud investigation may be warranted.

1.4 Data Visualization for Finance
[Figure: A high-low-close stock chart for Verizon Wireless]
A high-low-close stock chart shows the high value, low value, and closing value of the price of a share of stock over time. This chart shows how the closing price changes over time and how volatile the price is each day.
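
The accounting slide above refers to the "given probabilities" of Benford's Law without listing them. The standard statement of the law gives the expected proportion for first digit d as log10(1 + 1/d). The sketch below computes those expected proportions and compares them with the observed first-digit proportions of a small list of amounts. The amounts are made up for illustration and are not Tucker Software's data; the function names are likewise hypothetical.

```python
import math


def benford_expected():
    """Expected first-digit proportions under Benford's Law: log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_proportions(amounts):
    """Observed proportion of each leading digit 1-9 among the nonzero amounts."""
    counts = {d: 0 for d in range(1, 10)}
    total = 0
    for amount in amounts:
        # In scientific notation the first character is the leading digit.
        leading = int(f"{abs(amount):.15e}"[0])
        if leading == 0:  # the amount itself is zero, so it has no leading digit
            continue
        counts[leading] += 1
        total += 1
    return {d: counts[d] / total for d in counts}


# Hypothetical accounts payable amounts (illustration only).
entries = [1200.50, 187.20, 23.99, 310.00, 1450.00,
           96.40, 2750.00, 118.75, 45.10, 530.00]

expected = benford_expected()
observed = first_digit_proportions(entries)

for digit in range(1, 10):
    print(f"{digit}: expected {expected[digit]:.3f}, observed {observed[digit]:.3f}")
```

With only ten entries the observed proportions are far too noisy to judge conformity; in practice the comparison is made on a full ledger, and a clustered column chart of the two series is what the slide describes.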

1.4 Data Visualization for Human Resource Management
[Figure: A stacked column chart of employee turnover by month]
A stacked column chart shows part-to-whole comparisons, either over time or across categories, using different colors or shades of color. This chart shows that January and July through October are the months in which most employees left the company, and April through June are the months with the most new hires.

1.4 Data Visualization for Marketing
[Figure: A funnel chart of website conversions for a software company]
A funnel chart shows the progression of a numerical variable for various categories from larger to smaller values. This chart helps to compare the conversion effectiveness of different website configurations, the use of bots, or changes in support services.

1.4 Data Visualization for Operations
[Figure: Time series data for unit sales of a product]
A line chart shows a variable of interest plotted over several time periods. A line chart helps to understand what happened in the past, identify trends over time, and project future levels. This chart helps identify a repeating pattern and shows that units sold might also be increasing slightly over time.
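
One way to see a slight upward trend underneath a repeating pattern, as described for the unit sales line chart above, is to average over one full cycle of the pattern so that the seasonal ups and downs cancel. The slides do not name this technique; the sketch below uses a simple moving average as an illustration. The quarterly unit sales figures and the cycle length of 4 are assumptions made up for the example.

```python
def moving_average(values, window):
    """Average of each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical quarterly unit sales: the same shape repeats every 4 quarters
# while the overall level creeps upward.
units = [20, 32, 26, 40, 22, 34, 28, 42, 24, 36, 30, 44]

# Averaging over one full cycle (4 quarters) smooths out the repeating pattern.
trend = moving_average(units, window=4)
print(trend)
```

Plotting `units` and `trend` on the same line chart would show the repeating pattern in the raw series and a steadily rising smoothed series.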

1.4 Data Visualization for Engineering
[Figure: A quality control chart for dog food production]
A control chart is a graphical display of a variable of interest plotted over time relative to lower and upper control limits. It helps determine whether a production process is in or out of control. Points that begin to appear outside the control limits are signals to inspect the process and make any necessary corrections.

1.4 Data Visualization for Science
[Figure: A spaghetti chart of hurricane tracks from multiple predictive models]
Geographic maps help display the results of complicated predictive models, such as those predicting the path of a hurricane. A spaghetti chart owes its name to the fact that the depiction of multiple flows through a system, using a line for each possible path, resembles spaghetti.
Source: clickorlando.com

1.4 Data Visualization for Sports
[Figure: A shot chart for NBA player Chris Paul]
A shot chart displays the location of the shots attempted by a player during a basketball game, with different symbols or colors indicating the outcome of each shot. This chart shows shot attempts by NBA player Chris Paul, with a blue dot indicating a successful shot and an orange x a missed shot.
Source: basketball-reference.com
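
The control chart described under Data Visualization for Engineering depends on lower and upper control limits, which the slides do not define. A common convention, assumed here, places the limits three standard deviations below and above the mean of a baseline sample taken while the process was running normally. The bag weights below are hypothetical, and real quality control charts often use subgroup means and other estimates of variation, so this is a simplified sketch rather than the book's method.

```python
import statistics


def control_limits(baseline, k=3):
    """Return (lower limit, center line, upper limit) as mean -/+ k std devs."""
    center = statistics.mean(baseline)
    spread = statistics.stdev(baseline)  # sample standard deviation
    return center - k * spread, center, center + k * spread


# Hypothetical bag weights (pounds) recorded while the process ran normally.
baseline_weights = [50.1, 49.8, 50.0, 50.3, 49.9, 50.2, 49.7, 50.0]
lcl, center, ucl = control_limits(baseline_weights)

# Hypothetical new measurements to check against the limits.
new_weights = [50.1, 49.9, 50.7, 50.0, 49.3]
out_of_control = [
    (index, weight)
    for index, weight in enumerate(new_weights)
    if weight < lcl or weight > ucl
]

print(f"LCL = {lcl:.2f}, center = {center:.2f}, UCL = {ucl:.2f}")
print("Points outside the limits:", out_of_control)
```

The flagged points are the ones that would appear above the upper line or below the lower line on the chart, which is the signal to inspect the process.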

Discussion Activity 1
• Consider the two scatter charts and related trendline and regression statistics shown for Anscombe’s data sets in slide 9. The estimated regression equations and related R-squared values for both data sets are identical.
• Does fitting a line to the data appear to be a wise choice for both data sets? Explain your answer.
• What would be a more appropriate regression equation to fit Anscombe’s Data Set 2?

Discussion Activity 2
• Consider the first digit of Tucker Software’s accounts payable entries in the clustered column chart in slide 15.
• Does it appear that the data follow Benford’s Law? Explain your answer.
• Which first digits from Tucker's accounts payable entries stand out as underrepresented, in both absolute and relative terms, compared with the expected probabilities given by Benford's Law?

Check Your Knowledge
1. Map visualizations are often used in the natural sciences industry because the data are often _________.
   a. geographic
   b. quantitative
   c. ordinal
   d. weather-related
2. Which chart displays a variable of interest plotted over time relative to lower and upper control limits?
   a. High-low-close stock chart
   b. Funnel chart
   c. Clustered bar chart
   d. Control chart
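
Discussion Activity 1 rests on the statement that Anscombe's Data Sets 1 and 2 share the same estimated regression equation and R-squared. That statement can be checked numerically. The sketch below fits a least-squares line to each data set using only the standard library. The data values are the commonly published values for Anscombe's first two data sets; they do not appear in these slides, so treat them as an assumption brought in for the check.

```python
def linear_fit(xs, ys):
    """Least-squares slope, intercept, and R-squared for a straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Commonly published values for Anscombe's Data Sets 1 and 2 (shared X values).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

fit1 = linear_fit(x, y1)
fit2 = linear_fit(x, y2)

for name, (slope, intercept, r_squared) in (("Set 1", fit1), ("Set 2", fit2)):
    print(f"{name}: y = {slope:.2f}x + {intercept:.2f}, R^2 = {r_squared:.2f}")
```

Both fits round to y = 0.50x + 3.00 with R² = 0.67, matching the figures on the slide. The summary statistics alone cannot distinguish the two data sets, which is why the discussion asks you to look at the scatter charts.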

Summary
In this chapter, you should have learned:
• How to define analytics, and why analytics has seen such explosive growth over the past couple of decades.
• That data visualization is part of descriptive analytics.
• How to distinguish between the major data types in analytics, such as quantitative vs. categorical, and cross-sectional vs. time series data.
• How to plan for the data visualization challenges that are typical of big data.
• How to use data visualization techniques in major functional areas, such as accounting, finance, human resource management, marketing, operations, engineering, science, and sports.