Uploaded by Maryam H

DataViz 1e Ch01

advertisement
Data Visualization,
1e
Chapter 1: Introduction
Camm, Cochran, Fry & Ohlmann, Data Visualization - Exploring and Explaining with Data, 1st Edition. © 2021
Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible
website, in whole or in part.
Chapter Objectives
After completing this chapter, you will be able to:
LO 1.1 Define analytics and describe the different types of analytics
LO 1.2 Describe the different types of data and give an example of each
LO 1.3 Describe various examples of data visualization used in practice
LO 1.4 Identify the various charts defined in this chapter
Camm, Cochran, Fry & Ohlmann, Data Visualization - Exploring and Explaining with Data, 1st Edition. © 2021
Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible
website, in whole or in part.
Data Visualization
Data visualization is the graphical representation of data and
information using displays such as charts, graphs, and maps.
Camm, Cochran, Fry & Ohlmann, Data Visualization - Exploring and Explaining with Data, 1st Edition. © 2021
Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible
website, in whole or in part.
1.1 Analytics
Analytics is the scientific process of transforming data into insights
for making better decisions.
Camm, Cochran, Fry & Ohlmann, Data Visualization - Exploring and Explaining with Data, 1st Edition. © 2021
Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible
website, in whole or in part.
1.1 Developments in Analytics
Three developments have spurred the explosive growth in the use of analytics
for decision making:
• Technological advances – incredible amounts of data from scanner technology, ecommerce, social networks, sensors, and personal electronic devices such as cell
phones.
• Methodological developments – faster algorithms can handle and explore
massive amounts of data for data visualization, machine learning, optimization, and
simulation.
• Explosion in computing power and storage capability – better computational
hardware, parallel computing, and cloud computing, enable businesses to solve
larger problems faster and with greater accuracy.
Camm, Cochran, Fry & Ohlmann, Data Visualization - Exploring and Explaining with Data, 1st Edition. © 2021
Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible
website, in whole or in part.
1.1 A Categorization of Analytical Methods
• Descriptive analytics – encompasses a set of analytical tools
that describes what has happened in the past.
• Examples: data queries, standard reports, descriptive statistics, data
visualization, cluster analysis, and what-if spreadsheet models.
• Predictive analytics – consists of analytical tools that use
mathematical models constructed from past data to predict the
future or ascertain the impact of one variable on another.
• Examples: linear regression, time series analysis, and predictive data
mining.
• Prescriptive Analytics – includes mathematical or logical models
that indicate the best course of action to take in decision making.
• Examples: optimization, simulation, and decision analysis.
[Author Name], [Book Title], [#] Edition. © [Insert Year] Cengage. All Rights Reserved. May not be scanned,
copied or duplicated, or posted to a publicly accessible website, in whole or in part.
1.2 Why Visualize Data?
We create data visualizations for two reasons:
1. Exploring data
2. Communicating/explaining a message
Camm, Cochran, Fry & Ohlmann, Data Visualization - Exploring and Explaining with Data, 1st Edition. © 2021
Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible
website, in whole or in part.
1.2 Data Visualization for Exploration: Patterns
Data visualization is a powerful tool to identify patterns.
Column Chart of Zoo Attendance by Month
25,000
20,000
15,000
10,000
5,000
0
Jan
Feb
Mar
Apr
May
Jun
July
Aug
Sept
Oct
Nov
Dec
Month
Camm, Cochran, Fry & Ohlmann, Data Visualization - Exploring and Explaining with Data, 1st Edition. © 2021
Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible
website, in whole or in part.
1.2 Data Visualization for Exploration: Relationships
Data visualization is a powerful tool to better understand the relationship
between variables.
Linear Regression of Anscombe's Data Set 1
Y
12
10
10
8
8
6
6
4
4
y = 0.5x + 3.00
R² = 0.67
2
Linear Regression of Anscombe's Data Set 2
Y
12
y = 0.5x + 3.00
R² = 0.67
2
0
0
0
2
4
6
8
10
12
14
16
X
0
2
4
6
8
10
12
14
16
X
Camm, Cochran, Fry & Ohlmann, Data Visualization - Exploring and Explaining with Data, 1st Edition. © 2021
Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible
website, in whole or in part.
1.2 Data Visualization for Explanation
Data visualization is also important for explaining relationships found in
data and for explaining the results of predictive and prescriptive models.
Camm, Cochran, Fry & Ohlmann, Data Visualization - Exploring and Explaining with Data, 1st Edition. © 2021
Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible
website, in whole or in part.
Types of Data
Camm, Cochran, Fry & Ohlmann, Data Visualization - Exploring and Explaining with Data, 1st Edition. © 2021
Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible
website, in whole or in part.
1.3 Quantitative vs Categorical Data
Quantitative data
Data for which numerical values indicate
magnitude. Arithmetic operations, such as
addition, subtraction, multiplication, and division,
can be performed on quantitative data.
Examples: Share Price ($), and Volume.
Categorical data
Data for which labels or names identify
categories of like items. Arithmetic operations
cannot be performed on categorical data.
Examples: Company, Symbol, and Industry.
Camm, Cochran, Fry & Ohlmann, Data Visualization - Exploring and Explaining with Data, 1st Edition. © 2021
Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible
website, in whole or in part.
1.3 Cross-Sectional vs Time Series Data
Dow Jones Index values from January 2010 to April 2020
Cross-sectional data
Data collected from several entities over the
same time frame.
Time series data
Data collected over several time periods.
• Business and economic publications
frequently include graphs of time series
data.
• Graphs help analysts understand what
happened in the past, identify trends
over time, and project future levels for
the time series.
Camm, Cochran, Fry & Ohlmann, Data Visualization - Exploring and Explaining with Data, 1st Edition. © 2021
Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible
website, in whole or in part.
1.3 Big Data
• There is no universally accepted definition of big data.
• A working definition of big data:
• Any set of data too large or too complex to be handled by a desktop
computer.
• The four Vs of big data:
• Volume - the amount of data generated
• Velocity - the speed at which the data are generated
• Variety - the diversity in types and structures of data generated
• Veracity - the reliability of the data generated
[Author Name], [Book Title], [#] Edition. © [Insert Year] Cengage. All Rights Reserved. May not be scanned,
copied or duplicated, or posted to a publicly accessible website, in whole or in part.
1.3 Big Data
[Author Name], [Book Title], [#] Edition. © [Insert Year] Cengage. All Rights Reserved. May not be scanned,
copied or duplicated, or posted to a publicly accessible website, in whole or in part.
1.3 Word Cloud
A word cloud is a visual display that
contains the key terms of a document.
The size of the word is proportional to the
frequency with which the word appears in
the document.
A word cloud is a frequently used chart to
summarize words used in large sets of
text data.
Source:
https://milady.cengage.com/blog/usingword-clouds-classroom
Camm, Cochran, Fry & Ohlmann, Data Visualization - Exploring and Explaining with Data, 1st Edition. © 2021
Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible
website, in whole or in part.
1.4 Data Visualization for Accounting
A clustered column chart showing Benford’s Law versus Tucker Software’s Accounts
Payable Entries
Benford’s Law, (the First-Digit Law),
states that the proportion of
observations in which the first digit is 1
through 9, respectively, follows given
probabilities.
Benford's Law may help detect fraud. If
the first digits of numbers in a data set
do not conform to Benford's Law, fraud
investigation may be warranted.
Camm, Cochran, Fry & Ohlmann, Data Visualization - Exploring and Explaining with Data, 1st Edition. © 2021
Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible
website, in whole or in part.
1.4 Data Visualization for Finance
A high-low-close stock chart for Verizon Wireless
A High-Low-Close Stock chart
shows the high value, low value, and
closing value of the price of a share of
stock over time.
This chart shows how the closing
price is changing over time and the
price volatility each day.
Camm, Cochran, Fry & Ohlmann, Data Visualization - Exploring and Explaining with Data, 1st Edition. © 2021
Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible
website, in whole or in part.
1.4 Data Visualization for Human Resource Management
A stacked column chart of employee turnover by month
A stacked column chart shows part-towhole comparisons, either over time or
across categories, using different colors or
shades of color.
This chart shows how January and JulyOctober are the months in which most
employees left the company, and April
through June the months with most new
hires.
Camm, Cochran, Fry & Ohlmann, Data Visualization - Exploring and Explaining with Data, 1st Edition. © 2021
Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible
website, in whole or in part.
1.4 Data Visualization for Marketing
A funnel chart of website conversions for a software company
A funnel chart shows the progression
of a numerical variable for various
categories from larger to smaller values.
This chart helps to compare the
conversion effectiveness of different
website configurations, the use of bots,
or changes in support services.
Camm, Cochran, Fry & Ohlmann, Data Visualization - Exploring and Explaining with Data, 1st Edition. © 2021
Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible
website, in whole or in part.
1.4 Data Visualization for Operations
Time series data for unit sales of a product
A line chart shows a variable of interest
plotted over several time periods. A lone
chart helps to understand what
happened in the past, identify trends
over time, and project future levels.
This chart helps identify a repeating
pattern and shows how units sold might
also be increasing slightly over time.
Camm, Cochran, Fry & Ohlmann, Data Visualization - Exploring and Explaining with Data, 1st Edition. © 2021
Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible
website, in whole or in part.
1.4 Data Visualization for Engineering
A quality control chart for dog food production
A control chart is a graphical display of a
variable of interest plotted over time
relative to lower and upper control limits.
It helps determine if a production process
is in or out of control.
Points beginning to appear outside the
control limits are signals to inspect the
process and make any necessary
corrections.
Camm, Cochran, Fry & Ohlmann, Data Visualization - Exploring and Explaining with Data, 1st Edition. © 2021
Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible
website, in whole or in part.
1.4 Data Visualization for Science
A spaghetti chart of hurricane tracks from multiple predictive models
Geographic maps help display the results
of complicated predictive models, such as
predicting the path of a hurricane.
A spaghetti chart owes its name to the
fact that the depiction of multiple flows
through a system using a line for each
possible path resemble spaghetti
Source: clickorlando.com
Camm, Cochran, Fry & Ohlmann, Data Visualization - Exploring and Explaining with Data, 1st Edition. © 2021
Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible
website, in whole or in part.
1.4 Data Visualization for Sports
A shot chart for NBA player Chris Paul
A shot chart displays the location of the
shots attempted by a player during a
basketball game with different symbols or
colors indicating the outcome of a shot.
This chart shows shot attempts by NBA
player Chris Paul, with a blue dot indicating
a successful shot and an orange x a
missed shot.
Source: basketball-reference.com
Camm, Cochran, Fry & Ohlmann, Data Visualization - Exploring and Explaining with Data, 1st Edition. © 2021
Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible
website, in whole or in part.
Discussion Activity 1
• Consider the two scatter charts and related trendline and regression
statistics shown for Anscombe’s data sets in slide 9. The estimated
regression equations and related R-squares for both data sets are identical.
• Does fitting a line to the data appear to be a wise choice for both data
sets? Explain your answer.
• What would be a more appropriate regression equation to fit Anscombe’s
Data Set 2?
Camm, Cochran, Fry & Ohlmann, Data Visualization - Exploring and Explaining with Data, 1st Edition. © 2021
Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible
website, in whole or in part.
Discussion Activity 2
• Consider the first digit of Tucker software’s accounts payable entries in the
clustered column chart in slide 15.
• Does it appear that the data follow Benford’s Law? Explain your answer.
• Which first digits from Tucker's accounts payable entries stand out as
underrepresented in terms of absolute and relative proportional
difference for the corresponding expected probabilities as dictated by
Benford's Law?
Camm, Cochran, Fry & Ohlmann, Data Visualization - Exploring and Explaining with Data, 1st Edition. © 2021
Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible
website, in whole or in part.
Check Your Knowledge
1. Map visualizations are often used in the natural sciences industry because the data are
often _________.
a.
b.
c.
d.
geographic
quantitative
ordinal
weather-related
2. Which chart displays a variable of interest plotted over time relative to lower and upper
control limits?
a.
b.
c.
d.
High-low-close stock chart
Funnel chart
Clustered bar chart
Control chart
Camm, Cochran, Fry & Ohlmann, Data Visualization - Exploring and Explaining with Data, 1st Edition. © 2021
Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible
website, in whole or in part.
Summary
In this chapter, you should have learned how:
• To define analytics, and why analytics has seen such explosive growth over
the past couple of decades.
• Data visualization is part of descriptive analytics.
• To distinguish between the major data types in analytics, such as
quantitative vs. categorical, and cross-sectional vs. time series data.
• To plan for the data visualization challenges that are typical of big data.
• To use data visualization techniques in major functional areas, such as
accounting, finance, human resource management, marketing, operations,
engineering, science, and sports.
Camm, Cochran, Fry & Ohlmann, Data Visualization - Exploring and Explaining with Data, 1st Edition. © 2021
Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible
website, in whole or in part.
Download