Chapter 1: Introduction to Visualization

Data Visualization
Chapter 1: Introduction to Visualization with Python
©1992–2020 by Packtn Education, Inc. All Rights Reserved. This content is based on the book Interactive Data
Visualization with Python: Present your data as an effective and compelling story, 2nd Edition Illustrated Edition, by
Abha Belorkar, Sharath Chandra Guntuku, Shubhangi Hora, Anshu Kumar, Packt Publishing.
DISCLAIMER: The authors and publisher of this book have used their best efforts in preparing the book. These efforts
include the development, research, and testing of the theories and programs to determine their effectiveness. The
authors and publisher make no warranty of any kind, expressed or implied, with regard to these programs or to the
documentation contained in these books. The authors and publisher shall not be liable in any event for incidental or
consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.
Outline
• Data Visualization
• Choosing a Visualization
• Common Pitfalls While Visualizing Data
• Creating a Confusing Visualization
• Guidelines for Creating a Great Visualization
Why Visualization?
There are at least six reasons why someone would create a visualization
of data:
1. to give an overview,
2. to show the scale and complexity of the data,
3. to allow exploration of the data,
4. to communicate findings,
5. to tell a story, and
6. to attract attention and stimulate interest.
Source: statisticians Andrew Gelman and Antony Unwin (2012)
Data Visualization vs infographics
Choosing a Visualization
Relationship
• Relationship visualizations show a link between two or more variables, for
example, the relationship between a country's carbon dioxide emissions per person
and its GDP.
• The plots that are used to depict relationships include network graphs, scatter plots, Venn
diagrams, bubble charts, trees, and parallel coordinates, among others.
Comparison
• Comparison visualizations are used when you want to show the differences or similarities
between two or more variables.
• The plots that are used to depict comparisons include all the types of bar graphs (simple,
paired bar, paired column, stacked bar, and stacked column), pyramid graphs, heatmaps,
box plots, and violin plots, among others.
Geo-spatial
• Geo-spatial visualizations are specific to data that is
geographical in nature, so they should be used only
when location is a feature of the data.
• The plots used include world maps with different
features, such as choropleth maps, isopleth maps,
contour maps, bubble maps, point maps, icon maps,
and flow maps, among others.
Time
• When data consists of dates and/or times, these visualizations are used to track
how values change over time.
• The plots that are used to depict temporal data include variations of line graphs,
stacked area charts, stock charts, sparklines, fan charts, stream charts, and
timeline charts, among others.
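The four categories above can be summarized as a small lookup. The following is a minimal sketch, not part of the original slides: the names `PLOTS_BY_CATEGORY` and `suggest_categories` and the three input flags are hypothetical, and the plot lists are abbreviated from the bullets above.

```python
# Plot families per category, abbreviated from the slides above.
PLOTS_BY_CATEGORY = {
    "relationship": ["network graph", "scatter plot", "Venn diagram", "bubble chart"],
    "comparison": ["bar graph", "pyramid graph", "heatmap", "box plot", "violin plot"],
    "geo-spatial": ["choropleth map", "isopleth map", "contour map", "flow map"],
    "time": ["line graph", "stacked area chart", "stock chart", "sparkline"],
}


def suggest_categories(n_variables=1, has_location=False, has_datetime=False):
    """Return the categories whose preconditions the data meets (hypothetical helper)."""
    categories = []
    # Relationship and comparison plots need two or more variables.
    if n_variables >= 2:
        categories += ["relationship", "comparison"]
    # Geo-spatial plots apply only when location is a feature of the data.
    if has_location:
        categories.append("geo-spatial")
    # Time plots apply when the data contains dates and/or times.
    if has_datetime:
        categories.append("time")
    return categories


if __name__ == "__main__":
    for category in suggest_categories(n_variables=2, has_datetime=True):
        print(category, "->", ", ".join(PLOTS_BY_CATEGORY[category]))
```

The helper only narrows the options; which plot to pick within a category still depends on the message the visualization should convey.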
Common Pitfalls While Visualizing Data
• Visualizing Too Much Information
• Visualizations are good at simplifying data and conveying important insights, but forcing
one to convey too much information makes it so complicated that the viewer cannot
understand anything by looking at it. As a rule of thumb, too much information means
incorporating more than four or five features in a single visualization, which in turn
introduces more than five colors and too many words.
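The rule of thumb above can be turned into a quick check before plotting. This is a minimal sketch under stated assumptions: the function name `overload_warnings` and both constants are hypothetical, with the limits set to the upper end of the "four or five features" and "five colors" mentioned in the text.

```python
MAX_FEATURES = 5  # assumption: upper end of "four or five features"
MAX_COLORS = 5    # assumption: "more than five colors" counts as too many


def overload_warnings(features, colors):
    """Return a list of warnings if a planned chart carries too much information."""
    warnings = []
    if len(features) > MAX_FEATURES:
        warnings.append(f"{len(features)} features exceed the limit of {MAX_FEATURES}")
    # Count distinct colors only, since one color may be reused across elements.
    n_colors = len(set(colors))
    if n_colors > MAX_COLORS:
        warnings.append(f"{n_colors} colors exceed the limit of {MAX_COLORS}")
    return warnings


if __name__ == "__main__":
    planned_features = ["gdp", "co2", "population", "area", "year", "region"]
    planned_colors = ["red", "blue", "green", "orange", "purple", "brown"]
    for message in overload_warnings(planned_features, planned_colors):
        print("Warning:", message)
```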
• Inconsistent Scales
• Each feature has its own range: the values of
a numerical feature all fall within some
interval, while a categorical feature takes
one of a discrete set of classes.
• When visualizing more than one or two
features in a single plot, the problem of
scales often arises because each feature has
its own scale. Ignoring the scale of each
feature often leads to confusing
visualizations that show trends where there
are none. Inconsistent scales can also
suggest relationships that do not exist,
which misleads viewers into believing
something is true when it is not.
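One common way to put numerical features on a comparable scale before plotting them together is min-max normalization. The slides do not prescribe a method, so the following is a minimal sketch of that one option; the function name `min_max_scale` and the sample values are made up for illustration.

```python
def min_max_scale(values):
    """Rescale numbers linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    # Hypothetical values on very different scales.
    gdp_per_capita = [1000, 5000, 9000]
    co2_per_capita = [2, 6, 10]
    print(min_max_scale(gdp_per_capita))  # [0.0, 0.5, 1.0]
    print(min_max_scale(co2_per_capita))  # [0.0, 0.5, 1.0]
```

Normalized values no longer carry their original units, so the axis label should say so. When the original units matter, a secondary axis or separate subplots may be a better fit.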
• Mislabeling Elements
• Labels are often overlooked and
treated as trivial elements of a
visualization. Only in their absence
do we realize their importance.
Visualizations without labels
become very confusing because the
viewer doesn't know what they are
looking at.
• In the accompanying example, the legend only
describes the different stacks (colors).
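The point about labels can be illustrated with a stacked bar chart that carries a title, axis labels, and a legend. This is a minimal sketch assuming matplotlib is installed; the function names, the sample data, and the label texts are hypothetical and do not come from the slides.

```python
def stack_bottoms(series, n):
    """Return, for each series, the baseline its stack segments start from."""
    bottoms = {}
    running = [0] * n
    for name, values in series.items():
        bottoms[name] = list(running)
        running = [r + v for r, v in zip(running, values)]
    return bottoms


def labelled_stacked_bars(categories, series, title, xlabel, ylabel):
    """Draw a stacked bar chart in which every element is labelled."""
    # Imported here so stack_bottoms stays usable without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    bottoms = stack_bottoms(series, len(categories))
    for name, values in series.items():
        # label= is what the legend uses to name each stack (color).
        ax.bar(categories, values, bottom=bottoms[name], label=name)
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.legend(title="Product")
    return fig


if __name__ == "__main__":
    # Made-up sales figures for illustration only.
    sales = {"Product A": [3, 5, 2], "Product B": [4, 1, 6]}
    fig = labelled_stacked_bars(
        ["Q1", "Q2", "Q3"], sales,
        title="Units sold per quarter",
        xlabel="Quarter",
        ylabel="Units sold",
    )
    fig.savefig("labelled_stacked_bars.png")
```

A legend alone tells the viewer what each color means; the title and axis labels are still needed to say what is being measured and in which units.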
Guidelines for Creating a Great Visualization