Lecture 2 - Simple graphs and charts

Envisioning Information
Lecture 2
Simple Graphs and Charts
Ken Brodlie
School of Computing
University of Leeds
ENV 2006
2.1
Lecture Outline
• Preliminaries
– Definitions
– Datatypes
• Simple Data Presentation
– Graphs and charts
Fundamentals
• Basic Datatypes correspond to different levels of measurement
• Data can be:
– Categorical - labels
– Numerical – numbers
• Categorical
– Nominal
• No sense of order
• Apples, oranges, …
– Ordinal
• Ordered in sequence
• January, February, …
• Numerical
– Continuous
• Real numbers
• Height of students in class
– Discrete
• Typically whole numbers
• Marks in an exam
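The four classes on this slide can be written down as a small Python sketch. This is only an illustration: the names `Level`, `EXAMPLES`, `is_categorical` and `is_ordered` are hypothetical and not part of the lecture; the example variables are the ones given on the slide.

```python
from enum import Enum


class Level(Enum):
    """Levels of measurement from the slide (names are illustrative)."""
    NOMINAL = "categorical, no sense of order"
    ORDINAL = "categorical, ordered in sequence"
    CONTINUOUS = "numerical, real numbers"
    DISCRETE = "numerical, typically whole numbers"


# Example variables taken from the slide, tagged with their level.
EXAMPLES = {
    "fruit (apples, oranges, ...)": Level.NOMINAL,
    "month (January, February, ...)": Level.ORDINAL,
    "height of students in class": Level.CONTINUOUS,
    "marks in an exam": Level.DISCRETE,
}


def is_categorical(level: Level) -> bool:
    """Categorical data are labels; numerical data are numbers."""
    return level in (Level.NOMINAL, Level.ORDINAL)


def is_ordered(level: Level) -> bool:
    """Of the four classes, only nominal data have no meaningful order."""
    return level is not Level.NOMINAL


for name, level in EXAMPLES.items():
    kind = "categorical" if is_categorical(level) else "numerical"
    print(f"{name}: {kind}, ordered={is_ordered(level)}")
```

The choice of chart later in the lecture depends on these classes: bar graphs and pie charts suit categorical data, histograms and scatter plots suit numerical data.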
Question
• Give an example for each class in which numbers are involved…
• Categorical - nominal
• Categorical - ordinal
• Numerical – continuous
• Numerical - discrete
Exploratory Data Analysis
• Two-stage process:
– Exploratory: search for evidence using all tools available
– Confirmatory: evaluate strength of evidence using classical data analysis
• Pioneering figure is John Tukey
• New approach to data analysis, heavily based on visualization, as an alternative to classical data analysis
• See Wikipedia
Simple Data Presentation
• Simple data tables are often presented as line graphs, bar graphs, pie charts, dot graphs, histograms…
• Which should we use and when?
Line Graph
• Fundamental technique of data presentation
• Used to compare two variables
– X-axis is often the control variable
– Y-axis is the response variable
• Good at:
– Showing specific values
– Trends
– Trends in groups (using multiple line graphs)
Figure: Students participating in sporting activities
Figure: Mobile phone use
Any critical comments here?
Note: graph labelling is fundamental
Simple Representations – Bar Graph
• Bar graph
– Presents categorical variables
– Height of bar indicates value
– Double bar graph allows comparison
– Note spacing between bars
– Can be horizontal (when would you use this?)
Figure: Number of police officers
Figure: Internet use at a school
Note more space for labels
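To see why a horizontal bar graph leaves more space for labels, here is a minimal text-only sketch in Python. The function name `text_bar_chart` and the counts are invented for illustration; they are not the data behind the figures on this slide. Values are assumed to be positive.

```python
def text_bar_chart(values: dict[str, int], width: int = 40) -> list[str]:
    """Render a horizontal bar chart as lines of text.

    Bar length is proportional to the value, so the largest value
    gets a bar of `width` characters. Assumes positive values.
    """
    largest = max(values.values())
    label_width = max(len(label) for label in values)
    lines = []
    for label, value in values.items():
        bar = "#" * round(width * value / largest)
        # Labels are written out in full to the left of each bar.
        lines.append(f"{label:<{label_width}} | {bar} {value}")
    return lines


# Hypothetical counts, invented purely for illustration.
usage = {"Email": 30, "Research for homework": 45, "Games": 15}
for line in text_bar_chart(usage):
    print(line)
```

Because each category sits on its own row, a long label such as "Research for homework" fits without rotation or abbreviation, which is harder to achieve under a vertical bar.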
Dot Graph
• Very simple but effective…
• Horizontal to give more space for labelling
Pie Chart
• Pie chart summarises a set of categorical/nominal data
• But use with care…
• … too many segments are harder to compare than in a bar chart
Figure: Should we have a long lecture?
Figure: Favourite movie genres
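A pie chart encodes each category as an angle proportional to its share of the total. The sketch below computes those angles; `pie_angles` is a hypothetical helper and the counts are invented for illustration, not taken from the figures on this slide.

```python
def pie_angles(counts: dict[str, int]) -> dict[str, float]:
    """Return the angle in degrees of each pie segment."""
    total = sum(counts.values())
    return {label: 360.0 * n / total for label, n in counts.items()}


# Hypothetical survey counts, invented for illustration only.
genres = {"Comedy": 21, "Action": 20, "Drama": 19, "Sci-fi": 20, "Romance": 20}
for label, angle in pie_angles(genres).items():
    print(f"{label:<8} {angle:5.1f} degrees")
```

With counts this close together the segments differ by only a few degrees, which suggests why many similar segments are hard to compare by eye, whereas bar heights share a common baseline.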
Histograms
• Histograms summarise discrete or continuous data that are measured on an interval scale
• No gaps if variable is continuous
Figure: Distribution of salaries in a company
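Behind a histogram is a binning step: the range of the variable is cut into adjacent intervals and the values in each are counted. A minimal Python sketch follows, assuming equal-width intervals; the function name `histogram_counts` and the salary figures are invented for illustration.

```python
def histogram_counts(data: list[float], low: float, high: float, bins: int) -> list[int]:
    """Count values in `bins` equal-width intervals covering [low, high].

    Each interval is closed on the left and open on the right, except
    the last, which also includes `high`. Adjacent intervals share an
    edge, which is why the bars of a histogram touch.
    """
    width = (high - low) / bins
    counts = [0] * bins
    for value in data:
        if low <= value <= high:
            # Clamp so that value == high falls in the last interval.
            index = min(int((value - low) / width), bins - 1)
            counts[index] += 1
    return counts


# Hypothetical salaries in thousands, invented for illustration only.
salaries = [18, 22, 25, 27, 31, 33, 34, 38, 41, 45, 52, 60]
print(histogram_counts(salaries, low=10, high=70, bins=6))
```

The number and width of the intervals is a design choice: too few hide the shape of the distribution, too many leave most intervals nearly empty.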
Scatter Plot
• Used to present measurements of two variables
• Effective if a relationship exists between the two variables
Figure: Car ownership by household income
Example taken from NIST Handbook – Evidence of strong positive correlation
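The strength of a linear relationship seen in a scatter plot is commonly summarised by the Pearson correlation coefficient, which ranges from -1 to +1. A minimal sketch, assuming paired lists of equal length with some spread in both variables; `pearson_r` and the income and car figures are invented for illustration and are not the data from the slide.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of paired measurements."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical household income (thousands) and cars per household.
income = [20, 30, 40, 50, 60]
cars = [0.8, 1.1, 1.3, 1.8, 2.0]
print(f"r = {pearson_r(income, cars):.2f}")
```

A value close to +1 corresponds to points lying near a rising straight line, which is what "strong positive correlation" describes.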
Scatter Plots in Excel
• The scatter plot is a fundamental tool in Excel
• Chart type XY (Scatter) and subtype Unconnected Points
http://www2.ncsu.edu:8010/ncsu/chemistry/resource/excel/excel.html
Regression Line
• Excel allows you to add a linear regression line (trend line)
Remember: correlation does not imply causality, i.e. a relationship exists but one variable is not necessarily causing the other; there may be a third factor.
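A linear regression line is usually obtained by a least-squares fit: the slope and intercept are chosen to minimise the sum of squared vertical distances from the points to the line. The sketch below shows that calculation in Python under this assumption; `fit_line`, `residuals` and the data are invented for illustration.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs: list[float], ys: list[float], slope: float, intercept: float) -> list[float]:
    """Observed minus fitted value for each point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Hypothetical measurements, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f} x + {intercept:.2f}")
print([round(r, 2) for r in residuals(xs, ys, slope, intercept)])
```

The residuals are what the line fails to explain; plotting them is a standard way to check whether a straight line is a reasonable description of the data.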
Tukey Sum-Difference Plot
Better understanding of residuals …
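The slide does not spell out the construction, so the following rests on an assumption: in the form usually attributed to Tukey, each pair (x, y) is replotted as the difference y - x against the sum x + y. This amounts to rotating the scatter plot by 45 degrees (and rescaling), so departures from the line y = x appear as departures from a horizontal line. The helper name `sum_difference` and the data are invented for illustration.

```python
def sum_difference(xs: list[float], ys: list[float]) -> list[tuple[float, float]]:
    """Map each pair (x, y) to (x + y, y - x).

    Assumed construction: the sum goes on the horizontal axis and the
    difference on the vertical axis.
    """
    return [(x + y, y - x) for x, y in zip(xs, ys)]


# Hypothetical paired measurements, invented for illustration only.
before = [10, 12, 15, 20]
after = [11, 12, 17, 19]
for total, diff in sum_difference(before, after):
    print(f"sum = {total:>3}  difference = {diff:>3}")
```

Small differences that are hard to read against a steep diagonal become easier to judge once they are plotted around zero.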
Box Plots
• In some situations we have, not a single data value at a point, but a number of data values, or even a probability distribution
• When might this occur?
• Tukey proposed the idea of a boxplot to visualize the distribution of values
• For explanation and some history, see:
http://mathworld.wolfram.com/Box-andWhiskerPlot.html
http://en.wikipedia.org/wiki/Box_plot
http://www.upscale.utoronto.ca/GeneralInterest/Harrison/Visualisation/Visualisation.html
Figure: Darwin’s plant study
M – median
Q1, Q3 – quartiles
Whiskers – 1.5 * interquartile range
Dots - outliers
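The quantities drawn in a box plot (median, quartiles, whiskers at 1.5 * interquartile range, outliers) can be computed with Python's standard `statistics` module. This is a sketch under stated assumptions: quartile conventions differ between textbooks and packages, and here the "inclusive" method of `statistics.quantiles` is used; `box_plot_stats` and the data are invented for illustration and are not Darwin's measurements.

```python
from statistics import quantiles


def box_plot_stats(data: list[float]) -> dict:
    """Median, quartiles, whisker ends and outliers for a box plot.

    Whiskers extend to the most extreme data values that lie within
    1.5 * interquartile range of the box; values beyond that are
    reported as outliers (the dots).
    """
    values = sorted(data)
    q1, m, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "median": m,
        "q1": q1,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


# Hypothetical measurements, invented for illustration only.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
print(box_plot_stats(sample))
```

Note that the whiskers end at actual data values, not at the fences themselves; the fences only decide which points count as outliers.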
Acknowledgement
• Thanks to Statistics Canada – an excellent web site for simple
data presentation
– http://www.statcan.ca/english/edu/power/toc/contents.htm
Exercise for next week
• Understand a bit more about the merits of pie charts and bar
graphs
• Create a dataset with roughly equal numbers in each class
• Which is best if the task is to discriminate?
Exercise for next week
• Over the next week look for examples of basic graphs
– In newspapers, magazines or other print media
– On news web sites or other electronic media
• Analyse two examples
– One should be an example where you think the use of graphics is good
– One should be an example where you think it is bad
• Be ready next week to present these results to the class…
Envisioning Information : Practical Work
Gnuplot
R
Excel