Chapter 2
Organizing/Displaying Data
2.1 Bar, Circle and Time-Series Graphs
Exploratory Data Analysis
Exploratory Data Analysis (EDA) is a method of studying data
that uses displays such as stem/leaf plots and histograms.
It allows for exploration, pattern finding, and
observation of extreme values.
EDA is used when you have general data but
are not sure where it might lead, or when you have
few prior assumptions. This contrasts with an
experiment, where specific data are collected
(perhaps with controls) and the observer has
particular questions in mind.
Bar Graphs
[Figure: segmented bar chart]
Vertical or horizontal
Quantitative or qualitative data
Bars of uniform width, with uniform spacing between them
Bar lengths represent values of a variable, frequency of
occurrence, or % of occurrence
Labeled, titled, and drawn with scales
Sometimes the scales on the sides are general, but you will
also see a label on top of a bar that gives more
specific information
You can change the scale by putting a "break" on
the vertical axis.
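To make the bar-graph checklist concrete, here is a minimal Python sketch that prints a horizontal bar graph as text. The function name `text_bar_graph` and the counts are hypothetical, chosen only for illustration. Every bar has the same thickness and spacing (one text row each), bar length is scaled to the frequency, and the exact value is printed at the end of each bar, much like a label on top of a bar.

```python
def text_bar_graph(counts, max_width=40):
    """Return the lines of a horizontal text bar graph for a {category: count} dict.

    Hypothetical helper: each bar is one row of '#' characters, so all bars have
    the same thickness and spacing; only the LENGTH changes with the count.
    """
    largest = max(counts.values())
    label_width = max(len(name) for name in counts)
    lines = []
    for name, count in counts.items():
        # Scale so the most frequent category gets the longest bar.
        length = round(count / largest * max_width)
        # Print the exact value at the end of the bar for specific information.
        lines.append(f"{name:<{label_width}} | {'#' * length} {count}")
    return lines


if __name__ == "__main__":
    # Hypothetical survey counts: how students get to school.
    travel = {"Walk": 12, "Bus": 30, "Car": 18}
    for line in text_bar_graph(travel, max_width=20):
        print(line)
```

Because the bars share one uniform width, comparing lengths is the same as comparing values.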
Area Principle
The area occupied by a part of a graph
should correspond to the magnitude
of the value it represents. Otherwise, the
picture can be misleading even
though it is labeled correctly.
[Figure: Average amount spent on holiday gifts per child, for 1970, 1980, 1990 and 2000 (value scale 50 to 200)]
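The arithmetic behind the area principle can be sketched in a few lines of Python. Suppose (hypothetically) that one value is twice another. A bar of fixed width that is twice as tall has twice the area, but a picture scaled up by 2 in both width and height has 2 × 2 = 4 times the area, so it exaggerates the difference even if the labels are correct. The function `area_ratios` and the values 50 and 100 are illustrative only; they are not read from the figure.

```python
def area_ratios(value_small, value_large):
    """Compare the honest ratio of two values with the area ratios that two
    different drawings produce (hypothetical helper for illustration)."""
    value_ratio = value_large / value_small
    # Bar graph: width stays fixed, only height scales, so area scales by value_ratio.
    bar_area_ratio = value_ratio
    # Picture scaled in BOTH width and height: area scales by value_ratio squared.
    picture_area_ratio = value_ratio ** 2
    return value_ratio, bar_area_ratio, picture_area_ratio


if __name__ == "__main__":
    v, bar, pic = area_ratios(50, 100)
    print(f"Values differ by {v:.1f}x; bar area {bar:.1f}x; picture area {pic:.1f}x")
```

Only the bar keeps area in proportion to value, which is what the area principle asks for.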
Pareto charts
Circle Graphs (i.e. Pie charts)
Each wedge displays a proportional part
of the total population (that is, the
percentage of individuals who give a particular
answer or share a characteristic)
Appropriate for qualitative data.
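Since each wedge shows a part of the whole, its angle is that part's share of 360 degrees. A minimal Python sketch of the calculation follows; the function name `wedge_angles` and the survey counts are hypothetical.

```python
def wedge_angles(counts):
    """Return {category: (percent, degrees)} for a pie chart (hypothetical helper).

    Each wedge's share of the circle equals the category's share of the total.
    """
    total = sum(counts.values())
    return {name: (100 * c / total, 360 * c / total) for name, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical answers to a yes/no/unsure survey question.
    answers = {"Yes": 30, "No": 50, "Unsure": 20}
    for name, (percent, degrees) in wedge_angles(answers).items():
        print(f"{name}: {percent:.0f}% of respondents -> {degrees:.0f} degrees")
```

The wedge angles always add up to 360 degrees, just as the percentages add up to 100%.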
Time Series Graph (Time Plot)
Data plotted in sequential (time) order
Data are recorded at regular intervals
Time series data must be collected for
the same variable, for the same
subject, at regular intervals over a
period of time. NASDAQ and NYSE index values,
rainfall, etc. are examples of time
series data.
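The two requirements above (sequential order, regular intervals) can be checked before plotting. The Python sketch below sorts (date, value) pairs and rejects uneven spacing; the function name `as_time_series` and the rainfall figures are hypothetical. It measures gaps in days, which suits daily or weekly data; monthly data would need a different check because months differ in length.

```python
from datetime import date


def as_time_series(observations):
    """Sort (date, value) pairs into time order and check they are equally spaced.

    Hypothetical helper. Gaps are measured in days, so this fits daily or weekly
    data; monthly data would need a different check.
    """
    ordered = sorted(observations)  # sequential (time) order
    gaps = {(later[0] - earlier[0]).days for earlier, later in zip(ordered, ordered[1:])}
    if len(gaps) > 1:
        raise ValueError(f"not at regular intervals: gaps of {sorted(gaps)} days")
    return ordered


if __name__ == "__main__":
    # Hypothetical weekly rainfall totals (inches), entered out of order.
    weekly_rain = [
        (date(2024, 1, 15), 0.8),
        (date(2024, 1, 1), 1.2),
        (date(2024, 1, 8), 0.4),
        (date(2024, 1, 22), 2.1),
    ]
    for day, inches in as_time_series(weekly_rain):
        # One row per time step: a crude text time plot.
        print(day.isoformat(), "#" * round(inches * 10), inches)
```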
Be careful!
Make sure your graph is actually saying
something.
Some examples of poor graphs
2.2 More Data Display
Histograms, Frequency Tables and Contingency Tables
Displaying data by counts
Sometimes there is a lot of data. One
way to evaluate it is to list the
counts, that is, how many times each particular
answer is given. Imagine that the senior
and junior classes are asked to choose
their favorite car color out of three
choices. One way to show that data, broken
down by gender, is a contingency table.
Contingency Table
Also called a two-way table
A contingency table is a way to display
and analyze the relationship between
two (or more) sets of categorical data.
          Red   Blue   Silver   TOTAL
Female     41     62       33     136
Male       59     22       50     131
TOTAL     100     84       83     267

The TOTAL row and the TOTAL column hold the marginal totals;
267, the total of all responses, is the grand total.
If the data is given as percentages, it may be called a two-way
frequency table.
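Marginal totals are just row and column sums, and both sets must add up to the same grand total, which makes a handy arithmetic check. The Python sketch below computes the margins directly from the six cell counts of the car-color table; the function name `add_margins` is hypothetical.

```python
def add_margins(table):
    """Given {row_label: {column_label: count}}, return the row totals, column
    totals and grand total of a contingency (two-way) table (hypothetical helper)."""
    columns = list(next(iter(table.values())))  # column labels taken from the first row
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {col: sum(cells[col] for cells in table.values()) for col in columns}
    grand_total = sum(row_totals.values())
    # Both sets of marginal totals must add up to the same grand total.
    if grand_total != sum(col_totals.values()):
        raise ValueError("row and column totals disagree")
    return row_totals, col_totals, grand_total


favorite_color = {
    "Female": {"Red": 41, "Blue": 62, "Silver": 33},
    "Male": {"Red": 59, "Blue": 22, "Silver": 50},
}

if __name__ == "__main__":
    rows, cols, grand = add_margins(favorite_color)
    print("Row totals:", rows)
    print("Column totals:", cols)
    print("Grand total:", grand)
```

From the cell counts, the Silver column works out to 33 + 50 = 83, and 100 + 84 + 83 = 267 agrees with 136 + 131 = 267.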
Histograms
Great way to evaluate large quantities of data
Bars touch
The width of each bar represents a range of
quantitative values (the class)
Height indicates frequency (how many
individuals give a response in each
particular class)
Some books call the classes bins
Types of Histograms
Uniform/Rectangular Histogram
Symmetric Histogram
Bimodal Histogram
Skewed Left/Right Histograms
Frequency Table
A frequency table is used to organize data for drawing the histogram.
Class: a category or interval
Class Width: the width of a particular interval
Class Frequency: the # of tally marks for a particular class
Lower/Upper Class Limit (LCL/UCL): the lowest/highest data value
that can fit in a particular class
Lower Class Limit + Class Width = smallest value (LCL) for the next
class
Class Boundaries (for whole-number data):
Upper Class Boundary = UCL + ½
Lower Class Boundary = LCL - ½
Class Midpoint = (LCL + UCL) / 2
How to make a Histogram
1. Make a frequency table
a) Determine the # of classes and the class width:
Class Width = (Largest Data Value - Smallest Data Value) / Desired # of Classes,
increased to the next whole number
b) Find the LCL and UCL of each class
c) Tally the data and find each class frequency (CF)
d) Find the midpoints
e) Find the boundaries
2. Put the class boundaries on the horizontal axis and the
frequencies on the vertical axis.
3. Draw a bar for each class, with width extending between its class
boundaries and height equal to that class's
frequency.
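Steps 1a through 1e can be sketched in Python for whole-number data. This is a minimal illustration under stated assumptions, not the only way to set up classes: the function name `frequency_table` and the pulse rates are hypothetical (they are not the class's day 1 data), and the first class starts at the smallest data value. Taking the quotient's whole part and adding 1 increases the width to the next whole number, so the largest value always fits in the last class.

```python
def frequency_table(data, num_classes):
    """Build a frequency table for whole-number data (hypothetical helper)."""
    low, high = min(data), max(data)
    # a) Class width = (largest - smallest) / desired # of classes,
    #    increased to the next whole number so the largest value fits.
    width = (high - low) // num_classes + 1
    rows = []
    for i in range(num_classes):
        lcl = low + i * width                       # b) lower class limit
        ucl = lcl + width - 1                       #    upper class limit
        freq = sum(lcl <= x <= ucl for x in data)   # c) tally -> class frequency
        midpoint = (lcl + ucl) / 2                  # d) class midpoint
        boundaries = (lcl - 0.5, ucl + 0.5)         # e) class boundaries
        rows.append({"limits": (lcl, ucl), "frequency": freq,
                     "midpoint": midpoint, "boundaries": boundaries})
    return width, rows


if __name__ == "__main__":
    # Hypothetical pulse rates (beats per minute).
    pulses = [62, 64, 68, 70, 71, 72, 72, 75, 76, 78, 80, 84, 88, 90]
    width, table = frequency_table(pulses, num_classes=4)
    print("Class width:", width)
    for row in table:
        lo, hi = row["boundaries"]
        # Steps 2-3: a bar spanning the class boundaries, as tall as the frequency.
        print(f"{lo:>5}-{hi:<5} {'#' * row['frequency']}")
```

Here the range is 90 - 62 = 28, and 28 / 4 = 7 is increased to a width of 8, giving classes 62-69, 70-77, 78-85 and 86-93.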
TI-83
Enter the data by hand into L1: press "STAT", choose "EDIT",
and start typing into List 1.
Press "2nd" "STAT PLOT"; turn Plot1 "ON"
and select the histogram picture. The TI-83 should
automatically select the correct list. If it doesn't,
change it by pressing "2nd" and then the list name
you want (the list names are printed above the number keys).
Press "GRAPH" and you will see a histogram.
Press "WINDOW" and change Xscl to the class
width; that forces the histogram to match your choice of
classes.
"TRACE" then lets you see each class's information.
Excel
Enter all the data by hand.
Select "Tools", "Data Analysis", "Histogram" (in newer
versions, the "Data" tab, "Data Analysis"). If Data Analysis
is missing, install the Analysis ToolPak add-in first.
Input Range is your range of data values.
Bin Range is a list that you create
somewhere else in your worksheet giving the
maximum value for each class. This will
force Excel to make the # of classes you want.
Check "Chart Output" to get the graph as well as the table.
Then click OK. With the "New Worksheet Ply" output option, the
results go on another worksheet in your file.
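The bin-range rule can be mimicked in Python to see how each value is assigned: a value goes into the first bin whose maximum is at least that value, so each bin holds values above the previous maximum and at most its own maximum, and anything above the last maximum lands in an extra "More" row. This is a sketch of that counting rule under the assumption stated here, not Excel's own code; the function name `count_by_bin_range` and the data are hypothetical.

```python
from bisect import bisect_left


def count_by_bin_range(data, bin_maxima):
    """Count data values per bin, given each bin's maximum value (hypothetical helper).

    A value x is counted in the first bin whose maximum is >= x, i.e. above the
    previous maximum and at most this one; values above every maximum go to 'More'.
    """
    bins = sorted(bin_maxima)
    counts = [0] * (len(bins) + 1)  # the extra slot is the 'More' bin
    for x in data:
        counts[bisect_left(bins, x)] += 1
    labels = [str(b) for b in bins] + ["More"]
    return list(zip(labels, counts))


if __name__ == "__main__":
    # Hypothetical pulse rates; bin maxima are the upper class limits 69, 77, 85, 93.
    pulses = [62, 64, 68, 70, 71, 72, 72, 75, 76, 78, 80, 84, 88, 90]
    for label, count in count_by_bin_range(pulses, [69, 77, 85, 93]):
        print(f"{label:>4}: {count}")
```

Listing the upper class limits as the bin maxima reproduces the classes from the frequency table, which is why the bin list controls the number of classes.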
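Ogive
An ogive is a line graph of cumulative frequency: a dot is
plotted at each upper class boundary at a height equal to the
total number of data values up to that point (the accumulation
at each level), and the dots are connected by line segments.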
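The points of an ogive are running totals of the class frequencies, placed at the upper class boundaries. The Python sketch below computes them; the function name `ogive_points` and the frequencies are hypothetical, and starting the graph at zero on the first lower class boundary is a common convention assumed here.

```python
from itertools import accumulate


def ogive_points(upper_boundaries, frequencies, first_lower_boundary):
    """Return (x, cumulative frequency) points for an ogive (hypothetical helper).

    Assumed convention: the graph starts at 0 on the first lower class boundary,
    then climbs to each running total at each upper class boundary.
    """
    cumulative = list(accumulate(frequencies))  # running totals
    return [(first_lower_boundary, 0)] + list(zip(upper_boundaries, cumulative))


if __name__ == "__main__":
    # Hypothetical classes with boundaries 61.5-69.5, 69.5-77.5, 77.5-85.5, 85.5-93.5.
    points = ogive_points([69.5, 77.5, 85.5, 93.5], [3, 6, 3, 2], 61.5)
    for x, total in points:
        print(f"{x:>5}: {total}")
```

The last point always equals the total number of data values.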
2.3 Stem/Leaf and Dotplots
What is a stem/leaf display?
Another way to display data in a histogram-like
form, without losing the individual data
values, is a stem/leaf display. Data such as
25, 26, 30, 31, 32, 33, 34, 35, 35, 40, 41, 44
would display like this:
Stem │ Leaf
  2  │ 56
  3  │ 0123455
  4  │ 014
It is arranged so that the stem is the left digit(s)
and the leaves are the right digit(s).
Why a stem/leaf display
A stem/leaf display is a histogram on its side:
the class with more entries (higher frequency)
extends further right. Turn it sideways and the
rows of leaves stand up like histogram bars.
Let's turn our day 1 pulse exercise into a
stem/leaf display.
How to make a stem/leaf
display
1. Choose which digits will be the stem
and which will be the leaf.
2. List the stems in order from smallest to largest.
3. Place each leaf on the line with its
corresponding stem.
4. Add a key to indicate what the entries represent, e.g.
6│1 = 61 beats per minute.
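The four steps above can be sketched in Python for two-digit whole numbers, using the tens digit as the stem and the ones digit as the leaf. This is a minimal sketch under that assumption; the function name `stem_and_leaf` is hypothetical. Stems with no leaves are still listed so the display keeps an even scale, like empty classes in a histogram.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Return the lines of a stem/leaf display (hypothetical helper).

    Assumes non-negative whole numbers: stem = all digits but the last
    (x // 10), leaf = last digit (x % 10).
    """
    leaves = defaultdict(list)
    for x in sorted(data):                 # sorting puts leaves in order on each line
        leaves[x // 10].append(str(x % 10))
    stems = range(min(leaves), max(leaves) + 1)  # smallest to largest, empty stems kept
    width = len(str(max(leaves)))
    return [f"{s:>{width}} | {''.join(leaves.get(s, []))}" for s in stems]


if __name__ == "__main__":
    values = [25, 26, 30, 31, 32, 33, 34, 35, 35, 40, 41, 44]
    for line in stem_and_leaf(values):
        print(line)
    print("Key: 3 | 1 = 31")
```

The key printed at the end plays the role of step 4.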
Dot Plot
A dot plot is a little more basic. One axis shows the
individuals (or perhaps a simple count), the other
shows the quantitative data values, and each dot represents
one data value.
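A text version of a dot plot is easy to sketch in Python: one row per distinct data value, with one mark per occurrence, so repeated values stack up. The function name `text_dot_plot` and the data are hypothetical, and only values that actually occur get a row (a drawn number line would also show the empty spaces between them).

```python
from collections import Counter


def text_dot_plot(data):
    """Return rows of a sideways dot plot: one row per distinct value,
    one '*' per data value (hypothetical helper)."""
    counts = Counter(data)
    width = max(len(str(v)) for v in counts)
    return [f"{v:>{width}} | {'* ' * counts[v]}".rstrip() for v in sorted(counts)]


if __name__ == "__main__":
    # Hypothetical quiz scores.
    scores = [7, 8, 8, 9, 10, 8, 7, 9, 8]
    for row in text_dot_plot(scores):
        print(row)
```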
These are the Kentucky
Derby winning times from
1875 through 2004.
Any idea why there are
two clusters? (Hint:
something happened in
1896, and it has nothing
to do with steroids)
Blogs
I found a blogger's post that was really
interesting. He was musing over his
iPod playlist and wondered how many
times some of his songs had been played.
“I exported the Library and
wrote some python scripts
to extract data …It turns
out I have 208 unplayed
songs in my library, and
additionally lots of low
single digit playcount
songs. Here’s an (ugly
excel generated)
histogram”
"While I was delving around, I figured I would see if
theres any correlation between the length of time
a song has been in my library, and the number of
times it’s been played. The dot plot turned out
interesting."
Resources
http://www.monkeyatlarge.com/archives/2006/07/
Bock, D., Velleman, P., De Veaux, R., Stats: Modeling the World, 2nd
Edition, Boston: Pearson Addison Wesley, p. 49.