DATA VISUALIZATION
UNIVARIATE (no review- self study)
STEM & LEAF
BOXPLOT
BIVARIATE
SCATTERPLOT (review correlation)
Overlays; jittering
Regression line overlay (see ASA website:
http://nlvm.usu.edu/en/nav/frames_asid_144_g_4_t_5.html?open=activities)
DATA VISUALIZATION
TOPICS
GRAPHICAL DISPLAYS
UNIVARIATE
BIVARIATE
ASSUMPTIONS OF MULTIPLE REGRESSION
LINEARITY
HOMOSCEDASTICITY
ERROR INDEPENDENCE
NORMALITY
FIXING VIOLATIONS
GRAPHICAL DISPLAYS
• Frequency Histogram:
– SPSS ANALYZE: Descriptive Statistics: Explore: Plot: Stem and Leaf
– SPSS GRAPH: Boxplot (normal curve overlay available)
• or INTERACTIVE: Boxplot or Analyze: Frequencies
– SPSS GRAPH: Histogram or Interactive: Histogram
• # “bins” = 1 + log2(N)
• Example: N = 500; # bins = 1 + 9 = 10 (log2(500) ≈ 8.97, rounded up to 9)
• log2(512) = 9 (e.g., 2x2x2x2x2x2x2x2x2 = 512)
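The bin-count rule above (commonly known as Sturges' rule) can be checked with a few lines of Python. This is a minimal sketch: the function name `sturges_bins` is made up for illustration, and rounding log2(N) up to the next whole number is an assumption taken from the N = 500 example on the slide.

```python
import math

def sturges_bins(n):
    """Number of histogram bins from 1 + log2(N), with log2(N) rounded up.

    Rounding up is an assumption based on the slide's N = 500 example.
    """
    if n < 1:
        raise ValueError("n must be a positive count")
    return 1 + math.ceil(math.log2(n))

# Sample sizes mentioned on the slides: the worked example (500),
# the exact power of two (512), and the ANXIETY sample (393).
for n in (500, 512, 393):
    print(n, sturges_bins(n))
```

All three sample sizes fall between 2^8 and 2^9, so they lead to the same number of bins.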
ANXIETY Stem-and-Leaf Plot

 Frequency    Stem &  Leaf

      .00        3 .
    22.00        3 .  4444444444455555555555
    35.00        3 .  66666666777777777777777777777777777
     7.00        3 .  9999999
    39.00        4 .  000000000000000000000000011111111111111
    22.00        4 .  2222222222222222222222
    26.00        4 .  44455555555555555555555555
    22.00        4 .  6666666667777777777777
    26.00        4 .  88888888888999999999999999
    12.00        5 .  111111111111
    26.00        5 .  22222222222222222223333333
    14.00        5 .  44444444444444
    24.00        5 .  666666777777777777777777
    31.00        5 .  8888888888888999999999999999999
    28.00        6 .  1111111111111111111111111111
     6.00        6 .  333333
    15.00        6 .  444444444444444
    24.00        6 .  666666666666666666666666
    14.00        6 .  88888899999999

 Stem width:  10
 Each leaf:   1 case(s)
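The layout of the plot above (stem width 10, each stem split into five lines holding leaves 0-1, 2-3, 4-5, 6-7 and 8-9) can be reproduced with a short Python sketch. This is not the SPSS algorithm; the function name `stem_and_leaf` and its parameters are hypothetical, it assumes non-negative whole-number scores, and unlike SPSS it does not print empty rows.

```python
from collections import defaultdict

def stem_and_leaf(values, stem_width=10, lines_per_stem=5):
    """Return text rows 'frequency  stem . leaves' for whole-number scores.

    Assumes non-negative integers; each stem is split into
    `lines_per_stem` rows (five rows means two leaf digits per row).
    """
    leaves_per_line = stem_width // lines_per_stem
    rows = defaultdict(list)
    for v in sorted(int(v) for v in values):
        stem, leaf = divmod(v, stem_width)
        # Key on (stem, row within stem) so 34 and 36 land on different rows.
        rows[(stem, leaf // leaves_per_line)].append(str(leaf))
    return [
        f"{len(leaves):6.2f}  {stem} . {''.join(leaves)}"
        for (stem, _), leaves in sorted(rows.items())
    ]

# Made-up scores, for illustration only.
for row in stem_and_leaf([34, 34, 35, 36, 41, 58, 59, 59]):
    print(row)
```

Reading a row back is the reverse operation: stem 3 with leaf 4 is the score 34, and the frequency column is the number of leaves on that row.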
[Figures: ANXIETY plotted on a 30 to 70 scale; histogram of ANXIETY (y-axis: Frequency), Mean = 50.3, Std. Dev. = 10.147, N = 393; histogram of ANXIETY (y-axis: Count)]
GRAPHICAL DISPLAYS
• Kernel Smoothing
– SPSS Graph: INTERACTIVE: Line: Dots and Lines: Spline or Lagrange 3rd and 5th order fits
– does not give you the smoother options (available for bivariate scatterplots; see later slides)
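The SPSS menu path above draws spline or Lagrange fits through the counts. For comparison, the sketch below shows a different and more common form of kernel smoothing for a single variable: a Gaussian kernel density estimate. It is an illustration only; the function name `gaussian_kde`, the bandwidth of 3.0 and the simulated scores (drawn to resemble the ANXIETY mean and standard deviation) are all assumptions, not values from SPSS or from the course data.

```python
import math
import random

def gaussian_kde(data, grid, bandwidth):
    """Kernel density estimate: average of normal curves centred on each score."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - x) / bandwidth) ** 2) for x in data)
        for g in grid
    ]

# Simulated scores (NOT the course data): 393 draws with mean 50, SD 10.
random.seed(0)
scores = [random.gauss(50, 10) for _ in range(393)]

step = 0.5
grid = [i * step for i in range(201)]          # 0, 0.5, ..., 100
density = gaussian_kde(scores, grid, bandwidth=3.0)

# A density should enclose an area of about 1.
area = sum(density) * step
print(round(area, 3))
```

As with the smoother bandwidth discussed on later slides, a smaller `bandwidth` gives a bumpier curve and a larger one gives a flatter curve.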
[Figure: dot/line plot of Count by ANXIETY (Dot/Lines show counts)]
[Figures: two dot/line plots of Count by age, ages 10 to 18 (Dot/Lines show counts)]
Bivariate Displays
• Scatterplots
– Interval data
– Category by interval: jittering
– Regression fits: lowess lines
• Scatterplot Matrices
Interval Scatterplot: SPSS Graphics: Interactive: Scatterplot: Fit: Method: Smoother
[Figure: scatterplots of ANXIETY against age (10 to 18): one panel labeled No Smoother, one with an LLR Smoother labeled with Normal Smoother]
Interval Scatterplot: SPSS Graphics: Interactive: Scatterplot: Fit: Method: Smoother
[Figure: scatterplot of ANXIETY against age (10 to 18) with an LLR Smoother, labeled with Uniform Smoother]
Category X-axis: without and with jittering (adding normal random deviate with SD=.15 for sex)
[Figure: scatterplots of ANXIETY against sex and against sexrev (x-axis 1.00 to 2.00), each with an LLR Smoother]
Jittering
• Basic idea: in displays for two or more groups, it is hard to tell where the data lie, because most plotting programs draw identical points on top of one another, so
• Add a small random score to each “group” score
– For example, for males (score 1) and females (score 2), add a random number with a standard deviation of, say, .1 to each male and female score
Jittering
• The result is a spreading out of all scores around the Male or Female column in a scatterplot:
[Figure: schematic scatter of Y, with points spread around the columns Male=1 and Female=2]
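The jittering recipe described above can be sketched in Python as follows. The function name `jitter` and the `seed` argument are made up for illustration; the group codes (1 = male, 2 = female) and the standard deviation of .1 come from the slide, and the ten cases are invented.

```python
import random

def jitter(codes, sd=0.1, seed=None):
    """Add a small normal random deviate (mean 0, given SD) to each group code."""
    rng = random.Random(seed)
    return [c + rng.gauss(0.0, sd) for c in codes]

# Invented example: five males (code 1) and five females (code 2).
sex = [1] * 5 + [2] * 5
sex_jittered = jitter(sex, sd=0.1, seed=42)

for original, moved in zip(sex, sex_jittered):
    print(original, round(moved, 3))
```

Use the jittered variable only for the x-axis of the plot and keep the original codes for any analysis. With an SD this small the two columns stay clearly separated; a much larger SD would let the groups run into each other.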
DATA VISUALIZATION
BIVARIATE
Loess lines: in SPSS, an option under GRAPH / Interactive / Scatterplot labeled “FIT” with METHOD = SMOOTHER
The Bandwidth multiplier has a default of 1.0; a smaller value will create more bumps or curves in the fitted line
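The effect of the bandwidth can be illustrated with a small local linear regression (LLR) smoother using a normal kernel. This is a sketch under stated assumptions, not the SPSS implementation: the function names are hypothetical, `bandwidth` here is the kernel standard deviation in x-axis units rather than the SPSS multiplier, and the data are simulated.

```python
import math
import random

def local_linear_smooth(x, y, grid, bandwidth):
    """At each grid point, fit a weighted straight line (normal-kernel weights)
    and return the height of that line at the grid point."""
    fitted = []
    for g in grid:
        w = [math.exp(-0.5 * ((xi - g) / bandwidth) ** 2) for xi in x]
        sw = sum(w)
        xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ybar + slope * (g - xbar))
    return fitted

def roughness(curve):
    """Sum of squared second differences: larger means a bumpier curve."""
    return sum((a - 2 * b + c) ** 2 for a, b, c in zip(curve, curve[1:], curve[2:]))

# Simulated data (NOT the course data): a linear trend plus noise.
rng = random.Random(0)
x = [40 + 0.5 * i for i in range(81)]                 # 40 to 80
y = [20 + 0.6 * xi + rng.gauss(0, 5) for xi in x]

wide = local_linear_smooth(x, y, x, bandwidth=5.0)
narrow = local_linear_smooth(x, y, x, bandwidth=1.0)
print(roughness(wide), roughness(narrow))
```

The narrower bandwidth lets each fitted point depend on fewer neighbours, so the curve follows local noise more closely.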
[Figure: scatterplot of ANXIETY against DEPRESSION with LLR Smoother; GRAPH/INTERACTIVE/SCATTERPLOT/FIT/BANDWIDTH=1.0]
[Figure: scatterplot of ANXIETY against DEPRESSION with LLR Smoother; GRAPH/INTERACTIVE/SCATTERPLOT/FIT/BANDWIDTH=.60]