DATA VISUALIZATION
UNIVARIATE (no review; self-study)
• STEM & LEAF
• BOXPLOT
BIVARIATE
• SCATTERPLOT (review correlation)
• Overlays; jittering
• Regression line overlay (see ASA website: http://nlvm.usu.edu/en/nav/frames_asid_144_g_4_t_5.html?open=activities)

DATA VISUALIZATION TOPICS
• GRAPHICAL DISPLAYS
  – UNIVARIATE
  – BIVARIATE
• ASSUMPTIONS OF MULTIPLE REGRESSION
  – LINEARITY
  – HOMOSCEDASTICITY
  – ERROR INDEPENDENCE
  – NORMALITY
• FIXING VIOLATIONS

GRAPHICAL DISPLAYS
• Frequency Histogram:
  – SPSS ANALYZE: Descriptive Statistics: Explore: Plot: Stem and Leaf
  – SPSS GRAPH: Boxplot (normal curve overlay available)
• or INTERACTIVE: Boxplot or Analyze: Frequencies
  – SPSS GRAPH: Histogram or Interactive: Histogram
• # “bins” = 1 + log2(N)
• Example: N = 500; log2(500) is approximately 9, so # bins = 1 + 9 = 10
• log2(512) = 9 (e.g., 2x2x2x2x2x2x2x2x2 = 512)

ANXIETY Stem-and-Leaf Plot

 Frequency    Stem &  Leaf
      .00        3 .
    22.00        3 .  4444444444455555555555
    35.00        3 .  66666666777777777777777777777777777
     7.00        3 .  9999999
    39.00        4 .  000000000000000000000000011111111111111
    22.00        4 .  2222222222222222222222
    26.00        4 .  44455555555555555555555555
    22.00        4 .  6666666667777777777777
    26.00        4 .  88888888888999999999999999
    12.00        5 .  111111111111
    26.00        5 .  22222222222222222223333333
    14.00        5 .  44444444444444
    24.00        5 .  666666777777777777777777
    31.00        5 .  8888888888888999999999999999999
    28.00        6 .  1111111111111111111111111111
     6.00        6 .  333333
    15.00        6 .  444444444444444
    24.00        6 .  666666666666666666666666
    14.00        6 .  88888899999999

 Stem width:  10
 Each leaf:   1 case(s)

[Figures: plots of ANXIETY (axis range about 30 to 70); frequency histogram of ANXIETY (Mean = 50.3, Std. Dev. = 10.147, N = 393); histogram of ANXIETY showing Count]

GRAPHICAL DISPLAYS
• Kernel Smoothing
  – SPSS Graph: INTERACTIVE: Line: Dots and Lines: Spline, or Lagrange 3rd and 5th order fits
  – This does not give you the smoother options (those are available for bivariate scatterplots; see later slides)

[Figure: dot/line plot of Count by ANXIETY ("Dot/Lines show counts")]
[Figures: two dot/line plots of Count by age, ages 10 to 18 ("Dot/Lines show counts")]

Bivariate Displays
• Scatterplots
  – Interval data
  – Category by interval: jittering
  – Regression fits: lowess lines
• Scatterplot Matrices

Interval Scatterplot: SPSS Graphics: Interactive: Scatterplot: Fit: Method: Smoother
[Figures: ANXIETY by age, ages 10 to 18. Panel 1: No Smoother. Panel 2: LLR Smoother with Normal Smoother]

Interval Scatterplot: SPSS Graphics: Interactive: Scatterplot: Fit: Method: Smoother
[Figure: ANXIETY by age, ages 10 to 18. LLR Smoother with Uniform Smoother]

Category X-axis: without and with jittering (adding a normal random deviate with SD = .15 for sex)
[Figures: ANXIETY by sex (unjittered, codes 1.00 and 2.00) and ANXIETY by sexrev (jittered), each with LLR Smoother]

Jittering
• Basic idea: when looking at displays for two or more groups, it is hard to tell where the data lie, because most plot programs draw identical points on top of one another, so
• Add a small random score to each “group” score
  – For example, for males (score 1) and females (score 2), add a random number with a std. dev. of, say, .1 to each male and female score

Jittering
• The result is a spreading out of all scores around the Male or Female column in a scatterplot:
[Figure: sketch of Y plotted against group, with points scattered around the Male = 1 and Female = 2 columns]

DATA VISUALIZATION
BIVARIATE
Loess lines: in SPSS, an option under GRAPH / Interactive / Scatterplot labeled “FIT”, with METHOD = SMOOTHER
The Bandwidth multiplier has a default of 1.0; a smaller value will create more bumps or curves in the overall curve

[Figure: ANXIETY by DEPRESSION with LLR Smoother. GRAPH/INTERACTIVE/SCATTERPLOT/FIT/BANDWIDTH=1.0]
[Figure: ANXIETY by DEPRESSION with LLR Smoother. GRAPH/INTERACTIVE/SCATTERPLOT/FIT/BANDWIDTH=.60]
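
Two of the numeric ideas in these slides, the bin-count rule # “bins” = 1 + log2(N) and the jittering of group codes, can also be tried outside SPSS. The following is only a minimal Python sketch, not SPSS syntax. The function names bin_count and jitter are hypothetical, rounding the bin count up to a whole number is an assumption (the slides do not say how to round), and the example group codes are made up for illustration. The default SD of .1 follows the jittering slide.

```python
import math
import random


def bin_count(n):
    """Number of histogram bins from the slide rule: 1 + log2(N).

    Rounding up to a whole number is an assumption of this sketch.
    """
    if n < 1:
        raise ValueError("N must be at least 1")
    return math.ceil(1 + math.log2(n))


def jitter(codes, sd=0.1, seed=None):
    """Add a normal random deviate (mean 0, std. dev. sd) to each group code.

    codes: group scores such as 1 (male) and 2 (female).
    seed:  optional seed so the same jitter can be reproduced.
    """
    rng = random.Random(seed)
    return [code + rng.gauss(0.0, sd) for code in codes]


if __name__ == "__main__":
    # Slide example: N = 500 gives about 1 + 9 = 10 bins.
    print("bins for N = 500:", bin_count(500))

    # Made-up group codes: 1 = male, 2 = female.
    sex = [1, 1, 1, 2, 2, 2]
    for original, spread in zip(sex, jitter(sex, sd=0.1, seed=42)):
        print(original, round(spread, 3))
```

Plotting Y against the jittered codes instead of the raw 1 and 2 values spreads the points around each group column, which is the effect described on the jittering slides. The jittered variable should be used for display only; analyses should keep the original codes.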