Envisioning Information
Lecture 2: Simple Graphs and Charts
Ken Brodlie
School of Computing, University of Leeds
ENV 2006 2.1

Lecture Outline
• Preliminaries
  – Definitions
  – Datatypes
• Simple Data Presentation
  – Graphs and charts

Fundamentals
• Basic datatypes correspond to different levels of measurement
• Data can be:
  – Categorical: labels
  – Numerical: numbers
• Categorical
  – Nominal
    • No sense of order
    • Apples, oranges, …
  – Ordinal
    • Ordered in sequence
    • January, February, …
• Numerical
  – Continuous
    • Real numbers
    • Height of students in class
  – Discrete
    • Typically whole numbers
    • Marks in an exam

Question
• Give an example for each class in which numbers are involved…
  – Categorical, nominal
  – Categorical, ordinal
  – Numerical, continuous
  – Numerical, discrete

Exploratory Data Analysis
• Two stage process:
  – Exploratory: search for evidence using all tools available
  – Confirmatory: evaluate strength of evidence using classical data analysis
• Pioneering figure is John Tukey
• New approach to data analysis, heavily based on visualization, as an alternative to classical data analysis
• See Wikipedia

Simple Data Presentation
• Simple data tables are often presented as line graphs, bar graphs, pie charts, dot graphs, histograms…
• Which should we use and when?

Line Graph
• Fundamental technique of data presentation
• Used to compare two variables
  – X-axis is often the control variable
  – Y-axis is the response variable
• Good at:
  – Showing specific values
  – Trends
  – Trends in groups (using multiple line graphs)
[Figure: Students participating in sporting activities]
[Figure: Mobile phone use]
Any critical comments here?
Note: graph labelling is fundamental

Simple Representations – Bar Graph
• Bar graph
  – Presents categorical variables
  – Height of bar indicates value
  – Double bar graph allows comparison
  – Note spacing between bars
  – Can be horizontal (when would you use this?)
[Figure: Number of police officers]
[Figure: Internet use at a school. Note more space for labels]

Dot Graph
• Very simple but effective…
• Horizontal to give more space for labelling

Pie Chart
• Pie chart summarises a set of categorical/nominal data
• But use with care…
• … too many segments are harder to compare than in a bar chart
[Figure: Should we have a long lecture?]
[Figure: Favourite movie genres]

Histograms
• Histograms summarise discrete or continuous data that are measured on an interval scale
• No gaps between bars if the variable is continuous
[Figure: Distribution of salaries in a company]

Scatter Plot
• Used to present measurements of two variables
• Effective if a relationship exists between the two variables
[Figure: Car ownership by household income]
[Figure: Example taken from NIST Handbook – Evidence of strong positive correlation]

Scatter Plots in Excel
• The scatter plot is a fundamental tool in Excel
• Chart type XY (Scatter) and subtype Unconnected Points
http://www2.ncsu.edu:8010/ncsu/chemistry/resource/excel/excel.html

Regression Line
• Excel allows you to add a linear regression line (trend line)
Remember: correlation does not imply causality… i.e. a relationship exists, but one variable is not necessarily causing the other; there may be a third factor.

Tukey Sum-Difference Plot
Better understanding of residuals …

Box Plots
• In some situations we have, not a single data value at a point, but a number of data values, or even a probability distribution
• When might this occur?
• Tukey proposed the idea of a boxplot to visualize the distribution of values
• For explanation and some history, see:
  http://mathworld.wolfram.com/Box-andWhiskerPlot.html
  http://en.wikipedia.org/wiki/Box_plot
  http://www.upscale.utoronto.ca/GeneralInterest/Harrison/Visualisation/Visualisation.html
[Figure: Darwin’s plant study]
  M – median
  Q1, Q3 – quartiles
  Whiskers – extend to the most extreme values within 1.5 × interquartile range of Q1 and Q3
  Dots – outliers

Acknowledgement
• Thanks to Statistics Canada
  – an excellent web site for simple data presentation
  – http://www.statcan.ca/english/edu/power/toc/contents.htm

Exercise for next week
• Understand a bit more about the merits of pie charts and bar graphs
• Create a dataset with roughly equal numbers in each class
• Which is best if the task is to discriminate?

Exercise for next week
• Over the next week look for examples of basic graphs
  – In newspapers, magazines or other print media
  – On news web sites or other electronic media
• Analyse two examples
  – One should be an example where you think the use of graphics is good
  – One should be bad
• Be ready next week to present these results to the class…

Envisioning Information: Practical Work
• Gnuplot
• R
• Excel
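
Box plot quantities: a worked sketch
The quantities listed on the Box Plots slide (median, quartiles, whiskers, outliers) can be computed by hand before drawing anything. The sketch below is illustrative only: it uses Python rather than one of the course tools, the function name box_plot_summary and the sample data are hypothetical, and it assumes the "inclusive" quartile convention from Python's statistics module. Other tools may use a slightly different quartile rule and so give slightly different boxes.

```python
from statistics import quantiles


def box_plot_summary(values, whisker_factor=1.5):
    """Return the quantities needed to draw a Tukey-style box plot.

    Assumption: quartiles use the 'inclusive' interpolation method;
    other software may use a different quartile definition.
    """
    data = sorted(values)
    # Q1, median (M) and Q3 split the sorted data into four parts.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range

    # Fences sit 1.5 * IQR beyond the quartiles.
    lower_fence = q1 - whisker_factor * iqr
    upper_fence = q3 + whisker_factor * iqr

    # Whiskers end at the most extreme data values inside the fences;
    # anything outside the fences is drawn as an outlier dot.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]

    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Hypothetical sample data with one obvious outlier.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
summary = box_plot_summary(sample)
print(summary)
```

For this sample the value 50 lies beyond the upper fence, so it would be drawn as a dot and the upper whisker would stop at 9.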