Descriptive Statistics
Summarizing data using graphs

Which graph to use?
• Depends on type of data
• Depends on what you want to illustrate
• Depends on available statistical software

Bar Chart
[Figure: bar chart, “Birth Order of Spring 1998 Stat 250 Students”; horizontal axis: Birth Order (Middle, Oldest, Only, Youngest); vertical axis: Percent; n=92 students]
• Summarizes categorical data.
• Horizontal axis represents categories, while vertical axis represents either counts (“frequencies”) or percentages (“relative frequencies”).
• Used to illustrate the differences in percentages (or counts) between categories.

Histogram
[Figure: histogram, “Age of Spring 1998 Stat 250 Students”; horizontal axis: Age (in years), 18 to 27; vertical axis: Frequency (Count); n=92 students]

Analogy
Bar chart is to categorical data as histogram is to ... measurement data.

Histogram
• Divide measurement up into equal-sized categories.
• Determine number (or percentage) of measurements falling into each category.
• Draw a bar for each category so bars’ heights represent number (or percent) falling into the categories.
• Label and title appropriately.
Use common sense in determining number of categories to use. (Trial-and-error works fine, too.)

Too few categories
[Figure: histogram, “Age of Spring 1998 Stat 250 Students”; horizontal axis: Age (in years), 18 to 28; vertical axis: Frequency (Count); n=92 students]

Too many categories
[Figure: histogram, “GPAs of Spring 1998 Stat 250 Students”; horizontal axis: GPA, 2 to 4; vertical axis: Frequency (Count); n=92 students]

Dot Plot
[Figure: dot plot, “Fastest Ever Driving Speed”; 226 Stat 100 Students, Fall '98 (100 men, 126 women); horizontal axis: Speed, 70 to 160]
• Summarizes measurement data.
• Horizontal axis represents measurement scale.
• Plot one dot for each data point.
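The histogram steps listed earlier (divide the measurement scale into equal-sized categories, then count the measurements in each) can be sketched in a few lines of Python. This is only an illustration: the function name `histogram_counts` and the ages below are made up for the example and are not the Stat 250 data. It assumes every value is at least `start`.

```python
from collections import Counter


def histogram_counts(values, start, width):
    """Count how many values fall into each equal-width category.

    Category k covers the half-open interval
    [start + k*width, start + (k+1)*width).
    Assumes every value is >= start (an assumption of this sketch).
    """
    counts = Counter((v - start) // width for v in values)
    n_bins = int(max(counts)) + 1
    # Pair each category's left edge with its count (0 if empty).
    return [(start + k * width, counts.get(k, 0)) for k in range(n_bins)]


# Hypothetical ages, invented for illustration only.
ages = [18, 19, 19, 20, 20, 20, 21, 21, 22, 25]

one_year_bins = histogram_counts(ages, start=18, width=1)
five_year_bins = histogram_counts(ages, start=18, width=5)

# Draw a crude text histogram: one '*' per measurement.
for left_edge, count in one_year_bins:
    print(f"{left_edge:>3} | {'*' * count}")
```

With one-year categories the shape of the data is visible; with five-year categories nearly everything collapses into a single bar, which is the "too few categories" problem in miniature.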
Stem-and-Leaf Plot

Stem-and-leaf of Shoes   N = 139
Leaf Unit = 1.0

  12   0  223334444444
  63   0  555555555555566666666677777778888888888888999999999
 (33)  1  000000000000011112222233333333444
  43   1  555555556667777888
  25   2  0000000000023
  12   2  5557
   8   3  0023
   4   3
   4   4  00
   2   4
   2   5  0
   1   5
   1   6
   1   6
   1   7
   1   7  5

• Summarizes measurement data.
• Each data point is broken down into a “stem” and a “leaf.”
• First, “stems” are aligned in a column.
• Then, “leaves” are attached to the stems.

Box Plot
[Figure: box plot, “Amount of sleep in past 24 hours of Spring 1998 Stat 250 Students”; vertical axis: Hours of sleep, 0 to 10]
• Summarizes measurement data.
• Vertical (or horizontal) axis represents measurement scale.
• Lines in box represent the 25th percentile (“first quartile”), the 50th percentile (“median”), and the 75th percentile (“third quartile”), respectively.

An aside...
• Roughly speaking:
  – The “25th percentile” is the number such that 25% of the data points fall below the number.
  – The “median” or “50th percentile” is the number such that half of the data points fall below the number.
  – The “75th percentile” is the number such that 75% of the data points fall below the number.

Box Plot (cont’d)
• “Whiskers” are drawn to the most extreme data points that are not more than 1.5 times the length of the box beyond either quartile.
  – Whiskers are useful for identifying outliers.
• “Outliers,” or extreme observations, are denoted by asterisks.
  – Generally, data points falling beyond the whiskers are considered outliers.

Using Box Plots to Compare
[Figure: side-by-side box plots, “Fastest Ever Driving Speed”; 226 Stat 100 Students, Fall 1998; horizontal axis: Gender (female, male); vertical axis: Fastest Speed (mph), 60 to 160]

Which graph to use when?
• Stem-and-leaf plots and dotplots are good for small data sets, while histograms and box plots are good for large data sets.
• Boxplots and dotplots are good for comparing two groups.
• Boxplots are good for identifying outliers.
• Histograms and boxplots are good for identifying “shape” of data.

Scatter Plots
[Figure: scatter plot, “Foot sizes of Spring 1998 Stat 250 students”; horizontal axis: Left foot (in cm), 22 to 31; vertical axis: Right foot (in cm), 22 to 31; n=88 students]
• Summarizes the relationship between two measurement variables.
• Horizontal axis represents one variable and vertical axis represents second variable.
• Plot one point for each pair of measurements.

No relationship
[Figure: scatter plot, “Lengths of left forearms and head circumferences of Spring 1998 Stat 250 Students”; horizontal axis: Head circumference (in cm), 52 to 62; vertical axis: Left forearm (in cm), 22 to 32; n=89 students]

Closing comments
• Many possible types of graphs.
• Use common sense in reading graphs.
• When creating graphs, don’t summarize your data too much or too little.
• When creating graphs, label everything for others. Remember you are trying to communicate something to others!
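As a supplement to the box plot slides, the quartile, whisker, and outlier rules described there can be sketched in Python. This is a rough illustration, not the method used to draw the slides' figures: statistical packages differ in how they interpolate quartiles, and the "inclusive" method of the standard library's `statistics.quantiles` is just one choice assumed here. The function name `box_plot_summary` and the sleep values are invented for the example.

```python
import statistics


def box_plot_summary(values):
    """Compute the numbers a box plot displays.

    Whiskers extend to the most extreme data points that lie within
    1.5 box lengths of the quartiles; points beyond are outliers.
    """
    data = sorted(values)
    # Quartile conventions vary; "inclusive" is an assumption of this sketch.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    box_length = q3 - q1
    low_fence = q1 - 1.5 * box_length
    high_fence = q3 + 1.5 * box_length
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical hours of sleep, invented for illustration only.
sleep_hours = [4, 5, 6, 6, 7, 7, 7, 8, 8, 9, 14]
summary = box_plot_summary(sleep_hours)
print(summary)
```

For these made-up values the box runs from 6 to 8 hours, the whiskers reach 4 and 9, and the single 14-hour observation falls beyond the upper whisker, so it would be drawn as an asterisk.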