Chapter 3: Frequency Distributions
March 16

In Chapter 3:
3.1 Stemplot
3.2 Frequency Tables
3.3 Additional Frequency Charts

Stem-and-leaf plots (stemplots)
• Always start by looking at the data with graphs and plots
• Our favorite technique for looking at a single variable is the stemplot
• A stemplot is a graphical technique that organizes data into a histogram-like display
“You can observe a lot by looking” (Yogi Berra)

Stemplot Illustrative Example
• Select a simple random sample (SRS) of 10 ages
• List the data as an ordered array:
    05 11 21 24 27 28 30 42 50 52
• Divide each data point into a stem-value and a leaf-value
• In this example the “tens place” is the stem-value and the “ones place” is the leaf-value, e.g., 21 has a stem-value of 2 and a leaf-value of 1

Stemplot illustration (cont.)
• Draw an axis for the stem-values:
    0|
    1|
    2|1
    3|
    4|
    5|
    ×10   axis multiplier (important!)
• Place leaves next to their stem-value
• Here 21 has been plotted: leaf 1 next to stem 2

Stemplot illustration continued …
• Plot all data points and rearrange the leaves in rank order:
    0|5
    1|1
    2|1478
    3|0
    4|2
    5|02
    ×10
• Here is the plot horizontally (for demonstration purposes):
            8
            7
            4     2
    5 1 1 0 2 0
    -----------
    0 1 2 3 4 5
    -----------
    Rotated stemplot

Interpreting Stemplots
• Shape
  – Symmetry
  – Modality (number of peaks)
  – Kurtosis (width of tails)
  – Departures (outliers)
• Location
  – Gravitational center: mean
  – Middle value: median
• Spread
  – Range and inter-quartile range
  – Standard deviation and variance (Chapter 4)

Shape
• “Shape” refers to the pattern of the data when plotted
• Here’s the silhouette of our data:
            X
            X
            X     X
    X X X X X X
    -----------
    0 1 2 3 4 5
    -----------
• Consider: symmetry, modality, kurtosis

Shape: Idealized Density Curve
• A large dataset is introduced
• A density curve is superimposed to better discuss shape

Symmetrical Shapes
[Figure: examples of symmetrical shapes]

Asymmetrical Shapes
[Figure: examples of asymmetrical shapes]

Modality (no. of peaks)
[Figure: examples of modality]

Kurtosis (steepness)
[Figure: density curves labeled mesokurtic (medium), platykurtic (flat), and leptokurtic (steep), with annotations marking fat tails and skinny tails]
Kurtosis is not easily judged by eye

Location: Mean
• “Eye-ball method”: visualize where the plot would balance
• Arithmetic method: sum the values and divide by n
            8
            7
            4     2
    5 1 1 0 2 0
    -----------
    0 1 2 3 4 5
    -----------
             ^
        Grav. center
• Eye-ball method: around 25 to 30 (takes practice)
• Arithmetic method: mean = 290 / 10 = 29

Location: Median
• Ordered array: 05 11 21 24 27 28 30 42 50 52
• The median has a depth of (n + 1) ÷ 2 on the ordered array
• When n is even, average the two points adjacent to this depth
• For the illustrative data: n = 10, so the median’s depth = (10 + 1) ÷ 2 = 5.5 → the median falls between 27 and 28, i.e., (27 + 28) ÷ 2 = 27.5
• See Ch 4 for details regarding the median

Spread: Range
• Range = minimum to maximum
• The easiest but not the best way to describe spread (better methods of describing spread are presented in the next chapter)
• For the illustrative data the range is “from 5 to 52”

Stemplot – Second Example
• Data: 1.47, 2.06, 2.36, 3.43, 3.74, 3.78, 3.94, 4.42
• Stem = ones-place
• Leaves = tenths-place
• Truncate the extra digit (e.g., 1.47 → 1.4)
• Do not plot the decimal point
    1|4
    2|03
    3|4779
    4|4
    (×1)
• Center: between 3.4 and 3.7 (underlined in the original plot)
• Spread: 1.4 to 4.4
• Shape: mound, no outliers

Third Illustrative Example (n = 25)
• Data: {14, 17, 18, 19, 22, 22, 23, 24, 26, 26, 27, 28, 29, 30, 30, 30, 31, 32, 33, 34, 34, 35, 36, 37, 38}
• Regular stemplot:
    1|4789
    2|223466789
    3|000123445678
    (×10)
• Too squished to see shape

Third Illustration (n = 25), cont.
• Split stem:
  – First “1” on the stem holds leaves 0 to 4
  – Second “1” holds leaves 5 to 9
  – And so on
• Split-stem stemplot:
    1|4
    1|789
    2|2234
    2|66789
    3|00012344
    3|5678
    (×10)
• Negative skew is now evident

How many stem-values?
• Start with between 4 and 12 stem-values
• Trial and error:
  – Try a different stem multiplier
  – Try splitting the stem
  – Look for the most informative plot

Fourth Example: Body weights (n = 53)
Data range from 100 to 260 lbs:
    192 152 135 110 128 180 260 170 165 150
    110 120 185 165 212 119 165 210 186 100
    195 170 120 185 175 203 185 123 139 106
    180 130 155 220 140 157 150 172 175 133
    170 130 101 180 187 148 106 180 127 124
    215 125 194
• ×100 axis multiplier: only two stem-values (1×100 and 2×100); too broad
• ×100 axis multiplier with split stem: only 4 stem-values; might be OK(?)
• ×10 axis multiplier: see the next slide

Fourth Stemplot Example (n = 53)
    10|0166
    11|009
    12|0034578
    13|00359
    14|08
    15|00257
    16|555
    17|000255
    18|000055567
    19|245
    20|3
    21|025
    22|0
    23|
    24|
    25|
    26|0
    (×10)
Looks good!
• Shape: positive skew, high outlier (260)
• Location: median underlined in the original plot (about 165)
• Spread: from 100 to 260

Quintuple-Split Stem Values
    1*|0000111
    1t|222222233333
    1f|4455555
    1s|666777777
    1.|888888888999
    2*|0111
    2t|2
    2f|
    2s|6
    (×100)
Codes for stem values:
    *  for leaves 0 and 1
    t  for leaves two and three
    f  for leaves four and five
    s  for leaves six and seven
    .  for leaves eight and nine
For example, 1t|2 (×100) represents 120.

SPSS Stemplot
SPSS provides frequency counts with its stemplots:
    Frequency    Stem &  Leaf
         2.00        3 .  0
         9.00        4 .  0000
        28.00        5 .  00000000000000
        37.00        6 .  000000000000000000
        54.00        7 .  000000000000000000000000000
        85.00        8 .  000000000000000000000000000000000000000000
        94.00        9 .  00000000000000000000000000000000000000000000000
        81.00       10 .  0000000000000000000000000000000000000000
        90.00       11 .  000000000000000000000000000000000000000000000
        57.00       12 .  0000000000000000000000000000
        43.00       13 .  000000000000000000000
        25.00       14 .  000000000000
        19.00       15 .  000000000
        13.00       16 .  000000
         8.00       17 .  0000
         9.00 Extremes    (>=18)
    Stem width:  1
    Each leaf:   2 case(s)
• 3 . 0 means 3.0 years
• Because of the large n, each leaf represents 2 observations

Frequency Table
• Frequency = count
• Relative frequency = proportion or %
• Cumulative frequency = % less than or equal to the level
    AGE   | Freq  Rel.Freq  Cum.Freq.
    ------+--------------------------
      3   |    2     0.3%      0.3%
      4   |    9     1.4%      1.7%
      5   |   28     4.3%      6.0%
      6   |   37     5.7%     11.6%
      7   |   54     8.3%     19.9%
      8   |   85    13.0%     32.9%
      9   |   94    14.4%     47.2%
     10   |   81    12.4%     59.6%
     11   |   90    13.8%     73.4%
     12   |   57     8.7%     82.1%
     13   |   43     6.6%     88.7%
     14   |   25     3.8%     92.5%
     15   |   19     2.9%     95.4%
     16   |   13     2.0%     97.4%
     17   |    8     1.2%     98.6%
     18   |    6     0.9%     99.5%
     19   |    3     0.5%    100.0%
    ------+--------------------------
    Total |  654   100.0%

Frequency Table with Class Intervals
• When data are sparse, group data into class intervals
• Create 4 to 12 class intervals
• Classes can be uniform or non-uniform
• End-point convention: e.g., a first class interval of 0 to 10 will include 0 but exclude 10 (0 to 9.99)
• Tally frequencies
• Calculate relative frequency
• Calculate cumulative frequency

Class Intervals
Uniform class intervals table (width 10) for data: 05 11 21 24 27 28 30 42 50 52
    Class     | Freq  Relative Freq. (%)  Cumulative Freq. (%)
    ----------+-----------------------------------------------
    0 – 9.99  |   1          10                   10
    10 – 19   |   1          10                   20
    20 – 29   |   4          40                   60
    30 – 39   |   1          10                   70
    40 – 49   |   1          10                   80
    50 – 59   |   2          20                  100
    ----------+-----------------------------------------------
    Total     |  10         100                   --

Histogram
A histogram is a frequency chart for a quantitative measurement. Notice how the bars touch.
[Figure: histogram with a vertical axis from 0 to 5 and Age Class on the horizontal axis: 0-9, 10-19, 20-29, 30-39, 40-49, 50-59]

Bar Chart
A bar chart with non-touching bars is reserved for categorical measurements and non-uniform class intervals.
[Figure: bar chart with a vertical axis from 0 to 500 and School-level categories Pre-, Elem., Middle, High]
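
As a recap of the techniques in this chapter, the stemplot construction, the median-by-depth rule, and the uniform class-interval table can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original slides: the function names (stemplot, median_by_depth, frequency_table) are hypothetical, and the code assumes whole-number data and an axis multiplier that is a power of 10 of at least 10.

```python
from collections import Counter

# The ten illustrative ages from the chapter's first example.
AGES = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]


def stemplot(values, stem_unit=10):
    """Return stemplot lines for whole-number data (hypothetical helper).

    stem_unit is the axis multiplier (10, 100, ...). The leaf is the next
    digit after the stem; any further digits are truncated, not rounded.
    """
    leaf_unit = stem_unit // 10
    leaves = {}
    for v in sorted(values):  # sorting puts leaves in rank order
        stem = v // stem_unit
        leaf = (v % stem_unit) // leaf_unit
        leaves.setdefault(stem, []).append(str(leaf))
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):  # keep empty stems
        lines.append(f"{stem}|{''.join(leaves.get(stem, []))}")
    lines.append(f"(x{stem_unit})")  # always report the axis multiplier
    return lines


def median_by_depth(values):
    """Median located at depth (n + 1) / 2 on the ordered array."""
    ordered = sorted(values)
    n = len(ordered)
    lo = (n + 1) // 2 - 1  # 0-based index just below (or at) the depth
    hi = (n + 2) // 2 - 1  # 0-based index just above (or at) the depth
    return (ordered[lo] + ordered[hi]) / 2


def frequency_table(values, width=10):
    """Rows of (lower, upper, freq, rel %, cum %) for uniform classes.

    Each class includes its lower end-point and excludes its upper one.
    """
    counts = Counter((v // width) * width for v in values)
    n = len(values)
    rows, cum = [], 0
    for lower in range(min(counts), max(counts) + width, width):
        freq = counts.get(lower, 0)
        cum += freq
        rows.append((lower, lower + width, freq, 100 * freq / n, 100 * cum / n))
    return rows


if __name__ == "__main__":
    print("\n".join(stemplot(AGES)))
    print("mean:", sum(AGES) / len(AGES), "median:", median_by_depth(AGES))
    for lower, upper, freq, rel, cum in frequency_table(AGES):
        print(f"{lower:>2} to <{upper:<3} {freq:>3} {rel:>6.1f}% {cum:>6.1f}%")
```

Calling stemplot with stem_unit=100 on the body-weight data would give the two-stem plot that the chapter describes as too broad, which is one way to experiment with the trial-and-error advice on choosing stem-values.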