AP STAT: CHAPTERS 4 & 5 QUANTITATIVE DATA Quantitative Distributions: 1. Dotplots positives: see all observations Example: # of siblings of classmates negatives: not good for large sets of data 0 # of siblings of classmates 2. Stemplot (aka Stem and Leaf Plot) * no commas * leaves are single digits Example: Babe Ruth’s homerun totals each season for the Yankees: 54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 41, 34, 22 Babe Ruth’s HR totals 2 2 5 3 4 5 4 1 1 6 6 7 9 Key: 5 4 4 9 2 2 = 22 homeruns 6 0 3. Back to Back Stemplot: Example: Roger Maris’ homerun totals for the Yankees: 08, 13, 23, 33, 28, 16, 14, 39, 26, 61 Create a Back-to-Back stemplot with Babe Ruth’s Roger Maris’ HR’s Babe Ruth’s HR’s 8 0 6 4 3 1 8 6 3 2 2 5 9 3 3 4 5 4 1 1 6 6 7 9 5 4 4 9 Key: 1 6 0 0 8 = 8 homeruns outlier Splitting Stems: When? When your data is clumped together within only a few stems In what ways can the stems be split? By 5, 2, 1 Age guess example…many people guessed ages in the 20’s and 30’s and so the data was clumped together 4. Histograms Example: *don’t see every piece of data # Or % bins= bar width * must use a minimum of 5 bars Histograms on the calculator: see page 46 in the book for help Example: The following are a list of test scores on an exam. Create a histogram of these scores. variable 40 61 66 68 74 77 84 91 76 82 87 53 64 57 51 67 81 65 45 62 66 69 75 78 84 95 76 82 90 67 71 59 64 70 85 68 49 64 66 69 76 80 85 96 64 68 72 76 81 86 99 76 96 72 BINS = 6 x-min = 40 x-max = 100 x scale (bar width) = 10 # Test Score (%) 4 types of histograms: Frequency (calc) # Relative Frequency % Cumulative Frequency # # Cumulative Relative Frequency % Histogram Examples: Example 7: Examples 8: Using the list INCOM, create a frequency histogram. Using the list GPA, create a relative frequency histogram. Describing Distributions (numerically): Center: MEDIAN - middle observation – 50 % above/below QUARTILES – medians of the lower and upper half of the data Q3 = median of upper half --- 75% of data falls below Q1 = median of lower half --- 25% of data falls below Spread: RANGE - (min, max) IQR – Q3 - Q1 ---- middle 50% of the data 5# Summary: Minimum 1st Quartile (Q1) Median (Q2) 3rd Quartile (Q3) Maximum Center: MEAN – arithmetic average Sigma (sum) FORMULA: x x On the calc 1-var stats (page 65 in book) individual piece of data i n total * read as “x-bar” * on calculator: 1-var stats… Spread: STANDARD DEVIATION & RANGE Sx (a, b) Standard deviation (s) is: a number that describes how much the data varies (or how spread out it is) * average difference of each point from the mean * the higher the number, the more spread out the data is Examples: s = 10 s=2 Formula: n s ( y y) i 1 2 i n 1 On calculator: STAT CALC 1 Var Stats mean standard deviation Properties: * s = 0 when all data is the same * Unless all point are the same, “s” is always POSITIVE *Iit is not resistant! It is affected by outliers * Variance = Standard Deviation square OR Ex: Variance = 16 …. Standard Deviation = 4 Standard Deviation = square root of variance Back to Quantitative Distributions: 4) Boxplots Graphical display of 5 number summary minimum Q1 Median (Q2) Outliers: 1.5 x IQR rule Lower Fence (LF) = Q1 1.5( IQR) Written: (LF, UF) Upper Fence (UF) = Q3 1.5( IQR) Example: Using the AGES in months data, test to see if there are any outliers Q3 maximum Modified Boxplot If outliers are present the whisker extends to the lat value that is not an outlier - The outlier is marked with an x, box, or point outiers Parallel Boxplots: 2 or more boxplots on the same scale * used to compare two or more similar variables On the calculator, compare the lists SATMF and SATMM which are the SAT math scores for males and females DESCRIBING DISTRIBUTIONS: Shape, Center, Spread SHAPE MODE: UNIMODAL BIMODAL SHAPES: UNIFORM SYMMETRIC LEFT SKEWED RIGHT SKEWED OUTLIERS GAPS CLUSTERED GRANULARITY OTHER: * consistent, understandable gaps CENTER: MEDIAN: used when comparing skewed distributions or distributions with outliers MEAN (average): used when comparing symmetric or roughly symmetric distributions SPREAD: * goes along with what measure of center you choose Mean standard deviation & range Median IQR & range Mean vs. Median: SYMMETRIC: mean median mean median mode LEFT SKEWED: mean < median median mode mean RIGHT SKEWED: mean > median median mean mode *the mean gets pulled towards the tail in the direction of the skew EXAMPLES (worksheet) Comparing 2 (or more) distributions * Need to mention shape, center, and spread of each * Must make a comparison of the shape, center, and spread - give specific numerical data - use comparison words like greater than, less than, higher than, lower than, the same as, equal to, different from - each thing can be compared in one sentence Example: The median for Block 1 was 76 points which was higher than the median of Block 2 which was 73 points. When comparing: Symmetric to symmetric : Use the mean, standard deviation, and range Skewed to skewed: Use the median, IQR, and range Skewed to symmetric: Use the median, IQR, and range CANNOT… Compare Mean to Median Example: Compare and describe the SATMF and SATMM data sets Shape: The males are unimodal with no outliers and right skewed which is different to the females which are unimodal with no outliers yet left skewed. Center: The median for males is 610 points which is lower than the median for females which is 700 points. Spread: The IQR for males is 55 points which is lower than the IQR for females which is 120 points. The range for males is (480, 700) which is smaller than the range for females which is (500, 800).