Math 3307 Lecture Notes
Perkowsky text
Monday format
May’13 Jan. 2015

Chapters 1 - 3
Activities 1
Activities 2

Homework Assignments
10 points each problem or part

Homework 1 – 70 points
Chapter 1: 2, 4, 8
Chapter 2: 2, 4, 6, 8

Homework 2 – 90 points
Chapter 3: 3, 4(b, e, f, h, j), 10, 12, 14

Homework style sheet and rules:
Work on one side only; convert it to PDF and upload it before the deadline on the calendar.
Work that is poorly scanned or illegible will be given a zero. This includes sideways or upside-down scans!
Do NOT crowd the work; leave at least 3” between problems.
Label the answers carefully so the grader can grade efficiently.

Chapter 1 Elements of Statistics

Let’s imagine that you have been hired to collect information on the workload and responsibilities of middle school teachers in the USA.
A. Where would you start? Would you try to contact every middle school teacher in the country? What would you do to get the data?
B. What types of information would you collect? How would you decide what is important to know in describing the areas of interest?

1.1 Getting started

ACTIVITIES 1 - Definition
Look in the book at the definition. How does it compare to yours?

Statistics:
Descriptive statistics: Definition and examples
Inferential statistics: Definition and examples

Descriptive Statistics Problems – by group!

DS1 Which of the following conclusions may be obtained from the following data by purely descriptive methods, and which require generalizations?
A student in my Spring Pre-calculus class took 4 consecutive daily quizzes and got the following scores: 3, 8, 10, and 12.
a.) On only 1 day did he get fewer than 5 right.
b.) The student’s number correct increased on each successive quiz.
c.) The student got better at guessing what I was going to ask each day.
d.) On the last day the student copied his answers from his neighbor.

DS2 Smith and Jones are hairdressers. On a recent day, Smith cut the hair of 4 male clients and 2 female clients.
Jones, meanwhile, cut the hair of 3 males and 3 females.
a.) The amount of time it takes Smith and Jones to do a haircut is approximately the same.
b.) Smith always cuts hair on more males than females.
c.) The two always have the same number of clients per day.
d.) Over a week, Smith averages 6 clients a day.

ACTIVITIES 1 DS3

More definitions: page 3 in the text
Variable
Data and Data Set
Raw data
Population/Sample
Population parameter
Population statistics
Sample statistics

Focus on understanding:
A local school district would like to conduct a survey to estimate the percentage of the registered voters in the district who would support a school bond levy (tax). To determine the level of support, the school board surveys 1,000 registered voters from their district. What are:
The population
The sample
The variable(s)
Raw data
Sample statistics
Population parameters

ACTIVITIES – USING THE VOCABULARY

Sampling Techniques pages 4 - 7
Simple random sampling
***Graphing Calculators: Let’s generate a random sample and talk about how to use it creatively.***
Systematic sampling
Convenience sampling
Cluster sampling
Stratified sampling

Bias in Data Collection page 9
IMPORTANT to know about or discover!

Classroom connection
Television stations, radio stations, and newspapers often predict the winners of important elections long before the votes are counted. They make these predictions based on polls.
A. What factors might cause a prediction to be inaccurate?
B. Political parties often conduct their own pre-election polls to find out what voters think about their campaign and their candidates. How might a political party bias such a poll?

1.2 Types of Data
Let’s come up with examples of the following:
Categorical/Qualitative Data
Numerical/Quantitative Data
Nominal Data page 12
Ordinal Data
Interval Data
Ratio Data
Discrete/Continuous

Chapter 1 summary: OYO.
Note: essay questions on the tests.
Example: In 3 paragraphs, compare and contrast 2 different types of data.
Chapter 2 Organizing and Displaying Data

2.1 Displaying Categorical Data

Frequency and Relative Frequency Tables
Pages 21 - 23
Read and review in your group.

ACTIVITIES – The eyes have it!

Dot diagrams: (line plots – page 33)
These summarize data visually and quickly. Put one dot for each observation. Note that you don’t need to sort the data to make a dot diagram.
For example: If I toss a die 6 times and get: 1 4 5 6 1 2
I’d draw a horizontal line, mark off the 6 possible numbers, and then put a dot above each recorded value:

DD Problem 1
215013207134241225134
311024113235224403140
These data give the number of delayed takeoffs per week at a small regional airport with 48 flights per day. Make a dot diagram and analyze the data completely.

Dot diagrams are also useful with qualitative or categorical data.

ACTIVITIES DD Problem 2

Bar Graphs and Circle graphs
Example: Here is a distribution of information about Americans aged 18 or older:

Marital status    Count (in millions)    Percent
Single             41.8                  22.6
Married           113.3                  61.1
Widowed            13.9                   7.5
Divorced           16.3                   8.8

There are a couple of ways to display this information graphically. One is a histogram or bar chart and another is a pie chart or circle graph.

[Pie chart]

[Histogram]

Why was it important to use the percentages and not the raw counts in both representations?
See page 24 for a useful summary of which type of representation to use when.

2.2 Displaying Quantitative Data

Frequency and Relative Frequency Tables
The Rules page 26
Classes: upper limits, lower limits, class mark
Class boundaries

Example: Fifty candidates entering an astronaut training program were given a psychological profile test measuring bravery. NASA grouped the data to make it more compact. Note that the scores are grouped into units of the SAME length. Why is this important? Would you present this as a pie chart? A dot diagram? A bar chart or histogram?
Score in points    # of candidates
60 - 79             8
80 - 99            16
100 - 119          18
120 - 139           8
140 - 159           6

What do you think about the extreme values on the results?

Stem and Leaf Plots page 30
An improvement on dot diagrams, stem and leaf plots work well on data with many different measurements. The method is fairly low tech and can be done quickly in a meeting or on the fly. I find them exceptionally useful in small classes (n < 50) for a quick grade analysis. The stems are the 10’s and the leaves are the single digits of each measurement. It can be useful to organize the leaves in order, too.

Here is one of my classes, a final:

10    | 123
09    | 45779
08    | 327758
07    | 459
06    | 78
BELOW | 1111

Turn the page sideways (anticlockwise)…note the resemblance to a dot diagram!
What does this tell you about my class? Note that in each case, there was somebody pretty close to the next level. What grade is “BELOW”?

Sometimes, if the data is unusually condensed, you might split the stems, making more rows rather than fewer rows.

Here are some quiz grades out of 130 points:
112 114 114 116 118 119 120 121 122 123 124 125 125 126 127 127 129

The best data presentation is to show 110 – 114, 115 – 119, 120 – 124, 125 – 129 rather than just 2 stems with LOOOOONG leaf lines:

11 | 244
11 | 689
12 | 01234
12 | 556779

Note that the stems are now both a hundreds and a tens digit!
Count the data points off the stem and leaf diagram. Where is the median? The 80th percentile?

SL Problem 1
A hotel has 85 rooms. In February of last year they had the following rental statistics:
75 79 37 57 60 64 35 73 62 81 43 72 78 54
69 75 78 49 59 80 58 76 52 49 42 62 81 77
Produce a stem and leaf plot of this data.

ACTIVITIES - SL Problem 2

SL Problem 3
Decide which representation you’d like to use with this data to show the age of the presidents at inauguration: dot diagram or stem and leaf. Why did you pick what you did? Produce the display on the page provided at the end of the data.

Presidents
Find information about U.S.
presidents, including party affiliation, term in office, age at inauguration, age at death, and more.

No.  Name and (party)¹      Term        State of birth  Born          Died        Religion²                Age at inaug.  Age at death
1.   Washington (F)³        1789–1797   Va.             2/22/1732     12/14/1799  Episcopalian             57             67
2.   J. Adams (F)           1797–1801   Mass.           10/30/1735    7/4/1826    Unitarian                61             90
3.   Jefferson (DR)         1801–1809   Va.             4/13/1743     7/4/1826    Deist                    57             83
4.   Madison (DR)           1809–1817   Va.             3/16/1751     6/28/1836   Episcopalian             57             85
5.   Monroe (DR)            1817–1825   Va.             4/28/1758     7/4/1831    Episcopalian             58             73
6.   J. Q. Adams (DR)       1825–1829   Mass.           7/11/1767     2/23/1848   Unitarian                57             80
7.   Jackson (D)            1829–1837   S.C.            3/15/1767     6/8/1845    Presbyterian             61             78
8.   Van Buren (D)          1837–1841   N.Y.            12/5/1782     7/24/1862   Reformed Dutch           54             79
9.   W. H. Harrison (W)⁴    1841        Va.             2/9/1773      4/4/1841    Episcopalian             68             68
10.  Tyler (W)              1841–1845   Va.             3/29/1790     1/18/1862   Episcopalian             51             71
11.  Polk (D)               1845–1849   N.C.            11/2/1795     6/15/1849   Methodist                49             53
12.  Taylor (W)⁴            1849–1850   Va.             11/24/1784    7/9/1850    Episcopalian             64             65
13.  Fillmore (W)           1850–1853   N.Y.            1/7/1800      3/8/1874    Unitarian                50             74
14.  Pierce (D)             1853–1857   N.H.            11/23/1804    10/8/1869   Episcopalian             48             64
15.  Buchanan (D)           1857–1861   Pa.             4/23/1791     6/1/1868    Presbyterian             65             77
16.  Lincoln (R)⁵           1861–1865   Ky.             2/12/1809     4/15/1865   Liberal                  52             56
17.  A. Johnson (U)⁶        1865–1869   N.C.            12/29/1808    7/31/1875   (7)                      56             66
18.  Grant (R)              1869–1877   Ohio            4/27/1822     7/23/1885   Methodist                46             63
19.  Hayes (R)              1877–1881   Ohio            10/4/1822     1/17/1893   Methodist                54             70
20.  Garfield (R)⁵          1881        Ohio            11/19/1831    9/19/1881   Disciples of Christ      49             49
21.  Arthur (R)             1881–1885   Vt.             10/5/1829     11/18/1886  Episcopalian             50             56
22.  Cleveland (D)          1885–1889   N.J.            3/18/1837     6/24/1908   Presbyterian             47             71
23.  B. Harrison (R)        1889–1893   Ohio            8/20/1833     3/13/1901   Presbyterian             55             67
24.  Cleveland (D)⁸         1893–1897   N.J.            3/18/1837     6/24/1908   Presbyterian             55             71
25.  McKinley (R)⁵          1897–1901   Ohio            1/29/1843     9/14/1901   Methodist                54             58
26.  T. Roosevelt (R)       1901–1909   N.Y.            10/27/1858    1/6/1919    Reformed Dutch           42             60
27.  Taft (R)               1909–1913   Ohio            9/15/1857     3/8/1930    Unitarian                51             72
28.  Wilson (D)             1913–1921   Va.             12/28/1856    2/3/1924    Presbyterian             56             67
29.  Harding (R)⁴           1921–1923   Ohio            11/2/1865     8/2/1923    Baptist                  55             57
30.  Coolidge (R)           1923–1929   Vt.             7/4/1872      1/5/1933    Congregationalist        51             60
31.  Hoover (R)             1929–1933   Iowa            8/10/1874     10/20/1964  Quaker                   54             90
32.  F. D. Roosevelt (D)⁴   1933–1945   N.Y.            1/30/1882     4/12/1945   Episcopalian             51             63
33.  Truman (D)             1945–1953   Mo.             5/8/1884      12/26/1972  Baptist                  60             88
34.  Eisenhower (R)         1953–1961   Tex.            10/14/1890    3/28/1969   Presbyterian             62             78
35.  Kennedy (D)⁵           1961–1963   Mass.           5/29/1917     11/22/1963  Roman Catholic           43             46
36.  L. B. Johnson (D)      1963–1969   Tex.            8/27/1908     1/22/1973   Disciples of Christ      55             64
37.  Nixon (R)⁹             1969–1974   Calif.          1/9/1913      4/22/1994   Quaker                   56             81
38.  Ford (R)               1974–1977   Neb.            7/14/1913     12/26/2006  Episcopalian             61             -
39.  Carter (D)             1977–1981   Ga.             10/1/1924     -           Southern Baptist         52             -
40.  Reagan (R)             1981–1989   Ill.            2/6/1911      6/5/2004    Disciples of Christ      69             93
41.  G. H. W. Bush (R)      1989–1993   Mass.           6/12/1924     -           Episcopalian             64             -
42.  Clinton (D)            1993–2001   Ark.            8/19/1946     -           Baptist                  46             -
43.  G. W. Bush (R)         2001–2009   Conn.           July 6, 1946  -           Methodist                54             -
44.  Obama (D)              2009–       Hawaii          Aug. 4, 1961  -           United Church of Christ  47

NOTE:
1. F = Federalist; DR = Democratic-Republican; D = Democratic; W = Whig; R = Republican; U = Union.
2. Religious affiliation at election. Several presidents changed religions during their lifetimes.
3. No party for first election. The party system in the U.S. made its appearance during Washington's first term.
4. Died in office.
5. Assassinated in office.
6. The Republican National Convention of 1864 adopted the name Union Party. It renominated Lincoln for president; for vice president it nominated Johnson, a War Democrat.
Although frequently listed as a Republican vice president and president, Johnson undoubtedly considered himself strictly a member of the Union Party. When that party broke apart after 1868, he returned to the Democratic Party.
7. Johnson was not a professed church member; however, he admired the Baptist principles of church government.
8. Second nonconsecutive term.
9. Resigned Aug. 9, 1974.

Worksheet – presidents continued

What if we want to know: “Are we electing younger people than earlier in our history?”
Consider a time series*! Find this in your book and discuss why it might answer the question better than the preceding presentation.

How could you present the categorical data? Party affiliation, home state, religion…decide (without doing!) how you would present each type of categorical data.

*a chronological presentation with time on the x axis.

Histograms
***Calculator p. 66 – 69…graphing a histogram***
Let’s graph the following data together in our calculators, making a histogram. First discuss each column and what each means!

Measurement    Number
1              0
2              3
3              1
4              5
5              2
6              7
7              5
8              6
9              3
10             0
11             1
12             0
13             2

A new, expanded style of bar/histogram: double sided…note the technique for comparing data sets!

United States AGE DISTRIBUTION
When drawn as a "population pyramid," age distribution can hint at patterns of growth. A top-heavy pyramid, like the one for Grant County, North Dakota, suggests negative population growth that might be due to any number of factors, including high death rates, low birth rates, and increased emigration from the area. A bottom-heavy pyramid, like the one drawn for Orange County, Florida, suggests high birth rates, falling or stable death rates, and the potential for rapid population growth. But most areas fall somewhere between these two extremes and have a population pyramid that resembles a square, indicating slow and sustained growth with the birth rate exceeding the death rate, though not by a great margin.
Let’s talk about what we can see here in this pyramid.

Line Graphs page 35
Usually time is the horizontal axis. These are plotted just like graphing in algebra!
Now let’s look at page 36, the Classroom Connection illustration, and talk about it.

2.3 Misleading graphs
Read it in class. Let’s discuss it together.

Not in the book, but good to know!
Simpson’s Paradox and Averages
We’ve already seen that averages can be misleading. There’s another way that they can mislead, described by E. H. Simpson in 1951. You need to be careful that the categories over which you are averaging are actually comparable!

Here’s an excerpt from STATS: Data and Models (ISBN 0-321-20054-3, Pearson) p. 24:

One famous example of Simpson’s Paradox arose during an investigation of admission rates for men and women at the University of California at Berkeley’s graduate schools. As reported in Science, about 45% of male applicants were admitted while only about 30% of female applicants got in. It looked like a clear case of discrimination. However, when the data were broken down by school (Engineering, Law, Medicine, etc.) it turned out that women were admitted at nearly the same or, in some cases, much higher rates than the men. How could this be?

Women applied in large numbers to schools with very low admissions rates (Law and Medicine, for example, admitted fewer than 10%). Men tended to apply to Engineering and Science. Those schools have admission rates above 50%. When the average was taken, the women had a lower overall rate, but the average didn’t really make sense.

Often you need to check more closely into the categories within each variable to get the true picture.
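A short Python sketch can make the reversal concrete. It uses the four per-program counts from the Science table that follows in these notes; the function names `overall_rate` and `program_rates` are made up for this illustration, and this is only one way to organize the comparison.

```python
# (accepted, applicants) for each of the four programs,
# copied from the Science table in these notes
males = [(511, 825), (352, 560), (137, 407), (22, 373)]
females = [(89, 108), (17, 25), (132, 375), (24, 341)]


def overall_rate(groups):
    """Pooled admission rate: total accepted divided by total applicants."""
    accepted = sum(a for a, _ in groups)
    applicants = sum(n for _, n in groups)
    return accepted / applicants


def program_rates(groups):
    """Admission rate computed inside each program separately."""
    return [a / n for a, n in groups]


print(f"Overall: males {overall_rate(males):.1%}, females {overall_rate(females):.1%}")
pairs = zip(program_rates(males), program_rates(females))
for number, (m, f) in enumerate(pairs, start=1):
    print(f"Program {number}: males {m:.1%}, females {f:.1%}")
```

In this table the women's rate is higher inside every one of the four programs, yet lower once the programs are pooled, because most women applied to the two programs with the lowest admission rates.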
Here’s the data on the graduate admissions from the 1975 issue of Science:

             Males                  Females
             accepted/applicants    accepted/applicants
Program 1    511/825                89/108
Program 2    352/560                17/25
Program 3    137/407                132/375
Program 4    22/373                 24/341
Total        1022/2165              262/849

Let’s do some comparisons:
What are the overall averages?
What are the averages within program categories?

ACTIVITIES – Simpson’s Paradox

Chapter 2 Summary: read on your own.
Here’s a sample test question:
Given these grades, how will we check them out, compare, and categorize? Show more than one way to do this. Discuss the benefits/problems with each way you present.
99, 79, 56, 98, 82, 71, 85, 92, 83, 75, 65, 94, 83

Chapter 3 Describing Data with Numbers

3.1 Measures of Center
These are the numbers that describe what is normal, usual, and in the middle or the center. These terms are very loose and need firming up mathematically, of course.

Mode $\hat{x}$
Median $\tilde{x}$
Mean $\bar{x}$

Mode
One measure of central tendency is the Mode. This is the number that occurs most frequently in a data set. A data set doesn’t always have a mode: if each data point is a different number, the set is mode-free. The mode, if there is one, is always a number in the data set. Some data sets have one mode; some are bi-modal or multimodal.

Problem Mode 1
Which of the following bars shows the mode in this histogram?

[Histogram: “Age and saying No”; x-axis: Age (1 to 6); y-axis: Number of No's per hour (0 to 6)]

Median
Another measure of central tendency is the Median: The median is the value at the numerical middle of the data when there are an odd number of data points and they are arranged in order by size. It is the mean of the 2 middle data points when the number of data points is even and they are arranged in order by size.

The formula for finding the location of the median for n data points is 0.5(n + 1). The process is to order the data and then find the measurement at that location.

Problem Median 1
Find the median location for Data set A.
n = 19 data points
Data set B. n = 52 data points
Is the measurement equal to its location number?

ACTIVITIES Median Problem 2

Problem Median 2
In golf the holes are rated for a recommended number of strokes needed to sink the golf ball into the hole. A score of par means the golfer used the recommended number, a birdie is one fewer than recommended, a bogey is one more than the recommended number, and an eagle is 2 fewer strokes. At a recent televised tournament, 7 golfers had the following scores, listed alphabetically by last name: par, birdie, par, par, birdie, bogey, and eagle.
Where is the median score located?
What is the median score?

Problem Median 3
The data shown in the table are the median prices of existing homes in the USA from 1981 through 1986. If the average prices of existing homes were calculated for each of these years, how do you think these values would compare to the median prices shown? Would the average price be higher, lower, or the same?

Year    Median
1981    66,460
1982    67,800
1983    70,300
1984    72,400
1985    75,500
1986    80,300

Mean
The most popular measure of “centeredness” is the Mean (sometimes called the average). The mean of n numbers is the sum of the numbers divided by n. If you are working with a data set of measurements, the mean is denoted $\bar{x}$.

There are some very cogent reasons for its popularity:
It can always be calculated and it’s easy to calculate.
It is unique: there is only ONE mean for a data set.
It uses EVERY data point; nothing is eliminated.
It doesn’t depend on chance or luck.

There are some equally important reasons to take the mean with a grain of salt:
It is heavily affected by outliers! Let’s look at this.

Here is a list of home prices:
$77,500
$78,200
$137,000
$110,500
$1,800,300

What is the AVERAGE? Is this a measure of center, usual, normal? What happened? What might we use instead of the mean?
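The effect of the one very expensive house can be checked with a few lines of Python. This is a minimal sketch using the five home prices listed above and the standard library's `statistics` module; the variable names are chosen only for this illustration.

```python
from statistics import mean, median

# The five home prices listed above, in dollars
prices = [77_500, 78_200, 137_000, 110_500, 1_800_300]

average_price = mean(prices)      # every value counts, including the outlier
middle_price = median(prices)     # only the middle of the sorted list counts

print(f"mean   = ${average_price:,.0f}")
print(f"median = ${middle_price:,.0f}")

# Drop the $1,800,300 house and recompute to see how far each measure moves
without_outlier = [p for p in prices if p < 1_000_000]
print(f"mean without outlier   = ${mean(without_outlier):,.0f}")
print(f"median without outlier = ${median(without_outlier):,.0f}")
```

With all five prices the mean is $440,700, more than three times the price of four of the five homes, while the median is $110,500. Removing the single outlier drops the mean to $100,800 but moves the median only to $94,350.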
Do these 2 problems by group, then discuss weighted mean.

Problem CT1
An elevator in PGH is designed to carry a maximum load of 3,200 pounds. If it is loaded with 18 people with a mean weight of 166 pounds, is it in any danger of being overloaded?

Problem CT2
Having received a bonus of $20,000 for accepting early retirement, a company’s sales representative invested $6,000 in a bond paying 3.75%, $10,000 in a mutual fund paying 3.96%, and $4,000 in a CD paying 3.25%. Find the weighted mean of these percentages.

Weighted mean – DISCUSS together

Problem CT3
A lecturer counts the final exam in a course 4 times as much as each of the 3 small exams during the semester. Which of the following students has the higher average?

           Test 1    Test 2    Test 3    Final
Mikey      72        80        65        82
Lizbeth    81        87        75        78

Relationships among Mean, Median, and Mode: 1 problem plus one with 3 parts.

Problem CT4
The data shown in the table are the median prices of existing homes in the USA from 1981 through 1986. If the average prices of existing homes were calculated for each of these years, how do you think these values would compare to the median prices shown? Would the average price be higher, lower, or the same?

Year    Median
1981    66,460
1982    67,800
1983    70,300
1984    72,400
1985    75,500
1986    80,300

Problem CT5
Here are 3 data sets. The graphs for them follow.

x axis    STTR    STTL    Symm
1         1       1       1
2         2       2       2
3         4       3       3
4         5       4       4
5         4       5       5
6         3       6       5
7         2       8       4
8         2       5       3
9         1       4       2
10        1       3       1

Calculate mean, median, and mode for these 3 charts. Mark on the x-axis where each goes. How many data points are in each set?

[Chart: Skewed to the right; x-axis 1 to 10]

[Chart: Skewed to the left; x-axis 1 to 10]

[Chart: Symmetric; x-axis 1 to 10]

Summarize your results with a mnemonic device.
Which measurement is most sensitive to outliers? Mean or Median?
What does it mean to say “most sensitive”?
Discuss this idea using the salaries of baseball players.
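For checking work like Problem CT5, a frequency table can be expanded back into a list of data points and handed to the standard library. The sketch below uses a small hypothetical right-skewed table (it is not one of the three data sets above), and the helper name `expand` is made up for this illustration.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed distribution: measurement -> frequency
# (invented for illustration; not one of the data sets in these notes)
frequencies = {1: 5, 2: 4, 3: 3, 4: 2, 10: 1}


def expand(freq_table):
    """Turn a {value: frequency} table back into the full list of data points."""
    return [value for value, count in sorted(freq_table.items()) for _ in range(count)]


data = expand(frequencies)

print("n      =", len(data))
print("mode   =", mode(data))
print("median =", median(data))
print("mean   =", round(mean(data), 2))
```

For this right-skewed table the three measures come out in the order mode < median < mean: the long right tail pulls the mean the furthest, the median moves less, and the mode does not move at all.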
ACTIVITIES MMM – 12 points!

3.2 Measures of Spread or Variability

Range: Max − Min
***Graphing Calculator, page 60***

Mean deviation (p. 58): $\frac{\sum |x - \bar{x}|}{n}$

The mean deviation is calculated by doing the following:
Calculate the mean.
Subtract the mean from each data point.
Take the absolute value of each difference.
Add up the positive differences.
Divide by n, the number of data points.

Variance (p. 60): $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

Standard deviation: The standard deviation for a set of data is the square root of the variance.
***graphing calculator p. 61***

The sample variance is calculated by doing the following:
First calculate the sample mean, then subtract the mean from each measurement individually and square the answer. Add up all the squares and divide by n − 1.

Example:
Given the following data points, find the mean deviation and the standard deviation along with the measures of central tendency. What is the range? Display the data…why did you choose what you did for the display?
5, 6, 9, 0, 1, 6, 11, 5

Measures of Variability

Problem MV 1
Calculate the mean for each sample below. Calculate the range and variance for each sample. Discuss the information available in the variance.

[Chart: N=5; x-axis 1 to 5]

[Chart: N=5; x-axis 1 to 5]

ACTIVITIES Problem MV 2

Problem MV 3 – do in groups in class – 3 problems to discuss
Three sets of data are shown below. What is the number of data points in each set? What is the mean for each set (do this WITHOUT a calculator!)? Rank the sets from the most variable to the least variable and tell why you made those choices (again: calculator free).
Hint: use the formula for variance to help you reason it out!
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

[Chart: Data set 1; x-axis: Measurement (1 to 11); y-axis: Frequency]

[Chart: Data Set 2; x-axis: Measurement (1 to 11); y-axis: Frequency]

[Chart: Data Set 3; x-axis: Measurement (1 to 11); y-axis: Frequency]

ACTIVITIES Problem MV 4

Not in the book, but helpful to know!
Grouped Data for Variance calculations
If f is the frequency of a data measurement, then the following formula calculates the variance for the data:

$s^2 = \frac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}{n - 1}$

Translate the formula to words in groups! Share around!

Problem MV 5
The data in the following table are for the inner diameters of some tubes manufactured by a machine. This table is called a “distribution” because it gives the values and their frequency. Find the mean diameter and the variance for the tubes.

D, inches    frequency
2.0          2
2.2          4
2.3          6
2.8          3
3.0          5

Problem MV 7
The following table is a distribution of the top speeds in mph at which 30 racers were clocked in an auto race. Find the mean and variance for the race.

Top Speed    Number of racers
145          9
150          8
160          11
170          2

3.3 Measures of Position

Percentile Rank
Decile
Quartile
Percentile

A fractile ranking means that a given number of measurements lie below the given measurement and a given number above.

Suppose your child comes home to tell you that she’s in the 90th percentile of her class on a particular test. This means that 90% of the children have lower scores or the same score as she does and 10% have higher scores. You do need to be a little careful with these measurements of relative ranking, though. It could be that 91% of the children failed the test and 9% passed. In this scenario, of course, being in the 90th percentile isn’t much to brag about. You need absolute measures AND relative measures to evaluate a situation about fractiles.

Deciles divide the measurements into 10ths and quartiles divide the measurements into quarters. The median is both a decile and a quartile ranking.
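The percentile idea above can be sketched in Python using the "lower or the same score" wording from these notes. The ten scores and the passing mark of 60 are hypothetical, chosen to mirror the caution about relative versus absolute measures, and the function name `percentile_rank` is made up for this illustration.

```python
def percentile_rank(scores, x):
    """Percent of scores that are lower than or equal to x."""
    at_or_below = sum(1 for s in scores if s <= x)
    return 100 * at_or_below / len(scores)


# Hypothetical class of 10 test scores (invented for illustration)
scores = [35, 40, 42, 45, 48, 50, 52, 55, 58, 90]
passing_mark = 60  # assumed cutoff, not from the notes

child_score = 58
rank = percentile_rank(scores, child_score)
passed = sum(1 for s in scores if s >= passing_mark)

print(f"A score of {child_score} is at the {rank:.0f}th percentile")
print(f"{passed} of {len(scores)} students passed")
```

Here a score of 58 sits at the 90th percentile even though it is below the assumed passing mark, which is the situation the paragraph above warns about: the relative rank looks strong while the absolute result does not.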
Let’s look at quartiles:
Q1 is the median of all measurements less than the median of the data set.
Q3 is the median of all measurements greater than the median of the data set.

And deciles:
D1 is the measurement such that 90% of the measurements are BIGGER than it.

Problem FP 1
The following numbers are weekly lumber production (in million board feet) for a company in Oregon. Find the first quartile and the 90th percentile for the data.
390 406 447 410 370 338 410 320 359 392 315 480

Not in the book, but handy to know!
Percentage change in a measurement:
The percent change in a measurement is often of interest to managers, doctors, and teachers. It is used as a measure of efficacy.

The calculation is $\frac{\text{final} - \text{initial}}{\text{initial}}$

Suppose you have a student who was reading poorly: 15 words a minute. You train the student using your favorite method and test him again to find him reading 27 words a minute. The percent change is $\frac{27 - 15}{15}$, which is 80%. You would then report an 80% improvement in speed.

Problem PC 1
You’ve been looking at a sweater in the store, but it costs $135 and that’s too much. BUT one day you go and check and it’s been marked down to $65…what is the percent change?

Problem PC2
A student has been working with a tutor on his math skills. His weekly quiz average was 65% when he started with the help program. His quizzes are 30 points each. During the program his weekly grades are
20, 23, 21, 28, 27, 29
What is the percent change in his average? Would you say that the tutoring helped?

ACTIVITIES – PERCENT CHANGE

The Empirical Rule page 71
Given a normal distribution (continuous, symmetric, mound-shaped):
68% of the data will lie inside 1 standard deviation from the mean
95% of the data will lie inside 2 standard deviations from the mean
99.7% of the data will lie inside 3 standard deviations from the mean

Let’s sketch this:

Z-score – a number that tells you how far a measurement is from the mean, measured in standard deviations.
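Written as a formula, the z-score of a measurement x is (x − mean) / s. A minimal Python sketch, with hypothetical numbers rather than any of the problems in these notes:

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


# Hypothetical test: class mean 75, standard deviation 5 (invented for illustration)
above = z_score(85, 75, 5)
below = z_score(70, 75, 5)

print(above)   # 2.0, two standard deviations above the mean
print(below)   # -1.0, one standard deviation below the mean
```

Because the z-score divides by the standard deviation, it lets you compare measurements taken on different scales, for example scores from two different tests.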
Usual, unexceptional data points will be within ±1 to ±1.5 s. Think C’s on the positive end.
Unusual will be ±1.2 to ±2.5 s.
Rare and outliers will be ±2.5 s and up or down.

Think of a grading scheme and standard deviations here: let’s put in standard deviations and letter grades:

Here is one of my classes, a listing of the grades on the final…raw data and real. This is a stem-and-leaf diagram.

10 | 123
09 | 45779
08 | 327758
07 | 459
06 | 78
05 | 354

How many students were in my class?
What are the mean and the standard deviation?

$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

Which grade is at the 80th percentile?
How far is the 85 from the mean in terms of the standard deviation?

ZS Problem 1
If you have 2 students applying for entrance to a G&T program and you have room for only one, which one will you pick based on the following test information?
Gina got a 78 on a test with an average of 72 and a standard deviation of 5.
Mike got an 87 on a test with an average of 85 and a standard deviation of 1.5.
Who is the stronger student and how do you know?

ZS Problem 2
Given the following distribution, arrange it in a dot diagram. Follow the directions on the next page.

Measurement    Number
1              0
2              3
3              1
4              5
5              2
6              7
7              5
8              6
9              3
10             0
11             1
12             0
13             2

Discuss
the measures of central tendency: mean, median, mode
the measures of variability: range, variance, standard deviation
and give the z-score for the measurement 7.

Verify the Empirical Rule by making a dot or bar chart of the data and marking off where each of the standard deviations from the mean (±s, ±2s, ±3s) falls with respect to the data points.

ZS Problem 3
The mean salary of the employees at a high school in Missouri is $28,500 with a standard deviation of $2,100. Discuss the Empirical Rule and who might fit where on a bar chart of employee salaries.
The state announces a flat raise of $500 per employee for the next year. Find the mean and standard deviation of the new salaries. Who will benefit the most in a percentage change analysis?
ZS Problem 4
Given that the mean is 9.0 and the standard deviation is 1.4 on the data below, give the numbers of the 2,000 data points that should be within 1, 2, and 3 standard deviations of the mean. Then count the numbers that actually ARE within these bounds.

Value    Frequency
0        1
1        2
2        4
3        8
4        20
5        35
6        60
7        120
8        250
9        500
10       1000

ACTIVITIES ZS PROBLEM 5

ZS Problem 6: Analyze the following nuclear reactor data (@2010)

                      In operation                        Under construction
Country               Number    Electr. net output MW     Number    Electr. net output MW
Argentina             2         935                       1         692
Armenia               1         375                       -         -
Belgium               7         5,926                     -         -
Brazil                2         1,884                     1         1,245
Bulgaria              2         1,906                     2         1,906
Canada                18        12,569                    -         -
China, Mainland       13        10,048                    27        27,230
China, Taiwan         6         4,980                     2         2,600
Czech Republic        6         3,722                     -         -
Finland               4         2,716                     1         1,600
France                58        63,130                    1         1,600
Germany               17        20,490                    -         -
Hungary               4         1,889                     -         -
India                 20        4,391                     5         3,564
Iran                  -         -                         1         915
Japan                 54        46,823                    2         2,650
Korea, Republic       21        18,665                    5         5,560
Mexico                2         1,300                     -         -
Netherlands           1         487                       -         -
Pakistan              2         425                       1         300
Romania               2         1,300                     -         -
Russian Federation    32        22,693                    11        9,153
Slovakian Republic    4         1,792                     2         782
Slovenia              1         666                       -         -
South Africa          2         1,800                     -         -
Spain                 8         7,514                     -         -
Sweden                10        9,303                     -         -
Switzerland           5         3,238                     -         -
Taiwan                6         4,980                     2         2,600
Ukraine               15        13,107                    2         1,900
United Kingdom        19        10,137                    -         -
USA                   104       100,747                   1         1,165
Total                 442       374,958                   65        62,862

Work:
Some thoughts:
A histogram for the number per country?
Calculate the measures of center and the variability.
Check the Empirical Rule?
An average output for each reactor?
A z-score for the USA, for China?

ZS Problem 7
A rough estimate of the range is the mean ± 2 standard deviations from the mean. Why is this true? Could you use 3 SD? What would the difference be?
So you can ESTIMATE the standard deviation by taking the range and dividing by 4…let’s do this. It’s rough, but sometimes you just have to take what you can get!
If the range is 16, what is the estimate of the SD?
If the mean is 4 and the SD is 1.2, what is an estimate of the range?
3.4 Box and Whisker Plots
Box and whisker plots are sometimes called “box plots”. They use the Five Number Summary in a visual way:
Minimum value in the data set
Lower Quartile value
Median
Upper Quartile value
Maximum value
***Graphing Calculator, page 79***

Definitions:
Lower Quartile: Q1: the median of the values below the median
Upper Quartile: Q3: the median of the values above the median

It is possible to replace the minimum and maximum with prescribed values and have “outliers” marked.

Sketch: horizontal

IQR: Interquartile Range: the difference between the upper quartile and the lower quartile. It is where the most “normal” measurements are.

Let’s look at page 75 and analyze the two data sets presented there!

Box plots are often used to compare data sets! It’s so easy to see how categories compare with them.

Constructing a box plot with specified “fences” and “outliers”, as opposed to the Five Number Summary only:
Put the data set in numerical order.
Mark the Five Number Summary right on the list.
Construct the box with Q1, the median, and Q3.
Find the fences (lower: Q1 − 1.5(IQR); upper: Q3 + 1.5(IQR)).
Identify any data points that lie outside the fences and mark them with *.

BW1
Here is one of my classes, a listing of the grades on the final…raw data and real. This is a stem-and-leaf diagram.

10 | 123
09 | 45779
08 | 327758
07 | 459
06 | 78
05 | 354

How many students were in my class? What are the grades?
What is the Five Number Summary? The IQR?
What is the estimated SD? And the estimated z-score for 67?

Sketch the box and whisker plot! Were there any outliers? How do you know they’re outliers? Use the next page for this.

BW1 continued

Another example, utilizing the comparison power of box and whisker plots, is in ACTIVITIES.

BW 2
Comparing several data sets with box and whisker plots.
A student designed an experiment to test the efficiency of 4 coffee containers from different manufacturers by pouring coffee at 180°F into each container and then measuring the temperature difference after 30 minutes. She did the experiment 5 times, using different cups of the same type each time (she didn’t reuse any of the cups). So she used 20 cups total, 5 from each manufacturer. The 5 number summaries of the temperature differences (in °F) are in the table below.

         Min    Q1      Median    Q3       Max     IQR
Cup 1    6      6       83.25     14.25    18.5    8.25
Cup 2    0      1       2         4.5      7       3.5
Cup 3    9      11.5    14.25     21.75    24.5    10.25
Cup 4    6      6.50    8.50      14.25    17.5    7.75

Compare the data. Which cup has the best heat retention property?
Each group in the room will do one and then we’ll go to the board and compare!

Chapter 3 Summary OYO
Sample question: Page 83 number 9, 13
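As a closing sketch for Section 3.4, the box plot steps (order the data, mark the Five Number Summary, find the fences, flag outliers) can be written in Python. The data set is hypothetical, the function names are made up for this illustration, and the quartiles follow the "median of the lower half / upper half" definition used in these notes, which may differ slightly from what a calculator or spreadsheet reports.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves quartiles."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]                                   # values below the median position
    upper = values[half + 1:] if n % 2 else values[half:]   # values above the median position
    return values[0], median(lower), median(values), median(upper), values[-1]


def fences(q1, q3):
    """Lower and upper fences: Q1 - 1.5(IQR) and Q3 + 1.5(IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Hypothetical data set (invented for illustration)
data = [2, 4, 5, 7, 8, 9, 11, 12, 30]

minimum, q1, med, q3, maximum = five_number_summary(data)
low_fence, high_fence = fences(q1, q3)
outliers = [x for x in data if x < low_fence or x > high_fence]

print("Five Number Summary:", (minimum, q1, med, q3, maximum))
print("IQR:", q3 - q1)
print("Fences:", (low_fence, high_fence))
print("Outliers:", outliers)
```

For this data the summary is (2, 4.5, 8, 11.5, 30), the IQR is 7, the fences are −6 and 22, and the value 30 is the only point marked as an outlier.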