Statistics is a ‘do’ field: to learn it, you must ‘do’ it. I can recite the rules and explain them with examples, but whether you learn the material is up to you. We rely on the TI-83/84 calculator to eliminate the drudgery of calculations. This is a collaborative class.

Hints for success in this class
- Work on class topics every day.
- Form a study group.
- Don’t get discouraged.
- As you solve problems, ask yourself, “Does this answer make sense?”
- Get help as soon as you need it, from me or from the Tutorial Center.
- Don’t get behind.

We are surrounded by examples of statistics: crime, sports, politics. We use statistics to interpret data in order to make decisions and to analyze information.

Survey results and your critical eye
- Do the samples represent the population?
- Is the sample big enough?
- How was the sample chosen?
- What ‘type’ of people or things were selected?
- Are the survey questions loaded?
- Are graphs properly displayed, is the data complete, and is the context stated?
- Was there anything ‘confounding’ the results?

Text: Collaborative Statistics by Susan Dean and Barbara Illowsky, available online as a free download.

Sampling and Data

The student will be able to:
- Define, in context, key statistical terms.
- Define, in context, and identify different sampling techniques.
- Understand the variability of data.
- Create and interpret Relative Frequency Tables.
Statistics: the collection, analysis, interpretation, and presentation of data.
- Descriptive statistics
- Inferential statistics
Probability: a mathematical tool used to study randomness.
- Theoretical
- Empirical

Population: the entire collection of persons, things, or objects under study.
Sample: a portion of the larger population.
Parameter: a number that is a property of the population, e.g., the average, standard deviation, or proportion (µ, σ, p).
Statistic: a number that is a property of the sample, e.g., the average, standard deviation, or proportion (x-bar, s, p’).

Variable: the characteristic of interest for each person or thing in a population.
- Numerical
- Categorical
Data: the actual values of the variable. Data types:
- Qualitative
- Quantitative: discrete or continuous
An ‘in context’ example

Sampling: taking a portion of the total population.
- We need a random sample that represents the population (has the same characteristics as the population).
- Each element of the population should have an equal chance of being chosen.

Simple random sample: each member of the population initially has an equal chance of being selected for the sample. Use a random number generator; sample with replacement or without replacement.
Stratified sample: divide the population into groups (strata) and then take a sample from each group.
Cluster sample: divide the population into groups (clusters), then randomly select some of the groups and sample all members of those groups.
Systematic sample: select a starting point and take every nth piece of data from a listing of the population.
Convenience sample: use results that are readily available, i.e., whoever or whatever just happens to be there. Why is this a problem?

A soccer coach selects 6 players from a group of boys aged 8 to 10, 7 players from a group of boys aged 11 to 12, and 3 players from a group of boys aged 13 to 14 to form a recreational soccer team.
A pollster interviews all human resource personnel in five different high tech companies.
An engineering researcher interviews 50 women engineers and 50 men engineers.
A medical researcher interviews every third cancer patient from a list of cancer patients at a local hospital.
A high school counselor uses a computer to generate 50 random numbers and then picks students whose names correspond to the numbers.
A student interviews classmates in his algebra class to determine how many pairs of jeans a student owns, on average.

1. To find the average GPA of all students in a university, use all honor students at the university as the sample.
2. To find out the most popular cereal among young people under the age of 10, stand outside a large supermarket for three hours and speak to every 20th child under the age of 10 who enters the supermarket.
3. To find the average annual income of all adults in the U.S., sample U.S. congresspersons. Create a cluster sample by considering each state as a stratum (group). Using simple random sampling, select states to be part of the cluster. Then survey every U.S. congressperson in the cluster.
4. To determine the proportion of people taking public transportation to work, survey 20 people in NYC. Conduct the survey by sitting in Central Park on a bench and interviewing every person who sits next to you.
5. To determine the average cost of a two-day stay in a hospital in Massachusetts, survey 100 hospitals across the state using simple random sampling.
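The sampling techniques defined above can be contrasted in a short Python sketch. The course itself uses the TI-83/84, so Python here is only an illustrative assumption. The population of 120 student IDs, the six class sections, and the helper function names are all hypothetical. The key contrast: a stratified sample takes some members from every group, while a cluster sample takes every member of some groups.

```python
import random

# Hypothetical population: 120 student IDs listed in 6 class sections of 20.
# The IDs and the sections are made up for illustration.
population = list(range(1, 121))
sections = {f"section {i}": population[(i - 1) * 20:i * 20] for i in range(1, 7)}

rng = random.Random(2024)  # fixed seed so the output is repeatable


def simple_random_sample(pop, size, replace=False):
    """Every member has the same chance; with replacement a member can repeat."""
    return rng.choices(pop, k=size) if replace else rng.sample(pop, size)


def stratified_sample(groups, size_per_group):
    """Take a simple random sample from EVERY group (stratum)."""
    sample = []
    for members in groups.values():
        sample.extend(rng.sample(members, size_per_group))
    return sample


def cluster_sample(groups, n_groups):
    """Randomly pick SOME groups (clusters), then keep ALL of their members."""
    chosen = rng.sample(sorted(groups), n_groups)
    return [member for name in chosen for member in groups[name]]


def systematic_sample(pop, n):
    """Pick a random starting point among the first n, then take every nth item."""
    start = rng.randrange(n)
    return pop[start::n]


print("simple random:", simple_random_sample(population, 10))
print("stratified:   ", stratified_sample(sections, 2))
print("cluster:      ", cluster_sample(sections, 2))
print("systematic:   ", systematic_sample(population, 10))
```

Changing or removing the seed changes which members are drawn but not the shape of each sample: the stratified sample always has 2 students from every section, and the cluster sample always has all 20 students from each of 2 sections.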
Variation occurs in data (within a sample) and in samples (between samples). The larger the sample, the better it tends to represent the population (the Law of Large Numbers), and the closer sample statistics tend to get to population parameters.

Problems with samples
- Self-selected samples
- Sample size issues
- Undue influence
- Non-response or refusal of subjects to participate
- Causality
- Self-funded or self-interest studies
- Misleading use of data
- Confounding

Frequency tables
- Data value
- Frequency: how many times the data value occurs.
- Relative frequency: frequency ÷ (total number of data values).
- Cumulative relative frequency: the running total of the relative frequencies, up to and including the current data value.
An example: How many siblings do you have?

A word on fractions
You DO NOT have to reduce fractions in this course. In fact, I INSIST that you don’t. If you convert to a decimal, round the answer to 4 decimal places.

A word on rounding answers
Don’t round until the final answer. In general, the final answer should have one more decimal place than the data used to get it. HOWEVER, the rule of thumb for this course is: probabilities (relative frequencies) to 4 decimal places, everything else to 2, unless you are told otherwise.

Descriptive Statistics: Displaying and Measuring Data

The student will be able to:
- Display data graphically and interpret graphs: stemplots, histograms, and boxplots.
- Recognize, describe, and calculate the measures of location of data: quartiles and percentiles.
- Recognize, describe, and calculate the measures of the center of data: mean, median, and mode.
- Recognize, describe, and calculate the measures of the spread of data: variance, standard deviation, and range.
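Looking back at the relative frequency table from Sampling and Data (the “How many siblings do you have?” example), here is a minimal Python sketch of how its columns are built. The sibling counts are invented for illustration, not class data. Following the course rules, each relative frequency is shown as an unreduced fraction and as a decimal to 4 places, and the cumulative column is computed from a running total of frequencies so that rounding happens only at the final step.

```python
from collections import Counter

# Hypothetical answers to "How many siblings do you have?" (made-up data)
siblings = [0, 1, 1, 2, 1, 3, 0, 2, 1, 4, 2, 1, 0, 1, 2, 5, 1, 2, 3, 1]


def relative_frequency_table(data):
    """Return (value, frequency, relative freq, cumulative relative freq) rows."""
    n = len(data)
    counts = Counter(data)
    rows = []
    running = 0  # running total of frequencies; divide only at the end
    for value in sorted(counts):
        freq = counts[value]
        running += freq
        rows.append((value, freq, freq / n, running / n))
    return rows


n = len(siblings)
print("Value  Freq  Rel. freq        Cum. rel. freq")
for value, freq, rel, cum in relative_frequency_table(siblings):
    fraction = f"{freq}/{n}"  # unreduced fraction, as the course requires
    print(f"{value:>5}  {freq:>4}  {fraction:>5} = {rel:.4f}  {cum:.4f}")
```

The last cumulative relative frequency should always be 1 (here 20/20); if it is not, a frequency was miscounted.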
Mean, or average: $\bar{x} = \frac{\sum x}{n}$. Use the calculator.
Median: the middle data value; 50% of the data lie below it and 50% above. The data MUST be ordered from lowest to highest. Use the calculator.
Mode: the most frequent data value. You have to count (or put the data in a frequency table).

Relative to other data values
- Quartiles split the data into 4 equal groups, each containing the same percentage of the data. The data must be put in numerical order. Use the calculator.
- Percentiles split the data into 100 equal groups. The data must be put in numerical order.
Relative to the mean: $x = \bar{x} + zs$
- z < 0: the data value is below the mean.
- z > 0: the data value is above the mean.
IQR (interquartile range): IQR = Q3 − Q1, the middle 50% of the data. Use it to determine potential outliers:
- data value < Q1 − 1.5(IQR)
- data value > Q3 + 1.5(IQR)

Range: the difference between the highest value and the lowest value.
Standard deviation: the typical ‘distance’ of the data values from the mean. Sample versus population:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \qquad \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
Variance: sample $s^2$, population $\sigma^2$. Use the calculator.

‘Charts’
- Stem-and-leaf graphs (example)
- Line graphs (not using)
- Bar graphs (not using)
- Boxplots: need the minimum, first quartile, median, third quartile, and maximum.
- Histograms: sort the data into bars (intervals), using 5 to 15 bars. The horizontal axis shows what the data represent; the vertical axis is labeled “frequency” or “relative frequency.”

Probability Topics
Chapter Objectives

The student will be able to:
- Understand and use the terminology of probability.
- Calculate probabilities by listing event sample spaces and counting.
- Determine whether two events are mutually exclusive or independent.
- Calculate probabilities using the Addition Rules and Multiplication Rules.
- Construct and interpret Contingency Tables.
- Construct and interpret Tree Diagrams.
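The Descriptive Statistics measures of center, location, and spread can be cross-checked outside the calculator with Python's standard `statistics` module; this is an assumption for self-checking only, since the course expects the TI-83/84. The data set is invented. One caution: there are several conventions for quartiles, and `statistics.quantiles` does not use the calculator's by default, so this sketch computes Q1 and Q3 as medians of the lower and upper halves, leaving out the overall median when n is odd. That is the convention the TI-83/84 is generally understood to use, but treat the match as an assumption and compare against your own calculator.

```python
import statistics as st

# Hypothetical data set (made up for illustration)
data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 25]


def quartiles(values):
    """Return (Q1, median, Q3) using medians of the lower and upper halves.

    When n is odd the overall median is left out of both halves
    (assumed to match the TI-83/84 1-Var Stats convention).
    """
    ordered = sorted(values)  # data must be in numerical order
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + n % 2:]  # skip the middle value when n is odd
    return st.median(lower), st.median(ordered), st.median(upper)


mean = st.mean(data)
q1, median, q3 = quartiles(data)
iqr = q3 - q1  # spread of the middle 50% of the data
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

s = st.stdev(data)       # sample standard deviation, divides by n - 1
sigma = st.pstdev(data)  # population standard deviation, divides by N

print(f"mean = {mean:.2f}, median = {median:.2f}, mode = {st.mode(data)}")
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}, potential outliers: {outliers}")
print(f"range = {max(data) - min(data)}")
print(f"s = {s:.2f}, s^2 = {st.variance(data):.2f}")
print(f"sigma = {sigma:.2f}, sigma^2 = {st.pvariance(data):.2f}")

# z-score of a data value, then back again with x = x-bar + z*s
z = (25 - mean) / s
print(f"z for 25 = {z:.2f}; x-bar + z*s = {mean + z * s:.2f}")
```

Here 25 lies above Q3 + 1.5(IQR) = 16.5, so it is flagged as a potential outlier, and its positive z-score says it is above the mean.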
Number of students in class: ____
Number with change in pocket or purse: ____
Number who have a sister: ____
Number who have change and a sister: ____
P(change) = ____
P(sister) = ____
P(change and sister) = ____
P(change|sister) = ____

Experiment: a planned operation carried out under controlled conditions.
Chance experiment: an experiment whose results are not predetermined.
Outcome: a result of an experiment.
Sample space: the set of all possible outcomes.
Event: any combination of outcomes.
Probability: the long-term relative frequency of an outcome, i.e., a fraction, a number between 0 and 1, inclusive.
OR: as in A OR B, the outcome is in A, is in B, or is in both.
AND: the outcome is in both A and B at the same time.
Complement: denoted A’ (read “A prime”), all outcomes that are not in A.

Conditional probability of A given B: the probability of A, calculated knowing that B has already occurred.
P(A|B) = P(A and B) ÷ P(B)
Independent events: the chance of event A occurring does not affect the chance of event B occurring, and vice versa. To show independence, you must verify one of the following:
- P(A|B) = P(A)
- P(B|A) = P(B)
- P(A and B) = P(A)P(B)
Mutually exclusive events: event A and event B cannot occur at the same time; they share no outcomes.
- P(A and B) = 0

Experiment: toss two dice and record the value showing on each die.
Sample space S = {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6),
(2,1), (2,2), (2,3), (2,4), (2,5), (2,6),
(3,1), (3,2), (3,3), (3,4), (3,5), (3,6),
(4,1), (4,2), (4,3), (4,4), (4,5), (4,6),
(5,1), (5,2), (5,3), (5,4), (5,5), (5,6),
(6,1), (6,2), (6,3), (6,4), (6,5), (6,6)}

Let A = the event that the sum of the faces of the dice is odd.
A = {(1,2), (1,4), (1,6), (2,1), (2,3), (2,5), (3,2), (3,4), (3,6), (4,1), (4,3), (4,5), (5,2), (5,4), (5,6), (6,1), (6,3), (6,5)}
Let B = the event of getting a double.
B = {(1,1), (2,2), (3,3), (4,4), (5,5), (6,6)}
Let D = the event that at least one face is a 2.
D = {(1,2), (2,1), (2,2), (2,3), (2,4), (2,5), (2,6), (3,2), (4,2), (5,2), (6,2)}

P(A) = ___
P(B) = ___
P(D) = ___
P(D and A) = ____
P(A and B) = ____
P(A or D) = ____
P(D|A) = ____
P(A|D) = ____

Needed formulas:
Addition Rule: P(A OR B) = P(A) + P(B) − P(A AND B)
Multiplication Rule: P(A AND B) = P(B)·P(A|B), or equivalently P(A AND B) = P(A)·P(B|A)

Example: P(C) = 0.4, P(D) = 0.5, P(C|D) = 0.6
P(C and D) = _____
Are C and D mutually exclusive?
Are C and D independent?
P(C or D) = _____
P(D|C) = _____

Contingency table: a table that displays sample values in relation to two different variables that may be contingent on one another.
Example: performance on the job vs. performance in training

                        | Performance on Job
Performance in Training | Below Average | Average | Above Average | TOTAL
Poor                    | 23            | 60      | 29            | 112
Average                 | 28            | 79      | 60            | 167
Very Good               | 9             | 49      | 63            | 121
TOTAL                   | 60            | 188     | 152           | 400

Tree diagram: a ‘graph’ used to determine the outcomes of an experiment. It consists of ‘branches’ labeled with either frequencies or probabilities. Once the probabilities (frequencies) are entered on the branches, the probability (frequency) of an outcome can be ‘read’ by multiplying along the branches and/or adding across branches.

Experiment: a cup holds 8 black and 3 yellow beads. Draw 2 beads, one at a time, with replacement. Record the bead colors.

What’s fair game:
- Chapter 1, Chapter 2, Chapter 3
- 21 multiple-choice questions
- The last 3 quarters’ exams
What to bring with you: Scantron (#2052), pencil, eraser, calculator, and 1 sheet of notes (8.5 x 11 inches, both sides).
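The Probability Topics exercises (the two-dice worksheet, the P(C), P(D), P(C|D) example, and the bead tree diagram) can be checked with a short Python sketch. This is only a self-check aid under the assumption that Python is available; on the exam you work these by hand or on the TI-83/84. The dice answers are printed as unreduced fractions over the relevant count, matching the no-reducing rule. The bead answers are printed by Python's `Fraction`, which always reduces, although none of these particular fractions can be reduced anyway. The C and D in the middle part are the separate rules example, not the dice event D.

```python
from fractions import Fraction
from itertools import product

# --- Two dice: list the sample space and count ---------------------------
# All 36 equally likely ordered pairs (first die, second die)
S = list(product(range(1, 7), repeat=2))
A = {o for o in S if sum(o) % 2 == 1}  # sum of the faces is odd
B = {o for o in S if o[0] == o[1]}     # a double
D = {o for o in S if 2 in o}           # at least one face is a 2


def report(label, favorable, possible):
    """Print an unreduced fraction and its 4-decimal-place value."""
    print(f"{label} = {favorable}/{possible} = {favorable / possible:.4f}")


report("P(A)", len(A), len(S))
report("P(B)", len(B), len(S))
report("P(D)", len(D), len(S))
report("P(D and A)", len(D & A), len(S))
report("P(A and B)", len(A & B), len(S))
report("P(A or D)", len(A | D), len(S))
# Conditional probability: count only inside the event known to have occurred
report("P(D|A)", len(D & A), len(A))
report("P(A|D)", len(A & D), len(D))

# --- Rules example (separate C and D, unrelated to the dice event D) -----
P_C, P_D, P_C_given_D = Fraction("0.4"), Fraction("0.5"), Fraction("0.6")
P_C_and_D = P_D * P_C_given_D         # Multiplication Rule
mutually_exclusive = P_C_and_D == 0   # would require P(C and D) = 0
independent = P_C_given_D == P_C      # one of the three independence checks
P_C_or_D = P_C + P_D - P_C_and_D      # Addition Rule
P_D_given_C = P_C_and_D / P_C         # Multiplication Rule solved for P(D|C)
print(f"P(C and D) = {float(P_C_and_D):.4f}, P(C or D) = {float(P_C_or_D):.4f}, "
      f"P(D|C) = {float(P_D_given_C):.4f}")
print(f"mutually exclusive: {mutually_exclusive}, independent: {independent}")

# --- Tree diagram: 8 black, 3 yellow beads, 2 draws WITH replacement -----
first_draw = {"black": Fraction(8, 11), "yellow": Fraction(3, 11)}
# With replacement, the second-draw branches carry the same probabilities
paths = {
    (c1, c2): p1 * p2  # multiply along a path of branches
    for (c1, p1), (c2, p2) in product(first_draw.items(), repeat=2)
}
one_of_each = paths[("black", "yellow")] + paths[("yellow", "black")]  # add across paths
for (c1, c2), p in paths.items():
    print(f"P({c1}, {c2}) = {p}")
print(f"P(one of each color) = {one_of_each}")
```

Counting directly in the sample space and applying the Addition Rule should agree: 18/36 + 11/36 − 6/36 gives the same 23/36 that `len(A | D)` counts.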