Chapter 13 Statistics and Probability

Statistics and Probability – involve studying a group of individuals or objects (POPULATION) and its subsets (SAMPLES).

Statistics – used to make sense of data by organizing, summarizing and drawing conclusions.
   a) Data are gathered by RANDOM SAMPLES – each member of the population has an equal chance of being in the sample

2 Types of Data:
#1. Qualitative – data divided into categories, such as eye color or male/female
#2. Quantitative – numerical data, such as the number of miles to home or the time spent studying
   Further broken down into:
   a) discrete – if there is a minimum increment between 2 different values
   b) continuous – if the difference between 2 values can be arbitrarily small

Ways to Display DATA:
#1. Frequency Table – displays the frequency of an occurrence by tally marks and the relative frequency by a fraction, decimal or percent.
#2. Bar Graph – displays the categories on a horizontal axis and the frequencies or relative frequencies on a vertical axis, or vice versa.
   a) The height of the bar shows the frequency of the value
   b) All bars should be the same width
#3. Pie Chart – displays the categories and the relative frequencies.
   a) Is divided into sectors; the central angle of each sector equals the category's relative frequency times 360°

Distribution Shapes:
   a) Uniform = all data values have the same frequency
   b) Symmetric = the right and left sides of the distribution have frequencies that are mirror images of each other
   c) Skewed right = the right side of the distribution has much lower frequencies than the left
   d) Skewed left = the left side of the distribution has much lower frequencies than the right

Outlier – a data value that is far removed from the rest of the data
   *usually caused by errors or by unusual members of the population

2 Ways to Display Quantitative Data:
#1. Stem Plot – usually displays small data sets
   a) Leading digit/s represent the stems
   b) Arrange the stems vertically from lowest to highest value
   c) Last digit is the leaf – arrange the leaves horizontally from lowest to highest
   d) Can predict the shape of the distribution from the leaves
#2. Histogram – a bar graph with no gaps between bars
   a) Divide the range of data into classes of equal width, so that each data value falls in exactly one class (each class = a CLASS INTERVAL)
   b) Draw the horizontal axis and indicate the first value in each class interval
   c) Draw the vertical scale and label it with either frequencies or relative frequencies
   d) Draw rectangles with a width equal to the class interval and a height equal to the frequency of the data within each interval

13.2 Measures of Center and Spread

Measures of Center:
#1. Mean – the average; is affected by outliers
#2. Median – the middle number in an ordered data set (if the number of values is odd); if even, take the mean of the 2 middle #'s; is not affected much by outliers
#3. Mode – the number with the highest frequency (occurs most often)

Measure of Center Distributions:
   A) If a distribution is symmetric = the mean and median are equal
   B) If a distribution is skewed left = the mean is to the left of the median
   C) If a distribution is skewed right = the mean is to the right of the median

Measure of Spread (variability of data):
#1. Standard Deviation – measures the typical distance of a data element from the mean; the most common measure of variability; best used if the data is symmetric about the mean
   a) Find the deviation of each data value from the mean: xi − x̄
   b) Square each deviation and find the average of the squares = VARIANCE
   c) The square root of the variance = STANDARD DEVIATION
   ***However, if the data is taken from a sample rather than a population, it is common to divide by n − 1 instead of n when taking the average of the squared deviations = SAMPLE STANDARD DEVIATION (denoted by s)
#2. Range – the difference between the maximum and minimum data values.
#3. Interquartile Range – a measure of variability that is resistant to extreme values, yet still gives a good indication of the spread of the data. = Q3 – Q1
   A) 5 Number Summary and Box Plot: consists of Minimum, Q1, Median, Q3, Maximum
      1) Minimum = lowest # in data set
      2) Q1 = median of the lower half of the data
      3) Median = middle number of the whole data set
      4) Q3 = median of the upper half of the data
      5) Maximum = biggest # in data set
   **Box plot is a visual representation of the 5 number summary

13.3 Basic Probability

Experiment – any process that generates one or more observable outcomes.
   a) Sample Space = the set of all possible outcomes
      Ex. page 865
   b) Event – any outcome or set of outcomes in the sample space.
      Ex. in rolling a die, the set {1,3,5} is an event that can be described as rolling an odd number

Probability – of an event is a number from 0 to 1 that indicates how likely the event is to occur
   a) Probability of 0 = event cannot occur
   b) Probability of 1 = event must occur
   c) Sum of the probabilities of all outcomes in the sample space is 1
   d) The probability of an event is the sum of the probabilities of the outcomes in the event

Probability Distribution – a table that gives the probability of each outcome of an experiment

Mutually Exclusive Events – two events that have no outcomes in common
   a) Cannot both occur in the same trial of an experiment
   b) Find the probability of the event (E or F) by adding the individual probabilities
   c) Complement of an Event = the set of all outcomes that are not contained in the event; if the event has probability p, its complement has probability 1 – p

Independent Events – if the occurrence or non-occurrence of one event has no effect on the probability of the other event.
   a) If 2 events are independent, then the probability of event (E and F) is the product of the individual probabilities

Differences between Mutually Exclusive and Independent:
Mutually Exclusive
   1. Term often refers to 2 possible results for a SINGLE trial of an experiment.
   2. The word “or” is often used to describe a pair of mutually exclusive events.
   3. For mutually exclusive events E and F, P(E or F) = P(E) + P(F) ** also written P(E ∪ F)
Independent
   1. Term often refers to the results from 2 or MORE trials of an experiment or from different experiments.
   2. The word “and” is often used to describe a pair of independent events.
   3. For independent events E and F, P(E and F) = P(E) · P(F) ** also written P(E ∩ F)

Random Variable – a function that assigns a number to each outcome in the sample space of the experiment.
   Ex. rolling two dice: the random variable is the sum of the numbers on the faces shown
   a) Write the sample space
   b) Find the range of the random variable
   c) List the outcomes for which the value of 7 is assigned

Expected Value (Mean) of a Random Variable – the average value of the outcomes.
   **If the experiment is repeated a large number of times, the average approaches the expected value.
   ****TO CALCULATE THE EXPECTED VALUE FROM A PROBABILITY DISTRIBUTION = multiply each value by its probability and add the results:
   EX.
      Sum of spins    Probability
      0               1/16
      1               1/8
      2               3/16
      3               1/4
      4               3/16
      5               1/8
      6               1/16
   Expected value = 0(1/16) + 1(1/8) + 2(3/16) + 3(1/4) + 4(3/16) + 5(1/8) + 6(1/16) = 3

13.4 Determining Probabilities

The exact probability is often not known, so we estimate it in 2 ways:
#1. Experimental Estimates of Probability
   a) ** as the number of trials of an experiment increases, the relative frequency of an outcome approaches the probability of the outcome
      P(E) ≈ (number of trials with an outcome in E) / n, where n is the total number of trials
#2. Theoretical Estimates of Probability
   a) Suppose an experiment has a sample space of n outcomes, all of which are equally likely. Then the probability of each outcome is 1/n, and the probability of an event E is given by:
      P(E) = (number of outcomes in E) / n
   Ex. an experiment consists of spinning a spinner divided into 5 equal sections numbered from 1 to 5. Suppose that all outcomes are equally likely:
      a) Write a probability distribution for the experiment
      b) Find the probability of the event that the spinner lands on a prime number

Fundamental Counting Principle (multiplication principle) = consider a set of k experiments. Suppose the first experiment has n1 outcomes, the second has n2 outcomes, and so on. Then the total number of outcomes for all k experiments is n1 • n2 • … • nk.
   Ex. Suppose there are 5 roads from Town A to Town B, 4 roads from Town B to Town C, and 6 roads from Town C to Town D. How many different routes are there from Town A to Town D, passing through both Towns B and C?
   Ex. Do 3 coin toss tree diagram on page 880

Permutations – if r items are chosen in order without replacement from n possible items, the number of permutations is:
   nPr = n! / (n − r)!
Combinations – if r items are chosen in any order without replacement from n possible items, the number of combinations is:
   nCr = n! / (r! (n − r)!)
   **if each item is equally likely to be chosen, the permutations and combinations are all equally likely for a given value of r.
   Ex. Suppose 5 tiles labeled with capital letters A, B, C, D and E are placed randomly in slots a, b, c, d and e. What is the probability that each capital letter is matched with its lowercase counterpart?
   Ex. A bag contains 26 tiles, each labeled with a letter A through Z. Julie chooses 5 tiles at random from the bag. What is the probability that she chooses the letters of her name in any order (matching all 5 letters)? What is the probability of matching 4 letters? 3 letters? 0 letters?
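The counting examples above can be checked with a short Python sketch. It is only an illustration, not a worked solution from the textbook: the helper name p_match and its parameters are made up here, and it assumes the 5 letters of "JULIE" are all different and that every set of 5 tiles is equally likely to be drawn.

```python
from fractions import Fraction
from math import comb, perm

# Fundamental Counting Principle: 5 roads A->B, 4 roads B->C, 6 roads C->D.
routes = 5 * 4 * 6

# Tile example: 5 capital letters placed at random into 5 slots.
# There are 5P5 = 5! equally likely arrangements and only 1 matches every slot.
arrangements = perm(5, 5)
p_all_match = Fraction(1, arrangements)

def p_match(k, name_letters=5, total=26, drawn=5):
    """Probability that exactly k of the drawn tiles are letters of the name.

    Hypothetical helper: choose k of the name letters and the remaining
    (drawn - k) tiles from the other letters, then divide by the number
    of ways to choose `drawn` tiles from `total`.
    """
    others = total - name_letters
    ways = comb(name_letters, k) * comb(others, drawn - k)
    return Fraction(ways, comb(total, drawn))

print("routes A to D:", routes)
print("P(all 5 tiles in matching slots):", p_all_match)
for k in (5, 4, 3, 0):
    print(f"P(Julie matches exactly {k} letters):", p_match(k))
```

Fractions are used instead of decimals so the answers stay exact. Since "matching k letters" for k = 0 through 5 covers every possible draw, the six probabilities from p_match should add up to 1, which is a quick way to check the counting.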
13.5 Normal Distributions

Properties:
   1) Bell shaped
   2) Symmetric about the mean
   3) X-axis is a horizontal asymptote
   4) Area under the curve and above the x-axis is 1
   5) Maximum value occurs at the mean
   6) Has 2 points of inflection, at 1 standard deviation to the right and left of the mean
   7) The mean, median and mode all have the same value = CENTER

Empirical Rule:
   About 68% of the data values are within 1 standard deviation of the mean
   About 95% of the data values are within 2 standard deviations of the mean
   About 99.7% of the data values are within 3 standard deviations of the mean
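The Empirical Rule percentages can be checked numerically. The sketch below uses NormalDist from Python's standard statistics module (Python 3.8 or later); the function name fraction_within and the default mean of 0 and standard deviation of 1 are choices made for this illustration only.

```python
from statistics import NormalDist

def fraction_within(k, mean=0.0, sd=1.0):
    """Area under the normal curve within k standard deviations of the mean."""
    dist = NormalDist(mu=mean, sigma=sd)
    # Area to the left of the upper bound minus area to the left of the lower bound.
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(k):.4f}")
```

The printed areas are about 0.6827, 0.9545 and 0.9973, which round to the 68%, 95% and 99.7% in the rule. Changing mean and sd does not change these fractions, because the rule is stated in units of standard deviations from the mean.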