Descriptive Statistics

Concepts of population, sample, random sample and frequency distribution of discrete and continuous data.

A population is the set of all individuals, each with a value of a given variable associated with them. A sample is a small group of individuals selected from the population (at random, in the case of a random sample) and used to represent the population as a whole. The frequency distribution of data is the number of individuals in a sample or population for each value of the associated variable (discrete data), or for each range of values of the associated variable (continuous data).

Presentation of data: frequency tables and diagrams.

Grouped data: mid-interval values, interval width, upper and lower interval boundaries, frequency histograms.

The mid-interval value is the average of the upper and lower interval boundaries, that is, half of their sum. The interval width is the distance between the upper and lower interval boundaries. Frequency histograms are drawn with bar width proportional to interval width and bar height equal to frequency.

Median, mode; quartiles, percentiles. Range; interquartile range; variance, standard deviation.

• Mode (discrete data) is the most frequently occurring value in the data set.
• Modal class (continuous data) is the most frequently occurring class.
• Median is the middle value of an ordered data set. For an odd number of data values, the median is the middle value. For an even number of data values, the median is the average of the two middle values.
• Percentile is the score below which a certain percentage of the data lies.
• Lower quartile ($Q_1$) is the 25th percentile.
• Median ($Q_2$) is the 50th percentile.
• Upper quartile ($Q_3$) is the 75th percentile.
• Range is the difference between the highest and lowest values in the data set.
• The interquartile range is $Q_3 - Q_1$.
• Cumulative frequency is the total frequency of all values less than a given value.
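The definitions above can be checked on a small data set. The following is a minimal Python sketch, not a prescribed method: the function name `summarize` and the data in `scores` are hypothetical, and the quartiles are taken as the medians of the lower and upper halves of the ordered data, which is only one of several conventions in use (calculators and software may give slightly different values). The cumulative frequencies here are running totals up to and including each value, which is also an assumption of this sketch.

```python
from collections import Counter
from itertools import accumulate
from statistics import median, multimode


def summarize(data):
    """Return basic descriptive statistics for a list of discrete values."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    # Assumed convention: split the ordered data around the median position
    # (the middle value itself is left out when n is odd).
    lower = ordered[:half]
    upper = ordered[half + n % 2:]
    q1, q3 = median(lower), median(upper)

    freq = Counter(ordered)              # frequency of each distinct value
    values = sorted(freq)
    # Running total of frequencies up to and including each value.
    cumulative = list(accumulate(freq[v] for v in values))

    return {
        "mode": multimode(ordered),      # list, since there may be ties
        "median": median(ordered),
        "q1": q1,
        "q3": q3,
        "range": ordered[-1] - ordered[0],
        "iqr": q3 - q1,
        "cumulative": dict(zip(values, cumulative)),
    }


scores = [2, 3, 3, 4, 5, 5, 5, 6, 7]     # hypothetical data set
stats = summarize(scores)
for name, value in stats.items():
    print(name, value)
```

For this data the median is the fifth of nine ordered values, and the last cumulative frequency equals the number of data values, which is a quick consistency check on any cumulative frequency table.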
Cumulative frequencies can be set out in a table.

The population mean, $\mu$, is generally unknown, and the sample mean, $\bar{x}$, used to serve as an unbiased estimate of it. This is no longer the case: for examination purposes, data will be treated as the population, and estimation of the mean and variance of a population from a sample is no longer required. In the examination the following formulae should be used.

Mean:

\[ \mu = \frac{\sum_{i=1}^{k} f_i x_i}{n}, \qquad n = \sum_{i=1}^{k} f_i \]

Population variance: $\sigma^2$ of a finite population of size $n$ is a measure of statistical dispersion (the extent to which the data values deviate from the mean).

\[ \sigma^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \mu)^2}{n} \]

Standard deviation:

\[ \sigma = \sqrt{\frac{\sum_{i=1}^{k} f_i (x_i - \mu)^2}{n}} \]

For data that are approximately normally distributed:

• approximately 68% of the population will lie within 1 standard deviation either side of the mean
• approximately 95% of the population will lie within 2 standard deviations either side of the mean
• approximately 99.7% of the population will lie within 3 standard deviations either side of the mean

Probability

The number of trials is the total number of times the "experiment" is repeated. The outcomes are the different results possible for one trial of the experiment. Equally likely outcomes are expected to have equal frequencies. The sample space, $U$, is the set of all possible outcomes of an experiment. An event is the occurrence of one particular outcome.

\[ P(A) = \frac{n(A)}{n(U)} \]

$P(A)$ is the probability of an event A occurring in one trial, $n(A)$ is the number of times event A occurs in the sample space, and $n(U)$ is the total number of possible outcomes.

Complementary Events

Two events are described as complementary if they are the only two possible outcomes. For example, if event A is "rain", the complement of A is A′: "not rain". Two complementary events are mutually exclusive. Since an event must either occur or not occur, the probability of the event either occurring or not occurring must be 1. This can be stated as follows.
\[ P(A) + P(A') = 1 \]

Use this when you need the probability that an event will not happen.

Combined Events

Combined events arise when we are interested in more than one outcome (events joined by "and", "or", "at least").

• $\cup$ (union) means "either"
• $\cap$ (intersection) means "both" / "and"

Given two events, A and B, the probability of at least one of the two events occurring (either A or B or both) is

\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]

$P(A)$ includes the part of B that lies in the intersection, and $P(B)$ includes the part of A that lies in the intersection. $P(A \cap B)$ (both A and B) is therefore counted twice, so it has to be subtracted once.

It is important to know how to get $P(A \cap B)$.

Mutually exclusive events are events A and B that cannot occur at the same time. Examples:

• Turning left and turning right (you can't do both at the same time)
• Tossing a coin: Heads and Tails

For mutually exclusive events $A \cap B = \emptyset$, so $P(A \cap B) = 0$ and

\[ P(A \cup B) = P(A) + P(B) \]

Independent and Dependent Events

A bag contains three different kinds of marbles: red, blue and green. You pick a marble twice. The probability of picking a red one (or any other colour) the second time depends on whether or not you put the first marble back.

• Independent Events: the occurrence of one event in no way affects the probability of the other event occurring. Example: you put the first marble back.
• Dependent Events: the occurrence of one event influences the likelihood of the other event. Example: you don't put the first marble back.

Conditional Probability

Given two events, A and B, the conditional probability of an event A is the probability that the event will occur given the knowledge that an event B has already occurred.
This probability is written as $P(A \mid B)$ (notation for the probability of A given B).

The probability of the intersection of A and B (both events occur) is

\[ P(A \cap B) = P(B)\,P(A \mid B) \]

• Independent Events: A depends neither on B nor on B′, so $P(A \mid B) = P(A) = P(A \mid B')$ and

\[ P(A \cap B) = P(A)\,P(B) \]

• Dependent Events: $P(A \mid B)$ has to be calculated taking the event B into account, and

\[ P(A \cap B) = P(B)\,P(A \mid B) \]

Rearranging gives

\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{n(A \cap B)/n(U)}{n(B)/n(U)} = \frac{n(A \cap B)}{n(B)} \]

The probability that A will happen given that B has occurred is the number of outcomes in both A and B out of the number of outcomes in B. On the other hand, the probability of both together is

\[ P(A \cap B) = \frac{n(A \cap B)}{n(U)} \]

EXAMPLE: In a class of 25 students, 14 like pizza and 16 like iced coffee. One student likes neither and 6 students like both. One student is randomly selected from the class. What is the probability that the student
a. likes pizza?
b. likes pizza and coffee?
c. likes pizza given that she or he likes iced coffee?

Here $n(U) = 25$.

a. $P(\text{pizza}) = \dfrac{\text{who like pizza}}{\text{all students}} = \dfrac{14}{25}$

b. $P(\text{pizza and coffee}) = \dfrac{\text{who like pizza and coffee}}{\text{all students}} = \dfrac{6}{25}$

c. $P(\text{pizza given coffee}) = \dfrac{\text{who like pizza and coffee}}{\text{who like coffee}} = \dfrac{6}{16} = \dfrac{3}{8}$

Using the formula $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$ for part c:

\[ P(\text{pizza given coffee}) = \frac{6/25}{16/25} = \frac{6}{16} = \frac{3}{8} \]

Use of Venn diagrams, tree diagrams and tables of outcomes to solve problems.

1. Venn diagrams
The probability is found using the principle $P(A) = \dfrac{n(A)}{n(U)}$.

2. Tree diagrams
A more flexible method for finding probabilities is the tree diagram. It allows one to calculate the probabilities of events even where trials are non-identical (where the probability of A on a later trial, given the outcome of an earlier trial, differs from $P(A)$), through the product principle: multiply the probabilities along each branch.

3. Tables of outcomes
For example, the 36 equally likely combinations when two dice are rolled:

11 12 13 14 15 16
21 22 23 24 25 26
31 32 33 34 35 36
41 42 43 44 45 46
51 52 53 54 55 56
61 62 63 64 65 66

Bayes' Theorem

The intersection can be written in two ways:

\[ P(A \cap B) = P(B)\,P(A \mid B) \qquad \text{and} \qquad P(A \cap B) = P(A)\,P(B \mid A) \]

Therefore

\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)\,P(B \mid A)}{P(B)} \qquad \text{(Bayes' theorem)} \]

Another form of Bayes' theorem (Formula booklet)

From a tree diagram, there are two ways to get A: either after B has happened or after B has not happened:

\[ P(A) = P(B)\,P(A \mid B) + P(B')\,P(A \mid B') \]

Therefore

\[ P(B \mid A) = \frac{P(B)\,P(A \mid B)}{P(B)\,P(A \mid B) + P(B')\,P(A \mid B')} \]

Extension of Bayes' Theorem

There may be more options than simply "B occurs" or "B doesn't occur"; for example, there may be three possible outcomes for the first event: $B_1$, $B_2$ and $B_3$. The probability of A occurring is then

\[ P(A) = P(B_1)\,P(A \mid B_1) + P(B_2)\,P(A \mid B_2) + P(B_3)\,P(A \mid B_3) \]

and, for $i = 1, 2, 3$,

\[ P(B_i \mid A) = \frac{P(B_i)\,P(A \mid B_i)}{P(B_1)\,P(A \mid B_1) + P(B_2)\,P(A \mid B_2) + P(B_3)\,P(A \mid B_3)} \]

The outcomes $B_1$, $B_2$ and $B_3$ must be mutually exclusive and must cover all the possible outcomes.
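The extended form of Bayes' theorem can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the function names `total_probability` and `bayes_posteriors` and the three-machine scenario with its numbers are hypothetical and do not come from these notes. Exact fractions are used so that the results can be compared with a calculation by hand.

```python
from fractions import Fraction


def total_probability(priors, likelihoods):
    """P(A) = sum of P(B_i) * P(A | B_i) over all outcomes B_i."""
    return sum(p * l for p, l in zip(priors, likelihoods))


def bayes_posteriors(priors, likelihoods):
    """Return the list of P(B_i | A) for outcomes B_1, ..., B_k.

    The B_i must be mutually exclusive and cover every possibility,
    so their probabilities have to sum to 1.
    """
    if sum(priors) != 1:
        raise ValueError("P(B_i) must sum to 1")
    p_a = total_probability(priors, likelihoods)
    return [p * l / p_a for p, l in zip(priors, likelihoods)]


# Hypothetical scenario: machines B1, B2, B3 make 50%, 30% and 20% of all
# items, and A is "the item is faulty", with fault rates 1%, 2% and 3%.
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
likelihoods = [Fraction(1, 100), Fraction(2, 100), Fraction(3, 100)]

p_a = total_probability(priors, likelihoods)
posteriors = bayes_posteriors(priors, likelihoods)

print("P(A) =", p_a)                      # 17/1000
for i, post in enumerate(posteriors, start=1):
    print(f"P(B{i} | A) =", post)         # 5/17, 6/17, 6/17
```

In this made-up scenario the first machine makes half of all items, yet it accounts for only 5/17 of the faulty ones, because its fault rate is the lowest. The posteriors sum to 1, as they must when the outcomes cover every possibility.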