Probability & Data Analysis: Intro Presentation

STAT1005 Introduction to Probability and Data Analysis
Dr Vasudevan Mangalam
Mathematics and Statistics, Curtin University

Outline
- Introduction to Probability and Statistics
- Probability
- Statistics
- Basic Ideas of Probability
- The Statistical Package
- About STAT1005
- Sample Spaces, Outcomes and Events
- Naïve Definition of Probability
- Counting Methods
- Permutations
- Combinations

Why Probability?
- If mathematics is the logic of certainty, probability is the logic of uncertainty and chance.
- Everyone has some intuitive notion of using probability to express uncertainty:
  - probability of rain tomorrow
  - chance that your team will win next weekend
  - likelihood that someone is guilty (Sally Clark, OJ Simpson)
- What are your intuitive notions of probability?

Probability and Quantum Mechanics
- Electron orbits are representations of the space in which an electron is likely to be found.
- They are the result of wave functions, where the squared amplitude of the wave function is interpreted as the probability of finding an electron at any point in space.

Probability in Genetics
- Fundamental laws of genetics are stated as probability laws, i.e., what is the likelihood of a particular trait (eye colour, for example) being passed on from one generation to the next?

Probability in Weather and Climate
- Forecasts are computed and expressed in terms of probability.
- What is the probability of rain tomorrow?
- Will extreme rainfall events become more likely in the future, and how extreme will they be?

The Birthday Problem
- There are k people in a room. Assume each person's birthday is equally likely to be any of the 365 days of the year (we exclude February 29).
- Assume that people's birthdays are independent.
- Also assume there are no twins in the room.
- What is the probability that two or more people in the group have the same birthday?
- On a different note, are birthdays equiprobable? Read the following article: https://www.abc.net.au/news/2017-12-13/australias-mostand-least-popular-birthdays-revealed/9241978

Are Birthdays Equiprobable? ABC News Report

3.1 A Brief History of Probability
- A gambler's dispute in 1654 led to the creation of a mathematical theory of probability by two French mathematicians, Blaise Pascal and Pierre de Fermat.
- Antoine Gombaud (also known as the Chevalier de Méré), a French nobleman and writer with an interest in gaming and gambling questions, called Pascal's attention to an apparent misconception concerning a popular dice game.

A Game of Dice
- The game consisted of throwing a pair of dice 24 times. The problem was to decide whether or not to bet even money on the occurrence of at least one double six during the 24 throws. A seemingly well-established gambling rule suggested that betting even money on a double six in 24 throws would be profitable, but de Méré's own calculations showed that this is not so.
- This problem led to an exchange of letters between Pascal and Fermat in which the fundamental principles of probability were formulated for the first time.
- Questions:
  - What is the probability of getting at least one six in 4 throws of a die?
  - What is the probability of getting at least one double six in 24 throws of a pair of dice?

What is Probability Used for?
- Epidemiology: The spread of a virus among humans
- Agriculture: Crop trials, plant genetics
- Meteorology: Weather prediction
- Sports strategies: Probabilities are used for planning and gaining competitive advantage.
- Insurance options: Risk assessment
- Games and recreational activities: What are the chances of getting a full house in a poker hand?
- Economics and finance: Performance of the stock market
And many more...

Probability and Statistics
- Probability focuses on a systematic study of randomness and uncertainty. In any situation where one of several outcomes may occur, probability provides a method for quantifying the chance or likelihood associated with different outcomes.
- In inferential statistics, we often ask the question "could this have occurred by chance alone, or is there something really going on here?"
- Our intuition is often not very helpful in assessing randomness and uncertainty, so probability theory provides us with a principled framework for answering questions such as the one above.
- Probability theory provides us with probability models for describing different 'types of randomness'.

What is Statistics?
- Statistics is a powerful tool for researchers to
  - discover new things
  - prevent mistakes
- Example: 70% of graduates wish they had learned more statistics.
- Scientists say they have discovered the 25 genes responsible for 7 common serious diseases. How did they do it?

What is Statistics?
- From data we draw conclusions.
- We need statistics to draw appropriate conclusions from data.
- Statistics is the science of drawing conclusions from data.

Hubble's Law
- Edwin Hubble discovered the relation between distances to the galaxies and the speed at which they move away from us (and from each other).

Melanoma and Latitude: Sun Exposure Increases Cancer
- Henry Oliver Lancaster was the Foundation Chair of Mathematical Statistics at the University of Sydney.
- Lancaster was the first person to quantify the relation between melanoma and latitude.

Henry Oliver Lancaster: Rubella Causes Infant Deafness

What is Statistics Used for?
- Medical research: Drug trials, public health, genetics
- Agriculture: Crop trials, plant genetics
- Forensic science: DNA evidence
- Astronomy: Galaxy distribution, exoplanets
- Geology
- Internet security
- Environmental science: Ecology, climate
- Many articles in scientific journals use statistics to analyze data and draw conclusions.

"Climate is a statistical description of weather conditions and their variations, including both averages and extremes."
(Australian Academy of Science, The Science of Climate Change, August 2010)

Statistics in Science

The Sally Clark Case
- Applying probabilistic ideas with a poor understanding of the underlying concepts can result in tragic errors.
- Sally Clark was a young British solicitor who became the victim of a famous case of incorrect statistics. She was wrongly convicted of the murder of two of her children.
- Her first son died suddenly within a few weeks of birth in 1996.
- When her second son also died suddenly in 1998, she was arrested for double murder.
- A key witness in her trial was paediatrician Professor Sir Roy Meadow, who testified that the chance of two children from an affluent family suffering sudden infant death syndrome (SIDS) was 1 in 73 million.
- This probability calculation was based on the statistics of previous SIDS deaths and on two assumptions: that the chance of a single cot death is 1 in 8543, and that such deaths in a family are independent.
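
The 1 in 73 million figure can be reproduced by squaring the single-death figure, which is what the independence assumption amounts to. The following is a minimal sketch of that arithmetic only; the variable names are illustrative and not from the slides.

```r
# Figure quoted for a single cot death: 1 in 8543.
single_one_in <- 8543

# Under the (assumed) independence of the two deaths, the probabilities
# multiply, so the "1 in ..." figures multiply as well.
double_one_in <- single_one_in^2
p_double <- 1 / double_one_in

print(format(double_one_in, big.mark = ","))  # "72,982,849", about 73 million
print(p_double)
```

The sketch shows only where the quoted number comes from; it does not say whether the independence assumption, or the conclusion drawn from the number, is reasonable.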

The Sally Clark Case
- You can read about the multiple errors committed in the probability calculation and the conclusion drawn from it in the 'Statistical evidence' section of https://en.wikipedia.org/wiki/Sally_Clark
- In 2001, the Royal Statistical Society issued a public statement expressing its concern about the misuse of statistics in court.
- They calculated the odds of a second cot death in a family as about 200 to 1.
- Sally Clark was convicted in November 1999.
- However, in an appeal in January 2003 it was revealed that the prosecutor's pathologist had failed to disclose microbiological reports that suggested one of her sons had died of natural causes.
- Clark's conviction was overturned and she was released from prison having served more than three years of her sentence, but she never recovered from the psychological damage and the mistreatment she received in prison.
- Sally Clark died of alcohol poisoning in 2007.
- The case is widely regarded as one of the great miscarriages of justice in modern British legal history.
- Hundreds of other cases were reviewed subsequently, and two other women convicted of murdering their children had their convictions overturned.

Statistics to the Rescue
- Analysing data is not a mindless process.
- Great scientific discoveries have been made by analysing data.
- Terrible mistakes have been committed by wrongly interpreting data.
- Statistics is the guiding light!

Basic Ideas of Probability
- Chance behaviour is unpredictable in the short run, but has a regular and predictable pattern in the long run.
- We call a phenomenon random if individual outcomes are uncertain but there is nonetheless a regular distribution of outcomes in a large number of repetitions.
- The probability of any outcome of a chance process is the proportion of times the outcome would occur in a very long series of repetitions.
- Mathematical probability is an idealization based on imagining what would happen in an indefinitely long series of trials.

Law of Large Numbers in a Coin Toss Experiment

Probability Models
- Descriptions of chance behaviour contain two parts: a list of possible outcomes and a probability for each outcome.
- The sample space $S$ of a random phenomenon is the set of all possible outcomes.
- An event is an outcome or a set of outcomes of a random phenomenon. That is, an event is a subset of the sample space.
- A probability model is a mathematical description of a random phenomenon consisting of two parts: a sample space $S$ and a way of assigning probabilities to events.

Probability Rules
- Any probability is a number between 0 and 1.
- All possible outcomes together must have probability 1.
- If two events have no outcomes in common, the probability that one or the other occurs is the sum of their individual probabilities.
- The probability that an event does not occur is 1 minus the probability that the event does occur.

Probability Rules
Mathematically,
- Rule 1. The probability $P(A)$ of any event $A$ satisfies $0 \le P(A) \le 1$.
- Rule 2. If $S$ is the sample space in a probability model, then $P(S) = 1$.
- Rule 3. If $A$ and $B$ are disjoint, $P(A \cup B) = P(A) + P(B)$. This is the addition rule for disjoint events.
- Rule 4. For any event $A$, $P(A \text{ does not occur}) = 1 - P(A)$.

Finite and Discrete Probability Models
- One way to assign probabilities to events is to assign a probability to every individual outcome, then add these probabilities to find the probability of any event.
  This idea works well when there are only a countable (finite or countably infinite) number of outcomes.
- A finite probability model is one with a finite sample space.
- Discrete probability models include finite models as well as sample spaces that are countably infinite. Countably infinite sets are ones that are equivalent to the set of all positive integers.
- To assign probabilities in a finite model, list the probabilities of all the individual outcomes. These probabilities must be numbers between 0 and 1 that add to exactly 1. The probability of any event is the sum of the probabilities of the outcomes making up the event.

Continuous Probability Models
- Suppose we want to choose a number at random between 0 and 1, allowing any number between 0 and 1 as the outcome.
- We cannot assign probabilities to each individual value because there is an infinite interval of possible values.
- A continuous probability model assigns probabilities as areas under a density curve. The area under the curve and above any range of values is the probability of an outcome in that range.
- Normal distributions are examples of continuous probability models.

Random Variables
- A random variable is a numerical outcome of a random phenomenon (such as random sampling).
- The probability distribution of a random variable $X$ tells us what values $X$ can take and how to assign probabilities to those values.
- Random variables that have a finite or countably infinite list of possible outcomes are called discrete random variables.
- Random variables that can take on any value in an interval are continuous random variables.

The Statistical Package R
R is the software package we shall use for probabilistic and statistical computations.

    > (x <- rnorm(9,3.2,1.6))
    [1] 3.623632 4.100666 3.925909
    [4] 4.342729 2.045807 2.883580
    [7] 2.566033 3.890913 3.566257
    > mean(x)
    [1] 3.438392
    > sd(x)
    [1] 0.7711733

The Statistical Package R
Why Use R?
- Free, open-source software
- Works on Windows, Mac, Linux...
- Standard platform for data analysis
- Relatively easy to learn
- Saves us a lot of work
- Many online resources available
- Thousands of extension packages for specific tasks
- Used in many other units

Example

    > QuizMarks
      [1] 13 14 10 14 13  9 14 14 ...
    [212]  4  7  7 11  0  6  8  0  5 ...
    > mean(QuizMarks)
    [1] 13.48444
    > sd(QuizMarks)
    [1] 5.067489
    > median(QuizMarks)
    [1] 14
    > hist(QuizMarks, xlab = "Marks", main = "Histogram of AiM Quiz Marks", col = "darkseagreen")

STAT1005 Unit Objectives
- Introduce the theory of probability, random variables, discrete and continuous distributions
- Introduce you to statistics as a way of thinking
- Provide you with some simple tools for analysing data
- Gain understanding of the role of statistical inference in drawing conclusions about the world
- By the end of the unit, you should
  - Have a good grounding that will allow you to take more advanced units;
  - Have a firm grasp of the fundamental principles of probability;
  - Be on your way to becoming a statistically literate citizen who can cast a critical eye over claims made in the media and elsewhere.

UniPass

Blackboard

Sample Spaces, Outcomes and Events
- The mathematical framework for probability is built around sets.
- Imagine that an experiment is performed, resulting in one out of a set of possible outcomes.
- An experiment is any action or process that generates observations.
  Before the experiment is performed, it is unknown which outcome will be the result; afterwards, the result 'crystallises' into the actual outcome.

Definition 1
The sample space $S$ of an experiment is the set of all possible results of the experiment. Elements of the sample space are called outcomes. An event $A$ is a subset of the sample space $S$, and we say that $A$ occurred if the actual outcome is in $A$.

Sample Spaces, Outcomes and Events
Example 1
- Toss a coin once: $S = \{H, T\} = \{\text{Head}, \text{Tail}\}$
- The number of customers to a bank on a randomly selected day: $S = \{0, 1, 2, \ldots\}$
- Measure someone's systolic blood pressure: $S = \{x : x \ge 0\} = [0, \infty)$

Remark 1
- In the first example, the sample space is finite.
- In the second example, the sample space is countably infinite.
- In the third example, the sample space is uncountably infinite.
- A set that has either a finite or a countably infinite number of elements will be called countable.
- In the first two examples the sample space is said to be discrete (countable), while in the last example $S$ is said to be continuous (interval, uncountably infinite, uncountable).

Union, Intersection and Complement
An event is just a set of outcomes, so that relationships and results from elementary set theory can be used to study events.

Definition 2
- The union of two events $A$ and $B$, denoted by $A \cup B$ and read "A or B", is the event consisting of all outcomes that are either in $A$ or in $B$ or in both events.
- The intersection of two events $A$ and $B$, denoted by $A \cap B$ and read "A and B", is the event consisting of all outcomes that are in both $A$ and $B$.
- The complement of an event $A$, denoted by $A^c$ or $\bar{A}$, is the set of all outcomes in $S$ that are not contained in $A$.

Union and Intersection
Intersection: $A \cap B$ is the set of points that are both in $A$ and in $B$.
Union: $A \cup B$ is the set of points that are in $A$ or $B$ or both.
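
Union, intersection and complement map directly onto R's built-in set functions. The sketch below uses a hypothetical experiment that is not in the slides (one roll of a six-sided die); the events `A` and `B` are chosen purely for illustration.

```r
# Hypothetical experiment (an assumption, not from the slides):
# one roll of a six-sided die.
S <- 1:6            # sample space
A <- c(2, 4, 6)     # event: the roll is even
B <- c(1, 2, 3)     # event: the roll is at most 3

A_or_B  <- union(A, B)      # outcomes in A or B or both
A_and_B <- intersect(A, B)  # outcomes in both A and B
A_comp  <- setdiff(S, A)    # outcomes in S that are not in A

print(sort(A_or_B))   # 1 2 3 4 6
print(A_and_B)        # 2
print(A_comp)         # 1 3 5
```

Note that the complement has to be taken relative to the sample space, which is why `setdiff` needs `S` as its first argument.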

Complementary and Disjoint Events
The complement of an event $A$, denoted $A'$, $\bar{A}$ or $A^c$, is the set of points which are not in $A$.

Definition 3
When $A$ and $B$ have no outcomes in common, they are said to be mutually exclusive or disjoint events ($A \cap B = \emptyset$).

Exercises
1. What is the complement of $A^c$?
2. What is the intersection of $A$ and $A^c$?

De Morgan's Laws
Example 2
- Two coins are tossed.
- Let $A$ denote the event that the first toss yields a H.
(a) What is the complement of $A$?
    $S = \{HH, HT, TH, TT\}$, $A = \{HH, HT\}$, $A^c = \{TH, TT\}$
(b) Write down an event that is disjoint with $A$.
    $A^c$ is disjoint with $A$. $\{TH\}$ and $\{TT\}$ are also disjoint with $A$.

Remark 2
De Morgan's laws: For two events $A, B \subseteq S$:
$(A \cup B)^c = A^c \cap B^c$ and $(A \cap B)^c = A^c \cup B^c$

Example 3 (Coin flips)
A coin is flipped 10 times. Writing Heads as $H$ and Tails as $T$, a possible outcome is $HHHTHHTTHT$, and the sample space is the set of all possible strings of length 10 of $H$s and $T$s. We can encode $H$ as 1 and $T$ as 0, so that an outcome is a sequence $(s_1, \ldots, s_{10})$ with $s_j \in \{0, 1\}$, and the sample space is the set of all such sequences. Now let's look at some events:
1. Let $A_1$ be the event that the first flip is Heads. As a set, $A_1 = \{(1, s_2, \ldots, s_{10}) : s_j \in \{0, 1\} \text{ for } 2 \le j \le 10\}$. This is a subset of the sample space, so it is indeed an event; saying that $A_1$ occurs is the same as saying that the first flip is Heads. Similarly, let $A_j$ be the event that the $j$th flip is Heads for $j = 2, 3, \ldots, 10$.
2. Let $B$ be the event that at least one flip was Heads. As a set, $B = \bigcup_{j=1}^{10} A_j$.
3. Let $C$ be the event that all flips were Heads. As a set, $C = \bigcap_{j=1}^{10} A_j$.
4. Let $D$ be the event that there were at least two consecutive Heads. As a set, $D = \bigcup_{j=1}^{9} (A_j \cap A_{j+1})$.

Events can be described using words or in set notation.
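
The events of Example 3 can be built explicitly in R, since the sample space has only $2^{10} = 1024$ outcomes. The sketch below is one possible encoding, not the unit's own code: each row of `outcomes` is one sequence $(s_1, \ldots, s_{10})$, and column $j$ of the logical matrix `A` indicates membership in $A_j$. All object names are illustrative.

```r
# Enumerate every outcome of 10 coin flips: one row per outcome,
# with 1 = Heads and 0 = Tails.
n_flips <- 10
outcomes <- as.matrix(expand.grid(rep(list(0:1), n_flips)))

# A[i, j] is TRUE when outcome i belongs to A_j (flip j is Heads).
A <- outcomes == 1

B <- apply(A, 1, any)                      # union of A_1, ..., A_10
C <- apply(A, 1, all)                      # intersection of A_1, ..., A_10
D <- apply(A[, 1:9] & A[, 2:10], 1, any)   # union of (A_j and A_{j+1}), j = 1..9

counts <- c(S = nrow(outcomes), A1 = sum(A[, 1]),
            B = sum(B), C = sum(C), D = sum(D))
print(counts)

# De Morgan's laws, checked outcome by outcome for A_1 and A_2.
de_morgan_union <- all(!(A[, 1] | A[, 2]) == (!A[, 1] & !A[, 2]))
de_morgan_inter <- all(!(A[, 1] & A[, 2]) == (!A[, 1] | !A[, 2]))
print(c(de_morgan_union, de_morgan_inter))
```

Because the outcomes are equally likely for a fair coin, dividing each count by 1024 would give the corresponding naïve probability.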

Events and occurrences
    sample space                      $S$
    $s$ is a possible outcome         $s \in S$
    $A$ is an event                   $A \subseteq S$
    $A$ occurred                      $s_{\text{actual}} \in A$
    something must happen             $s_{\text{actual}} \in S$

New events from old events
    $A$ or $B$ (inclusive)            $A \cup B$
    $A$ and $B$                       $A \cap B$
    not $A$                           $A^c$
    $A$ or $B$, but not both          $(A \cap B^c) \cup (A^c \cap B)$
    at least one of $A_1, \ldots, A_n$    $A_1 \cup \cdots \cup A_n$
    all of $A_1, \ldots, A_n$         $A_1 \cap \cdots \cap A_n$

Relationships between events
    $A$ implies $B$                   $A \subseteq B$
    $A$ and $B$ are mutually exclusive    $A \cap B = \emptyset$
    $A_1, \ldots, A_n$ are a partition of $S$    $A_1 \cup \cdots \cup A_n = S$, $A_i \cap A_j = \emptyset$ for $i \ne j$

Naïve Definition of Probability
The naïve definition applies when every outcome in $S$ is equally likely to occur.

Definition 4 (Naïve definition of probability)
Let $A$ be an event for an experiment with a finite sample space $S$. The naïve probability of $A$ is
$P(A) = \frac{|A|}{|S|} = \frac{\text{number of outcomes favourable to } A}{\text{total number of outcomes in } S}$

Remark 3
$P(A)$ can be defined in this way only if $S$ is a finite set. Experiments in which each outcome is equally likely to occur are also called Laplace experiments.

Naïve Definition of Probability
There are several important types of problems where the naïve definition is applicable:
- when there is symmetry in the problem that makes outcomes equally likely...
- when the outcomes are equally likely by design. For example, consider conducting a survey of n people in a population of N people...
- when the naïve definition serves as a useful null model. In this setting, we assume that the naïve definition applies just to see what predictions it would yield, and then we can compare observed data with predicted values to assess whether the hypothesis of equally likely outcomes is tenable.

Counting Methods
- Calculating the naïve probability of an event $A$ involves counting the number of outcomes in $A$ and in the sample space $S$.
- Enumerating all outcomes is straightforward when the list is small, but in many real instances, such lists are prohibitively long. By exploiting general counting rules we can calculate probabilities without listing all outcomes.
- In many instances, we can count the number of possibilities using the multiplication rule.
- Consider a compound experiment consisting of two sub-experiments such that the first has $n_1$ outcomes and, for each possible outcome of the first, the second has $n_2$ outcomes. Then the compound experiment has $n_1 \times n_2$ possible outcomes.
- This rule is known as the fundamental principle of counting.

An Example of the Multiplication Rule
- At your local cinema, you can buy ice cream with a choice of Bacio, Hazelnut, and Tiramisu flavours, and with wafer or waffle cones. By the multiplication rule, there are $2 \times 3 = 6$ choices, and you can visualize the decision process with a tree diagram.
- It doesn't matter whether you choose the cone first or the flavour first.
- As long as there is the same number of choices of the second element for each first element, the product rule is valid even when the set of possible second elements depends on the first, e.g. when the available flavours differ between types of cone.

General Multiplication Rule
- If a compound experiment consists of $k$ sub-experiments such that the $i$th sub-experiment has $n_i$ outcomes for each combination of outcomes of the others, then the compound experiment has $\prod_{i=1}^{k} n_i = n_1 \times n_2 \times \cdots \times n_k$ possible outcomes.
- This can also be stated another way: Suppose that a set consists of ordered collections of $k$ elements ($k$-tuples) such that there are $n_1$ possible choices for the first element, for each choice of the first element there are $n_2$ possible choices of the second element, etc., and for each possible choice of the first $k - 1$ elements, there are $n_k$ choices of the $k$th element.
  Then there are $n_1 \times n_2 \times \cdots \times n_k$ possible $k$-tuples.

Examples
- Suppose a set $S$ has $n$ elements. How many possible subsets of $S$ exist?
- For each element of $S$, we can choose whether to include it or exclude it in the construction of a subset. This gives us 2 possibilities for each of the $n$ elements. So from the product rule, it follows that there are $2^n$ subsets.
- If we perform sampling with replacement of size $k$ from a set of $n$ objects, there are $n^k$ possible outcomes.
- Using the 26 letters in the English alphabet, a 3-letter 'word' is generated at random. What is the probability that the word contains neither of the letters 'a' and 'e'?
- Here $|S| = 26^3$. If $A$ is the set of words that have no 'a' or 'e', then $|A| = 24^3$. Hence by the naïve definition of probability,
  $P(A) = \frac{24^3}{26^3} = \left(\frac{12}{13}\right)^3$.

Examples
- If we perform sampling without replacement of size $k$ from a set of $n$ objects, there are $n \times (n - 1) \times \cdots \times (n - k + 1)$ possible outcomes if $k \le n$ (and zero possibilities if $k > n$).
- In the 'words' example, what if we now specify that the words cannot contain the same letter more than once?
- Now $|S| = 26 \times 25 \times 24$ and $|A| = 24 \times 23 \times 22$. Thus
  $P(A) = \frac{24 \times 23 \times 22}{26 \times 25 \times 24}$
- This leads us to the definition of permutations. Any ordered sequence of $k$ objects taken from a set of $n$ distinct objects is called a permutation of size $k$ of the objects.
- We first introduce the factorial notation: $n!$, pronounced 'n factorial', is defined as $n! = n \times (n - 1) \times \cdots \times 2 \times 1$.

Permutations
- The number of permutations of $n$ objects in $k$ places is denoted by $P_k^n$ (or $P_{n,k}$) and is given by
  $P_k^n = n \times (n - 1) \times \cdots \times (n - k + 1) = \frac{n!}{(n - k)!}$.
- Ten markers are available for grading a test. The test has four questions, and the professor wishes to select a different marker to grade each question with only one marker per question.
  In how many ways can markers be chosen to grade the test?
- Here $n = 10$ and $k = 4$, so the number of different ways markers can be assigned is
  $P_4^{10} = \frac{10!}{6!} = 5040$.

Permutations
- A deck of 52 cards is shuffled and 5 cards are dealt face up in a row. Find the probability that all four aces are present and in adjacent positions.
- The total number of arrangements is $P_5^{52} = 311875200$.
- The number of arrangements where the four aces are present and are in adjacent positions is $48 \times 4! \times 2! = 2304$: 48 choices for the non-ace card, $4!$ orderings of the aces, and 2 positions for the non-ace card (first or last).
- Thus the required probability is $\frac{2304}{311875200} = 0.000007388$.
- How many permutations are there of the word STATISTICS?
- First we think of the repeated letters as distinct to get $10!$.
- Then we divide by the number of orderings of each repeated letter (S three times, T three times, I twice) to get the number of permutations as $\frac{10!}{3!\,3!\,2!} = 50400$.

The Birthday Problem
- There are $k$ people in a room. Assume each person's birthday is equally likely to be any of the 365 days of the year. We assume that people's birthdays are independent of each other. What is the probability that two or more people in the group have the same birthday?
- We can calculate the probability of the complement, that of all birthdays being distinct. This is given by $\frac{P_k^{365}}{365^k}$. Then the probability that there is a clash is given by
  $P(k) = 1 - \frac{P_k^{365}}{365^k}$

     k   P(k)        k   P(k)
     1   0          22   .476
     2   .003       23   .507
     3   .008       30   .706
    10   .117       40   .891
    20   .411       50   .9704
    21   .444       80   .9999

The Birthday Problem
Thus we need only 23 people in the group to have a higher than 50% chance of there being a clash.

Simulating the Birthday Problem
- One way to simulate the birthday problem using R is as follows.
- Randomly generate birthdays for $k = 50$ people by sampling from the numbers 1 to 365 with replacement.
- Find the frequency of each day and record the maximum frequency $r$.
- Program R to do this $N = 10000$ times, and calculate the fraction of times that $r \ge 2$.
- You will probably find that this fraction is close to $P(50) = 0.9704$.

Combinations
- In permutations, order matters. In the 'words' example, 'pqr' is different from 'rpq'.
- In many situations, counting the number of unordered subsets of a specific size is required.
- Given a set of $n$ distinct objects, any unordered subset of size $k$ of the objects is called a combination.
- The number of combinations of size $k$ that can be formed from $n$ distinct objects is written $\binom{n}{k} = C_k^n = C_{n,k}$.
- For given $n$ and $k$, the number of combinations is smaller than the number of permutations (unless $k = 0$ or $k = 1$, in which case they are equal), because when order is disregarded, some of the permutations correspond to the same combination.

Combinations
- In the set $\{A, B, C, D, E\}$, the number of permutations of size 3 is $\frac{5!}{(5-3)!} = 60$.
- Consider a single subset $\{A, B, C\}$. There are $3! = 6$ permutations that correspond to this single combination.
- Similarly, for any other combination of size 3 there are 6 permutations, each obtained by ordering the three objects.
- Thus, we're overcounting by a factor of 6, and hence the number of combinations is $\frac{5!}{(5-3)!} \times \frac{1}{3!} = 10$, i.e.,
  {A, B, C} {A, B, D} {A, B, E} {A, C, D} {A, C, E}
  {A, D, E} {B, C, D} {B, C, E} {B, D, E} {C, D, E}

Combinations
- More generally, when there are $n$ distinct objects, the number of combinations of size $k$ is given by
  $\binom{n}{k} = \frac{P_k^n}{k!} = \frac{n!}{k!\,(n - k)!}$
- $\binom{n}{0} = \binom{n}{n} = 1$ because there is only one way to choose a set of all $n$ elements or of no elements.
- $\binom{n}{1} = n$ since there are $n$ subsets of size 1.
- In how many different ways can 6 tosses of a coin yield 2 heads and 4 tails? Answer: $\binom{6}{2} = 15$.
- How many committees of two actuarial science students and one data science student can be formed from four actuarial science students and three data science students? Answer: $\binom{4}{2} \times \binom{3}{1} = 6 \times 3 = 18$.

Combinations
- Eight politicians meet at a fund-raising dinner. How many greetings can be exchanged if each politician shakes hands with every other politician exactly once?
- Think of the politicians as being numbered from 1 to 8. A handshake corresponds to an unordered sample of size 2 chosen from those 8. So the answer is $\binom{8}{2} = 28$.
- If there are 20 points on a paper such that no three are collinear,
  1. how many line segments can one create with two of these points as endpoints?
  2. how many triangles can one form with three of these points as vertices?
- The answers are $\binom{20}{2} = 190$ and $\binom{20}{3} = 1140$ respectively.
- Note that $\binom{n}{k} = \binom{n}{n-k}$. For instance, $\binom{20}{18} = \binom{20}{2} = 190$.
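
The counting results in the permutation and combination slides, and the birthday problem calculation and simulation, can be checked in R with `choose()`, `factorial()` and `sample()`. The following is a minimal sketch rather than the unit's own code: the helper names `n_perm`, `p_birthday` and `sim_birthday` are hypothetical, and the seed is arbitrary.

```r
# Hypothetical helper: number of permutations of size k from n distinct
# objects, using the relation P(n, k) = k! * choose(n, k).
n_perm <- function(n, k) choose(n, k) * factorial(k)

# Counting results quoted in the slides.
markers    <- n_perm(10, 4)                                 # 5040
statistics <- factorial(10) /
  (factorial(3) * factorial(3) * factorial(2))              # 50400
committees <- choose(4, 2) * choose(3, 1)                   # 18
handshakes <- choose(8, 2)                                  # 28
segments   <- choose(20, 2)                                 # 190
triangles  <- choose(20, 3)                                 # 1140

# Exact birthday probability P(k) = 1 - P(365, k) / 365^k, written as a
# product of ratios to avoid very large intermediate numbers.
p_birthday <- function(k) 1 - prod((365 - seq_len(k) + 1) / 365)

# Simulation following the steps on the slide: draw k birthdays with
# replacement, record whether the maximum day frequency r is at least 2,
# and repeat N times.
sim_birthday <- function(k, N = 10000) {
  clash <- replicate(N, {
    days <- sample(1:365, k, replace = TRUE)
    max(tabulate(days, nbins = 365)) >= 2
  })
  mean(clash)
}

set.seed(1005)  # arbitrary seed, for reproducibility only
print(c(exact = p_birthday(50), simulated = sim_birthday(50)))
```

The simulated fraction varies from run to run but should land close to the exact value; with $N = 10000$ repetitions the standard error is roughly 0.002.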
