Theoretical Distributions in Probability and Statistics

Decision-making

In a large family where it is known that there is a genetic predisposition to diabetes, how many children out of a possible 7 are likely to be affected by diabetes?

A hospital administrator needs to decide how many people should staff the Accident and Emergency Department of the hospital from 9am to 12pm on weekdays. How should the administrator decide?

In 2004, the World Health Organisation (WHO) revised the body-mass index (BMI) definitions for overweight and obese individuals in Asian populations. Instead of a BMI range of 25 – 29 for defining overweight, and a BMI range of > 30 for defining obese (as are used in Caucasian populations), the corresponding ranges for Asian populations are 23 – 27.5 and > 27.5. How did the scientists at WHO decide on the new ranges?

Modeling the outcome variable with some appropriate theoretical framework

Data exploration and Statistical analysis
1. Data checking, identifying problems and characteristics
2. Understanding chance and uncertainty
3. How will the data for one attribute behave, in a theoretical framework?

Data
Data exploration, categorical / numerical outcomes
Model each outcome with a theoretical distribution

Random variable

Definition: A random variable is a theoretical consideration of the possible outcomes of an event.

Example: In a survey of 5 students, how many female students are there? The answer to this is a random variable. The possible outcomes are 0, 1, 2, 3, 4 or 5 female students. So the random variable describes what the answer could have been, prior to finding out the actual answer.

Suppose we know that out of 5 students, there are 4 girls. Then there is no longer any uncertainty or variability: the exact answer is known, and thus this is not a random variable anymore.

Discrete random variables

Probability mass function

Definition: The PMF describes the probability of each of the possible events for a random outcome.
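As a minimal sketch of the definition above, the PMF for the number of female students in the survey of 5 can be tabulated by enumeration. It assumes, purely for illustration, that each student is female or male with equal chance and independently of the others; that assumption and the helper name `pmf_female_count` are not part of the original material.

```python
from fractions import Fraction
from itertools import product


def pmf_female_count(n_students=5):
    """Return {k: P(X = k)} for X = number of female students.

    Assumption (hypothetical, for illustration only): each student is
    female or male with equal chance, independently of the others, so
    all 2**n_students sex patterns are equally likely.
    """
    counts = {k: 0 for k in range(n_students + 1)}
    # Enumerate every possible pattern, e.g. ('F', 'M', 'F', 'F', 'M').
    for pattern in product("FM", repeat=n_students):
        counts[pattern.count("F")] += 1
    total = 2 ** n_students
    return {k: Fraction(c, total) for k, c in counts.items()}


if __name__ == "__main__":
    pmf = pmf_female_count()
    for k, p in pmf.items():
        print(f"P(X = {k}) = {p}")
    # The probabilities over all possible outcomes add up to 1.
    print("total =", sum(pmf.values()))
```

Exact fractions are used instead of floating-point numbers so that the check that the probabilities sum to 1 is exact.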
Properties of a probability function:

Example 1: Let X denote the number of heads obtained when an unbiased coin is tossed 3 times. Find the probability distribution of X. Find also P(|X – 2| 1.2).

Cumulative distribution function

Definition: The CDF describes the cumulative probability that the outcome is at most a given value, and is formally defined as F(x) = P(X ≤ x) for any real x.

Properties of a CDF:

Example 2:

Uniform Distribution

Definition: A random variable is said to follow a Uniform distribution if all of the possible outcomes are equally likely. Mathematically: P(X = x) = constant. So if there are n possible outcomes, the chance of each of the outcomes is 1 / n.

Example 3: In a game of chance, a gambler chooses an integer between 13 and 18 inclusive (including 13 and 18). There are equal chances for any number in the set {13, 14, 15, 16, 17, 18} to be drawn. Let X be the random variable denoting the number drawn. Find the probability distribution of X and also P(X < 16).

Bernoulli Distribution

A random experiment with two possible outcomes, conveniently defined as “success” or “failure”, is called a Bernoulli trial after Jacob Bernoulli (1654 – 1705). The choice of the event as “success” or “failure” is completely arbitrary.

Example: a toss of a coin will show either a head or a tail. The “success” event can be either the head, or the tail. Conventionally, p denotes the probability of success and 1 – p denotes the probability of failure.

Binomial Distribution

The number of “success” events out of n repeated trials follows a Binomial distribution, provided that each trial results in one of 2 mutually exclusive outcomes, the probability of success is the same in every trial, and the repeated trials are mutually independent.

Example 4: A batch of pregnancy test kits contains 50 kits of which 10% are known to be defective.
If 3 test kits are randomly chosen with replacement from the batch, what is the probability that:
(i) all will be defective;
(ii) none will be defective;
(iii) at least one will be defective;
(iv) exactly one will be defective;
(v) exactly two will be defective;
(vi) not more than two will be defective.

Multinomial Distribution

The Binomial distribution has been used to obtain probabilities for the number of times an event of interest (out of 2 possible events) occurs when the same experiment is repeated several times. Sometimes one is interested in counting the number of occurrences of several events simultaneously. In such a situation the multinomial distribution is useful.

Assuming there are k possible outcomes, and E1, E2, …, Ek denote the corresponding number of occurrences of each of the possible outcomes out of a total of n events, then

$$P(E_1 = x_1, \ldots, E_k = x_k) = \frac{n!}{x_1! \, x_2! \cdots x_k!} \, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}, \qquad x_1 + x_2 + \cdots + x_k = n,$$

where pi denotes the probability of the i-th outcome in a single event.

Example 5: When snapdragons with pink flowers are crossed, a randomly chosen offspring has either red (with prob. 0.25), pink (with prob. 0.50) or white (with prob. 0.25) flowers. What is the probability that among 10 randomly chosen seeds, 3 will develop white flowers, 2 red ones and 5 pink flowers?

Poisson Distribution

The Poisson distribution is usually used to calculate the probabilities of the number of occurrences of a rare event. Often these cases are such that an event can occur repeatedly over a long period of time or over a large area; the distribution applies to the number of occurrences in a small interval of time or over a small area.

Example: machine breakdowns, arrivals of calls at a telephone exchange, faults developing in a pipeline, random arrival of customers at a service station, accident occurrences, radioactive decay, gene mutations at a particular locus.

Assumptions of a Poisson Distribution
• The outcomes occur randomly.
• The number of outcomes occurring in one time interval or specified region is independent of the number that occur in any other disjoint time interval or region.
• The probability that a single outcome will occur during a very short time interval or in a small region is very small and is constant.
• The probability of 2 or more outcomes occurring in such a short time interval or falling in such a small region is negligible.

Properties of a Poisson Distribution
(A) If X ~ Binomial(n, p), then X → Poisson(np) as n → ∞, p → 0, with np constant. That is, the Poisson distribution arises as the limiting case of the Binomial distribution.
(B) Suppose that X1 and X2 are independent random variables with X1 ~ Poisson(λ1) and X2 ~ Poisson(λ2), then Y = X1 + X2 ~ Poisson(λ1 + λ2). That is, the sum of two independent Poisson random variables also has a Poisson distribution.

Example 6: The number of emergency admissions each day to a hospital is found to have a Poisson distribution with mean 2.
a) Evaluate the probability that on a particular day there will be no emergency admissions.
b) At the beginning of one day, the hospital has 5 beds available for emergencies. Calculate the probability that this will be an insufficient number for the day.
c) Calculate the probability that there will be exactly 3 admissions altogether on two consecutive days.

Example 7: Oranges are packed in crates each containing 250. On average, 0.6% are found to be bad when the crates are opened. What is the probability that there will be more than 2 bad oranges in a crate?

Recap – Numerical EDA
• Calculating informative numbers which summarise the dataset
• What are the numbers useful for describing the age of 1,059 individuals with diabetes?
• Location parameters (mean, median, mode)
• Spread (range, standard deviation, interquartile range)
• Skewness

[Figure: distribution of AGE, axis from 20 to 80; mean age (54.6 years)]

Properties of means and variances in theoretical distributions play important roles in determining variations in the definitions of the outcomes

Mean (Expectation) of a discrete random variable

The expectation of a discrete outcome X, commonly known as the mean of X or the expected value of X, is denoted as E(X) and defined as

$$E(X) = \sum_x x \, P(X = x)$$

The value of E(X) refers to the average value of x that one can expect after sampling a large number of values from the distribution of X. E(X) is the long run average of observations of the variable X.

The expectation of any function g(.) which depends on the random variable X, g(X), is defined as follows

$$E[g(X)] = \sum_x g(x) \, P(X = x)$$

Variance of a discrete random variable

The variance of X, or the population variance of X, is denoted by Var(X) and is defined as

$$\mathrm{Var}(X) = E\left[(X - E(X))^2\right]$$

Var(X) is usually denoted by σ², and σ is defined to be the standard deviation of X.

Functions of means and variances

Example 8: Find the expected score of a single roll of a fair die.

Continuous random variables

Definition: A continuous random variable X takes any value in a given range, and theoretically can be measured to any desired degree of accuracy (e.g. height, weight, age, etc.).

When the total number of possible outcomes is very large, the histogram will approximate a smooth curve called a frequency curve or a probability density curve. The function represented by this curve is called the frequency function, or more commonly the probability density function, denoted by f.
As the function f denotes a probability function, f(x) ≥ 0 for all x and the total area under the curve is 1.

Some notes on continuous random variables

Properties of continuous random variables

The cumulative distribution function (cdf) of a continuous random variable is denoted FX(x) = P(X ≤ x) for any real x

Uniform Distribution

Definition: A random variable is said to follow a Uniform distribution in the interval [a, b] if the probability density function is a constant in the interval.

Normal Distribution

[Figure: Normal curve of exam marks for Mathematics exam, axis from 40 to 80; 68% of the probability lies within 1 standard deviation, 95% of the probability within 2 SDs]

Also known as the Gaussian distribution. A useful distribution to model outcomes in the natural world.

Properties of the Normal distribution
- Special case: If μ = 0, σ² = 1, then X has a Standard Normal distribution. Usually, the probability density function of the standard normal is written φ(x), and the cdf is written Φ(x).
- If X ~ N(0, 1), and Y = aX + b, then Y ~ N(b, a²). Conversely, if X ~ N(μ, σ²), and Y = (X – μ) / σ, then Y ~ N(0, 1).
- If X1 ~ N(μ1, σ1²) and X2 ~ N(μ2, σ2²), and X1 and X2 are mutually independent, then Y = X1 + X2 ~ N(μ1 + μ2, σ1² + σ2²).
- The plot of the density function f is bell-shaped and symmetrical about the line x = μ with a single peak. So the mean, mode and median of the normal distribution coincide.
- Practically all of the population (about 99.7%) lies in the interval μ ± 3σ, about 95% of the population lies in the interval μ ± 2σ and about 68% of the population lies in the interval μ ± σ.
- Suppose X ~ Binomial(n, p). For large n and p not too close to 0 or 1, the normal distribution can be used as an approximation and X ≈ N(np, np(1 – p)).
- Suppose X ~ Poisson(λ). For large λ, the normal distribution can also be used as an approximation and X ≈ N(λ, λ).
- When the Normal distribution is used to approximate a discrete distribution, continuity correction must be used.
This is because the discrete probability P(X = x) is equivalent to the continuous probability P(x – 0.5 ≤ X < x + 0.5).
- For example, suppose X is discrete and the normal approximation is used. Suppose also that the question asks for P(X < 35). This is equivalent to finding the continuous probability P(X < 34.5), since the discrete value x = 35 is not included in the range X < 35, and so the continuous random variable cannot be bigger than 34.5 (since 34.5 ≤ x < 34.9999… would still round up to give 35 in the discrete random variable).

Calculating probabilities for N(0,1)
- http://www.stat.psu.edu/~babu/418/norm-tables.pdf
- Cumulative Standard Normal table

P(Z < 0.45) = 0.67364
P(Z > 1.12) = 1 – P(Z < 1.12) = 1 – 0.8686 = 0.1314
P(Z < -0.45) = 1 – P(Z > -0.45) = 1 – P(Z < 0.45) = 1 – 0.67364 = 0.32636

RExcel and Normal distribution

Example 9: Suppose X ~ N(0, 1), and x takes values from the set X. Find the following probabilities, by using RExcel.
a) P(X < x) for x = 0.65
b) P(X x) for x = 0.123
c) P(X > x) for x = 2.78
d) P(X > x) for x = 0

Example 10: X and Y are independent random variables which are both normally distributed, with X ~ N(100, 25) and Y ~ N(120, 20). Calculate the following probabilities:
(a) P(X > 92)
(b) P(Y > X)
(c) P(2X + Y < 300)
(d) P(|X – Y| < 10)

Exponential Distribution

Recall that, under certain assumptions, the number of occurrences of rare events follows a Poisson distribution. Sometimes, the interest may be in the time until the event is observed.

Let Yt denote the number of occurrences of rare events in t time units. Suppose the mean number of events is λ per time unit. Then Yt follows a Poisson distribution with mean λt. Let X denote the time, measured from an arbitrary moment to the first event.
Then

$$P(X > x) = P(\text{no events in an interval of } x \text{ time units}) = P(Y_x = 0) = e^{-\lambda x}$$

Therefore

$$F_X(x) = P(X \le x) = 1 - P(X > x) = 1 - e^{-\lambda x}, \qquad f(x) = \lambda e^{-\lambda x}$$

This is called the exponential distribution or the waiting time distribution. The waiting time until an event occurs in a Poisson process follows the exponential distribution.

Lack of memory property

This is rather relevant to some of you! The waiting time for a bus follows an Exponential distribution (prove this!), and this property of an Exponential distribution is rather depressing. It says that the chance that you have to wait another 5 minutes for the bus is exactly the same even if you have already waited for 20 minutes and still have not seen it arrive!

Example 11: Assume that the average number of radioactive particles emitted by a radioactive substance is 1.5 per second. What is the chance that we have to wait more than three seconds for the first emission to occur?

Example 12: Assume that the average time between two subsequent visits of insects to a certain flower is 12 minutes. You are starting to observe the flower. What is the chance that you will have to wait for no more than 15 minutes for the first insect to arrive? What is the chance that the time between the first and second arriving insect is less than 15 minutes? What is the chance that fewer than 3 insects will visit the flower, given that you observe the flower for one hour?

Entropy

Often in medical research, we are interested in predicting the outcome given some probability statements. Suppose there are four possible outcomes after chemotherapy treatment: (complete remission, partial remission, no change, early death).

If the probabilities of the four outcomes estimated from current data are (0.90, 0.08, 0.02, 0.00), you will feel confident about the treatment, since current data intuitively provide a lot of information and this information seems to suggest a high likelihood of positive outcomes.
Similarly, if the probabilities are (0.01, 0.01, 0.08, 0.90), you will also feel confident that you should avoid undergoing the treatment, because again, current data provide a lot of information to suggest negative outcomes.

However, if the probabilities are (0.25, 0.25, 0.25, 0.25), you will not gain any additional information from previous data; that is, previous data are perfectly uninformative.

Entropy is a statistical measure that quantifies the amount of information available for prediction, and is calculated using all the probabilities of the possible outcomes (i.e. from the probability function).

Statistical definition

The entropy of a random variable X with probability function p(x) is defined to be the quantity

$$H(X) = -\sum_x p(x) \log p(x)$$

It can be shown that for a random variable with n possible values, the entropy is always bounded between 0 and log(n), where:
- 0 corresponds to the situation with perfect information
- log(n) corresponds to the situation with no information.

Relative mutual information

It is increasingly common to define the relative mutual information (RMI) as RMI(X) = 1 – [H(X)/log(n)] to yield a more intuitive information criterion that is bounded between 0 and 1, where:
- 0 corresponds to the situation with no information
- 1 corresponds to the situation with perfect information.

Example 13: Let X denote the outcome when flipping a fair coin and Y the outcome when rolling a fair die. Furthermore, let Z be one if two fair dice show a double six, and zero otherwise. Notice that if you want to predict the outcome of these random variables, you have the best chance of predicting Z correctly. Y is hardest to predict. Calculate the entropies and the relative mutual information of these three random variables.

Something fun – practical application of what we have learnt so far!

It is very common for students to go through the material on probability and theoretical distributions wondering what the relevance of all this is in real life!
Let’s look at something fun, which most of you will hopefully have some experience with:

Monopoly
- 40 possible grids
- Each player moves his avatar around the game board by rolling two dice
- Community Chest / Chance
- Acquire properties across the game board
- Develop properties of the same colour combination into houses and hotels
- Aim to bankrupt other players and be the richest (sounds familiar?)
- Potential of going to jail if landing on “Go to jail”, if you roll doubles 3 times in a row, or if Chance / Community Chest sends you there.

Monopoly
- Every grid equally likely? (or 2.5% chance?)
- What are the properties that are most likely to be landed on?
- Computer simulation of Monopoly, with all the rules and regulations
- It turns out that the Jail spot has the highest occupancy rate (5.88%)
- That inevitably results in the orange properties being the most frequented (8.47%)

Simple probability theory and knowledge of dice outcomes can provide a marginal edge in games!

Possible outcomes from a roll of two dice:
Prob(X = 2) = 1 in 36
Prob(X = 3) = 2 in 36
Prob(X = 4) = 3 in 36
Prob(X = 5) = 4 in 36
Prob(X = 6) = 5 in 36
Prob(X = 7) = 6 in 36
Prob(X = 8) = 5 in 36
Prob(X = 9) = 4 in 36
Prob(X = 10) = 3 in 36
Prob(X = 11) = 2 in 36
Prob(X = 12) = 1 in 36

[Figure: Monopoly board annotated with occupancy rates, ranging from 2.15% to 8.47%]

Waiting time?

We can model the waiting time for someone to land on a particular grid with an Exponential distribution. For example, let’s suppose we are interested in the most expensive property on the board.
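To make the dice probabilities and the waiting-time idea above concrete, here is a minimal sketch. It rebuilds the two-dice table by enumerating the 36 equally likely outcomes, then applies the survival formula P(X > x) = e^(-λx) of the Exponential distribution. The rate of 0.025 landings per move is a hypothetical value, taken from the "every grid equally likely" baseline of 2.5% and not from the simulation; the function names are also illustrative only.

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import product


def two_dice_pmf():
    """PMF of the sum of two fair six-sided dice, by enumerating 36 outcomes."""
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {total: Fraction(c, 36) for total, c in sorted(counts.items())}


def prob_wait_longer_than(rate, t):
    """P(X > t) = exp(-rate * t) for an Exponential waiting time X."""
    return math.exp(-rate * t)


if __name__ == "__main__":
    for total, p in two_dice_pmf().items():
        # Fractions are stored in lowest terms; rescale to "k in 36".
        k = p.numerator * (36 // p.denominator)
        print(f"Prob(X = {total}) = {k} in 36")

    # Hypothetical rate: 0.025 landings per move (the 2.5% baseline),
    # an assumption for illustration and not a simulated occupancy rate.
    rate = 0.025
    print("mean waiting time in moves:", 1 / rate)
    print("P(wait > 40 moves):", round(prob_wait_longer_than(rate, 40), 4))
```

Under this model the mean waiting time is 1/λ, so a grid that is landed on more often than the baseline would have a shorter expected wait.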
[Figure: Monopoly board annotated with per-grid values for the waiting-time example, ranging from 32.7 to 47.9]

Students should be able to
• know the definitions of the various terminologies and distributions
• know how to calculate the probability mass/density function for the theoretical distributions, and in empirical situations
• calculate the probability of specific outcomes, when assuming a theoretical distribution for these outcomes
• understand the interpretation of entropy and know how to calculate the entropy
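For the last objective, a minimal sketch of the entropy and RMI calculations, applied to the three variables of Example 13. The natural logarithm is an assumption here, since the material only writes "log"; the RMI does not depend on the base, but the entropy values do. The function names are illustrative.

```python
import math


def entropy(probs):
    """H(X) = -sum of p(x) * log p(x); outcomes with p = 0 contribute 0.

    Assumption: natural logarithm (the base is not specified in the notes).
    """
    return -sum(p * math.log(p) for p in probs if p > 0)


def relative_mutual_information(probs):
    """RMI(X) = 1 - H(X) / log(n), where n is the number of possible values."""
    n = len(probs)
    return 1 - entropy(probs) / math.log(n)


if __name__ == "__main__":
    variables = {
        "X (fair coin)": [1 / 2, 1 / 2],
        "Y (fair die)": [1 / 6] * 6,
        "Z (double six or not)": [1 / 36, 35 / 36],
    }
    for name, probs in variables.items():
        h = entropy(probs)
        rmi = relative_mutual_information(probs)
        print(f"{name}: H = {h:.4f}, RMI = {rmi:.4f}")
```

The uniform variables X and Y reach the upper bound log(n), so their RMI is 0, while Z, which is almost always zero, has an RMI much closer to 1.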