2023 Lecture 5
Probability Distributions and Statistical Inference
Dr. Sangwook HA
Department of Management (EBIS programme)
Faculty of Business and Management

WEEK FIVE CONTENTS
01 Session 1: Probability Distributions and Data Modeling
02 Session 2: Statistical Inference (Inferential Statistics) I

01 Probability Distributions and Data Modeling
1-1 Probability Distributions
1-2 Data Modeling

Probability Distributions and Data Modeling
• The data visualizations and descriptive statistics we have learned so far mostly provide information about:
  ▪ Data at hand (the sample)
  ▪ Past and/or present events
• However, knowing the past and present may not be enough to make decisions for the future, mainly because of uncertainty and randomness in business
  ▪ e.g., product delivery can be delayed because of bad weather
• The concepts of probability (distributions) and data modeling help managers make (future) decisions in the presence of uncertainty and randomness, and serve as a basis for predicting future events (i.e., predictive analytics)

Basic Concepts of Probability
● A (probability) experiment is a process (action, trial) that results in an outcome. In a random experiment, the outcome cannot be predicted with certainty
  Experiment: roll two dice
● The outcome of an experiment is a result that we observe
  Outcome: 2 (= 1+1)
● The sample space is the collection of all possible outcomes of an experiment
● An event is a collection of one or more outcomes from a sample space
  Event: {even outcomes} = {2, 4, 6, 8, 10, 12}

Probability Definitions
• Probability is the likelihood that an outcome occurs.
Probabilities are expressed as values between 0 and 1
• Probabilities may be defined from one of three perspectives:
  ▪ Classical definition: probabilities can be deduced from theoretical arguments
  ▪ Relative frequency definition: probabilities are based on empirical data
  ▪ Subjective definition: probabilities are based on judgment and experience

Classical Definition of Probability
• P(X) = (number of outcomes in which event X occurs) / (total number of possible outcomes)
• Suppose we roll 2 dice: probability that the dice sum to 3 = 2/36 ≈ 0.0556

Relative Frequency Definition of Probability
• Probability that a computer is repaired in 10 days = 0.076
  ▪ This probability can change if additional observations are collected

Basic Concepts of Probability
● Rule 1: The probability of any event is the sum of the probabilities of the outcomes that comprise that event
● Rule 2: The probability of the complement of an event A is P(Ā) = 1 - P(A)
  - For an event A, P(Ā) + P(A) = 1; therefore, P(Ā) = 1 - P(A)
● Rule 3: If events A and B are mutually exclusive, then P(A or B) = P(A) + P(B)
● Rule 4: If two events A and B are not mutually exclusive, then P(A or B) = P(A) + P(B) - P(A and B)

Conditional Probability
● Conditional probability is the probability of occurrence of one event A, given that another event B is known to be true or has already occurred.
● P(A|B) = P(A and B)/P(B) is the conditional probability of A given B
● Data shows the first and second purchases for a sample of 200 customers.
● Probability of purchasing an iPad given that an iMac has already been purchased = 2/13
  [Table: first purchase vs. second purchase]

Random Variables and Probability Distribution
● A random variable is a numerical description of the outcome of an experiment, i.e., data (a set of metrics) generated by an experiment
  ○ A discrete random variable is one for which the number of possible outcomes can be counted
    ■ e.g., outcomes of dice rolls, whether a customer likes or dislikes a product, number of hits on a website link today
  ○ A continuous random variable has outcomes over one or more continuous intervals of real numbers
    ■ e.g., weekly change in the Dow Jones Industrial Average, daily temperature, time between machine failures
● Probability distribution: a characterization of the possible values that a random variable may assume, along with the probability of assuming these values
  ▪ A theoretical model of the random variable (contains the probabilities of all possible outcomes)
  ▪ Can be constructed by using observed data (= an empirical probability distribution)
  ✓ X axis = all possible outcomes (values) from an experiment
  ✓ Y axis = probability of observing each outcome (value) from the experiment
  ✓ The total area under the function is always 1 (= 100%)

Discrete Probability Distribution
• The probability distribution of discrete outcomes is called a Probability Mass Function (PMF)
• It is a mathematical function f(x) specifying the probability of each value of the random variable X
• x_i represents the i-th value of X, and f(x_i) is its probability
• Properties: 0 ≤ f(x_i) ≤ 1 for every i, and Σ f(x_i) = 1

Discrete Probability Distributions
Example: probability mass function for the sum of two independent dice rolls
f(2) = 1/36
f(3) = 2/36
f(4) = 3/36
f(5) = 4/36
f(6) = 5/36
⋮
f(12) = 1/36

Cumulative Distribution Function
• A cumulative distribution function, F(x), specifies the probability that the discrete (or continuous) random variable X assumes a value less than or equal to a specified value, x; that
is, F(x) = P(X ≤ x)

Example: Using the Cumulative Distribution Function
● Probability of rolling between 4 and 8 (sum of two dice):
  P(4 ≤ X ≤ 8) = P(3 < X ≤ 8) = F(8) - F(3) = 26/36 - 3/36 = 23/36

Expected Value of a Discrete Random Variable
• The expected value of a random variable corresponds to the notion of the mean, or average, for a sample.
• For a discrete random variable X, the expected value, denoted E[X], is the weighted average of all possible outcomes, where the weights are the probabilities: E[X] = Σ x_i f(x_i)

Computing the Expected Value
• Rolling two dice: E[X] = 2(1/36) + 3(2/36) + ... + 12(1/36) = 7

Application: Airline Revenue Management
• Full and discount airfares are available for a flight.
• A full-fare ticket costs $560.
• A discount ticket costs $400.
• X = ticket price paid
• p = 0.75 (the probability of selling a full-fare ticket)
• E[X] = 0.75($560) + 0.25($0) = $420
• The airline should not discount full-fare tickets because the expected value of a full-fare ticket ($420) is greater than the price of a discount ticket ($400).
• Break-even point: $400 = p($560), so p ≈ 0.714

Expected Value and Decision Making
• The expected value is a “long-run average” and is appropriate for decisions that occur on a repeated basis.
• For one-time decisions, however, you need to consider the downside risk and the upside potential of the decision.

Expected Value of a Charitable Raffle
• The cost of a raffle ticket is $50.
• 1,000 raffle tickets are sold.
• The winning prize is $25,000.
• E[X] = -50(0.999) + 24,950(0.001) = -25
• If you played this game repeatedly over the long run, you would lose an average of $25.00 each time you play.
• However, for any one game, you would either lose $50 or win $24,950.
  - Is the risk of losing $50 worth the potential of winning $24,950?
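The two-dice and raffle calculations above can be checked numerically. The following is a minimal R sketch, not part of the original slides; the variable names (rolls, pmf, cdf, raffle_ev, and so on) are illustrative assumptions.

```r
# Enumerate all 36 equally likely outcomes of rolling two fair dice.
rolls <- expand.grid(die1 = 1:6, die2 = 1:6)
dice_sum <- rolls$die1 + rolls$die2

# Probability mass function f(x) for the sums x = 2, ..., 12.
x <- 2:12
pmf <- as.numeric(table(dice_sum)) / length(dice_sum)

# Cumulative distribution function F(x) = P(X <= x).
cdf <- cumsum(pmf)

# P(4 <= X <= 8) = F(8) - F(3) = 23/36
p_4_to_8 <- cdf[x == 8] - cdf[x == 3]

# Expected value: probability-weighted average of the outcomes (7 for two dice).
expected_sum <- sum(x * pmf)

# Raffle: lose $50 with probability 0.999, net $24,950 with probability 0.001.
raffle_payoff <- c(-50, 24950)
raffle_prob <- c(0.999, 0.001)
raffle_ev <- sum(raffle_payoff * raffle_prob)  # -25

print(data.frame(x = x, pmf = pmf, cdf = cdf))
print(c(p_4_to_8 = p_4_to_8, expected_sum = expected_sum, raffle_ev = raffle_ev))
```

Enumerating the sample space with expand.grid() keeps the sketch close to the classical definition of probability; the same PMF could also be typed in by hand.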
Variance of a Discrete Random Variable
• The variance, Var[X], of a discrete random variable X is a weighted average of the squared deviations from the expected value: Var[X] = Σ (x_i - E[X])² f(x_i)

Computing the Variance of a Random Variable
• Rolling two dice: Var[X] = 35/6 ≈ 5.83

Continuous Probability Distributions
• A continuous random variable is defined over one or more intervals of real numbers, so it has an infinite number of possible outcomes
  ▪ [Figure: change in DJIA using 5% increments]
  ▪ [Figure: change in DJIA using 2.5% increments]
• The probability distribution approaches a smooth curve as the interval width for outcomes decreases

Continuous Probability Distributions
• A probability density function is a curve, described by a mathematical function, that characterizes a continuous random variable (e.g., a stock market index)
Properties of a probability density function
● f(x) ≥ 0 for all values of x: the graph of the density function must lie at or above the x-axis
● The total area under the density function equals 1.
● P(X = x) = 0: we cannot define the probability of a specific value for a continuous random variable (there are infinitely many values!)
● Probabilities are only defined over an interval.
● P(a ≤ X ≤ b) is the area under the density function between a and b:
  P(a ≤ X ≤ b) = P(X ≤ b) - P(X ≤ a) = F(b) - F(a)

Distributions and Business Decisions
Why? Working knowledge of common families of probability distributions:
1. helps you understand the underlying process that generates sample data
2. is useful in building decision models with a theoretical distribution of the data
3.
helps to compute probabilities of occurrence of outcomes to assess risks and make decisions

Commonly Used Distributions
● Discrete Variables
  ○ Bernoulli Distribution
  ○ Binomial Distribution
  ○ Poisson Distribution
  ○ …
● Continuous Variables
  ○ Uniform Distribution
  ○ Normal Distribution (and Standard Normal Distribution)
  ○ (Student’s) t-distribution
  ○ Exponential Distribution

Uniform Distribution
• The uniform distribution characterizes a continuous random variable for which all outcomes between a minimum (a) and a maximum (b) are equally likely.
• Density function: f(x) = 1/(b - a) for a ≤ x ≤ b
• Cumulative distribution function: F(x) = (x - a)/(b - a) for a ≤ x ≤ b
• Expected value: E[X] = (a + b)/2

Computing Uniform Probabilities
• Sales revenue for a product varies uniformly each week between $1,000 and $2,000.
• Probability that sales revenue will be less than x = $1,300:
  - F(1300) = (1300 - 1000)/(2000 - 1000) = 0.3
• Probability that revenue will be between $1,500 and $1,700:
  - F(1700) - F(1500) = 0.7 - 0.5 = 0.2

Discrete Uniform Distribution
• A variation of the uniform distribution is one for which the random variable is restricted to integer values between a and b (also integers); this is called a discrete uniform distribution.
  - Example: roll of a single die. Each of the numbers 1 through 6 has a probability of 1/6 of occurring.

Normal Distribution
● f(x) is a bell-shaped curve
● Characterized by 2 parameters
  ○ 𝝁 (mean)
  ○ 𝝈 (standard deviation)
● Properties
  1. Symmetric
  2. Mean = Median = Mode
  3. Range of X is unbounded (from -∞ to +∞)
  4.
Empirical rule applies: the area under the density function within ±2 standard deviations of the mean is 95.4%, and that within ±3 standard deviations is 99.7%

Normal Distribution Example
The distribution of customer demand (units per month) is normal with mean = 750 and standard deviation = 100.
Find the probability that demand will:
a) be at most 900 units/month
b) exceed 700 units/month
c) be between 700 and 900 units/month
[Figures: shaded areas under the normal density for (a) demand at most 900, (b) demand above 700, (c) demand between 700 and 900]

Standard Normal Distribution
● A standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1
  ▪ A standard normal random variable is denoted by Z (z-scores)
  ▪ The scale along the z-axis represents the number of standard deviations from the mean of zero

Using Standard Normal Distribution Tables
● Table 1 of Appendix A
● We can compute probabilities for any normal random variable X having a mean 𝝁 and standard deviation 𝝈 by converting it to a standard normal random variable Z: z = (x - 𝝁)/𝝈
● In other words, any normal distribution can be converted into the standard normal distribution, and the standard normal table can then be used to calculate its probabilities

Computing Probabilities with Standard Normal Tables
● In the earlier example, what is the probability that demand will be at most 900 units/month?
● z = (900 - 750)/100 = 1.50
● Using the table, we find: P(X < 900) = P(Z < 1.50) = 0.93319

Question (b)
● Probability that demand will exceed 700 units, or P(X > 700)
  = 1 - pnorm(700, 750, 100) = 1 - 0.3085 = 0.6915

Question (c)
● Probability that demand will be between 700 and 900, or P(700 < X < 900)
  = pnorm(900, 750, 100) - pnorm(700, 750, 100) = 0.9332 - 0.3085 = 0.6247

Probability Functions in R
In R, probability functions take the form [dpqr]distribution_abbreviation(), where the first letter refers to the aspect of the distribution returned:
• d = density
• p = cumulative distribution function
• q = quantile function
• r = random generation (random deviates)
Abbreviations:
• multinom (multinomial distribution)
• binom (binomial distribution)
• nbinom (negative binomial distribution)
• norm (normal distribution)
• exp (exponential distribution)
• pois (Poisson distribution)
• unif (uniform distribution)
Example: rnorm() generates values drawn from a normal distribution with a specified mean and standard deviation

Probability Functions in R
By default, R’s functions for the normal distribution use mean = 0 and sd = 1, i.e., they return values for the standard normal distribution

Examples of Using R to Plot Probability Distributions
• Plot the standard normal curve on the interval [-3, 3]
• What is the area under the standard normal curve to the left of z = 1.96?
• What is the value of the 90th percentile of a normal distribution with a mean of 500 and a standard deviation of 100?
• Generate 50 random normal deviates with a mean of 50 and a standard deviation of 10.
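One possible way to answer the four questions above, together with the customer demand example (mean 750, standard deviation 100), is sketched below in base R. The variable names and the random seed are assumptions added for illustration; they are not taken from the slides.

```r
# 1. Plot the standard normal density on [-3, 3].
curve(dnorm(x), from = -3, to = 3,
      xlab = "z", ylab = "Density", main = "Standard normal density")

# 2. Area under the standard normal curve to the left of z = 1.96 (about 0.975).
area_left <- pnorm(1.96)

# 3. 90th percentile of a normal distribution with mean 500 and sd 100.
p90 <- qnorm(0.90, mean = 500, sd = 100)

# 4. 50 random normal deviates with mean 50 and sd 10.
set.seed(123)  # assumption: fixed seed so the draw is reproducible
deviates <- rnorm(50, mean = 50, sd = 10)

# Customer demand example: X ~ Normal(mean = 750, sd = 100).
p_at_most_900 <- pnorm(900, mean = 750, sd = 100)       # question (a)
p_above_700   <- 1 - pnorm(700, mean = 750, sd = 100)   # question (b)
p_700_to_900  <- pnorm(900, 750, 100) - pnorm(700, 750, 100)  # question (c)

print(c(area_left = area_left, p90 = p90))
print(round(c(a = p_at_most_900, b = p_above_700, c = p_700_to_900), 4))
```

pnorm(700, 750, 100, lower.tail = FALSE) gives the same answer as question (b) without the subtraction.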
Data Modeling and Distribution Fitting
● Using sample data alone may limit our ability to predict uncertain events, because potential values outside the range of the sample data are not included
● A better approach is to identify the underlying probability distribution from which the sample data come by “fitting” a theoretical distribution to the data and verifying the goodness of fit statistically, i.e., testing whether the sample data have the shape of a specific probability distribution
  ▪ Examine a histogram for clues about the distribution’s shape
  ▪ Look at summary statistics such as the mean, median, standard deviation, coefficient of variation, and skewness

Analyzing Airline Passenger Data
● Sample data on passenger demand for 25 flights
● The histogram shows a relatively symmetric distribution.
● The mean, median, and mode are all similar, although there is moderate skewness. A normal distribution seems reasonable.

Goodness of Fit
● A better approach than simply examining a histogram and summary statistics visually is to fit the data analytically to the best type of probability distribution.
● Statistical measures of goodness of fit:
  ▪ Chi-square (needs at least 50 data points)
  ▪ Kolmogorov-Smirnov (works well for small samples)
  ▪ Anderson-Darling (puts more weight on the differences between the tails of the distributions)
  ▪ Shapiro-Wilk normality test (tests data against a normal distribution)
● [R output: Kolmogorov-Smirnov test and Shapiro-Wilk test]
● Graphical methods:
  ▪ Probability density plot
  ▪ Quantile-Quantile (QQ) plot

02 Statistical Inference I
2-1 Part 1: Sampling and Estimation

Sampling and Estimation
Average monthly spending of Chinese university students = ?
(𝝁)
● Sampling refers to the process of collecting a subset of observations from its (intended / assumed) population
● Sample data allow us to infer the characteristics of the population, which are usually not known (= estimation of unknown population parameters)
● Estimators are measures used to estimate unknown population parameters
● A point estimate is a single number derived from a sample that is used to estimate the value of a population parameter
● Examples:
  ○ Mean: x̄ is a point estimate of 𝝁
  ○ Standard deviation: s is a point estimate of 𝝈
Average monthly spending of 100 UIC students = RMB 1000 (x̄)

Sampling and Estimation
● (Classical) statistics aims to infer population-level values / probabilities
● To do this, we need to assume the characteristics of the population in terms of probability, i.e., a probability distribution (e.g., normal distribution)
● We already know that if we can assume the probability distribution of a variable, we can calculate the population-level value for a given sample value
● In practice, performing such a calculation is often challenging because:
  1) There are errors in the sampling process
  2) Population information for sample data is usually unknown (e.g., mean, S.D., …)
[Figure: distribution of values of the variable diesum in the data (data distribution, n = 36)]
[Figure: probability function of diesum in the population, showing all possible values and the probability of observing each (probability distribution, n = ∞); the area under the function is the probability]

Sampling Error
● Different samples from the same population can have different characteristics because:
1. Sampling (statistical) error occurs because samples are only a subset of the total population
  ▪ Sampling error depends on the size of the sample relative to the population; this type of error cannot be totally avoided
2.
Non-sampling error occurs when the sample does not adequately represent the target population
  ▪ Non-sampling error usually results from a poor sample design or choosing the wrong population frame (e.g., convenience sampling)
Both errors can influence (point) estimates

A Sampling Experiment
● A population is uniformly distributed between 0 and 10.
  ▪ Mean = (0+10)/2 = 5
  ▪ Variance = (10-0)²/12 = 8.333
● Experiment:
  1. Generate 25 samples of size 10 from this population (10 rows * 25 columns = 250 obs)
  2. For each sample, compute its mean, sd, mean ± 3sd, and its range
  3. Prepare a histogram of the 250 observations (= all samples)
  4. Prepare a histogram of the 25 sample means (the means of the 25 columns)
  5. Repeat for larger sample sizes (size 25, 100, 500) and draw comparative conclusions

Experiment Results
Note that the average of all the sample means is quite close to the true population mean of 5.0.

Experiment Results
● Repeat the sampling experiment for samples of size 25, 100, and 500.
  • As the sample size increases, the averages of the sample means all remain close to the expected value of 5;
  • however, the standard deviation of the sample means becomes smaller,
  • meaning that the sample means are clustered closer together around the true expected value. Their distribution also becomes approximately normal.

Estimating Sampling Error Using the Empirical Rules
● Using the empirical rule for 3 standard deviations away from the mean, ~99.7% of sample means should be between:
  [2.55, 7.45] for n=10
  [3.65, 6.35] for n=25
  [4.01, 5.91] for n=100
  [4.76, 5.24] for n=500
● As the sample size increases, the sampling error decreases.
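The sampling experiment described above can be simulated in a few lines of R. This is a minimal sketch under stated assumptions: the seed, the object names, and the added theoretical_se column (computed as sqrt(variance / n) from the population variance of 8.333) are illustrative, and the simulated numbers will not match the intervals on the slide exactly because those came from a different random draw.

```r
set.seed(42)  # assumption: fixed seed so the simulation is reproducible

sample_sizes <- c(10, 25, 100, 500)
n_samples <- 25  # number of samples drawn for each sample size

# For each sample size, draw 25 samples from Uniform(0, 10)
# and keep the mean of each sample.
sample_means <- lapply(sample_sizes, function(n) {
  replicate(n_samples, mean(runif(n, min = 0, max = 10)))
})
names(sample_means) <- paste0("n=", sample_sizes)

# Compare the simulated sample means with the population values.
summary_table <- data.frame(
  n = sample_sizes,
  mean_of_means = sapply(sample_means, mean),
  sd_of_means = sapply(sample_means, sd),
  theoretical_se = sqrt(100 / 12) / sqrt(sample_sizes)
)
print(summary_table)

# Histogram of the 25 sample means for the smallest sample size.
hist(sample_means[["n=10"]],
     main = "Sample means, n = 10", xlab = "Sample mean")
```

With only 25 samples per sample size, sd_of_means is itself a noisy estimate; raising n_samples should bring it closer to theoretical_se.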
Sampling Distribution of the Mean
● The sampling distribution of the mean is the distribution of the means of all possible samples of a fixed size n from some population
● The standard deviation of the sampling distribution of the mean is called the standard error of the mean: 𝝈/√n
● As n increases, the standard error decreases
  ▪ Sample means are less dispersed around the population mean, i.e., larger sample sizes have less sampling error

Central Limit Theorem (CLT)
1. If the sample size is large enough (n ≥ 30), then the sampling distribution of the mean:
  ● is approximately normally distributed regardless of the distribution of the population
  ● has a mean equal to the population mean
2. If the population is normally distributed, then the sampling distribution is also normally distributed for any sample size.
● The central limit theorem allows us to use the theory we learned about calculating probabilities for normal distributions to draw conclusions about sample means.
When calculating probabilities, determine whether the question concerns an individual observation (use the population standard deviation) or the mean of a sample (use the standard error).

Central Limit Theorem (CLT)
Why is the CLT important?
As mentioned earlier, performing statistical inference using sample data is challenging because:
  1) There are errors in the sampling process
  2) Population information for sample data is usually unknown (e.g., mean, S.D., …)
The CLT helps us resolve these issues by defining the probability distribution of sample means from a population (i.e., the sampling distribution)
With the CLT, we can calculate the probability of observing a sample with a specific mean value at the population level

Using Standard Error in Probability Calculations
● The amount of purchase orders for books on a publisher’s website is normally distributed with a mean of $36 and a standard deviation of $8.
● Find the probability that:
1. The amount of someone’s purchase order exceeds $40.
Use the population standard deviation: P(x > 40) = 1 - pnorm(40, 36, 8) = 0.3085
2. The mean amount of 16 customers’ purchase orders exceeds $40.
Use the standard error of the mean (8/√16 = 2): P(x̄ > 40) = 1 - pnorm(40, 36, 2) = 0.0228

Interval Estimates
● An interval estimate provides a range for a population characteristic based on a sample (a population distribution is assumed)
  ▪ Intervals specify a range of plausible values for the characteristic of interest and a way of assessing “how plausible” they are
  ▪ e.g., if we observe a value “X” in a sample, what would be its value in the population?
● A 100(1-α)% probability interval is any interval [A, B] such that the probability of falling between A and B is 1-α.
  ▪ Probability intervals are often centered on the mean or median
  ▪ Example: in a normal distribution, the mean ± 1 sd describes an approximate 68% probability interval around the mean.
  ▪ Another example: the 5th and 95th percentiles in a data set constitute a 90% probability interval

Interval Estimates in the News
● In the U.S., news media often conduct a poll (sample) to predict the outcome of an election (population)
1. A Gallup poll might report that 56% of voters support a certain candidate with a margin of error of ±3%
  ▪ We would have a lot of confidence that the candidate would win, since the interval estimate is [53%, 59%]
2. Suppose the poll reported a 52% level of support with a ±4% margin of error
  ▪ We would be less confident in predicting a win for the candidate, since the interval estimate is [48%, 56%]
How can we calculate the error associated with a point estimate?

Confidence Intervals
● A confidence interval is a range of values between which the value of the population parameter is believed to be, along with a probability that the interval correctly estimates the true (unknown) population parameter.
  ▪ This probability is called the level of confidence, denoted by 1-α, where α is a number between 0 and 1.
  ▪ The level of confidence is usually expressed as a percent; common values are 90%, 95%, or 99%.
● For a 95% confidence interval, if we chose 100 different samples, leading to 100 different interval estimates, we would expect that 95% of them would contain the true population mean
  ▪ In other words, confidence interval estimates provide a way of assessing the accuracy of a point estimate

Confidence Interval for the Mean with Known Population Standard Deviation
● We can use the standard normal distribution to calculate an interval around the sample mean if the population standard deviation 𝝈 is known
● Sample mean ± margin of error: x̄ ± z_{α/2}(𝝈/√n)
● Margin of error is: z_{α/2} × (standard error)
  ○ z_{α/2}: value of the standard normal random variable for an upper tail area of α/2 (or a lower tail area of 1-α/2).
  ○ Example: if α = 0.05 (for a 95% confidence interval), then z_{0.025} = 1.96 (lower tail area 0.975)
  ○ Example: if α = 0.10 (for a 90% confidence interval), then z_{0.05} = 1.645 (lower tail area 0.95)

Confidence Interval for the Mean with Known Population Standard Deviation
● A production process fills bottles of liquid detergent. The standard deviation in filling volumes is constant at 15 mL. A sample of 25 bottles revealed a mean filling volume of 796 mL.
● A 95% confidence interval estimate of the mean filling volume for the population is
  796 ± 1.96(15/√25) = 796 ± 5.88, or [790.12, 801.88]
But what if we don’t know the population’s standard deviation?

The t-Distribution
● Also called Student’s t-distribution
● Used for confidence intervals when the population standard deviation is unknown
● Its only parameter is the degrees of freedom (df) (no. of sample values - no. of estimated parameters)
  ▪ The shape of the t-distribution changes with df

Confidence Interval for the Mean with Unknown Population Standard Deviation
● Formula: x̄ ± t_{α/2,n-1}(s/√n)
● t_{α/2,n-1}: the value from the t-distribution with (n-1) degrees of freedom giving an upper tail probability of α/2 (or a lower tail area of 1-α/2).
  ○ Example: if α = 0.05 and n = 30 (for a 95% confidence interval), then t_{0.025,29} = 2.05
  ○ Example: if α = 0.10 and n = 30 (for a 90% confidence interval), then t_{0.05,29} = 1.70
[Figure: t-distributions for n = 30, n = 100, and n = 1000 compared with the standard normal (Z) distribution]
The shape of the t-distribution becomes closer to a normal distribution as its df increases

Confidence Interval for the Mean with Unknown Population Standard Deviation
● Excel file Credit Approval Decisions. Find a 95% confidence interval estimate of the mean revolving balance of homeowner applicants.
● Sample mean = $12,630.37; s = $5,393.38; n = 27; standard error = $1,037.96; t_{0.025,26} = 2.056
  12,630.37 ± 2.056(5,393.38/√27) = 12,630.37 ± 2,134.05, or about [$10,496, $14,764]

Thank You For Listening

Homework
• Run the R code and try to understand all of it
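The two confidence-interval examples from this lecture (detergent filling volume with known 𝝈, and revolving balance with unknown 𝝈) can be reproduced in R. The sketch below is a hedged illustration: the variable names are assumptions, and it works from the summary statistics quoted on the slides rather than from the original Excel data.

```r
# Known population sd: detergent filling volume, 95% confidence interval.
xbar_fill <- 796   # sample mean (mL)
sigma <- 15        # population standard deviation (mL)
n_fill <- 25       # sample size
z_crit <- qnorm(0.975)  # lower-tail area 1 - alpha/2 with alpha = 0.05
ci_z <- xbar_fill + c(-1, 1) * z_crit * sigma / sqrt(n_fill)

# Unknown population sd: revolving balance of homeowner applicants.
xbar_bal <- 12630.37  # sample mean ($)
s_bal <- 5393.38      # sample standard deviation ($)
n_bal <- 27           # sample size, so df = 26
t_crit <- qt(0.975, df = n_bal - 1)
se_bal <- s_bal / sqrt(n_bal)
ci_t <- xbar_bal + c(-1, 1) * t_crit * se_bal

print(round(ci_z, 2))
print(round(c(t_crit = t_crit, se = se_bal), 3))
print(round(ci_t, 2))
```

R uses unrounded critical values, so the interval limits can differ slightly from hand calculations based on table values such as 1.96 and 2.056. When the raw data are available, t.test(x)$conf.int returns the t-based interval directly.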