Statistics and Their Distributions

Liang Zhang (UofU), Applied Statistics I, July 7, 2008

Suppose we are running an online retail store. Several factors may be of great interest to us, such as the nationwide distribution of buyers, the time a customer spends on the website per visit, customer satisfaction, and so on. For some factors, such as the time spent per visit, we may be able to obtain data for the "whole population". For others, such as customer satisfaction, we can only obtain information from a sample. Moreover, once we take future customers into account, it is impossible even in principle to obtain information about the whole population: all the data we are dealing with now are just a sample.

For example, we have the following data on the time spent per visit (in minutes) from a sample of size 10:

Observation i   1   2   3   4   5   6   7   8   9   10
Time (min)     24  51  12  95  26   5  33  62  31   27

Each observation is "random", i.e., we cannot predict its exact value before we obtain it. Therefore we can associate a random variable $X_i$ with the $i$th observation.

Often we are interested in some overall properties of the sample. For example, the maximum of the sample above is 95, the minimum is 5, the mean is 36.6, the median is 29, and the standard deviation is 26.4. Sometimes these characteristics are of more interest to us than the sample data themselves. We call these characteristics statistics.

Definition. A statistic is any quantity whose value can be calculated from sample data.

Remark 1. A statistic is a random variable: before obtaining the data, we are not sure what value any particular statistic will take. We use uppercase letters to denote statistics and lowercase letters to denote their calculated or observed values.
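In the notation of Remark 1, once the data are in hand each statistic takes an observed value written in lowercase, for example $\bar{x} = 36.6$. As a quick check, the observed values quoted earlier for the time-per-visit sample can be reproduced with Python's standard `statistics` module. This is a sketch; note that `statistics.stdev` uses the divisor $n - 1$ (the sample standard deviation), which is the convention that reproduces 26.4.

```python
import statistics

# Time spent per visit (in minutes) for the sample of size 10
times = [24, 51, 12, 95, 26, 5, 33, 62, 31, 27]

# Observed values of several statistics for this particular sample
summary = {
    "max": max(times),
    "min": min(times),
    "mean": statistics.mean(times),
    "median": statistics.median(times),  # average of the two middle values, 27 and 31
    "sd": statistics.stdev(times),        # sample standard deviation, divisor n - 1
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.1f}")
```

This prints 95.0, 5.0, 36.6, 29.0 and 26.4, matching the values stated for the sample. A different sample of 10 visits would give different values, which is exactly why the corresponding statistics are random variables.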
Remark 2. A statistic must be calculated from sample data. For example, suppose that in addition to the sample of size 10 above, we also know that the time spent per visit is normally distributed with mean $\mu$ and variance $\sigma^2$. Then neither the population mean $\mu$ nor the population variance $\sigma^2$ is a statistic, since both are unknown population parameters. The sample mean and the sample variance, however, are valid statistics; they are denoted by $\bar{X}$ and $S^2$, respectively.

Remark 3. Any statistic, being a random variable, has a probability distribution. The probability distribution of a statistic is called its sampling distribution. The sampling distribution of a statistic DEPENDS on the sample size $n$.

Definition. The random variables $X_1, X_2, \ldots, X_n$ are said to form a (simple) random sample of size $n$ if
1. the $X_i$'s are independent random variables, and
2. every $X_i$ has the same probability distribution.
In words, $X_1, X_2, \ldots, X_n$ form a random sample if the $X_i$'s are independent and identically distributed (iid).

Remark. When sampling with replacement or from an infinite (conceptual) population, both conditions are satisfied and the result is a random sample. When sampling WITHOUT replacement from a finite population, consecutive observations are not independent, but we can still regard the result as a random sample if the sample size $n$ is much smaller than the population size $N$. In practice, if $n/N \le .05$ (at most 5% of the population is sampled), the sample can be regarded as a random sample.
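The difference between the two sampling schemes can be illustrated with Python's standard `random` module: `Random.choices` draws with replacement (so the draws are iid), while `Random.sample` draws without replacement (so no unit can appear twice). The helper `approx_random_sample` and the population of 1000 customer IDs are hypothetical, chosen only for illustration; the helper simply encodes the $n/N \le .05$ rule of thumb stated above.

```python
import random

def approx_random_sample(n: int, N: int, threshold: float = 0.05) -> bool:
    """Hypothetical helper for the rule of thumb: a sample drawn without
    replacement may be treated as a random sample if n/N <= 0.05."""
    return n / N <= threshold

# Hypothetical finite population of N = 1000 customer IDs
population = list(range(1, 1001))
rng = random.Random(2008)  # seeded so the draws are reproducible

with_repl = rng.choices(population, k=10)    # with replacement: iid draws, repeats possible
without_repl = rng.sample(population, k=10)  # without replacement: no repeats

print("with replacement:   ", with_repl)
print("without replacement:", without_repl)
print("treat as random sample?", approx_random_sample(len(without_repl), len(population)))
```

Here $n/N = 10/1000 = .01$, so the sample drawn without replacement can be treated as a random sample; with $n = 100$ from the same population ($n/N = .1$) the rule of thumb would not apply.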
Deriving Sampling Distributions

Example (Problem 38). There are two traffic lights on my way to work. Let $X_1$ be the number of lights at which I must stop, and suppose that the distribution of $X_1$ is as follows:

$x_1$        0     1     2
$p(x_1)$    .2    .5    .3

with $\mu = 1.1$ and $\sigma^2 = .49$. Let $X_2$ be the number of lights at which I must stop on the way home; $X_2$ is independent of $X_1$. Assume that $X_2$ has the same distribution as $X_1$, so that $X_1, X_2$ is a random sample of size $n = 2$.
a. Let $\bar{X} = (X_1 + X_2)/2$. Find the probability distribution of $\bar{X}$.
b. Calculate $P(\bar{X} \le 1)$.
c. Calculate $\mu_{\bar{X}}$. How does it relate to $\mu$, the population mean?
d. Calculate $\sigma^2_{\bar{X}}$. How does it relate to $\sigma^2$, the population variance?

Example. A certain system consists of two identical components. The lifetime of each component is assumed to have an exponential distribution with parameter $\lambda$. The system works as long as at least one component works properly (one component is kept in standby and is switched on when the other fails), and the two components are assumed to work independently. Let $X_1$ and $X_2$ be the lifetimes of the two components. What can we say about the lifetime of the system, $T_0 = X_1 + X_2$?

Example. A certain system consists of two identical components. The lifetime of each component is assumed to have an exponential distribution with parameter $\lambda = 3$. The system works only if both components work properly, and the two components are assumed to work independently. Let $X_1$ and $X_2$ be the lifetimes of the two components. Then the lifetime of the system is $T_1 = \min(X_1, X_2)$. What is the average lifetime of 5 such systems?

This time, a direct derivation of the sampling distribution is complicated. Instead, we use simulation.
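By contrast with the system-lifetime example, the sampling distribution in Problem 38 can be derived directly: there are only nine pairs $(x_1, x_2)$, and by independence $P(X_1 = x_1, X_2 = x_2) = p(x_1)p(x_2)$. The Python sketch below enumerates these pairs with exact fractions to answer parts a to d; the helper names `mean` and `variance` are illustrative, not part of the problem.

```python
from collections import defaultdict
from fractions import Fraction as F
from itertools import product

# Distribution of X1 (and of X2): number of lights at which I must stop
p = {0: F(2, 10), 1: F(5, 10), 2: F(3, 10)}

def mean(dist):
    """Expected value of a discrete distribution given as {value: probability}."""
    return sum(x * px for x, px in dist.items())

def variance(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    m = mean(dist)
    return sum((x - m) ** 2 * px for x, px in dist.items())

# a. Distribution of Xbar = (X1 + X2) / 2, using independence of X1 and X2
xbar = defaultdict(F)  # F() == Fraction(0), so missing keys start at 0
for x1, x2 in product(p, repeat=2):
    xbar[F(x1 + x2, 2)] += p[x1] * p[x2]

# b. P(Xbar <= 1)
p_le_1 = sum(px for x, px in xbar.items() if x <= 1)

print("a.", {str(x): str(px) for x, px in sorted(xbar.items())})
print("b. P(Xbar <= 1) =", p_le_1)
print("c. mu_Xbar =", mean(xbar), "  mu =", mean(p))
print("d. var_Xbar =", variance(xbar), "  sigma^2 / 2 =", variance(p) / 2)
```

The enumeration gives $P(\bar{X} = 0, .5, 1, 1.5, 2) = .04, .20, .37, .30, .09$, hence $P(\bar{X} \le 1) = .61$, $\mu_{\bar{X}} = 1.1 = \mu$, and $\sigma^2_{\bar{X}} = .245 = \sigma^2/2$.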
Simulation Experiments
1. Use some software to generate a random sample of size 5 from the EXP(3) distribution;
2. Generate another random sample of size 5 from the EXP(3) distribution;
3. Construct the data set $\min(X_i, Y_i)$, $i = 1, \ldots, 5$, from these two random samples;
4. Calculate the mean of this data set. This is one simulation.
5. Repeat the simulation 499 more times;
6. Construct a histogram of the 500 simulated means.

The larger the sample size, the smaller the spread of the sampling distribution of the sample mean.

Example (Problem 45). Carry out a simulation experiment using a statistical computer package or other software to study the sampling distribution of $\bar{X}$ when the population distribution is lognormal with $E(\ln(X)) = 3$ and $V(\ln(X)) = 1$. Consider the four sample sizes $n = 10, 20, 30$, and $50$, and in each case use 500 replications.
1. Use some software to generate a random sample of size 10 from the LOGN(3,1) distribution;
2. Calculate the mean of the random sample. This is one simulation.
3. Repeat the simulation 499 more times;
4. Construct a histogram of the 500 simulated means.
5. Repeat the simulation for $n = 20$, 30, and 50.

As the sample size becomes larger, the sampling distribution of $\bar{X}$ looks more and more like a normal distribution.
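Both simulation experiments follow the same pattern: draw a sample of size $n$, compute its mean, and repeat 500 times. The Python sketch below uses only the standard library. It assumes that EXP(3) means rate $\lambda = 3$ (mean 1/3), which is the parameterization of `random.expovariate`, and that LOGN(3,1) means $E(\ln X) = 3$ and $V(\ln X) = 1$, matching the `mu` and `sigma` arguments of `random.lognormvariate`. The function names are illustrative, and a printed summary (mean and standard deviation of the 500 simulated means) stands in for the histograms.

```python
import random
import statistics
from typing import Callable

rng = random.Random(7)  # fixed seed so the experiment is reproducible
REPS = 500              # number of replications in each experiment

def system_lifetime(lam: float = 3.0) -> float:
    """T1 = min(X1, X2) for two independent EXP(lam) component lifetimes."""
    return min(rng.expovariate(lam), rng.expovariate(lam))

def lognormal_draw(mu: float = 3.0, sigma: float = 1.0) -> float:
    """One observation with E(ln X) = mu and V(ln X) = sigma**2."""
    return rng.lognormvariate(mu, sigma)

def simulate_means(draw: Callable[[], float], n: int, reps: int = REPS) -> list:
    """Return `reps` simulated sample means, each from a fresh sample of size n."""
    return [statistics.fmean(draw() for _ in range(n)) for _ in range(reps)]

# Average lifetime of 5 systems, 500 replications
sys_means = simulate_means(system_lifetime, n=5)
print(f"systems   n=5 : mean {statistics.fmean(sys_means):.3f}, "
      f"sd {statistics.stdev(sys_means):.3f}")

# Problem 45: sample mean of LOGN(3,1) data for several sample sizes
lognormal_results = {}
for n in (10, 20, 30, 50):
    means = simulate_means(lognormal_draw, n)
    lognormal_results[n] = means
    print(f"lognormal n={n:<2}: mean {statistics.fmean(means):.2f}, "
          f"sd {statistics.stdev(means):.2f}")
```

Since the minimum of two independent EXP(3) lifetimes is itself exponential with rate 6, the simulated system means should center near 1/6 ≈ .167. For the lognormal runs, the standard deviation of the 500 means should shrink roughly like $1/\sqrt{n}$ as $n$ grows, in line with the observation above. The lists `sys_means` and `lognormal_results[n]` can be passed to any plotting tool to draw the histograms.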