IE 415: SUMMER 2015
LAB 1: Simulating Confidence Intervals with Excel

1. Introduction

In this laboratory exercise, you will work in teams of two to estimate the “confidence level” of confidence intervals for the mean of a random variable. As was (or will be) discussed in lecture, confidence intervals fall under the topic of statistical inference, where the objective is to draw conclusions about a population or probability distribution based on sample observations. A confidence interval is most often constructed as an interval that contains the mean of a probability distribution with a specified “confidence”. The interpretation of confidence level applies to the repeated collection of data and construction of confidence intervals: the confidence level equals the fraction of the constructed confidence intervals that contain the mean of the probability distribution.

However, confidence levels are based on assumptions made about the probability distribution from which the data samples are realizations. If the data are assumed to be independent samples from a normal distribution, and the standard deviation is estimated from the data, then the 95% confidence interval for the mean is:

$$\left[\bar{x} - t_{0.025,\,n-1}\frac{s}{\sqrt{n}},\;\; \bar{x} + t_{0.025,\,n-1}\frac{s}{\sqrt{n}}\right]$$

where $\bar{x}$ is the sample average of $n$ observations, $s$ is the sample standard deviation, and $t_{0.025,\,n-1}$ is a value from a t-distribution with $n-1$ degrees of freedom such that the probability of observing a value greater than $t_{0.025,\,n-1}$ is equal to 0.025.

When the assumptions under which a confidence interval is constructed are not met, the actual confidence level will usually not equal the stated confidence level. In this lab you will use Monte Carlo simulation to estimate the confidence level of confidence intervals under varying situations.

2. Generating Data in Excel

Excel can generate observations from several distributions. To do so, the Analysis ToolPak must be installed. This is done through the selection: File→Options→Add-Ins.
The generation of observations from distributions is done through the selection: Data (a tab)→Data Analysis→Random Number Generation. The lab instructors will demonstrate. You will use this Excel capability to generate observations from a normal distribution with mean = 5 and standard deviation = 5.

You will also generate observations from an exponential distribution with mean = 5 and standard deviation = 5. If we let X denote the exponential random variable, then a single observation x will be generated using the formula:

x = −5 * LN(RAND())

RAND() is the Excel function for generating a random number (a value equally likely to fall anywhere between 0 and 1), and LN is the natural logarithm.

3. Useful Excel Features

Two useful features in Excel that you can use in this lab assignment are the IF function and relative and absolute cell referencing.

IF Function

The IF function in Excel has the following syntax:

=IF(logical_test, [value_if_true], [value_if_false])

See Excel Help for more information on the arguments. The IF function can be used in many ways in a spreadsheet simulation. An important feature is that IF functions can be nested inside other IF functions, which permits the modeling of outcomes based on multiple conditions without having to enumerate all of the possible conditions.

Relative and Absolute Cell Referencing

In Excel, formulas and functions can reference values in other cells (e.g., cell A3). If the formula “=A3 + B3” in cell C3 (the sum of the values in cells A3 and B3) is copied to cell C4, the formula in C4 automatically changes to “=A4 + B4”. To control this automatic adjustment of cell references, a method called absolute column and row referencing can be used. If the formula in cell C3 is “=$A$3 + $B$3”, and cell C3 is copied and pasted into any other cell location, the copied formula remains “=$A$3 + $B$3”. If the “$” sign is removed from the position in front of a row number (e.g., “$A3”), the row reference will change automatically, but the column reference will not.
The opposite is true if the “$” sign is removed from the position in front of a column letter (e.g., “A$3”). This absolute referencing technique is helpful when creating tables that reference values in the column and row headings. To practice this, create small tables (a multiplication table is a good example) with numeric column and row headings, and then insert a formula into a cell in the table that references the column heading (use an absolute row reference for this cell) and the row heading (use an absolute column reference for this cell). You should then be able to copy and paste this single formula into all remaining table cells.

4. Lab Assignment

To complete the lab, work with your partner and follow the general procedure below separately for both normal and exponential random variables (each with mean and standard deviation = 5).

- Generate five independent observations of the random variable.
- Construct a 95% confidence interval.
- Record whether the confidence interval contains the known mean.
- Repeat a total of 5000 times (i.e., generate five independent observations 5000 times and construct 5000 95% confidence intervals).
- Estimate the confidence level from the simulation results. Since you know the true mean, you can tabulate the percentage of confidence intervals that contain the true mean. This percentage is the estimated confidence level.

Note: $t_{0.025,\,4}$ = 2.776 = T.INV(0.975,4) in Excel.

Do not use any macros or other Excel add-ins to complete the assignment.

What to turn in

E-mail your completed spreadsheets to your lab TA. Include the following:

- Names of team members (File Name: Last Name-Last Name-Lab#).
- Clear row and column headings.
- Any other documentation needed to make the spreadsheet understandable. This is a judgment call, so add more rather than less documentation. Assignments that are too hard to understand will be penalized. Spreadsheets with no column/row labels and no documentation will receive a zero.
- A clearly shown estimated confidence level.
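
The procedure in Section 4 can also be sketched outside of Excel as a way to sanity-check a spreadsheet result. The Python sketch below is only an illustration and is not a substitute for the required spreadsheet. The function names, the seed, and the use of 1 − U in place of RAND() inside the logarithm (to avoid taking the log of zero) are assumptions made for this example. The value 2.776 is the t value given in the note above.

```python
import math
import random
import statistics

T_CRIT = 2.776     # t_{0.025,4}, the value quoted in the lab note
TRUE_MEAN = 5.0    # known mean of both distributions
N_OBS = 5          # observations per confidence interval
N_REPS = 5000      # number of confidence intervals to construct


def draw_normal(rng):
    """One observation from a normal distribution with mean 5, sd 5."""
    return rng.gauss(5.0, 5.0)


def draw_exponential(rng):
    """One observation from an exponential distribution with mean 5.

    Mirrors x = -5 * ln(RAND()); 1 - U is used (an assumption of this
    sketch) because rng.random() can return 0.0 but never 1.0.
    """
    return -5.0 * math.log(1.0 - rng.random())


def covers_true_mean(sample):
    """Build the 95% t-interval from the sample and test coverage."""
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1)
    half_width = T_CRIT * s / math.sqrt(len(sample))
    return xbar - half_width <= TRUE_MEAN <= xbar + half_width


def estimate_confidence_level(draw, reps=N_REPS, seed=415):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    hits = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(N_OBS)]
        if covers_true_mean(sample):
            hits += 1
    return hits / reps


if __name__ == "__main__":
    print("normal:", estimate_confidence_level(draw_normal))
    print("exponential:", estimate_confidence_level(draw_exponential))
```

Each call returns a fraction between 0 and 1 that plays the same role as the percentage tabulated in the spreadsheet. Changing the seed gives a different but statistically comparable estimate.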