Random Variable Lab

Overview

The important thing to do in lab is to play with the random number generators and do something with the numbers you generate. To find the random number generators, open the “Tools” menu, choose “Data Analysis…”, and then find “Random Number Generation.” To quote MS Excel Help: “This analysis tool fills a range with independent random numbers drawn from one of several distributions.”

Random Number Generation

Once you are in the RNG dialogue window, you will have to put in a few pieces of info so that Excel can generate the kind of randomness you want. Since we're going to be simulating rolling a die, we will want to simulate a discrete distribution.

Number of variables: Enter the number of columns of values you want in the output table (e.g. how many dice do you want to roll?).

Number of random numbers: Enter the number of data points you want to see (e.g. how many times do you want to roll each die?).

Distribution: The one we are using today is

a) Discrete: Characterized by a value and the associated probability range (e.g. the value of each side of the die (1, 2, 3, 4, 5, 6) and the probability of each value (on a fair die, 1/6, 1/6, 1/6, 1/6, 1/6, 1/6)). The range must contain two columns: the left column contains the values, and the right column contains the probability associated with the value in that row. The sum of the probabilities must be 1. So you’ll enter those two columns before you open the RNG; then, when you get to this point, you’ll highlight those two columns. Remember the left-right thing. You can toss a coin by entering “H” and “T” in the left column and 1/2s in the right column. (If Excel complains about non-numeric values, code heads and tails as 1 and 0 instead.)

Random Seed: The random seed lets you control, so to speak, the RNG. If you use the same random seed, you'll get the same sequence of random numbers every time. If you use a different random seed, you'll almost certainly get a different sequence of random numbers.
Output options:

a) Output range: Enter the reference for the upper-left cell of the output table you want created. MS Excel automatically determines the size of the output area and displays a message if the output table will replace existing data.

b) New worksheet ply: Will create a new worksheet in the current workbook and put the output there.

c) New workbook: Will create a new workbook and paste the results in the new workbook.

TASK 1

What is the distribution of outcomes for one roll of a fair, six-sided die? Draw it here, placing the possible outcomes on the X-axis and their respective probabilities on the Y-axis.

[Blank axes for your drawing: Y-axis = probability, X-axis = outcomes]

“Roll” a die 10 times. You have just collected one sample. Now collect 19 more samples. Enter the mean of each sample in the table below.

Sample 1   ______     Sample 11  ______
Sample 2   ______     Sample 12  ______
Sample 3   ______     Sample 13  ______
Sample 4   ______     Sample 14  ______
Sample 5   ______     Sample 15  ______
Sample 6   ______     Sample 16  ______
Sample 7   ______     Sample 17  ______
Sample 8   ______     Sample 18  ______
Sample 9   ______     Sample 19  ______
Sample 10  ______     Sample 20  ______

We are now going to treat the twenty sample means as our “data.” What is the mean of the sample means? ___________

What does the distribution of sample means look like? (Make a histogram below.) Compare this distribution to the probability distribution you drew above.

[Blank axes for your histogram: Y-axis = frequency, X-axis = sample mean]

TASK 2

On average, the academic motivation and study habits of female students as a group are better than those of males. The Young Adults Survey of Study Habits and Attitudes (YASSHA) is a psychological test that measures these factors. The distribution of YASSHA scores among women at a college has mean 120 and stdev 28, while the distribution of scores for men has mean 105 and stdev 35.

Simulate taking a sample of 100 women from the female population and a sample of 100 men from the male population. (Make sure you use a different random seed for the male and female samples, or else they won’t be truly independent random samples.) Once you’ve done this, SAVE.
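If you want to cross-check the idea outside Excel, here is a minimal Python sketch of drawing two independent normal samples with different seeds. It is only an illustration, not part of the Excel procedure: the helper name draw_sample and the seed values 1 and 2 are made up for this example, and Python's generator is not the same as Excel's, so the numbers will not match your worksheet.

```python
import random
import statistics


def draw_sample(mean, stdev, n, seed):
    """Draw n values from a normal distribution using its own seeded generator.

    draw_sample is a hypothetical helper for this sketch, not an Excel function.
    """
    rng = random.Random(seed)  # a separate generator per sample
    return [rng.gauss(mean, stdev) for _ in range(n)]


# Population values come from the lab text; the seeds are arbitrary choices.
women = draw_sample(120, 28, 100, seed=1)
men = draw_sample(105, 35, 100, seed=2)

# The same seed reproduces the same sample; a different seed gives a new one.
women_again = draw_sample(120, 28, 100, seed=1)
print("same seed, same sample:", women == women_again)

print("women: mean", round(statistics.mean(women), 1),
      "variance", round(statistics.variance(women), 1))
print("men:   mean", round(statistics.mean(men), 1),
      "variance", round(statistics.variance(men), 1))
```

Giving each sample its own seeded generator mirrors the instruction to use a different random seed for each group.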
Return to the Random Number Generation dialogue window. As before, you will have to put in a few pieces of info so that Excel can generate the kind of randomness you want.

Number of variables: Enter the number of columns of values you want in the output table (e.g. how many groups do you want to study?).

Number of random numbers: Enter the number of data points you want to see (e.g. how many cases do you want in each group?).

Distribution: The one we should be worried about today is

Normal: Characterized by a mean and a standard deviation. Excel will ask you for these. If you give a mean of 0 and a stdev of 1, you will have a standard normal distribution. Right?

ATTN: Sometimes the first random number Excel spits out at you will be way, way beyond the range of what you expect. If so, delete it.

1) Calculate the mean and variance of each of the two groups (x̄_M and s²_M for the males, x̄_W and s²_W for the females).

2) A devious experimenter has decided to make the male group look better by adding 20 points to each male’s score. How will x̄_M and s²_M change? (You can answer theoretically or you can try it.)

3) In retaliation, another unethical experimenter multiplies all the female scores by 1.5. How will x̄_W and s²_W change?

4) Okay, using your original untampered samples, line up the two columns of data next to each other. Now imagine that in each row you are looking at the scores of a sister-brother pair. We want to ask: what are the mean and variance of the difference (F-M) between their scores? Do brothers score higher than sisters, or vice versa? Is there a lot of variability in the difference between siblings? Create a new variable that contains the difference scores and get x̄_(F-M) and s²_(F-M).

TASK 3

Now we'll be looking a bit more closely at the normal distribution itself.

1a) Looking at your original data for women, figure out what the z-score would be for a woman in your sample who scored at the 75th percentile. Use the "percentile" function: enter the array of your data and a number between 0 and 1 (in this case, .75). It returns the raw score at that percentile, which you can then convert to a z-score using your sample mean and standard deviation.
1b) Do the same for a man in the men's distribution.

Use the "normsinv" function to calculate the following for the theoretical population that you pulled your sample from (you will have to feed Excel a probability between 0 and 1, and it will return a z-score):

2a) Tony boasts that he scored at the 75th percentile on YASSHA. What was his z-score? What was his raw score?

2b) Antoinette boasts that she scored at the 75th percentile on YASSHA. What was her z-score? What was her raw score?

Use the "normsdist" function to answer the following questions (you will need to feed Excel z-scores, and it will give you the cumulative probability of scoring BELOW that z-score) (think about that):

3a) What is the probability that a woman chosen at random scores below 138?

3b) What is the probability that a man chosen at random scores below 138?

3c) What is the probability that a man chosen at random will score below the mean?

3d) What is the probability that a woman chosen at random will score a standard deviation or more below the mean?

Using the "percentrank" function (you will have to enter the array of your data and a raw score; try to ignore "significance"), figure out how your sample compares to the population.

4a) What is the probability that a woman chosen at random from your sample scores below 138?

4b) What is the probability that a man chosen at random from your sample scores below 138?

4c) What is the probability that a man chosen at random from your sample will score below the mean?

4d) What is the probability that a woman chosen at random from your sample will score a standard deviation or more below the mean?
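As a way to check your Excel answers, here is a rough Python sketch of what "normsinv" and "normsdist" compute, using the standard library's NormalDist. The 90th percentile and the score of 150 are made-up illustration values, chosen so as not to give away the questions above. The helper fraction_below is hypothetical and only approximates "percentrank", which interpolates between data points and so can give slightly different numbers.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)   # standard normal distribution
women = NormalDist(mu=120, sigma=28)   # population values from Task 2

# Like NORMSINV: feed in a probability between 0 and 1, get back a z-score.
z_90 = standard.inv_cdf(0.90)

# Turn the z-score into a raw score: raw = mean + z * stdev.
raw_90 = women.mean + z_90 * women.stdev

# Like NORMSDIST: feed in a z-score, get the probability of scoring BELOW it.
z_150 = (150 - women.mean) / women.stdev
p_below_150 = standard.cdf(z_150)


def fraction_below(sample, score):
    """Share of sample values strictly below score (hypothetical helper)."""
    return sum(1 for x in sample if x < score) / len(sample)


print("z at the 90th percentile:", round(z_90, 3))
print("raw score at the 90th percentile (women):", round(raw_90, 1))
print("P(woman scores below 150):", round(p_below_150, 3))
print("fraction of a toy sample below 3:", fraction_below([1, 2, 3, 4], 3))
```

Working through z-scores on the standard normal, rather than asking the women's distribution directly, follows the same two-step route the Excel functions require.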