Random Variable Lab

Overview
The important thing to do in lab is to play with the random number generators and do something with
the numbers you generate. To find the random number generators, open the “Tools” menu, choose
“Data Analysis…”, and then find “Random Number Generation.” Here I will quote from MS
Excel Help:
“This analysis tool fills a range with independent random numbers drawn from one of several
distributions.”
Random Number Generation
Once you are in the RNG dialogue window, you will have to put in a few pieces of info so that Excel
can generate the kind of randomness you want. Since we're going to be simulating rolling a die, we
will want to simulate a discrete distribution.
Number of variables: Enter the number of columns of values you want in the output table (e.g. how
many dice do you want to roll?).
Number of random numbers: Enter the number of data points you want to see (e.g. how many times do
you want to roll each die?).
Distribution: The one we are using today is
a) Discrete: Characterized by a value and the associated probability range (e.g. the value of each side
of the die (1, 2, 3, 4, 5, 6) and the probability of each value (on a fair die, 1/6, 1/6, 1/6, 1/6, 1/6,
1/6)). The range must contain two columns: The left column contains values, and the right column
contains probabilities associated with the value in that row. The sum of the probabilities must be
1. So you’ll enter those two columns before you open the RNG, then when you get to this point,
you’ll highlight those two columns. Remember the left-right thing. You can toss a coin by
entering “H” and “T” in the left column and 1/2s in the right column.
Random Seed: The random seed lets you control, so to speak, the RNG. If you use the same random
seed, then you'll get the same sequence of random numbers every time. If you use a different random
seed, then you'll be sure to get a different sequence of random numbers every time.
Output options:
a) Output range: Enter the reference for the upper-left cell of an output table you want created. MS
Excel automatically determines the size of the output area and displays a message if the output
table will replace existing data.
b) New worksheet ply: Will insert a new worksheet in the current workbook and put the output there.
c) New workbook: Will create a new workbook and paste the results in the new workbook.
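If you'd like to see the same idea outside Excel, here is a minimal Python sketch of drawing from a discrete distribution with a fixed seed. The function name `roll_dice` and the seed value 123 are made up for illustration, and Python's generator is not Excel's, so the actual numbers will not match what Excel gives you for the same seed.

```python
import random

def roll_dice(n_dice, n_rolls, seed):
    """Return n_rolls rows, each holding n_dice simulated fair-die values."""
    values = [1, 2, 3, 4, 5, 6]      # left column: the values
    probabilities = [1 / 6] * 6      # right column: their probabilities
    if abs(sum(probabilities) - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    rng = random.Random(seed)        # same seed -> same sequence every time
    return [rng.choices(values, weights=probabilities, k=n_dice)
            for _ in range(n_rolls)]

# 2 "variables" (dice), 10 "random numbers" (rolls) per die
table = roll_dice(n_dice=2, n_rolls=10, seed=123)
for row in table:
    print(row)
```

Calling `roll_dice` again with the same seed reproduces the same table, which is the behavior the Random Seed box controls.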
TASK 1
What is the distribution of outcomes for one roll of a fair, six-sided die?
Draw it here, placing the possible outcomes on the X-axis, and
their respective probabilities on the Y-axis.
[Blank axes for your drawing. Y-axis: probability; X-axis: outcomes]
“Roll” a die 10 times. You have just collected one sample. Now collect 19 more samples, and enter
the mean of each sample in the table below.
Sample 1
Sample 2
Sample 3
Sample 4
Sample 5
Sample 6
Sample 7
Sample 8
Sample 9
Sample 10
Sample 11
Sample 12
Sample 13
Sample 14
Sample 15
Sample 16
Sample 17
Sample 18
Sample 19
Sample 20
We are now going to treat the twenty sample means as our “data.” What is the mean of the sample
means? ___________ What does the distribution of sample means look like? (make a histogram
below). Compare this distribution to the probability distribution you drew above.
[Blank axes for your histogram. Y-axis: frequency; X-axis: sample mean]
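As a rough cross-check on Task 1, the sketch below does the same thing in Python: 20 samples of 10 rolls each, the mean of each sample, the mean of the sample means, and a crude text histogram. The seed 2024 and the half-point bin width are arbitrary choices made for this example, not part of the lab.

```python
import random
from statistics import mean

rng = random.Random(2024)  # arbitrary seed, used only for reproducibility

# 20 samples, each made of 10 rolls of a fair six-sided die
samples = [[rng.randint(1, 6) for _ in range(10)] for _ in range(20)]
sample_means = [mean(s) for s in samples]

print("sample means:", sample_means)
print("mean of sample means:", mean(sample_means))

# Crude text histogram: round each sample mean to the nearest 0.5
bins = {}
for m in sample_means:
    key = round(m * 2) / 2
    bins[key] = bins.get(key, 0) + 1
for key in sorted(bins):
    print(f"{key:4.1f} | {'*' * bins[key]}")
```

Your own twenty means will differ from these, but the overall shape of the histogram is the thing to compare against the flat distribution you drew for a single roll.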
TASK 2
On the average, the academic motivation and study habits of female students as a group are better
than those of males. The Young Adults Survey of Study Habits and Attitudes (YASSHA) is a
psychological test that measures these factors. The distribution of YASSHA scores among women at a
college has mean 120 and stdev 28, while the distribution of scores for men has a mean 105 and stdev
35.
Simulate taking a sample of 100 women from the female population and a sample of 100 men from
the male population. (Make sure you use a different random seed for the male and female sample or
else they won’t be truly independent random samples). Once you’ve done this, SAVE.
Return to the Random Number Generation dialogue window. You will have to put in a few
pieces of info so that Excel can generate the kind of randomness you want.
Number of variables: Enter the number of columns of values you want in the output table (e.g.
how many groups do you want to study?).
Number of random numbers: Enter the number of data points you want to see (e.g. how many
cases do you want in each group?).
Distribution: The one we should be worried about today is
Normal: Characterized by a mean and a standard deviation. Excel will ask you for
these. If you give a mean of 0 and a stdev of 1, you will have a standard normal
distribution. Right? ATTN: Sometimes the first random number Excel spits out at
you will be way, way beyond the range of what you expect. If so, delete it.
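For comparison, here is a small Python sketch of the same sampling step, using the population means and stdevs given above. The helper name `normal_sample` and the seeds 1 and 2 are made up for this example; the only thing that matters about the seeds is that they differ.

```python
import random
from statistics import mean, stdev

def normal_sample(mu, sigma, n, seed):
    """Draw n values from a normal distribution with mean mu and stdev sigma."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]

# Different seeds, so the two samples are not the same underlying sequence
women = normal_sample(mu=120, sigma=28, n=100, seed=1)
men = normal_sample(mu=105, sigma=35, n=100, seed=2)

print("women: mean", round(mean(women), 1), "stdev", round(stdev(women), 1))
print("men:   mean", round(mean(men), 1), "stdev", round(stdev(men), 1))
```

The sample means and stdevs should land near the population values without matching them exactly.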
1) Calculate the mean and variance of each group: $\bar{x}_M$ and $s^2_M$ for the men, $\bar{x}_W$ and $s^2_W$ for the women.
2) A devious experimenter has decided to make the male group look better by adding 20 points to
each male’s score. How will $\bar{x}_M$ and $s^2_M$ change? (You can answer theoretically or you can try it.)
3) In retaliation, another unethical experimenter multiplies all the female scores by 1.5. How will $\bar{x}_W$
and $s^2_W$ change?
4) Okay, using your original untampered samples, line up the two columns of data next to each other.
Now imagine that in each row you are looking at the scores of sister-brother pairs. We want to ask:
what are the mean and variance of the difference (F-M) between their scores? Do brothers score
higher than sisters or vice versa? Is there a lot of variability in the difference between siblings? Create
a new variable that contains the difference scores and get $\bar{x}_{F-M}$ and $s^2_{F-M}$.
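If you want to "try it" for questions 2 through 4 outside Excel, the sketch below shows one way, using tiny made-up score lists so the arithmetic is easy to follow by hand. The lists `men` and `women` are invented for this example and are not YASSHA data; swap in your own two columns.

```python
from statistics import mean, variance

men = [90, 105, 120, 70, 140]      # made-up scores, for illustration only
women = [130, 110, 125, 95, 150]   # made-up scores, for illustration only

# Question 2: add 20 points to every male score
men_plus_20 = [x + 20 for x in men]
print("men:     ", mean(men), variance(men))
print("men + 20:", mean(men_plus_20), variance(men_plus_20))

# Question 3: multiply every female score by 1.5
women_times_1_5 = [x * 1.5 for x in women]
print("women:      ", mean(women), variance(women))
print("women x 1.5:", mean(women_times_1_5), variance(women_times_1_5))

# Question 4: difference scores (F - M) for each row treated as a sibling pair
diff = [f - m for f, m in zip(women, men)]
print("F - M:", mean(diff), variance(diff))
```

Compare each "before" line with its "after" line and see whether the pattern agrees with what you predicted theoretically.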
TASK 3
Now we'll be looking a bit more closely at the normal distribution itself.
1a) Looking at your original data for women, figure out what the z-score would be for a woman in
your sample who scored at the 75th percentile. Use the "percentile" function (enter the array of your
data and a number between 0 and 1, in this case .75) to get her raw score, then convert it to a z-score.
1b) Do the same for a man in the men's distribution.
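Here is a hedged Python sketch of 1a: find the raw score at the 75th percentile of a sample, then turn it into a z-score using the sample's own mean and stdev. The simulated `women` list and the seed are stand-ins for your data. The `method="inclusive"` option interpolates between sorted data points; whether it agrees with Excel's "percentile" to the last decimal is an assumption worth checking against your spreadsheet.

```python
import random
from statistics import mean, stdev, quantiles

rng = random.Random(1)  # arbitrary seed
women = [rng.gauss(120, 28) for _ in range(100)]  # stand-in for your sample

# quantiles(..., n=4) returns the three quartile cut points;
# index 2 is the 75th percentile.
p75 = quantiles(women, n=4, method="inclusive")[2]

# z-score relative to the sample's own mean and standard deviation
z = (p75 - mean(women)) / stdev(women)

print("75th percentile raw score:", round(p75, 1))
print("z-score:", round(z, 2))
```

For 1b, the same steps apply to the men's column.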
Use the "normsinv" function to calculate the following for the theoretical population that you pulled
your sample from (you will have to feed Excel a probability between 0 and 1 and it will return a z-score):
2a) Tony boasts that he scored in the 75th percentile on YASSHA. What was his z-score? What was
his raw score?
2b) Antoinette boasts that she scored in the 75th percentile on YASSHA. What was her z-score? What
was her raw score?
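The "normsinv" step has a close analogue in Python's standard library, sketched below. This assumes Tony is scored against the men's population (mean 105, stdev 35) and Antoinette against the women's (mean 120, stdev 28); the variable names are made up for the example.

```python
from statistics import NormalDist

# Inverse of the standard normal cumulative distribution:
# feed in a probability, get back a z-score.
z = NormalDist().inv_cdf(0.75)
print("z-score at the 75th percentile:", round(z, 4))

# raw score = mean + z * stdev, using the population values from Task 2
men_raw = 105 + z * 35
women_raw = 120 + z * 28
print("raw score in the men's distribution:", round(men_raw, 1))
print("raw score in the women's distribution:", round(women_raw, 1))
```

Note that the z-score is the same for both of them; only the conversion back to a raw score depends on which population you use.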


Use the "normsdist" function to answer the following questions (you will need to feed Excel z-scores
and it will give you the cumulative probability of scoring BELOW that z-score) (think about that):
3a) What is the probability that a woman chosen at random scores below 138?
3b) What is the probability that a man chosen at random scores below 138?
3c) What is the probability that a man chosen at random will score below the mean?
3d) What is the probability that a woman chosen at random will score a standard deviation or more
below the mean?
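The same "convert to a z-score, then look up the cumulative probability" routine can be sketched in Python as follows. The helper name `prob_below` is made up for this example; the means and stdevs are the population values from Task 2.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, stdev 1

def prob_below(score, mu, sigma):
    """P(X < score) for a normal population, computed via the z-score."""
    z = (score - mu) / sigma
    return standard_normal.cdf(z)

print("3a woman below 138:", round(prob_below(138, 120, 28), 4))
print("3b man below 138:", round(prob_below(138, 105, 35), 4))
print("3c man below the mean:", round(prob_below(105, 105, 35), 4))
print("3d woman 1 stdev or more below the mean:",
      round(prob_below(120 - 28, 120, 28), 4))
```

Because the function returns the probability of scoring BELOW a value, a question about scoring above something would need one minus the result.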
Using the "percentrank" function (you will have to enter the array of your data and a raw score, and
try to ignore "significance"), figure out how your sample compares to the population.
4a) What is the probability that a woman chosen at random from your sample scores below 138?
4b) What is the probability that a man chosen at random from your sample scores below 138?
4c) What is the probability that a man chosen at random from your sample will score below the mean?
4d) What is the probability that a woman chosen at random from your sample will score a standard
deviation or more below the mean?
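For the sample-based questions, a simple stand-in is to count what fraction of your sample falls below a given score. To be clear, this is a plain empirical proportion and not Excel's exact "percentrank" formula, so expect small differences from what the spreadsheet reports. The helper `fraction_below`, the simulated samples, and the seeds are all made up for illustration.

```python
import random
from statistics import mean, stdev

def fraction_below(data, score):
    """Share of the sample strictly below score (a simple empirical proportion)."""
    return sum(1 for x in data if x < score) / len(data)

rng_w, rng_m = random.Random(1), random.Random(2)   # arbitrary, different seeds
women = [rng_w.gauss(120, 28) for _ in range(100)]  # stand-ins for your samples
men = [rng_m.gauss(105, 35) for _ in range(100)]

print("4a women below 138:", fraction_below(women, 138))
print("4b men below 138:", fraction_below(men, 138))
print("4c men below their sample mean:", fraction_below(men, mean(men)))
print("4d women 1 stdev or more below their sample mean:",
      fraction_below(women, mean(women) - stdev(women)))
```

Setting these proportions next to your answers to 3a through 3d shows how closely a sample of 100 tracks the theoretical population.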