Understanding true probability, model estimates,
and experimental estimates
True probability is the (almost always) unknown actual probability that an event will occur in a given
situation. The actual or “true” probability of a particular coin landing heads up may be affected by the
asymmetry of the two faces of the coin, a flaw in its manufacture, etc., so it may not be exactly 0.5. However,
the model (theoretical) probability of 0.5 for a fair coin landing heads could be considered a good model
estimate of the “true” probability. We can also find out about the unknown true probability by observation
(experiment) through determining the proportion of heads in a large number of tosses, and using this
proportion as an estimate of the “true” probability.
In probability, an experiment is one or more trials of a probability situation. An experimental estimate of
the probability of an event occurring is calculated from observation as the number of successful trials
divided by the total number of trials, and is most useful when the number of trials is sufficiently large. In
the long run (over many trials), the experimental estimate may approach the true probability. It may also
approach the model probability, if a model probability can be determined and it is a good model of the
situation (for example, the die is symmetric, or the scenario has the characteristics of a binomial
distribution). For example, if a coin is tossed 20 times and lands heads up 14 times, the experimental
estimate of the probability that it lands heads is 14/20 = 0.7.
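The calculation in the example above can be sketched in a few lines of Python. The function name experimental_estimate is a hypothetical one chosen for this illustration; it simply divides the number of successful trials by the total number of trials.

```python
from fractions import Fraction


def experimental_estimate(successes: int, trials: int) -> Fraction:
    """Return successes / trials as an exact fraction.

    Hypothetical helper for illustration: the experimental estimate is the
    number of successful trials divided by the total number of trials.
    """
    if trials <= 0:
        raise ValueError("at least one trial is needed")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    return Fraction(successes, trials)


# The coin example from the text: 14 heads in 20 tosses.
estimate = experimental_estimate(14, 20)
print(estimate, float(estimate))  # 7/10 0.7
```

With only 20 tosses this estimate is a rough one; the same function applied to a much larger number of tosses would usually give a more stable figure.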
A probability model is a representation of a situation involving probability. Probability models can
incorporate experimental estimates and assumptions about the situation (for example, independence). These
assumptions may be based on an idealised view of the world or an understanding of the mathematics of
probability, and prior knowledge (for example, recognising the scenario could be modelled by the Poisson
distribution).
A model estimate is an estimate of the probability that an event will occur, based on a probability model.
The model estimate of a fair coin landing heads is 0.5. If a probability model is a good representation of the
situation, the experimental estimate of an event occurring over many trials will be close to the model
estimate. A model must always be considered in context. A good model is one which is fit for the purpose
for which it is being used. When tossing an approximately fair coin, the model estimate of P(heads) = 0.5 is
a good model for most purposes. A transportation engineer wishing to set up the timing of traffic lights so
that traffic flows smoothly will require a more complex model, tested against experimental observations to
ensure that it is fit for the purpose.
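One way to see how an experimental estimate behaves as the number of trials grows is to simulate coin tosses. The sketch below is only an illustration: it assumes that the simulated coin's true probability of heads is known and equal to the model estimate of 0.5, which is never the case for a real coin. The function name, the seed, and the numbers of tosses are all choices made for this example.

```python
import random

MODEL_ESTIMATE = 0.5  # model estimate of P(heads) for a fair coin


def simulate_heads_proportion(n_tosses, p_heads=0.5, seed=None):
    """Simulate n_tosses tosses and return the proportion of heads.

    Assumption for illustration: each toss is independent and lands heads
    with probability p_heads.
    """
    rng = random.Random(seed)
    heads = sum(rng.random() < p_heads for _ in range(n_tosses))
    return heads / n_tosses


# Compare the experimental estimate with the model estimate as trials increase.
for n in (20, 200, 2000, 20000):
    estimate = simulate_heads_proportion(n, seed=1)
    gap = abs(estimate - MODEL_ESTIMATE)
    print(f"{n:>6} tosses: experimental estimate {estimate:.3f}, gap {gap:.3f}")
```

If the model is a good one, the gap between the two estimates will usually be small for the larger numbers of tosses, although it does not have to shrink at every step.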
In some situations there is no obvious probability or theoretical model, so we can only estimate the
probabilities and probability distributions via experiment. These estimates can then be used as a basis for
building a probability or theoretical model. For instance, to develop a model of the probability of getting a
basketball through the hoop, an initial model might assume a constant probability of 0.5. As data are
gathered, there could be successive refinements of the model so that it becomes a better estimate of the true
probability. The data might indicate that the probability of getting the ball in the hoop is closer to 0.2 and
that it changes over time.
Sometimes we might think that an obvious probability or theoretical model applies, but experimental
estimates demonstrate that our model is a poor one. There is now a need to find a better model using the
estimates from the experiments. For example, we might initially model the result of spinning a coin as
P(heads) = 0.5, but then find from experiment that this estimate is a poor one and use data to improve it.
The P(heads) = 0.5 idea is based on the assumption that the two outcomes are equally likely, using the
physical symmetry of the coin and prior knowledge about tossing a coin.
Notes:
1. Many books and teachers refer to “the” probability of an event. We need to be clear which probability
we mean by “the” probability. Is it the model (theoretical) probability or the experimental
probability?
2. When doing probability experiments we need to be clear that we are determining the experimental
estimate of the probability of an event, not “the” probability of the event.
3. Probability is a difficult philosophical issue and many books have been written about how it could be
viewed. The above view is derived from a probability modelling perspective.
Some examples of true probability, model estimates, and experimental estimates
Example 1
What is the probability that the next baby born in New Zealand will be a boy? The true probability that the
next baby will be a boy is unknown. There is no theoretical model to base our probability on.
We can develop an initial model estimate of the probability a baby is a boy, based on our prior knowledge
(our hunch). This might be P(boy) = 0.5, or might use knowledge from other sources, so might be
P(boy) = 0.525 (international data from Statistics NZ).
If we can get some experimental data, we can use it to estimate the probability of a baby being a boy,
compare this estimate to our initial model probability and develop a better model probability. Experimental
data in probability can be any results of observation of the situation. We have some data which was
collected from National Women’s Hospital in Auckland in the 1990s. Is the data going to be useful? It is old,
only from Auckland, and not randomly selected. However, the data was collected by a hospital, which is
likely to be a reliable source of data, and the sample was large. The proportion of male children born is
unlikely to have changed since the 1990s or to be different in Auckland than in the rest of NZ. We can
therefore decide that this data will be useful as the basis of an experimental estimate of the probability of
a baby being a boy.
Deterministic and probabilistic models
A deterministic model does not include elements of randomness. Every time you run the model with the
same initial conditions you will get the same results.
Most simple mathematical models of everyday situations are deterministic, for example, the height (h) in
metres of an apple dropped from a hot air balloon at 300 m could be modelled by h = -5t² + 300, where t is
the time in seconds since the apple was dropped.
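The apple model can be written as a short function to show what "deterministic" means in practice. This is a minimal sketch of the formula h = -5t^2 + 300 from the text; the function names and the chosen times are assumptions made for the illustration.

```python
import math


def apple_height(t, drop_height=300.0):
    """Height in metres t seconds after release, using h = -5t^2 + drop_height."""
    return -5.0 * t**2 + drop_height


def time_to_ground(drop_height=300.0):
    """Time in seconds at which the model gives h = 0 (solving 5t^2 = drop_height)."""
    return math.sqrt(drop_height / 5.0)


# The same input always gives the same output: there is no randomness.
for t in (0, 2, 4, 6):
    print(f"t = {t} s, h = {apple_height(t)} m")
print(f"model reaches the ground after about {time_to_ground():.2f} s")
```

Running the code again with the same times gives exactly the same heights, which is the defining feature of a deterministic model.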
Simple statistical statements, which do not mention or consider variation, could be viewed as deterministic
models. The linear regression equation in a bivariate analysis could be applied as a deterministic model if,
for example, lean body mass = 0.8737(body weight) - 0.6627 is used to determine the lean body mass of an
elite athlete.
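Used in this way, the regression equation is just a formula that returns one value for each input. The sketch below assumes that body weight and lean body mass are measured in the same units (the text does not state them), and the function name is chosen for this illustration.

```python
def lean_body_mass(body_weight):
    """Apply the regression equation from the text as a deterministic model.

    Assumption: body_weight and the result share the same (unstated) units.
    No variation around the regression line is modelled.
    """
    return 0.8737 * body_weight - 0.6627


# Every athlete with the same body weight gets exactly the same prediction.
print(round(lean_body_mass(80), 2))
```

Two athletes with the same body weight would be given the same lean body mass, because the scatter of real data around the line is ignored.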
A probabilistic model includes elements of randomness. Every time you run the model, you are likely to get
different results, even with the same initial conditions. A probabilistic model is one which incorporates some
aspect of random variation.
Deterministic models and probabilistic models for the same situation can give very different results.
Consider a very simple model of a cash machine. Customers arrive to use the machine every two minutes on
average. Customers take 2 minutes to use the machine on average. What is the probability that a customer
has to wait 3 minutes or more?
A deterministic model of the situation just uses the average gap between customers and the average time of
usage, and assumes these have no variation, that is, all gaps are 2 min, and all usage times are 2 min. The
model assumes that someone arrives exactly every two minutes and uses the machine for exactly two
minutes, so there is never any waiting time. The distribution of waiting times is that all waiting times are
zero minutes.
A simple probabilistic model of the same situation might keep the time of use at the machine as 2 minutes
for each person, but include random arrival times. One way to include randomness in the model is to do a
simulation. We can simulate 15 random arrival times in a 30-minute period, for example, 2, 4, 5, 5, 10, 11,
12, 15, 16, 19, 20, 24, 29, 29, 29. In the table below, the customers are represented by the letters a to o,
arriving to either use the machine or wait until the machine is free.
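The waiting times for this simulated run can be worked out with a short program. The sketch below uses the 15 arrival times given above and a fixed usage time of 2 minutes. It assumes that the machine is free at the start, that customers are served in order of arrival, and that customers arriving in the same minute are served in the order listed. The function names, and the random_arrivals helper for generating a fresh run with whole-minute arrival times, are hypothetical choices for this illustration.

```python
import random

SERVICE_TIME = 2  # minutes each customer spends at the machine

# The 15 simulated arrival times (minutes) from the text, customers a to o.
ARRIVALS = [2, 4, 5, 5, 10, 11, 12, 15, 16, 19, 20, 24, 29, 29, 29]


def waiting_times(arrivals, service_time=SERVICE_TIME):
    """Return each customer's wait, serving customers in order of arrival."""
    waits = []
    machine_free_at = 0  # assumption: the machine is free at time 0
    for arrival in sorted(arrivals):
        start = max(arrival, machine_free_at)
        waits.append(start - arrival)
        machine_free_at = start + service_time
    return waits


def proportion_waiting_at_least(waits, threshold=3):
    """Experimental estimate of P(wait >= threshold minutes) from one run."""
    return sum(w >= threshold for w in waits) / len(waits)


def random_arrivals(n=15, period=30, seed=None):
    """Generate a fresh run: n whole-minute arrival times in 0..period-1."""
    rng = random.Random(seed)
    return sorted(rng.randrange(period) for _ in range(n))


waits = waiting_times(ARRIVALS)
for customer, arrival, wait in zip("abcdefghijklmno", ARRIVALS, waits):
    print(f"{customer}: arrives at {arrival}, waits {wait} min")
print("estimate of P(wait >= 3 min):", round(proportion_waiting_at_least(waits), 3))
```

Under these assumptions, customers d and o wait 3 and 4 minutes, so this single run gives an experimental estimate of 2/15, or about 0.13, for the probability of waiting 3 minutes or more, compared with 0 from the deterministic model. A different set of random arrival times would usually give a different estimate, so many runs would be needed for a stable figure.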