Understanding true probability, model estimates, and experimental estimates

True probability is the (almost always) unknown actual probability that an event will occur in a given situation. The actual or “true” probability of a particular coin landing heads up may be affected by asymmetry between the two faces of the coin, a flaw in its manufacture, and so on, so it may not be exactly 0.5. However, the model (theoretical) probability of 0.5 for a fair coin landing heads could be considered a good model estimate of the “true” probability. We can also find out about the unknown true probability by observation (experiment): we determine the proportion of heads in a large number of tosses and use this proportion as an estimate of the “true” probability.

In probability, an experiment is one or more trials of a probability situation. An experimental estimate of the probability of an event occurring is calculated from observation as the number of successful trials divided by the total number of trials, when the number of trials is sufficiently large. In the long run (over many trials), the experimental estimate may approach the true probability. It may also approach the model probability, if a model probability can be determined and if it is a good model of the situation (for example, symmetry of a die, or scenario has binomial distribution characteristics). For example, if a coin is tossed 20 times and lands heads up 14 times, the experimental estimate of the probability that it lands heads is 14/20 = 0.7.

A probability model is a representation of a situation involving probability. Probability models can incorporate experimental estimates and assumptions about the situation (for example, independence). These assumptions may be based on an idealised view of the world, an understanding of the mathematics of probability, or prior knowledge (for example, recognising that the scenario could be modelled by the Poisson distribution). A model estimate is an estimate of the probability that an event will occur, based on a probability model.
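The long-run behaviour of an experimental estimate described above can be illustrated with a small simulation. The following is a minimal sketch in Python, not part of the original material. It assumes a simulated coin whose probability of heads, p_true, is chosen by us (in a real situation the true probability is unknown); the function name experimental_estimate and the seed value are illustrative choices.

```python
import random


def experimental_estimate(n_trials, p_true=0.5, seed=None):
    """Return the proportion of heads in n_trials simulated tosses.

    p_true is an ASSUMED probability of heads used only to drive the
    simulation; for a real coin this value is unknown.
    """
    rng = random.Random(seed)
    # A toss counts as heads when a uniform random number falls below p_true.
    heads = sum(1 for _ in range(n_trials) if rng.random() < p_true)
    return heads / n_trials


# Compare estimates from small and large numbers of trials.
for n in (20, 200, 2000, 20000):
    print(n, experimental_estimate(n, p_true=0.5, seed=1))
```

With only 20 tosses the estimate can be some distance from 0.5, as in the 14/20 = 0.7 example; with many more tosses it will usually settle closer to the value of p_true used in the simulation.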
The model estimate of the probability of a fair coin landing heads is 0.5. If a probability model is a good representation of the situation, the experimental estimate of an event occurring over many trials will be close to the model estimate.

A model must always be considered in context. A good model is one which is fit for the purpose for which it is being used. When tossing an approximately fair coin, the model estimate of P(heads) = 0.5 is a good model for most purposes. A transportation engineer wishing to set up the timing of traffic lights so that traffic flows smoothly will require a more complex model, tested against experimental observations to ensure that it is fit for the purpose.

In some situations there is no obvious probability or theoretical model, so we can only estimate the probabilities and probability distributions via experiment. These estimates can then be used as a basis for building a probability or theoretical model. For instance, to develop a model of the probability of getting a basketball through the hoop, an initial model might assume a constant probability of 0.5. As data are gathered, there could be successive refinements of the model so that it becomes a better estimate of the true probability. The data might indicate that the probability of getting the ball in the hoop is closer to 0.2 and that it changes over time.

Sometimes we might think that an obvious probability or theoretical model applies, but experimental estimates demonstrate that our model is a poor one. We then need to find a better model using the estimates from the experiments. For example, we might initially model the result of spinning a coin as P(heads) = 0.5, but realise that this estimate is a poor one and use data to improve it. The P(heads) = 0.5 idea is based on the assumption that the 2 outcomes are equally likely, using the physical symmetry of the coin and prior knowledge about tossing a coin.

Notes:
1. Many books and teachers refer to “the” probability of an event.
We need to be clear which probability “the” probability refers to. Is it the model (theoretical) probability or the experimental probability?
2. When doing probability experiments, we need to be clear that we are determining the experimental estimate of the probability of an event, not “the” probability of the event.
3. Probability is a difficult philosophical issue, and many books have been written about how it could be viewed. The above view is derived from a probability modelling perspective.

Some examples of true probability, model estimates, and experimental estimates

Example 1

What is the probability that the next baby born in New Zealand will be a boy?

The true probability that the next baby will be a boy is unknown. There is no theoretical model to base our probability on. We can develop an initial model estimate of the probability that a baby is a boy, based on our prior knowledge (our hunch). This might be P(boy) = 0.5, or it might use knowledge from other sources, so might be P(boy) = 0.525 (international data from Statistics NZ).

If we can get some experimental data, we can use it to estimate the probability of a baby being a boy, compare this estimate to our initial model probability, and develop a better model probability. Experimental data in probability can be any results of observation of the situation.

We have some data which was collected from National Women’s Hospital in Auckland in the 1990s. Is the data going to be useful? It is old, only from Auckland, and not randomly selected. However, the data was collected by a hospital, which is likely to be a reliable source of data. The sample was large. The proportion of male children born is unlikely to have changed since the 1990s or to be different in Auckland than in the rest of NZ. We can decide that this data will be useful as the basis of an experimental estimate of the probability of a baby being a boy.

Deterministic and probabilistic models

A deterministic model does not include elements of randomness.
Every time you run the model with the same initial conditions you will get the same results. Most simple mathematical models of everyday situations are deterministic. For example, the height (h) in metres of an apple dropped from a hot air balloon at 300 m could be modelled by h = -5t^2 + 300, where t is the time in seconds since the apple was dropped. Simple statistical statements, which do not mention or consider variation, could be viewed as deterministic models. The linear regression equation in a bivariate analysis could be applied as a deterministic model if, for example, lean body mass = 0.8737(body weight) - 0.6627 is used to determine the lean body mass of an elite athlete.

A probabilistic model incorporates some aspect of random variation. Every time you run the model, you are likely to get different results, even with the same initial conditions.

Deterministic models and probabilistic models for the same situation can give very different results. Consider a very simple model of a cash machine. Customers arrive to use the machine every 2 minutes on average. Customers take 2 minutes to use the machine on average. What is the probability that a customer has to wait 3 minutes or more?

A deterministic model of the situation just uses the average gap between customers and the average time of usage, and assumes these have no variation, that is, all gaps are 2 minutes and all usage times are 2 minutes. The model assumes that someone arrives exactly every 2 minutes and uses the machine for exactly 2 minutes, so there is never any waiting time. The distribution of waiting times is that all waiting times are zero minutes.

A simple probabilistic model of the same situation might keep the time of use at the machine as 2 minutes for each person, but include random arrival times. One way to include randomness in the model is to do a simulation.
We can simulate 15 random arrival times in a 30-minute period, for example: 2, 4, 5, 5, 10, 11, 12, 15, 16, 19, 20, 24, 29, 29, 29. In the table below, the customers are represented by the letters a to o, arriving to either use the machine or wait until the machine is free.
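The waiting times implied by these arrival times can also be worked out with a short program. The following is a minimal sketch in Python, not part of the original material. It assumes a single machine, customers served in order of arrival, a fixed usage time of 2 minutes, and that a customer can start in the same minute the previous customer finishes; the table may use a slightly different counting convention. The function name waiting_times is an illustrative choice.

```python
def waiting_times(arrivals, service_time=2):
    """Return each customer's wait in minutes at a single machine.

    ASSUMPTIONS: customers are served in order of arrival, every customer
    uses the machine for exactly service_time minutes, and the next
    customer may start at the moment the previous one finishes.
    """
    waits = []
    free_at = 0  # time at which the machine next becomes free
    for arrival in sorted(arrivals):
        start = max(arrival, free_at)  # wait only if the machine is busy
        waits.append(start - arrival)
        free_at = start + service_time
    return waits


# Simulated arrival times (minutes) for customers a to o, from the text.
arrivals = [2, 4, 5, 5, 10, 11, 12, 15, 16, 19, 20, 24, 29, 29, 29]
waits = waiting_times(arrivals)

# Experimental estimate of P(wait is 3 minutes or more) from this one run.
long_waits = sum(1 for w in waits if w >= 3)
estimate = long_waits / len(waits)
print(waits)
print(f"Estimate of P(wait >= 3 min): {long_waits}/{len(waits)} = {estimate:.3f}")
```

Evenly spaced arrivals, such as waiting_times([0, 2, 4]), reproduce the deterministic model, in which every waiting time is zero. A single run of 15 customers gives only a rough experimental estimate; repeating the simulation many times with fresh random arrival times would give a more stable one.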