Probabilistic Models for “Real-World” Processes

Many physical processes involve a random component: an element that cannot be described exactly by
a deterministic algorithm. Such a process is called a “random experiment”. The term “experiment”
used here does not necessarily have its usual meaning as a controlled situation in which treatments are randomly assigned to units and responses are observed, etc. Some examples of what we consider to be “random
experiments” are below. I’m sure you can think of many more...
Examples of Random Experiments
• We try to access a web page and record the time it takes for the webpage to respond. The time varies
depending on how busy the network is, the speed of the connection, and other factors beyond our
control.
• We record the number of car accidents at an intersection. The number of accidents varies depending
on how busy the intersection is, how reckless the drivers are, and other factors unknown to us.
• The National Highway Traffic Safety Administration (NHTSA) collects detailed data from automobile
crashes on an annual basis. Among other variables, the NHTSA records whether or not the
occupants were wearing seatbelts properly, the impact speed, the age and gender of the occupants, the
number of fatalities, and whether or not the vehicle was equipped with an airbag. The authors of an
article called “Who Wants Airbags?” (Meyer and Finney, Chance, 2005) analyze the NHTSA data.
Their analysis provides evidence that airbags are associated with a higher chance of death, especially
at low speeds when occupants are not wearing seatbelts.
• The Wall Street Journal tracks the Dow Jones Industrial Average. We cannot predict the average
exactly because of the influence of economic factors that we do not completely understand.
• Consumers can send an email to an organization’s fraud box to report a phishing attack. The organization records the number of notifications and the time between notifications. Specific characteristics
of the phishing attacks and the sensitivity of the victims cause the number and frequency of notifications to vary. A regular pattern in the notification times suggests that the notifications
are not genuine, but rather that an attacker is sending emails to the fraud box to divert the organization’s
attention from an attack.
• Dr. M. (one of the experimental design gurus in the statistics department) wants to compare a new
procedure to a standard method for manufacturing computer chips, in the hope that the new procedure
will lead to fewer defective chips. He randomly assigns 50 sets of parts to one procedure and 50 sets
of parts to the other procedure. He counts the number of defective chips from both methods. (This is
an example of an experiment in the usual sense.) The number of defective chips is subject to sources
of variation besides differences between the two procedures. If Dr. M. repeats the experiment, he may
get a slightly different outcome.
• A survey asks corn farmers how interested they are in selling corn stover (residue from harvesting corn)
to biorefineries for use in biofuels. Each farmer indicates his/her interest level on a 1-5 scale, where a
1 indicates that the farmer is not interested at all and a 5 means that the farmer is very interested.
Even if we identify factors that influence farmers’ interest levels, we cannot predict farmers’ responses
exactly because individual preferences that we have not identified cause farmers’ responses to vary.
Statistical analyses of probabilistic models that describe the random components of physical processes
can help us better understand these processes. One way (among many) to define probability and statistics
is as follows:
Probability - mathematical theory for modeling “experiments” where outcomes occur randomly.
Statistics - theory of information that uses data to make inferences about questions of interest, under the
assumption that there is a random component to the process that generated the data.
Because statistical inference makes use of probability models, probability is a foundation for statistics.
Mathematical Models for Random Processes
To construct mathematically coherent probability models, we need a formal framework for talking about
random processes and the elements that comprise random experiments.
Definition:
Probability Experiment - A process with random outcomes.
Examples
1. A message can reach a recipient computer through either of two network routers. We record the status
of router 1, the status of router 2, and the status of the recipient computer, where each status is either up (U) or
down (D). We may also record whether the message was transmitted successfully (S) or the transmission
failed (F).
2. Record the time for a webpage to respond.
3. Toss a coin until we get a head.
Components of Probability Experiments
• Elementary Outcome (ω) - an outcome of a random process.
Examples
1. Network Routers: ω = (router 1 down, router 2 down, recipient computer up) = DDU
2. Time to access webpage: ω = 3.527 seconds
3. Toss a coin until a head: ω = TTTTH
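To see how each run of a random experiment produces one elementary outcome ω, here is a small Python sketch that simulates the network-router and coin-tossing experiments. The probabilities are assumptions made only for illustration and are not part of these notes: each of the three components is taken to be up with probability `p_up`, independently, and every toss lands heads with probability `p_head`; both default to 0.5. The helper names `simulate_routers` and `toss_until_head` are hypothetical.

```python
import random


def simulate_routers(rng: random.Random, p_up: float = 0.5) -> str:
    """One run of the network-router experiment, returning an outcome such as 'DDU'.

    Assumption (not from the notes): router 1, router 2 and the recipient
    computer are each up (U) with probability p_up, independently of each other.
    """
    return "".join("U" if rng.random() < p_up else "D" for _ in range(3))


def toss_until_head(rng: random.Random, p_head: float = 0.5) -> str:
    """One run of 'toss a coin until a head', returning an outcome such as 'TTTTH'.

    Assumption: every toss lands heads with probability p_head, independently.
    """
    tosses = []
    while True:
        toss = "H" if rng.random() < p_head else "T"
        tosses.append(toss)
        if toss == "H":
            return "".join(tosses)


rng = random.Random(2024)  # fixed seed so the run can be reproduced
for _ in range(3):
    # Each printed line shows one elementary outcome from each experiment.
    print(simulate_routers(rng), toss_until_head(rng))
```

Changing the seed generally changes the printed outcomes; that run-to-run variability is what makes these random experiments.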
• Sample Space (Ω) - set of all possible outcomes.
Examples
1. Network Routers:
Ω = {ordered triples of U’s and D’s}
  = {DDD, DDU, DUD, UDD, UUD, UDU, DUU, UUU}
|Ω| = 8 = 2³
2. Time to access webpage: Ω = (0, ∞)
3. Toss coin until a head: Ω = {H, TH, TTH, TTTH, ...}
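The router sample space is small enough to enumerate by brute force. The Python sketch below builds it with `itertools.product` and confirms that |Ω| = 2³; it also lists the first few elements of the coin-tossing sample space lazily, because that set never ends. Encoding outcomes as strings such as `'DDU'` is simply a convenient choice for this sketch, and `coin_sample_space` is a hypothetical helper name.

```python
from itertools import count, islice, product

# Network routers: every ordered triple (router 1, router 2, recipient) of U/D.
omega_routers = {"".join(triple) for triple in product("UD", repeat=3)}
print(sorted(omega_routers))  # ['DDD', 'DDU', 'DUD', 'DUU', 'UDD', 'UDU', 'UUD', 'UUU']
print(len(omega_routers), 2 ** 3)  # 8 8


def coin_sample_space():
    """Lazily yield H, TH, TTH, ...; the set is infinite, so only take a prefix."""
    for k in count(1):
        yield "T" * (k - 1) + "H"


print(list(islice(coin_sample_space(), 4)))  # ['H', 'TH', 'TTH', 'TTTH']
```

No such listing is possible for the webpage sample space Ω = (0, ∞), since the positive real numbers cannot be arranged in a sequence.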
- Discrete Sample Space - sample space with a finite or countably infinite number of elements.
Examples
1. Network Routers: Discrete
2. Time to access webpage: Not discrete
3. Toss coin until a head: Discrete
- Note that there are usually multiple ways to express the sample space for a particular situation.
Example 3. Toss coin until a head: Ω = {1, 2, 3, . . .} is an equivalent expression for the sample space.
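The two descriptions of this sample space are equivalent because of a one-to-one correspondence: an outcome made of k - 1 tails followed by a head corresponds to the integer k. The minimal Python sketch below writes out both maps (the function names `to_count` and `to_string` are hypothetical) and shows that converting back and forth loses no information.

```python
def to_count(outcome: str) -> int:
    """Map an outcome such as 'TTTH' to the number of tosses it took (here 4)."""
    if not outcome.endswith("H") or set(outcome[:-1]) - {"T"}:
        raise ValueError(f"not an outcome of 'toss until a head': {outcome!r}")
    return len(outcome)


def to_string(k: int) -> str:
    """Inverse map: k tosses means k - 1 tails followed by a single head."""
    if k < 1:
        raise ValueError("the number of tosses must be at least 1")
    return "T" * (k - 1) + "H"


for w in ["H", "TH", "TTTTH"]:
    k = to_count(w)
    print(w, "->", k, "->", to_string(k))
# H -> 1 -> H
# TH -> 2 -> TH
# TTTTH -> 5 -> TTTTH
```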
• Event (A) - a subset of Ω (a collection of elementary outcomes).
Examples
1. Suppose the message is transmitted successfully if at least one router is up and the recipient
computer is up.
A = Successful transmission = {DUU, UDU, UUU}
2. Time to access a webpage:
B = More than 10 seconds = (10, ∞)
3. Toss coin until a head:
C = first head occurs between 5 and 11 tosses (inclusive)
  = {TTTTH, TTTTTH, TTTTTTH, TTTTTTTH, TTTTTTTTH, TTTTTTTTTH, TTTTTTTTTTH}
  = {5, 6, 7, 8, 9, 10, 11}
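Because events are just subsets of Ω, they can be built directly with Python set comprehensions. The sketch below uses the same letter encoding of outcomes as the examples (an encoding chosen for illustration), recovers event A from the success rule, checks that A is a subset of Ω, writes event C in both of its equivalent forms, and represents the interval event B by a membership test, since an interval cannot be listed element by element. The helper names `transmitted` and `in_B` are hypothetical.

```python
from itertools import product

# Sample space for the network-router experiment (outcomes encoded as 'DUU', ...).
omega = {"".join(triple) for triple in product("UD", repeat=3)}


def transmitted(outcome: str) -> bool:
    """Success rule from the notes: at least one router up and the recipient up."""
    router1, router2, recipient = outcome
    return (router1 == "U" or router2 == "U") and recipient == "U"


A = {w for w in omega if transmitted(w)}
print(sorted(A))   # ['DUU', 'UDU', 'UUU']
print(A <= omega)  # True: an event is a subset of the sample space

# Event C, first as outcome strings, then as the equivalent numbers of tosses.
C_strings = {"T" * (k - 1) + "H" for k in range(5, 12)}
C_counts = {len(w) for w in C_strings}
print(sorted(C_counts))  # [5, 6, 7, 8, 9, 10, 11]


def in_B(seconds: float) -> bool:
    """Membership test for B = (10, ∞): the page takes more than 10 seconds."""
    return seconds > 10


print(in_B(3.527), in_B(12.0))  # False True
```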
Summary of the Components of a Probability Experiment
Probability Experiments
Real World                                  | Mathematical World
------------------------------------------- | ---------------------------
Observation/Experiment with random outcome  | Random Experiment
List of all possible outcomes               | Sample Space (Ω)
An individual outcome                       | Elementary Outcome (ω ∈ Ω)
Collection of outcomes                      | Event A ⊂ Ω