Probabilistic Models for “Real-World” Processes

Many physical processes involve a random component: an element that cannot be described exactly by a deterministic algorithm. Such a process is called a “random experiment”. Here the term “experiment” does not necessarily have its usual meaning of a controlled situation in which treatments are randomly assigned to units and responses are observed. Some examples of what we consider to be “random experiments” are below. I’m sure you can think of many more...

Examples of Random Experiments

• We try to access a web page and record the time it takes for the web page to respond. The time varies depending on how busy the network is, the speed of the connection, and other factors beyond our control.
• Consumers can send an email to an organization’s fraud box to report a phishing attack. The organization records the number of notifications and the time between notifications. Specific characteristics of the phishing attacks and the sensitivity of the victims cause the number and the frequency of notifications to vary. A regular pattern in the notification times suggests that the notifications are not genuine. Instead, an attacker may be sending emails to the fraud box to divert the organization’s attention from an attack.
• Dr. M. (one of the experimental design gurus in the statistics department) wants to compare a new procedure with a standard method for manufacturing computer chips. He hopes that the new procedure will lead to fewer defective chips. He randomly assigns 50 sets of parts to one procedure and 50 sets of parts to the other procedure, and counts the number of defective chips from both methods. (This is an example of an experiment in the usual sense.) The number of defective chips is subject to sources of variation besides differences between the two procedures. If Dr. M. repeats the experiment, he may get a slightly different outcome.
• A survey asks corn farmers how interested they are in selling corn stover (residue from harvesting corn) to biorefineries for use in biofuels. Each farmer indicates his/her interest level on a 1-5 scale. A 1 indicates that the farmer is not interested at all, and a 5 means that the farmer is very interested. Even if we identify factors that influence farmers’ interest levels, we cannot predict their responses exactly. Individual preferences that we have not identified cause the responses to vary.

Statistical analyses of probabilistic models that describe the random components of physical processes can help us better understand these processes. One way (among many) to define probability and statistics is as follows:

Probability - mathematical theory for modeling “experiments” where outcomes occur randomly.
Statistics - theory of information that uses data to make inferences about questions of interest, under the assumption that there is a random component to the process that generated the data.

Because statistical inference makes use of probability models, probability is a foundation for statistics.

Mathematical Models for Random Processes

To construct mathematically coherent probability models, we need a formal framework for talking about random processes and the elements that comprise random experiments.

Definition: Probability Experiment - A process with random outcomes.
Examples:
1. A message can take either of two network routers to get to a recipient computer. We record the status of router 1, the status of router 2, and the status of the recipient computer, where each status is either up (U) or down (D). We may also record whether the message was transmitted successfully (S) or the transmission failed (F).
2. Record the time for a web page to respond.
3. Toss a coin until we get a head.

Components of Probability Experiments

• Elementary Outcome (ω) - an outcome of a random process.
Examples
1. Network Routers: ω = (router 1 down, router 2 down, recipient computer up) = DDU
2. Time to access web page: ω = 3.527 seconds
3. Toss a coin until a head: ω = TTTTH

• Sample Space (Ω) - set of all possible outcomes.
Examples
1. Network Routers: Ω = {ordered triples of U’s and D’s} = {DDD, DDU, DUD, UDD, UUD, UDU, DUU, UUU}, so |Ω| = 8 = 2^3
2. Time to access web page: Ω = (0, ∞)
3. Toss coin until a head: Ω = {H, TH, TTH, TTTH, ...}

- Discrete Sample Space - sample space with a finite or countably infinite number of elements.
Examples
1. Network Routers: Discrete (finite, 8 outcomes)
2. Time to access web page: Not discrete (the interval (0, ∞) is uncountable)
3. Toss coin until a head: Discrete (countably infinite)

- Note that there are usually multiple ways to express the sample space for a particular situation.
Example
3. Toss coin until a head: Ω = {1, 2, 3, ...} is an equivalent expression for the sample space Ω = {H, TH, TTH, TTTH, ...}. In this version, each outcome records the number of tosses needed to get the first head.

• Event (A) - subset of Ω (a collection of elementary outcomes).
Examples
1. Network Routers: Suppose the message is transmitted successfully if at least one router is up and the recipient computer is up. A = Successful transmission = {UDU, DUU, UUU}
2. Time to access a web page: B = More than 10 seconds = (10, ∞)
3. Toss coin until a head: C = first head occurs between 5 and 11 tosses (inclusive) = {TTTTH, TTTTTH, TTTTTTH, TTTTTTTH, TTTTTTTTH, TTTTTTTTTH, TTTTTTTTTTH} = {5, 6, 7, 8, 9, 10, 11}

Summary of the Components of a Probability Experiment

Real World                                   | Mathematical World
---------------------------------------------|----------------------------
Observation/experiment with random outcome   | Random Experiment
List of all possible outcomes                | Sample Space (Ω)
An individual outcome                        | Elementary Outcome (ω ∈ Ω)
Collection of outcomes                       | Event A ⊂ Ω
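The components summarized above can be made concrete with a minimal Python sketch. It is an illustration, not part of the original notes. It does three things:

- builds the router sample space Ω by enumerating all ordered triples of U's and D's;
- computes the event A of successful transmissions directly from the stated success rule (at least one router up and the recipient computer up);
- generates the coin-toss outcomes in event C.

The names `omega_routers`, `transmitted`, and `toss_outcome` are hypothetical, chosen only for this example. The coin-toss sample space is countably infinite, so the sketch does not try to list Ω. Instead, it builds the outcome for a given number of tosses n on demand. This also shows the correspondence between the descriptions {H, TH, TTH, ...} and {1, 2, 3, ...}.

```python
from itertools import product

# Router example: each of (router 1, router 2, recipient) is up ('U') or down ('D').
# Omega is the set of all ordered triples, so |Omega| = 2**3 = 8.
# (omega_routers, transmitted, toss_outcome are hypothetical names for illustration.)
omega_routers = ["".join(t) for t in product("UD", repeat=3)]

def transmitted(outcome: str) -> bool:
    """Success rule: at least one router is up and the recipient computer is up."""
    r1, r2, recipient = outcome
    return (r1 == "U" or r2 == "U") and recipient == "U"

# Event A = successful transmission, a subset of Omega.
A = {w for w in omega_routers if transmitted(w)}

# Coin-toss example: the outcome "first head on toss n" is (n - 1) tails followed by H,
# which maps the description {1, 2, 3, ...} onto {H, TH, TTH, ...}.
def toss_outcome(n: int) -> str:
    return "T" * (n - 1) + "H"

# Event C = first head occurs between toss 5 and toss 11 (inclusive).
C = [toss_outcome(n) for n in range(5, 12)]

print(len(omega_routers))   # 8
print(sorted(A))            # ['DUU', 'UDU', 'UUU']
print(C[0], C[-1], len(C))  # TTTTH TTTTTTTTTTH 7
```

Storing events as Python sets mirrors the definition of an event as a subset of Ω. For example, `A <= set(omega_routers)` evaluates to `True`.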