Business Statistics for Managerial Decision Producing Data Producing Data Numerical data are the raw material for sound conclusions Executives, investors, and managers want to base their decisions on data rather than relying on subjective impressions. Statistics is concerned with producing data as well as with interpreting already available data. Observational versus Experimental Studies An observational study observes individuals and measures variables of interest but does not attempt to influence the responses. An experiment deliberately imposes some treatment on individuals to observe their responses. Observational versus Experiment For example; To answer the question: We want to know what percent of American adults agree that the economy is getting better? we interview American adults. We can’t afford to ask all adults, so we put the question to a sample chosen to represent the entire adult population Sample surveys are one kind of observational study. Observational versus Experiment To answer the question: Which TV ad will sell more toothpaste? We show each ad to a separate group of consumers and note whether they buy the tooth paste. Experiments, like samples provide useful data only when properly designed. Population, Sample The population in a statistical study is the entire group of individuals about which we want information. A sample is part of the population from which we actually collect information, used to draw conclusions about the whole. Simple Random Sample A simple random sample(SRS) of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected. Random Digits A table of random digits is a long string of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 with these two properties: 1. 2. Each entry in the table is equally likely to be any of the 10 digits 0 through 9. The entries are independent of each other. That is knowledge of one part of the table gives no information about any other part. Table of Random Numbers (Table B) of the textbook is an example of a random digits table. This table may be used to draw random samples. Choosing a SRS Choose a SRS in two steps: 1. 2. Label, assign a numerical label to every individual in the population. Use table of random numbers to select labels at random. Stratified Random Sample To select a stratified random sample: 1. 2. first divide the population into groups of similar individuals, called strata. Choose a separate SRS in each stratum and combine these SRS to form the full sample. A stratified design can produce more exact information than an SRS of the same size by taking advantage of the fact that individuals in the same stratum are similar to one another. Under-Coverage and Non-Response Under-coverage occurs when some groups in the population are left out of the process of choosing the sample. Non-response occurs when an individual chosen for the sample can’t be contacted or refuses to cooperate. Designing Experiments The individuals studied in an experiment are often called subjects, especially if they are people. The explanatory variables in an experiment are often called factors. A treatment is any specific experimental condition applied to the subjects. If an experiment has several factors, a treatment is a combination of each of the factors. Example A chemical engineer is designing the production process for a new product. The chemical reaction that produces the product may have higher or lower yield, depending on the temperature and the stirring rate in the vessel in which the reaction takes place. The engineer decides to investigate the effects of combination of two temperatures (50°C and 60°C) and three stirring rates (60 rpm, 90 rpm, and 120 rpm) on the yield of the process. She will produce two batches of the product at each combination of temperature and stirring rate. Example What are the individuals and the response variable in this experiment? How many factors are there? How many treatments? How many individuals are required for the experiment? Designing Experiments Completely Randomized Design In a completely randomized experimental design, all the subjects are allocated at random among all the treatments Principles of Experimental Design 1. 2. 3. Control the effect of the lurking variables on the response, most simply by comparing two or more treatments Randomize- use impersonal chance to assign subjects to treatments Replicate each treatment to enough subjects to reduce chance variation in the results. Example Many utility companies have introduced programs to encourage energy conservation among their costumers. An electric company considers placing electronic indicators in households to show what the cost would be if the electricity use at the moment continued for a month. Would indicators reduce electricity use? Would cheaper methods work almost as well? Example One cheaper approach is to give customers a chart and information about monitoring their electricity use. The experiment compares these two approaches (indicator, chart) and also a control. The control group of the customers receives information about energy conservation but no help in monitoring electricity use. The company finds 60 single-family residences in the same city willing to participate. Example Designing Experiments Matched pair Design Compares just two treatments. Choose pair of subjects that are as closely matched as possible. Assign one of the treatment to each subject randomly. Sometimes each “pair in a matched pairs design is one subject. In this model each subject serves as his or her own control. Designing Experiments Block Design A block is a group of subjects that are known before the experiment to be similar in some way expected to affect the response to the treatments. In a block design, the random assignment of individuals to treatment is carried out separately within each block. Statistical Inference A market research firm interviews a random sample of 2500 adults. Results: 66% find shopping for cloths frustrating and time consuming. That is the truth about the 2500 people in the sample. What is the truth about almost 210 million American adults who make up the population? Since the sample was chosen at random, it is reasonable to think that these 2500 people represent the entire population pretty well. Statistical Inference Therefore, the market researchers turn the fact that 66% of sample find shopping frustrating into an estimate that about 66% of all adults feel this way. Using a fact about a sample to estimate the truth about the whole population is called statistical inference. To think about inference, we must keep straight whether a number describes a sample or a population. Parameters and Statistics A parameter is a number that describes the population. A parameter is a fixed number, but in practice we do not know its value. A statistic is a number that describes a sample. The value of a statistic is known when we have taken a sample, but it can change from sample to sample. We often use statistic to estimate an unknown parameter. Example A public opinion poll in Ohio wants to determine whether registered voters in the state approve of a measure to ban smoking in all public areas. They select a simple random sample of 50 registered voters from each county in the state and ask whether they approve or disapprove of the measure. The proportion of registered voters in the state who approve of banning smoking in public areas is an example of (parameter, or statistic) Example A survey conducted by the marketing department of Black Flag asked whether the purchasers of a new type of roach disk found it effective in killing roaches. Seventy-nine percent of the respondents agreed that the roach disk was effective. The number 79% is a (parameter, or statistic) Example In the marketing research example, the survey asked a nationwide random sample of 2500 adults if they agreed or disagreed that “ I like buying new cloths, but shopping is often frustrating and time consuming.” Of the respondents, 1650 said they agreed. The proportion of the sample who agreed that cloths shopping is often frustrating is: 1650 Pˆ .66 66% 2500 Example The number P̂ = .66 is a statistic. The corresponding parameter is the proportion (call it P) of all adult U.S. residents who would have said “agree” if asked the same question. We don’t know the value of parameter P, so we use P̂ as its estimate. Sampling Variability, Sampling Distribution If the marketing firm took a second random sample of 2500 adults, the new sample would have different people in it. It is almost certain that there would not be exactly 1650 positive responses. That is, the value of P̂ will vary from sample to sample. Random samples eliminate bias from the act of choosing a sample, but they can still be wrong because of the variability that results when we choose at random. Sampling Variability, Sampling Distribution The first advantage of choosing at random is that it eliminates bias. The second advantage is that if we take lots of random samples of the same size from the same population, the variation from sample to sample will follow a predictable pattern. All statistical inference is based on one idea: to see how trustworthy a procedure is, ask what would happen if we repeated it many times. Sampling Variability, Sampling Distribution Suppose that exactly 60% of adults find shopping for cloths frustrating and time consuming. That is, the truth about the population is that P = 0.6. What if we select an SRS of size 100 from this population and use the sample proportion P̂ to estimate the unknown value of the population proportion P? Sampling Variability, Sampling Distribution To answer this question: Take a large number of samples of size 100 from this population. Calculate the sample proportion P̂ for each sample. Make a histogram of the values of P̂ . Examine the distribution displayed in the histogram for shape, center, and spread, as well as outliers or other deviations. Sampling Variability, Sampling Distribution We can not afford to actually take many samples from a large population such as all adult U.S. residents. We can imitate many samples by using random digits. Using random digits from a table or computer software to imitate chance behavior is called simulation. Sampling Variability, Sampling Distribution The result of many SRS have a regular pattern. Here we draw 1000 SRS of size 100 from the same population. The histogram shows the distribution of the 1000 sample proportions P̂ Sampling Distribution The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population. Sampling Distribution The distribution of sample proportionsP̂ for 1000 SRS of size 2500 drawn from the same population as in previous figure. The two histograms have the same scale The statistic from larger sample is less variable. Bias and Variability Bias concerns the center of the sampling distribution. A statistic used to estimate a parameter is unbiased if the mean of its sampling distribution is equal to the true value of the parameter being estimated. The variability of a statistic is described by the spread of its sampling distribution. This spread is determines by sampling design and the sample size n Statistics from larger samples have smaller spread (variability). Bias and Variability Managing Bias and Variability To reduce Bias: use random sampling. When we start with a list of the entire population, simple random sampling produces unbiased estimates The value of a statistic computed from an SRS neither consistently overestimate nor consistently underestimate the value of the population parameter. To reduce variability of a statistics from an SRS Use a larger sample. You can make the variability as small as you want by taking a large enough sample. Sampling Distribution of sample mean The sampling distribution of sample mean X for 1000 SRSs of size 10 from the domestic gross sales (millions of dollars) of all movies released in The U.S. in the 1990s. X Sampling Distribution of sample mean The distribution of sample means X for 1000 SRSs of size 100 from the domestic gross sales (millions of dollars) of all movies released in the U.S. in the 1990s. Note the change in variability and shape of the distribution. X Probability and sampling distribution What is the mean income of households in the United States? The Bureau of Labor Statistics contacted a random sample of 55,000 households in March 2001 for the current population survey. The mean income of the 55,000 households for the year 2000 was X $57,045. $57,045 is a statistic that describes the CPS (Current Population Survey) sample households. Probability and sampling distribution We use sample mean to estimate an unknown parameter, the mean income of all 106 million American households. We know that X would take several different values if the Bureau of Labor Statistics had taken several samples in March 2001. We also know that this sampling variability follows a regular pattern that can tell us how accurate the sample result is likely to be. That pattern obeys the laws of probability. Probability and sampling distribution To estimate a population parameter we have information from one and only one sample. Therefore we would like to know what is the probability that the sample summary value (Statistics) is equal to a certain value?