Stat 512: Lecture 1
A Brief Review of Inference for Two Sample Means and Some Design Vocabulary

Since the idea of the class is the ANALYSIS of DESIGNED EXPERIMENTS, I would like to focus on the three key words:
1. Analysis
2. Design
3. Experiment

I am going to assume that you ALL have a working knowledge of the following:
1. Random Variable
2. Mean and Variance
3. Calculating mean and variance
4. Expectation
5. The Normal Distribution
6. Looking up tables for the Normal, t, chi-square and F distributions

I would like us all to design the following simple experiments for me:
1. We are interested in seeing if there is a difference in average prices between IGA and Safeway.
2. We are interested in seeing if there is a difference in the mean GRE Quantitative scores between males and females among students in the 512 class.

Study 1: DESIGNING THE STUDY
Here we probably want to pick 10 items (say) at random that are available in both stores and record the price of each item at each store.

Example of a possible Layout:

Item                         Price at IGA    Price at Safeway
2% Milk
Yukon Potatoes 1 lb
Walla Walla Onions 1 lb
Fresh Express Salad 1 lb
Tyson Whole Chicken 2 lb
Mott Apple Juice 1 L
Ritz Crackers 2 lb
Oreo Cookies 1 lb
Atlantic Salmon 1 lb
Dannon Yogurt 1 lb

So here the data are paired (pairing on each item) and we would perform a PAIRED t test.

How to conduct the ANALYSIS:
Hypothesis:
$$H_0: \mu_d = 0 \quad \text{vs.} \quad H_1: \mu_d \neq 0$$
where $\mu_d$ is the mean of the price differences in the population of items.

To find the test statistic we first take the differences (price at IGA minus price at Safeway), one per item. From the differences calculate $\bar{d}$ (the mean of the differences) and $s_d$ (the standard deviation of the differences).

The test statistic is
$$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$$
where $\mu_d$ takes its value under H0 and n is the number of pairs.
Reject H0 if observed |t| > t(α/2, n-1).

Study 2: DESIGNING THE STUDY
Here we randomly pick 5 males and 5 females from the class and get their GRE Quant scores. There is no pairing between the males and the females.

Layout:

Males GRE Score    Females GRE Score

The data are not paired, so we treat the two samples as independent.
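Returning briefly to Study 1, the paired analysis can be sketched in Python. This is only an illustration under stated assumptions: the function name `paired_t` is hypothetical, the prices are made up (no data were collected in these notes), and the critical value 2.262 is t(0.025, 9) read from a t table for a two-sided test at α = 0.05 with n = 10 pairs.

```python
import math
import statistics


def paired_t(x, y, mu_d0=0.0):
    """Return (d_bar, s_d, t) for the paired t test of H0: mu_d = mu_d0."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    # One difference per pair (here: price at IGA minus price at Safeway).
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    d_bar = statistics.mean(d)
    s_d = statistics.stdev(d)  # sample standard deviation, divisor n - 1
    t = (d_bar - mu_d0) / (s_d / math.sqrt(n))
    return d_bar, s_d, t


# Hypothetical prices in dollars, invented purely for illustration.
iga = [3.49, 1.29, 0.99, 3.99, 6.49, 2.79, 4.29, 3.99, 9.99, 2.49]
safeway = [3.29, 1.49, 1.19, 3.79, 6.99, 2.59, 4.49, 4.29, 10.49, 2.69]

d_bar, s_d, t_obs = paired_t(iga, safeway)
T_CRIT = 2.262  # t(0.025, 9) from a t table; assumes two-sided alpha = 0.05

print(f"d_bar = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t_obs:.3f}")
print("reject H0" if abs(t_obs) > T_CRIT else "fail to reject H0")
```

Note that the function works on the n differences only; the two price columns never enter the calculation separately, which is exactly what pairing on the item means.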
Study 2: ANALYZING THE DATA
Hypothesis:
$$H_0: \mu_1 = \mu_2 \quad \text{vs.} \quad H_1: \mu_1 \neq \mu_2$$

We will also be given (or we can calculate) the sample means $\bar{y}_1, \bar{y}_2$, the sample standard deviations $s_1, s_2$ and the sample sizes $n_1, n_2$.

RECALL: HERE we assume that both populations have equal variance.

Calculate
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
This is the “pooled” variance. Then $s_p$ is the “pooled” standard deviation.

Define the pooled t-statistic as follows:
$$t = \frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
With $\mu_1 - \mu_2 = 0$ under H0, this follows a t distribution with (n1+n2-2) degrees of freedom.
Reject H0 in favor of H1 if observed |t| > t(α/2, n1+n2-2).

This is essentially a review for all of you. But I want you to think of things from a DESIGN perspective now.
1. WHY did we randomly select the items or students?
2. WHY did we use more than one item in our study?
3. WHAT advantage does pairing in STUDY 1 give us?

The answers to these questions lead to the basic tenets of experimental design as suggested by Sir R. A. Fisher:
1. Randomization
2. Replication
3. Local Control

Randomization: This is the procedure of selecting units at random from the available units, or assigning units to treatments at random. This reduces bias.

Replication: This means using more than one unit per treatment in the comparison. This lets us estimate experimental error and makes the comparison more precise.

Local Control: This is the process of stratifying the units into homogeneous groups, or blocks, and assigning treatments at random within each homogeneous group. This reduces bias and reduces experimental error.

In ANY Design context think of these three tenets.

Some Definitions and Vocabulary in the context of DESIGN:

1. Factor, Levels, Treatment:
Factor: any variable (a substance, item or condition) whose effect on the response is to be studied. An experiment involving two or more factors is called a factorial experiment.
Levels: the values of the factor used in the experiment. The levels of a factor are the specific types or amounts of the factor that will actually be used in the experiment.
For example, in an experiment to assess the effects of different amounts of UV radiation upon the growth rate of smolt, the UV radiation was held at normal, 1/2 normal, and 1/5 normal levels. These would be the three levels of the factor UV Radiation, and we could call them TREATMENTS.

2. UNIT:
Experimental unit: the unit to which the treatment is applied.
Observational unit (or measurement unit): the unit on which the response is measured.
In some cases, the observational unit may be different from the experimental unit - be careful!

CAUTION: A common mistake in the analysis of experimental data is to confuse the experimental and observational units. For example, consider an experiment to investigate the effects of UV levels on the growth of smolt. Two tanks are prepared; one tank has high levels of UV light, the second tank has no UV light. Many fish are placed in each tank. The individual fish are measured. In this experiment, the observational unit is the smolt, but the experimental unit is the tank. The treatments are NOT individually administered to single fish.

3. Block: A block is a homogeneous group of units.

4. Replicate: The replicates are the multiple units that receive the same treatment.

5. Response Variable: The outcome that is being measured. For example, in an experiment to measure smolt growth in response to UV levels, the response variable for each smolt could be final weight after 30 days.

6. Experimental error: the variation among identically treated experimental units.

Terminology: Types of Studies
Comparative experimental studies are studies in which the treatments or conditions are assigned by the researcher to the experimental units.
Comparative observational studies are studies in which the treatments or conditions are only observed, not assigned, by the researcher.

Examples: Let’s figure out the following for our two studies.

Study 1: Comparing prices at IGA and Safeway
1. Factor
2. Level
3. Treatment
4. Response Variable
5. Block
6. Replicate
7. Whether it’s an experiment or an observational study

Study 2: Comparing GRE Quant scores of males and females
1. Factor
2. Level
3. Treatment
4. Response Variable
5. Block
6. Replicate
7. Whether it’s an experiment or an observational study

a. An agricultural experimental station is going to test two varieties of wheat. Each variety will be planted on 3 fields, and the yield from each field will be measured.
1. Factor
2. Level
3. Treatment
4. Response Variable
5. Block
6. Replicate
7. Whether it’s an experiment or an observational study

b. An agricultural experimental station is going to test two varieties of wheat. Each variety will be tested with two types of fertilizer. Each variety-fertilizer combination will be applied to two plots of land. The yield will be measured for each plot.
1. Factor
2. Level
3. Treatment
4. Response Variable
5. Block
6. Replicate
7. Whether it’s an experiment or an observational study

c. Fish farmers want to study the effect of an anti-bacterial drug on the amount of bacteria in fish gills. The drug is administered at three dose levels (none, 20, and 40 mg/100L). Each dose is administered to a large controlled tank through the filtration system. Each tank has 100 fish. At the end of the experiment, the fish are killed, and the amount of bacteria in the gills of each fish is measured.
1. Factor
2. Level
3. Treatment
4. Response Variable
5. Block
6. Replicate
7. Whether it’s an experiment or an observational study
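As a companion to the Study 2 analysis earlier in the lecture, the pooled two-sample t statistic can be sketched in Python. This is an illustration, not a prescribed implementation: the function name `pooled_t` is hypothetical, the GRE scores are made up, the equal-variance assumption is taken as given, and the critical value 2.306 is t(0.025, 8) read from a t table for a two-sided test at α = 0.05 with 5 + 5 - 2 = 8 degrees of freedom.

```python
import math
import statistics


def pooled_t(y1, y2, delta0=0.0):
    """Return (sp2, t, df) for the pooled t test of H0: mu1 - mu2 = delta0.

    Assumes two independent samples from populations with equal variance.
    """
    n1, n2 = len(y1), len(y2)
    s1_sq = statistics.variance(y1)  # sample variance, divisor n1 - 1
    s2_sq = statistics.variance(y2)  # sample variance, divisor n2 - 1
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their df.
    sp2 = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df
    se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    t = (statistics.mean(y1) - statistics.mean(y2) - delta0) / se
    return sp2, t, df


# Hypothetical GRE Quant scores, invented purely for illustration.
males = [160, 155, 165, 158, 162]
females = [161, 157, 166, 159, 163]

sp2, t_obs, df = pooled_t(males, females)
T_CRIT = 2.306  # t(0.025, 8) from a t table; assumes two-sided alpha = 0.05

print(f"sp^2 = {sp2:.3f}, t = {t_obs:.3f}, df = {df}")
print("reject H0" if abs(t_obs) > T_CRIT else "fail to reject H0")
```

Unlike the paired analysis, the two samples here are summarized separately (two means, two variances) and only then combined, which is why the degrees of freedom are n1+n2-2 rather than n-1.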