Statistics 402C
May 5, 2004
Final Exam
Name:

INSTRUCTIONS: Read the questions carefully and completely. Answer the questions and show work in the space provided. This is the only work that I will look at. Partial credit will not be given if work is not shown. Be sure to answer all questions within the context of the problem. Refer to the computer printout and graphs provided when appropriate. Pace yourself; do not spend too much time on any one problem. Point values for each problem are given.

1. [40 pts] Traffic engineers are interested in how average traffic speed is affected by erecting signs that say “Accident Reduction Project Area” and by metering the flow of vehicles onto freeway on-ramps. Twenty similar freeway interchanges are chosen. Each interchange has a traffic light at the on-ramp. The interchanges are spread widely around a single large metropolitan area in Southern California. Ten of the interchanges are chosen at random to get “Accident Reduction Project Area” signs; the other ten get no signs. The traffic lights can be turned off (no minimum time between vehicles) or set to require 3 or 6 seconds between entering vehicles. Average traffic speed during “rush hour” will be measured at each interchange on three consecutive Tuesdays in June. At each interchange, the three settings of traffic lights are assigned at random to the three Tuesdays. Refer to the JMP output entitled Accident Reduction Project.

(a) [4] What are the treatments, experimental units and response?

(b) [6] Explain why this is a split-plot design. Be sure to mention what the whole-plot factor and the sub-plot factor are.

(c) [6] Below are main effect plots of average speed. Describe the apparent effects of each factor.

(d) [4] Below is an interaction plot. Is there an apparent interaction between the two factors? Be sure to indicate what you see in the plot that supports your answer.
(e) [5] Does the average speed of traffic differ significantly for interchanges with signs when compared to those without signs? Support your answer statistically.

(f) [5] Is there a significant interaction between timing and signs? Support your answer statistically.

(g) [5] Is there a significant difference in average speed of traffic for the three traffic light timings? Support your answer statistically.

(h) [5] Construct an adjLSD (use t associated with 98% confidence) for comparing the three traffic light timings. Which, if any, traffic light timings are significantly different?

2. [40 pts] A study was done involving a large number of family doctors. Each doctor was given a “fable” about a female patient less than 18 years old. After hearing the “fable” the doctor was asked whether she/he would keep patient confidentiality and not inform the patient’s parents. There were 16 different “fables” constructed by factorial crossing of four factors. The percentage of doctors who would keep patient confidentiality and not inform the patient’s parents for each treatment combination is recorded.

Factor                                       Low Level (−1)      High Level (+1)
M: Maturity of patient                       immature for age    mature for age
L: Length of time doctor has known family    less than 1 year    more than 5 years
A: Age of patient                            14 years            17 years
C: Complaint                                 drug problem        venereal disease

(a) [4] What are the response, conditions (treatments) and units in this experiment?

(b) [8] Below is a plot of the factor level means. Describe the effect of each of the four factors.

(c) [5] Below are the estimated full effects and a normal plot of those effects. Identify and label on the normal plot the estimated full effects that appear to be significant.
Effect    Estimate    SS
M          0.1369     0.0749
L         −0.0181     0.0013
A          0.1549     0.0959
C          0.2364     0.2235
ML        −0.0269     0.0029
MA        −0.0264     0.0028
MC        −0.0214     0.0018
LA         0.0326     0.0043
LC         0.0351     0.0049
AC         0.0316     0.0040
MLA        0.0034     0.0000
MLC       −0.0056     0.0001
MAC       −0.0286     0.0033
LAC       −0.0221     0.0020
MLAC       0.0016     0.0000

(d) [6] Use the 3- and 4-way interaction terms to compute a substitute estimate of error variability. Be sure to indicate how many degrees of freedom are associated with this estimate.

(e) [6] It appears that factor L: Length of time doctor has known family is not important either by itself or with any other factors. Given that factor L is no different from error, explain HOW you could compute an estimate of error variability. The estimate must be different from the one in (d). How many degrees of freedom are associated with this estimate? DO NOT COMPUTE THE ESTIMATE OF ERROR VARIABILITY.

(f) [6] Below is the analysis of the data using just factors M, A and C.

MSError = 0.00194

Source    df    Sum of Squares    F-Ratio    Prob > F
M          1    0.0749             38.6      0.0003
A          1    0.0959             49.4      0.0001
MA         1    0.0028              1.4      0.2656
C          1    0.2235            115.1      <0.0001
MC         1    0.0018              0.9      0.3604
AC         1    0.0040              2.1      0.2891
MCA        1    0.0033              1.7      0.2301

According to this analysis, what factor(s) and/or interaction(s) are statistically significant? Be sure to support your answers by referring to the analysis.

(g) [5] Using only those terms that are statistically significant and the fact that the overall mean response is 0.660, give the prediction equation. Use it to predict the percentage of family doctors who would keep confidentiality for a 17-year-old female patient with a drug problem. This patient appears immature for her age and the doctor has known the family for less than one year.

3. [24 pts] Name that design! For each of the following scenarios indicate what design is used.
Indicate the factors of interest and the nuisance factors, and provide an ANOVA table listing all sources of variation and associated degrees of freedom.

(a) [8] A company that cuts and freezes french fries wants to know which machine of the four they own produces the most waste when cutting the fries. The four fry cutters, and their operators, are constantly in use. Different operators may produce different amounts of waste. Each day a new load of potatoes is used and there will be day-to-day variation in waste due to the size and shape of potatoes in a day's load. Each operator will operate each machine once. Each operator will operate a different machine each day.

(b) [8] An ornithologist is interested in the time it takes red-shouldered hawks to respond to calls of other birds that may invade their territory. The type of forest (old growth or new growth) may have an effect. Also the type of intruding bird may have an effect. There may also be an interaction between type of forest and type of bird. Two forests, one old growth and one new growth, are used. In each forest, ten nests are chosen at random from known nesting sites. At each nest, two pre-recorded calls are played over a loudspeaker. One call is a red-shouldered hawk call, the other is a great horned owl call. The calls are played several days apart and the order is randomized for each nest. The response is the time until the nesting hawks leave the nest to drive off the intruder.

(c) [8] A study is performed to investigate the effect of depth of planting (2 levels) and date of planting (3 levels) on corn yields. The study is performed using 6 plots near Nevada, IA, 6 plots near Clear Lake, IA, 6 plots near New Ulm, MN and 6 plots near Decorah, IA. The six combinations of depth and date will be randomly assigned to plots at each location.

4. [24 pts] For each of the following situations give the response, conditions of interest and units. Explain what design you would use to accomplish the purpose stated.
(a) [8] The investigator wishes to see if smoking a marijuana cigarette changes people’s heart rate.

(b) [8] In order to track migration, butterflies will be marked and released. The placement of the mark may affect how attractive the butterflies are to predators. The investigator wishes to see if the placement of a mark on the wing of different butterfly species affects the chances of successfully migrating. There are six locations on the wing and two species of butterfly.

(c) [8] Do students from different colleges score differently on multiple choice and problem solving statistics tests? Students from Engineering, Business, Agriculture and Liberal Arts and Sciences Colleges will participate. Each student will take both a multiple choice test and a problem solving test over the same material in an introductory statistics course. The order of the tests, multiple choice or problem solving first, will be randomized for each student.

5. [7 pts] A friend comes to you for advice on an experiment she is going to conduct with pigs. There are two types of pigs she can use. In addition, there is a second factor involving the amount of antibiotic they will be given in their food (none, low level and high level). She will factorially cross type of pig and amount of antibiotic. She does not have to pay for the antibiotic as it is being provided to her by the manufacturer.

(a) [3] If she wants to be able to detect a two standard deviation difference in treatment means with alpha of 0.05 and beta of 0.10, how many pigs of each type does she need?

(b) [4] When you tell her the number, she says that she can get that many of the first type of pig but twice that many of the second type. How many pigs should she use in her experiment? Explain briefly.

6. [15 pts] The three fundamental principles of a well-designed experiment are control of outside variables, randomization and replication.

(a) [5] Explain why control of outside variables is important.
(b) [5] Explain why blocking is important and how it differs from control of outside variables.

(c) [5] Explain why randomization is important and how it can help when there is an outside variable that you did not control.

You may pick up your corrected final exam and a copy of the critique of your project at my office starting Monday, May 10. If you cannot pick them up and would like them sent to you, write the address where you would like these things sent in the space below.