Assignment #3, Biometry - STAT 305, Spring 2014
Conditional Probabilities, Relative Risk, Odds Ratio, Mosaic Plots, Correspondence Analysis, and Bayes' Rule & Medical Screening Tests

1. Low Birth Weight Risk Factors (Lowbirth.JMP)

The purpose of this study was to identify potential risk factors for low birth weight. The following categorical variables were measured: previous history of premature labor (Prev?), hypertension during pregnancy (Hyper?), smoking, uterine irritability during pregnancy (Uterine), and minority status (Minority).

a) For each of the FIVE potential risk factors, calculate P(Low | risk factor present) and P(Low | risk factor absent). What do these tell you about each potential risk factor? (8 pts.)
b) Use your answers in part (a) to calculate the relative risk (RR) associated with each factor, and interpret it. (4 pts.)
c) Calculate the odds ratio (OR) associated with each characteristic. Discuss. (4 pts.)
d) Which factor do you think poses the greatest risk of having a child with low birth weight? Which poses the least? Explain your answers. (3 pts.)

To answer part (d), complete the table below.

Risk Factor                               RR     OR     Rank
Smoked During Pregnancy
History of Premature Labor
Hypertensive During Pregnancy
Mother is an Ethnic Minority
Uterine Irritability During Pregnancy

2. Myocardial Infarctions and Oral Contraceptive Use (Case-Control Study)

CODING USED IN TABLES:
Case-Control Status: 1 = Case (Myocardial Infarction (MI)), 2 = Control
Oral Contraceptive Use?: 1 = Yes, 2 = No
Age Group: 1 = 25 – 29 yrs., 2 = 30 – 34 yrs., 3 = 35 – 39 yrs., 4 = 40 – 44 yrs., 5 = 45 – 49 yrs.

Overall Table (aggregated across age group)

a) What is the OR for myocardial infarction associated with oral contraceptive use for all women in this study? Use the table above. Interpret the resulting OR. (3 pts.)
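Problem 1(a)-(c) and Problem 2(a) all work from a 2 × 2 table of factor (present/absent) by outcome (present/absent). The sketch below is a minimal illustration, assuming the cell labels a, b, c, d and the class name `TwoByTwo` (both hypothetical, not from the data files). It computes the two conditional probabilities, their ratio (RR), and the cross-product ratio (OR). In a case-control design such as Problem 2, the proportion of cases is fixed by the investigators, so only the OR is meaningful there; the same `odds_ratio` method applies to each age-specific table in Problem 2(b).

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from a 2 x 2 table (assumed layout: rows = factor present/absent,
    columns = outcome present/absent, e.g. Low / not Low birth weight)."""
    a: int  # factor present, outcome present
    b: int  # factor present, outcome absent
    c: int  # factor absent, outcome present
    d: int  # factor absent, outcome absent

    def risk_present(self) -> float:
        """P(outcome | factor present) = a / (a + b)."""
        return self.a / (self.a + self.b)

    def risk_absent(self) -> float:
        """P(outcome | factor absent) = c / (c + d)."""
        return self.c / (self.c + self.d)

    def relative_risk(self) -> float:
        """RR = P(outcome | present) / P(outcome | absent)."""
        return self.risk_present() / self.risk_absent()

    def odds_ratio(self) -> float:
        """OR = (a * d) / (b * c), the cross-product ratio.
        Raises ZeroDivisionError if b or c is zero."""
        return (self.a * self.d) / (self.b * self.c)


# Hypothetical counts for illustration only (not taken from Lowbirth.JMP).
table = TwoByTwo(a=20, b=30, c=10, d=40)
print(table.risk_present(), table.risk_absent())  # 0.4 0.2
print(table.relative_risk(), table.odds_ratio())  # RR = 2.0, OR = 800/300 (about 2.67)
```

With these made-up counts the OR (about 2.67) is larger than the RR (2.0); the two agree closely only when the outcome is rare in both groups.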
If we take the age of the women into account using the ordinal age group variable defined above, we obtain the following 2 × 2 tables relating MI status and OC use status.

Age-Specific Tables (Age Groups 1 through 5, each with its estimated odds ratio ÔR)

b) For each of the age-specific tables above, calculate the OR for having a myocardial infarction associated with being an oral contraceptive user. (5 pts.)

Age Group    1      2      3      4      5
ÔR

c) When comparing these odds ratios to the overall odds ratio (ignoring age) from part (a), what do we find? How can these results be explained and, more importantly, what does this say about the risk of having an MI associated with oral contraceptive (OC) use? Hint: Think about how the age of the women would be related to oral contraceptive use and to myocardial infarction status. Explain. (4 pts.)

3. HIV ELISA Tests

The enzyme-linked immunosorbent assay (ELISA) test was the main test used in 1985 to screen blood samples for antibodies to HIV (rather than the virus itself). It gives a measured mean absorbance ratio (MAR) for HIV (previously called HTLV) antibodies. The table below gives the absorbance ratio values for 297 healthy blood donors and 88 HIV patients. Healthy donors tend to give low ratios, but some are quite high, partly because the test also responds to some other types of antibody, such as antibodies to human leucocyte antigen (HLA). HIV patients tend to give high ratios, but a few give lower values because they have not been able to mount a strong immune reaction. To use this test in practice we need a cutoff value: those who fall below the value are deemed to have tested negative, and those above it to have tested positive.
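The cutoff rule just described can be sketched in Python using the counts from the MAR table that follows. This is an illustration under two stated assumptions: each bin is represented by its lower edge, so "MAR > 3" is read as the 3 – 3.99 bin and above, and the false-positive and false-negative rates follow the common convention P(T+ | D-) and P(T- | D+). The names `BIN_LOWER` and `screening_rates` are hypothetical.

```python
# Binned MAR counts copied from the table in Problem 3.
# Each bin is represented by its lower edge: <2, 2-2.99, 3-3.99, 4-4.99, 5-5.99, 6-11.99, 12+
BIN_LOWER = [0, 2, 3, 4, 5, 6, 12]
HEALTHY = [202, 73, 15, 3, 2, 2, 0]  # 297 healthy donors (D-)
HIV = [0, 2, 7, 7, 15, 36, 21]       # 88 HIV patients (D+)


def screening_rates(cutoff: float) -> dict:
    """Classify every bin whose lower edge is >= cutoff as a positive test.
    Assumption: 'MAR > 3' means the 3-3.99 bin and above."""
    tp = sum(n for lo, n in zip(BIN_LOWER, HIV) if lo >= cutoff)      # D+, T+
    fn = sum(HIV) - tp                                                # D+, T-
    fp = sum(n for lo, n in zip(BIN_LOWER, HEALTHY) if lo >= cutoff)  # D-, T+
    tn = sum(HEALTHY) - fp                                            # D-, T-
    return {
        "sensitivity": tp / (tp + fn),          # P(T+ | D+)
        "specificity": tn / (tn + fp),          # P(T- | D-)
        "false_positive_rate": fp / (fp + tn),  # P(T+ | D-) = 1 - specificity
        "false_negative_rate": fn / (fn + tp),  # P(T- | D+) = 1 - sensitivity
    }


for cutoff in (3, 4):
    print(cutoff, screening_rates(cutoff))
```

Raising the cutoff moves bins from "positive" to "negative" for both groups at once, which is why sensitivity and specificity trade off against each other.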
Any such cutoff will inevitably misclassify some people without HIV as having a positive HIV test (which will be a huge emotional shock), and some people with HIV as having a negative HIV test (with consequences for their own health, the health of people around them, the integrity of the blood bank, etc.).

MAR (mean absorbance ratio)   <2   2–2.99   3–3.99   4–4.99   5–5.99   6–11.99   12+   Total
Healthy Donors               202       73       15        3        2         2     0     297
HIV Patients                   0        2        7        7       15        36    21      88

a) If we regard a MAR value > 3 as a positive test result for HIV, what are the sensitivity, specificity, false-positive rate, and false-negative rate of the ELISA test? (4 pts.)

Interesting fact: The Economist (July 4, 1992) told the story of a young American who committed suicide on learning that he had tested positive for HIV.

b) In 1992 the number of Americans who were HIV positive was estimated to be 218,301 out of a population of 252.7 million. Assuming this estimate is correct, what is the probability that a randomly selected American is HIV positive, i.e., P(D+)? (1 pt.)
c) Using the estimate from part (b), and given no other information about the young American referred to above, what is the probability that he actually had HIV given a positive ELISA test result, i.e., the Positive Predictive Value P(D+|T+)? (3 pts.)
d) If the ELISA test for a blood sample is negative, what is the probability that the blood sample is actually HIV free, i.e., the Negative Predictive Value P(D-|T-)? Again, use your answer from part (b) in this calculation. (3 pts.)
e) If we changed the MAR cutoff value to > 4, would each of the following probabilities increase or decrease? Calculate them, then state whether each has increased or decreased. (6 pts.)
Sensitivity
Specificity
False-negative rate
False-positive rate
Positive predictive value
Negative predictive value
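Parts (c) and (d) are applications of Bayes' rule: the test's sensitivity and specificity are combined with the prevalence P(D+) from part (b). The function below is a minimal sketch of that calculation; the name `predictive_values` is hypothetical, and the sensitivity and specificity plugged in assume "MAR > 3" means the 3 – 3.99 bin and above.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) via Bayes' rule.

    PPV = P(D+ | T+) = sens * p / [sens * p + (1 - spec) * (1 - p)]
    NPV = P(D- | T-) = spec * (1 - p) / [spec * (1 - p) + (1 - sens) * p]
    """
    p = prevalence
    true_pos = sensitivity * p               # P(T+ and D+)
    false_pos = (1 - specificity) * (1 - p)  # P(T+ and D-)
    true_neg = specificity * (1 - p)         # P(T- and D-)
    false_neg = (1 - sensitivity) * p        # P(T- and D+)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Prevalence from part (b); sensitivity/specificity from the MAR > 3 rule,
# reading "> 3" as the 3-3.99 bin and above (an assumption).
prevalence = 218_301 / 252.7e6
ppv, npv = predictive_values(86 / 88, 275 / 297, prevalence)
print(f"PPV = {ppv:.4f}, NPV = {npv:.6f}")
```

When P(D+) is this small, the false positives drawn from the very large healthy population far outnumber the true positives, so the PPV can be low even though sensitivity and specificity are both high.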
4. Pottery Fragments in an Archeological Dig

Data File: Arch-pottery.JMP in the Biometry JMP folder
Key Words: Bar graphs, Mosaic Plots, Conditional Probabilities, Correspondence Analysis

Review the JMP tutorial Bivariate Displays for Categorical Data… before beginning this problem. The purpose is to understand the distribution of certain pottery types within seven different archeological dig sites. The variables in the data file are:
Site - dig site (P0, P1, ..., P6)
Pottery Type - A, B, C, or D
Freq - number of fragments of each pottery type found within each site

a) Using the Distribution of Y option, examine univariate displays for both Site and Pottery Type. Briefly summarize what is displayed in each. (2 pts.)
b) Use the Fit Y by X option to examine the relationship between Site (X) and Pottery Type (Y), using the mosaic plot and correspondence analysis. Discuss. (3 pts.)
c) Using JMP, calculate all of the conditional probabilities of the form P(Pottery Type | Site). These are easily obtained by looking at the Row %'s in the contingency table. Use these probabilities to compare and contrast the sites in terms of the pottery types found in each, and discuss them briefly. How do these probabilities relate to the results of your graphical analysis in part (b)? (3 pts.)
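The Row %'s in part (c) are each cell count divided by its site total, which is exactly P(Pottery Type | Site). The sketch below reproduces that arithmetic outside JMP for data in the Site / Pottery Type / Freq layout described above. The function name `row_percentages` is hypothetical, and the rows in the example are made-up counts, not the contents of Arch-pottery.JMP.

```python
from collections import defaultdict


def row_percentages(records):
    """Return {(site, pottery_type): P(Pottery Type | Site)} from
    (site, pottery_type, freq) rows, i.e. the Row %'s of the Site x Type table."""
    site_totals = defaultdict(int)  # total fragments found at each site
    cell_counts = defaultdict(int)  # fragments of each type at each site
    for site, ptype, freq in records:
        site_totals[site] += freq
        cell_counts[(site, ptype)] += freq
    return {key: n / site_totals[key[0]] for key, n in cell_counts.items()}


# Hypothetical rows in the Site / Pottery Type / Freq layout (not the real data).
rows = [("P0", "A", 30), ("P0", "B", 10), ("P1", "A", 5), ("P1", "C", 15)]
for (site, ptype), prob in sorted(row_percentages(rows).items()):
    print(f"P({ptype} | {site}) = {prob:.2f}")
```

The probabilities within each site sum to 1. In a mosaic plot these are the heights of the stacked segments within each site's column.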