Introductory Statistics for Laboratorians dealing with High Throughput Data sets
Centers for Disease Control

Evaluation of a Laboratory Diagnostic Procedure for Mycoplasma pneumoniae
• We believe that serum levels of the immunoglobulin M (IgM) antibody may have diagnostic significance for identification of Mycoplasma pneumoniae.
• The first thing we need to know is whether people who have the pneumonia show higher serum levels of the antibody.

Experimental Design
• We will select two groups of subjects
  – Experimental Group: Persons with clinically defined pneumonia
  – Control Group: Asymptomatic cases
• We will draw serum samples from each person and evaluate the serum level of immunoglobulin M antibody in each sample.

Step 1: State the Null and Alternative Hypotheses
• H0: The mean serum level for the experimental group will not be different from the mean serum level for the control group (no difference / nothing is happening)
• Ha: The mean serum level for the experimental group will be different from the mean serum level for the control group (there is a real difference / something is happening)

Select Statistical Test and Specify the Region of Rejection
• We will use a t-test for two independent samples
• We will have 20 people in each group (degrees of freedom = 20 + 20 - 2 = 38)
• We will reject the null hypothesis if a difference at least as large as the one we observe would occur fewer than 5 times in 100 when the null hypothesis is true (alpha = .05)

Conduct Experiment and Collect Data
Serum Levels of IgM: Data Table

Subject   Control Group   Experimental Group
1         59              97
2         57              75
3         69              78
4         30              85
5         34              60
6         68              75
7         64              87
8         27              87
9         77              78
10        62              93
11        60              63
12        47              76
13        61              49
14        83              80
15        82              51
16        62              76
17        57              65
18        57              66
19        56              66
20        56              65
Mean      58.67           75.60
SD        17.50           14.28

Note: the Mean and SD rows of this table appear to be computed from subjects 1 to 15 only. The Group Statistics below use all 20 subjects, which is why the two sets of values differ.

Compute the Test Statistic
Group Statistics: IgM Serum Level

Group                N    Mean   Std. Deviation   Std. Error Mean
Control Group        20   58.4   15.06966         3.369679
Experimental Group   20   73.6   12.94685         2.895005

Independent Samples Test for IgM Serum Level

Levene's Test for Equality of Variances: F = 0.00077, Sig. = 0.978001

t-test for Equality of Means
                                                                                                       95% Confidence Interval of the Difference
                              t         df         Sig. (2-tailed)   Mean Difference   Std. Error Difference   Lower      Upper
Equal variances assumed       -3.4215   38         0.001503          -15.2             4.442498                -24.1934   -6.20663
Equal variances not assumed   -3.4215   37.15644   0.001529          -15.2             4.442498                -24.2001   -6.19992

Accept or Reject H0:
• As seen in the previous table, if the two groups were samples from the same population (that is, if the true difference were zero), the probability of observing a difference at least this large is
• p = .001503
• That is less than our chosen alpha = .05
• Reject the null hypothesis.
• Conclude that the experimental group has significantly higher serum levels of IgM

Effectiveness of a Program to Increase Seatbelt Use Among High School Seniors
• We have developed a program for use with high school seniors to increase seatbelt use and wish to determine if the program is effective.

Experimental Design
• The school has a separate parking lot for seniors. There is only one entrance, and the students must swipe their ID to enter or leave the lot. A security camera positioned at the entrance photographs every driver as they enter and exit. This system has been in place for a couple of years.
• Students and their parents will sign a release granting permission to participate in the study.
• Two weeks later, unannounced, we will begin reviewing the security camera data and recording each driver's ID and whether he/she was wearing a seatbelt.
• We will record for 2 weeks before the program is presented. (Pretest)
• All seniors will then complete the course and accompanying workbook.
• Then we will record for another two weeks. (Posttest)
• Each student who regularly drives to school during the period (must drive at least 3 days a week during both pretest and posttest) will become a subject in the experiment.
• Each subject's score will be the percent of the time they were wearing a seatbelt when they exited the gate
  – (Number of times wearing seatbelt / number of times exiting) × 100
• We will have a pretest score and a posttest score for each person.

Step 1: State the Null and Alternative Hypotheses
• H0: The mean percent seatbelt usage on the posttest will not be different from the mean percent seatbelt usage on the pretest. (The program did nothing; nothing happened.)
• Ha: The mean percent seatbelt usage on the posttest will be different from the mean percent seatbelt usage on the pretest. (The program changed the seatbelt usage; it did something.)

Select Statistical Test and Specify the Region of Rejection
• We will use a t-test for paired samples
  – Paired samples = repeated measures = matched samples = pretest/posttest
• We will reject the null hypothesis if a difference at least as large as the one we observe would occur fewer than 5 times in 100 when the null hypothesis is true, i.e.:
• Alpha = .05
• In this case we don't know in advance how many subjects we will get, so we can't specify the degrees of freedom (number of subjects - 1) until after we finish data collection. That's OK as long as you specify alpha.

Conduct Experiment and Collect Data
Percent Seatbelt Use: Data Table

Subject   Pretest   Posttest
1         100       100
2         11        0
3         0         28
4         100       82
5         41        8
6         0         100
7         0         100
8         40        50
9         8         100
10        55        86
11        100       100
12        0         35
13        71        0
14        39        38
15        54        100
16        43        66
17        24        50
18        23        48
19        21        47
20        14        48
Mean      37.20     59.30
SD        33.94     35.15

Compute the Test Statistic
Paired Samples Statistics

           Mean   N    Std. Deviation   Std. Error Mean
Pretest    37.2   20   33.9374          7.588634
Posttest   59.3   20   35.15395         7.860662

Paired Samples Test

Paired Differences (Pair 1: Pretest - Posttest)
                                            95% Confidence Interval of the Difference
Mean    Std. Deviation   Std. Error Mean    Lower      Upper
-22.1   42.37663         9.475703           -41.9329   -2.26713

t-test
t          df   Sig. (2-tailed)
-2.33228   19   0.030838

Accept or Reject H0:
• As seen in the previous table, if the program had no effect (that is, if the true mean difference were zero), the probability of observing a mean difference at least this large is
• p = .030838
• That is less than our chosen alpha = .05
• Reject the null hypothesis.
• Conclude that the Posttest mean is significantly higher than the Pretest mean. The program significantly increased seatbelt usage among our high school seniors.
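
The two test statistics reported in this deck can be rechecked by hand from the data tables. The sketch below is one way to do that in Python using only the standard library; it is not the software that produced the original output tables. The function names `independent_t` and `paired_t` are illustrative, the pooled-variance formula is assumed to correspond to the "Equal variances assumed" row, and the two-tailed critical values in `T_CRIT` come from a standard t table rather than from the slides.

```python
import math
import statistics

# IgM serum levels transcribed from the data table (subjects 1-20).
control = [59, 57, 69, 30, 34, 68, 64, 27, 77, 62,
           60, 47, 61, 83, 82, 62, 57, 57, 56, 56]
experimental = [97, 75, 78, 85, 60, 75, 87, 87, 78, 93,
                63, 76, 49, 80, 51, 76, 65, 66, 66, 65]

# Percent seatbelt use transcribed from the data table (subjects 1-20).
pretest = [100, 11, 0, 100, 41, 0, 0, 40, 8, 55,
           100, 0, 71, 39, 54, 43, 24, 23, 21, 14]
posttest = [100, 0, 28, 82, 8, 100, 100, 50, 100, 86,
            100, 35, 0, 38, 100, 66, 50, 48, 47, 48]


def independent_t(a, b):
    """Pooled-variance t statistic and df for two independent samples."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(a)
                  + (n_b - 1) * statistics.variance(b)) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(a) - statistics.mean(b)) / std_error, df


def paired_t(first, second):
    """t statistic and df for paired samples, using first - second."""
    diffs = [x - y for x, y in zip(first, second)]
    n = len(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / std_error, n - 1


# Two-tailed critical values of t at alpha = .05, keyed by df
# (assumed values taken from a standard t table).
T_CRIT = {38: 2.024, 19: 2.093}

t_ind, df_ind = independent_t(control, experimental)
t_pair, df_pair = paired_t(pretest, posttest)

for label, t, df in [("IgM, independent samples", t_ind, df_ind),
                     ("Seatbelt, paired samples", t_pair, df_pair)]:
    # Reject H0 when |t| falls in the region of rejection.
    reject = abs(t) > T_CRIT[df]
    print(f"{label}: t = {t:.4f}, df = {df}, reject H0: {reject}")
```

Both statistics should come out close to the values in the output tables (about -3.42 with 38 degrees of freedom and about -2.33 with 19), and both fall beyond the critical value, which agrees with the decision to reject H0 in each example. Comparing |t| to a critical value is equivalent to comparing p to alpha; an exact p-value needs the t distribution, which the standard library does not provide.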