Chad Taylor
Math 1040: Statistics
Professor Alia Maw

Term Project Part 1

1.) For my Term Project, I have selected Data Set 5: IQ and Lead Exposure. These data come from the study “Neuropsychological Dysfunction in Children with Chronic Low-Level Lead Absorption,” by P. J. Landrigan, R. H. Whitworth, R. W. Baloh, N. W. Staehling, W. F. Barthel, and B. F. Rosenblum, Lancet, Vol. 1, Issue 7909.

2.) The data I selected come from a study comparing the Intelligence Quotients (IQ) of children living near a lead smelter with their blood lead levels over the course of two years. Several other variables were recorded for each subject, including age (in years) and gender (coded as 1 for male and 2 for female). IQ scores were recorded on the standard IQ scale, and blood lead levels were measured in micrograms per 100 ml of blood once a year for two consecutive years; these two measurements are labeled YEAR1 and YEAR2. Exactly how the data were collected is unclear, although interviews, IQ tests, and blood draws were most likely used. At first glance, there is no noticeable correlation between IQ and blood lead levels, but this can be revisited after the analysis if necessary.

3.) For each variable in the data set: what it means and how it is measured (with units), whether it is quantitative or qualitative, and its level of measurement.

Lead: Given on a scale of 1, 2, or 3 representing low, medium, and high blood lead levels, respectively. Low lead represents < 40 μg Pb/100 ml blood in both years; medium lead represents ≥ 40 μg Pb/100 ml blood in one of the two years; high lead represents ≥ 40 μg Pb/100 ml blood in both years. Qualitative; ordinal.
Age: Age of subject in years. Quantitative; ratio.
Sex: Gender, coded as 1 for male and 2 for female. Qualitative; nominal.
Year 1: Blood lead level for the first year of observation (μg Pb/100 ml). Quantitative; ratio.
Year 2: Blood lead level for the second year of observation (μg Pb/100 ml). Quantitative; ratio.
IQV: Verbal IQ score. Quantitative; ratio.
IQP: Performance IQ score. Quantitative; ratio.
IQF: Full IQ score. Quantitative; ratio.

Term Project Part 2: Individual Portion

1.)
Figure 1: Randomly Selected Individuals' Genders
Figure 2: 35 Randomly Selected Individuals' Genders
Figure 3: Systematically Selected Individuals' Genders (Every 2nd Subject)
Figure 4: Systematically Selected Individuals' Genders (Every 2nd Subject)

2.) Obtaining the data for this question was straightforward. We used systematic sampling, which selects every nth individual from the original data set, where n is chosen by the person conducting the study (in this case, us). We chose to select every 2nd person. To do this, we first numbered the individuals from 1 to 121 (there were 121 participants), then discarded every odd-numbered individual, leaving only the even-numbered ones. We continued only until we had 35 data points, per the assignment requirements. These 35 systematically selected data points were then used in our graphs.

3.) As the graphs above show, whether we used 35 randomly selected individuals or 35 systematically selected individuals changed our results dramatically: in the systematic sample males were the majority, while in the random sample females were. This is likely due to the small sample size of only 35 individuals, and to the fact that the variable has only two possible values, male or female. The difference between the two selection methods would likely be less noticeable with a larger sample and a more complex variable.

4.)
In this case, systematic selection matched the population frequencies far more closely than random selection did: the systematic sample was 60% male, while the whole population was approximately 61.16% male (74 of 121). I believe this is merely a coincidence, since random selection could just as easily have produced the same ratio, especially with a larger sample. All in all, I learned that the larger the sample size, the more likely the results are to be accurate.

Figure 5: Systematically Selected Individuals' Genders (Every 2nd Subject)

Individual Term Project Part 3

1.) Summary Statistics (Year 1 blood lead, μg Pb/100 ml):

Sample Set   Mean        Std. dev.   Min   Q1   Median   Q3   Max
Random       34.6        13.229201   2     24   36       44   59
Systematic   32.085714   10.407496   2     27   32       40   53

2.)
Randomly Selected
Systematically Selected

3.) The distributions of these two samples differ noticeably from each other as well as from the population. Although both the random and the systematic samples are roughly normal, rising toward the middle, they contrast sharply in places. For instance, the heights of their central peaks (how many people had midrange lead levels) differ greatly, even though both samples contain only 35 individuals. Compared with the population, both samples seem to have noticeably smaller IQRs, which suggests that the middle half of the lead levels in these two samples is less spread out than in the population. The median also differs considerably between the two sampling methods (36 for the random sample versus 32 for the systematic sample).

Year 1 blood lead values (μg Pb/100 ml):

Random sample              Systematic sample (every 3rd subject)
Row I.D.   Year 1 Lead     Row I.D.   Year 1 Lead
107        49              3          30
118        51              6          29
120        44              9          24
84         41              12         29
114        40              15         30
121        42              18         28
46         27              21         35
76         18              24         19
8          24              27         22
24         19              30         32
98         51              33         2
93         40              36         38
16         29              39         24
95         44              42         36
14         36              45         33
68         2               48         24
88         44              51         29
116        48              54         28
1          25              57         27
32         20              60         10
20         21              63         34
39         24              66         23
70         33              69         32
60         10              72         35
96         45              75         34
73         35              78         36
34         36              81         40
19         34              84         41
79         42              87         51
35         24              90         43
10         31              93         40
108        52              96         45
11         21              99         42
100        50              102        53
111        59              105        45

Individual Term Project Part 4

1.) The interpretation of each confidence interval is essentially the same, but for completeness I will write each one out.
a. Our first confidence interval states that, given the sample proportion of men taken from our “population,” we are 95% confident that the interval we calculated contains the population proportion (barring any calculation errors).
b. Our second confidence interval states that, given the sample blood lead levels taken from our “population,” we are 95% confident that the interval (30.06 μg Pb/100 ml, 39.14 μg Pb/100 ml) contains the population mean (μ).
c. Our third confidence interval states that, given the sample blood lead levels taken from our “population,” we are 95% confident that the interval (11.26, 18.83) contains the population standard deviation (σ).

2.)
a. Unfortunately, a small decimal error crept into the calculation of our first interval, for the proportion of men. Before I could point out and correct it, the group paper had already been submitted. Nevertheless, I calculated the interval to be (.33, .64), which just barely contains the true proportion of men. This value, taken from the total population, is 74/121 ≈ .612.
b. Our second confidence interval was calculated accurately: the population mean, calculated from the entire data set, is 34.6, which falls well within our interval of (30.06 μg Pb/100 ml, 39.14 μg Pb/100 ml).
c.
Finally, our third confidence interval, for the standard deviation, (11.26, 18.83), fully contains the standard deviation of the entire population, which StatCrunch gives as 13.36.

Term Project Part 5: Individual Portion

1.) For this portion of the term project, I have selected a significance level of α = 5% (.05). This corresponds to a 95% confidence level, which is a very common choice.

2.) In the random sample of 35 individuals from Part 2 of this assignment, the proportion of men was 45.71%. This seems a little low to me, since I would expect the proportion of men to be 61.16% (the population value). My claim is that the population proportion of men is p = 0.6116. Therefore:
H0: p = 0.6116
H1: p ≠ 0.6116
I used my TI-83 calculator to compute both the z and p values, but for completeness, here is the calculation of the test statistic z:
z = (p̂ - p)/√(pq/n) = (.4571 - .6116)/√((0.6116)(0.3884)/35) = -2.56
z = -2.56 and p = .0106. Since p is less than α, we reject H0: there is sufficient evidence to warrant rejection of the claim that the proportion of men in the population is 0.6116.

3.) In Part 3 of this assignment, we took the Year 1 blood lead levels of 35 randomly selected individuals. The mean blood lead level of these individuals was 34.6 μg Pb/100 ml blood, with a standard deviation of 13.229 μg Pb/100 ml blood. The mean of the entire population of participants was 34.603 μg Pb/100 ml blood. My claim is that the population mean is μ = 34.603 μg Pb/100 ml blood. Therefore:
H0: μ = 34.603
H1: μ ≠ 34.603
I used my TI-83 calculator to compute both the t and p values, but for completeness, here is the calculation of the test statistic t:
t = (x̄ - μ)/(s/√n) = (34.6 - 34.603)/(13.229/√35) = -0.0013.
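The t statistic and p-value above can also be cross-checked without a TI-83. The short Python sketch below is only an illustration, not the method used in this project. It takes the 35 randomly selected Year 1 lead values listed in Part 3, computes the sample mean and standard deviation, forms t = (x̄ - μ)/(s/√n) with μ = 34.603, and approximates the two-sided p-value by numerically integrating the Student t density with Simpson's rule, using only the standard library. The names RANDOM_YEAR1, MU_0, t_pdf, and t_two_sided_p are my own hypothetical helpers.

```python
import math
import statistics

# The 35 randomly selected Year 1 blood lead values (μg Pb/100 ml) listed in Part 3.
RANDOM_YEAR1 = [49, 51, 44, 41, 40, 42, 27, 18, 24, 19, 51, 40, 29, 44, 36,
                2, 44, 48, 25, 20, 21, 24, 33, 10, 45, 35, 36, 34, 42, 24,
                31, 52, 21, 50, 59]

MU_0 = 34.603  # population mean claimed in H0 (hypothetical constant name)


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Approximate two-sided p-value, 1 - 2*P(0 < T < |t|), via Simpson's rule.

    `steps` must be even for Simpson's rule.
    """
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # Simpson weights: 4 on odd nodes, 2 on even
        total += weight * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return 1 - 2 * area


n = len(RANDOM_YEAR1)
xbar = statistics.mean(RANDOM_YEAR1)
s = statistics.stdev(RANDOM_YEAR1)  # sample standard deviation (divides by n - 1)
t_stat = (xbar - MU_0) / (s / math.sqrt(n))
p_value = t_two_sided_p(t_stat, n - 1)  # df = n - 1 = 34

print(f"n = {n}, xbar = {xbar:.4f}, s = {s:.6f}")
print(f"t = {t_stat:.4f}, p = {p_value:.4f}")
```

With these values the sketch should reproduce the summary statistics from Part 3 (mean 34.6, standard deviation about 13.229201) and agree with the calculator results of t ≈ -0.0013 and p ≈ 0.9989. Hand-rolling the t distribution keeps the example dependency-free; in practice a library routine such as SciPy's scipy.stats.ttest_1samp would normally be used instead.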
t = -0.0013 and p = .9989. We therefore fail to reject H0. Since p is much greater than α, there is not sufficient evidence to warrant rejection of the claim that the population mean is μ = 34.603.
a. Although hypothesis tests are often accurate and helpful, we have read and been taught that they are not always correct. We see this in the proportion hypothesis test in question 2. In this case we actually knew the population proportion, but the simple random sample we took simply did not represent the population accurately; in fact, it was wildly off (45.71% male versus 61.16% in the population). Because the null hypothesis was actually true, rejecting it was a type I error. This was a fantastic example of hypothesis testing going wrong.
b. On the other hand, the hypothesis test for the population mean went much more according to plan. Our random sample mean almost exactly matched the population mean, which produced a t value very close to zero and a p-value very close to 1. Coming right after the previous example, this was a great illustration of how reassuring hypothesis testing can be when deciding whether a population parameter can be estimated using a sample statistic. All in all, I learned a great deal from this assignment, and it really helped me tie module 8 together into one coherent idea.

Term Project Part 6: Reflection

I am very glad to have the opportunity at the end of this project to share my thoughts on the quality and helpfulness of this assignment. I learned many things from this project, especially how to work with other people. Oddly enough, besides the content of the assignment itself, I think the most important thing I learned is that working with peers can be difficult. Some people promise to accomplish something; others promise nothing. Some meet your standards of quality and hard work, and others don't. I found a delicate balance between delegation and self-motivation that held our group together and allowed us to finish assignments on time.
This was a great learning experience that will help me not only in my academic career, but in my future career as a surgeon as well. Unfortunately, I chose to take Genetics last semester, before I had a firm grasp of the statistics concepts I learned this summer. Since genetics revolves largely around statistics and the chances of events happening, it would have been very helpful to understand the actual meaning of critical values, p-values, and regression coefficients, and how they can all be analyzed together to create a larger picture. Lastly, I believe this project greatly changed the way I view statistics and its importance in real-world situations. I used to think that statistics simply meant “what is the likelihood that _____ will happen?”, whereas now I see deeper applications. For instance, my group studied IQ versus blood lead levels. Not only could you test whether there is a correlation between them, but you could also calculate predicted values, and potentially change health and safety codes to foster a more intelligent populace. All in all, I found this project very helpful for learning and applying the statistics concepts we worked to learn all summer. Thank you for your organization and dedication, and sorry for going over one page; I just had so much to say!