Data Analysis Project
Math 1040
Jazz Wilkey

Part I

I selected the exhale study for my data analysis project. I have to admit that I have no idea what this study is about, and I have less background information than anyone who will be reading this report. As I work through the data set and determine what the overall problem is, I will bring it to light.

Part II

I created two different samples from my population of 655 subjects, based on age and on how many subjects of each age were in the total population. Each sample is a random selection of 50 subjects. I created a pie chart and a Pareto chart for each sample.

The two samples are similar in that both show a roughly linear downward trend in the age counts, going from oldest to youngest, and a roughly linear upward trend in the cumulative percentage from 0 to 100%.

The samples differ in shape. Sample 1 shows a fairly steep initial drop in the number of individuals at the older ages, which then tapers off and flattens where there are more individuals within each age group. Its cumulative percentage has a very slight curve from 0 to 100%. Sample 2 shows a more distinct “X” pattern: the number of individuals in each age group is more even, and the cumulative percentage is more linear, almost a straight line from 0 to 100%. Compared with the Pareto charts, both pie charts appear fairly evenly distributed.

Part III

I created a frequency histogram and a boxplot for the population and for both samples. The population mean is 9.93 and the population standard deviation is 2.95. For sample 1 the mean and standard deviation are 10.06 and 3.23. For sample 2 the mean and standard deviation are 10.61 and 3.84. I used computer software to compute these values in an Excel spreadsheet. Comparing these values, the samples seem to be very similar.
There is only a .13 difference between the population mean and the sample 1 mean, and a .91 difference between the population mean and the sample 2 mean. For the standard deviations, there is a .28 difference between sample 1 and the population, and a .89 difference between sample 2 and the population. In all three data sets the histograms show an approximately normal distribution curve, unlike the charts that were made in Part II.

Part IV

The confidence intervals (CI) were calculated at a 95% confidence level for the population, sample 1, and sample 2, in that order. The population CI was (9.75, 10.11); the sample 1 CI was (9.16, 10.96); and the sample 2 CI was (9.78, 11.90). In all instances the intervals captured the parameters. The margins of error are .2261 for the population, .8953 for sample 1, and 1.0644 for sample 2.

Part V

With a level of significance of .05, hypothesis tests were done for the population proportion and the population mean.

For the proportion test, H0: p = .06 and H1: p ≠ .06. H0 (the null hypothesis) states the value being tested, and H1 (the alternative hypothesis) states the opposite of the null hypothesis. The z-score was computed as 42.94, the p-value was approximately 0, and p-hat = .4587. In this test we reject the null hypothesis because the p-value is less than the significance level of .05. The conclusion is that there is sufficient evidence that the population proportion is not .06.

For the hypothesis test of the mean, H0: μ = 8 and H1: μ ≠ 8. The t-score was 16.73 and the p-value was 1.47E-52. Because the p-value is far below .05, we reject H0. There is sufficient evidence that the population mean is not 8.

Part VI

In summary, I think this project was difficult because we did not know exactly what our parameters were. This has led to a lot of frustration among my classmates; we all struggled. There is a lot of math from this project that I can use in my real world, and over the years I have already used many of the ideas behind it.
In science classes you have to use or interpret a large amount of data and statistics, and be able to explain the information or understand where it came from. In Animal Genetics you use a lot of population parameters, and you need to understand the standard deviations that shift the population distribution curve one way or another, or that narrow or widen it. In most science-based careers statistical methods are used in everything. Again, being able to understand where data came from and how to manipulate it is a big concern. In research of any kind, and especially in my fields involving animals, genetics, and actual populations of animals or bacteria, the method of creating a hypothesis for an experiment and going through the steps to determine whether that hypothesis was correct or incorrect is extremely important.

Although I doubt I will get very heavily into doing the calculations myself, I do know that I need to understand what the data tells me, how that data was computed, and where it goes from there. This will help me greatly in my science-based career. It will also help me stand out among the many applicants as I look for a job.

This project also helps you get to know Excel very quickly. I have always been a proficient user of Excel, but I have never needed to do the types of calculations that were necessary for this project.
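
As a rough check on the confidence-interval and hypothesis-test sections, the Python sketch below recomputes the margins of error and the two test statistics from the summary numbers reported above. It is not the Excel workflow used for the project. It assumes that the margins of error were computed with the normal critical value (about 1.96) and the reported standard deviations, that the sample sizes are 655 for the population and 50 for each sample, and that both hypothesis tests used all 655 subjects. The function names are made up for this illustration.

```python
import math
from statistics import NormalDist  # standard library, Python 3.8+


def margin_of_error(sd, n, confidence=0.95):
    """Margin of error for a mean: z * sd / sqrt(n).

    Assumption: a normal (z) critical value is used, not a t value.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sd / math.sqrt(n)


def one_proportion_z(p_hat, p0, n):
    """z statistic for H0: p = p0, using the standard error under H0."""
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error


def one_mean_t(mean, mu0, sd, n):
    """t statistic for H0: mu = mu0."""
    return (mean - mu0) / (sd / math.sqrt(n))


def two_sided_p_from_z(z):
    """Two-sided p-value for a z statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Summary values as reported in this report; n = 655 and n = 50 come from Part II.
print("population margin of error:", round(margin_of_error(2.95, 655), 4))
print("sample 1 margin of error:  ", round(margin_of_error(3.23, 50), 4))
print("sample 2 margin of error:  ", round(margin_of_error(3.84, 50), 4))

z = one_proportion_z(p_hat=0.4587, p0=0.06, n=655)
print("proportion test z:", round(z, 2), " p-value:", two_sided_p_from_z(z))

t = one_mean_t(mean=9.93, mu0=8, sd=2.95, n=655)
print("mean test t:", round(t, 2))
```

Under these assumptions the results should land close to the reported margins of error and test statistics, with small differences from rounding. In both tests the p-value is below the .05 significance level, which is the condition for rejecting H0.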