Data Analysis Project

Math 1040
Jazz Wilkey
Part I
I selected the exhale study for my data analysis project. I have to admit that I do not know what the
study itself is about, so I have less background information than anyone who will be reading this
report. As I work through the data set, I will bring to light what the overall problem appears to be.
Part II
I created two different samples from my population based on age, looking at how many subjects
fall in each age out of the total population of 655 subjects surveyed. Each sample is a random
selection of 50 subjects. I created a pie chart and a Pareto chart for each sample.
The two samples are similar in that, on both Pareto charts, the number of subjects per age follows a
fairly linear downward trend going from oldest to youngest, while the cumulative percentage follows
an upward trend from 0 to 100%.
The differences are in the shape of those trends. Sample 1 shows a fairly steep initial drop in the
number of individuals at the older ages, which then tapers off and flattens as more individuals fall
within each age group. Its cumulative percentage has a very slight curve from 0 to 100%. Sample 2
shows a more distinct “X” pattern: the number of individuals in each age group is more even, and the
cumulative percentage is more linear, almost a straight line from 0 to 100%. Compared with the
Pareto charts, both pie charts appear fairly evenly distributed.
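The sampling and Pareto steps described in this part can be sketched in Python. This is only an illustration: the list of ages is randomly generated as a stand-in for the 655 recorded ages (the real data set is not reproduced here), and the names `pareto_table` and `population_ages` are made up for the example.

```python
import random
from collections import Counter

def pareto_table(ages):
    """Return (age, count, cumulative percent) rows, largest count first."""
    counts = Counter(ages).most_common()  # sorted by descending count
    total = len(ages)
    rows, running = [], 0
    for age, count in counts:
        running += count
        rows.append((age, count, 100.0 * running / total))
    return rows

# Hypothetical stand-in for the 655 ages in the population (assumed range 3 to 19).
rng = random.Random(1040)
population_ages = [rng.randint(3, 19) for _ in range(655)]

# Simple random sample of 50 subjects, drawn without replacement.
sample = rng.sample(population_ages, 50)
table = pareto_table(sample)

for age, count, cumulative in table:
    print(f"age {age:2d}: {count:2d} subjects, cumulative {cumulative:5.1f}%")
```

The bars of a Pareto chart are the counts in this table, and the line is the cumulative percent column, which always ends at 100%.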
Part III
I created a frequency histogram and a boxplot for the population as well as for both samples. I also
computed the population mean, 9.93, and the population standard deviation, 2.95. For sample 1, the
mean and standard deviation are 10.06 and 3.23. For sample 2, the mean and standard deviation are
10.84 and 3.84. I used Excel to compute these values. Comparing these values, the samples seem to be
very similar to the population. There is only a .13 difference between the population mean and the
sample 1 mean, and a .91 difference between the population mean and the sample 2 mean. For the
standard deviations, there is a .28 difference between sample 1 and the population, and a .89
difference between sample 2 and the population. In all three data sets, the histograms show an
approximately normal distribution curve, unlike the charts that were made in Part II.
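As a rough illustration of the summary statistics used in this part, the sketch below computes a mean and both kinds of standard deviation with Python's standard library. The ages in the list are hypothetical, not taken from the study. The point is that a population standard deviation divides by N while a sample standard deviation divides by n - 1, which corresponds to the difference between Excel's STDEV.P and STDEV.S functions.

```python
import statistics

# Hypothetical ages for illustration only; these are not values from the study.
ages = [8, 10, 12, 9, 11, 7, 13, 10]

mean = statistics.mean(ages)
population_sd = statistics.pstdev(ages)  # divides by N (use for the full population)
sample_sd = statistics.stdev(ages)       # divides by n - 1 (use for a sample)

print(f"mean = {mean}, population sd = {population_sd:.4f}, sample sd = {sample_sd:.4f}")
```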
Part IV
The confidence intervals (CI) were calculated at a 95% confidence level for the population, sample 1,
and sample 2, in that order. The population CI was (9.75, 10.11); the sample 1 CI was (9.16, 10.96);
and the sample 2 CI was (9.78, 11.90). In all instances the intervals captured the population mean of
9.93. The margins of error are .2261 for the population, .8953 for sample 1, and 1.0644 for
sample 2.
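The sample margins of error can be checked with a short calculation. The sketch below assumes the margin was computed as z* times s divided by the square root of n, with z* = 1.96 for 95% confidence. The report does not say whether a z or t critical value was used, so this is an assumption, but it does reproduce the two sample margins. It also treats 3.84 as the sample 2 standard deviation. The function name `margin_of_error` is made up for the example.

```python
import math

def margin_of_error(sd, n, z_star=1.96):
    """Margin of error for a mean: critical value times the standard error."""
    return z_star * sd / math.sqrt(n)

# Sample 1: mean 10.06, standard deviation 3.23, n = 50.
margin_1 = margin_of_error(3.23, 50)
ci_1 = (10.06 - margin_1, 10.06 + margin_1)

# Sample 2: standard deviation assumed to be 3.84, n = 50.
margin_2 = margin_of_error(3.84, 50)

print(f"sample 1 margin = {margin_1:.4f}, CI = ({ci_1[0]:.2f}, {ci_1[1]:.2f})")
print(f"sample 2 margin = {margin_2:.4f}")
```

With a sample of 50, a t critical value (about 2.01) would give slightly wider intervals than the z value assumed here.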
Part V
With a level of significance of .05, hypothesis tests were done for the population proportion and
the mean. For the proportion test, H0: p = .06 and H1: p ≠ .06. H0 (the null hypothesis) states the
value that you are testing, and H1 (the alternative hypothesis) is its complement. The z-score was
computed as 42.94, with a p-value of essentially 0 and p-hat = .4587. In this test we reject the null
hypothesis because the p-value is less than the significance level of .05. The conclusion is that
there is sufficient evidence that the population proportion is not .06.
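The proportion test can be sketched as a one-proportion z-test. The sketch assumes n = 655 (the number of subjects in the data set) and uses the rounded p-hat of .4587, so the z-score comes out near, but not exactly at, the reported 42.94. The function name `one_proportion_z` is made up for the example. The usual decision rule is to reject H0 when the p-value is below the significance level.

```python
import math

def one_proportion_z(p_hat, p0, n):
    """Two-sided one-proportion z-test; returns (z, p_value)."""
    standard_error = math.sqrt(p0 * (1 - p0) / n)  # uses the hypothesized p0
    z = (p_hat - p0) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

alpha = 0.05
z, p_value = one_proportion_z(p_hat=0.4587, p0=0.06, n=655)  # n = 655 is an assumption
reject_h0 = p_value < alpha

print(f"z = {z:.2f}, p-value = {p_value:.3g}, reject H0: {reject_h0}")
```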
For the hypothesis test for the mean, H0: μ = 8 and H1: μ ≠ 8. The t-score was 16.73 and the
p-value is 1.47E-52. Because the p-value is far below the .05 significance level, our decision is
to reject H0. There is sufficient evidence that the population mean is not 8.
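The t-score for the mean test can be approximated from the summary statistics in Part III. The sketch below assumes the test used the mean of 9.93, the standard deviation of 2.95, and n = 655. Because those inputs are rounded, the result is close to the reported 16.73 rather than identical. The critical value of 1.96 is a large-sample approximation to the two-sided t critical value, and `one_sample_t` is a made-up name.

```python
import math

def one_sample_t(mean, sd, n, mu0):
    """t statistic for a one-sample test of H0: mu = mu0."""
    return (mean - mu0) / (sd / math.sqrt(n))

# Assumed inputs: rounded summary statistics for all 655 subjects.
t = one_sample_t(mean=9.93, sd=2.95, n=655, mu0=8)

# With 654 degrees of freedom the two-sided 5% critical value is roughly 1.96.
approx_critical_value = 1.96
reject_h0 = abs(t) > approx_critical_value

print(f"t = {t:.2f}, reject H0: {reject_h0}")
```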
Part VI
In summary, I found this project difficult because we did not know exactly what our parameters
were. This led to a lot of frustration among my classmates; we all struggled. Still, there is a lot of
math from this project that I can use in the real world. Over the years I have already used many of
the ideas behind the math. In science classes you have to interpret large amounts of data or
statistics and be able to explain the information or understand where it came from.
In Animal Genetics you use a lot of population parameters and need to understand what shifts the
population distribution curve one way or another, and how the standard deviation narrows or widens
it. In most science-based careers, statistical methods are used in almost everything. Again, being
able to understand where data came from and how to manipulate it is a big concern.
In research of any kind, and especially in my fields involving animals, genetics, and actual
populations of animals or bacteria, the method of creating a hypothesis for an experiment and going
through the steps to determine whether your hypothesis was correct is extremely important.
Although I doubt I will get very heavily into doing the calculations myself, I do know that I need
to understand what the data tells me. I need to understand how that data was computed and where it
goes from there. This will help me greatly in my science-based career. It will also help me stand out
among the many applicants as I look for a job. This project also helps you get to know Excel very
quickly. I have always been a proficient user of Excel, but I had never needed to do the types of
calculations that were necessary for this project.