Statistics Term Project

My first numerical variable is age. The unit of measurement for this variable is years. A few possible values for this first numerical variable are 15, 30, and 65.
My second numerical variable is number of states/countries lived in. The unit of measurement for this variable is number of states/countries. A few possible values for this second numerical variable are 1, 3, and 8.
My research question is “Is age (first variable) related to the number of states or countries lived in (second variable)?”
To answer this research question, I would gather data as follows:
I will go to a few public places with a wide range of age groups, e.g., Liberty Park, Coffee Break, Tower Theater, and Trader Joe's. I will survey any patrons that I have the time and opportunity to ask, approaching any person I happen to come by as I go from one part of the store or park to the other. My sample will be forty people.
The purpose of this study is to find the correlation between age and the number of states or countries people have lived in. I expect the number of states or countries lived in to rise with the age of the person sampled. I also expect the graph to level off after a certain age, around the mid-forties.
I gathered the data for this study by going to Trader Joe's and asking the survey question to ten patrons who were shopping around the store and one worker, as they were walking in and out of the store. I went on to Liberty Park and gathered data as I walked from one end of the park to the other, asking people as they were jogging, walking, and biking. It was getting a little darker at this point, and there were hardly any children. All twenty people I asked at the park answered. Some people looked very uncomfortable answering the questions, but once I told them that I only needed their first name and that the data collected was for a school project, they seemed willing to help. As I walked back to my car I passed a large crowd at the Tower Theater and asked the people outside my survey question. I asked seven people the survey question, and two people walked away as I tried to ask. I also went to Coffee Break and got a small sample of eight people; I asked the workers and the few customers there at the time.
Overall, I was surprised by how willing most people were to participate in the survey. Gathering the data was much less intimidating than I thought it would be. Most of the people also wanted to tell me the story behind all the places they have lived, and it was really fun to talk to them. The population I tried to cover with these places was the general population of Salt Lake City residents. I chose these locations because they seemed to be places where I could find very diverse age groups and different lifestyles.
1. Statistics for your first quantitative variable organized in a table: mean, standard
deviation, five-number summary, range, mode, outliers
1a.) Which would be a better measure of center for this variable (mean or
median)?
I will use the MEAN to best measure the center of this variable because there were
no outliers and it seems to have a symmetrical distribution.
Summary statistics:
Column   Mean       Std. Dev.   Median   Range   Min   Max   Q1   Q3   Mode   Outliers
Age      33.68182   16.957912   28.5     72      4     76    24   47   26     none
2. Histogram for the first quantitative variable
3. Box-plot for the first quantitative variable.
4. Statistics for the second quantitative variable organized in a table: mean, standard
deviation, five-number summary, range, mode, outliers
4a.) Which would be a better measure of center for this variable (mean or
median)?
Summary statistics:
Column                        Mean        Std. Dev.   Median   Range   Min   Max   Q1   Q3   Mode   Outliers
# of States/Countries Lived   3.1818182   2.6960154   2.5      13      1     14    2    3    3
5. Histogram for the second quantitative variable
6. Box plot for the second quantitative variable
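The outlier entries for both variables can be cross-checked against the quartiles in the summary statistics. The sketch below assumes the common 1.5 × IQR rule, which is also what most box plots use to flag outliers; the report does not say which rule the software applied. The function names are illustrative, and the inputs are only the minimum, maximum, Q1, and Q3 values reported for each variable.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier fences for the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def extremes_outside(minimum, maximum, q1, q3):
    """Return (low_outlier, high_outlier): whether the smallest or the
    largest observation falls outside the fences."""
    lower, upper = iqr_fences(q1, q3)
    return minimum < lower, maximum > upper


# Values copied from the two summary-statistics tables in this report.
age = {"minimum": 4, "maximum": 76, "q1": 24, "q3": 47}
places = {"minimum": 1, "maximum": 14, "q1": 2, "q3": 3}

print(iqr_fences(age["q1"], age["q3"]))        # (-10.5, 81.5)
print(extremes_outside(**age))                 # (False, False)
print(iqr_fences(places["q1"], places["q3"]))  # (0.5, 4.5)
print(extremes_outside(**places))              # (False, True)
```

Under this assumed rule, every age falls inside the fences, which agrees with the "none" entry for age. The maximum of 14 states/countries lies above the upper fence of 4.5, so the second variable has at least one high outlier.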
7. Statistics for testing the correlation between the two variables: linear correlation
coefficient (use R, not R²) and equation for line of regression.
Simple linear regression results:
Dependent Variable: var3
Independent Variable: var2
var3 = 1.9900125 + 0.03538424 var2
Sample size: 44
R (correlation coefficient) = 0.2226
Estimate of error standard deviation: 2.659499
Parameter estimates:
Parameter   Estimate     Std. Err.     Alternative   DF   T-Stat      P-Value
Intercept   1.9900125    0.8998044     ≠ 0           42   2.2116058   0.0325
Slope       0.03538424   0.023916256   ≠ 0           42   1.479506    0.1465
Analysis of variance table for regression model:
Source   DF   SS          MS          F-stat     P-value
Model    1    15.482214   15.482214   2.188938   0.1465
Error    42   297.06323   7.072934
Total    43   312.54544
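The figures in the regression output are linked by standard identities, so they can be checked against each other. The sketch below is a consistency check, not a re-run of the regression, because the raw survey data is not reproduced in this report. It uses only the reported estimates, standard errors, and sums of squares, and the variable names are illustrative.

```python
import math

# Values copied from the regression output in this report.
n = 44
intercept, intercept_se = 1.9900125, 0.8998044
slope, slope_se = 0.03538424, 0.023916256
ss_model, ss_error, ss_total = 15.482214, 297.06323, 312.54544
df_model, df_error = 1, n - 2

# R is the square root of SS_model / SS_total, with the sign of the slope.
r_squared = ss_model / ss_total
r = math.copysign(math.sqrt(r_squared), slope)

# Mean squares, F statistic, and error standard deviation.
ms_model = ss_model / df_model
ms_error = ss_error / df_error
f_stat = ms_model / ms_error
error_sd = math.sqrt(ms_error)

# Each t statistic is the estimate divided by its standard error.
t_intercept = intercept / intercept_se
t_slope = slope / slope_se

print(f"R        = {r:.4f}   (reported 0.2226)")
print(f"R^2      = {r_squared:.4f}")
print(f"F        = {f_stat:.4f}   (reported 2.188938)")
print(f"error SD = {error_sd:.4f}   (reported 2.659499)")
print(f"t slope  = {t_slope:.4f}   (reported 1.479506)")
print(f"t interc = {t_intercept:.4f}   (reported 2.2116058)")
```

The recomputed values agree with the reported ones to rounding. The R² of roughly 0.05 means age accounts for only about 5% of the variation in the number of states/countries lived in, and the slope's reported P-value of 0.1465 is above the conventional 0.05 level.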
8. Scatter plot that includes line of regression
9. The equation for the line of regression.
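The regression output in item 7 gives the line as var3 = 1.9900125 + 0.03538424 var2. The sketch below shows how that line turns an age into a predicted number of states/countries. It assumes var2 is age and var3 is the number of states/countries lived in, which matches the X and Y roles described in the statistical analysis, and the function name is illustrative.

```python
# Coefficients copied from the reported regression equation.
INTERCEPT = 1.9900125
SLOPE = 0.03538424


def predicted_places_lived(age_years):
    """Predicted number of states/countries lived in for a given age."""
    return INTERCEPT + SLOPE * age_years


# Ages chosen inside the observed range of the sample (4 to 76 years).
for age_years in (20, 40, 60):
    print(age_years, round(predicted_places_lived(age_years), 2))
```

The line predicts about 2.7 states/countries at age 20 and about 4.1 at age 60, a difference of roughly 1.4 places over forty years. Given the weak correlation, these predictions are only rough, and they should not be extended beyond the ages actually sampled.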
Statistical Analysis:
For the X variable in my study (age) I used the mean as the center because the histogram I created looked symmetrical and there were no outliers, so no extreme values would pull the mean away from the center. For the Y variable in this study (number of states/countries lived in) I used the median to best describe the center of my data set. The histogram I created showed that the distribution was not symmetrical or normal, and the median works for a distribution of any shape. There were also a number of outliers in the Y variable data set, so using the mean would throw the true center off by a large margin. The median is a resistant measure of center because large outliers do not significantly change it. When I ran the computations for the correlation coefficient I got R = 0.2226, which means that the two variables X and Y have a weak, positive relationship: when X increases, Y also slightly increases.
10. Interpretation and Conclusions: discuss how you would answer the original question.
My original question was: “Is age related to the number of states or countries lived in?” Meaning, does the number of states or countries you’ve lived in increase with age? My survey showed that there was a weak but positive correlation between the two variables. So it suggests that with age there may be an increase in the number of states or countries you’ve lived in. Although there were a number of outliers, I believe that with a larger sample the correlation would increase slightly. The problem I saw with this survey is that the increase cannot continue indefinitely: being ninety years old does not mean that you will have lived in 50 states/countries. Many people tend to settle down and find a home at some point in their lives, so a straight line will not always be the best way to interpret the data. The relationship may follow a curve instead.
I was honestly dreading the data-gathering part of this survey. I actually did this twice. The first time I used a system of surveying every “nth” person. This was not a good way to collect data because it automatically eliminated a portion of my population and was not appropriate for collecting a random sample. The second time around I asked everybody I came across and anybody that would answer my questions. I covered a lot of ground and I feel that I got a better representation of the community, i.e., Salt Lake. What I wish I had done differently was to go to the park at an earlier time when there were more children, because I don’t feel that I got a big enough sample of the younger population. I ended up really having fun the second time around, and people were really willing to help me out. I was surprised at how easy most people were to approach, although I did have a few people who avoided me like the plague. It was an eye-opening experience, and most people wanted to tell me stories about their travels. For the most part I ended up getting into a lot of conversations with the people who participated in my survey. I was pleasantly surprised by the kindness of the people in Salt Lake.