Correlation of height and shoe size

By Rebecca Dick & Tyler Dunyon
Math 1040 Final Project
12/13/2012
Introduction:
There are many claims that a person’s shoe size is directly related to their height.
Usually, a taller person is expected to have a larger shoe size than a shorter person.
Statement of Task:
The main purpose of this investigation and report is to answer the
question: Is there a correlation between the height of a person and their shoe size?
Hypothesis:
Our hypothesis is that there is in fact a direct correlation between the height of a
person and their shoe size. A person’s shoe size will depend on their measured height, so
the independent variable will be height and the dependent variable will be shoe size.
Plan of Investigation:
Data was collected from 40 students and/or teachers. Two measurements were recorded for
each individual: height and shoe size. To get the most accurate calculation, only people
over the age of sixteen were measured; younger people may still be growing, which would
introduce error into the data. The independent variable is the individual’s height. The
dependent variable is the individual’s shoe size; the investigation tests whether shoe size
depends on height. The data will be presented in a variety of ways and then tested to see
if the linear model assumption is valid. The presented data for the investigation was
retrieved from:
http://www.slideshare.net/deloti/correlation-of-height-and-shoe-size?from=share_email.
Collected Data:
Below is a table of the collected data for the investigation.
Student #   Height     Shoe Size             Student #   Height     Shoe Size
            (Inches)   (Standard                         (Inches)   (Standard
                       American units)                              American units)
1           153        5                     21          170        8.5
2           154        6                     22          171        9
3           154        6                     23          173        10
4           155        6                     24          174        8
5           158        5                     25          174        10
6           159        7                     26          174        9
7           160        6                     27          175        12
8           161        5                     28          175        11
9           163        6                     29          176        9
10          164        7                     30          177        10
11          165        7                     31          178        11
12          165        6                     32          178        11
13          165        7                     33          178        12
14          166        10                    34          179        10.5
15          167        9.5                   35          179        11.5
16          167        10                    36          179        11
17          168        10                    37          180        13
18          168        9                     38          180        12
19          170        10.5                  39          183        12.5
20          170        9.5                   40          185        13
Table 1.1: Table 1.1 displays the height and shoe size documented for each of the 40
individuals.
Analysis and Summary of the Data:
Below are several tables and charts that display the data organized and
summarized in a variety of ways. By analyzing the data in different ways, we are able to
determine shape, outliers, mean, and variability in the data. Ultimately, we will then be
able to determine if there is in fact a correlation between height and shoe size.
Column     n   Mean    Variance  Std. Dev.  Std. Err.  Median  Range  Min  Max  Q1     Q3
Height     40  169.75  74.19231  8.613496   1.361913   170     32     153  185  164.5  177.5
Shoe Size  40  9.0375  5.735737  2.39494    0.378673   9.5     8      5    13   7      11
Table 1.2: Table 1.2 displays the summary statistics for each data set.
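Most of the summary statistics can be recomputed from the raw data. The following is a minimal sketch, assuming the 40 heights and shoe sizes are typed in from Table 1.1 in student order; the function name summarize is only illustrative, and the quartiles are left out because different software uses different quartile rules.

```python
import math
import statistics

# Heights and shoe sizes copied from Table 1.1, students 1 through 40.
heights = [153, 154, 154, 155, 158, 159, 160, 161, 163, 164,
           165, 165, 165, 166, 167, 167, 168, 168, 170, 170,
           170, 171, 173, 174, 174, 174, 175, 175, 176, 177,
           178, 178, 178, 179, 179, 179, 180, 180, 183, 185]
shoe_sizes = [5, 6, 6, 6, 5, 7, 6, 5, 6, 7,
              7, 6, 7, 10, 9.5, 10, 10, 9, 10.5, 9.5,
              8.5, 9, 10, 8, 10, 9, 12, 11, 9, 10,
              11, 11, 12, 10.5, 11.5, 11, 13, 12, 12.5, 13]


def summarize(values):
    """Return sample summary statistics for one column (illustrative helper)."""
    n = len(values)
    std_dev = statistics.stdev(values)            # sample standard deviation (n - 1)
    return {
        "n": n,
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std_dev": std_dev,
        "std_err": std_dev / math.sqrt(n),        # standard error of the mean
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "min": min(values),
        "max": max(values),
    }


for name, column in (("Height", heights), ("Shoe Size", shoe_sizes)):
    print(name, summarize(column))
```

The sample (n - 1) formulas are used here because they reproduce the variance and standard deviation reported in the table.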
We will use Table 1.2 to determine if there are any outliers in our data. If there
prove to be any outliers, we would want to exclude them and recalculate our statistics in
order to have the most accurate results; outliers could skew our data and charts away from
the true answer. A value is an outlier if it falls outside of the lower and upper fences.
The IQR = Q3 – Q1. The lower fence = Q1 – 1.5(IQR). The upper fence = Q3 + 1.5(IQR). The
quartiles were taken directly from Table 1.2.
Height: Q3 = 177.5, Q1 = 164.5, IQR = 13, Lower fence = 164.5 – 1.5(13) = 145, Upper fence = 177.5 + 1.5(13) = 197
Shoe Size: Q3 = 11, Q1 = 7, IQR = 4, Lower fence = 7 – 1.5(4) = 1, Upper fence = 11 + 1.5(4) = 17
All of our collected data falls between the lower and upper fences: heights range from 153
to 185 and shoe sizes range from 5 to 13. We can move forward with our study confidently
without any outliers to skew our data.
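The fence rule can be written out as a short calculation. This is a minimal sketch that takes Q1, Q3, the minimum and the maximum straight from Table 1.2; the function names fences and has_outliers are illustrative, not part of the original report.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def has_outliers(minimum, maximum, q1, q3):
    """True if the smallest or largest value falls outside the fences."""
    lower, upper = fences(q1, q3)
    return minimum < lower or maximum > upper


# Quartiles, minimum and maximum as reported in Table 1.2.
print("Height fences:", fences(164.5, 177.5))
print("Shoe size fences:", fences(7, 11))
print("Height outliers?", has_outliers(153, 185, 164.5, 177.5))
print("Shoe size outliers?", has_outliers(5, 13, 7, 11))
```

Checking only the minimum and maximum is enough here, because any outlier would have to be at least as extreme as one of them.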
[Graph 1.1: Shoe Size vs. Height. Vertical axis: Shoe Size (Standard American Units); horizontal axis: Height (Inches). Series: Shoe Size, Linear (Shoe Size). Fitted line: y = 0.25x - 33.4, R² = 0.8084]
Graph 1.1: Graph 1.1 is a graph of the plotted data set. Additionally, the R² value and
best-fit line are displayed.
Graph 1.1 gives us two very important pieces of information. The first is the least
squares regression line, also known as the best-fit line, y = 0.25x - 33.4; we will use
this formula later in the study to make predictions, provided our linear model assumption
is valid. The second is the strength of the correlation. The graph reports R² = 0.8084, so
the correlation coefficient is r = √0.8084 ≈ 0.899. This value is compared against the
table “critical values for correlation coefficient” (Table 2; 3rd Ed. Prentice Hall
“Statistics”, Sullivan). For n = 30, Table 2 asks that r be greater than or equal to
0.361; critical values shrink as n grows, so this threshold is conservative for our 40
observations. An r value of 0.899 is clearly large enough to represent a direct
correlation. This is the first step in proving our linear model assumption is valid.
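The best-fit line and the correlation coefficient can be recomputed by hand from the raw data. The following is a minimal sketch using the textbook least squares formulas, assuming the data is typed in from Table 1.1; the function name least_squares is illustrative, and the graph itself was presumably produced by spreadsheet software rather than by this code.

```python
import math

# Heights (x) and shoe sizes (y) copied from Table 1.1, students 1 through 40.
heights = [153, 154, 154, 155, 158, 159, 160, 161, 163, 164,
           165, 165, 165, 166, 167, 167, 168, 168, 170, 170,
           170, 171, 173, 174, 174, 174, 175, 175, 176, 177,
           178, 178, 178, 179, 179, 179, 180, 180, 183, 185]
shoe_sizes = [5, 6, 6, 6, 5, 7, 6, 5, 6, 7,
              7, 6, 7, 10, 9.5, 10, 10, 9, 10.5, 9.5,
              8.5, 9, 10, 8, 10, 9, 12, 11, 9, 10,
              11, 11, 12, 10.5, 11.5, 11, 13, 12, 12.5, 13]


def least_squares(xs, ys):
    """Return (slope, intercept, r) for the least squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)                       # spread of x
    syy = sum((y - mean_y) ** 2 for y in ys)                       # spread of y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)                                 # correlation coefficient
    return slope, intercept, r


slope, intercept, r = least_squares(heights, shoe_sizes)
print(f"y = {slope:.2f}x + ({intercept:.1f})")
print(f"r = {r:.4f}, R^2 = {r * r:.4f}")
```

Squaring the returned r gives the R² value shown on the graph, which makes the distinction between the two quantities explicit.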
Table 1.3: Table 1.3 shows the data with the corresponding actual values, predicted
values, and residuals. The predicted value is calculated by entering the height (x) into
the least squares regression formula (y = 0.25x – 33.4). The residual is calculated by
subtracting the predicted value from the actual observed value.
[Graph 1.2: Residuals. Vertical axis: Residuals, from -2.5 to 2.5; horizontal axis: 140 to 190, labeled Shoe Size (Standard American Units)]
Graph 1.2: Graph 1.2 plots the residuals from the data against our x values. Residuals are
found by subtracting predicted y values from observed y values.
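As an illustration of how each residual is obtained, here is a minimal sketch that applies the regression line y = 0.25x - 33.4 to a few rows picked from Table 1.1. The choice of rows and the function names predict and residual are assumptions made for this example only.

```python
def predict(height):
    """Predicted shoe size from the report's regression line y = 0.25x - 33.4."""
    return 0.25 * height - 33.4


def residual(height, observed_size):
    """Residual = observed value minus predicted value."""
    return observed_size - predict(height)


# (height, observed shoe size) for students 1, 14, 24 and 40 in Table 1.1.
sample_rows = [(153, 5), (166, 10), (174, 8), (185, 13)]

for height, size in sample_rows:
    print(f"x = {height}: predicted {predict(height):.2f}, "
          f"observed {size}, residual {residual(height, size):+.2f}")
```

A positive residual means the person's shoe size is larger than the line predicts; a negative residual means it is smaller.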
Table 1.3 and Graph 1.2 are used to check the final two criteria of
our linear model assumption. First, Graph 1.2 lacks a discernible
pattern, which supports a linear relation in our data. Graphs 1.3 and 1.4
illustrate this.
Graphs 1.3 and 1.4 – illustrating the difference between an appropriate and inappropriate
linear model.
Second, the residuals in Graph 1.2 do not spread out
or narrow as x increases or decreases. This supports
a linear relation in our data. Graph 1.5 is an
example of data with no linear relation.
Graph 1.5 – an example of residuals whose spread
increases. This data does NOT have a linear relation.
Predicting Values
Since our linear model assumption proves to be valid, we are safe to
use our linear regression equation, y = 0.25x - 33.4, to predict a person’s
shoe size from their height. Table 1.4 below displays several predictions.
Each prediction is the shoe size for a selected height within our data range.
Trial #   Height (X)   Equation                  Shoe Size (Y)
1         156          y = 0.25(156) - 33.4      6
2         175.5        y = 0.25(175.5) - 33.4    10.5
3         181          y = 0.25(181) - 33.4      12
Table 1.4: Table 1.4 shows heights (X) plugged into the linear regression model
to predict shoe size (Y).
The predicted shoe sizes are similar to the actual data. The equation
given in Graph 1.1 allows us to make safe predictions.
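The three trial predictions can be reproduced in a few lines. This is a minimal sketch, assuming the raw output of the regression equation is rounded up to the next half or full shoe size, as the report does; the function names predict_shoe_size and round_up_to_half_size are illustrative.

```python
import math


def predict_shoe_size(height):
    """Raw prediction from the regression line y = 0.25x - 33.4."""
    return 0.25 * height - 33.4


def round_up_to_half_size(size):
    """Round up to the next half or full size, since shoes are sold in half sizes."""
    return math.ceil(size * 2) / 2


# The three trial heights from the predictions table above.
for height in (156, 175.5, 181):
    raw = predict_shoe_size(height)
    print(f"x = {height}: raw y = {raw:.3f}, shoe size = {round_up_to_half_size(raw)}")
```

Doubling the value before taking the ceiling and halving it afterwards is what makes the rounding land on half sizes rather than whole sizes.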
Conclusion and Summary:
Our investigation was very successful. Our data had an adequate sample size
and number of data points. The data has been represented in a number of ways, in
both graphs and tables, for the reader to understand completely. Our analysis of the
data supported our hypothesis: the linear model assumption proved to be
valid by meeting all three criteria. A more pleasing visual representation of the data,
such as pie charts and graphs, could have been used. This would have made the
report more enticing and therefore more likely to be read. It is also important to
note that this data was collected from a group and was not specific to gender, race,
or age, although samples were collected only from people age 16
and older. Also, when using the linear regression equation, shoe size values (Y) were
rounded up to the next full size or half size; this is to accommodate the sizes that
shoes are sold in, half and full. Because the data was collected through a survey on a
high school campus, it was collected through random sampling.
To summarize, all three criteria were appropriately met in our investigation. The
r value of approximately 0.899 (R² = 0.8084) represents a strong correlation of the
data; it is large enough to represent a direct correlation. The residual values represented
by Graph 1.2 show that there is not a discernible pattern amongst the data’s residuals.
Had there been a definite pattern, it would have indicated that our data is NOT linearly
related. The residual values represented by Graph 1.2 also do not spread while
increasing or decreasing. Had this been the case, the spread would have indicated that
our data is NOT linearly related.