Uploaded by joserr

Pset 1 stats

(i)
Calculate the sample mean and median of homevalue2006Mar. Comment on the
shape of the distribution without creating a histogram.
The distribution is positively skewed (right-skewed), since the median is
smaller than the mean.
(ii)
Report the five number summary and box-whisker plots for homevalue2006Mar
and homevalue2010Jul.
(iii)
Compute the standard deviation and IQR of homevalue2006Mar and
homevalue2010Jul, respectively.
(iv)
Summarize your findings from (ii) and (iii).
Home values were lower in July of 2010 than in March of 2006. The minimum
home value was 3k lower in 2010, which pales in comparison to the 22k drop in
the maximum value between the two dates. The median home value was also lower
than in 2006. Furthermore, the range and IQR of home values are both smaller in
July of 2010, so the middle half of the data clusters more tightly around the
median, meaning there is less variability in home values.
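The comparison above rests on the five-number summary, range, IQR and sample standard deviation. A minimal sketch of how these could be computed follows. The two lists are hypothetical values in thousands of dollars, not the problem set's data, and `summarize` is an illustrative helper name. Quartile conventions differ between software packages, so the "inclusive" method used here is an assumption.

```python
import statistics

def summarize(values):
    """Return the five-number summary plus range, IQR and sample SD."""
    ordered = sorted(values)
    # Quartiles by linear interpolation; other conventions give slightly
    # different Q1 and Q3 on small samples.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
        "range": ordered[-1] - ordered[0],
        "iqr": q3 - q1,
        "stdev": statistics.stdev(ordered),  # divides by n - 1
    }

# Hypothetical home values (thousands of dollars), NOT the pset data.
values_2006 = [100, 150, 200, 250, 300]
values_2010 = [97, 140, 180, 220, 278]

summary_2006 = summarize(values_2006)
summary_2010 = summarize(values_2010)

for label, summary in (("2006Mar", summary_2006), ("2010Jul", summary_2010)):
    print(label, summary)
```

A smaller range, IQR and standard deviation in the second list is the pattern the answer describes as less variability.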
(v)
What do you think caused your findings in (iv)? Explain. What additional data
would be helpful to support your explanation?
A number of factors might have caused this drop in home values. However,
I believe the most important factor behind the price disparity is the
financial crisis. The impact of the 2008 recession could have played a part in
decreasing housing prices. Foreclosures and financial hardship affected
people's ability and willingness to purchase housing, decreasing the values of
homes in the process. To support this explanation, I could use more data,
such as quarterly housing prices after March of 2006 and leading up
to July of 2010. This would provide a more complete picture of whether or not
prices were affected by the recession.
Part II
Q-26
It appears that the vast majority of runners slowed down toward the end of the race, although
most did so by less than 300 seconds, making the histogram positively skewed, unimodal and
asymmetric. The most typical difference falls in the 50 to 100 second interval,
followed closely by the 100-150 and the 150-200 second intervals. That means that while
runners mostly did slow down, they didn't do so by a large margin. Only a very small
percentage of runners sped up in the last stretch of the race, probably less than 8% of the
total number of marathon runners.
Q-32
1.
2.
3.
1000-1190
63.7%
Q-44
1.
18.7
2.
Mean is 19.25743, median is 19.2. My representative value wasn't far off from either measure; I
just didn't account for some of the spikes in the data. The median and mean are almost exactly the same
value, which suggests that the data set has a roughly symmetric distribution.
Q-48
Let x1, ..., xn be a sample, and let a and b be constants with a ≠ 0. Define a new sample y1, ..., yn by
y1 = ax1 + b, ..., yn = axn + b.
1.
The mean of the sample y1, ..., yn equals a times the mean of x1, ..., xn, plus b.
To illustrate this I can use the following example:
xi: x1, ..., x4 = 2, 3, 4, 5
yi: y1, ..., y4 = 12, 18, 24, 30, if a = 6 and b = 0.
The mean of the xi is 3.5, while the mean of the yi is 21. That means the mean was scaled by a factor
of 6, as 3.5 x 6 = 21. Additionally, if I were to assign a different value to b, like b = 1, the mean of
the yi would increase by 1, to 22.
2.
Again, the median of the yi equals a times the median of the xi, plus b. In the example above, the
median of the xi is 3.5, while the median of the yi is 21.
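The two claims can be checked numerically on the example sample 2, 3, 4, 5. The sketch below is a quick verification for a = 6 with b = 0 and b = 1, not a proof; `transform` is an illustrative helper name.

```python
import statistics

def transform(sample, a, b):
    """Apply y_i = a * x_i + b to every observation."""
    return [a * value + b for value in sample]

x = [2, 3, 4, 5]

for a, b in [(6, 0), (6, 1)]:
    y = transform(x, a, b)
    # Both the mean and the median follow the same linear rule.
    assert statistics.mean(y) == a * statistics.mean(x) + b
    assert statistics.median(y) == a * statistics.median(x) + b
    print(a, b, y, statistics.mean(y), statistics.median(y))
```

With b = 0 both the mean and the median come out to 21, and with b = 1 both come out to 22.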
Q-54
1.
2.
Mean is 115.58, and deviations are: .8200015, .3200015, -.9800015, -.3800031, .2200031.
Sample standard deviation is .694264, while the sample variance is .4820025.
3. Sxx is 1.92801, that is the sum of all squared deviations, which divided by n-1 (or 4) is equal to
.4820025.
4. The sample variance is the same, .4820025. This is because decreasing every
observation by 100 decreases the mean by 100 as well, which leaves the deviations unchanged and
leads to the same value for s^2.
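The shift argument in part 4 can be checked with a short computation. The observations below are reconstructed from the reported mean (115.58) and deviations, rounded to two decimals. That reconstruction is an assumption, since the raw data are not quoted in the answer, so the results agree with the reported figures only up to rounding. `sxx` is an illustrative helper name.

```python
import math
import statistics

# Reconstructed as mean + deviation, rounded to two decimals (an assumption).
x = [116.40, 115.90, 114.60, 115.20, 115.80]

def sxx(sample):
    """Sum of squared deviations from the sample mean."""
    mean = statistics.fmean(sample)
    return sum((value - mean) ** 2 for value in sample)

n = len(x)
variance = sxx(x) / (n - 1)   # sample variance s^2
std_dev = math.sqrt(variance)  # sample standard deviation s

# Subtracting 100 from every observation shifts the mean by 100 too,
# so each deviation, and therefore s^2, is unchanged.
shifted = [value - 100 for value in x]
shifted_variance = sxx(shifted) / (n - 1)

print(round(sxx(x), 4), round(variance, 4), round(std_dev, 4))
print(round(shifted_variance, 4))
```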
Q-66
a. The Great Divide Hercules is a mild outlier, as 9.10 > 5.95 + 1.5(1.6) = 8.35. There is only one
extreme outlier, that being the Rogue Imperial Stout, as 11.6 > 5.95 + 3(1.6) = 10.75.
b.
This box plot is fairly symmetric, with the median close to the middle of the box. It features one
outlier, and one extreme outlier, signaled by the dot and the x, respectively.
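The outlier rule used in part a can be sketched as a small function. The values 5.95 (taken here to be the upper fourth) and 1.6 (the fourth spread) come from the arithmetic in the answer. The function and parameter names are illustrative, and only the upper side of the distribution is checked.

```python
def classify_high_value(value, upper_fourth, fourth_spread):
    """Classify a value against the upper 1.5*fs and 3*fs fences."""
    mild_fence = upper_fourth + 1.5 * fourth_spread
    extreme_fence = upper_fourth + 3 * fourth_spread
    if value > extreme_fence:
        return "extreme outlier"
    if value > mild_fence:
        return "mild outlier"
    return "not an outlier"

# Values quoted in the answer to part a.
upper_fourth = 5.95
fourth_spread = 1.6

hercules = classify_high_value(9.10, upper_fourth, fourth_spread)
imperial_stout = classify_high_value(11.6, upper_fourth, fourth_spread)
print(hercules, imperial_stout)
```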