
Facebook Usage v2

Ye, Bo Xian
1. To what extent does Facebook usage vary across states? Plot a histogram and
check for the extent of variation in fb_usage_perc.
Across the 50 states, Facebook usage rates have a mean of 40.78%, a standard deviation of
6.92%, and a coefficient of variation of 0.17. Of the 50 states, 44 fall in the
range of 30% to 50% usage. The histogram of usage rates across states is produced by the code in the appendix.
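As a quick check of the summary figures above, the coefficient of variation is the standard deviation divided by the mean. The sketch below recomputes it from the two reported values and then shows the same calculation on a small vector; `usage` holds made-up illustrative rates, not the survey data.

```r
# Coefficient of variation from the reported mean and standard deviation
reported_mean <- 40.78
reported_sd <- 6.92
reported_cv <- reported_sd / reported_mean
round(reported_cv, 2)

# Same calculation on a hypothetical vector of usage rates (fractions)
usage <- c(0.32, 0.38, 0.41, 0.44, 0.47, 0.52)
cv <- sd(usage) / mean(usage)

# Count how many of the hypothetical rates fall between 30% and 50%
in_range <- sum(usage >= 0.30 & usage <= 0.50)
```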
2. Do the census characteristics explain the variation? Does regressing
fb_usage_perc on the 20 census characteristics yield sensible
results? Please explain.
Running a linear regression of fb_usage_perc on all available variables except
“state” and “state_symbol”, I find that none of the coefficients of the 20 census
characteristics is statistically significant; all p-values are much greater than 0.05.
The R² value is 0.4574, so the census characteristics together explain only
45.74% of the variance in usage.
Intuitively, some of the characteristics should be correlated with the Facebook usage
rate, e.g., percent_highschool_higher, percent_college_higher, percent_poverty,
per_capita_income, median_household_income. These variables are
statistically significant when the regression includes only them, but not when they are
combined with all the other variables. This is probably because we have only 50
observations and 20 independent variables, which leaves only 50 − 20 − 1 = 29
residual degrees of freedom. In general, the more characteristics we include to explain an
outcome, the more data points we need.
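The degrees-of-freedom count can be confirmed with a minimal sketch. The data below are simulated random numbers with the same shape as the problem (50 rows, 20 predictors); they are an assumption for illustration and say nothing about the actual census variables.

```r
set.seed(1)
n <- 50   # number of states
p <- 20   # number of census characteristics

# Simulated stand-in for the data: 20 random predictors and a random response
sim <- as.data.frame(matrix(rnorm(n * p), nrow = n))
sim$y <- rnorm(n)

fit <- lm(y ~ ., data = sim)

# Residual degrees of freedom: n minus p slopes minus 1 intercept
resid_df <- df.residual(fit)
resid_df
```

With so few residual degrees of freedom relative to the number of coefficients, standard errors tend to be large, which is consistent with individually relevant variables losing significance in the full model.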
Principal Component Analysis
Because data points and degrees of freedom are limited, we conduct PCA to reduce the
number of predictors and eliminate multicollinearity. The PCA reduces the 20-variable
problem to the 4 principal components that have eigenvalues greater than 1. We then
regress fb_usage_perc for the 50 rows on only these 4 principal components. Two of
the 4 principal components are statistically significant.
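The appendix does not include the PCA code, so the following is only a sketch of one way the steps above could be carried out with base R's `prcomp`. It runs on simulated data: the predictor matrix `X`, the response `y`, and the three latent factors behind them are all hypothetical, so the number of retained components here need not match the 4 reported above.

```r
set.seed(42)
n <- 50
p <- 20

# Hypothetical data: 20 correlated predictors driven by 3 latent factors
latent <- matrix(rnorm(n * 3), nrow = n)
loadings <- matrix(rnorm(3 * p), nrow = 3)
X <- latent %*% loadings + matrix(rnorm(n * p, sd = 0.5), nrow = n)
colnames(X) <- paste0("x", seq_len(p))
y <- as.vector(latent %*% c(2, -1, 0.5) + rnorm(n))

# PCA on standardized predictors, so eigenvalues are on the correlation scale
pca <- prcomp(X, center = TRUE, scale. = TRUE)
eig <- pca$sdev^2

# Keep the components whose eigenvalue exceeds 1
keep <- which(eig > 1)

# Regress the response on the retained component scores only
scores <- as.data.frame(pca$x[, keep, drop = FALSE])
scores$y <- y
pc_fit <- lm(y ~ ., data = scores)
summary(pc_fit)
```

Because the component scores are uncorrelated with each other, the coefficient estimates in `pc_fit` do not suffer from the multicollinearity that affects the full 20-variable regression.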
Appendix. R Code
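# Load the state-level data
FB <- read.csv("FB_usage_by_states_data.csv")
# Compute the histogram bins without plotting so the breaks can label the x-axis
hist_info <- hist(FB$FB_usage_perc, plot=FALSE)
plot(hist_info, xaxt="n", main="Facebook Usage Across States",
xlab="Facebook Usage Rate", ylab="Number of States", col="green")
axis(side=1, at=hist_info$breaks, labels=paste0(100*hist_info$breaks, "%"))
# Summary statistics for Question 1
mean(FB$FB_usage_perc)
sd(FB$FB_usage_perc)
# Question 2: regress usage on all census characteristics
LRmodel1 <- lm(FB_usage_perc ~ . - state - state_symbol, data=FB)
summary(LRmodel1)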