Uploaded by Junyang Xiao

STAT 610 Homework 1

Due Thursday October 3, 2pm
Please finish each problem assigned below and submit the homework electronically through
Blackboard. You can copy and paste your R/RStudio console output and plots into a document and
submit that document to Blackboard. Some of the commands were not covered in class; please refer
to the text for a description of them (mostly in chapter 3).
1. Please study the data in rivers (you can run ?rivers in R to see details).
(a) Run and explain the following commands:
mean(rivers)
> mean(rivers)
[1] 591.1844
The mean length of the 141 rivers is 591.1844 miles.
var(rivers)
> var(rivers)
[1] 243908.4
The variance of the 141 river lengths is 243908.4 square miles.
sd(rivers)
> sd(rivers)
[1] 493.8708
The standard deviation of the 141 river lengths is 493.8708 miles.
sqrt(var(rivers))
> sqrt(var(rivers))
[1] 493.8708
The square root of the variance equals the standard deviation.
median(rivers)
> median(rivers)
[1] 425
The median length of the 141 rivers is 425 miles.
fivenum(rivers)
> fivenum(rivers)
[1] 135 310 425 680 3710
These five numbers are the minimum, lower quartile, median, upper quartile,
and maximum of the rivers data.
These are basic commands that do not require any additional packages.
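As a rough check on what var() and sd() compute, the sketch below rebuilds both by hand. It assumes the sample variance with an n - 1 denominator, which is what R's var() uses; the names manual_var and manual_sd are only illustrative.

```r
x <- rivers                      # built-in dataset: river lengths in miles
n <- length(x)                   # number of observations

# Sample variance: squared deviations from the mean, divided by n - 1
manual_var <- sum((x - mean(x))^2) / (n - 1)
manual_sd  <- sqrt(manual_var)

all.equal(manual_var, var(x))    # TRUE
all.equal(manual_sd, sd(x))      # TRUE

# fivenum() returns minimum, lower hinge, median, upper hinge, maximum;
# the hinges are Tukey's version of the lower and upper quartiles
fivenum(x)
```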
(b) What proportion of the rivers have length larger than 1000 miles?
length(which(rivers>1000))/length(rivers)
> length(which(rivers>1000))/length(rivers)
[1] 0.1134752
sum(rivers>1000)/length(rivers)
> sum(rivers>1000)/length(rivers)
[1] 0.1134752
Can you explain why each of these formulas works?
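One way to see why both formulas agree is to look at the intermediate logical vector. The sketch below is only an illustration; the variable name long is made up for this example.

```r
long <- rivers > 1000        # logical vector: TRUE where a river exceeds 1000 miles

which(long)                  # positions of the TRUE entries
length(which(long))          # how many positions there are, i.e. the count of TRUE
sum(long)                    # TRUE is treated as 1 and FALSE as 0, so this is the same count

# Dividing either count by the number of rivers gives the proportion.
# mean() of a logical vector performs the same division in one step.
mean(long)
```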
(c) What proportion of the rivers have length within one standard deviation of the mean?
length( which( abs(rivers-mean(rivers)) < sd(rivers) ) ) / length(rivers)
> length( which( abs(rivers-mean(rivers)) < sd(rivers) ) ) / length(rivers)
[1] 0.9007092
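The one-line command can be unpacked into steps. The following is a sketch with illustrative variable names (m, s, within_one_sd, same_test), showing that the absolute-value test is the same as checking that a length lies between mean - sd and mean + sd.

```r
m <- mean(rivers)
s <- sd(rivers)

# |x - m| < s is another way of writing m - s < x < m + s
within_one_sd <- abs(rivers - m) < s
same_test     <- rivers > m - s & rivers < m + s

all(within_one_sd == same_test)   # the two tests agree for every river

# Proportion of TRUE values, the same quantity as length(which(...)) / length(rivers)
mean(within_one_sd)
```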
(d) We can visualize data with many functions, e.g.,
hist(rivers)
boxplot(rivers)
Please run and explain the results.
> hist(rivers)
Among the lengths (in miles) of 141 "major" rivers in North America, more than
80 rivers are shorter than 500 miles, about 40 rivers are between 500 and
1000 miles long, about 10 rivers are between 1000 and 1500 miles long, and only a
handful are longer than 1500 miles. The distribution is strongly skewed to the right.
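The bar heights can also be read as numbers rather than estimated from the plot. This is a small sketch using the plot = FALSE argument of hist(); the name h is illustrative, and the bin edges are whatever hist() picks by default.

```r
h <- hist(rivers, plot = FALSE)   # compute the bins without drawing anything

h$breaks                          # bin edges chosen by hist()
h$counts                          # number of rivers falling in each bin

sum(h$counts) == length(rivers)   # every river lands in exactly one bin
```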
> boxplot(rivers)
In the boxplot, the bold dark line is the median, which is 425. The upper and lower edges of the
box are the upper and lower quartiles (Q3 and Q1). The whiskers extend to the most extreme
observations that lie within Q3+1.5IQR and Q1-1.5IQR (IQR=Q3-Q1). Any value beyond these limits
is drawn as an outlier. There are no abnormally small values in this boxplot, but there are some
abnormally large values.
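The numbers behind the boxplot can be inspected with boxplot.stats(). The sketch below assumes the default whisker coefficient of 1.5; the names b, hinges, iqr, upper_fence and lower_fence are illustrative.

```r
b <- boxplot.stats(rivers)   # default coef = 1.5
b$stats                      # lower whisker, lower hinge, median, upper hinge, upper whisker
b$out                        # values drawn individually as outliers

hinges      <- fivenum(rivers)[c(2, 4)]   # edges of the box
iqr         <- diff(hinges)
upper_fence <- hinges[2] + 1.5 * iqr
lower_fence <- hinges[1] - 1.5 * iqr

# The whiskers stop at the most extreme observations inside the fences,
# not at the fences themselves
max(rivers[rivers <= upper_fence]) == b$stats[5]
min(rivers[rivers >= lower_fence]) == b$stats[1]
```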
2. We can also study the data in state.division about states of the United States of America. You
can run
?state.division to see details. Please run and explain the following:
hist(state.division)
boxplot(state.division)
table(state.division)
barplot(table(state.division), las=2)
In particular, do the first two lines work? Why?
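A possible starting point for this question is to check what kind of object state.division is. The sketch below is a hedged illustration, not the expected answer: it catches the error from hist() with tryCatch() and uses pdf(NULL) as a throwaway graphics device so nothing is written to disk. The name res is made up for this example.

```r
class(state.division)        # "factor": division labels, not numbers
is.numeric(state.division)   # FALSE
table(state.division)        # number of states in each division

# hist() needs numeric input, so capture the error message instead of stopping
pdf(NULL)                    # temporary device; no file is created
res <- tryCatch(hist(state.division), error = function(e) conditionMessage(e))
dev.off()
res                          # the error message reported by hist()
```

boxplot(state.division) can be probed in the same way. If it does draw something, it can only be working from the factor's internal integer codes, so the picture would not be meaningful for unordered categories. table() and barplot() are the tools designed for counts of a categorical variable.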
3. Please review chapter 4 from the book (which is what will be covered next week) and use
plain language to explain the concepts: sample space, event, and probability.
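To make the three concepts concrete, here is a small sketch that assumes a fair six-sided die. The die, the event "roll an even number", and all variable names are illustrative assumptions and do not come from the book.

```r
sample_space <- 1:6            # sample space: every possible outcome of one roll
event_even   <- c(2, 4, 6)     # event: a subset of the sample space

# With equally likely outcomes, the probability of an event is
# (number of outcomes in the event) / (number of outcomes in the sample space)
p_even <- length(event_even) / length(sample_space)
p_even                         # 0.5

# A simulation should give a relative frequency close to that probability
set.seed(1)
rolls <- sample(sample_space, 10000, replace = TRUE)
mean(rolls %in% event_even)
```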