STAT 610 Homework 1
Due Thursday October 3, 2pm

Please finish each problem assigned below and submit the homework electronically through Blackboard. You can copy and paste your R/RStudio console output and plots into a document and submit that document to Blackboard. Some of the commands were not covered in class; please refer to the text for a description of them (mostly in chapter 3).

1. Please study the data in rivers. (You can run ?rivers in R to see details.)

(a) Run and explain the following commands:

    > mean(rivers)
    [1] 591.1844

The mean length of the 141 rivers is 591.1844 miles.

    > var(rivers)
    [1] 243908.4

The sample variance of the 141 river lengths is 243908.4 (in miles squared).

    > sd(rivers)
    [1] 493.8708

The sample standard deviation of the 141 river lengths is 493.8708 miles.

    > sqrt(var(rivers))
    [1] 493.8708

The square root of the variance equals the standard deviation, so this matches sd(rivers).

    > median(rivers)
    [1] 425

The median length of the 141 rivers is 425 miles.

    > fivenum(rivers)
    [1]  135  310  425  680 3710

These five numbers are the minimum, lower quartile, median, upper quartile, and maximum of the “rivers” data. (Strictly, fivenum reports Tukey's lower and upper hinges, which here play the role of the quartiles.)

These are all base R commands; no additional packages need to be installed.

(b) What proportion of the rivers have length larger than 1000 miles?

    > length(which(rivers>1000))/length(rivers)
    [1] 0.1134752

    > sum(rivers>1000)/length(rivers)
    [1] 0.1134752

Can you explain why each of these formulas works?

(c) What proportion of the rivers have length within one standard deviation of the mean?

    > length( which( abs(rivers-mean(rivers)) < sd(rivers) ) ) / length(rivers)
    [1] 0.9007092

(d) We can visualize data with many functions, e.g.,

    hist(rivers)
    boxplot(rivers)

Please run and explain the results.
    > hist(rivers)

Among the lengths (in miles) of the 141 “major” rivers in North America, more than 80 are shorter than 500 miles, about 40 are between 500 and 1000 miles, about 10 are between 1000 and 1500 miles, and about 6 are longer than 1500 miles. (This agrees with 1(b), where 16 of the 141 rivers exceed 1000 miles.) The distribution has a long right tail.

    > boxplot(rivers)

In the boxplot, the bold dark line is the median, which is 425. The upper and lower edges of the box are the upper quartile (Q3) and lower quartile (Q1). The whiskers extend from the box to the most extreme observations that are no higher than Q3+1.5IQR and no lower than Q1-1.5IQR (IQR=Q3-Q1). Any observation beyond the whiskers is drawn as a separate point and treated as an outlier. There are no abnormally small values in this boxplot, but there are some abnormally large values.

2. We can also study the data in state.division about the states of the United States of America. You can run ?state.division to see details. Please run and explain the following:

    hist(state.division)
    boxplot(state.division)
    table(state.division)
    barplot(table(state.division), las=2)

In particular, do the first two lines work? Why?

3. Please review chapter 4 from the book (which will be covered next week) and use plain language to explain the concepts: sample space, event, and probability.
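
One possible way to make the three concepts in problem 3 concrete is to reuse the rivers data from problem 1. The sketch below assumes a hypothetical experiment, not stated in the assignment, in which one of the 141 rivers is picked uniformly at random. The variable names sample_space, event, and p_event are made up for illustration.

```r
# Assumed experiment (for illustration only): pick one of the 141 rivers
# uniformly at random and record its length in miles.

# Sample space: all possible outcomes, here one entry per river.
sample_space <- rivers

# Event: a subset of the sample space, here "the river is longer than 1000 miles".
# The comparison returns a logical vector, TRUE where the outcome is in the event.
event <- sample_space > 1000

# Probability: with equally likely outcomes,
# P(event) = (number of outcomes in the event) / (number of outcomes).
# Summing a logical vector counts its TRUE values (TRUE = 1, FALSE = 0).
p_event <- sum(event) / length(sample_space)
print(p_event)

# The mean of a logical vector is the same proportion.
print(mean(event))
```

Under that assumption the probability of the event is the same proportion computed in 1(b), which is one way to see why sum(rivers>1000)/length(rivers) works: TRUE counts as 1 and FALSE as 0 when a logical vector is summed.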