Math 338 Summer section Lab Activity #2 Jordan, Daniel

Application 1: Suppose that 25 people are surveyed about their beer-drinking preference. The categories
are, respectively: 1. "Domestic Can", 2. "Domestic bottle", 3. "Import Can", 4. "Import bottle".
a. Enter the data using R as a vector called beer, then verify that you have 25 values.
b. Create a table showing the categories of beer and the corresponding frequencies. Rename the table
as y. Add the frequencies manually and verify that the sum of frequencies is 25.
c. Using the table from part b, find how many people prefer domestic or imported cans.
18 people have the preference of Domestic or imported cans.
d. Create a table showing the categories of beer and the corresponding percentages. Rename the table
as y.
e. Create a pie chart for the vector y = table(beer)/length(beer), and comment on the pie graph. (The
graphs will appear after you complete 4 graphs.)
f. Give names to the four categories of beer using this R command:
g. Create a pie chart for the vector y; note that the names will appear on the pie.
h. Create a barplot for table(beer).
i. Create a barplot for the vector y.
Now you can see the above four graphs on one page.
j. Compare the two pie charts, and compare the two barplots.
k. Which graph do you prefer, the pie or the bar? Justify your answer.
Because we’re only dealing with 4 categories, I prefer the barplot with percentages. If we were comparing
20 types of beer, I would prefer a pie chart, since I feel it would represent the data better than a very wide
barplot.
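The steps in Application 1 can be sketched in R as follows. This is only an illustration: the ten responses below are made-up values, not the lab's 25 survey answers, and the labelling step is one common way to name the categories, since the lab's own command is not reproduced in the text. The names counts, cans, labelled, and old_par are hypothetical helpers.

```r
# Hypothetical responses (NOT the lab's survey data):
# 1 = Domestic can, 2 = Domestic bottle, 3 = Import can, 4 = Import bottle
beer <- c(3, 4, 1, 1, 3, 4, 3, 3, 1, 2)
length(beer)                       # number of responses

counts <- table(beer)              # frequency of each category
sum(counts) == length(beer)        # frequencies add up to the sample size

# People preferring a can, domestic (1) or imported (3)
cans <- unname(counts["1"] + counts["3"])

# Proportions; multiply by 100 to read them as percentages
y <- table(beer) / length(beer)

# Assumed labelling step (the lab's own command is not shown in the text)
labelled <- y
names(labelled) <- c("Domestic can", "Domestic bottle",
                     "Import can", "Import bottle")

# Four plots on one page, filled row by row
old_par <- par(mfrow = c(2, 2))
pie(y, main = "Pie, unlabelled")
pie(labelled, main = "Pie, labelled")
barplot(table(beer), main = "Frequencies")
barplot(labelled, main = "Proportions")
par(old_par)                       # restore the previous layout
```

Reading the can count off the table this way avoids adding the frequencies by hand.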
Application 2: Here is a compilation of the most common types of spam. Enter the data as a vector
called percent.spam. Give the following names: Ad, Fin., H, Leis, Prod, Scams.
a. Create a barplot for the data percent.spam; enter the title “barplot for spam”.
b. Create a pie chart for the vector percent.spam; enter the title “Pie Chart for spam”.
c. Comment on the graphs you just obtained. Which one do you prefer? Justify.
I prefer the pie chart, for the same reason mentioned in application 1. When dealing with a large number of
categories, I prefer using a pie chart to graph the data.
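A named vector with titled plots, as Application 2 asks for, might look like the sketch below. The percentages are invented for illustration and are not the lab's figures; only the vector name and the category labels come from the exercise.

```r
# Hypothetical percentages (NOT the lab's figures), in the order of the names
percent.spam <- c(20, 15, 10, 5, 30, 20)
names(percent.spam) <- c("Ad", "Fin.", "H", "Leis", "Prod", "Scams")

# main= sets the plot title
barplot(percent.spam, main = "barplot for spam", ylab = "Percent")
pie(percent.spam, main = "Pie Chart for spam")

# Name of the category with the largest share
names(which.max(percent.spam))
```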
Application 3: Statistical Measures Using data Movies Gross
a. Enter the data using the scan command
b. Use the code movies.gross=x; this stores the data vector under the name movies.gross.
c. Verify that the length of the vector is 22, using > length(movies.gross).
d. Obtain the mean of the movies.gross, using >mean(movies.gross).
e. Obtain the median of the movies.gross, using >median(movies.gross).
f. Obtain the variance of the movies.gross using > var(movies.gross)
g. Obtain the square root of the variance using > sqrt(var(movies.gross))
h. Obtain the standard deviation using > sd(movies.gross), report your answer. Compare it with part g.
This value is identical to the square root of the variance from part g.
i. Obtain the quantiles using > quantile(movies.gross)
j. Obtain the 20th percentile using > quantile(movies.gross, 0.2)
k. Obtain the 20th, 75th, and 90th percentiles using > quantile(movies.gross, c(.2,.75,.90))
l. Obtain the 5 number summary using >fivenum(movies.gross).
m. Obtain summary of the data using > summary(movies.gross), compare your answer with part l.
Comment.
These values match the 5 number summary from part l; summary() also reports the mean.
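The measures in Application 3 can be checked against each other with a short sketch. The ten values below are made up for illustration and are not the lab's 22 gross figures; v and s are hypothetical helper names.

```r
# Hypothetical gross figures (NOT the lab's 22 values)
movies.gross <- c(0.3, 0.5, 0.7, 0.9, 1.2, 2.5, 4.8, 9.6, 12.1, 29.6)

v <- var(movies.gross)             # sample variance (divides by n - 1)
s <- sd(movies.gross)              # sample standard deviation
all.equal(s, sqrt(v))              # sd is the square root of the variance

# Several percentiles in one call
quantile(movies.gross, c(0.20, 0.75, 0.90))

fivenum(movies.gross)              # min, lower hinge, median, upper hinge, max
summary(movies.gross)              # min, Q1, median, mean, Q3, max
```

Note that fivenum() reports hinges while summary() reports quartiles from quantile(); the two rules can give slightly different values on some data sets, so small discrepancies between parts l and m are not necessarily an error.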
Application 4: Some graphical representations using data: Movies Gross
a. Obtain the stem and leaf plot using > stem(movies.gross)
b. Obtain the stem and leaf plot using > stem(movies.gross, scale=2). What is the difference between the
stem plots in a. and b.? Describe the features of the distribution. Comment
on the pattern and measures of center and variation.
By adding scale=2, this plot splits values less than 5 and values greater than 5 into two different stems.
This distribution is extremely skewed to the right, with a median of 0.85 and variance of 86.86.
c. Graph the histogram using > hist(movies.gross), and comment on the shape of the histogram. Again
comment on the pattern and measures of center and variation.
This distribution is extremely skewed to the right, with a median of 0.85 and variance of 86.86.
d. Graph the histogram using > hist(movies.gross, probability=TRUE), compare with the graph in
part c.
This histogram has the same shape as the graph from part c; with probability=TRUE the vertical axis
shows density instead of frequency.
e. Graph the boxplot using > boxplot(movies.gross). Use the graph to obtain an approximate value of Q3.
Are there any outliers?
Q3 is approximately 12. There is an outlier at approximately 30.
f. Compute the interquartile range using > IQR(movies.gross)
g. Compute manually or using “R as a calculator” the Upper fence UF=Q3+3/2 * IQR.
Use your answer to detect outliers in this data set. Report the outliers.
The upper fence is 28.8. This value matches the graph, which shows an outlier at approximately 28-30.
h. Use the R command > par(mfrow=c(3,1)). This is to include 3 plots in one graph. This should be
followed by three graphs:
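The fence calculation in part g and the stacked layout in part h can be sketched as below. The data are made-up values, not the lab's, and the choice of which three plots to stack is an assumption, since the original graphs are not reproduced in the text. The names q3, upper.fence, outliers, h, and old_par are hypothetical helpers.

```r
# Hypothetical values (NOT the lab's data)
movies.gross <- c(0.3, 0.5, 0.7, 0.9, 1.2, 2.5, 4.8, 9.6, 12.1, 29.6)

# Upper fence: UF = Q3 + 3/2 * IQR
q3 <- unname(quantile(movies.gross, 0.75))
upper.fence <- q3 + 1.5 * IQR(movies.gross)

# Observations above the fence are flagged as outliers
outliers <- movies.gross[movies.gross > upper.fence]

# With probability = TRUE the bars show density, so bar areas sum to 1
h <- hist(movies.gross, plot = FALSE)
sum(h$density * diff(h$breaks))

# Three plots stacked in one column (the selection of plots is an assumption)
old_par <- par(mfrow = c(3, 1))
hist(movies.gross)
hist(movies.gross, probability = TRUE)
boxplot(movies.gross, horizontal = TRUE)
par(old_par)                       # restore the previous layout
```

Stacking the frequency and density histograms makes it easy to see that only the vertical scale changes between them.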
Application 5: Reading Data sets downloaded on your Desktop
a. Download scores, print values
b. Download sex, print values.
c. List the scores for Female only. > list(scores[1:30])
d. List the scores for Men only. > list(scores[31:60])
e. Obtain the sd for scores
f. Obtain the mean for scores
g. Obtain the histogram for scores >hist(scores)
h. Use the R command hist(scores,probability=TRUE)
i. Compare the last two graphs and comment.
The shape and features of both graphs are identical. Both graphs have an approximate median of 125,
with a Q1 of approximately 90, and Q3 of approximately 180. Both graphs are very slightly skewed
right.
j. Obtain the 5 number summary > fivenum(scores)
k. Obtain summary of scores > summary(scores)
l. Obtain the stem plot and comment on the distribution. > stem(scores)
The stemplot matches the shape of the histogram above. The distribution is skewed right, with a
median of approximately 125, Q1 of approximately 90 and Q3 of approximately 180.
m. Obtain the 20th percentile > quantile(scores,0.20)
n. Obtain the 20th, 70th, 90th percentiles > quantile(scores,c(0.20,0.70,0.90))
o. Graph the boxplot for Female and for Male. Compare the differences and similarities. Use
> boxplot(scores ~ sex)
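Parts c, d, and o split the scores by group. A minimal sketch of doing that by label rather than by position is shown below. The six scores are invented, not the lab's data set, and the "F"/"M" coding of sex is an assumption, since the file's actual coding is not shown in the text; female and male are hypothetical helper names.

```r
# Hypothetical data (NOT the lab's files); the "F"/"M" coding is an assumption
scores <- c(120, 95, 180, 130, 88, 150)
sex <- c("F", "F", "F", "M", "M", "M")

# Select by label instead of by position (compare scores[1:30])
female <- scores[sex == "F"]
male <- scores[sex == "M"]

# Mean score for each group in one call
tapply(scores, sex, mean)

# Side-by-side boxplots, one box per value of sex
boxplot(scores ~ sex)
```

Selecting with sex == "F" keeps working if the rows are reordered, whereas an index range such as 1:30 depends on the females being listed first.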