STAT 250 Summer 2025 Data Analysis Assignment 5
You may not upload this file or solutions to any online homework help sites. In addition, you
may not discuss this assignment with any individuals (either in this course or not) either in person or using group chats. Do not use AI for assistance. Please see our course syllabus for
honor code rules. Thank you.
Investigation 1: Designer or Replica?
A statistics student at George Mason was interested in determining whether there is a difference
in the ability to identify designer handbags between individuals in their 20s and 30s. Specifically,
she wanted to know if there is a difference in the proportion of correct identifications of designer
handbags between these groups. To test this, a blind identification experiment was conducted.
Participants were randomly assigned to be shown either a designer handbag or a high-quality
non-designer replica. Then, they were asked to guess whether the handbag was a designer
item or not. The researcher obtained random samples and participants were asked to identify the
bags. Of the 90 individuals in their 20s, 52 made the correct identification. Of the 90 individuals
in their 30s, 44 made the correct identification. The student is interested in testing whether a
difference exists in the proportion of correct identifications between these two groups. There is
no posted dataset file for this question (just use the data presented in the prompt).
a) What is the parameter of interest? Use words and symbol(s) in context in your answer.
Define any subscripts that you use. Consider the 20s age group as Population 1 and the
30s age group as Population 2.
b) State the hypotheses using correct notation and symbols (as seen in our course notes).
c) Check the specific conditions necessary to consider using theory-based (or distribution-based) inference methods. There are two to consider: (1) Was the data collected randomly
from the population? and (2) Are there at least ten successes and failures in each group?
Answer both of these questions in one to two sentences each with a valid reason for each
condition.
d) Calculate the estimate of the parameter you stated in (a). Show the work needed to obtain
this statistic. Consider using the information provided above to help you obtain your
answer. Round your statistic to three decimal places.
e) Calculate the standard error of your estimate using the theoretical formula (found in your
course notes). Type your work and show all calculations. Round your result to three
decimal places.
f) To determine if a difference exists, calculate the 95% confidence interval to estimate the
parameter and round the upper and lower limits of confidence interval to three decimal
places. Use the estimates from part (d) and (e) and type all calculations and work.
g) Verify your confidence interval using Rguroo by following these directions.
• Go to Analytics → Analysis → Proportion Inference → Two Populations.
• In the Response label type “Guess”, in success label type “Correct Guess” and
Failure type “Incorrect Guess”. In the Population label box, type “Bags”. Use
the image presented below as guide.
• Under Population 1, type “20s” as label and under Population 2, type “30s” as
label. Then enter the correct sample size and # of successes for each group. See
the image below to fill in each box correctly. (You will need this step later in (l)).
• Click the Confidence Interval tab, keep the Confidence Level at 0.95 and select
Large Sample z under Methods, click Preview.
• Copy and paste the two-way table, output and table displayed under the title
“Confidence Interval for Difference of Two Population Proportions.”
h) Create a bootstrap distribution by following these instructions. In StatKey under the
middle area labeled ‘Bootstrap Confidence Intervals’, click CI for Difference In
Proportions. Click ‘Edit Data’, then enter in the count and sample size for each group
with the data from your two-way table from (g) above. Next, click ‘Generate 1000
Samples’ 10 times to generate 10,000 samples. Take a screenshot of your bootstrap
distribution including the mean and standard error box and paste it in your solutions
document.
i) Construct a 95% bootstrap confidence interval using the percentile method. Provide the
StatKey image with all blue boxes displayed and type your answer as (lower value, upper
value) below your image.
j) Compare your confidence intervals from parts (f), (g), and (i) in one sentence.
k) Based on your confidence intervals from parts (f), (g), and (i), make a decision about the
null hypothesis and draw a conclusion about whether there is evidence of a difference in
the proportion of correct identifications between the two age groups. State your decision
and conclusion in one to two sentences.
l) Calculate your test statistic and p-value using Rguroo. Follow the steps in part (g) again
but this time select the Test of Hypothesis tab. Choose Large Sample z under Method.
Correctly set your alternative hypothesis from part (b) and your null values. Set your
significance level to 0.05 and click Preview. Copy and paste only the output and table
displayed under the title “Two Population Proportion Test of Hypothesis.”
m) Verify your p-value from part (l). Start with your test statistic obtained in Rguroo and go
to StatKey. Click Theoretical Distributions → Normal and click on the correct tail button
based on the hypotheses in part (b) and enter the standardized test statistic from part (l)
in the correct bottom box (you may double your upper probability or click two-tail).
Copy and paste (or take a screenshot of) the image of the standard Normal distribution
with all blue boxes displayed and type the value of the p-value below the image.
n) Verify the decision and conclusion you made in part (k) by comparing your p-value from
part (m) to the significance level. Write your answer in one sentence.
Investigation 2: Time it takes to Complete STAT 250 Exam 2
A random sample of 23 STAT 250 students from a previous semester was collected and the time
it took each of them to complete Exam 2 was recorded. The data was measured in minutes and
the data set is called Exam2Time. The professors of the course claim that the time it takes a
student to complete the exam will be greater than 60 minutes. Consider the population of all
times to be left skewed. Using α = 0.01, is there sufficient evidence to conclude that the mean
time to complete Exam 2 is greater than 60 minutes? Conduct a full hypothesis test by following
the steps below. Enter an answer for each of these steps in your document.
a) What is the population of interest? Answer this question in one sentence.
b) Define the parameter of interest in context using symbol(s) and words in one complete
sentence.
c) State the null and alternative hypotheses using correct notation.
d) Check the following conditions necessary to consider conducting inference using theory-based methods and the t-distribution. There are three to consider: (1) Was a random
sample collected; (2) Is the population where the sample comes from Normal; and (3) Is
the sample size greater than or equal to 30? Check these conditions in one sentence each
and provide a reason why each condition is or is not met.
e) For the Time variable, construct a frequency histogram in Rguroo. Remember to
properly title and label the graph. Copy and paste this graph into your document.
Instructions to construct this graph are provided on Data Analysis Assignment 2.
f) Describe the shape of the Time histogram in one sentence.
g) For the Time variable, construct a horizontal boxplot in Rguroo. Remember to properly
title and label the graph. Copy and paste this graph into your document. Instructions to
construct this graph are provided on Data Analysis Assignment 2.
h) Does the Time boxplot show any outliers? Answer this question in one sentence and
identify any outliers if they are present.
i) Considering your interpretation of each graph, is theory-based inference using the t-distribution appropriate for the Time variable? Answer the question and provide a reason
for your response.
j) Obtain the mean and standard deviation of the Time variable in Rguroo. You may
screenshot the table provided in the Rguroo Output. Use Analytics → Analysis → Numerical
Summaries. Select our dataset in the “Select a Dataset” menu and select Time as the
variable. Then, click Univariate and select the statistics mean and standard deviation.
k) Regardless of your answers above, calculate the standardized test statistic value using the
statistics from part (j) and your null value stated in part (c). Type all work and round your
test statistic value to three decimal places.
l) Obtain your p-value in StatKey using Theoretical Distributions → t. First, enter your
degrees of freedom and click on the correct tail button based on the hypotheses in part (c)
and enter the standardized test statistic from part (k) in the correct bottom box. Copy and
paste the image of the t-distribution with all blue boxes displayed and type the value of
the p-value below the image. Round your p-value to three decimal places.
m) Verify your test statistic and p-value using Rguroo. Go to Analytics → Analysis → Mean
Inference → One Population. Import the Exam2Time data (or find it in your Rguroo
group) and select it as the data set and Time as the variable. Then, click the Test of
Hypothesis tab, choose t-statistic under method. Finally, correctly set your alternative
hypothesis and significance level to α = 0.01 and click Preview. Copy and paste only the
output and table displayed under the title “One Mean Test of Hypothesis.”
n) State whether you reject or do not reject the null hypothesis based on the p-value from
part (m) and the significance level 0.01, and give a reason for your decision, in one sentence.
o) Based on the above decision, state your conclusion addressing the alternative hypothesis,
in context in one sentence.