Solutions_Activity_10 - Penn State Department of Statistics

SOLUTIONS LAB ACTIVITY 10
Activity 10.1

Non-response bias refers to a bias that occurs if people with a particular opinion or trait tend to refuse
to answer certain survey questions (or possibly refuse to participate in the survey at all).
Response bias refers to the systematic effect of false answers given by survey participants.
Suppose the office that does the Penn State Pulse surveys does a survey about the frequency of academic
cheating (and they have!). In the context of such a survey about cheating, explain the difference between
non-response bias and response bias.
Non-response: perhaps those who cheat a lot won’t answer questions, or will refuse to be in the survey.
Response: perhaps those who cheat a lot will lie about their cheating.
Activity 10.2 Suppose you want to sample the music of a band you haven’t heard before. You’re given 5
CDs, each with 12 tracks. You’ll use a probability method to sample 5 tracks to play.
a. Describe (briefly) how you would carry out a simple random sampling method to pick the 5 tracks.
Number the 60 total tracks from 1 to 60. Randomly pick 5 different numbers between 1 and 60. Listen to those
tracks. (The “random shuffle” button of a 5-CD player might accomplish this.)
b. Describe (briefly) how you would carry out a stratified sampling method to pick the 5 tracks.
The strata = CDs. Randomly pick one track from each CD.
c. Describe briefly how you would carry out a cluster sampling method to pick the 5 tracks.
Clusters = CDs. Perhaps randomly pick one CD and then randomly pick 5 tracks from this CD. Alternatively,
treat each track number as a cluster: randomly pick one track number and play that track on all 5 CDs.
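The three methods in parts a through c can be sketched in a few lines of Python. This is only an illustration, assuming the 60 tracks are labeled by (CD number, track number); the function names and the fixed seed are hypothetical choices, not part of the activity.

```python
import random

# Layout from the activity: 5 CDs with 12 tracks each (60 tracks total).
N_CDS, N_TRACKS = 5, 12
tracks = [(cd, t) for cd in range(1, N_CDS + 1) for t in range(1, N_TRACKS + 1)]

def simple_random_sample(rng, k=5):
    """Simple random sample: every set of k tracks out of all 60 is equally likely."""
    return rng.sample(tracks, k)

def stratified_sample(rng):
    """Stratified sample: strata = CDs, one random track drawn from each CD."""
    return [(cd, rng.randint(1, N_TRACKS)) for cd in range(1, N_CDS + 1)]

def cluster_sample(rng, k=5):
    """Cluster sample: clusters = CDs, pick one CD at random, then k tracks from it."""
    cd = rng.randint(1, N_CDS)
    return [(cd, t) for t in rng.sample(range(1, N_TRACKS + 1), k)]

rng = random.Random(10)  # fixed seed only so the sketch is repeatable
print("SRS:       ", simple_random_sample(rng))
print("Stratified:", stratified_sample(rng))
print("Cluster:   ", cluster_sample(rng))
```

Note that the stratified sample always touches every CD, while the cluster sample stays on a single CD.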
Activity 10.3 Open the Class Survey data and select a column of your choice to answer the questions
following the special notes.
Special Notes:
1. To compute a confidence interval for a proportion, your column can have only two outcomes (e.g. Male/Female for Gender,
Yes/No for smoking, etc.). However, you can manipulate your data to create a column with just two
outcomes. For example, if you wanted to recode the GPA data as either ≥ 3.00 or < 3.00, you can use
Calc > Calculator, enter a new variable name in the text box “Store result in variable”, and then in the
Expression window enter ‘GPA’ ≥ 3.00. This will fill the new column with a series of
1’s and 0’s, where the 1’s are for those with a 3.00 GPA or higher and the 0’s are for those with less than 3.00.
2. When Minitab calculates a proportion from a variable with text responses (e.g. Male/Female;
Yes/No), the proportion is calculated for the response that comes second alphabetically (e.g. Male or Yes).
You will see this in the output where it reads “Event = Male” or “Event = Yes”. So if you want the proportion for Female
or No, you will have to recode the column as described in Special Note 1.
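The recoding in Special Note 1 is simply a comparison that yields 1 or 0 for each row. A minimal sketch of the same idea outside Minitab, using made-up GPA values (the real Class Survey data are not reproduced here) and a hypothetical column name `high_gpa`:

```python
# Hypothetical GPA values for illustration; None marks a missing response.
gpa = [3.45, 2.80, 3.00, 3.91, 2.25, None]

# 1 if GPA >= 3.00, 0 if GPA < 3.00; in this sketch missing values stay missing.
high_gpa = [None if g is None else int(g >= 3.00) for g in gpa]

print(high_gpa)  # [1, 0, 1, 1, 0, None]
```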
a. What column of data did you select?
Answers vary
b. What would be your population of interest?
Answers vary, but some examples are “All PSU undergraduate students at University Park” and
“All PSU Stat 200 students”. The population would NOT be
all Stat 200 students in our section(s): every student in the section(s) took the survey, so for that group
the survey would be a census of the population rather than a sample.
c. How is the class survey representative of that population?
The class is probably similar to these populations in make-up regarding percent that are female,
international, high/low GPA, etc.
d. How would you best describe the sampling technique used in attaining the Class Survey?
Definitely not one that involves a probability method, as no random sampling was done. Most likely you
would consider this sample to be a convenience sample.
e. Define a parameter value of interest (e.g. from Activity 9 the U.S. Government reported that 23% of
US adults ages 18-24 smoked cigarettes). If you cannot easily find one, think of one. For instance
(although you cannot use this parameter value now since I am giving it to you!), maybe you believe that
25% of college students cheat or have cheated on a significant other.
Answers will vary
f. Referring to condition 3 on page 297 regarding using the normal approximation for sample proportions
(i.e. is n p̂ ≥ 5 and n(1 - p̂) ≥ 5), verify that this condition has been met, thus allowing you to select in
Minitab the option “Use test and interval based on Normal Distribution”.
Plug your values of n and p̂ into both expressions and verify that each result is at least 5.
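The check in part f amounts to confirming that the sample contains at least 5 "events" and at least 5 "non-events". A small sketch, assuming a hypothetical helper name and made-up counts:

```python
def normal_approx_ok(successes, n, minimum=5):
    """Check n*p_hat >= minimum and n*(1 - p_hat) >= minimum.

    Since p_hat = successes / n, n*p_hat is the number of successes and
    n*(1 - p_hat) is the number of failures, so counts are compared directly.
    """
    failures = n - successes
    return successes >= minimum and failures >= minimum

# Hypothetical example: 60 "Yes" responses out of 226 surveys.
print(normal_approx_ok(60, 226))
```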
g. Use Minitab to create a 90% 1-proportion confidence interval on the column you selected in part a.
What is your interval?
Answers will vary
h. For your interval, answer the following:
What is the sample proportion, p̂ ? Answers will vary
What is the z* multiplier used for your interval? 1.645 or 1.65
What is the standard error? Answers will vary
What is the margin of error? Answers will vary, but it is found by multiplier × standard error.
Confidence intervals consist of statistic ± margin of error.
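The quantities asked for in part h fit together as p̂ ± z* × sqrt(p̂(1 - p̂)/n). The sketch below computes each piece for a normal-approximation interval; the function name and the example counts (113 events out of 226) are hypothetical, and Minitab's displayed values may be rounded differently.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_ci(successes, n, confidence=0.90):
    """Normal-approximation confidence interval for a single proportion."""
    p_hat = successes / n                                    # sample proportion
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # multiplier
    se = sqrt(p_hat * (1 - p_hat) / n)                       # standard error
    moe = z_star * se                                        # margin of error
    return p_hat, z_star, se, moe, (p_hat - moe, p_hat + moe)

# Hypothetical counts, not taken from the Class Survey.
p_hat, z_star, se, moe, interval = one_proportion_ci(113, 226, confidence=0.90)
print(f"p-hat = {p_hat:.3f}, z* = {z_star:.3f}, SE = {se:.4f}, ME = {moe:.4f}")
print(f"90% CI: ({interval[0]:.3f}, {interval[1]:.3f})")
```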
i. What is the sample size? The number of students who submitted a survey was 226. If your sample size
is not 226 why do you believe your number differs? Answers will vary
j. List the value and expression for the following: values will vary but expressions are:
Parameter: p
Statistic: p̂ (p-hat)
Sample Size: n
k. If the confidence level in part g were changed to 95% would your resulting interval be wider or
narrower? The interval would get wider, since you increased your confidence from 90% to 95%.
Remember, the greater the confidence level, the more “confident” you are that the interval contains
the true parameter value. A wider interval increases your confidence because it covers more possible
values of the parameter. Mathematically this is also true: as the
confidence level gets larger, so does the multiplier (about 1.645 at 90% versus 1.96 at 95%), and therefore so does the margin of error.
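To see the effect of the confidence level on the multiplier, the sketch below computes z* for several levels and the resulting margin of error for one fixed sample. The values p̂ = 0.5 and n = 226 are used only as an illustration; any fixed sample shows the same pattern.

```python
from math import sqrt
from statistics import NormalDist

def z_star(confidence):
    """Multiplier that leaves (1 - confidence)/2 in each tail of the standard normal."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Hypothetical sample: p-hat = 0.5 with n = 226.
p_hat, n = 0.5, 226
se = sqrt(p_hat * (1 - p_hat) / n)

for level in (0.90, 0.95, 0.99):
    z = z_star(level)
    print(f"{level:.0%} confidence: z* = {z:.3f}, margin of error = {z * se:.3f}")
```

The standard error does not depend on the confidence level, so the margin of error grows only because the multiplier grows.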