Part 1 - Winona State University

STAT 110: Section 3 – Methods for Analyzing a Single Categorical Variable
Fall 2015
A STATISTICAL INVESTIGATION
In this section, we will discuss both descriptive and inferential methods that are appropriate
only when investigating a single categorical variable with two levels (e.g. yes or no, low birth
weight or normal birth weight, smoker or nonsmoker, etc.).
FORCED CHOICE TECHNIQUE IN CRIMINAL INVESTIGATIONS
Example 3.1: A suspected serial-rape murderer, an ex-con with a history of sex crimes, was
interrogated by police after he was overheard bragging to others that he raped, killed, and
buried a young woman victim in an isolated valley outside of the city in which he resided. He
told police that he had never met the victim and that he had never been to the valley. A series
of binary (yes/no) questions embedded within the interrogation was designed to test his
knowledge of victim characteristics that only the perpetrator would know.
Of the 20 questions regarding victim characteristics, clothing, and information obtained from
family and friends who last saw her, he answered 3 correctly. Does this provide evidence that
the suspect was guilty of the crime? Why or why not?
Questions:
1. What is the single categorical variable of interest in this problem (i.e., our response
variable)?
2. How many questions did the suspect answer correctly? What percentage is this? Note
that calculating and reporting this sample proportion was covered in the Descriptive
Statistics section.
3. Suppose the suspect was merely guessing the answers to the 20 questions and had no
knowledge of the victim. How many questions would we expect the suspect to answer
correctly? What percentage is this?
Note that the observed number of correct answers is less than would be expected. However,
even though this is less than expected, this is not necessarily enough statistical evidence to
support the suspect’s guilt.
A key question is how to determine whether the suspect’s result is surprising under the
assumption that he was merely guessing the answers to 20 questions asked about the victim.
To answer this, we will use Tinkerplots 2® to simulate the process of answering 20 questions by
merely guessing with 50% chance of answering correctly, over and over again. Each time we
simulate the process, we’ll keep track of how many questions a suspect who was simply
guessing would get right. Once we’ve repeated this process a large number of times, we’ll have
a pretty good sense for what outcomes would be very surprising, or somewhat surprising, or
not so surprising if the suspect was simply guessing answers to the questions about the victim.
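The same guessing process can also be sketched outside of Tinkerplots. The following is a minimal Python sketch, not part of the original activity; the function name count_correct_guesses, the variable names, and the seed are illustrative assumptions. It treats each of the 20 answers as an independent guess with a 50% chance of being correct and repeats the whole interrogation 100 times.

```python
import random

def count_correct_guesses(num_questions=20, p_correct=0.5, rng=random):
    """Simulate one suspect guessing on every question; return the number correct."""
    return sum(rng.random() < p_correct for _ in range(num_questions))

# Arbitrary seed (an assumption) so that the run can be repeated exactly.
rng = random.Random(110)

# Repeat the 20-question interrogation 100 times, recording the number correct each time.
simulated_counts = [count_correct_guesses(rng=rng) for _ in range(100)]

print("Correct answers in the first simulated interrogation:", simulated_counts[0])
print("Smallest and largest counts over 100 simulations:",
      min(simulated_counts), max(simulated_counts))
```

Each entry of simulated_counts plays the role of one simulated suspect who was merely guessing, so the list can be compared with the 3 correct answers actually observed.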
The Simulation in Tinkerplots (you will not be doing this yourself right now)

Drag a new Sampler to the workspace. Click and drag the Spinner over the workspace
so that you see a spinner.

Change Attr1 to Answer.

Change the options on the Spinner from a and b to Correct and Wrong. Click on the
drop-down arrow below the spinner and select Show Percentage. Change the
proportions to 50% (so that we simulate the situation in which the subject is merely
guessing the answers to the questions; i.e., there is 50% chance of answering correctly).

Change Draw to 1 to represent answering only one question at a time.
Change the Repeat value to 20 to represent the 20 questions asked by the interrogators
about the victim. Your spinner should look like this:

Change the speed to Fastest, and click Run. A table should appear with 20 entries. This
represents one set of answers to the 20 questions where the suspect was merely guessing
the answers.

To count the number of correct answers, highlight the column titled Answer and drag a
new plot to the workspace. Click and drag one point from your plot all the way to the
right, and the points should organize as follows:

Use the N button to count the number of correct (and wrong) answers. How many questions
in your first sample did the guessing suspect answer correctly?

To simulate this process 99 more times, place your cursor over the number of correct
answers displayed on the plot, right-click, and select Collect Statistic. Ask Tinkerplots
to collect 99 more samples (note that each is of size 20) and click Collect. A table should
appear containing the results of all 100 simulated trials.

To summarize these results, highlight the column containing these counts and drag a
new plot to the workspace.

Click and drag a point all the way to the right to organize the points. Double-click on
either endpoint, and change Bin Width to 1.

Use the vertical stack option to better organize the data points.

Select N to count the number of times each outcome occurred in your 100 simulations.
Sketch your results on the graph on the next page.
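The counting and stacking steps above can be mimicked in a few lines of Python. This is only a sketch under stated assumptions: Python stands in for Tinkerplots, the seed and variable names are arbitrary, and the text "dot plot" is a rough substitute for the stacked plot. It tallies how often each number of correct answers occurred in 100 simulated interrogations and reports the share of simulations that were as low as the observed 3 correct.

```python
import random
from collections import Counter

rng = random.Random(2015)  # arbitrary seed (an assumption) for repeatability

# 100 simulated interrogations, each made up of 20 guesses with a 50% chance of being correct.
counts = [sum(rng.random() < 0.5 for _ in range(20)) for _ in range(100)]

# Tally how many simulations produced each possible number of correct answers (0 to 20).
tally = Counter(counts)
for outcome in range(21):
    print(f"{outcome:2d} | {'o' * tally[outcome]}")

# Share of simulated interrogations with 3 or fewer correct answers.
observed = 3
share_as_low = sum(c <= observed for c in counts) / len(counts)
print(f"Share of simulations with {observed} or fewer correct: {share_as_low:.2f}")
```

Each "o" corresponds to one dot on the Tinkerplots plot, that is, one simulated set of 20 guessed answers.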
Questions:
4. What does each dot on the above plot represent?
5. Based on the results of the simulation study, would you consider the result actually
obtained during the interrogation of the suspect (3 out of 20 correct answers) to be
surprising or unusual if he was simply guessing the answers to questions about the
victim?
6. Do you think that the interrogation results provide sufficient evidence of the suspect’s
guilt? Explain why or why not.
MAC vs. PC and the WSU Laptop Program
Example 3.2: Consider a survey administered to a random sample of 318 Winona State
undergraduate students. One of the items on the survey was which platform they chose for
their laptop, PC or Mac. Suppose that a college professor hypothesizes that the majority of all
WSU undergraduate students prefer MACs. Of the 318 students surveyed, 171 reported that
they preferred MACs. Do these data support the professor’s hypothesis? We will carry out a
statistical investigation in order to answer this question.
Questions:
1. What is the single categorical variable of interest in this problem (i.e., our response
variable)?
2. How many of the students in the sample favored the MAC? What percentage is this?
Note that calculating and reporting this sample proportion was covered in the
Descriptive Statistics section.
3. Suppose the overall population of WSU undergraduate students has no real preference
for either the PC or the Mac; that is, the population of all students is split evenly in terms
of their preference. If this is the case, what percentage of the 318 students would you
expect to choose a MAC when taking the survey? How many students is this?
Note that the observed number of students who chose the MAC is greater than what would be
expected if the population of all WSU undergraduates had no real preference. However, even
though more than half of our sample chose the MAC, this is not necessarily enough statistical evidence to
support the professor’s claim that the majority of all WSU undergraduates prefer the MAC.
A key question is how to determine whether the survey’s result is surprising under the
assumption that the population of WSU undergraduates overall has no preference. To answer
this, we will use Tinkerplots 2® to simulate the process of 318 students choosing either a MAC
or a PC, each with 50% probability, over and over again. Each time we simulate the process,
we’ll keep track of how many times a student chose the MAC (note that you could also keep
track of the number of times a student chose the PC). Once we’ve repeated this process a large
number of times, we’ll have a pretty good sense for what outcomes would be very surprising,
or somewhat surprising, or not so surprising if the population of all WSU undergraduates has
no real preference.
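For readers who want to see the idea in code, here is a minimal Python sketch of the same simulation. It is not part of the original activity; the function name simulate_mac_count, the variable names, and the seed are illustrative assumptions. Each of the 318 simulated students independently chooses the MAC with probability 0.5, and the sample is redrawn 100 times.

```python
import random

def simulate_mac_count(sample_size=318, p_mac=0.5, rng=random):
    """Draw one sample from a no-preference population; return how many chose the MAC."""
    return sum(rng.random() < p_mac for _ in range(sample_size))

rng = random.Random(318)  # arbitrary seed (an assumption) for repeatability

# 100 simulated surveys of 318 students each.
mac_counts = [simulate_mac_count(rng=rng) for _ in range(100)]

# Compare the simulated counts with the 171 students observed in the real survey.
observed = 171
share_at_least = sum(c >= observed for c in mac_counts) / len(mac_counts)
print(f"Share of simulated samples with {observed} or more choosing the MAC: "
      f"{share_at_least:.2f}")
```

A large share would suggest that 171 out of 318 is not unusual for a population with no real preference; a very small share would suggest the opposite.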
The Simulation in Tinkerplots

Drag a new Sampler to the workspace. Click and drag the Spinner over the workspace
so that you see a spinner.

Change Attr1 to Computer Preference.

Change the options on the Spinner from a and b to PC and Mac. Click on the drop-down
arrow below the spinner and select Show Proportion. Change the proportions to .50 (so
that we simulate the situation in which the population has no preference; i.e., a student
randomly selected from the population has a 50% chance of choosing a PC).

Change Draw to 1 to represent taking only one student from the population at a time.
Change the Repeat value to 318 to represent taking a random sample of 318 students.
Your spinner should look like this:

Change the speed to Fastest, and click Run. A table should appear with 318 entries.
This represents one random sample of 318 students taken from a population with no real
preference for either the MAC or the PC.

To count the number of students who chose the MAC in this first sample, highlight the
column titled Computer_Preference and drag a new plot to the workspace. Click and drag
one point from your plot all the way to the right, and the points should organize as
follows:

Use the N button to count the number who chose the MAC (and the PC). How many
students out of the 318 in your first sample chose the MAC?

To simulate this process 99 more times, place your cursor over the number who chose
the MAC displayed on the plot, right-click, and select Collect Statistic. Ask Tinkerplots to
collect 99 more samples (note that each is of size 318) and click Collect. A table should
appear containing the results of all 100 simulated trials.

To summarize these results, highlight the column containing these counts and drag a
new plot to the workspace.

Click and drag a point all the way to the right to organize the points. Double-click on
either endpoint, and change Bin Width to 1.

Use the vertical stack option to better organize the data points.

Select N to count the number of times each outcome occurred in your 100 simulations.
Sketch your results on the graph below.
Number Choosing MACs
Questions:
4. What does each dot on the above plot represent?
5. Based on the results of the simulation study, would you consider the result actually
obtained in the survey study (171 out of 318 preferring MACs) to be surprising or
unusual if the population of all WSU students had no real preference for either type of
computer?
6. Do you think that the survey data provide evidence that the majority of all WSU
undergraduates prefer MACs? Explain why or why not. Also, note that using the data
obtained from the sample to make a generalization about the population of all WSU
undergraduates is an example of inferential statistics.
Example 3.3: Evaluating Deafness
Consider the case study presented in an article by Pankratz, Fausti, and Peed titled “A Forced-Choice Technique to Evaluate Deafness in the Hysterical or Malingering Patient.” Source: Journal
of Consulting and Clinical Psychology, 1975, Vol. 43, pg. 421-422. The following is an excerpt from
the article:
The patient was a 27-year-old male with a history of multiple hospitalizations for idiopathic
convulsive disorder, functional disabilities, accidents, and personality problems. His hospital records
indicated that he was manipulative, exaggerated his symptoms to his advantage, and that he was a
generally disruptive patient. He made repeated attempts to obtain compensation for his disabilities.
During his present hospitalization he complained of bilateral hearing loss, left-sided weakness, left-sided numbness, intermittent speech difficulty, and memory deficit. There were few consistent or
objective findings for these complaints. All of his symptoms disappeared quickly with the exception of
the alleged hearing loss.
To assess his alleged hearing loss, testing was conducted through earphones with the subject
seated in a sound-treated audiologic testing chamber. Visual stimuli utilized during the
investigation were produced by a red and a blue light bulb, which were mounted behind a one-way mirror so that the subject could see the bulbs only when they were illuminated by the
examiner. The subject was presented several trials on each of which the red and then the blue
light were turned on consecutively for 2 sec each. On each trial, a 1,000-Hz tone was randomly
paired with the illumination of either the blue or red visual stimulus, and the subject was
instructed to indicate with which stimulus the tone was paired.
Questions:
1. What is the single categorical variable of interest in this problem (i.e., our response
variable)?
2. Suppose the subject is presented with 100 trials. If he truly has suffered hearing loss, he
is essentially guessing on each trial. If this is the case, in how many trials would you
expect the subject to correctly identify with which stimulus the tone was paired?
3. Suppose the subject correctly identifies with which stimulus the tone was paired in only
45 out of 100 trials. A researcher argues that since this was less than the expected
number of correct matches, the subject must be intentionally answering incorrectly in
order to convince them he can’t hear. What is wrong with their reasoning?
4. Suppose the subject correctly identifies with which stimulus the tone was paired in none
of 100 trials. A researcher believes this result provides evidence that the subject must be
intentionally answering incorrectly in order to convince them he can’t hear. Do you
agree?
Once again, the key question is how to determine whether the subject’s result on the 100 trials is
surprising under the assumption that he truly has hearing loss and is simply guessing on each
trial. To answer this, we will simulate the process of guessing on 100 trials of this experiment,
over and over again. Each time we simulate the process, we’ll keep track of how many times the
subject was incorrect (note that you could also keep track of the number of times he was
correct). Once we’ve repeated this process a large number of times, we’ll have a pretty good
sense for what outcomes would be very surprising, or somewhat surprising, or not so
surprising if the subject is really guessing.
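As a rough illustration of this process, the sketch below simulates a subject who guesses on every one of 100 trials and repeats that experiment 100 times. It is a minimal Python sketch, not part of the original activity; the helper share_at_least, the variable names, and the seed are assumptions made for illustration. The thresholds 57 and 64 are the incorrect counts discussed in the questions later in this example.

```python
import random

rng = random.Random(1975)  # arbitrary seed (an assumption) for repeatability

num_trials = 100       # trials presented to the subject in one experiment
num_simulations = 100  # number of times the whole experiment is simulated

# For each simulated experiment, count the trials on which a guessing subject is incorrect.
incorrect_counts = [
    sum(rng.random() < 0.5 for _ in range(num_trials))
    for _ in range(num_simulations)
]

def share_at_least(counts, threshold):
    """Return the fraction of simulated experiments with at least `threshold` incorrect."""
    return sum(c >= threshold for c in counts) / len(counts)

for threshold in (57, 64):
    print(f"Share of simulations with {threshold} or more incorrect: "
          f"{share_at_least(incorrect_counts, threshold):.2f}")
```

Comparing the two shares gives a feel for how quickly outcomes become unusual as they move away from the 50 incorrect answers expected under pure guessing.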
Use the instructions outlined above for Examples 3.1 and 3.2 to carry out the Tinkerplots
simulation. Note that you will have to revise a few elements of the simulation that relate to the
following questions:

What are the two possible outcomes on each of the trials? Change the values on your
spinner accordingly.

What is the probability that each outcome occurs, given that the subject is just guessing
on each trial? Change your spinner accordingly.

Be sure to change the Draw value to 1 since the subject is guessing only one color at a
time.

In how many trials does the subject participate overall? Keep this value in mind when
setting the Repeat value.
Carry out the simulation study 100 times overall, keeping track of the number of times the
subject was incorrect in each simulated set of 100 trials. Sketch in your results below:
Questions:
5. What does each dot on this plot represent?
6. Suppose a subject was incorrect on 57 out of 100 trials. Would you believe they were
probably just guessing, even though they had more than the expected number of
incorrect answers? Why or why not?
7. The actual subject was incorrect on 64 out of 100 trials. Based on this statistical
investigation, do you believe he was just guessing, or do these results indicate that he may have
been answering incorrectly on purpose in order to mislead the researchers into thinking
he was deaf? Explain your reasoning.
Example 3.4: Are Women Passed Over for Managerial Training?
This example involves possible discrimination against female employees. Suppose a large
supermarket chain occasionally selects employees to receive management training. A group of
female employees has claimed that they are less likely than male employees of similar
qualifications to be chosen for this training.
The large employee pool that can be tapped for management training is 60% female and 40%
male; however, since the management program began, 9 of the 20 employees chosen for
management training were female (only 45%).
The question of interest is as follows: Do the data provide evidence of gender discrimination
against females?
Questions:
1. What is the population of interest?
2. What is the sample?
3. What is the variable of interest?
Simulation Study
To investigate this research question, we will carry out a simulation in Tinkerplots 2®. Once
again, note that you will have to revise a few elements of the simulation that relate to the
following questions:

What are the two possible outcomes on each of the trials? Change the values on your
spinner accordingly.

What is the probability that each outcome occurs, given that there is no discrimination?
Change your spinner accordingly.

Be sure to change the Draw value to 1 since only one employee is selected for
management training at a time.

How many employees were selected for management training overall in the study? Keep this
value in mind when setting the Repeat value.
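The setup described by these questions can also be sketched in Python. This is only an illustration under stated assumptions: the names are hypothetical, the seed is arbitrary, and each selection is treated as an independent draw from a pool that is 60% female, which is a reasonable approximation only because the employee pool is described as large. The sketch simulates 20 selections with no discrimination, repeats this 1,000 times, and reports how often 9 or fewer females were chosen.

```python
import random

rng = random.Random(60)  # arbitrary seed (an assumption) for repeatability

p_female = 0.60         # share of females in the employee pool, from the example
num_selected = 20       # employees chosen for management training
num_simulations = 1000  # number of simulated groups of 20 selections

# Number of females chosen in each simulated group, assuming no discrimination.
female_counts = [
    sum(rng.random() < p_female for _ in range(num_selected))
    for _ in range(num_simulations)
]

# Share of simulated groups with 9 or fewer females, the result actually observed.
observed = 9
share_as_low = sum(c <= observed for c in female_counts) / num_simulations
print(f"Share of simulations with {observed} or fewer females chosen: {share_as_low:.3f}")
```

The printed share plays the same role as counting the dots at or below 9 on the Tinkerplots plot.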
Carry out the simulation study 1,000 times overall, keeping track of the number of times a
female was chosen in each simulated group of 20 selections. You should see something similar to the following:
Questions:
4. What does each dot represent?
5. Would you say that this observed result (9 out of 20) provides evidence of gender
discrimination against females? Explain.