Statistics 103 Probability and Statistical Inference
Instructions for lab 10
Lab Objective
Perform chi-squared goodness of fit and independence tests using JMP.
Lab Procedures
In the U.S., you are supposed to be tried by a jury of your peers. Does this really happen in
practice? A study in the UCLA Law Review (1973) of grand juries in Alameda County,
California, compared the demographic characteristics of a random sample of jurors with the
general population. Below are the data for age and educational level. Only persons 21 and over
are considered; the population data are known from the Public Health Department.
Age        County-wide %    # of jurors
---------------------------------------------
21-40           42                5
41-50           23                9
51-60           16               19
> 61            19               33
---------------------------------------------
Total          100               66
Educational Level    County-wide %    # of jurors
------------------------------------------------------
Elementary                28.4              1
Secondary                 48.5             10
Some college              11.9             16
College degree            11.2             35
------------------------------------------------------
Total                    100.0             62
Questions:
1) Test whether the juries appear to be randomly selected with respect to the distribution of
age. Report the sample and population percentages in each age group, the value of the
chi-squared test statistic and its degrees of freedom, the p-value, and your conclusion.
To perform a chi-squared goodness of fit test in JMP:
1. Enter the data in two columns, the first containing the age categories and the second
containing the counts in those categories (not the percentages). Make the column of
labels a character variable and the column of counts a continuous variable.
2. Select Analyze-Distribution. Enter the name of the column with labels as the Y variable,
and enter the column of counts as the Freq variable. This tells JMP that the column of
counts is the frequency of each category. Hit OK to get the sample percentages in each
age category.
3. Click the red arrow next to the variable name and select Test Probabilities. Enter the
probabilities from the null hypothesis where indicated, and select Done. The output for
the chi-squared goodness of fit test is in the row labeled "Pearson." The first entry is the
value of the chi-squared test statistic; the second entry is the degrees of freedom (number
of categories - 1); and the last entry is the p-value from the appropriate chi-squared
distribution.
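As a cross-check on the JMP output, the same goodness of fit calculation can be sketched in Python. This is only a sketch and not part of the lab's JMP procedure: the helper chi2_sf and all variable names are hypothetical, and the p-value is computed with a closed-form recurrence that is valid for whole-number degrees of freedom, so only the standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable
    with a positive whole number of degrees of freedom (hypothetical helper)."""
    # Base cases with simple closed forms: df = 1 (odd) or df = 2 (even).
    if df % 2 == 1:
        p = math.erfc(math.sqrt(x / 2.0))
        k = 1
    else:
        p = math.exp(-x / 2.0)
        k = 2
    # Step up two degrees of freedom at a time:
    # sf(x, k + 2) = sf(x, k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        p += (x / 2.0) ** (k / 2.0) * math.exp(-x / 2.0) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return p


# Age data from the table above: null-hypothesis proportions and juror counts.
age_groups = ["21-40", "41-50", "51-60", "> 61"]
null_probs = [0.42, 0.23, 0.16, 0.19]
observed = [5, 9, 19, 33]

n = sum(observed)                          # total number of jurors
expected = [n * p for p in null_probs]     # expected count in each category
pieces = [(o - e) ** 2 / e for o, e in zip(observed, expected)]

chi_sq = sum(pieces)                       # chi-squared test statistic
df = len(observed) - 1                     # number of categories - 1
p_value = chi2_sf(chi_sq, df)

for group, o, e, piece in zip(age_groups, observed, expected, pieces):
    print(f"{group:>6}  observed={o:3d}  expected={e:6.2f}  piece={piece:6.2f}")
print(f"chi-squared = {chi_sq:.2f}, df = {df}, p-value = {p_value:.3g}")
```

The values in the "Pearson" row of the JMP output should agree with the last printed line up to rounding.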
2) Perform the chi-squared goodness of fit test for the education data by hand. That is, show in
your report the null hypothesis, the four pieces of the chi-squared test statistic (the four
values of (observed - expected)^2/expected), the degrees of freedom, the p-value, and your
conclusion. You can use JMP to check your answer, but all of the by-hand work must appear in
your report to get full credit.
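For reference, the by-hand calculation can be written compactly as follows. The symbols are notation introduced here, not taken from the handout: O_i is the observed count in category i, p_i is the null-hypothesis proportion for category i, n is the total sample size, and k is the number of categories.

```latex
E_i = n \, p_i ,
\qquad
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} ,
\qquad
\text{df} = k - 1 .
```

Each term of the sum is one of the "pieces" that question 2 asks you to show.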
Unit 2: Independence tests
Do people's opinions of their appearance change with age? In a survey reported in Newsweek
magazine (Spring/Summer 1999), 747 randomly selected women were asked, "How satisfied are you
with your overall appearance?" The numbers of women in each age group who chose each of the
four answers are shown in the table below.
Age          Very    Somewhat    Not Too    Not At All
-----------------------------------------------------------
Under 30       45          82         10             4
30 - 49        73         168         47             6
Over 50       106         153         41            12
-----------------------------------------------------------
Questions:
3) Test the null hypothesis that women's satisfaction with their appearance is not associated with
age. Report the sample percentages in each age group, the value of the chi-squared test statistic
and its degrees of freedom, the p-value, and your conclusion.
To perform a chi-squared independence test in JMP:
1. Enter the data in three columns, the first containing the age labels, the second containing
the satisfaction labels, and the third containing the counts in those categories (not the
percentages). You should have 12 rows in the dataset, one for each cell of the table (e.g., the
first row has "under 30" in the age column, "very" in the satisfaction column, and 45 in the
count column). Make the columns of labels
character variables and the column of counts a continuous variable.
2. Select Analyze-Fit Y by X. Enter the name of the column labels (satisfaction) as
the Y variable, the row labels (age) as the X variable, and the column of counts as
the Freq variable. Hit OK to get a contingency table of percentages in each category.
3. The output from the chi-squared test of independence is in the row labeled
"Pearson." The first entry is the value of the chi-squared test statistic; the second entry is
the degrees of freedom ((number of rows - 1) x (number of columns - 1)); and the last entry is
the p-value from the appropriate chi-squared distribution.
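The independence test can also be cross-checked outside JMP. The sketch below is one possible way to do it in Python with only the standard library; the helper chi2_sf and all variable names are hypothetical, and the p-value formula assumes a whole number of degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable
    with a positive whole number of degrees of freedom (hypothetical helper)."""
    if df % 2 == 1:
        p = math.erfc(math.sqrt(x / 2.0))
        k = 1
    else:
        p = math.exp(-x / 2.0)
        k = 2
    # sf(x, k + 2) = sf(x, k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        p += (x / 2.0) ** (k / 2.0) * math.exp(-x / 2.0) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return p


# Counts from the satisfaction table: rows are age groups, columns are answers.
ages = ["Under 30", "30 - 49", "Over 50"]
answers = ["Very", "Somewhat", "Not Too", "Not At All"]
counts = [
    [45, 82, 10, 4],
    [73, 168, 47, 6],
    [106, 153, 41, 12],
]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected_counts = [[r * c / total for c in col_totals] for r in row_totals]

chi_sq_indep = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(counts, expected_counts)
    for o, e in zip(obs_row, exp_row)
)
df_indep = (len(ages) - 1) * (len(answers) - 1)
p_value_indep = chi2_sf(chi_sq_indep, df_indep)

print(f"chi-squared = {chi_sq_indep:.2f}, df = {df_indep}, p-value = {p_value_indep:.4f}")
```

The printed statistic and p-value should match the "Pearson" row in JMP up to rounding.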
In the contingency table, there are three percentages below each count. The top one in each cell
is the percentage of units in the entire data set that fall in the cell of the table. The middle one in
each cell is the percentage of units in the row, given that they are in the column. The bottom one
in each cell is the percentage of units in the column, given that they are in the row. You can see
the expected count in each cell by clicking on the red arrow next to Contingency Table, and
selecting Expected.
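To make the three percentages concrete, here is a small Python sketch that computes them for a single cell, in the same top, middle, bottom order described above. The function cell_percentages and its argument names are hypothetical and are not JMP terminology.

```python
# Counts from the satisfaction table: rows are age groups, columns are answers.
table = [
    [45, 82, 10, 4],     # Under 30
    [73, 168, 47, 6],    # 30 - 49
    [106, 153, 41, 12],  # Over 50
]


def cell_percentages(table, i, j):
    """Return the three percentages for the cell in row i, column j:
    percent of the whole data set, percent of the cell's column total,
    and percent of the cell's row total."""
    count = table[i][j]
    grand_total = sum(sum(row) for row in table)
    column_total = sum(row[j] for row in table)
    row_total = sum(table[i])
    pct_of_all = 100.0 * count / grand_total      # top entry
    pct_of_column = 100.0 * count / column_total  # middle entry
    pct_of_row = 100.0 * count / row_total        # bottom entry
    return pct_of_all, pct_of_column, pct_of_row


# Example: women under 30 who answered "Very" (row 0, column 0).
top, middle, bottom = cell_percentages(table, 0, 0)
print(f"top = {top:.2f}%, middle = {middle:.2f}%, bottom = {bottom:.2f}%")
```

Comparing these values with the first cell of the JMP contingency table is a quick way to confirm which percentage is which.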
4) Assuming the null hypothesis is true, obtain by hand the expected number of women under
age 30 in a random sample of 747 women who would be very satisfied with their
appearance. Show exactly what you multiplied together to obtain the expected count.
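As a reminder of the general rule, the expected count for any cell under the null hypothesis of independence can be written as below. The symbols are notation introduced here, not taken from the handout: R_i is the total for row i, C_j is the total for column j, and n is the total sample size.

```latex
E_{ij} = \frac{R_i \times C_j}{n}
       = n \times \frac{R_i}{n} \times \frac{C_j}{n} .
```

The second form shows why the rule works: under independence, the probability of landing in a cell is the product of its row and column proportions.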