Chi-Square Analysis

Mendel’s Peas and the Goodness of Fit Test
We will develop the use of the χ2 distribution through an
example from the history of biology.
In Austria in the mid-1800s, an Augustinian monk, Gregor
Mendel, studied the garden pea and seven of its traits, such as
the shape and color of the peas, the position of flowers on the
plant, etc.
He is credited with discovering patterns of inheritance, the
basis of the field of genetics.
Curiously, Mendel studied seven traits, one from each of
the pea’s seven chromosomes. His principle of the
independent assortment of genes holds only when the genes
are on different chromosomes.
We will use one of Mendel’s studies, and some of his
original data, to explore the χ2 test of significance.
Consider two different characteristics of peas, color and shape.
The peas may be yellow or green, round or wrinkled.
If we cross a plant with yellow round peas with a plant having
green wrinkled peas, and examine the progeny we will discover
a uniform F1 generation.
The traits yellow and round are each dominant, while green and
wrinkled are recessive. We use the letter Y for color, and R for
pea shape, so the alleles are Y, y, R, and r.
A Punnett square illustrates this dihybrid cross between the
green wrinkled pea and the yellow round pea.
Notice the uniformity among the offspring, as all are YyRr.
Now we cross the F1 among themselves to produce the F2:
gametes   YR     Yr     yR     yr
YR        YYRR   YYRr   YyRR   YyRr
Yr        YYRr   YYrr   YyRr   Yyrr
yR        YyRR   YyRr   yyRR   yyRr
yr        YyRr   Yyrr   yyRr   yyrr
Now we identify the yellow round peas:
In the same Punnett square, the yellow round peas are the nine
cells with at least one Y and at least one R: YYRR, YYRr, YyRR,
and YyRr.
Now we identify the yellow wrinkled peas:
In the same Punnett square, the yellow wrinkled peas are the three
cells with at least one Y and no R: YYrr and Yyrr.
Next we identify the green round peas:
In the same Punnett square, the green round peas are the three
cells with no Y and at least one R: yyRR and yyRr.
Finally, the last type of pea is green and wrinkled:
In the same Punnett square, the green wrinkled pea is the single
cell with no Y and no R: yyrr.
So now we have four phenotypes (different physical forms) of
peas originating from the single phenotype of the F1 generation.
They are, along with their genotypes and expected frequencies:
Phenotype         Genotypes                 Expected frequency
Yellow round      YYRR, YYRr, YyRR, YyRr    9/16
Yellow wrinkled   YYrr, Yyrr                3/16
Green round       yyRR, yyRr                3/16
Green wrinkled    yyrr                      1/16
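The 9:3:3:1 expectation can be checked by enumerating the 16 cells of the F2 Punnett square. The following is a minimal Python sketch, not part of the original slides; the names GAMETES and phenotype are chosen here for illustration only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Each F1 parent (YyRr) passes on one color allele and one shape allele.
GAMETES = ["YR", "Yr", "yR", "yr"]  # illustrative name, not from the slides


def phenotype(gamete_a, gamete_b):
    """Return the phenotype of the offspring formed from two gametes."""
    # One dominant (capital) allele is enough for the dominant trait to show.
    color = "yellow" if "Y" in (gamete_a[0], gamete_b[0]) else "green"
    shape = "round" if "R" in (gamete_a[1], gamete_b[1]) else "wrinkled"
    return f"{color} {shape}"


# Walk all 16 cells of the Punnett square and tally the phenotypes.
counts = Counter(phenotype(a, b) for a, b in product(GAMETES, repeat=2))

# Convert the tallies to exact expected frequencies out of 16.
frequencies = {name: Fraction(n, 16) for name, n in counts.items()}

for name, freq in frequencies.items():
    print(f"{name}: {freq}")
```

Using Fraction keeps the frequencies exact, which matches the later advice to enter expected frequencies as fractions rather than rounded decimals.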
If Mendel’s understanding of genetics were correct, and the
crosses made as he believed, the proportions of the four
phenotypes should fit the calculations from the Punnett square.
Using the χ2 distribution, we are able to test to see if groups of
individuals are present in the same proportions as expected.
This is rather like conducting multiple Z-tests for proportions
at once.
In this example Mendel carried out the dihybrid cross to
produce an F1 generation, and as expected, the F1 were all of
the same phenotype, yellow and round.
Further, the F1 were crossed among themselves to produce the
F2 generation. Mendel recorded the numbers of individuals in
each category.
The following table gives the observed numbers of each category.
Phenotype         Observed   Expected frequency
Yellow round      315        9/16
Yellow wrinkled   101        3/16
Green round       108        3/16
Green wrinkled    32         1/16
To make a χ2 test for “goodness of fit” we start as with all other
tests of significance, with a null hypothesis.
Step 1:
H0: The F2 generation is comprised of four
phenotypes in the proportions predicted by
Mendelian genetics.
Ha: The F2 generation is not comprised of four phenotypes in
the proportions predicted by Mendelian genetics.
Another way of saying this is that the null hypothesis claims the
population fits our expected pattern, while the alternate
hypothesis says it does not.
Step 2:
Assumptions: Our first assumption is that our
data are counts. (We cannot use proportions or
means.) With χ2, we do not always have a
sample of a population, and sometimes examine an entire
population, as with this example. When working from a sample
we must ensure that the sample is representative.
In order to check assumptions for this goodness of fit test we
must calculate the expected counts for each category. Then
we must meet two criteria:
1. All expected counts must be one or more.
2. No more than 20% of the expected counts may be less than 5.
We calculate the expected counts by finding the total number of
observations and multiplying that by each expected frequency.
Phenotype         Observed counts   Expected frequency   Expected counts
Yellow round      315               9/16                 (9/16)(556) = 312.75
Yellow wrinkled   101               3/16                 (3/16)(556) = 104.25
Green round       108               3/16                 (3/16)(556) = 104.25
Green wrinkled    32                1/16                 (1/16)(556) = 34.75
As you can see, all expected counts are greater than 5, so all
assumptions are met.
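The expected counts and the two criteria can also be checked in a few lines of code. This is a rough Python sketch under the assumption that the counts are Mendel's four F2 totals given earlier; the variable names are illustrative and not from the slides.

```python
from fractions import Fraction

# Observed F2 counts reported for Mendel's dihybrid cross.
observed = {
    "yellow round": 315,
    "yellow wrinkled": 101,
    "green round": 108,
    "green wrinkled": 32,
}

# Expected frequencies from the Punnett square, kept as exact fractions.
expected_freq = {
    "yellow round": Fraction(9, 16),
    "yellow wrinkled": Fraction(3, 16),
    "green round": Fraction(3, 16),
    "green wrinkled": Fraction(1, 16),
}

# Expected count = total observations * expected frequency.
total = sum(observed.values())
expected = {name: float(freq * total) for name, freq in expected_freq.items()}

# Criterion 1: every expected count is at least one.
all_at_least_one = all(e >= 1 for e in expected.values())

# Criterion 2: no more than 20% of the expected counts are below five.
share_below_five = sum(e < 5 for e in expected.values()) / len(expected)

assumptions_met = all_at_least_one and share_below_five <= 0.20

print(total, expected, assumptions_met)
```

Both criteria are applied to the expected counts, not the observed ones, so the check depends only on the total of 556 and the 9:3:3:1 frequencies.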
Step 3:
The formula for the χ2 test statistic is:
$$\chi^2 = \sum \frac{(o - e)^2}{e}$$
where o = observed counts, and
e = expected counts
This calculation needs to be made in the graphing calculator.
Enter the observed counts in L1. Enter the expected frequencies
in L2, as exact numbers. (Enter numbers like 1/3 directly, as
fractions; never round to just .3 or .33.)
In L3 multiply L2 by 556. This will give the expected counts.
The sum of L1 can be found using 1-Var Stats.
Now in L4, enter (L1-L3)^2/L3. This will give you the χ2
contribution for each category.
Finally, χ2 is the sum of L4.
For this problem, the χ2 statistic is .4700.
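For readers without a graphing calculator, the same list arithmetic can be sketched in Python. The four lists below mirror the calculator lists L1 through L4; this is an illustrative equivalent, not the procedure from the slides.

```python
# L1: observed counts (yellow round, yellow wrinkled, green round, green wrinkled).
observed = [315, 101, 108, 32]

# L2: expected frequencies, entered as exact fractions of 16.
frequencies = [9 / 16, 3 / 16, 3 / 16, 1 / 16]

# L3: expected counts = frequency * total number of observations (556).
total = sum(observed)
expected = [f * total for f in frequencies]

# L4: chi-square contribution of each category, (o - e)^2 / e.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]

# The test statistic is the sum of the contributions.
chi_square = sum(contributions)

# Degrees of freedom: number of categories minus one.
degrees_of_freedom = len(observed) - 1

print(f"{chi_square:.4f}")  # 0.4700
```

The largest single contribution comes from the green wrinkled category, which has the smallest expected count.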
In χ2, we always need to know and report the degrees of
freedom. The degrees of freedom are the number of categories
minus one.
Here we have 3 degrees of freedom.
Step 4:
Step 5:
P(χ2 > .4700) = .9254
The area can also be found with
χ2cdf(.4700,10^99,3).
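The upper-tail area can be reproduced with the Python standard library alone. The sketch below relies on a closed form that holds only for 3 degrees of freedom, P(χ2 > x) = erfc(√(x/2)) + √(2x/π)·e^(−x/2); the function name chi2_upper_tail_df3 is made up for this example. For other degrees of freedom a statistics library would be the usual route (for instance scipy.stats.chi2.sf, assuming SciPy is installed).

```python
import math


def chi2_upper_tail_df3(x):
    """Upper-tail area P(chi-square > x) for exactly 3 degrees of freedom."""
    # Closed form valid only when df = 3:
    #   erfc(sqrt(x / 2)) + sqrt(2x / pi) * exp(-x / 2)
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


alpha = 0.05          # significance level used in this example
chi_square = 0.4700   # test statistic computed for Mendel's data

p_value = chi2_upper_tail_df3(chi_square)
print(f"{p_value:.4f}")  # 0.9254

# Reject the null hypothesis only if the p-value falls below alpha.
reject_null = p_value < alpha
print(reject_null)  # False
```

A p-value this close to 1 means that data fitting the 9:3:3:1 pattern at least this poorly would be seen in the great majority of repetitions, if the null hypothesis were true.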
Step 6:
Fail to reject H0, as p = 0.9254 > α = .05.
Step 7:
We lack evidence that the pattern of pea
phenotypes is different from expected. That is, the
data are consistent with the F2 phenotypes occurring
in the expected proportions, 9:3:3:1.
Gregor Mendel did not have modern statistics to rely on for
his data analysis, but nonetheless analyzed data in a way
that led to this major scientific discovery, important to this
day.
There has been speculation about his studies, or how he
reported them, as the data fit the expected proportions almost
more closely than chance variation would produce.
He was, however, an Augustinian monk, so perhaps he
had a little help…