BIO105 Chi-Squared Analysis of Fruit Fly Data

Introduction: Based on your understanding of how the red and white-eyed alleles are transmitted
to offspring, you should be able to predict the phenotypic ratios that will be present in the progeny
flies. However, even if your predictions are correct, you may observe slightly different results than
expected, simply due to chance effects. For example, if you flip an evenly-balanced coin 100 times,
the predicted result is 50 heads and 50 tails, but you may well get 49 heads and 51 tails, simply due
to chance. This result does not indicate that the coin was not evenly balanced. We use a statistical
test - called the Chi-square test - to compare observed data with data that we would expect if our
hypothesis was true.
How much can your observed results vary from your expected results without causing you
to reject your hypothesis? Scientists have agreed on the following standard: if chance effects
alone would produce as much or more deviation from the predicted results in more than 5% of
the times that you carry out the experiment, you do not need to reject your
hypothesis.
The chi-square test is always testing what scientists call the null hypothesis (H0), which
states that there is no significant difference between the expected and the observed result. The
chi-square statistic is calculated by taking, for each category, the squared difference between the
observed (o) and expected (e) data divided by the expected data, and summing over all possible categories.
The chi-squared value ($\chi^2$) is calculated using the following equation (compute to 3 significant
figures):
$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
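As a worked illustration, take the coin example from the introduction: 49 heads and 51 tails observed, with 50 of each expected. Summing the squared difference divided by the expected count over the two categories would give:

$$\chi^2 = \frac{(49 - 50)^2}{50} + \frac{(51 - 50)^2}{50} = \frac{1}{50} + \frac{1}{50} = 0.04$$

A value this close to zero means the observed counts sit very near the expected counts. It is far below the 3.841 listed in the table for one degree of freedom at 5%.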
So you obtain some number known as the chi-square statistic... but what does it mean?
In order to determine how well our data fit our expected results, we first need to determine the
degrees of freedom (df). Degrees of freedom (df) refers to the number of phenotypic classes
minus 1. Then, we determine a relative standard to serve as the basis for rejecting the hypothesis.
The relative standard we typically use in biology is p<0.05. The p value is the probability of
seeing at least as much deviation from the expected results as we observed, through chance alone,
when the null hypothesis (H0) is true. When we choose p<0.05 as our standard, we state
that there is less than a 5% chance of error in stating there is a difference when there is no significant
difference.
Procedure:
Step 1: State the hypothesis being tested and determine what your expected phenotypic ratios (and
corresponding numbers) are.
Step 2: Gather your data.
Step 3: Determine the expected numbers (not ratios) for each observational class based on the total
amount of data you collected (i.e. how many red-eyed males, white-eyed males, etc. would
you expect in the 93 flies you counted).
Step 4: As outlined above, you will then compare your expected results with your observed results.
The difference between your observed results and your expected results will then be used to
calculate a chi-squared statistic. Compute your chi-squared statistic to 3 significant figures.
Step 5: Use the Chi-square distribution table to determine the significance of the value.
a. Determine the degrees of freedom (one less than the number of categories), and
locate on chart.
b. Locate the chi-squared value for your significance level (p=0.05 or less)
c. Compare this Chi-squared value from the table with the one that you calculated
in Step 4.
Step 6: State your conclusions in terms of your hypothesis.
a.) If your calculated $\chi^2$ value is greater than the $\chi^2$ value for your particular degrees of
freedom and p value (0.05), then you reject your null hypothesis (i.e. you reject your null
hypothesis of no difference between expected and observed results). Thus, you conclude that
the observed numbers are significantly different from what was expected based on the
theoretical expected distribution. These differences are unlikely to be due to chance events alone.
b.) If your calculated $\chi^2$ value is less than the $\chi^2$ value for your particular degrees of
freedom and p value (0.05), then you fail to reject your null hypothesis. You conclude that
the deviations you observed can be attributed to chance events or sampling error. Although it is
easier to say "accept" than "fail to reject," the latter is the lingo of statisticians.
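The six steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the observed counts are made up (they merely add up to the 93 flies mentioned in Step 3), the 1:1:2 ratio is one possible hypothesis, and the names `chi_squared` and `CRITICAL_AT_05` are hypothetical helpers, not part of the lab materials.

```python
# Hypothetical counts for illustration only; they are NOT data from this lab.
observed = {"red males": 22, "white males": 25, "red females": 46}

# Step 1: expected phenotypic ratios under the hypothesis (assumed 1:1:2 here).
ratios = {"red males": 0.25, "white males": 0.25, "red females": 0.5}


def chi_squared(observed, ratios):
    """Return (chi-squared statistic, degrees of freedom)."""
    total = sum(observed.values())
    statistic = 0.0
    for phenotype, count in observed.items():
        # Step 3: expected number (not ratio) for this class.
        expected = ratios[phenotype] * total
        # Step 4: add (observed - expected)^2 / expected for this class.
        statistic += (count - expected) ** 2 / expected
    # Step 5a: degrees of freedom = number of classes minus one.
    df = len(observed) - 1
    return statistic, df


# Step 5b: chi-squared values at p = 0.05, copied from the table in this handout.
CRITICAL_AT_05 = {1: 3.841, 2: 5.991, 3: 7.815}

statistic, df = chi_squared(observed, ratios)
# Steps 5c and 6: compare the calculated value with the table value.
reject = statistic > CRITICAL_AT_05[df]

print(f"chi-squared = {statistic:.3g}, df = {df}")
print("reject H0" if reject else "fail to reject H0")
```

With these made-up counts the calculated value is well below the table value for two degrees of freedom, so the sketch reports "fail to reject H0".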
Chi-squared Table of p values
Where p = Frequency of Chance Occurrence

Degrees of    no reason to doubt hypothesis           reason to doubt hyp. (Signif)
Freedom       70%     50%     30%     20%     10%     5%      1%
1             0.148   0.455   1.074   1.642   2.706   3.841   6.635
2             0.713   1.386   2.408   3.219   4.605   5.991   9.210
3             1.424   2.366   3.665   4.642   6.251   7.815   11.341
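Reading the table amounts to finding the two columns that your calculated value falls between. A minimal sketch of that lookup, with the values copied from the table above; the names `P_COLUMNS`, `TABLE` and `bracket_p` are hypothetical and chosen only for this illustration:

```python
# Column headings (as probabilities) and rows copied from the handout's table.
P_COLUMNS = [0.70, 0.50, 0.30, 0.20, 0.10, 0.05, 0.01]
TABLE = {
    1: [0.148, 0.455, 1.074, 1.642, 2.706, 3.841, 6.635],
    2: [0.713, 1.386, 2.408, 3.219, 4.605, 5.991, 9.210],
    3: [1.424, 2.366, 3.665, 4.642, 6.251, 7.815, 11.341],
}


def bracket_p(statistic, df):
    """Return (lower, upper) bounds on p for a chi-squared statistic.

    A bound of None means the statistic falls off that end of the table:
    (0.70, None) means p > 70%, and (None, 0.01) means p < 1%.
    """
    upper = None
    for p, critical in zip(P_COLUMNS, TABLE[df]):
        if statistic < critical:
            # The statistic sits to the left of this column, so p is above
            # this column's heading and below the previous column's heading.
            return p, upper
        upper = p
    return None, upper


print(bracket_p(1.0, 1))  # a statistic of 1.0 with one degree of freedom
print(bracket_p(4.0, 1))  # a statistic of 4.0 with one degree of freedom
```

A result whose upper bound is 0.05 or smaller falls in the "reason to doubt" columns of the table.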
Graph of distribution of Chi-squared values
Y axis is fraction of times that you would expect to observe that chi-squared value
(for example, if you carried out your crosses 1000 times, and calculated a chi-squared value
for each of the separate 1000 experiments, how often you would see a particular value)
X axis is chi-squared values, increasing from left to right
Area under the curve = probability of observing that much variation due to chance alone, if your
hypothesis is correct. The darkened area represents 5% of the total area under the curve. If your
chi-squared value is so high that it is in the darkened area, you need to reject your hypothesis that
your observed variation is due to chance alone.
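The thought experiment behind the graph (repeat the experiment 1000 times and record each chi-squared value) can be simulated. The sketch below assumes the coin-flip setup from the introduction rather than fly crosses, uses an arbitrary fixed seed, and the function name `coin_chi_squared` is hypothetical.

```python
import random


def coin_chi_squared(flips, rng):
    """Chi-squared statistic for one experiment of fair coin flips (expected 1:1)."""
    heads = sum(rng.random() < 0.5 for _ in range(flips))
    tails = flips - heads
    expected = flips / 2
    return (heads - expected) ** 2 / expected + (tails - expected) ** 2 / expected


rng = random.Random(105)  # fixed seed (arbitrary) so the run is repeatable
values = [coin_chi_squared(100, rng) for _ in range(1000)]

# Fraction of experiments that land in the "darkened area": beyond the 5%
# table value for one degree of freedom (3.841).
fraction_beyond = sum(v > 3.841 for v in values) / len(values)
print(f"{fraction_beyond:.1%} of simulated experiments exceeded 3.841")
```

Because heads and tails are whole-number counts, the simulated fraction should come out near 5% but not exactly 5%. The point is that a balanced coin still produces a "significant" chi-squared value in roughly one experiment out of twenty.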
Calculations for cross between parental white male and parental red female
Predicted phenotypic ratios for F1: ________________________
Total number of F1 flies counted: ___________________________
Expected # of F1 red males = 1/2 * total # = ____________ Actually observed _________
Expected # of F1 red females = 1/2 * total # = _________ Actually observed _________
For red males, what is the value of (observed - expected)^2 / expected? ______________
For red females, what is the value of (observed - expected)^2 / expected? ______________
Now, add up the two numbers that you have calculated for your variance. The total is your chi-squared test statistic.
Chi-squared test statistic for F1 cross 1: ________________________
Now you need to compare your chi-squared test statistic to the Chi-squared Table of p values in this handout, to
determine how often you will see as much variance as you observed, simply due to chance alone.
Note that on this table there are three rows, which report different values based on different degrees
of freedom. For our purposes, degrees of freedom is equal to the number of phenotypic classes
minus one. For this cross, there are two phenotypic classes (red male and red female), so we have
one degree of freedom.
P-value for F1 of cross one: __________________________
In your own words, state what this p-value indicates:
Predicted phenotypic ratios for F2 : ___________________________________
Total number of F2 flies counted: ___________________________
Expected # of F2 red males = 1/4 * total # = ____________ Actually observed _________
Expected # of F2 white males = 1/4 * total # = _________ Actually observed _________
Expected # of F2 red females = 1/2 * total # = _________ Actually observed _________
For red males, what is the value of (observed - expected)^2 / expected? ______________
For white males, what is the value of (observed - expected)^2 / expected? ______________
For red females, what is the value of (observed - expected)^2 / expected? ______________
Now, add up the three numbers that you have calculated for your variance. The total is your chi-squared test statistic.
Chi-squared test statistic for F2 of cross 1: ________________________
Now you need to compare your chi-squared test statistic to the Chi-squared Table of p values in this handout, to
determine how often you will see as much variance as you observed, simply due to chance alone.
Note that on this table there are three rows, which report different values based on different degrees
of freedom. For our purposes, degrees of freedom is equal to the number of phenotypic classes
minus one. For this cross, there are three phenotypic classes (red male, white male, red female), so
we have two degrees of freedom.
P-value for F2 of cross one: __________________________
In your own words, state what this p-value indicates:
Calculations for cross between parental red male and parental white female
Predicted phenotypic ratios for F1 : ________________________________
Total number of F1 flies counted: ___________________________
Expected # of F1 red females = 1/2 * total # = ____________ Actually observed _________
Expected # of F1 white males = 1/2 * total # = _________ Actually observed _________
For red females, what is the value of (observed - expected)^2 / expected? ______________
For white males, what is the value of (observed - expected)^2 / expected? ______________
Now, add up the two numbers that you have calculated for your variance. The total is your chi-squared test statistic.
Chi-squared test statistic for F1 of cross 2: ________________________
For this cross, there are two phenotypic classes (red female and white male), so
we have one degree of freedom.
P-value for F1 of cross two: __________________________
In your own words, state what this p-value indicates:
Predicted phenotypic ratios for F2 : ________________________________
Total number of F2 flies counted: ___________________________
Expected # of F2 red males = 1/4 * total # = ____________ Actually observed _________
Expected # of F2 white males = 1/4 * total # = _________ Actually observed _________
Expected # of F2 red females = 1/4 * total # = _________ Actually observed _________
Expected # of F2 white females = 1/4 * total # = _______ Actually observed _________
For red males, what is the value of (observed - expected)^2 / expected? ______________
For white males, what is the value of (observed - expected)^2 / expected? ______________
For red females, what is the value of (observed - expected)^2 / expected? ______________
For white females, what is the value of (observed - expected)^2 / expected? ______________
Now, add up the four numbers that you have calculated for your variance. The total is your chi-squared test statistic.
Chi-squared test statistic for F2 of cross 2: ________________________
For this cross, there are four phenotypic classes (red male, white male, red female, white female), so
we have three degrees of freedom.
P-value for F2 of cross two: __________________________
In your own words, state what this p-value indicates:
Bio 105 Lab Reports
Objective: The two main objectives of this lab report are to: a) acquaint you with the specific
sections that make up a formal scientific report, in particular what is and is not contained in each
section; and b) give you the chance to have your scientific writing critiqued and corrected. In light
of the first objective, please pay particular attention to the detailed description of the components of
the lab report that can be found below.
Format 1. See pp. 4-6 in your lab manual for information on each required section.
2. All reports must be typed (12 pt Times font preferred)
3. Use double line spacing to facilitate grading
4. Margins - 1" on all sides
5. Paragraphs should be indented 1 tab, with no extra spacing between paragraphs.
6. Body of text should be left justified.
7. All pages (except title page) should be numbered consecutively.
8. Include boldface (Times 14 font), page-centered headers for new sections (e.g. Abstract).
9. The Title and Abstract should appear alone on their own page.
11. Include bold subheadings when appropriate in methods, results and discussion sections
(see journals for examples).
12. All figures and graphs should be prepared on the computer; each must contain title, labels, and
caption. Colored inks are fine for figures. Hand-drawn graphs are acceptable provided they are
neat and legible. DO NOT PREPARE GRAPHS OF % TRANSMITTANCE VS. INDEPENDENT VARIABLE.
All %T must be converted to absorbance (A) prior to graphing.
General advice - Outlines are the most effective way to organize the information you will be
presenting. I strongly suggest producing a detailed outline as your first step; this will make the
actual writing easier since you have a framework from which to work. It will also produce a
sequential, ordered report. When you begin writing, do not concentrate too much on economic word
use, sentence variety, punctuation, organization, etc. The initial writing effort should be more
free-flowing - get your thoughts out on paper in a brainstorming fashion. Once the information is in
front of you, it will be much easier to form it into a draft, and then into a coherent written work.
A Title should briefly but descriptively identify the content of the article. 'Experiments with Plants'
is a vague and meaningless title, unless it is the title of a comprehensive lab manual addressing
botany. Compare that title with 'The inheritance of flower and seed characteristics in Pisum sativum'.
An abstract should contain the following elements: 1-2 sentences on the big picture importance
of your experimental subject, 1-3 sentences describing exactly (i.e. no generalities) what was
investigated in your study, ~ 2-3 sentences on specific results, 1-2+ sentences on how your results
agree with predicted results, and a sentence describing the big picture significance of your results.
An introduction should contain sufficient background necessary to interpret the study. This
WILL require doing outside reading, obtaining sources from the library, and CITING all factual
information obtained in a standard format. Do not include elaborate descriptions of what your
study entails. The last paragraph of the introduction may explain what you are doing in this study
(i.e. In this study we are investigating the activity of potato-derived catecholase as it is affected by
heat, organic and inorganic inhibitors... etc.).
The Methods section should only contain information on WHAT WAS DONE in this study. It
should not contain interpretation of methods, or any information on WHY you did what you did.
Wording should be succinct and should provide everything necessary to interpret the results. For
example, do not tell me about numbering and labeling test tubes - this is not essential for data
interpretation. You can cite material from your lab manual when appropriate (i.e. The potato extract
containing the catecholase enzyme was prepared as previously described (Biol. 105 staff, 2000)).
However, you need to include information such as tube contents, wavelength for spec readings, etc.,
because all of this is essential for interpreting data. Tube contents can be arranged in tables as was
done in the lab manual. Note that each table should have a title, and each figure should have a title
and a caption. I suggest breaking up the methods section into subsections for each experiment;
label each subsection with a bold-face title to allow the reader to easily navigate between sections.
The Results section should only include results from the experiments (not interpretation of those
results). Results are typically arranged in tables and graphs with a minimum of text. A narrative
description of the results is nonetheless required, even if it is mildly repetitive of the
information in figure captions. All figures, tables and graphs should include a title and a number
that you can use when discussing the data. Figures should also contain a caption that
explains what is going on in the figure (see lab manual intro). In the figure caption, refer the
reader to the appropriate section where information on methods can be found. No interpretation of
results should appear in this section of the report; simply state the results. I suggest breaking up
the results section into subsections for each experiment; label each subsection with a boldface title
to allow the reader to easily navigate between sections. Make sure that you label your axes on
graphs.
Regarding figures, the goal is to produce a figure that stands alone--that is, that a reader
could fully interpret without referring to or reading the text. Two major things that help accomplish
this are:
a) descriptive labels of lines on graph (i.e., labels that describe treatments, such as '50 °C' or '10
drops unboiled enzyme' as opposed to 'Tube 2')
b) caption needs to describe the experiment
The Discussion section is the most important part of the lab report. It is the most intellectually
challenging part of the report. This is the place where you interpret the data from each experiment.
Although it is fine to also break up the discussion into subheadings, make sure you also compare
data from different experiments when appropriate - i.e. it may be useful to directly compare the
results from section 1 and 2 together. Make sure that when you are discussing data, you refer the
reader to the appropriate figure/table containing the data being discussed (i.e. Although we saw that
enzyme activity increased with increasing temperature (Fig. 1), there were some interesting
anomalies that need to be discussed.) In the Discussion, be sure to address (1) the
meaning/interpretation of the results, i.e. if increasing substrate concentration results in increasing
reaction rate, what does that mean? Will you always expect that to be the case? What does this finding
tell you? (2) whether the results agree with expectations, and if not, why; (3) if the results are
ambiguous, what you could do to modify the experiment or test new predictions; (4) if the results
do not agree with predictions, why not? What could have gone wrong? How would you fix it? It is
in the discussion section where you show me that you understood the point of the experiment. The
easiest way to organize this is:
- state what the expected results were, and why those results were expected
- state what actually happened
- state how the actual results differed from the expected
- state what the possible causes of the discrepancy are
- state what could be done to modify the experiment if you were to repeat it, and why
The Conclusion should restate the major findings in a few succinct sentences (i.e. We
found that enzyme activity was increased by... etc.).
Basic Guidelines
- Use exact, simple sentences. Use words with precision, clarity and economy.
- Read every sentence over and make sure that it has something to say and that it says what
you mean. When you finish your first draft, study each sentence to see if you can shorten
or even omit it. The final test (to be applied constantly, sentence by sentence) is whether the
meaning is simply and clearly stated.
- Use the first person, active voice whenever possible; avoid the passive voice.
Common problems
(that you are now forewarned about, and you will have points deducted for)
1. Do not use direct quotes. Direct quotes are almost never used in scientific writing, except
when they are being used to indict someone's mistaken ideas, or when someone has said something
so incredibly eloquently and with such great insight that it simply cannot be re-phrased. Neither of
these is appropriate for your report. Don't do it.
2. Make sure that subjects and verbs are in agreement.
3. Make sure that pronouns have a clear referent.
4. Make sure that your sentence structures follow the rules of grammar. Be particularly on
guard for awkward sentence constructions, sentence fragments, and comma splices.
5. Don't use the future tense ("we will...") in the introduction or abstract. You already did it by the
time you are writing the report. Use past tense for things that you did.
6. You are not writing a lab manual, and the format for reports is different from the format for lab
manuals. Don't write the methods in "lab manual" style.
7. The discussion should not be a restatement of results. It should interpret the results, and
attempt to mechanistically and physically explain the underlying basis of the results. It is not
sufficient to say "We hypothesized that at higher temperatures the reaction rate would decrease."
Why? What would physically occur at higher temperatures, and how would this
mechanistically affect the reaction rate?
8. It is not sufficient to say "Human error probably affected the results." This is meaningless unless
you go into detail about the specific errors that may have taken place. You should point out specific
things that could be done differently, that may have contributed to a discrepancy between your
expected and actual results.
9. Results might 'support' a particular hypothesis, or 'be consistent with' a model. Results don't
conclude things or infer things.
10. Know the difference between affect (verb) and effect (noun).
11. Data is the plural; datum is the singular.
12. Allele, gene, phenotype and genotype all have distinct meanings--know the meanings and do not
use these words interchangeably.
13. Use parallel construction.
Here is an example of a sentence that does not convey its intended meaning:
The purpose of this experiment was to determine the phenotypic ratios of the red eye allele, versus
that of the white eyed allele, based on an understanding of how they are transmitted to offspring.
Read it a few times. For starters, the word 'allele' is misused, the word 'they' has an unclear referent,
and it's unclear why it is red EYE but white EYED. More fundamentally, though, the sentence says
that we determined phenotypic ratios based on an understanding of how traits are transmitted--that
would be true if we made predictions using Punnett squares, but it isn't true of this experiment. We
experimentally determined phenotypic ratios by breeding flies of known genotype and observing
the distribution of phenotypes in the offspring.
Name: ___________________________________________ Grade ___________/30
Please attach this sheet to your report. Reports are due by 3 pm on 11/21/03.
Bio105 Principles of Biology
Drosophila Inheritance Experiments
Grading Sheet
Reports should contain: Title, Abstract (0.5 - 0.75 page), Discussion (1.5 pages), and
data tables (P, F1, F2, chi-squared). See handout for format.
Content
Title: Was the title brief but descriptive?   Yes   No
Did the Abstract:
Have an introductory sentence or two, explaining why the work was done?   Yes   No
Explain the experiment that was performed?   Yes   No
Describe what the results of the experiment were?   Yes   No
State the significance of the results?   Yes   No
Content: Did the Discussion:
Describe the expected results?   Yes   No
Describe how the actual results varied from the expected?   Yes   No
Discuss the causes of the difference between expected and actual results?   Yes   No
Correctly interpret the chi-squared value?   Yes   No
Discuss the possible significance of the results - i.e. the mode of inheritance?   Yes   No
Suggest ways that difficulties might be overcome if experiment was repeated?   Yes   No
Content: Did the Data Tables contain all the necessary information?   Yes   No
Style:
Misspellings   0   1   2   3+
Awkward sentence constructions   0   1   2   3+
Run-on sentences & Sentence fragments   0   1   2   3+
Subject/verb agreement, pronouns with unclear referent   0   1   2   3+
Misuse of technical terms (allele, gene, phenotype)   0   1   2   3+
Other _________________________   0   1   2   3+
Are the Data Tables neat and easy to interpret?   Yes   No