BIO105 Chi-Squared Analysis of Fruit Fly Data

Introduction:

Based on your understanding of how the red-eyed and white-eyed alleles are transmitted to offspring, you should be able to predict the phenotypic ratios that will be present in the progeny flies. However, even if your predictions are correct, you may observe slightly different results than expected, simply due to chance effects. For example, if you flip an evenly balanced coin 100 times, the predicted result is 50 heads and 50 tails, but you may well get 49 heads and 51 tails, simply due to chance. This result does not indicate that the coin was not evenly balanced.

We use a statistical test, called the chi-square test, to compare observed data with the data that we would expect if our hypothesis were true. How much can your observed results vary from your expected results without causing you to reject your hypothesis? Scientists have agreed on the following standard: if chance effects alone would produce a deviation from the predicted results as large as or larger than the one you observed in more than 5% of the times that you carry out the experiment, you do not need to reject your hypothesis.

The chi-square test always tests what scientists call the null hypothesis (H0), which states that there is no significant difference between the expected and the observed results.

The chi-squared value (χ²) is calculated by taking, for each category, the squared difference between the observed (o) and expected (e) data divided by the expected data, and summing over all possible categories (compute to 3 significant figures):

$$\chi^2 = \sum \frac{(o - e)^2}{e}$$

So you obtain some number known as the chi-square statistic, but what does it mean? To determine how well our data fit our expected results, we first need to determine the degrees of freedom (df). Degrees of freedom (df) refers to the number of phenotypic classes minus 1.
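As a rough illustration of the calculation described above, here is a minimal Python sketch that applies the chi-squared formula to the coin-flip example from the introduction (49 heads and 51 tails observed, 50 and 50 expected). The function name `chi_squared` and the variable names are made up for this example and are not part of the handout; the same steps can be done by hand or with a calculator.

```python
def chi_squared(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Coin-flip example from the introduction: 100 flips of a balanced coin.
observed = [49, 51]  # heads, tails actually counted
expected = [50, 50]  # heads, tails predicted by the hypothesis

statistic = chi_squared(observed, expected)
degrees_of_freedom = len(observed) - 1  # number of classes minus 1

# Report the statistic to 3 significant figures, as the handout asks.
print(f"chi-squared = {statistic:.3g}, df = {degrees_of_freedom}")
# prints: chi-squared = 0.04, df = 1
```

For fly data, the two lists would hold the observed and expected counts for each phenotypic class, in the same order.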
Then, we determine a relative standard to serve as the basis for rejecting the hypothesis. The relative standard we typically use in biology is p<0.05. The p value is the probability of seeing a deviation at least as large as the one observed if the null hypothesis (H0) is true. When we choose p<0.05 as our cutoff, we accept a less than 5% chance of error in stating that there is a difference when there is no significant difference.

Procedure:

Step 1: State the hypothesis being tested and determine what your expected phenotypic ratios (and corresponding numbers) are.

Step 2: Gather your data.

Step 3: Determine the expected numbers (not ratios) for each observational class based on the total amount of data you collected (i.e. how many red-eyed males, white-eyed males, etc. would you expect in the 93 flies you counted).

Step 4: As outlined above, you will then compare your expected results with your observed results. The difference between your observed results and your expected results will then be used to calculate a chi-squared statistic. Compute your chi-squared statistic to 3 significant figures.

Step 5: Use the chi-square distribution table to determine the significance of the value.
a. Determine the degrees of freedom (one less than the number of categories), and locate that row on the chart.
b. Locate the chi-squared value for your significance level (p=0.05 or less).
c. Compare this chi-squared value from the table with the one that you calculated in Step 4.

Step 6: State your conclusions in terms of your hypothesis.
a.) If your calculated χ² value is greater than the χ² value for your particular degrees of freedom and p value (0.05), then you reject your null hypothesis (i.e. you reject your null hypothesis of no difference between expected and observed results). Thus, you conclude that the observed numbers are significantly different from what was expected based on the theoretical expected distribution. These differences are unlikely to be due to chance events alone.
b.) If your calculated χ² value is less than the χ² value for your particular degrees of freedom and p value (0.05), then you fail to reject your null hypothesis. You conclude that the deviations you observed were due to chance events or sampling error. Although it is easier to say "accept" than "fail to reject," the latter is the lingo of statisticians.

Chi-squared Table of p values
Where p = Frequency of Chance Occurrence

Degrees of   |------ no reason to doubt hypothesis ------|   reason to doubt hyp. (Signif)
Freedom       70%     50%     30%     20%     10%           5%       1%
1             0.148   0.455   1.074   1.642   2.706         3.841    6.635
2             0.713   1.386   2.408   3.219   4.605         5.991    9.210
3             1.424   2.366   3.665   4.642   6.251         7.815   11.341

Graph of distribution of Chi-squared values

Y axis is the fraction of times that you would expect to observe that chi-squared value (for example, if you carried out your crosses 1000 times, and calculated a chi-squared value for each of the separate 1000 experiments, how often you would see a particular value).
X axis is chi-squared values, increasing from left to right.
Area under the curve = probability of observing that much variation due to chance alone, if your hypothesis is correct. The darkened area represents 5% of the total area under the curve. If your chi-squared value is so high that it is in the darkened area, you need to reject your hypothesis that your observed variation is due to chance alone.

Calculations for cross between parental white male and parental red female

Predicted phenotypic ratios for F1: ________________________
Total number of F1 flies counted: ___________________________
Expected # of F1 red males = 1/2 * total # = ____________ Actually observed _________
Expected # of F1 red females = 1/2 * total # = _________ Actually observed _________
For red males, what is the value of (observed - expected)² / expected? ______________
For red females, what is the value of (observed - expected)² / expected? ______________
Now, add up the two numbers that you have calculated.
The total is your chi-squared test statistic.
Chi-squared test statistic for F1 of cross 1: ________________________

Now you need to compare your chi-squared test statistic to the table on p. 2 of this handout, to determine how often you will see as much deviation as you observed, simply due to chance alone. Note that on this table there are three rows, which report different values based on different degrees of freedom. For our purposes, degrees of freedom is equal to the number of phenotypic classes minus one. For this cross, there are two phenotypic classes (red male and red female), so we have one degree of freedom.

P-value for F1 of cross one: __________________________
In your own words, state what this p-value indicates:

Predicted phenotypic ratios for F2: ___________________________________
Total number of F2 flies counted: ___________________________
Expected # of F2 red males = 1/4 * total # = ____________ Actually observed _________
Expected # of F2 white males = 1/4 * total # = _________ Actually observed _________
Expected # of F2 red females = 1/2 * total # = _________ Actually observed _________
For red males, what is the value of (observed - expected)² / expected? ______________
For white males, what is the value of (observed - expected)² / expected? ______________
For red females, what is the value of (observed - expected)² / expected? ______________
Now, add up the three numbers that you have calculated. The total is your chi-squared test statistic.
Chi-squared test statistic for F2 of cross 1: ________________________

Now you need to compare your chi-squared test statistic to the table on p. 2 of this handout, to determine how often you will see as much deviation as you observed, simply due to chance alone. Note that on this table there are three rows, which report different values based on different degrees of freedom. For our purposes, degrees of freedom is equal to the number of phenotypic classes minus one.
For this cross, there are three phenotypic classes (red male, white male, red female), so we have two degrees of freedom.

P-value for F2 of cross one: __________________________
In your own words, state what this p-value indicates:

Calculations for cross between parental red male and parental white female

Predicted phenotypic ratios for F1: ________________________________
Total number of F1 flies counted: ___________________________
Expected # of F1 red females = 1/2 * total # = ____________ Actually observed _________
Expected # of F1 white males = 1/2 * total # = _________ Actually observed _________
For red females, what is the value of (observed - expected)² / expected? ______________
For white males, what is the value of (observed - expected)² / expected? ______________
Now, add up the two numbers that you have calculated. The total is your chi-squared test statistic.
Chi-squared test statistic for F1 of cross 2: ________________________

For this cross, there are two phenotypic classes (red female and white male), so we have one degree of freedom.

P-value for F1 of cross two: __________________________
In your own words, state what this p-value indicates:

Predicted phenotypic ratios for F2: ________________________________
Total number of F2 flies counted: ___________________________
Expected # of F2 red males = 1/4 * total # = ____________ Actually observed _________
Expected # of F2 white males = 1/4 * total # = _________ Actually observed _________
Expected # of F2 red females = 1/4 * total # = _________ Actually observed _________
Expected # of F2 white females = 1/4 * total # = _______ Actually observed _________
For red males, what is the value of (observed - expected)² / expected? ______________
For white males, what is the value of (observed - expected)² / expected? ______________
For red females, what is the value of (observed - expected)² / expected? ______________
For white females, what is the value of (observed - expected)² / expected? ______________
Now, add up the four numbers that you have calculated. The total is your chi-squared test statistic.
Chi-squared test statistic for F2 of cross 2: ________________________

For this cross, there are four phenotypic classes (red male, white male, red female, white female), so we have three degrees of freedom.

P-value for F2 of cross two: __________________________
In your own words, state what this p-value indicates:

Bio 105 Lab Reports

Objective: The two main objectives of this lab report are to: a) acquaint you with the specific sections that make up a formal scientific report, in particular what is and is not contained in each section; and b) give you the chance to have your scientific writing critiqued and corrected. In light of the first objective, please pay particular attention to the detailed description of the components of the lab report that can be found below.

Format
1. See pp. 4-6 in your lab manual for information on each required section.
2. All reports must be typed (12 pt Times font preferred).
3. Use double line spacing to facilitate grading.
4. Margins - 1" on all sides.
5. Paragraphs should be indented 1 tab, with no extra spacing between paragraphs.
6. Body of text should be left justified.
7. All pages (except title page) should be numbered consecutively.
8. Include boldface (Times 14 font), page-centered headers for new sections (e.g. Abstract).
9. Title and Abstract should be alone on their own page.
11. Include bold subheadings when appropriate in methods, results and discussion sections (see journals for examples).
12. All figures and graphs should be prepared on the computer; each must contain a title, labels, and a caption. Colored inks are fine for figures. Hand-drawn graphs are acceptable provided they are neat and legible. DO NOT PREPARE GRAPHS OF % TRANSMITTANCE VS. INDEPENDENT VARIABLE.
All %T must be converted to absorbance (A) prior to graphing.

General advice - Outlines are the most effective way to organize the information you will be presenting. I strongly suggest producing a detailed outline as your first step; this will make the actual writing easier since you have a framework from which to work. It will also produce a sequential, ordered report. When you begin writing, do not concentrate too much on economic word use, sentence variety, punctuation, organization, etc. The initial writing effort should be more free-flowing - get your thoughts out on paper in a brainstorming fashion. Once the information is in front of you, it will be much easier to form it into a draft, and then into a coherent written work.

A Title should briefly but descriptively identify the content of the article. 'Experiments with Plants' is a vague and meaningless title, unless it is the title of a comprehensive lab manual addressing botany. Compare that title with 'The inheritance of flower and seed characteristics in Pisum sativum'.

An Abstract should contain the following elements: 1-2 sentences on the big picture importance of your experimental subject, 1-3 sentences describing exactly (i.e. no generalities) what was investigated in your study, ~2-3 sentences on specific results, 1-2+ sentences on how your results agree with predicted results, and a sentence describing the big picture significance of your results.

An Introduction should contain sufficient background necessary to interpret the study. This WILL require doing outside reading, obtaining sources from the library, and CITING all factual information obtained in a standard format. Do not include elaborate descriptions of what your study entails. The last paragraph of the introduction may explain what you are doing in this study (i.e. In this study we are investigating the activity of potato-derived catecholase as it is affected by heat, organic and inorganic inhibitors...etc...).
The Methods section should only contain information on WHAT WAS DONE in this study. It should not contain interpretation of methods, or any information on WHY you did what you did. Wording should be succinct and should provide everything necessary to interpret the results. For example, do not tell me about numbering and labeling test tubes - this is not essential for data interpretation. You can cite material from your lab manual when appropriate (i.e. The potato extract containing the catecholase enzyme was prepared as previously described (Biol. 105 staff, 2000)). However, you need to include information such as tube contents, wavelength for spec readings, etc., because all of this is essential for interpreting data. Tube contents can be arranged in tables as was done in the lab manual. Note that each table should have a title, and each figure should have a title and a caption. I suggest breaking up the methods section into subsections for each experiment; label each subsection with a boldface title to allow the reader to easily navigate between sections.

The Results section should only include results from the experiments (not interpretation of those results). Results are typically arranged in tables and graphs with a minimum of text. A narrative description of the results is nonetheless required, even if it is mildly repetitive of the information in figure captions. All figures, tables and graphs should include a title and a number that you can use when discussing the data. Figures should also contain a caption that explains what is going on in the figure (see lab manual intro). In the figure caption, refer the reader to the appropriate section where information on methods can be found. No interpretation of results should appear in this section of the report; simply state the results.
I suggest breaking up the results section into subsections for each experiment; label each subsection with a boldface title to allow the reader to easily navigate between sections. Make sure that you label your axes on graphs. Regarding figures, the goal is to produce a figure that stands alone, that is, one that a reader could fully interpret without referring to or reading the text. Two major things that help accomplish this are:
a) descriptive labels of lines on the graph (i.e., labels that describe treatments, such as '50 °C' or '10 drops unboiled enzyme' as opposed to 'Tube 2')
b) a caption that describes the experiment

The Discussion section is the most important part of the lab report. It is the most intellectually challenging part of the report. This is the place where you interpret the data from each experiment. Although it is fine to also break up the discussion into subheadings, make sure you also compare data from different experiments when appropriate - i.e. it may be useful to directly compare the results from sections 1 and 2. Make sure that when you are discussing data, you refer the reader to the appropriate figure/table containing the data being discussed (i.e. Although we saw that enzyme activity increased with increasing temperature (Fig. 1), there were some interesting anomalies that need to be discussed.) In the Discussion, be sure to address (1) the meaning/interpretation of the results, i.e. if increasing substrate concentration increases reaction rate, what does that mean? Will you always expect that to be the case? What does this finding tell you?; (2) whether the results agree with expectations, and if not, why not; (3) if the results are ambiguous, what you could do to modify the experiment or test new predictions; (4) if the results do not agree with predictions, why not? What could have gone wrong? How would you fix it? It is in the discussion section where you show me that you understood the point of the experiment.
The easiest way to organize this is:
- state what the expected results were, and why those results were expected
- state what actually happened
- state how the actual results differed from the expected
- state what the possible causes of the discrepancy are
- state what could be done to modify the experiment if you were to repeat it, and why

The Conclusion should restate the major findings in a few succinct sentences (i.e. We found that enzyme activity was increased by...... etc.)

Basic Guidelines
- Use exact, simple sentences. Use words with precision, clarity and economy.
- Read every sentence over and make sure that it has something to say and that it says what you mean. When you finish your first draft, study each sentence to see if you can shorten or even omit it. The final test (to be applied constantly, sentence by sentence) is whether the meaning is simply and clearly stated.
- Use the first person, active voice whenever possible; avoid the passive voice.

Common problems (that you are now forewarned about, and you will have points deducted for)
1. Do not use direct quotes. Direct quotes are almost never used in scientific writing, except when they are being used to indict someone's mistaken ideas, or when someone has said something so incredibly eloquently and with such great insight that it simply cannot be re-phrased. Neither of these is appropriate for your report. Don't do it.
2. Make sure that subjects and verbs are in agreement.
3. Make sure that pronouns have a clear referent.
4. Make sure that your sentence structures follow the rules of grammar. Be particularly on guard for awkward sentence constructions, sentence fragments, and comma splices.
5. Don't use the future tense ("we will...") in the introduction or abstract. You already did it by the time you are writing the report. Use past tense for things that you did.
6. You are not writing a lab manual, and the format for reports is different from the format for lab manuals.
Don't write the methods in 'lab manual' style.
7. The discussion should not be a restatement of results. It should interpret the results, and attempt to mechanistically and physically explain the underlying basis of the results. It is not sufficient to say "We hypothesized that at higher temperatures the reaction rate would decrease." Why? What would physically occur at higher temperatures and how would this mechanistically affect the reaction rate?
8. It is not sufficient to say "Human error probably affected the results." This is meaningless unless you go into detail about the specific errors that may have taken place. You should point out specific things that could be done differently, and that may have contributed to a discrepancy between your expected and actual results.
9. Results might 'support' a particular hypothesis, or 'be consistent with' a model. Results don't conclude things or infer things.
10. Know the difference between affect (verb) and effect (noun).
11. 'Data' is the plural; 'datum' is the singular.
12. Allele, gene, phenotype and genotype all have distinct meanings. Know the meanings and do not use these words interchangeably.
13. Use parallel construction.

Here is an example of a sentence that does not convey its intended meaning:

The purpose of this experiment was to determine the phenotypic ratios of the red eye allele, versus that of the white eyed allele, based on an understanding of how they are transmitted to offspring.

Read it a few times. For starters, the word 'allele' is misused, the word 'they' has an unclear referent, and it's unclear why it is red EYE but white EYED. More fundamentally, though, the sentence says that we determined phenotypic ratios based on an understanding of how traits are transmitted. That would be true if we made predictions using Punnett squares, but it isn't true of this experiment.
We experimentally determined phenotypic ratios by breeding flies of known genotype and observing the distribution of phenotypes in the offspring.

Name: ___________________________________________ Grade ___________/30
Please attach this sheet to your report. Reports are due by 3 pm on 11/21/03.

Bio105 Principles of Biology
Drosophila Inheritance Experiments Grading Sheet

Reports should contain: Title, Abstract (0.5 - 0.75 page), Discussion (1.5 pages), and data tables (P, F1, F2, chi-squared). See handout for format.

Content
Title: Was the title brief but descriptive?   Yes   No

Did the Abstract:
Have an introductory sentence or two, explaining why the work was done?   Yes   No
Explain the experiment that was performed?   Yes   No
Describe what the results of the experiment were?   Yes   No
State the significance of the results?   Yes   No

Content: Did the Discussion:
Describe the expected results?   Yes   No
Describe how the actual results varied from the expected?   Yes   No
Discuss the causes of the difference between expected and actual results?   Yes   No
Correctly interpret the chi-squared value?   Yes   No
Discuss the possible significance of the results - i.e. the mode of inheritance?   Yes   No
Suggest ways that difficulties might be overcome if the experiment was repeated?   Yes   No

Content: Did the Data Tables contain all the necessary information?   Yes   No

Style:
Misspellings   0   1   2   3+
Awkward sentence constructions   0   1   2   3+
Run-on sentences & sentence fragments   0   1   2   3+
Subject/verb agreement, pronouns with unclear referent   0   1   2   3+
Misuse of technical terms (allele, gene, phenotype)   0   1   2   3+
Other _________________________   0   1   2   3+

Are the Data Tables neat and easy to interpret?   Yes   No