Chapter 3 & Section 2.6
Dates: August 24, 26, 31, 2009

Course outline

Statistics is the science of collecting, organizing, and interpreting information, which we call data, with the goal of gaining an understanding from that data.

This is NOT a math class. This is a critical thinking class. My goal is to give you some statistical tools and principles that will help you make wise and educated decisions at work and in life.

This course is divided into 2 parts:
1. Gathering and working with data (graphing, summarizing, designing studies to gather data).
2. Establishing relationships and drawing conclusions from the data (statistical inference).

Gathering of data: How do you get data?

First, decide what your population is. The population is the entire group of individuals/units that we want information about. In a census you attempt to get information from every member of the population, i.e., a 100% sample. Populations are usually very large, and a 100% sample would be expensive and time consuming, so we usually take a sample instead. However, the method of sampling is critical to getting good, unbiased information.

A sample is a part of the population that we actually examine in order to gather information about the whole population.

Lecture 1, Chapter 3 & Section 2.6 Page 1

SAMPLING, 3.2

Design of a sample: the method used to choose the sample from the population.

There are several types of samples. One that you encounter often is a Voluntary Response Sample. A voluntary response sample:
- consists of people who choose themselves by responding to a general appeal.
- is biased because people with strong opinions (especially negative opinions) are most likely to respond.
- examples include radio or call-in shows and American Idol.

A voluntary response sample is not the best type of sample because the data obtained will probably not be representative of the entire population and will probably be biased.
Random Selection of a Sample:
- eliminates or minimizes bias by allowing impersonal chance to do the choosing of individuals for the sample.
- gives all units/individuals in the population an equal chance to be chosen.

Types of Random Sampling:

Simple Random Sample (SRS): consists of n individuals selected from the population in such a way that every sample of n individuals has an equal chance of being selected. This also gives each individual in the population an equal chance of being selected.

Stratified Random Sample: is obtained in steps. First divide the population into groups of similar units/individuals, called strata. Then an SRS is selected within each stratum. Then the samples are combined to form the full sample.

Multistage Sample: A method in which the sampling is done in stages, selecting successively smaller groups within the population and resulting in a sample consisting of clusters of individuals. Each stage may employ an SRS, a stratified random sample, or another type of sample. This is an effort to make sure you are not under-covering any groups when you choose your sample. The example on page 252 of your textbook shows how an opinion poll uses multistage sampling by first dividing the United States into 2007 geographical areas, selecting a portion of these, then dividing the selected geographical areas into smaller areas, selecting a portion of these from each, and then finally dividing the smaller areas into neighborhoods of four nearby units and randomly selecting the neighborhoods to cover with the opinion poll.

Capture-recapture sample: This type of sampling is done to estimate the size of the population in wildlife studies. The “capture” phase refers to capturing, tagging and releasing a certain number of animals (birds, for example). The “recapture” phase takes place later, when another sample of birds is caught and a count is taken of the number that have tags from the first capture.
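The capture-recapture idea can be sketched as a short calculation. This is only an illustration, assuming the tagged fraction in the recapture sample matches the tagged fraction in the whole population; the function name `capture_recapture_estimate` and its parameter names are made up for this sketch and do not come from the textbook or SPSS.

```python
def capture_recapture_estimate(tagged_first, second_sample_size, tagged_in_second):
    """Estimate the population size N.

    Assumes tagged_in_second / second_sample_size is about tagged_first / N,
    so N is about tagged_first * second_sample_size / tagged_in_second.
    """
    if tagged_in_second <= 0:
        raise ValueError("need at least one tagged animal in the recapture sample")
    return tagged_first * second_sample_size / tagged_in_second


# Bird numbers used in these notes: 200 tagged in the first capture,
# then 12 of the 120 birds in the second sample carry tags.
estimate = capture_recapture_estimate(200, 120, 12)
print(estimate)  # 2000.0
```

The estimate is only as good as the assumption behind it: if tagged birds are easier (or harder) to catch the second time, the tagged fraction in the second sample will not match the population.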
As an example, suppose 200 birds are captured, tagged and released. The next year another sample of 120 birds is captured, and 12 of them have the bands from the previous year. The proportion banded in the sample should then be an estimator of the proportion banded in the population. So:

12/120 = 200/N, where N = the population size.

Solving gives N = 200 × 120 / 12 = 2000 (estimated value).

Examples: Which type of sample is used for each of the following scenarios?

1. A study is conducted to find out how many undergraduates at Purdue own cars. It is known prior to the study that seniors are more likely to own cars than freshmen. The student population at Purdue is divided into freshmen, sophomores, juniors, and seniors, and a random sample of 200 students is selected from each group.

2. The government wanted to gather some information on unemployment. They randomly selected 5 of the 50 states. From the 5 selected states they randomly selected 3 counties to participate in the study. They then randomly selected 10 individuals from each of the counties to fill out their questionnaire.

3. Ann Landers asked people to send her a response to the following question: “Do you have children? If so, would you still have children knowing what you know now?”

4. Ashley wanted to determine the average height of Purdue women students. She did not have the time to measure all Purdue women students’ heights, so she randomly selected 50 Purdue women students, measured each student’s height, and averaged the 50 heights.

How do you select the units in the sample?

You can use SPSS or the random number table in the back of the book (Table B). Which way is MORE random? Both methods are equally random.

Example: A club has 12 members. They are:

Gundlach, Remke, Howell, Brenneman, Xu, Reeger, Cline, Mehta, Tuzov, Daye, Zheng, Kuiper

Use the random number table (Table B) starting at line 130 to take a SRS of 4 members.
Table B starting at line 130:

130  69051 64817 87174 09517 84534 06489 87201 97245
131  05007 16632 81194 14873 04197 85576 45195 96565
132  68732 55259 84292 08796 43165 93739 31685 97150
133  45740 41807 65561 33302 07051 93623 18132 09547

Strategy: Number the names in order from 01 through 12. Then look at the random digits from Table B and draw a line under every 2 digits. The first 4 unique (not repeated) 2-digit combinations that are between 01 and 12 are your sample.

Use SPSS to take a SRS of 4 members.

Enter all the names in one column. Click on the column and then click Data > Select Cases > Random sample of cases > Sample > Exactly 4 cases from the first 12 cases > Continue > OK. You will see a “1” by exactly 4 of the 12 names. These are the selected members for your sample. The other 8 names will have a 0, meaning they are not selected.

From the Data Editor page:

Gundlach    1
Remke       0
Howell      0
Brenneman   1
Xu          1
Reeger      0
Cline       0
Mehta       0
Tuzov       1
Daye        0
Zheng       0
Kuiper      0

Problems with sampling:
- Undercoverage: occurs when some groups in the population are left out of the process of choosing the sample.
- Nonresponse: occurs when an individual chosen for the sample can’t be contacted or does not cooperate.
- Response Bias: occurs when the behavior of the respondent or interviewer changes the sample results (examples: the respondent lying or having a faulty memory, the race or sex of the interviewer influencing the respondent, poor interviewing technique, wording of questions).

To see how good a survey actually is, you should look for:
- Sampling design
- Wording of questions posed
- Amount of nonresponse
- Date of the survey

Examples: (Problem 3.55, p. 260) Comment on each of the following as a sample design or a potential sample survey question. Is there any source of bias? What type of bias?
a) A survey used the following question: “Do you agree that a national system of health insurance should be favored because it would provide health insurance for everyone and reduce administrative costs?”

b) Alex wanted to find out people’s opinions regarding Greater Lafayette Health Services’ desire to build a new hospital. Consequently, he took a simple random sample of 500 Lafayette and West Lafayette residents listed in the phone book. He is concerned, however, that those not listed in the phone book may have different views.

c) When Alex attempted to collect data from those who made it into his sample, he was unable to contact some of them, and others refused to answer his survey questions.

SOURCES OF DATA:

Anecdotal evidence consists of data based on individual cases, which often come to our attention because they are striking in some way (“News of the Weird” or a “Dateline” lead story). These cases will probably not be representative of the population. The sample size is small, perhaps only a single case. Do not draw conclusions from anecdotal evidence. It is NOT good science!

Available data are data that were produced in the past for some other purpose but that may help answer a present question. Many times they are quite good and useful data. Examples: libraries, Internet websites.

An observational study observes units or individuals, usually a sample of all units, and measures variables of interest but does not attempt to influence the responses. We let nature take its course and observe the response. We do not manipulate the units in any way.

A designed experiment is a procedure in which you deliberately impose some treatment on individuals in order to measure their responses. In an experiment, we are always interested in the influence of one or more variables or factors on the response. We always impose some type of treatment on the individuals.
Experiment versus an Observational Study

In an experiment a treatment is imposed on the individuals before a measurement is taken. The environment is manipulated in some way. If the experiment is carefully designed and all potential lurking variables are accounted for and controlled, conclusions regarding causation can be made. In some situations it is good to conduct more than one experiment before making any decisions regarding causation.

In an observational study, the environmental factors are not controlled or manipulated. A measurement is taken without a treatment being imposed on the individual. Possible lurking variables may exist. Consequently, numerous surveys need to be conducted to draw conclusions regarding causation.

Examples: Which of the following is an experiment and which is an observational study?

1. To determine whether a review session will improve his students’ test scores, a Stat 301 instructor divides his class into two groups. He then requires one group to attend a study session. He compares the test results of each group.

2. To determine whether a review session will improve his students’ test scores, a Stat 301 instructor announces a study session to be held the night before a test. The instructor lists the students who attended the session and compares their scores to the remaining Stat 301 students’ scores.

Design of Experiments, 3.1

Again, experiments deliberately impose some treatment on individuals in order to observe their response.

Vocabulary:
An experimental unit is the individual or unit on which the experiment is done. Often, these units are chosen randomly from the population of units. When the units are human beings, they are called subjects.
A treatment is a specific experimental condition applied to the units.
Factors are the explanatory variable(s) under study.
A factor level is a specific value of a factor.
The response variable is what is being measured on each unit/subject.

Examples: Identify the experimental units or subjects, treatments, factors, factor levels and response variable.

1. In a food technology study involving the storage of frozen strawberries, 10 pints were stored at each of 5 storage times. Storage times were randomly assigned to the pints. The ascorbic acid content of each pint was measured after storage.

Experimental units: Pints of strawberries
Factor: Storage time
Factor levels: Five different storage times
Treatments: Each of the five different storage times
Response variable: Ascorbic acid content measured after storage

2. A sports engineer is interested in determining the effects that speed and air pressure have on throwing distance for his new mechanical trainer football throwing machine. Two speeds (40 mph and 55 mph) and three air pressures (175 psi, 200 psi, 230 psi) were chosen for the study. Thirty footballs were obtained. Treatments were randomly assigned to footballs.

Experimental units: Thirty footballs
Factors: Speed and air pressure
Factor levels: Speed at 40 or 55 mph; pressure at 175, 200 or 230 psi
Treatments: Six different combinations of speed and pressure (2 speeds × 3 pressures), with five trials in each treatment group
Response variable: Throwing distance for each of the 30 trials

Advantages of Experiments:
- In principle, experiments can give good evidence for causation.
- Experiments allow us to study the specific factors we are interested in, while controlling the effects of lurking variables.
- Experiments allow us to study the combined effects of several factors, and possibly detect interactions between factors.

Difficulties which may arise:
In the simplest designed experiment we would apply a single treatment and observe the response.
This is OK in very controlled situations, but you may miss lurking variables, especially if you are using living subjects.
- You may encounter the placebo effect: a patient responds favorably to being treated, not to the treatment itself (your mind tricks you into getting better even though the medicine has no effect). A control group helps to determine whether a treatment is effective.
- Bias: the study systematically favors certain outcomes.
- Lack of realism: if the subjects know they’re in an experiment, they might not behave naturally during the treatment.

How can we make an experiment objective and fair? (The 3 principles of experimental design.)

Control: To help detect the placebo effect, use a control group: a group of patients who receive a sham treatment (sugar pills instead of the medicine). Double-blind is best because then neither the subjects nor the experimenter know who is in the treatment group and who is in the control group until the experiment is completely finished. (This avoids unconscious bias by the experimenter.)

Randomization: Leave the assignment of the individuals to the treatment groups solely to chance. Do not rely on the judgment of the experimenter in any way. This reduces or eliminates bias in the formation of the treatment groups.

Replication: Use as many individuals in each treatment group as your experimental budget will permit. This reduces the chance variation in the average response for each treatment.

What are our choices for type of experiment?

Completely randomized design: In this plan or method of randomization, the individuals are randomly assigned to the treatment groups without restriction.
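A completely randomized assignment can be sketched in a few lines of Python. This is a minimal sketch, not the Table B or SPSS procedure used in class: it assumes Python's standard `random` module as the source of chance, and the function name `completely_randomized` and the `seed` argument are made up for illustration. The names are the 12 club members from the sampling example.

```python
import random


def completely_randomized(units, treatments, seed=None):
    """Randomly split units into equal-sized groups, one group per treatment."""
    if len(units) % len(treatments) != 0:
        raise ValueError("units must divide evenly among the treatments")
    rng = random.Random(seed)   # a seed only makes the shuffle repeatable
    shuffled = list(units)      # copy so the caller's list is left untouched
    rng.shuffle(shuffled)       # chance alone decides the order
    size = len(shuffled) // len(treatments)
    return {
        treatment: shuffled[i * size:(i + 1) * size]
        for i, treatment in enumerate(treatments)
    }


members = ["Gundlach", "Remke", "Howell", "Brenneman", "Xu", "Reeger",
           "Cline", "Mehta", "Tuzov", "Daye", "Zheng", "Kuiper"]
groups = completely_randomized(members, ["Treatment 1", "Treatment 2"], seed=2009)
for treatment, assigned in groups.items():
    print(treatment, assigned)
```

Shuffling the whole list and then cutting it into equal pieces means every individual is placed by chance alone, with no restriction on who can end up in which group.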
[Diagram: Random assignment splits the subjects into Group 1 (Treatment 1) and Group 2 (Treatment 2); the results of the two groups are then measured.]

Block design: In a block design, the random assignment of the units to the treatments is carried out separately within each block, where a block represents a group of units that are known, before the experiment, to be similar in some way that will affect the response to the treatments. (Blocks can be of any size.)

[Diagram: The subjects are divided into Block 1 and Block 2. Within each block, random assignment splits the units into Group 1 (Treatment 1) and Group 2 (Treatment 2), and the results are measured.]

Matched pairs design: A matched pairs design compares just two treatments. We impose the two treatments on a pair of subjects/units. If we don’t have perfectly matched pairs, we choose blocks of two units that are as closely matched as possible. OR, each block may consist of just one subject, who gets both treatments, one after the other. We randomly assign the different treatments to each unit in the matched pair or, in the case of a single subject/unit acting as a pair, we randomly assign the order of treatments.

[Diagram, Example 1: Within each pair, random assignment gives Trt 1 to one unit and Trt 2 to the other unit; the difference in responses is compared.]

[Diagram, Example 2: Each unit receives both treatments in a random order (Trt 1 then Trt 2, or Trt 2 then Trt 1); the difference is measured for each unit.]

Example

Our 12 club members need to learn a new SPSS technique. An ITaP computer trainer thinks playing classical music in the background helps people to retain information better. Another ITaP computer trainer believes drinking coffee while training helps. Their boss decides to design an experiment to test out their theories. He will divide the club members into 4 groups and then give them a multiple choice test about the new SPSS technique.

a) What are the factors and their levels?
Classical music during training: Yes or No
Coffee during training: Yes or No

b) What are the treatments?
The four combinations of the two factors at two levels each. Three subjects will be assigned to each treatment.

c) What are the units/subjects?
The 12 club members taking computer training.

d) What is the response variable?
The test result of each individual after training.

e) Is the response variable categorical or quantitative?
Quantitative, probably.

f) Outline the design of the experiment. What type is it?
Completely randomized design.

g) Use Table B at line 133 to randomly assign the members to the treatments.

Gundlach, Remke, Howell, Brenneman, Xu, Reeger, Cline, Mehta, Tuzov, Daye, Zheng, Kuiper

Table B starting at line 133:

133  45740 41807 65561 33302 07051 93623 18132 09547
134  27816 78416 18329 21337 35213 37741 04312 68508
135  66925 55658 39100 78458 11206 19876 87151 31260
136  08421 44753 77377 28744 75592 08563 79140 92454

Strategy: Same as choosing a simple random sample (SRS), except we need to keep going until we have 3 members in each of the 4 treatment groups. The first three 2-digit combinations that are between 01 and 12 (and not repeats) are written down under “Group 1.” The next three 2-digit combinations that are between 01 and 12 (and not repeats) are written down under “Group 2.” Do the same thing for Group 3. Match names to the numbers. The remaining 3 names form Group 4.

Group 1     Group 2     Group 3     Group 4

SPSS will do this easily if you want to separate your data into just 2 groups. It will also handle 3 or more groups in a step-by-step way, but the more groups you need, the more complicated things get with SPSS.

Example 2: Twelve overweight females have agreed to participate in a study of the effectiveness of four reducing regimens, A, B, C, and D. The researcher first calculates how overweight each subject is by comparing the subject’s actual weight with her “ideal” weight. The response variable is the weight lost after eight weeks of treatment.
For this problem, we believe the initial amount overweight will influence the response variable, so a block design is appropriate for this study.

Arrange the subjects in order of increasing excess weight. Form three blocks by grouping the four least overweight, then the next four, and so on. Following are the subjects and their initial amount overweight:

Birnbaum    Brown      Brunk    Dixon    Moses    Ram
35          23         34       21       41       26

Hernandez   Jackson    Tran     Loren    Smith    Brennan
25          33         43       32       38       44

After forming the three blocks, use the random numbers below to assign the subjects to the four reducing regimens separately within each block.

19224 95034 05756 28713 96409 12531 42544 82853 73676 46150 30568 35098

Ethics in experiments: Section 2.6

The following three principles must be used when experiments involve human beings:
- Planned studies should be reviewed by a board to protect the subjects from harm.
- All subjects must give their informed consent before data are collected.
- All individual data must be kept confidential. Only summaries can be made public.

Bad examples (which principles were violated?):

Tuskegee Study (Quotation from the Report of the Tuskegee Syphilis Study Legacy Committee, May 20, 1996. A detailed history is James H. Jones, Bad Blood: The Tuskegee Syphilis Experiment, Free Press, 1993.)

In 1930, syphilis was common among black men in the rural South, a group that had almost no access to medical care. The Public Health Service Tuskegee study recruited 399 poor black sharecroppers with syphilis and 201 others without the disease in order to observe how syphilis progressed when no treatment was given. Beginning in 1943, penicillin became available to treat syphilis. The study subjects were not treated. In fact, the Public Health Service prevented any treatment until word leaked out and forced an end to the study in the 1970s.

Personal Space Study (R. D. Middlemist, E. S. Knowles, and C. F.
Matter, “Personal space invasions in the lavatory: suggestive evidence for arousal,” Journal of Personality and Social Psychology, 33 (1976), pp. 541-546.)

Psychologists observe that people have a “personal space” and get annoyed if others come too close to them. We don’t like strangers to sit at our table in a coffee shop if other tables are available, and we see people move apart in elevators if there is room to do so. Americans tend to require more personal space than people in most other cultures. Can violations of personal space have physical as well as emotional effects?

Investigators set up shop in a men’s public rest room. They blocked off urinals to force men walking in to use either a urinal next to an experimenter (treatment group) or a urinal separate from the experimenter (control group). Another experimenter, using a periscope from a toilet stall, measured how long the subject took to start urinating and how long he kept at it.

Tracking Americans Cradle-to-Grave by Katherine Haley Will (president of Gettysburg College) in the J&C 7/26/06

Does the federal government need to know whether you aced Aristotelian ethics but had to repeat introductory biology? Does it need to know your family’s financial profile, how much aid you received and whether you took a semester to help out at home?

The Secretary of Education’s Commission on the Future of Higher Education thinks so. … the commission called for creation of a tracking system to collect sensitive information about our nation’s college students… It is a mandatory federal registry of all American students throughout their collegiate careers: every course, every step, every misstep. Once established, it could easily be linked to existing K-12 and work force databases to create an unprecedented cradle-to-grave tracking of American citizens, all under the watchful eye of the federal government.
The commission calls our nation’s colleges and universities unaccountable, inefficient, and inaccessible. In response, it seeks to institute collection of personal information designed to quantify our students’ performance in college and in the workforce. But many of us are concerned about invading our students’ privacy by feeding confidential educational and personal data, linked to Social Security numbers, into a mandatory national database…

We already have efficient systems in place to collect educational statistics… Our existing systems meet the government’s need to inform public policy without intruding on student privacy because they report the data in aggregate form [gathered all together instead of as individual reports]…

This proposal is a violation of the right to privacy that Americans hold dear. It is against the law. Moreover, there is a mountain of data already out there that can help us understand higher education and its efficacy. And, finally, implementation of such a database, which at its inception would hold “unit” record data on 17 million students, would be an unfunded mandate on institutions and add greatly to the expense of education.

DATA ANALYSIS AFTER THE EXPERIMENT:

Now that you have carried out your experiment and you have the data from your experiment, how do you know if a treatment is effective?

The response variable will be averaged for each group, and the averages of all the treatments will be compared. Large differences in the treatment averages indicate that the treatments had an effect. An observed effect so large that it would rarely occur by chance alone is called statistically significant. This can be determined statistically.

Statistical Inference: use a fact about a sample to estimate the truth about the whole population.

Statistics vs Parameters: Suppose we have 48 members in our class and we wanted to find the average height of these students.
Because we may not have time to measure all the students, we may take a SRS of size 5 and measure the heights of the five students in the sample. We could then calculate the sample mean (average) for the five students, which would be a statistic. We would then use the statistic as an estimate of the average height for all 48 members of our class. The true average height of all 48 members of our class would be a parameter.

A parameter is a number that describes the population. A parameter is a fixed number, but in practice we do not know its value.

A statistic is a number that describes a sample. The value of a statistic is known when we have taken a sample, but it can change from sample to sample. We often use a statistic to estimate an unknown parameter.

Sampling variability is the variation in the values of a statistic that are generated by repeatedly selecting samples of the same size.

In our average height example, suppose we selected all possible simple random samples of size 5 from our population of 48 members of the class. (Note: That would be 1,712,304 samples, the number of ways to choose 5 students from 48.) We could then calculate the 1,712,304 sample means. If we constructed a relative frequency table for all these sample means, the corresponding relative frequency histogram would show the sampling distribution of the sample mean.

The sampling distribution of a statistic is the distribution of values taken by the statistic over all possible samples of equal size selected from the same population.

Properties of Sampling Distributions
1. The sampling distribution of a statistic can be generated by repeatedly sampling from the population, calculating the statistic and tabulating the values obtained.
2. All statistics have sampling distributions.
3. Sampling distributions are fundamental to statistical inference because the sampling distribution describes a regular, predictable pattern of behavior that emerges with repeated sampling.
4.
Sampling distributions provide us with information regarding the accuracy and precision of our statistic as an estimator of the corresponding parameter.

The accuracy of the statistic as an estimator of the corresponding parameter is related to how the center of the sampling distribution compares with the true value of the population parameter.

The precision of the statistic as an estimator of the corresponding parameter is related to the spread of the sampling distribution.

Bias and Variability

A statistic used to estimate a parameter is unbiased if the mean of the sampling distribution of the statistic is equal to the parameter it is estimating. We then say the statistic is an unbiased estimator of the parameter. To reduce bias, use random sampling.

The variability of a statistic is described by the spread of its sampling distribution. The less variability in the value of the statistic, the more precise the statistic is as an estimator of the parameter.

Sampling variability is controlled by:
1. the sampling design used to generate the sample.
2. the sample size.

To reduce the variability of a statistic from an SRS, use a larger sample.

Population Size Doesn’t Matter

The variability of a statistic from a random sample does not depend on the size of the population, as long as the population is at least 100 times larger than the sample.

Example: (Problem 3.58, p. 269) Voter registration records show that 68% of all voters in Indianapolis are registered as Republicans. To test a random digit dialing device, you use the device to call 150 randomly chosen residential telephones in Indianapolis. Of the registered voters contacted, 73% are registered Republicans. Are the boldface numbers parameters or statistics?

Example: (Problem 3.60, p. 269) A telemarketing firm in L.A. uses a device that dials residential telephone numbers in that city at random.
Of the first 100 numbers dialed, 43 are unlisted. This is not surprising, because 52% of all L.A. residential phones are unlisted. Are the boldface numbers parameters or statistics?

The Question of Causation

When we talked about observational studies vs. experiments, we said that an observational study can’t give good evidence towards causation. Well-designed experiments can help you with this.

In experiments, you usually try to show that the explanatory variable causes change in the response variable. You might have a strong association, but how do you prove causation?

If we go back to our sleep/weight gain story in the Journal and Courier, one study described in the story to support an association between sleep and weight gain was given as follows:

“In the study conducted by Dr. Shahrad Taheri and colleagues at Stanford University and the University of Wisconsin-Madison, the scientists examined the data from 1,024 volunteers in a long-term sleep study conducted at the Wisconsin campus. They examined the sleep logs kept by the subjects as well as the duration of their sleep during nights spent at a sleep lab. Analyzing blood samples taken from the subjects, the researchers found a clear pattern. Those who slept the least had the most ghrelin and the least leptin, and for those who slept the longest, vice versa. The scientists also found that the subjects with the least sleep had a larger body mass index.”

Causation:
- x and y are associated
- x causes y to change
Example: The amount of sleep a person gets in a night directly causes changes in hormone levels. OR A person’s hormone levels directly affect the number of hours of sleep a person gets per night.

Common response:
- x and y are associated
- z is really what causes both x and y to change
Example: The amount of daily exercise a person gets affects both sleep time and hormone levels.
Confounding:
- x is associated with y
- x is associated with z
- x and z both have effects on y
- it is impossible to separate which effects are from x alone or z alone
- x and z can be either explanatory or lurking variables
Example: The amount of daily exercise and the number of hours of sleep a person gets BOTH affect hormone levels.

Even a very strong association between 2 variables is not by itself good evidence that there is a cause-and-effect link between the variables. Association does not mean causation.
Example: Just because gray-haired people die at a higher rate than people with other hair colors doesn’t mean that the gray hair itself causes death.

Even when direct causation is present, it is rarely a complete explanation of an association between 2 variables.
Example: Do you think sleep is the only thing that determines whether you have a higher body mass index?

Even well-established causal relations may not generalize to other settings.
Example: Medicine that works for dogs might not work for people or cats. Doing well on the homework in this class may help you do better on the exams, but that might not be true in every class you take.

Big Overview of how to answer a research question:
1. Pick a question you want to answer.
2. Decide on your population.
3. Select a sample.
   - voluntary response (the only one not random)
   - simple random sample
   - stratified random sample
   - multistage sample
4. Observational study or experiment? If experiment, the choices are:
   - completely randomized design
   - block design
   - matched pairs
5. Collect the data. Make sure you follow ethical principles of experimentation.
6. Analyze the data. Don’t forget to look at graphs.
7. State your conclusions.