
Key Concepts of the Scientific Method

There are several important aspects to research methodology. This is a summary of the
key concepts in scientific research and an attempt to dispel some common misconceptions
in science.
The steps of the scientific method are shaped like an hourglass: starting from general questions,
narrowing down to focus on one specific aspect, and designing research where we
can observe and analyze this aspect. Finally, we draw conclusions and generalize back to the real world.
Formulating a Research Problem
Researchers organize their research by formulating and defining a research problem. This
helps them focus the research process so that they can draw conclusions reflecting the real
world in the best possible way.
In research, a hypothesis is a suggested explanation of a phenomenon.
A null hypothesis is a hypothesis which a researcher tries to disprove. Normally, the null
hypothesis represents the current view/explanation of an aspect of the world that the
researcher wants to challenge.
Examples of the Null Hypothesis
A researcher may postulate a hypothesis:
H1: Tomato plants exhibit a higher rate of growth when planted in compost rather than in soil.
And a null hypothesis:
H0: Tomato plants do not exhibit a higher rate of growth when planted in compost rather
than soil.
It is important to carefully select the wording of the null, and ensure that it is as specific as
possible. For example, the researcher might postulate a null hypothesis:
H0: Tomato plants show no difference in growth rates when planted in compost rather
than soil.
There is a major flaw with this H0. If the plants actually grow more slowly in compost than in
soil, an impasse is reached. H1 is not supported, but neither is H0, because there is a
difference in growth rates.
If the null is rejected, with no alternative, the experiment may be invalid. This is the reason
why science uses a battery of deductive and inductive processes to ensure that there are no
flaws in the hypotheses.
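The wording issue above maps directly onto how a significance test is run: a directional H1 pairs with a one-sided test, while the flawed "no difference" H0 implies a two-sided test, which can reject H0 even when the effect runs in the opposite direction. A minimal sketch in Python; the growth figures are invented for illustration, and SciPy is assumed to be available:

```python
# Hypothetical tomato growth measurements (cm); values are made up.
from scipy import stats

compost = [20.1, 18.9, 21.4, 19.8, 20.7, 19.2]
soil = [21.5, 22.0, 20.9, 21.8, 22.3, 21.1]

# H1 is directional ("higher growth in compost"), so the matching H0
# ("not higher") calls for a one-sided test:
t_one, p_one_sided = stats.ttest_ind(compost, soil, alternative="greater")

# The flawed H0 ("no difference") corresponds to a two-sided test,
# which can reject H0 even when the compost plants grew *slower*:
t_two, p_two_sided = stats.ttest_ind(compost, soil)
```

With these made-up numbers the compost plants actually grow less, so the two-sided test rejects "no difference" while the one-sided test gives no support to H1, which is exactly the impasse described above.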
Many scientists neglect the null, assuming that it is merely the opposite of the alternative,
but it is good practice to spend a little time creating a sound hypothesis. It is not possible to
change any hypothesis retrospectively, including H0.
Examples of research questions and their null hypotheses:

Question: Are teens better at math than adults?
Null hypothesis: Age has no effect on mathematical ability.

Question: Does taking aspirin every day reduce the chance of having a heart attack?
Null hypothesis: Taking aspirin daily does not affect heart attack risk.

Question: Do teens use cell phones to access the internet more than adults?
Null hypothesis: Age has no effect on how cell phones are used for internet access.

Question: Do cats care about the color of their food?
Null hypothesis: Cats express no food preference based on color.

Question: Does chewing willow bark relieve pain?
Null hypothesis: There is no difference in pain relief after chewing willow bark versus taking a placebo.
Why Test a Null Hypothesis?
You may be wondering why you would want to test a hypothesis just to find it false. Why
not just test an alternate hypothesis and find it true? The short answer is that it is part of
the scientific method. In science, propositions are not explicitly "proven." Rather, science
uses math to determine the probability that a statement is true or false. It turns out it's
much easier to disprove a hypothesis than to positively prove one. Also, while the null
hypothesis may be simply stated, there's a good chance the alternate hypothesis is incorrect.
For example, if your null hypothesis is that plant growth is unaffected by duration of
sunlight, you could state the alternate hypothesis in several different ways. Some of these
statements might be incorrect. You could say plants are harmed by more than 12 hours of
sunlight or that plants need at least three hours of sunlight, etc. There are clear exceptions
to those alternate hypotheses, so if you test the wrong plants, you could reach the wrong
conclusion. The null hypothesis is a general statement that can be used to develop an
alternate hypothesis, which may or may not be correct.
Research methodology involves the researcher providing an alternative hypothesis,
a research hypothesis, as an alternate way to explain the phenomenon.
The researcher tests the hypothesis to disprove the null hypothesis, not because they favor
the research hypothesis, but because doing so means coming closer to finding an
answer to a specific problem. The research hypothesis is often based on observations that
evoke suspicion that the null hypothesis is not always correct.
In the Stanley Milgram Experiment, the null hypothesis was that personality determined
whether a person would hurt another person, while the research hypothesis was that the
role, instructions and orders were much more important in determining whether people
would hurt others.
A variable is something that changes according to different factors. Some
variables change easily, like stock-market prices, while others are almost
constant, like a person's name. Researchers often seek to measure variables.
A variable can be a number, a name, or anything whose value can change.
An example of a variable is temperature. Temperature varies according to other variables
and factors. You can measure different temperatures inside and outside. On a sunny day,
chances are that the temperature will be higher than on a cloudy one. Another thing that can
change the temperature is deliberate manipulation, like lighting a fire in the
fireplace.
In research, you typically define variables according to what you're measuring.
The independent variable is the variable which the researcher manipulates or selects (the
presumed cause), while the dependent variable is the effect (or assumed effect), dependent on the
independent variable. These variables are often stated in experimental research, in
a hypothesis, e.g. "what is the effect of personality on helping behavior?"
In exploratory research methodology, e.g. in some qualitative research, the independent
and the dependent variables might not be identified beforehand. They might not be stated
because the researcher does not yet have a clear idea of what is really going on.
Confounding variables are variables with a significant effect on the dependent variable that
the researcher failed to control or eliminate - sometimes because the researcher is not
aware of the effect of the confounding variable. The key is to identify possible confounding
variables and somehow try to eliminate or control them.
Operationalization means taking a fuzzy concept (a conceptual variable), such as 'helping
behavior', and measuring it through specific observations, e.g. how likely people are to help a
stranger with problems.
See also:
Conceptual Variables
Choosing the Research Method
The selection of the research method is crucial for what conclusions you can draw about a
phenomenon. It affects what you can say about the cause of, and the factors influencing, the
phenomenon.
It is also important to choose a research method which is within the limits of what the
researcher can do. Time, money, feasibility, ethics and availability to measure the
phenomenon correctly are examples of issues constraining the research.
Choosing the Measurement
Choosing the scientific measurements is also crucial for reaching the correct conclusion.
Some measurements might not reflect the real world, because they do not measure the
phenomenon as they should.
Significance Test
To test a hypothesis, quantitative research uses significance tests to determine which
hypothesis the data support.
The significance test can show whether the null hypothesis is more likely correct than the
research hypothesis. Research methodology in a number of areas like social sciences
depends heavily on significance tests.
A significance test may even drive the research process in a whole new direction, based on
the findings.
The t-test (also called Student's t-test) is one of many statistical significance tests. It
compares two sets of data that are assumed to be similar, to see whether they really are alike. The
t-test helps the researcher conclude whether a hypothesis is supported or not.
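As a sketch of what the t-test actually computes, here is the two-sample (pooled-variance) t statistic in plain Python. The data values are invented; a real analysis would also look up the p-value for the resulting statistic:

```python
# Student's two-sample t statistic, standard library only.
import math
import statistics as st

group_a = [5.1, 4.8, 5.6, 5.0, 4.9]   # hypothetical measurements
group_b = [5.9, 6.1, 5.7, 6.3, 5.8]

def t_statistic(a, b):
    """Two-sample t statistic with pooled (equal) variance."""
    na, nb = len(a), len(b)
    va, vb = st.variance(a), st.variance(b)            # sample variances
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))         # SE of the mean difference
    return (st.mean(a) - st.mean(b)) / se

t = t_statistic(group_a, group_b)
# A large absolute t suggests the two samples are unlikely to share the same mean.
```

The statistic is just the difference in means scaled by its standard error; the larger it is in absolute value, the less plausible the null hypothesis of equal means becomes.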
Drawing Conclusions
Drawing a conclusion is based on several factors of the research process, not just on
whether the researcher got the expected result. It has to be based on the validity and reliability of
the measurement, how well the measurement reflects the real world, and what else
could have affected the results.
The observations are often referred to as 'empirical evidence' and the logic/thinking leads to
the conclusions. Anyone should be able to check the observation and logic, to see if they
also reach the same conclusions.
Errors of the observations may stem from measurement-problems, misinterpretations,
unlikely random events etc.
A common error is to think that correlation implies a causal relationship. This is not
necessarily true.
Generalization is the extent to which the research and its conclusions apply to
the real world. Good research does not always reflect the real world, since we can
only measure a small portion of the population at a time.
Validity and Reliability
Validity refers to the degree to which the research reflects the given research problem, while
reliability refers to how consistent a set of measurements is.
Types of validity:
 External Validity
 Population Validity
 Ecological Validity
 Internal Validity
 Content Validity
 Face Validity
 Construct Validity
 Convergent and Discriminant Validity
 Test Validity
 Criterion Validity
 Concurrent Validity
 Predictive Validity
A definition of reliability may be "Yielding the same or compatible results in different clinical
experiments or statistical trials" (The Free Dictionary). Research methodology lacking
reliability cannot be trusted. Replication studies are a way to test reliability.
Types of Reliability:
 Test-Retest Reliability
 Interrater Reliability
 Internal Consistency Reliability
 Instrument Reliability
 Statistical Reliability
 Reproducibility
Both validity and reliability are important aspects of the research methodology to get better
explanations of the world.
Errors in Research
Logically, there are two types of errors when drawing conclusions in research:
A Type 1 error is when we accept the research hypothesis when the null hypothesis is in fact true.
A Type 2 error is when we reject the research hypothesis even though the null hypothesis is false.
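When the null hypothesis is true, a test run at a 5% significance level should commit a Type 1 error in roughly 5% of experiments. This can be checked with a quick simulation; the sketch below uses a simple z-test on simulated data, and every number in it is illustrative:

```python
# Simulating the Type 1 error rate: both groups come from the SAME
# distribution, so every rejection of the null is a Type 1 error.
import random
from statistics import NormalDist, mean

random.seed(0)
ALPHA = 0.05          # significance level
N, TRIALS = 30, 2000  # sample size per group, number of simulated experiments
rejections = 0

for _ in range(TRIALS):
    a = [random.gauss(0, 1) for _ in range(N)]
    b = [random.gauss(0, 1) for _ in range(N)]
    # z-test on the difference of means (population variance known to be 1).
    z = (mean(a) - mean(b)) / (2 / N) ** 0.5
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    if p < ALPHA:
        rejections += 1   # a Type 1 error

type1_rate = rejections / TRIALS   # should land close to 0.05
```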
Threats to Construct Validity
Hypothesis Guessing
This threat is when the subject guesses the intent of the test and consciously, or
subconsciously, alters their behavior.
Evaluation Apprehension
This particular threat is based upon the tendency of humans to act differently when under
pressure. Individual testing is notorious for bringing on an adrenalin rush, and this can
improve or hinder performance.
Researcher Expectancies and Bias
Researchers are only human and may give cues that influence the behavior of the subject.
Humans give cues through body language: subconsciously smiling when the subject
gives a correct answer, or frowning at an undesirable response, can all have an effect.
Poor Construct Definition
Construct validity is all about semantics and labeling. Defining a construct in too broad or
too narrow terms can invalidate the entire experiment.
For example, a researcher might try to use job satisfaction to define overall happiness. This
is too narrow, as somebody may love their job but have an unhappy life outside the
workplace. Equally, using general happiness to measure happiness at work is too broad.
Many people enjoy life but still hate their work!
Mislabeling is another common definition error: stating that you intend to measure
depression, when you actually measure anxiety, compromises the research.
The best way to avoid this particular threat is with good planning and seeking advice before
you start your research program.
Construct Confounding
This threat to construct validity occurs when other constructs mask the effects of the
measured construct.
For example, self-esteem is affected by self-confidence and self-worth. The effect of these
constructs needs to be incorporated into the research.
Interaction of Different Treatments
This particular threat is where more than one treatment influences the final outcome.
For example, a researcher tests an intensive counselling program as a way of helping
smokers give up cigarettes. At the end of the study, the results show that 64% of the
subjects successfully gave up.
Sadly, the researcher then finds that some of the subjects also used nicotine patches and
gum, or electronic cigarettes. The construct validity is now too low for the results to have
any meaning. Only good planning and monitoring of the subjects can prevent this.
Unreliable Scores
Variance in scores is a very easy trap to fall into.
For example, an educational researcher devises an intelligence test that provides excellent
results in the UK, and shows high construct validity.
However, when the test is used upon immigrant children, with English as a second language,
the scores are lower.
The test measures their language ability rather than intelligence.
Mono-Operation Bias
This threat involves the independent variable, and is a situation where a single manipulation
is used to influence a construct.
For example, a researcher may want to find out whether an anti-depression drug works.
They divide patients into two groups, one given the drug and a control given a placebo.
The problem is that this design is limited (e.g. by random sampling error); a more solid design
would use multiple groups given different doses.
The other option is to conduct a pre-study that calculates the optimum dose, an equally
acceptable way to preserve construct validity.
Mono-Method Bias
This threat to construct validity involves the dependent variable, and occurs when only a
single method of measurement is used.
For example, in an experiment to measure self-esteem, the researcher uses a single method
to determine the level of that construct, but then discovers that it actually measures self-confidence.
Using a variety of methods, such as questionnaires, self-rating, physiological tests, and
observation minimizes the chances of this particular threat affecting construct validity.
Internal validity is a crucial measure in quantitative studies, where it ensures that a
researcher's experiment design closely follows the principle of cause and effect.
Internal validity is relevant when our hypothesis describes a causal relationship. A study is
internally valid if the observed effect is actually due to the hypothesized cause. Internal validity is low when
there are many plausible alternative explanations for the measured effect.
The easy way to describe internal validity is the confidence that we can place in the cause-and-effect relationship in a study. The key question that you should ask in any experiment is:
"Could there be an alternative cause, or causes, that explain my observations and results?"
Looking at some extreme examples, a physics experiment into the effect of heat on the
conductivity of a metal has a high internal validity.
The researcher can eliminate almost all of the potential confounding variables and set up
strong controls to isolate other factors.
At the other end of the scale, a study into the correlation between income level and the
likelihood of smoking has a far lower internal validity.
A researcher may find that there is a link between low-income groups and smoking, but
cannot be certain that one causes the other.
Social status, profession, ethnicity, education, parental smoking, and exposure to targeted
advertising are all variables that may have an effect. They are difficult to eliminate, and
social research can be a statistical minefield for the unwary.
Internal Validity vs Construct Validity
For physical scientists, construct validity is rarely needed but, for social sciences and
psychology, construct validity is the very foundation of research.
Even more important is understanding the difference between construct validity and
internal validity, which can be a very fine distinction.
The subtle differences between the two are not always clear, but it is important to be able
to distinguish between the two, especially if you wish to be involved in the social sciences,
psychology and medicine.
Internal validity only shows that you have evidence to suggest that a program or study had
some effect on the observations and results.
Construct validity determines whether the program measured the intended attribute.
Internal validity says nothing about whether the results were what you expected, or
whether generalization is possible.
For example, imagine that some researchers wanted to investigate the effects of a
computer program against traditional classroom methods for teaching Greek.
The results showed that children using the computer program learned far more quickly, and
improved their grades significantly.
However, further investigation showed that the results were not due to the program itself,
but due to the Hawthorne Effect; the children using the computer program felt that they
had been singled out for special attention. As a result, they tried a little harder, instead of
staring out of the window.
This experiment still showed high internal validity, because the research manipulation had
an effect.
However, the study had low construct validity, because the cause was not correctly labeled.
The experiment ultimately measured the effects of increased attention, rather than the
intended merits of the computer program.
How to Maintain High Confidence in Internal Validity?
It is impossible to maintain 100% confidence in any experimental design, and there is always
the chance of error.
However, there are a number of tools that help a researcher to oversee internal validity and
establish causality.
Temporal Precedence
Temporal precedence is the single most important tool for determining the strength of a
cause and effect relationship. This is the process of establishing that the cause did indeed
happen before the effect, providing a solution to the chicken and egg problem.
To establish internal validity through temporal precedence, a researcher must establish
which variable came first.
One example could be an ecology study, establishing whether an increase in the population
of lemmings in a fjord in Norway is followed by an increase in the number of predators.
Lemmings show a very predictable population cycle, which steadily rises and falls over a 3- to
5-year period. Population estimates show that the number of lemmings rises due to an increase
in the abundance of food.
This trend is followed, a couple of months later, by an increase in the number of predators,
as more of their young survive. This seems to be a pretty clear example of temporal
precedence; the availability of food for the lemmings dictates numbers. In turn, this dictates
the population of predators.
Not so fast!
In fact, the predator/prey relationship is much more complex than this. Ecosystems rarely
contain simple linear relationships, and food availability is only one controlling factor.
Turning the whole thing around, an increase in the number of predators may also control
the lemming population. The predators may be so successful that the lemming population
plummets and the predators starve, through limiting their own food supply.
What if predators turn to an alternative food supply when the number of lemmings is low?
Lemmings, like many rodents, show lower breeding success during times of high population.
This really is a tough call, and the only answer is to study previous research. Internal validity
is possibly the single most important reason for conducting a strong and thorough literature
review.
Even with this, it is often difficult to show that cause happens before effect, a fact that
behavioral biologists and ecologists know only too well.
By contrast, the physics experiment is fairly easy: I heat the metal and conductivity
increases or decreases, providing a simpler view of cause and effect and high internal validity.
Covariation of the Cause and Effect
Covariation of the cause and effect is the process of establishing that there is a cause-and-effect
relationship between the variables. It establishes that the experiment or program
had some measurable effect, whatever that may be.
For example, in the study of Greek learning, the results showed that the group with the
computer package performed better than those without.
This can be summed up as:
If you use the program, there is an outcome.
Without the program, there is no outcome.
This does not need to be an either/or relationship and it could be:
More of the program equals more of the outcome.
Less of the program equals less of the outcome.
This seems pretty obvious, but you have to remember the basic rule of internal validity.
Covariation of the cause and effect cannot explain what causes the effect, or establish
whether it is due to the expected manipulated variable or to a confounding variable.
It does, however, strengthen the internal validity of the study.
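The dose-response pattern above ("more of the program equals more of the outcome") shows up as a positive correlation between dose and outcome. A minimal sketch, with invented hours and scores:

```python
# Covariation as correlation: dose (hours with the program) against outcome
# (test scores). All data values are made up for illustration.
from statistics import mean, stdev

hours_with_program = [0, 1, 2, 3, 4, 5, 6]
test_scores = [52, 55, 61, 60, 68, 71, 75]

def pearson_r(x, y):
    """Pearson correlation coefficient from standardized products."""
    mx, my, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
    n = len(x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / ((n - 1) * sx * sy)

r = pearson_r(hours_with_program, test_scores)
# r near +1: the cause and effect covary. As the text notes, this alone
# cannot rule out a confounding variable driving both.
```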
Establishing Causality Through a Process of Elimination
Establishing causality through elimination is the easiest way to prove that an experiment has
high internal validity.
As with the lemming example, there could be many other plausible explanations for the
apparent causal link between prey and predator.
Researchers often refer to any such confounding variable as the 'Missing Variable,' an
unknown factor that may underpin the apparent relationship.
The problem is, as the name suggests, that the variable is missing, and trying to find it is
almost impossible. The only way to nullify it is through strong experimental design,
eliminating confounding variables and ensuring that they cannot have any influence.
Randomization, control groups and repeat experiments are the best way to eliminate
these variables and maintain high validity.
In the lemming example, researchers use a whole series of experiments, measuring
predation rates, alternative food sources and lemming breeding rates, attempting to
establish a baseline.
Just to leave you with an example of how difficult measuring internal validity can be:
In the experiment where researchers compared a computer program for teaching Greek
against traditional methods, there are a number of threats to internal validity.
 The group with computers feels special, so they try harder, the Hawthorne Effect.
 The group without computers becomes jealous, and tries harder to prove that they
should have been given the chance to use the shiny new technology.
 Alternatively, the group without computers is demoralized and their performance drops.
 Parents of the children in the computerless group feel that their children are missing
out, and complain that all children should be given the opportunity.
 The children talk outside school and compare notes, muddying the water.
 The teachers feel sorry for the children without the program and attempt to
compensate, helping the children more than normal.
We are not trying to depress you with these complications, only illustrate how complex
internal validity can be.
In fact, perfect internal validity is an unattainable ideal, but any research design must strive
towards that perfection.
For those of you wondering whether you picked the right course, don't worry. Designing
experiments with good internal validity is a matter of experience, and becomes much easier
over time.
A study is externally valid if the hypothesized
relationship, supported by our findings, also holds in other settings and other groups.
External validity is one of the most difficult of the validity types to achieve, yet it is at the
foundation of every good experimental design.
Many scientific disciplines, especially the social sciences, face a long battle to prove that
their findings represent the wider population in real world situations.
The main criterion of external validity is the process of generalization:
whether results obtained from a small sample group, often in laboratory surroundings, can
be extended to make predictions about the entire population.
The reality is that if a research program has poor external validity, the results will not be
taken seriously, so any research design must justify sampling and selection methods.
What is External Validity?
In 1966, Campbell and Stanley proposed the commonly accepted definition of external validity:
“External validity asks the question of generalizability: To what populations, settings,
treatment variables and measurement variables can this effect be generalized?”
External validity is usually split into two distinct types, population validity and ecological
validity, and both are essential elements in judging the strength of an experimental design.
Psychology and External Validity
The Battle Lines are Drawn
External validity often causes a little friction between clinical psychologists and research psychologists.
Clinical psychologists often believe that research psychologists spend all of their time in
laboratories, testing mice and humans in conditions that bear little resemblance to the
outside world. They claim that the data produced has no external validity, and does not take
into account the sheer complexity and individuality of the human mind.
Before we are flamed by irate research psychologists, the truth lies somewhere between the
two extremes! Research psychologists identify trends and generate
sweeping generalizations that predict the behavior of groups. Clinical psychologists end up
picking up the pieces, studying the individuals who lie outside the predictions, hence the friction.
In most cases, research psychology has very high population validity, because researchers
meticulously select groups at random and use large sample sizes, allowing meaningful
statistical analysis.
However, the artificial nature of research psychology means that ecological validity is
usually low.
Clinical psychologists, on the other hand, often use focused case studies, which cause
minimum disruption to the subject and have strong ecological validity. However, the small
sample sizes mean that the population validity is often low.
Ideally, using both approaches provides useful generalizations, over time!
Randomization in External Validity and Internal Validity
It is also important to distinguish between external and internal validity, especially with the
process of randomization, which is easily misinterpreted. Random selection is an important
tenet of external validity.
For example, a research design, which involves sending out survey questionnaires to
students picked at random, displays more external validity than one where the
questionnaires are given to friends. This is randomization to improve external validity.
Once you have a representative sample, high internal validity involves randomly assigning
subjects to groups, rather than using pre-determined selection factors.
With the student example, randomly assigning the students into test groups, rather than
picking pre-determined groups based upon degree type, gender, or age strengthens the
internal validity.
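The random-assignment step described above can be sketched in a few lines; the student names and group sizes here are hypothetical:

```python
# Random assignment: shuffle the sample, then split it into groups, rather
# than grouping by a pre-determined factor like degree type, gender, or age.
import random

random.seed(42)  # seeded only so the illustration is reproducible
students = [f"student_{i:02d}" for i in range(20)]  # hypothetical sample

random.shuffle(students)
treatment, control = students[:10], students[10:]
# Every student had the same chance of landing in either group, which guards
# internal validity against selection effects.
```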
Population validity is a type of external validity which describes how well the
sample used can be extrapolated to a population as a whole.
It evaluates whether the sample population represents the entire population, and also
whether the sampling method is acceptable.
For example, an educational study that looked at a single school could not be generalized to
cover children at every US school.
On the other hand, a federally appointed study that tests every pupil of a certain age
group will have exceptionally strong population validity.
Threats to Validity
1. History - Events that may occur during the time frame of the study which are not actually
part of the study. They produce effects that influence the results of the study, either
increasing or decreasing the expected results.
2. Selection - Occurs when respondents of the study are chosen as a group rather than
individually.
3. Testing - Refers to a pre-test that results in improved performance on the post-test. To
avoid this threat, the pre-test may be omitted; if a pre-test is given, an alternate form of
the instrument is recommended for the post-test.
4. Instrumentation - Refers to unreliability in measuring instruments, which may result in an
invalid measurement of performance. A change of instrument between the pre-test and
post-test may produce an effect not caused by the treatment.
5. Maturation - Refers to the physiological and psychological changes that may happen to
the respondents of the study over a period of time. If the time frame of a training program
is long and rigid, the participants may experience psychological discomfort due to
boredom, tiredness, hunger and the like.
6. Mortality - Refers to the loss of participants during the post-test stage, or during the time
frame of the study when the same group of individuals is studied over a long period. By the
time a follow-up study is conducted on the same group, some members may have dropped
out or may refuse to cooperate further in the study.
Due to time and cost constraints, most studies lie somewhere between these two extremes,
and researchers pay close attention to their sampling techniques.
Experienced scientists ensure that their sample groups are as representative as possible,
striving to use random selection rather than convenience sampling.
Ecological validity is a type of external validity which looks at the testing
environment and determines how much it influences behavior.
In the school test example, if the pupils are used to regular testing, then the ecological
validity is high because the testing process is unlikely to affect behavior.
On the other hand, taking each child out of class and testing them individually, in an isolated
room, will dramatically lower ecological validity. The child may be nervous, ill at ease and is
unlikely to perform in the same way as they would in a classroom.
Generalization becomes difficult, as the experiment does not resemble the real-world situation.