
Key Concepts of the Scientific Method

There are several important aspects to research methodology. This is a summary of the
key concepts in scientific research and an attempt to dispel some common misconceptions
in science.
The steps of the scientific method are shaped like an hourglass: starting from general questions,
narrowing down to focus on one specific aspect, and designing research where we
can observe and analyze this aspect. Finally, we draw conclusions and generalize back to the real world.
Formulating a Research Problem
Researchers organize their research by formulating and defining a research problem. This
helps them focus the research process so that they can draw conclusions reflecting the real
world in the best possible way.
In research, a hypothesis is a suggested explanation of a phenomenon.
A null hypothesis is a hypothesis which a researcher tries to disprove. Normally, the null
hypothesis represents the current view/explanation of an aspect of the world that the
researcher wants to challenge.
Examples of the Null Hypothesis
A researcher may postulate a hypothesis:
H1: Tomato plants exhibit a higher rate of growth when planted in compost rather than in soil.
And a null hypothesis:
H0: Tomato plants do not exhibit a higher rate of growth when planted in compost rather
than soil.
It is important to carefully select the wording of the null, and ensure that it is as specific as
possible. For example, the researcher might postulate a null hypothesis:
H0: Tomato plants show no difference in growth rates when planted in compost rather
than soil.
There is a major flaw with this H0. If the plants actually grow more slowly in compost than in
soil, an impasse is reached. H1 is not supported, but neither is H0, because there is a
difference in growth rates.
If the null is rejected, with no alternative, the experiment may be invalid. This is the reason
why science uses a battery of deductive and inductive processes to ensure that there are no
flaws in the hypotheses.
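The wording issue above maps directly onto how a significance test is run: a directional H1 pairs with a one-sided test, while the flawed "no difference" H0 implies a two-sided test, which can reject H0 even when the effect runs in the opposite direction. A minimal sketch in Python; the growth figures are invented for illustration, and SciPy is assumed to be available:

```python
# Hypothetical tomato growth measurements (cm); values are made up.
from scipy import stats

compost = [20.1, 18.9, 21.4, 19.8, 20.7, 19.2]
soil = [21.5, 22.0, 20.9, 21.8, 22.3, 21.1]

# H1 is directional ("higher growth in compost"), so the matching H0
# ("not higher") calls for a one-sided test:
t_one, p_one_sided = stats.ttest_ind(compost, soil, alternative="greater")

# The flawed H0 ("no difference") corresponds to a two-sided test,
# which can reject H0 even when the compost plants grew *slower*:
t_two, p_two_sided = stats.ttest_ind(compost, soil)
```

With these made-up numbers the compost plants actually grow less, so the two-sided test rejects "no difference" while the one-sided test gives no support to H1, which is exactly the impasse described above.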
Many scientists neglect the null, assuming that it is merely the opposite of the alternative,
but it is good practice to spend a little time creating a sound hypothesis. It is not possible to
change any hypothesis retrospectively, including H0.
Examples of research questions and their null hypotheses:

Question: Are teens better at math than adults?
Null hypothesis: Age has no effect on mathematical ability.

Question: Does taking aspirin every day reduce the chance of having a heart attack?
Null hypothesis: Taking aspirin daily does not affect heart attack risk.

Question: Do teens use cell phones to access the internet more than adults?
Null hypothesis: Age has no effect on how cell phones are used for internet access.

Question: Do cats care about the color of their food?
Null hypothesis: Cats express no food preference based on color.

Question: Does chewing willow bark relieve pain?
Null hypothesis: There is no difference in pain relief after chewing willow bark versus taking a placebo.
Why Test a Null Hypothesis?
You may be wondering why you would want to test a hypothesis just to find it false. Why
not just test an alternate hypothesis and find it true? The short answer is that it is part of
the scientific method. In science, propositions are not explicitly "proven." Rather, science
uses math to determine the probability that a statement is true or false. It turns out it's
much easier to disprove a hypothesis than to positively prove one. Also, while the null
hypothesis may be simply stated, there's a good chance the alternate hypothesis is incorrect.
For example, if your null hypothesis is that plant growth is unaffected by duration of
sunlight, you could state the alternate hypothesis in several different ways. Some of these
statements might be incorrect. You could say plants are harmed by more than 12 hours of
sunlight or that plants need at least three hours of sunlight, etc. There are clear exceptions
to those alternate hypotheses, so if you test the wrong plants, you could reach the wrong
conclusion. The null hypothesis is a general statement that can be used to develop an
alternate hypothesis, which may or may not be correct.
Research methodology involves the researcher providing an alternative hypothesis,
a research hypothesis, as an alternate way to explain the phenomenon.
The researcher tests the hypothesis to disprove the null hypothesis, not because they favor
the research hypothesis, but because doing so means coming closer to finding an
answer to a specific problem. The research hypothesis is often based on observations that
evoke suspicion that the null hypothesis is not always correct.
In the Stanley Milgram Experiment, the null hypothesis was that personality determined
whether a person would hurt another person, while the research hypothesis was that the
role, instructions and orders were much more important in determining whether people
would hurt others.
A variable is something that changes according to different factors. Some
variables change easily, like stock-market prices, while others are almost
constant, like a person's name. Researchers often seek to measure variables.
A variable can be a number, a name, or anything whose value can change.
An example of a variable is temperature. Temperature varies according to other variables
and factors. You can measure different temperatures inside and outside. On a sunny day,
chances are that the temperature will be higher than on a cloudy one. Another thing that can
change the temperature is deliberate manipulation, like lighting a fire in the
fireplace.
In research, you typically define variables according to what you're measuring.
The independent variable is the variable which the researcher manipulates or selects (the
presumed cause), while the dependent variable is the effect (or assumed effect), dependent on the
independent variable. These variables are often stated in experimental research, in
a hypothesis, e.g. "what is the effect of personality on helping behavior?"
In exploratory research methodology, e.g. in some qualitative research, the independent
and the dependent variables might not be identified beforehand. They might not be stated
because the researcher does not yet have a clear idea of what is really going on.
Confounding variables are variables with a significant effect on the dependent variable that
the researcher failed to control or eliminate - sometimes because the researcher is not
aware of the effect of the confounding variable. The key is to identify possible confounding
variables and somehow try to eliminate or control them.
Operationalization means taking a fuzzy concept (a conceptual variable), such as 'helping
behavior', and measuring it through specific observations, e.g. how likely people are to help a
stranger with problems.
See also:
Conceptual Variables
Choosing the Research Method
The selection of the research method is crucial for what conclusions you can draw about a
phenomenon. It affects what you can say about the cause of, and the factors influencing, the
phenomenon.
It is also important to choose a research method which is within the limits of what the
researcher can do. Time, money, feasibility, ethics and availability to measure the
phenomenon correctly are examples of issues constraining the research.
Choosing the Measurement
Choosing the scientific measurements is also crucial for reaching the correct conclusion.
Some measurements might not reflect the real world, because they do not measure the
phenomenon as they should.
Significance Test
To test a hypothesis, quantitative research uses significance tests to determine which
hypothesis the data support.
The significance test can show whether the null hypothesis is more likely correct than the
research hypothesis. Research methodology in a number of areas like social sciences
depends heavily on significance tests.
A significance test may even drive the research process in a whole new direction, based on
the findings.
The t-test (also called Student's t-test) is one of many statistical significance tests. It
compares two sets of data that are assumed to be similar, to see whether they really are alike. The
t-test helps the researcher conclude whether a hypothesis is supported or not.
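As a sketch of what the t-test actually computes, here is the two-sample (pooled-variance) t statistic in plain Python. The data values are invented; a real analysis would also look up the p-value for the resulting statistic:

```python
# Student's two-sample t statistic, standard library only.
import math
import statistics as st

group_a = [5.1, 4.8, 5.6, 5.0, 4.9]   # hypothetical measurements
group_b = [5.9, 6.1, 5.7, 6.3, 5.8]

def t_statistic(a, b):
    """Two-sample t statistic with pooled (equal) variance."""
    na, nb = len(a), len(b)
    va, vb = st.variance(a), st.variance(b)            # sample variances
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))         # SE of the mean difference
    return (st.mean(a) - st.mean(b)) / se

t = t_statistic(group_a, group_b)
# A large absolute t suggests the two samples are unlikely to share the same mean.
```

The statistic is just the difference in means scaled by its standard error; the larger it is in absolute value, the less plausible the null hypothesis of equal means becomes.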
Drawing Conclusions
Drawing a conclusion is based on several factors of the research process, not just on
whether the researcher got the expected result. It has to be based on the validity and reliability of
the measurement, how well the measurement reflects the real world, and what else
could have affected the results.
The observations are often referred to as 'empirical evidence' and the logic/thinking leads to
the conclusions. Anyone should be able to check the observation and logic, to see if they
also reach the same conclusions.
Errors of the observations may stem from measurement-problems, misinterpretations,
unlikely random events etc.
A common error is to think that correlation implies a causal relationship. This is not
necessarily true.
Generalization is the extent to which the research and its conclusions apply to
the real world. Good research does not always reflect the real world, since we can
only measure a small portion of the population at a time.
Validity and Reliability
Validity refers to the degree to which the research reflects the given research problem, while
reliability refers to how consistent a set of measurements is.
Types of validity:
 External Validity
 Population Validity
 Ecological Validity
 Internal Validity
 Content Validity
 Face Validity
 Construct Validity
 Convergent and Discriminant Validity
 Test Validity
 Criterion Validity
 Concurrent Validity
 Predictive Validity
A definition of reliability may be "Yielding the same or compatible results in different clinical
experiments or statistical trials" (The Free Dictionary). Research methodology lacking
reliability cannot be trusted. Replication studies are a way to test reliability.
Types of Reliability:
 Test-Retest Reliability
 Interrater Reliability
 Internal Consistency Reliability
 Instrument Reliability
 Statistical Reliability
 Reproducibility
Both validity and reliability are important aspects of the research methodology to get better
explanations of the world.
Errors in Research
Logically, there are two types of errors when drawing conclusions in research:
A Type 1 error is when we accept the research hypothesis when the null hypothesis is in fact true.
A Type 2 error is when we reject the research hypothesis even though the null hypothesis is false.
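When the null hypothesis is true, a test run at a 5% significance level should commit a Type 1 error in roughly 5% of experiments. This can be checked with a quick simulation; the sketch below uses a simple z-test on simulated data, and every number in it is illustrative:

```python
# Simulating the Type 1 error rate: both groups come from the SAME
# distribution, so every rejection of the null is a Type 1 error.
import random
from statistics import NormalDist, mean

random.seed(0)
ALPHA = 0.05          # significance level
N, TRIALS = 30, 2000  # sample size per group, number of simulated experiments
rejections = 0

for _ in range(TRIALS):
    a = [random.gauss(0, 1) for _ in range(N)]
    b = [random.gauss(0, 1) for _ in range(N)]
    # z-test on the difference of means (population variance known to be 1).
    z = (mean(a) - mean(b)) / (2 / N) ** 0.5
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    if p < ALPHA:
        rejections += 1   # a Type 1 error

type1_rate = rejections / TRIALS   # should land close to 0.05
```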
Threats to Construct Validity
Hypothesis Guessing
This threat is when the subject guesses the intent of the test and consciously, or
subconsciously, alters their behavior.
Evaluation Apprehension
This particular threat is based upon the tendency of humans to act differently when under
pressure. Individual testing is notorious for bringing on an adrenalin rush, and this can
improve or hinder performance.
Researcher Expectancies and Bias
Researchers are only human and may give cues that influence the behavior of the subject.
Humans give cues through body language: subconsciously smiling when the subject
gives a correct answer, or frowning at an undesirable response, can all have an effect.
Poor Construct Definition
Construct validity is all about semantics and labeling. Defining a construct in too broad or
too narrow terms can invalidate the entire experiment.
For example, a researcher might try to use job satisfaction to define overall happiness. This
is too narrow, as somebody may love their job but have an unhappy life outside the
workplace. Equally, using general happiness to measure happiness at work is too broad.
Many people enjoy life but still hate their work!
Mislabeling is another common definition error: stating that you intend to measure
depression, when you actually measure anxiety, compromises the research.
The best way to avoid this particular threat is with good planning and seeking advice before
you start your research program.
Construct Confounding
This threat to construct validity occurs when other constructs mask the effects of the
measured construct.
For example, self-esteem is affected by self-confidence and self-worth. The effect of these
constructs needs to be incorporated into the research.
Interaction of Different Treatments
This particular threat is where more than one treatment influences the final outcome.
For example, a researcher tests an intensive counselling program as a way of helping
smokers give up cigarettes. At the end of the study, the results show that 64% of the
subjects successfully gave up.
Sadly, the researcher then finds that some of the subjects also used nicotine patches and
gum, or electronic cigarettes. The construct validity is now too low for the results to have
any meaning. Only good planning and monitoring of the subjects can prevent this.
Unreliable Scores
Variance in scores is a very easy trap to fall into.
For example, an educational researcher devises an intelligence test that provides excellent
results in the UK, and shows high construct validity.
However, when the test is used upon immigrant children, with English as a second language,
the scores are lower.
The test measures their language ability rather than intelligence.
Mono-Operation Bias
This threat involves the independent variable, and is a situation where a single manipulation
is used to influence a construct.
For example, a researcher may want to find out whether an anti-depression drug works.
They divide patients into two groups, one given the drug and a control given a placebo.
The problem is that this design is limited (e.g. by random sampling error); a more solid design
would use multiple groups given different doses.
The other option is to conduct a pre-study that calculates the optimum dose, an equally
acceptable way to preserve construct validity.
Mono-Method Bias
This threat to construct validity involves the dependent variable, and occurs when only a
single method of measurement is used.
For example, in an experiment to measure self-esteem, the researcher uses a single method
to determine the level of that construct, but then discovers that it actually measures self-confidence.
Using a variety of methods, such as questionnaires, self-rating, physiological tests, and
observation minimizes the chances of this particular threat affecting construct validity.
Internal validity is a crucial measure in quantitative studies, where it ensures that a
researcher's experiment design closely follows the principle of cause and effect.
Internal validity is relevant when our hypothesis describes a causal relationship. A study is
internally valid if the observed effect is actually due to the hypothesized cause. Internal validity is low when
there are many plausible alternative explanations for the measured effect.
The easy way to describe internal validity is the confidence that we can place in the cause-and-effect relationship in a study. The key question that you should ask in any experiment is:
"Could there be an alternative cause, or causes, that explain my observations and results?"
Looking at some extreme examples, a physics experiment into the effect of heat on the
conductivity of a metal has a high internal validity.
The researcher can eliminate almost all of the potential confounding variables and set up
strong controls to isolate other factors.
At the other end of the scale, a study into the correlation between income level and the
likelihood of smoking has a far lower internal validity.
A researcher may find that there is a link between low-income groups and smoking, but
cannot be certain that one causes the other.
Social status, profession, ethnicity, education, parental smoking, and exposure to targeted
advertising are all variables that may have an effect. They are difficult to eliminate, and
social research can be a statistical minefield for the unwary.
Internal Validity vs Construct Validity
For physical scientists, construct validity is rarely needed but, for social sciences and
psychology, construct validity is the very foundation of research.
Even more important is understanding the difference between construct validity and
internal validity, which can be a very fine distinction.
The subtle differences between the two are not always clear, but it is important to be able
to distinguish between the two, especially if you wish to be involved in the social sciences,
psychology and medicine.
Internal validity only shows that you have evidence to suggest that a program or study had
some effect on the observations and results.
Construct validity determines whether the program measured the intended attribute.
Internal validity says nothing about whether the results were what you expected, or
whether generalization is possible.
For example, imagine that some researchers wanted to investigate the effects of a
computer program against traditional classroom methods for teaching Greek.
The results showed that children using the computer program learned far more quickly, and
improved their grades significantly.
However, further investigation showed that the results were not due to the program itself,
but due to the Hawthorne Effect; the children using the computer program felt that they
had been singled out for special attention. As a result, they tried a little harder, instead of
staring out of the window.
This experiment still showed high internal validity, because the research manipulation had
an effect.
However, the study had low construct validity, because the cause was not correctly labeled.
The experiment ultimately measured the effects of increased attention, rather than the
intended merits of the computer program.
How to Maintain High Confidence in Internal Validity?
It is impossible to maintain 100% confidence in any experimental design, and there is always
the chance of error.
However, there are a number of tools that help a researcher to oversee internal validity and
establish causality.
Temporal Precedence
Temporal precedence is the single most important tool for determining the strength of a
cause and effect relationship. This is the process of establishing that the cause did indeed
happen before the effect, providing a solution to the chicken and egg problem.
To establish internal validity through temporal precedence, a researcher must establish
which variable came first.
One example could be an ecology study, establishing whether an increase in the population
of lemmings in a fjord in Norway is followed by an increase in the number of predators.
Lemmings show a very predictable population cycle, which steadily rises and falls over a 3- to
5-year period. Population estimates show that the number of lemmings rises due to an increase
in the abundance of food.
This trend is followed, a couple of months later, by an increase in the number of predators,
as more of their young survive. This seems to be a pretty clear example of temporal
precedence; the availability of food for the lemmings dictates numbers. In turn, this dictates
the population of predators.
Not so fast!
In fact, the predator/prey relationship is much more complex than this. Ecosystems rarely
contain simple linear relationships, and food availability is only one controlling factor.
Turning the whole thing around, an increase in the number of predators may also control
the lemming population. The predators may be so successful that the lemming population
plummets and the predators starve, through limiting their own food supply.
What if predators turn to an alternative food supply when the number of lemmings is low?
Lemmings, like many rodents, show lower breeding success during times of high population.
This really is a tough call, and the only answer is to study previous research. Internal validity
is possibly the single most important reason for conducting a strong and thorough literature
review.
Even with this, it is often difficult to show that cause happens before effect, a fact that
behavioral biologists and ecologists know only too well.
By contrast, the physics experiment is fairly easy: I heat the metal and conductivity
increases or decreases, providing a simpler view of cause and effect and high internal validity.
Covariation of the Cause and Effect
Covariation of the cause and effect is the process of establishing that there is a cause-and-effect
relationship between the variables. It establishes that the experiment or program
had some measurable effect, whatever that may be.
For example, in the study of Greek learning, the results showed that the group with the
computer package performed better than those without.
This can be summed up as:
If you use the program, there is an outcome.
Without the program, there is no outcome.
This does not need to be an either/or relationship and it could be:
More of the program equals more of the outcome.
Less of the program equals less of the outcome.
This seems pretty obvious, but you have to remember the basic rule of internal validity.
Covariation of the cause and effect cannot explain what causes the effect, or establish
whether it is due to the expected manipulated variable or to a confounding variable.
It does, however, strengthen the internal validity of the study.
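The dose-response pattern above ("more of the program equals more of the outcome") shows up as a positive correlation between dose and outcome. A minimal sketch, with invented hours and scores:

```python
# Covariation as correlation: dose (hours with the program) against outcome
# (test scores). All data values are made up for illustration.
from statistics import mean, stdev

hours_with_program = [0, 1, 2, 3, 4, 5, 6]
test_scores = [52, 55, 61, 60, 68, 71, 75]

def pearson_r(x, y):
    """Pearson correlation coefficient from standardized products."""
    mx, my, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
    n = len(x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / ((n - 1) * sx * sy)

r = pearson_r(hours_with_program, test_scores)
# r near +1: the cause and effect covary. As the text notes, this alone
# cannot rule out a confounding variable driving both.
```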
Establishing Causality Through a Process of Elimination
Establishing causality through elimination is the easiest way to prove that an experiment has
high internal validity.
As with the lemming example, there could be many other plausible explanations for the
apparent causal link between prey and predator.
Researchers often refer to any such confounding variable as the 'Missing Variable,' an
unknown factor that may underpin the apparent relationship.
The problem is, as the name suggests, that the variable is missing, and trying to find it is
almost impossible. The only way to nullify it is through strong experimental design,
eliminating confounding variables and ensuring that they cannot have any influence.
Randomization, control groups and repeat experiments are the best way to eliminate
these variables and maintain high validity.
In the lemming example, researchers use a whole series of experiments, measuring
predation rates, alternative food sources and lemming breeding rates, attempting to
establish a baseline.
Just to leave you with an example of how difficult measuring internal validity can be:
In the experiment where researchers compared a computer program for teaching Greek
against traditional methods, there are a number of threats to internal validity.
 The group with computers feels special, so they try harder, the Hawthorne Effect.
 The group without computers becomes jealous, and tries harder to prove that they
should have been given the chance to use the shiny new technology.
 Alternatively, the group without computers is demoralized and their performance drops.
 Parents of the children in the computerless group feel that their children are missing
out, and complain that all children should be given the opportunity.
 The children talk outside school and compare notes, muddying the water.
 The teachers feel sorry for the children without the program and attempt to
compensate, helping the children more than normal.
We are not trying to depress you with these complications, only illustrate how complex
internal validity can be.
In fact, perfect internal validity is an unattainable ideal, but any research design must strive
towards that perfection.
For those of you wondering whether you picked the right course, don't worry. Designing
experiments with good internal validity is a matter of experience, and becomes much easier
over time.
A study is externally valid if the hypothesized
relationship, supported by our findings, also holds in other settings and other groups.
External validity is one of the most difficult of the validity types to achieve, yet it is at the
foundation of every good experimental design.
Many scientific disciplines, especially the social sciences, face a long battle to prove that
their findings represent the wider population in real world situations.
The main criterion of external validity is the process of generalization:
whether results obtained from a small sample group, often in laboratory surroundings, can
be extended to make predictions about the entire population.
The reality is that if a research program has poor external validity, the results will not be
taken seriously, so any research design must justify sampling and selection methods.
What is External Validity?
In 1966, Campbell and Stanley proposed the commonly accepted definition of external validity:
“External validity asks the question of generalizability: To what populations, settings,
treatment variables and measurement variables can this effect be generalized?”
External validity is usually split into two distinct types, population validity and ecological
validity, and both are essential elements in judging the strength of an experimental design.
Psychology and External Validity
The Battle Lines are Drawn
External validity often causes a little friction between clinical psychologists and research psychologists.
Clinical psychologists often believe that research psychologists spend all of their time in
laboratories, testing mice and humans in conditions that bear little resemblance to the
outside world. They claim that the data produced has no external validity, and does not take
into account the sheer complexity and individuality of the human mind.
Before we are flamed by irate research psychologists, the truth lies somewhere between the
two extremes! Research psychologists identify trends and generate
sweeping generalizations that predict the behavior of groups. Clinical psychologists end up
picking up the pieces, studying the individuals who lie outside the predictions, hence the friction.
In most cases, research psychology has very high population validity, because researchers
meticulously select groups at random and use large sample sizes, allowing meaningful
statistical analysis.
However, the artificial nature of research psychology means that ecological validity is
usually low.
Clinical psychologists, on the other hand, often use focused case studies, which cause
minimum disruption to the subject and have strong ecological validity. However, the small
sample sizes mean that the population validity is often low.
Ideally, using both approaches provides useful generalizations, over time!
Randomization in External Validity and Internal Validity
It is also important to distinguish between external and internal validity, especially with the
process of randomization, which is easily misinterpreted. Random selection is an important
tenet of external validity.
For example, a research design, which involves sending out survey questionnaires to
students picked at random, displays more external validity than one where the
questionnaires are given to friends. This is randomization to improve external validity.
Once you have a representative sample, high internal validity involves randomly assigning
subjects to groups, rather than using pre-determined selection factors.
With the student example, randomly assigning the students into test groups, rather than
picking pre-determined groups based upon degree type, gender, or age strengthens the
internal validity.
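The random-assignment step described above can be sketched in a few lines; the student names and group sizes here are hypothetical:

```python
# Random assignment: shuffle the sample, then split it into groups, rather
# than grouping by a pre-determined factor like degree type, gender, or age.
import random

random.seed(42)  # seeded only so the illustration is reproducible
students = [f"student_{i:02d}" for i in range(20)]  # hypothetical sample

random.shuffle(students)
treatment, control = students[:10], students[10:]
# Every student had the same chance of landing in either group, which guards
# internal validity against selection effects.
```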
Population validity is a type of external validity which describes how well the
sample used can be extrapolated to a population as a whole.
It evaluates whether the sample population represents the entire population, and also
whether the sampling method is acceptable.
For example, an educational study that looked at a single school could not be generalized to
cover children at every US school.
On the other hand, a federally appointed study that tests every pupil of a certain age
group will have exceptionally strong population validity.
Threats to Validity
1. History - Events that may occur during the time frame of the study which are not actually
part of the study. They produce effects that influence the results of the study, either
increasing or decreasing the expected results.
2. Selection - Occurs when respondents of the study are chosen as a group rather than
individually.
3. Testing - Refers to a pre-test that results in improved performance on the post-test. To
avoid this threat, the pre-test may be omitted; if a pre-test is given, an alternate form of
the instrument is recommended for the post-test.
4. Instrumentation - Refers to unreliability in measuring instruments, which may result in an
invalid measurement of performance. A change of instrument between the pre-test and
post-test may produce an effect not caused by the treatment.
5. Maturation - Refers to the physiological and psychological changes that may happen to
the respondents of the study over a period of time. If the time frame of a training program
is long and rigid, the participants may experience psychological discomfort due to
boredom, tiredness, hunger and the like.
6. Mortality - Refers to the loss of participants during the post-test stage, or during the time
frame of the study when the same group of individuals is studied over a long period. By the
time a follow-up study is conducted on the same group, some members may have dropped
out or may refuse to cooperate further in the study.
Due to time and cost constraints, most studies lie somewhere between these two extremes,
and researchers pay close attention to their sampling techniques.
Experienced scientists ensure that their sample groups are as representative as possible,
striving to use random selection rather than convenience sampling.
Ecological validity is a type of external validity which looks at the testing
environment and determines how much it influences behavior.
In the school test example, if the pupils are used to regular testing, then the ecological
validity is high because the testing process is unlikely to affect behavior.
On the other hand, taking each child out of class and testing them individually, in an isolated
room, will dramatically lower ecological validity. The child may be nervous, ill at ease and is
unlikely to perform in the same way as they would in a classroom.
Generalization becomes difficult, as the experiment does not resemble the real-world situation.