Research Methods Exam 2 Study Guide
Spring 2020
Chapter 4 Psychological Measurement
Psychological Constructs
• Construct: a variable that cannot be observed directly
• E.g., personality traits, emotional states, attitudes, and abilities
• Psychological constructs cannot be observed directly because:
- They typically represent tendencies
• E.g., someone who is extraverted is not being extraverted 100% of the time
- They often involve internal processes
• E.g., activity in neurons and sensory systems
Conceptual Definition
• Conceptual Definition: Describes the behaviors and internal
processes that make up that construct, along with how it relates to
other variables
• E.g., Neuroticism can be related to many other variables and situations
Operational Definition
• Operational definition: a definition of a variable in terms of precisely
how it is to be measured
• Generally fall into one of three broad categories
1. Self-Report Measures: those in which participants report on their own
thoughts, feelings, and actions (e.g., a self-esteem measure)
2. Behavioral Measures: those in which some other aspect of participants’
behavior is observed and recorded (e.g., Bobo Doll Study)
3. Physiological measures: those that involve recording any of a wide variety
of physiological processes (e.g., heart rate and blood pressure)
Converging Operations
Converging Operations: using multiple operational definitions of the same
construct, which together converge on that construct
Psychometric Measures
Psychometrics: The measurement of psychological variables and constructs
Self-report Measures
Self-Report Measures: those in which participants report on their own
thoughts, feelings, and actions (e.g., a self-esteem measure)
Behavioral Measures
Behavioral Measures: those in which some other aspect of participants’
behavior is observed and recorded (e.g., Bobo Doll Study)
Physiological Measures
Physiological measures: those that involve recording any of a wide variety
of physiological processes (e.g., heart rate and blood pressure)
Levels of Measurement
• Four different levels of measurement
• Nominal Level: use for categorical variables and involves assigning scores
that are category labels
• E.g., marital status, ethnicity, color
• Ordinal Level: involves assigning scores so that they represent the rank order
• E.g., ranking with Likert scales
• Interval: assigning scores using numerical scales in which intervals have the
same interpretation throughout. Does not have an “absolute zero”
• E.g., Fahrenheit or Celsius, IQ Tests, SAT scores
• Ratio: Has equal intervals between values and has a theoretical “true zero”
• E.g., height, weight, money, number of siblings, test grades
Reliability
• Reliability: refers to the consistency of a measure.
• Psychologists consider three types of consistency:
1. Over time (test-retest reliability)
2. Across items (internal consistency)
3. Across different researchers (inter-rater reliability)
Test-Retest Reliability
Test-Retest Reliability: The extent to which scores on a measure are consistent across time for
the same individual
• E.g., someone who is intelligent today will likely be intelligent later
Internal Consistency
Internal Consistency: The extent to which the items on a multiple- item measure are consistent
with each other
• Ensuring that all the items on a measure reflect the same underlying Construct
• E.g., on the self-esteem scale, people who agree that they are a person of worth
should tend to agree that they have a number of good qualities
Split-half Correlation
Split-half Correlation: The correlation between scores based on one half of the items on a
multiple-item measure with scores based on the other half of the items
• E.g., correlating responses on the even- and odd-numbered questions of the self-esteem
questionnaire
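• A minimal Python sketch of a split-half correlation (assuming NumPy is installed; the item responses below are made-up values for illustration):

import numpy as np

# Hypothetical responses: each row is one participant's answers to a 10-item
# self-esteem questionnaire on a 1-4 agreement scale
responses = np.array([
    [3, 4, 3, 4, 3, 4, 4, 3, 4, 3],
    [2, 2, 1, 2, 2, 1, 2, 2, 1, 2],
    [4, 4, 4, 3, 4, 4, 3, 4, 4, 4],
    [1, 2, 2, 1, 1, 2, 1, 1, 2, 1],
    [3, 3, 2, 3, 3, 3, 2, 3, 3, 2],
])

# Score the odd-numbered items (indices 0, 2, 4, ...) and the even-numbered items
# separately, then correlate the two half-scores across participants
odd_half = responses[:, 0::2].sum(axis=1)
even_half = responses[:, 1::2].sum(axis=1)
split_half_r = np.corrcoef(odd_half, even_half)[0, 1]
print(f"Split-half correlation: {split_half_r:.2f}")

• A high positive split-half correlation indicates good internal consistency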
Inter-Rater Reliability
Inter-Rater Reliability: How two raters’ scores compare (how similar
they are)
• E.g., if you are interested in measuring university students’ social skills, you
can make video recordings of them as they interact. Then you have two or
more observers watch the videos and rate each student. Ideally, the raters'
scores should be highly correlated with each other.
Validity
Validity: The extent to which scores on a measure represent the
variable or construct they are intended to measure
• A measure can have great reliability but no validity
• How do we decide if a measure is valid?
• Three types of validity:
• Face validity
• Content Validity
• Criterion Validity
Face Validity
• Face Validity: Extent to which a measurement method appears “on
its face” to measure the construct of interest. Does it look valid?
• E.g., most people expect a self-esteem questionnaire to include items about
whether they see themselves as a person of worth and have good qualities
• Face validity is at best very weak evidence of validity
• It's based only on people's intuitions
Content Validity
• Content Validity: The extent to which a measure “covers” the
construct of interest
• Does it measure all aspects of the construct it is measuring?
• E.g., if someone conceptually defines test anxiety as involving both anxiety
and negative thoughts, then a measure should include items about both
Criterion Validity
Criterion Validity: extent to which people’s scores on a measure are
correlated with other variables that one would expect them to be
correlated with
• How related are the scores on the measure to scores on another related
criterion?
• Criterion: can be any variable that one has reason to think should be
correlated with the construct being measured
• Convergent Validity: Comparing one test to another measuring the same
construct
Discriminant Validity
Discriminant Validity: the extent to which scores on a measure are not correlated with measures
of variables that are conceptually distinct
• E.g., self-esteem measures should not be highly correlated with measures
assessing mood
Conceptually Defining the Construct
• Need to have a clear and complete conceptual definition of a
construct for good measurement
• Allows us to make sound decisions on how to measure the construct
• E.g., memory- how would you define this?
• Psychologists have broken up memory into multiple different facets (e.g., verbal
memory, working memory, episodic memory, etc.)
Operationally Defining the Construct
• Conceptual definitions must be transformed into something that can
be directly observed and measured
• Most variables can be defined in many different ways
• E.g., Stress can be defined as scores on a scale or cortisol concentration in
saliva
Implementing the Measure
• Will generally want to implement any measure in a way that maximizes its
reliability and validity
• Test the measure under similar conditions for all participants (e.g., a quiet room)
Reactivity
• Reactivity: when being measured changes how people behave
• Socially Desirable Responding – doing or saying things because participants think it is the
socially appropriate thing to do
• Demand Characteristics: subtle cues that reveal how the researcher expects
participants to behave
• E.g., participant attitude towards exercise immediately after reading a passage about
the dangers of heart disease
Chapter 5 Experimental Research
What is an Experiment?
•Experiment: a type of study designed specifically to answer the
question of whether there is a causal relationship between two
variables
•Do changes in the IV (independent variable) cause changes in the DV (dependent variable)?
Three main features
• There are three main features of an experiment:
- Manipulation of the independent variable
- All other variables are held constant
- Random assignment to groups
• If the IV is not manipulated, it is not an experiment!
• If all other variables are not held constant, it is not an experiment!
• If participants are not randomly assigned to groups, it is not an experiment!
• Have to have ALL three features in order for it to be an experiment
Conditions / Levels
•Conditions: levels of the independent variable
Treatment Condition
•Treatment condition- receives some form of treatment
•Treatment- intervention intended to change people’s lives for the better
Experimental Condition
•Experimental condition- receives some form of the IV
•Can have multiple experimental conditions
Control Condition
•Control condition- participants do not receive any form of treatment or IV
Types of Control Conditions
•No-treatment control condition: A control condition in which participants receive
no treatment whatsoever—not even a placebo
•Placebo: A treatment that lacks any active ingredient or element that should make
it effective
•Placebo effect: The positive effect of a placebo
•The placebo effect poses a serious problem for researchers who want to determine whether
a treatment works
•Wait-list control condition: Receives treatment at a later point
Extraneous Variables
•Extraneous Variables: variables other than the IV and DV
• Generally want to control for extraneous variables
• E.g., keeping the setting the same for all participants
Confounding Variables
•When an extraneous variable changes with the IV, it is called a
Confounding variable
• Random assignment minimizes influence of confounding variables on the DV
• And therefore minimizes threats to internal validity
• Counterbalancing (see below) controls the order of conditions so that order is no
longer a confounding variable
Internal Validity
•Internal Validity: Refers to the degree to which we can confidently
infer a causal relationship between variables
•This is highest in experimental research and lowest in
non-experimental research (e.g., correlational studies)
Manipulation of the Independent Variables
•Manipulate: to change the levels of the IV systematically so that different groups of
participants are exposed to different levels of that variable, or the
same group of participants is exposed to different levels at different
times
• E.g., To see whether expressive writing affects people’s health – a researcher
might instruct some participants to write about traumatic experiences and
others to write about neutral experiences.
Single Factor Two-Level Design
•Single Factor Two-Level Design: experiments with a single IV with two
levels
Single Factor Multi-Level Design
•Single Factor Multi-Level Design: experiments with one IV with more
than two conditions
Between-Subjects Design
•Between-subjects design and within-subjects design
• Can also combine the two into what is called a mixed design (distinct from
mixed-methods research, which combines quantitative and qualitative approaches)
• Between-subjects experiment: An experiment in which each
participant is tested in one condition
• Participants are randomly assigned to different groups
• E.g., Treatment A vs. Treatment B- participants are randomly assigned to either
group
Random Assignment
• Random assignment = Each participant has an equal chance of being assigned to
each condition
• Random assignment is different than random selection
• Random assignment minimizes influence of confounding variables on the DV
• And therefore minimizes threats to internal validity
•In its strictest sense, random assignment should meet two criteria.
• One is that each participant has an equal chance of being assigned to each
condition (e.g., a 50% chance of being assigned to each of two conditions)
• The second is that each participant is assigned to a condition independently
of other participants
Block Randomization
•Block Randomization: all conditions occur once in the sequence
before any of them is repeated
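• A minimal Python sketch of block randomization using only the standard library (the three condition labels are hypothetical):

import random

conditions = ["Treatment A", "Treatment B", "Control"]   # hypothetical conditions
n_blocks = 4                                             # enough blocks for 12 participants

assignment = []
for _ in range(n_blocks):
    block = conditions.copy()
    random.shuffle(block)      # each condition occurs once per block, in a random order
    assignment.extend(block)

# Participants are assigned to conditions in this order as they arrive
for participant, condition in enumerate(assignment, start=1):
    print(participant, condition)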
Matched Groups
•Matched-Groups Design: Alternative to random assignment.
Participants in the various conditions are matched on the dependent
variable or on some extraneous variable prior to the manipulation of
the IVs
• This guarantees that these variables will not be confounded across the
experimental conditions
• E.g., in a study examining whether expressive writing affects people’s health,
the experimenter could start by measuring various health related variables.
Participants are then matched on how healthy or unhealthy they are. They are
then assigned to groups.
Types of Carryover Effects
• Carryover effect: An effect of being tested in one condition on participants'
behavior in later conditions
• There are three common types of carryover effects:
• Practice effect: Participants perform better on a task in later conditions because they have
had practice
• Fatigue effect: Participants perform worse on a task in later conditions because they have
become tired or bored
• Context effect: Participants perceive or interpret their task according to the context of
previous tasks
Minimizing Carryover Effects with Counterbalancing
•Counterbalancing: Systematically varying the order of conditions
across participants
• Two ways to think about what counterbalancing accomplishes:
• It controls the order of conditions so that it is no longer a confounding variable
• If there are carryover effects, it makes it possible to detect them
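• A minimal Python sketch of complete counterbalancing for a hypothetical three-condition within-subjects design (every possible order is generated and participants are cycled through the orders):

from itertools import permutations

conditions = ["A", "B", "C"]              # hypothetical within-subjects conditions
orders = list(permutations(conditions))   # all 6 possible orders of 3 conditions

# Cycle participants through the orders so each order is used equally often
for participant in range(12):
    order = orders[participant % len(orders)]
    print(f"Participant {participant + 1}: {' -> '.join(order)}")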
Four Big Validities:
Internal Validity
•Internal Validity: Refers to the degree to which we can confidently
infer a causal relationship between variables
•This is highest in experimental research and lowest in
non-experimental research (e.g., correlational studies)
•(see the earlier Internal Validity section for more info)
External Validity
•External validity: the extent to which findings can be generalized to
people and contexts beyond the experiment
• Mundane Realism: When the participants and the situation studied are
similar to those that the researchers want to generalize to and that people
encounter every day
• Psychological Realism: Where the same mental process is used in both the
laboratory and real world
Construct Validity
•Construct Validity: ensuring the research question is clearly
operationalized by the study’s methods
•In the Darley and Latane study, the researchers were interested in studying
whether responsibility for helping becomes diffused across bystanders
• They hypothesized that participants in a lab would be less likely to help
when they believed there were other potential helpers besides
themselves. This conversion from research question to experiment design is
called operationalization (the operational definition)
• They operationalized the IV of diffusion of responsibility by varying the
number of potential helpers participants believed were present during a
staged crisis
Statistical Validity
•Statistical Validity: Concerns the proper statistical treatment of data
and the soundness of the researchers’ statistical conclusion
•Types of tests commonly used are t-tests, ANOVA, regression,
correlation
•A study also needs enough participants to have adequate statistical power
• The required sample size is generally determined through what is called a power analysis
Experimenter Expectancy Effect
•Experimenter Expectancy Effect:
• Rosenthal and Fode (1963) conducted a study in which students trained
genetically similar rats to run a maze. Students were told their rats were
either "maze-bright" or "maze-dull"
• What do you think happened?
Experimenter Bias
• Experimenter bias: When researchers’ biases inadvertently affect participants’
behaviors
Biosocial Effects
• Biosocial effects: When an experimenter’s characteristics affect participants’
behaviors
Psychosocial Effects
• Psychosocial effects: When an experimenter’s attitude/personality affects
participants’ behavior
Blind and Double-Blind Study
•Single-blind study: The participants do not know what condition they are in
•Double-blind study: Neither the experimenters nor the participants know which
condition the participants are in
Chapter 6 Non-Experimental Research
Non-Experimental Research
•Non-Experimental Research: research that lacks the manipulation of
an independent variable
•Research methods that do not meet the three criteria of an
Experiment
• Manipulation of an IV
• Holding extraneous variables constant
• Random assignment
When to use non-experimental research
•Research question or hypothesis relates to a single variable rather
than a statistical relationship between two variables
• (e.g., how accurate are people’s first impressions?)
•Research question pertains to a non-causal statistical relationship
• (e.g., correlation between verbal intelligence and mathematical intelligence)
•Research question is about a causal relationship, but the independent
variable cannot be manipulated or randomly assigned
• (e.g., does damage to the hippocampus impair one’s ability to remember)
•Research question is broad and exploratory
• (e.g., what is it like to be a working mother diagnosed with depression?)
Single Variable Research
•Single-variable research: focuses on describing a single variable rather than
a statistical relationship between variables
• E.g., Milgram's obedience study
Correlational Research
• Correlational research- IV is not manipulated & no random assignment
• No attempt to control for extraneous variables
Quasi-experimental Research
• Quasi-experimental research- No random assignment
• IV is manipulated across conditions
Observational Research
• Observational Research: a research technique in which researchers observe participants
and phenomena in their most natural settings
Nonexperimental Research and Internal Validity
•Internal validity is compromised in nonexperimental research
• Because extraneous variables are not held constant
•In order of lowest internal validity to highest:
• Correlational (low)
• Quasi-experimental (moderate)
• Experimental (high)
Correlational Research
•A type of non-experimental research
•The researcher measures two variables and assesses the statistical relationship
between them
•Why use correlational research?
• Do not think that relationship is causal
• Cannot manipulate the IV
•Variables can be quantitative or categorical
• E.g., relationship between gender (categorical) and verbal fluency
(quantitative)
• E.g., relationship between age (quantitative) and verbal fluency (quantitative)
Pearson’s Correlation Coefficient
•Scatterplots are often used to represent correlations
•Pearson’s Correlation Coefficient (Pearson’s r): used to represent the
strength of the correlation
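• A minimal Python sketch of computing Pearson's r (assuming SciPy is installed; the age and verbal-fluency scores are made-up values):

from scipy import stats

age     = [21, 25, 30, 34, 40, 45, 52, 60]   # hypothetical ages
fluency = [42, 45, 47, 50, 49, 53, 55, 58]   # hypothetical verbal fluency scores

r, p = stats.pearsonr(age, fluency)
print(f"Pearson's r = {r:.2f}, p = {p:.3f}")  # r ranges from -1.00 to +1.00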
Quantitative Research
• Quantitative Research: starts with a focused research question or
hypothesis
• Collect a small amount of numerical data from a large number of individuals
• Describe the resulting data using statistical techniques
• Draw general conclusions about some large population
Qualitative Research
• Qualitative Research: originated in the disciplines of anthropology and
sociology
• Begins with less focused research question
• Collect large amounts of relatively unfiltered data from a small number of
individuals
• Describe the data using nonstatistical techniques
• Grounded theory, thematic analysis, critical discourse analysis or interpretative
phenomenological analysis
Data Analysis in Qualitative Research
•Whether data are quantitative or qualitative depends more
on what researchers do with the data they have collected
• E.g., interview about religion and alcohol
•What does qualitative data analysis look like?
Grounded Theory
• Grounded Theory:
• Done in stages
• First they identify ideas that are repeated throughout the data
• Then they organize ideas into smaller number of broader themes
• Then they write Theoretical Narratives – an interpretation of the data in terms of themes
that they have identified
• These narratives focus on the subjective experiences of the participants and are usually
supported by many direct quotations from the participants themselves
Mixed-Methods Research / Triangulation
• Mixed-Methods Research & Triangulation are two ways to combine both
quantitative and qualitative research
• E.g., use qualitative research for hypothesis generation and quantitative research for
hypothesis testing
Naturalistic Observation
•Naturalistic Observation: observational method that involves
observing people’s behavior in the environment in which it typically
occurs
• A type of field research (e.g., Jane Goodall and chimpanzee research)
Disguised Naturalistic Observation
• Disguised Naturalistic Observation: When researchers engage in naturalistic
observation by making their observations as unobtrusively as possible so that
participants are not aware that they are being studied
Undisguised Naturalistic Observation
• Undisguised Naturalistic Observation: Where the participants are made
aware of the researcher's presence and the monitoring of their behavior
Drawbacks of Naturalistic observation
•Drawbacks of Naturalistic Observations
• Reactivity refers to when a measure changes participants’ behavior.
• Hawthorne effect - the alteration of behavior by the subjects of a study due to their awareness
of being observed.
• People get used to being observed (e.g., reality TV shows)
Participant Observation
•Participant Observation: Researchers become active participants in
the group or situation they are studying
• Rationale is that there may be important information that is only accessible to,
or can be interpreted only by, someone who is an active participant in the
group or situation.
• Disguised participant observation: the researchers pretend to be members
of the social group they are observing and conceal their true identity as
researchers
• Undisguised participant observation: researchers become part of the group
they are studying and they disclose their true identity as researchers to the
group under investigation
Structured Observation
•Structured Observation: investigator makes careful observations of
one or more specific behaviors in a particular setting that is more
structured than the settings used in naturalistic or participant
observation
• E.g., Mary Ainsworth and the Strange Situation
•Helps investigate a limited set of behaviors
•Far more efficient than naturalistic and participant observation
•Decreased external validity
•Coding: A part of structured observation whereby the observers use a
clearly defined set of guidelines to "code" behaviors—assigning
specific behaviors they are observing to a category—and count the
number of times or the duration that the behavior occurs.
Case Study
•Case Study: an in-depth examination of an individual, a social unit (e.g., a
cult), or an event (e.g., a natural disaster)
•Tend to be more qualitative in nature
•Useful because they provide a level of detailed analysis not found in
many other research methods
•Only way to study rare conditions
Patient H.M.
•A 27-year-old man who experienced severe
seizures
•Underwent brain surgery to remove his
hippocampus and amygdala
•What do you think happened?
•Seizures reduced but lost his ability to form new memories
• E.g., could not learn who the new president was
•However, he was able to learn new skills
• E.g., he learned how to use a computer without remembering having learned it
Archival Research
•Archival data
• Use data that has already been collected for another purpose
• Newspapers, census data, institutional records, hospital records
• Content analysis- analyze the content
• E.g., number of times nature-based terms are used in a dictionary
Chapter 12 Descriptive Statistics
Descriptive Statistics
•Descriptive Statistics – a set of techniques for summarizing and
displaying data
Distribution
•Distribution – way the scores are distributed across the levels of that
variable.
• Example – In a sample of 100 university students, distribution of the variable
“sex” might be such that 44 have a score of “male” and 56 have a score of
"female"
Frequency Tables
•Frequency Table – One way to display the distribution of a variable
Histograms
•Histogram – a graphical display of a distribution
Central Tendency
•Central Tendency – the point around which the scores in the
distribution tend to cluster
•Mean – the sum of the scores divided by the number of scores
•Median – the middle score, in the sense that half the scores in the
distribution are less than it and half are greater than it
•Mode – the most frequent score in a distribution
Variability
•Variability – the extent to which the scores vary around their
central tendency
•Range – the difference between the highest and lowest
scores in the distribution
•Standard deviation – the average distance between the scores
and the mean
•Variance – the mean of the squared differences between the scores and the mean
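• A minimal Python sketch computing these descriptive statistics with the standard-library statistics module (the scores are made-up values):

import statistics

scores = [2, 4, 4, 5, 7, 7, 7, 9, 10, 12]   # hypothetical scores

print("Mean:", statistics.mean(scores))            # sum of scores / number of scores
print("Median:", statistics.median(scores))        # middle score
print("Mode:", statistics.mode(scores))            # most frequent score
print("Range:", max(scores) - min(scores))         # highest minus lowest
print("Variance:", statistics.pvariance(scores))   # mean of squared differences from the mean
print("SD:", statistics.pstdev(scores))            # square root of the variance

• Note: statistics.variance() and statistics.stdev() instead divide by n - 1, the usual choice when estimating a population's variability from a sample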
Percentile Rank
•Percentile Rank: the percentage of scores in the distribution that are
lower than the score
• Percentile rank of 80 means you scored higher than 80 percent of people
Z-Score
•Z-Score: indicates how far above or below the mean a raw score is,
but it expresses this in terms of the standard deviation
• Difference between the individual’s score and the mean of the distribution,
divided by the standard deviation of the distribution
• Z = (X-M)/SD
• Example: Distribution of IQ scores with a mean of 100. Single score is 110
• Standard deviation is 15
• (110-100) / 15 = +0.67
• So a score of 110 is 0.67 standard deviations above the mean
•Z-Scores are important because:
• Provide a way of describing where an individual’s score is located within a
distribution and are sometimes used to report the results of standardized
tests
• Provide one way of defining outliers
• If a z-score is less than -3.00 or greater than +3.00, then it is probably an outlier because it is
more than 3 standard deviations away from the mean
• Play an important role in understanding and computing other statistics
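• A minimal Python sketch of the z-score computation from the IQ example above (plain Python, no extra libraries needed):

def z_score(x, mean, sd):
    # Z = (X - M) / SD: distance from the mean in standard deviation units
    return (x - mean) / sd

z = z_score(110, 100, 15)      # IQ example: score of 110, mean of 100, SD of 15
print(f"z = {z:+.2f}")         # +0.67, i.e., 0.67 standard deviations above the mean
print("Probable outlier?", abs(z) > 3.00)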
Effect Size
•Effect Size: describes the strength of a statistical relationship
•Effect sizes
• Small = 0.20
• Medium = 0.50
• Large = 0.80 or greater
Cohen’s d
•Cohen’s d: A measure of the effect size for a difference between two
groups or conditions
• the difference between the two means divided by the standard deviation
• d = (M1-M2)/SDpooled
• Typically compares a treatment group to a control group
• Treatment group is usually M1
• Otherwise, the larger mean is usually M1 so that d is positive
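• A minimal Python sketch of Cohen's d using the pooled standard deviation (assuming NumPy; the group scores are made-up values):

import numpy as np

def cohens_d(group1, group2):
    # d = (M1 - M2) / SDpooled, with the treatment (or larger) mean as M1
    m1, m2 = np.mean(group1), np.mean(group2)
    n1, n2 = len(group1), len(group2)
    var1, var2 = np.var(group1, ddof=1), np.var(group2, ddof=1)
    sd_pooled = np.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (m1 - m2) / sd_pooled

treatment = [14, 15, 17, 16, 18, 15, 17]   # hypothetical treatment-group scores
control   = [12, 13, 14, 12, 15, 13, 14]   # hypothetical control-group scores
print(f"Cohen's d = {cohens_d(treatment, control):.2f}")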
Line Graphs
•A graph used to show the relationship between two variables
• In general, line graphs are used when the variable on the x-axis has (or is
organized into) a small number of distinct values
• Scatterplots are used when the variable on the x-axis has a large
number of values
Linear Relationships
•Linear Relationships – A statistical relationship in which, as the X
variable increases, the Y variable changes at a constant rate
• Best described by a straight line
Nonlinear Relationships
•Nonlinear Relationships – A statistical relationship in which, as the X
variable increases, the Y variable does not increase or decrease at a
constant rate
• Best Described by a curved line
Bar Graphs
•Bar Graphs: Generally used to present and compare the mean scores
for two or more groups or conditions.
Line Graphs
•Line Graphs: used when the IV is measured in a more continuous
manner (e.g., time) or to present correlations between quantitative
variables when the IV has a relatively small number of distinct levels
• Each Point should represent the mean score on the DV for participants at one
level of the IV
Scatterplots
•Scatterplots: used to present correlations and relationships between
quantitative variables when the variable on the x-axis has a large
number of levels
• Each point represents an individual rather than the mean for a group of
individuals
• No line connecting the points
Chapter 13 Inferential Statistics
Statistics
•Statistics: summary values (e.g., means, correlation coefficients) computed by
measuring one or more variables in a sample
Parameters
•Parameters: Corresponding values in the population
Sampling Error
•Sampling Error: The random variability in a statistic from sample to
sample
Null Hypothesis Testing
•Purpose of null hypothesis testing is to help researchers decide
between
• There is a relationship in the population, and the relationship in the sample
reflects this
• There is no relationship in the population, and the relationship in the sample
reflects only sampling error
•Null Hypothesis Testing: formal approach to
deciding between two interpretations of statistical
relationship in a sample
Null Hypothesis
• Null Hypothesis (H0): The idea that there is no
relationship in the population and that the relationship
in the sample reflects only sampling error
• Informally, the null hypothesis is that the sample relationship
“occurred by chance”
Alternative Hypothesis
• Alternative Hypothesis (HA or H1): An alternative to the
null hypothesis, this hypothesis proposes that there is a
relationship in the population and that the relationship
in the sample reflects this relationship in the population
Logic of Null Hypothesis Testing
•Assume for the moment that the null hypothesis is true. There is no
relationship between the variables in the population.
•Determine how likely the sample relationship would be if the null
hypothesis were true.
•If the sample relationship would be extremely unlikely, then reject
the null hypothesis in favor of the alternative hypothesis. If it would
not be extremely unlikely, then retain the null hypothesis
P-value
•p-value: The probability of obtaining the sample result or a more
extreme result if the null hypothesis were true
α (alpha)
•α (alpha): the criterion that shows how low a p-value should be
before the sample result is considered unlikely enough to reject the
null hypothesis (Usually set to .05)
Statistical Significance
•Statistically Significant: An effect that is unlikely to be due to random
chance and therefore likely represents a real effect in the population
Sample Size and Relationship Strength
•The stronger the sample relationship and the larger the sample, the
less likely the result would be if the null hypothesis were true
• Imagine a study in which a sample of 500 women is compared with a sample
of 500 men in terms of some psychological characteristic, and Cohen’s d is a
strong 0.50
• If there were really no sex difference in the population, then a result this
strong based on such a large sample should seem highly unlikely
•Sometimes a result can be weak but statistically significant because the sample is
large, or strong but not significant because the sample is small
Statistical Significance vs Practical Significance
•A statistically significant result is not
necessarily a strong one
•Even a weak result can be statistically significant if the n is big enough
•Practical Significance: importance or usefulness of the result in some
real-world context
• E.g., many sex differences are statistically significant but are not practically
significant
Basic Null Hypothesis Tests
t-Test
•t-Test: focuses on the difference between two means
• The one-sample t-test
• The dependent-samples (paired-samples) t-test
• The independent-samples t-test
One-Sample t-test
•One-Sample t-test: used to compare a sample mean (M) with a
hypothetical population mean (µ0) that provides some interesting
standard of comparison
• H0: the population mean is equal to the hypothetical population mean (µ = µ0)
• HA: the population mean is different from the hypothetical population mean (µ ≠ µ0)
Dependent-Sample t-test
•Paired-Samples t-test (paired dependent-samples t-test): used to
compare two means for the same sample tested at two different
times or under two different conditions
• Most appropriate for pretest-posttest designs or within-subjects experiments
• H0 is that the means at the two times or under the two conditions are the same
in the population
• HA is that they are not the same
• This test can be one-tailed if the researcher has good reason to expect the
difference in a certain direction
Independent Samples t-test
•Independent-Samples t-test: used to compare the means of two
separate samples (M1 and M2)
• Two samples might have been tested under different conditions in a between
subjects experiment
• Could be pre-existing groups in a cross-sectional design (e.g., men vs women,
extraverts vs introverts)
• The null hypothesis is that the means of the two populations are the same
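• A minimal Python sketch of running the three t-tests (assuming SciPy is installed; all scores are made-up values). Each call returns the t statistic and a two-tailed p-value:

from scipy import stats

sample   = [102, 98, 110, 105, 97, 108, 103, 99]   # hypothetical scores for a one-sample test
pretest  = [12, 15, 11, 14, 13, 16, 12, 15]        # hypothetical pretest scores
posttest = [14, 17, 13, 15, 15, 18, 13, 16]        # hypothetical posttest scores (same people)
group_a  = [24, 27, 22, 26, 25, 28, 23]            # hypothetical scores, condition A
group_b  = [20, 22, 19, 23, 21, 24, 20]            # hypothetical scores, condition B

print(stats.ttest_1samp(sample, popmean=100))   # one-sample t-test against mu0 = 100
print(stats.ttest_rel(pretest, posttest))       # dependent (paired) samples t-test
print(stats.ttest_ind(group_a, group_b))        # independent-samples t-test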
Two-Tailed test
•Two-tailed test: Where we reject the null hypothesis if the test
statistic for the sample is extreme in either direction (+/-)
One-Tailed test
•One-tailed test: Where we reject the null hypothesis only if the
t-score for the sample is extreme in one direction that we specify
before collecting the data
• Advantage of the one-tailed test is that critical values are less extreme
• However, if the sample mean differs from the hypothetical population mean
in the unexpected direction, then there is no chance at all of rejecting the null
Analysis of Variance
•When there are more than two groups or condition means to
compare, the most common null hypothesis test is the Analysis of
Variance (ANOVA).
One-way ANOVA
• One-way ANOVA: used for between-subjects designs with a single
independent variable
•Used to compare the means of more than two samples in a
between-subjects design
• H0 is that all the means are equal in the population
• The test statistic for the ANOVA is called F, which is the ratio of MSB to MSW
• Mean squares between groups (MSB): an estimate of the population variance
that is based on the differences among the sample means
• Mean squares within groups (MSW): an estimate based on the differences among the
scores within each group
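• A minimal Python sketch of a one-way between-subjects ANOVA (assuming SciPy; the calorie estimates for three hypothetical groups are made-up values). f_oneway returns the F ratio and its p-value:

from scipy import stats

psych_majors     = [200, 180, 220, 210, 190, 205]   # hypothetical calorie estimates
nutrition_majors = [180, 170, 175, 160, 185, 165]
dieticians       = [150, 145, 160, 140, 155, 150]

f_ratio, p_value = stats.f_oneway(psych_majors, nutrition_majors, dieticians)
print(f"F = {f_ratio:.2f}, p = {p_value:.4f}")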
One-way Repeated Measures ANOVA
• One-Way Repeated-Measures ANOVA: used for within-subjects designs with a
single IV
• Imagine, for example, that the dependent variable in a study is a measure of
reaction time.
• In a between-subjects design, these stable individual differences would simply
add to the variability within the groups and increase the value of MSW (which
would, in turn, decrease the value of F).
• In a within-subjects design, however, these stable individual differences can
be measured and subtracted from the value of MSW. This lower value of MSW
means a higher value of F and a more sensitive test.
Factorial ANOVA
• Factorial ANOVA: used for designs with more than one IV (e.g., a between-subjects
design with two IVs)
• The main difference is that factorial ANOVAs produce an F ratio and p value
for each main effect and for each interaction
• For example, imagine that a health psychologist tests
the effect of participant major (psychology vs. nutrition) and food type (cookie vs.
hamburger) on calorie estimates in a factorial design.
• A factorial ANOVA would produce separate F ratios and p values for the main effect of
major, the main effect of food type, and the interaction between major and food.
Mixed ANOVA
• Mixed ANOVA: Used to compare one or more between-subjects IV and one
or more within-subjects IV
Post Hoc Comparisons
•When we reject the null hypothesis in a one-way ANOVA, we
conclude that the group means are not all the same in the population.
But this can indicate different things
• With three groups, it can indicate that all three means are significantly
different from each other
• One mean might be significantly different from the other two, with no
difference between the remaining two
• E.g., The mean calorie estimates of psychology majors, nutrition majors, and dieticians
are all significantly different from each other. Or it could be that the mean for dieticians
is significantly different from the means for psychology and nutrition majors, but the
means for psychology and nutrition majors are not significantly different from each other
Errors in Null Hypothesis Testing
• Type I Error: Rejecting the null hypothesis
when it is true
• Occur because even when there is no
relationship in the population, sampling error
alone will occasionally produce a significant result
• When the null hypothesis is true and α is .05,
we will mistakenly reject the
null hypothesis 5% of the time.
• In principle, it is possible to reduce the chance
of a Type I error by setting α to something less
than .05.
• Setting it to .01, for example, would mean that
if the null hypothesis is true, then there is only
a 1% chance of mistakenly rejecting it
• Type II Error: Retaining the null hypothesis
when it is false
• In practice, Type II errors occur primarily
because the research design lacks adequate
statistical power to detect the relationship
(e.g., the sample is too small)
• It is possible to reduce the chance of a Type II
error by setting α to something greater than
.05 (e.g., .10).
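• A minimal Python simulation sketch of the Type I error rate (assuming NumPy and SciPy): when the null hypothesis is true and α = .05, roughly 5% of samples still produce p < .05:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_simulations = 10_000
false_positives = 0

for _ in range(n_simulations):
    # Two groups drawn from the SAME population, so the null hypothesis is true
    group1 = rng.normal(loc=100, scale=15, size=30)
    group2 = rng.normal(loc=100, scale=15, size=30)
    _, p = stats.ttest_ind(group1, group2)
    if p < .05:
        false_positives += 1   # a Type I error

print(f"Type I error rate: {false_positives / n_simulations:.3f}")   # about .05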
Problems with Null Hypothesis Testing
•Criticisms of Null Hypothesis Testing
• Convention of rejecting and failing to reject the null based on p<.05
• E.g., One study has a p value of .04 and the other a p value of .06. Although the two
studies have produced essentially the same result, the former is likely to be considered
interesting and worthy of publication and the latter simply not significant
• Null hypothesis testing is not very informative
• Typically just indicates that there is a relationship but doesn’t describe it in detail
Replicability Crisis
•Replicability Crisis: a phrase that refers to the inability of
researchers to replicate earlier research findings
• E.g., The results of the Reproducibility Project, which involved over 270
psychologists around the world coordinating their efforts to test the reliability
of 100 previously published psychological experiments (Aarts et al., 2015)
• Although 97 of the original 100 studies had found statistically significant
effects, only 36 of the replications did
Ways of improving scientific rigor
•Ways of improving scientific rigor:
1. Designing and conducting studies that have sufficient statistical power, in
order to increase the reliability of findings
2. Publishing both null and significant findings (thereby counteracting the
publication bias and reducing the file drawer problem)
3. Describing one's research designs in sufficient detail to enable other
researchers to replicate your study using an identical or at least very similar
procedure
4. Conducting high-quality replications and publishing these results (Brandt et
al., 2014)