
PSY290- Intro to Research Methods: Notes

PART I: Introduction to Scientific Reasoning
Chapter 1: Psychology is a Way of Thinking
● Empiricism- basing one’s conclusions on systematic observations
● Either a PRODUCER or CONSUMER psych student
○ Producer- produce research studies
○ Consumers- consume info they later apply to their work
● How psychologists approach their work
○ 1. Act as empiricists and observe the world
■ Use evidence from senses and instruments to make conclusions
■ Cupboard theory: proven wrong- mother valuable bc she is source of food
vs Harlow’s comfort theory
○ 2. Test theories and revise their theories based on data
■ Theory: set of statements that describes general principles about how
variables relate to one another
■ Hypothesis: specific outcome the researcher expects to observe in a
study if theory is accurate
○ Data: set of observations
■ Theory-data cycle: systematic steps to solve a problem
■ Ex:
● Cupboard theory: Mother is valuable because source of food.
● Harry Harlow- contact comfort theory: mother is valuable because
of comfort of cozy touch
■ Built 2 monkey mothers: one with food and one with only comfort
○ 3. Take an empirical approach to applied research and basic research
○ 4. Test why an effect works
○ 5. Make work public
- Good Scientific Theories:
- Supported by data
- Falsifiable
- Exhibit parsimony (simplicity)
- Theories don’t prove anything: data is “consistent with” a theory
Types of Research
- Applied research: done with a practical problem in mind
- For ex: “An applied research study might ask, for example, if a school district’s
new method of teaching language arts is working better than the former one.”
- Basic research: enhance body of knowledge
- Ex: understand structure of visual system, memory, etc
- Translational research: use of lessons from basic research to develop and test applications to treatment
- Bridge from basic research to applied research
- Scientists write papers and submit to a JOURNAL that is peer-reviewed
- Journal editor sends submission to 3-4 experts
- Peer-review process is anonymous
- Other scientists can comment even after publishing
- Theory: set of statements that describe how variables relate to one another
Journal to Journalism
- Journalism: kind of news we hear or see in public
- Benefits and Risks of Journalism
- Benefit: public learns about it
- However, 2 things are important
- Journalists need to report on MOST IMPORTANT scientific stories
- Is it sensational or actually important?
- Journalists must ensure the data is reported accurately
- Mozart Effect: journalists might misrepresent science when
writing for a popular audience
- Called that bc media misrepresented data from Mozart sonata listeners and an intelligence test; in reality, only one aspect of intelligence was affected
Chapter 2: Sources of Information- Why Research is Best and How to Find It
EXPERIENCE VS RESEARCH
- Why you should not trust your experience over research
- Experience has no COMPARISON group
- Comparison group- enables us to compare what would happen
with or without thing we are checking
- Experience is confounded
- Confounds: several possible alternative explanations for an
outcome
- To fix these, we need to carefully control variables
- Research better than experience
- Confederate: actor playing a role for an experiment
- Ex: Steve and the catharsis experiment (catharsis not actually effective)
- Experience could be an exception/ research is probabilistic
INTUITION VS RESEARCH
Intuition: using hunches about what seems natural or attempting to think about things logically
- Ways Intuition is biased
- Being swayed by good stories
- Ex: Freud thought catharsis works and “Scared Straight” prison youth
program
- Availability heuristic: things that pop up easily in our mind tend to guide our
thinking
- present/present bias: we fail to think about what we cannot see
- Confirmation bias: tendency to look at info that agrees with what we know
- Bias blind spot: belief that we are unlikely to fall prey to the other biases
previously described
- Authority
- Only trust authority on subject if thoughts are based on authority’s research
- 3 kinds of sources where psych scientists publish
- Journal articles
- Audience: other psych students and scientists
- Types:
- Empirical articles: report for first time results of an empirical
research study
- Review journal articles: summary of all published studies done in
one research area
- Can use quantitative technique called meta-analysis, which
combines results of many studies and gives number that
summarizes magnitude or effect size of a relationship
- Chapters in edited books
- Generally not first place a study is reported; usually summary of a
collection of research
- Not peer-reviewed as rigorously but only experts are invited to write
- Audience: other psychologists or psych students
- Whole books
- Not common with SCIENTIFIC books; can be for general audience
Finding Scientific Sources
- PsycINFO: database for psych sources
- Google Scholar
Reading the Research
- Components of Empirical Journal Article
- Abstract- concise summary, 120 words long
- Introduction- 1st- topic of study explained
- Middle- background for research
- Final- specific research goals, questions, hypotheses
- Method- How researchers conducted study
- Results- quantitative and qualitative data + figures + statistical tests
- Discussion
- Opening: research q and methods and how well results support
hypotheses
- Next: study’s importance
- References- sources cited
Read with a purpose
- Empirical Journal Articles
- 2 Questions
- What is the argument?
- What is the evidence to support the argument?
- Chapter and Review Articles
- Read the headings to get an idea and categorize
Research in less scholarly places:
- Psych books for general audience
- To see if they are well-written and researched, check REFERENCES
- Wikis as a research source
- Not comprehensive, idiosyncratic (represent preferences of contributors), could be incorrect or vandalized
- Popular media
- Can be helpful if journalist specializes in science writing, but if not, can be oversimplified and simply WRONG
Chapter 3: Three Claims, Four Validities:
Interrogation Tools for Consumers of Research
- Variable: something that varies
- must have at least 2 levels or values
- Measured variable: value observed and recorded
- Manipulated variable: researcher controls it by assigning study participants to different levels of the variable
- Vars that cannot be manipulated: IQ or age, because you can’t randomly assign them
Constant- only has one level in the study
Conceptual variable: abstract concept (construct)
- Must be carefully defined at the theoretical level and these definitions are called
conceptual definitions
- Operational definition of variables (operationalization): turn a concept of interest
into a variable
3 claims
- Claim: argument someone is trying to make
- Types:
- Frequency- describe a particular rate or degree of a single variable
- Measure only 1 variable and variables are MEASURED not
manipulated
- Validities:
- Construct: how well variables are measured
- External: how well sample represents population
- Statistical: margin of error
- Association- argues one level of a variable is likely to be associated with a
particular level of another variable
- 3 types: positive association, negative association, and zero
association
- Validities:
- Construct: measure construct validity of each variable
- External: can it generalize to other populations or contexts
- Statistical: significance and strength
- Causal: one variable is responsible for changing the other
- Association → causality
- 3 criteria
- Variables are correlated
- covariance
- Causal variable came first and outcome later
- Temporal precedence: one variable comes first in time (independent variable)
- No other explanations exist for relationship
- Called INTERNAL VALIDITY or
third-variable criterion
- Random assignment can help do this to control for alternative explanations
- Only an experiment can enable researchers to support a
causal claim
- Validities:
- Construct
- External
- Statistical
- Internal
- Only research, not experience, can support a causal claim
Validity: appropriateness of a conclusion or decision
- Specify which validity we are talking about when we say a claim is “valid”
- Types:
- Construct validity: how well a conceptual variable is operationalized
- Basically how well a study measured or manipulated a variable
- To ensure this is met: establish that each variable has been
measured reliably
- External validity: how well the results of a study represent people and
contexts
- Statistical validity: extent to which a study’s statistical conclusions are
accurate and reasonable
- Whether or not numbers support claim
- Errors:
- Type I- mistakenly conclude there is an association when
there really isn’t
- “False positive”
- Type II- conclude there is no association when there really
is
- “miss”
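The Type I idea above can be made concrete with a short simulation (not from the notes; the Fisher z cutoff is a standard approximation): correlate two truly unrelated variables many times and count how often chance alone produces a "significant" result.

```python
# Illustrative sketch: simulating Type I errors ("false positives").
# We correlate two truly unrelated variables many times; about 5% of
# runs come out "significant" at p < .05 even though no real
# association exists.
import math
import random

random.seed(42)

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def significant(r, n):
    # Fisher z approximation: |atanh(r)| * sqrt(n - 3) > 1.96 ~ p < .05
    return abs(math.atanh(r)) * math.sqrt(n - 3) > 1.96

n, runs = 100, 2000
false_positives = 0
for _ in range(runs):
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [random.gauss(0, 1) for _ in range(n)]  # independent of xs
    if significant(pearson_r(xs, ys), n):
        false_positives += 1

print(false_positives / runs)  # close to the alpha level of 0.05
```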
Part II- Research Foundations for Any Claim
Chapter 4: Ethical Guidelines for Psychology
Research
- Historical Examples
- Tuskegee Syphilis Study
- A lot of Black men had syphilis and researchers wanted to study the effect
of untreated syphilis on men’s health over time
- At the time, no treatment was reasonable because methods were risky
- Lasted 40 years and men were recruited from churches
- Were told they had “bad blood” instead of syphilis
- Dangerous tests such as spinal taps were conducted
- Prevented men from being treated even after penicillin became the standard treatment for syphilis
- 3 distinct categories of unethical choices
- Men were not treated respectfully because they were lied to, with no informed decisions
- Men were harmed bc they were not given treatment
- Researchers targeted a disadvantaged social group
- Milgram Obedience studies
- Subjects thought they were punishing and torturing a learner who was
really an actor
- 65% of the participants obeyed but obedience decreased when learner
sat in the room
- Also dropped when experimenter was giving instructions from
down the hall or over phone
- Was it ethical?
- Some say it was unethical because it was stressful for the participants
- Some say it was okay because the participants were debriefed and assured the learner wasn’t harmed
Core Ethical Principles
- Nuremberg Code: created as a result of the Nuremberg Trials of Nazi war criminals
- Declaration of Helsinki- guides ethics in medical research and practice
- Belmont Report:
- Commission of professionals gathered in the Belmont Conference Center in Elkridge, Maryland at the request of Congress in response to Tuskegee
- Outlines 3 main principles for guiding ethical decision making
- Respect for persons
- 2 provisions:
- Individuals should be treated as autonomous agents free
to make up mind (with INFORMED CONSENT)
- Some people have less autonomy and are entitled to
special protection when it comes to informed consent
- Ex: children, people with disabilities and prisoners
- Beneficence
- Protect participants from harm
- Examples of actions that follow this:
- To prevent release of private info:
- Anonymous study- no potentially identifying info
collected
- Confidential study- some info collected but it is not
disclosed
- Justice
- Fair balance between people who want to participate in research
and ones who benefit from it
Common Rule:
- Describes detailed ways Belmont Report should be applied
- Federally funded agencies in the US must follow this
Guidelines for Psychologists
- APA outlines 5 general principles for guiding individual aspects of ethical behavior
1. Respect for persons
2. Beneficence
3. Justice
4. Fidelity and responsibility
a. No sexual relations with students or clients and avoid prior relations
5. Integrity
a. Profs obligated to teach accurately and therapists required to stay current
- APA has 10 specific enforceable ethical standards
- Ethical Standard 8
- 8.01- IRB (Institutional Review Board): committee responsible for interpreting ethical principles and ensuring that research using human participants is conducted ethically
- Mandated by federal law in the US if federal money is involved
- 5 or more people
- At least one scientist
- At least one with academic interests outside sciences
- At least one community member with no ties to the institution
- When prison discussed, one must be a prisoner advocate
- 8.02- Informed consent: researcher’s obligation to explain the study to potential participants in everyday language and give them the chance to decide whether or not to participate
- In some cases, APA standards indicate informed consent
procedures not necessary
- If study is not likely to cause harm
- If study takes place in an educational setting
- obtaining informed consent also involves informing people
whether the data they provide in a research study will be treated
as private and confidential
- 8.07- Deception
- Researchers witholding some details of the study from
participants—deception through omission
- actively lied to them—deception through commission
- Is deception ethical?
- In study, researcher must still uphold principle of respect by informing of risks and benefits
- Beneficence: ethical costs and benefits of study
- APA principles and federal guidelines require researchers
to avoid using deceptive research designs except as a last
resort and to debrief participants after the study
- 8.08- Debriefing
- researchers describe the nature of the deception and explain why
it was necessary and describe study design
- Nondeceptive studies sometimes have it too
- Purpose is to make participation in research an educational
experience
RESEARCH MISCONDUCT
- Data Fabrication (Standard 8.10) and Data Falsification
- Unethical and has far-reaching consequences
- Data Fabrication (8.10)
- Researchers invent data that fit hypotheses
- Data falsification
- Researchers influence a study’s results
- Plagiarism (8.11)
- defined as representing the ideas or words of others as one’s own
- To prevent this, cite all ideas that are not one’s own
- Even when paraphrasing, cite last name and year of publication
- Animal Research (8.09)
- By APA, research must
- Care for animals humanely
- Use as few animals as possible
- Must be sure research is valuable enough to justify it
- Animal Welfare Act in US
- Must have IACUC (Institutional Animal Care and Use Committee) that
must approve project
- 3 or more members
- At least 1 vet
- At least 1 practicing scientist familiar with animal research
- At least 1 member of local community
- First, must submit scientific justification and then IACUC monitors
treatment of animals by inspecting labs every 6 months
- Animal Care Guidelines
- Replacement: researchers should find alternatives to animals in research
when possible
- Refinement: researchers must modify experimental procedures to
minimize distress
- Reduction: adopt designs that use the fewest animals possible
- Arguments of Animal Rights groups
- Animals are just as likely as humans to experience suffering
- Animals have inherent rights equal to those of humans
Chapter 5: Identifying Good Measurement
- Ways to measure variables
- 3 types:
- Self-report
- Recording people’s answers to questions about themselves
- Observational
- Recording observable behaviors
- Such as IQ tests
- Physiological
- Recording biological data such as brain activity, hormone levels, or heart rate
- Operational definition of variable: how the researcher decides to measure or manipulate the conceptual variable
- Levels of operational variables can be coded using different scales of measurement
- Categorical variables (or nominal variables)
- Levels are categories
- Examples: sex (M and F), species
- Quantitative variables
- Coded with meaningful numbers
- Ordinal scale: numbers represent ranked order
- Interval scale
- Criteria
- Numbers represent equal intervals
- There is no true zero
- Ratio scale
- Numbers represent equal intervals AND a value of 0 truly means none of the quantity (a true zero)
- Reliability
- How consistent the results of a measure are
- Types:
- Test-retest reliability- researcher gets consistent score every time with
measure
- To assess: measure same set of participants at least twice within
time frame and then compute r
- Interrater reliability- consistent scores are obtained no matter who (which
researcher) measures variable
- To assess: ask 2 observers to rate same participants at same time
and compute r
- Kappa measures extent to which 2 raters place participants into the same categories
- Internal reliability- study participant gives consistent answers no matter
how research phrases question
- Relevant for measures that use more than 1 item to get at the
same construct
- Ex: question phrased in multiple ways
- Cronbach’s alpha or coefficient alpha
- Statistic that measures whether or not scale has internal
reliability
- Collect data on scale from large sample of participants and
compute all possible correlations
- Closer alpha is to 1, the better the reliability; 0.7 or higher is considered sufficient
- Using correlation coefficient r to quantify reliability
- Number r indicates how close the dots are to a line drawn through them
- Strong when dots are close to line
- r is near 0 when relationship is weak and close to 1 or -1 when relationship is strong
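A minimal sketch (with made-up ratings) of how r and Cronbach's alpha quantify the reliability types above. The standardized-alpha formula from the average inter-item correlation is a common shortcut, assumed here rather than taken from the notes.

```python
# Sketch with hypothetical data: quantifying reliability with r
# and a standardized Cronbach's alpha.
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Interrater reliability: two observers rate the same 8 participants.
rater_a = [3, 5, 4, 2, 5, 1, 4, 3]
rater_b = [3, 4, 4, 2, 5, 2, 4, 3]
print(round(pearson_r(rater_a, rater_b), 2))  # near 1 = consistent raters

# Internal reliability: standardized alpha computed from the average
# correlation among k items that target the same construct.
def standardized_alpha(item_scores):
    k = len(item_scores)
    rs = [pearson_r(item_scores[i], item_scores[j])
          for i in range(k) for j in range(i + 1, k)]
    r_bar = sum(rs) / len(rs)
    return k * r_bar / (1 + (k - 1) * r_bar)

# Three rewordings of the same question, answered by 6 participants.
items = [[4, 2, 5, 3, 1, 4], [5, 2, 4, 3, 2, 4], [4, 1, 5, 2, 1, 5]]
print(round(standardized_alpha(items), 2))  # 0.7 or higher = sufficient
```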
- Validity: whether the operationalization is measuring what it is supposed to measure (construct validity)
- Researchers have to evaluate measure’s validity through research
- What is weight of evidence in favor of this measure’s validity?
- Ways to subjectively assess validity:
- Face validity: is it subjectively considered to be a plausible operationalization of the conceptual variable
- Content validity: a measure must capture all points of your defined construct
- Empirical ways to assess validity:
- Criterion validity- whether measure under consideration is associated with a
concrete BEHAVIORAL outcome that it should be associated with
- Known-groups paradigm: researchers see whether scores on the measure can discriminate among two or more groups whose behavior is already known
- Convergent validity- pattern of correlations with measures of theoretically similar
constructs
- Divergent validity- pattern of correlations with measures of theoretically dissimilar
constructs
- Test should NOT correlate strongly with measures of constructs that are
different than what we are testing for
- For example: a depression test lacking divergent validity might mistake other conditions for depression
- Relationship between reliability and validity
- Although a measure may be less valid than it is reliable, it cannot be more valid
than it is reliable
- If a measure does not even correlate with itself, then how can it be more
strongly associated with some other variable?
- Reliability is necessary (but not sufficient) for validity
- Usually will find both reliability and validity info in the Method section in a journal
Chapter 6- Surveys and Observations: Describing what people do
Construct Validity of Surveys and Polls
- Survey or poll: when people are asked about their social or political opinions
Question Formats
- Open-ended questions- can provide spontaneous and rich info
- Drawback is that responses must be coded and categorized and this can be time-consuming
- Forced-choice questions: pick best of 2 or more options
- Ex: Narcissistic Personality Inventory: choose 1 statement of 2 for 40 pairs
- Likert scale
- People presented with statement and asked to rate on a rating scale
(strongly agree, agree, etc)
- If diverges slightly, Likert-type scale
- Ex: Rosenberg self-esteem inventory
- Semantic differential format: slight difference where respondents are
asked to rate a target object using a numeric scale anchored with
adjectives (for ex: 5 star scale)
Writing Well-worded Questions
- If questions suggest a particular viewpoint (leading questions), some ppl change their answers
- Ex: leading questions and specifying groups
- If researchers want to measure how much wording matters for their topic, they word questions more than one way and test it
- Double-barreled questions- ask 2 questions in 1
- Poor construct validity bc people might be responding to the first half of the
question, the second half, or both
- Negatively worded questions
- Can cause confusion, reducing construct validity
- In case of this, researchers can ask 2 ways and use Cronbach’s alpha to see if
people respond similarly
- Question order
- Way to control for this is to prepare different versions of a survey with the
questions in diff sequences
- Or present the question of interest first
Encouraging Accurate Responses
- Self-reports are often ideal, cheap, and provide meaningful info
- Might also be the only option
- Response sets (or nondifferentiation)- shortcuts respondents take when answering survey questions
- Ppl might develop a consistent way of answering all the questions, especially toward the end of a long questionnaire
- Weaken construct validity
- One response set- acquiescence (yea-saying): people say yes to every item
- Instead of measuring accurately, survey could be measuring tendency to
agree or lack of motivation to think clearly
- How to look for this?
- Include reverse-worded items
- Drawback is that sometimes it creates negatively worded
- Fence sitting- playing it safe by answering in the middle of a scale
- Can happen if Q is controversial or unclear
- Way to solve for this
- Take away the neutral option
- Drawback: if people really do not have an opinion,
choosing a side is an invalid representation
- Use forced-choice questions
- Drawback: can frustrate ppl who feel their opinion is in the middle of the options
- Could record IDK only if the person volunteers ambivalence
- Socially desirable responding (or faking good)- if ppl are embarrassed of an unpopular opinion, they will not tell the truth
- Another less common- faking bad
- To prevent, researcher tells respondents that their responses are anonymous
- Not perfect solution bc anonymous respondents may treat surveys less
seriously
- Another way- researchers include special survey items that target socially
desirable responders
- Flagged if they agree with too many of those items
- Another way: researchers ask participants’ friends to rate them
- Another: using computerized measures to evaluate people’s implicit opinions
about sensitive topics
- Ex: implicit association test
- Self-reporting more than they can know
- People can give unintentionally inaccurate responses even when they actively volunteer a response
- People’s justifications might be unintentionally misguided
- Self-reporting memories of events
- Ppl’s memories are not very accurate about events they participated in
- For ex: flashbulb memories about activities during dramatic events
- To test, administer test day after event and then again a few years
later
- Findings
- Overall accuracy is very low
- People’s confidence in memories’ accuracy is unrelated to
their true accuracy- no diff
- Rating Products
- One study: little correspondence between Amazon 5-star ratings and ratings by Consumer Reports, a product rating firm
- Customer ratings correlated with cost of product and prestige of brand
Construct Validity of Behavioral Observations
- Observational research: researcher systematically observes and records
- Can be basis for frequency claims
- Can be used to operationalize variables in association and causal claims
- Ex:
- Observing how much ppl talk (Mehl et al.)
- Device given measures sound at 12.5 min intervals and records
everything person says
- Published data showed that women and men speak the same amount
- Observing parent behavior at a hockey game to battle stereotype about parent fights at youth hockey
- False- mostly positive comments
- Observing families in the evening
- Video crews follow parents and assistants coded behaviors
- Rating from cold to neutral to happy
Making reliable and valid observations
- Construct validity threatened by 3 problems
- Observer bias- observers’ expectations influence interpretations
- Observer effects- observers change behavior of those they are observing; participant changes behavior to meet expectations
- Ex: maze-bright and maze-dull rats
- Ex: Clever Hans horse- horse very good at detecting questioner head
movements
- To prevent observer bias and effects:
- Clear rating instructions called “codebooks”- precise statements of how variables
are operationalized
- Using multiple observers- allows measure of interrater reliability
- Still not perfect- observers might share same bias
- Other methods
- Masked research design (blind design)- observers are unaware of purpose and conditions of study and participants
- Reactivity: change in behavior if participants know someone is watching
- Happens even with animals
- To fix- solutions
- Sol. 1- blend in
- Make unobtrusive observations
- For ex: one way mirror, act like person in crowd
- Sol. 2- wait it out
- Participant eventually forgets he/she is being watched
- Sol. 3- measure the behavior’s results
- Don’t observe behavior directly, just what it leaves behind
Observing People Ethically
- It’s ok, ppl say, in locations where ppl are aware they are in public
- Secret methods- ethical in some conditions
- Researchers must obtain permission in advance or, if observation is hidden, must explain the procedure at the conclusion of the study (must erase data if ppl object)
Chapter 7- Sampling: Estimating the Frequency of Behaviors and Beliefs
Generalizability- does the sample represent the population?
- External validity- can the study be generalized to larger populations? Is important for frequency claims
- Of sample: is the sample adequate to represent the entire population
- “Generalizes to” or “is representative of”
Sample types
- Biased or unrepresentative sample
- Some members of population have a much higher chance of being
included
- 2 ways to obtain
- Only those who they can contact conveniently
- For ex: ppl who respond to online surveys or polls
- Only those who volunteer to respond
- Self-selection
- Unbiased or representative sample
- All members have equal chance
Obtaining a Representative Sample: Probability Sampling Techniques
- Types
- Probability or random sampling- each member has equal chance of being
selected
- Types- Simple random sampling
- Cluster sampling: option when people are already divided into
arbitrary groups
- Clusters of participants randomly selected and all
individuals in a selected cluster are used
- Multistage sampling
- 2 random samples are selected: a random sample of
clusters, and then a random sample of people within those
clusters
- Technique- stratified random sampling
- Researcher purposefully selects particular demographic categories (strata) and randomly selects individuals within each category, proportionate to their membership in the population
- Differs from cluster sampling bc
- More meaningful categories
- Sample sizes reflect proportion in
population
- Variation- oversampling
- Researcher intentionally overrepresents one
or more groups
- Later, final results are adjusted so that the
weights correspond to actual proportion in
population
- Nonprobability sampling- results in a biased sample
- Convenience sampling
- Purposive sampling
- Researchers want to study only certain kinds of people, they
recruit only those particular participants
- When done in NONRANDOM WAY, called purposive
- For ex: testing effectiveness of drug to quit smoking would
choose smokers
- Snowball sampling- can help find rare individuals
- Participants asked to recommend acquaintances
- Quota sampling
- Target numbers are set for subsets of population of interest
- Diff from stratified random bc the participants are selected
nonrandomly
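The stratified technique above can be sketched as follows; the population and the urban/rural strata are made up for illustration.

```python
# Sketch of stratified random sampling: randomly select within each
# demographic category (stratum), proportionate to its share of the
# population. Population data here is hypothetical.
import random

random.seed(7)

population = (
    [("urban", f"u{i}") for i in range(700)] +   # 70% of population
    [("rural", f"r{i}") for i in range(300)]     # 30% of population
)

def stratified_sample(pop, sample_size):
    strata = {}
    for stratum, person in pop:
        strata.setdefault(stratum, []).append(person)
    sample = []
    for stratum, members in strata.items():
        # Each stratum's quota mirrors its proportion in the population
        quota = round(sample_size * len(members) / len(pop))
        sample.extend(random.sample(members, quota))
    return sample

sample = stratified_sample(population, 100)
print(len(sample))  # 100: 70 urban + 30 rural, matching the population
```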
Random sampling vs random assignment
- Random sampling
- Enhances external validity
Random assignment
- Researchers assign participants into groups at random
- Enhances internal validity helping ensure that the comparison group and the
treatment group have the same kinds of people in them, thereby controlling for
alternative explanations.
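Random assignment itself is simple to sketch; a minimal example with hypothetical participant IDs (not from the notes):

```python
# Sketch of random assignment: shuffle the participants, then split
# them into comparison and treatment groups so chance alone decides
# who ends up where, balancing the groups on average.
import random

random.seed(11)

participants = [f"P{i:02d}" for i in range(1, 21)]  # 20 volunteers
random.shuffle(participants)                         # chance decides order
treatment_group = participants[:10]
comparison_group = participants[10:]

print(len(treatment_group), len(comparison_group))  # 10 10
```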
Interrogating External Validity: What matters most?
- Sometimes, unknown external validity is OK
- In FREQUENCY claims, external validity is a PRIORITY
- When external validity is a lower priority
- Bc random assignment prioritized over random sampling
- Sometimes sample bias isn’t relevant to the claim
Larger samples are not more representative
- If studying a rare phenomenon, we need a large sample to locate enough instances for valid statistical analysis
- But otherwise, external validity is about HOW the sample was selected, not HOW MANY were sampled
- Only 1000-2000 ppl really needed for public opinion polls
Chapter 8- Bivariate Correlational Research
Introducing Bivariate Correlations
- Bivariate association = involves exactly 2 variables
- 3 types of associations: positive, negative, and zero
- Authors present bivariate correlations between diff pairs of variables separately
Review: Describing Associations between 2 quantitative variables
- Positive r = positive relationship
Describing associations with Categorical Data
- When at least one of the variables is categorical, researchers use a diff stats test
- More common to test whether diff between group averages is statistically
significant using a t test
A study with all measured variables is correlational
- An association claim is not supported by a particular kind of statistic or a particular kind of graph; it is supported by a study design—correlational research—in which all the variables are measured
Interrogating Association Claims
- Important validities- construct and statistical
- Construct- how well was each variable measured? Does it have good reliability and is it measuring the intended construct?
- What is the evidence for its face validity, its concurrent validity, its discriminant and
convergent validity?
Statistical- how well do data support conclusion?
- Question 1- what is the effect size?
- Larger effect size = r closer to 1 = more accurate predictions
- Errors estimating get larger when associations are weaker
- Larger effect sizes are usually more important than small ones
- Question 2- is it statistically significant?- likelihood of getting a correlation of that size just
by chance
- P < 0.05
- Small effect size will be statistically significant if identified in a very large sample
- Question 3- could outliers be affecting association
- In bivariate- outliers mainly problematic when they involve extreme scores on
both variables
- Matter most in small sample
- Question 4- Is there restriction of range?
- If there is not a full range of scores on one variable, it can make correlation
appear smaller than it is
- Correction for restriction of range formula or recruit more ppl
- Can apply if one variable for some reason has little variance
- Question 5- is the association curvilinear?
- Could fail the r test because r measures only linear relationships
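Question 2's point that even a small effect size reaches significance in a large sample can be illustrated with the standard Fisher z approximation (an assumption here, not a method named in the notes):

```python
# Sketch: the same small correlation (r = .10) is non-significant
# with n = 100 but significant with n = 1,000, using the standard
# Fisher z approximation for testing r against zero.
import math

def is_significant(r, n):
    # |atanh(r)| * sqrt(n - 3) > 1.96 roughly corresponds to p < .05
    return abs(math.atanh(r)) * math.sqrt(n - 3) > 1.96

print(is_significant(0.10, 100))   # False: small r, small sample
print(is_significant(0.10, 1000))  # True: same r, much larger sample
```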
Internal Validity- can we make a causal inference from an association?
- We must guard against the powerful temptation to make a causal inference from any association claim we read
- Must satisfy 3 causal criteria
- Covariance
- Temporal precedence- directionality problem
- Internal validity- third-variable problem
When we think of a reasonable third variable explanation for an association claim, how do we
know if it’s an internal validity problem?
- We check if correlation is present in subgroups
- Spurious association- the bivariate correlation is there, but only because of some third variable
External Validity- To Whom Can the Association be generalized?
- Size does not matter as much as the way the sample was selected from a population of interest
- If not a random sample, can’t be sure results will generalize to the population
How important is external validity?
- U can accept study results and leave generalization question to next study
Moderator- when the relationship between two variables changes depending on the level of
another variable, that other variable is called a moderator
In correlational research, moderators can inform external validity.
- When an association is moderated by residential mobility, type of relationship, day of the
week, or some other variable, we know it does not generalize from one of these
situations to the others.
Chapter 9- Multivariate Correlational Research
Reviewing the 3 causal criteria
- Covariance
- Temporal precedence
- Internal validity
Establishing temporal precedence with longitudinal designs
- A longitudinal design can provide evidence for temporal precedence by measuring the same variables in the same people at several points in time.
Interpreting results:
- Multivariate design gives 3 types of correlations
- Cross-sectional- are 2 variables, measured at same point in time, correlated?
- Auto- evaluate associations of each variable with itself across time
- Cross-lag- show whether the earlier measure of one variable is associated with
the later measure of the other variable
- Help establish temporal precedence by addressing directionality problem
- 3 possible results:
- One cross-lag correlation is significant
- The opposite cross-lag correlation is significant
- Both cross-lag correlations are significant
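The three kinds of correlations in a two-wave longitudinal design can be sketched with fabricated data (variable names `a1`, `b2`, etc. are my own shorthand for "A at Time 1," "B at Time 2"):

```python
# Cross-sectional, autocorrelation, and cross-lag correlations from
# two variables (A, B) measured at two time points (invented scores).
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return sxy / sqrt(sum((a - mx) ** 2 for a in xs) *
                      sum((b - my) ** 2 for b in ys))

a1 = [1, 2, 3, 4, 5]   # variable A at Time 1
b1 = [2, 1, 4, 3, 5]   # variable B at Time 1
a2 = [3, 1, 2, 5, 4]   # variable A at Time 2
b2 = [1, 2, 3, 4, 5]   # variable B at Time 2

cross_sectional = pearson_r(a1, b1)  # same time point
auto_a = pearson_r(a1, a2)           # A with itself over time
cross_lag_ab = pearson_r(a1, b2)     # earlier A with later B
cross_lag_ba = pearson_r(b1, a2)     # earlier B with later A
print(cross_lag_ab, cross_lag_ba)
```

In this made-up pattern the A1-to-B2 cross-lag is much stronger than B1-to-A2, which would support A coming first in time.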
Ruling out third variables with multiple-regression analyses
- Multiple regression basically means measuring other potential third variables alongside the key variables
By conducting a multivariate design, researchers can evaluate whether a relationship between
two key variables still holds when they control for another variable
- If researchers take the relationship with third variable into account, is there still a
correlation with our original variables?
- Similar to looking within subgroups
Criterion variable (dependent)- the one researchers want to predict
Predictor or independent variables
Using betas to test for third variables
- One beta for each predictor variable
- A positive beta, like a positive r, indicates a positive relationship between that predictor
variable and the criterion variable, when the other predictor variables are statistically
controlled for.
- Unlike r, a beta can't serve as a general effect size: betas change depending on which other predictor variables are being controlled for
Coefficient b- an unstandardized coefficient
- A b is similar to beta in that the sign of b (positive or negative) denotes a positive or negative association (when the other predictors are controlled for). But unlike two betas, we cannot compare two b values within the same table to each other.
- The reason is that b values are computed from the original measurements of the predictor variables (such as dollars, centimeters, or inches), whereas betas are computed from predictor variables that have been changed to standardized units.
P-value indicates if beta is statistically significant
Adding several predictors to a regression analysis can help answer two kinds of questions.
- First, it helps control for several third variables at once.
- Second, by looking at the betas for all the other predictor variables, we can get a sense
of which factors most strongly predict the chance of pregnancy.
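The b-versus-beta distinction can be sketched with a small hand-rolled least-squares fit (all data and variable names invented; a real analysis would use a stats package, this just shows the arithmetic):

```python
# Unstandardized b vs standardized beta in a two-predictor regression.
# beta_j = b_j * sd(predictor_j) / sd(criterion), which puts slopes for
# predictors measured in different units on a comparable scale.
from statistics import mean, stdev

def ols_two_predictors(x1, x2, y):
    """Solve least squares for y = a + b1*x1 + b2*x2 via centered normal equations."""
    def center(v):
        m = mean(v)
        return [u - m for u in v]
    c1, c2, cy = center(x1), center(x2), center(y)
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

x1 = [1, 2, 3, 4, 5, 6]          # e.g., hours studied (hypothetical)
x2 = [10, 30, 20, 50, 40, 60]    # e.g., minutes of tutoring: different units
y  = [3, 5, 4, 8, 7, 10]         # criterion variable
b1, b2 = ols_two_predictors(x1, x2, y)
beta1 = b1 * stdev(x1) / stdev(y)
beta2 = b2 * stdev(x2) / stdev(y)
print(b1, b2)        # not directly comparable: different original units
print(beta1, beta2)  # standardized, so comparable within the same table
```

Running the same regression on z-scored variables reproduces the betas exactly, which is what "standardized units" means in the notes above.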
Phrases for regression
- “Controlled for”
- “Taking into account”
- “Adjusting for”
Regression does NOT establish causation
- It cannot establish temporal precedence
- But it can help rule out some internal validity threats (third variables)
Getting at Causality with Pattern and Parsimony
- Researchers can investigate causality by using a variety of correlational studies that all point in a single, causal direction.
This approach can be called “pattern and parsimony” because there’s a pattern of
results best explained by a single, parsimonious causal theory
- Parsimony- degree to which theory provides simplest explanation, making fewest
exceptions
By converging multiple studies, you can try to find the simplest explanation
- The diversity of multiple studies makes it hard to raise a single third-variable explanation for all of them
In popular media
- When journalists write about science, they do not always fairly represent pattern and
parsimony in research. Instead, they may report only the results of the latest study.
- Selective presentation of scientific process
Mediation
- Mediator- a variable that helps explain why two other variables are related
Mediator vs third-variable
- when researchers propose a mediator, they are interested in isolating which aspect of the
presumed causal variable is responsible for that relationship.
- A mediator variable is internal to the causal variable and often of direct interest to the researchers, rather than a nuisance.
Mediator vs moderators
- Mediators ask: Why?
- Moderators ask: Who is most vulnerable? For whom is the association strongest?
Chapter 10- Introduction to simple experiments
Example 1: Taking notes
- Both groups scored the same on factual questions; longhand note-takers did better on conceptual questions than laptop note-takers
Example 2: Serving Pasta
- Participants with larger bowls ate more pasta overall
How do experiments establish causal claims
- Covariance
- Because every independent variable has at least two levels, true experiments are
always set up to look for covariance
- Results matter too
- Temporal precedence
- Manipulating a variable ensures it came first in time
- Internal validity
- Achieved by holding other variables constant (control variables)
- There can be several possible alternative explanations, known as confounds, or potential threats to internal validity
Threats to internal validity
- Design confound: experimenter’s mistake in designing the independent variable
- second variable that happens to vary systematically along with the intended
independent variable and therefore is an alternative explanation for the results
- Only a problem for internal validity if the problem shows systematic variability with the
independent variable
- If unsystematic (random or haphazard) variability, it is not a confound
- But still can obscure patterns
- Selection effects- when the kinds of participants in one level of the independent variable
are systematically different from those in the other
- Can happen if experimenters let participants choose group they want to be in
- Can avoid with
- random assignment
- Matched groups
- Within each matched set for a particular variable, researchers randomly assign one person to one group and the other to the other group
- Requires more time and resources
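Both remedies can be sketched in a few lines (participant IDs and the matching variable are invented; `gpa` is just a stand-in for any matching variable):

```python
# Random assignment vs matched-group assignment to avoid selection effects.
import random

random.seed(1)
participants = list(range(12))
gpa = [3.9, 2.1, 3.5, 2.8, 3.1, 2.4, 3.7, 2.2, 3.0, 2.6, 3.3, 2.9]

# Simple random assignment: shuffle, then split in half
shuffled = participants[:]
random.shuffle(shuffled)
group1, group2 = shuffled[:6], shuffled[6:]

# Matched groups: rank by the matching variable, pair adjacent participants,
# then randomly send one member of each pair to each group
ranked = sorted(participants, key=lambda p: gpa[p])
m1, m2 = [], []
for i in range(0, len(ranked), 2):
    pair = [ranked[i], ranked[i + 1]]
    random.shuffle(pair)
    m1.append(pair[0])
    m2.append(pair[1])
print(sorted(group1), sorted(group2))
print(sorted(m1), sorted(m2))
```

Matching guarantees the two groups are balanced on the matched variable; plain random assignment only balances it on average, which is why matching costs more effort but helps in small samples.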
Independent-Groups vs Within-Groups designs
- Independent-group design (between-subjects or between-groups)
- different groups of participants are placed into different levels of the independent
variable
- 2 basic forms of design
- Posttest-only (equivalent groups, posttest-only)
- participants are randomly assigned to independent variable
groups and are tested on the dependent variable once
- satisfy all three criteria for causation
- They allow researchers to test for covariance by detecting differences in the dependent variable (having at least two groups makes it possible to do so). They establish temporal precedence because the independent variable comes first in time. And when they are conducted well, they establish internal validity: when researchers use appropriate control variables, there should be no design confounds, and random assignment takes care of selection effects.
- pretest/posttest
- participants are randomly assigned to at least two different
groups and are tested on the key dependent variable
twice—once before and once after exposure to the independent
variable
- Researchers might use a pretest/posttest design if they want to study improvement over time, or to be extra sure that the two groups were equivalent at the start, as long as taking the pretest does not itself change participants' later behavior
- Within-groups design (within-subjects)
- Only one group and each person presented with all levels of the independent
variable
- The main advantage of a within-groups design is that it ensures the participants in all conditions are equivalent
- treating each participant as his or her own control
- Power- probability that a study will show a statistically significant result
when an independent variable truly has an effect in the population
- Requires fewer participants
- 2 basic forms:
- Repeated-measures design
- Participants are measured on a dependent variable more than
once, after exposure to each level of the independent variable
- Concurrent-measures design
- participants are exposed to all the levels of an independent
variable at roughly the same time, and a single attitudinal or
behavioral preference is the dependent variable
- Threat to internal validity- order effects
- Being exposed to one condition changes how participants react to the other condition
- Practice effects (or fatigue effects)- participants either get better at or tired of the task
- Carryover effects- some contamination is carried over from one condition to the next
- Use counterbalancing technique to cancel these out
- Diff. sequences
- Full- used if there are only 2 or 3 levels of the independent variable
- All possible condition orders are represented
- Partial- only some are represented
- Can use randomized order
- Or latin square to ensure that every condition appears in
each position at least once
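The two counterbalancing schemes above can be sketched directly; the cyclic rotation below is one simple way to build a Latin square (condition labels A/B/C are placeholders):

```python
# Full counterbalancing enumerates every possible condition order;
# a cyclic Latin square needs only as many orders as there are conditions,
# with every condition appearing in each serial position exactly once.
from itertools import permutations

conditions = ["A", "B", "C"]
full = list(permutations(conditions))     # all 3! = 6 orders
latin = [conditions[i:] + conditions[:i]  # rotate the list one step each row
         for i in range(len(conditions))]
print(len(full))   # 6
print(latin)       # [['A','B','C'], ['B','C','A'], ['C','A','B']]
```

With 4 conditions full counterbalancing already needs 24 orders, which is why partial schemes like the Latin square become practical.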
3 disadvantages of within-groups designs
- Potential order effects
- But can be controlled thru counterbalancing
- Might not be possible or practical
- problem occurs when people see all levels of the independent variable
and then change the way they would normally act
- Demand characteristic- cue that leads participants to guess an
experiment’s hypothesis
Construct validity
- Dependent variables- face validity
- Independent- manipulation check and pilot studies
- A manipulation check is an extra dependent variable that researchers can insert into an experiment to convince them that their experimental manipulation worked
- A pilot study is a simple study, using a separate group of participants, that is
completed before (or sometimes after) conducting the study of primary interest
- may use pilot study data to confirm the effectiveness of their
manipulations before using them in a target study
External
- How participants were recruited- random sampling?
- Generalizing to other situations- it is sometimes necessary to consider the results of other
research
Statistical
- If the result is not statistically significant, covariance has not been established
- Effect and sample size can determine covariance strength
- Indicator of standardized effect size d- measure represents how far apart two
experimental groups are on the dependent variable
- The standardized effect size, d, takes into account both the difference between means and the spread of scores within each group (the standard deviation).
- When d is larger, it usually means the independent variable caused the
dependent variable to change for more of the participants in the study.
- When d is smaller, it usually means the scores of participants in the two experimental groups overlap more
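The definition of d can be sketched with the usual pooled-standard-deviation formula (the two score lists are invented, loosely echoing the note-taking example):

```python
# Cohen's d for two independent groups: difference between the means
# divided by the pooled standard deviation.
from statistics import mean, stdev
from math import sqrt

def cohens_d(g1, g2):
    n1, n2 = len(g1), len(g2)
    pooled_sd = sqrt(((n1 - 1) * stdev(g1) ** 2 + (n2 - 1) * stdev(g2) ** 2)
                     / (n1 + n2 - 2))
    return (mean(g1) - mean(g2)) / pooled_sd

laptop   = [70, 72, 68, 75, 71, 69]   # invented conceptual-test scores
longhand = [78, 80, 76, 82, 79, 77]
print(cohens_d(longhand, laptop))     # positive: longhand group scored higher
```

Because d divides by the within-group spread, the same mean difference yields a larger d when scores within each group are tightly clustered.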
3 potential threats to internal validity
- 1. Did the experimental design ensure that there were no design confounds, or did some
other variable accidentally covary along with the intended independent variable? (Mueller
and Oppenheimer made sure people in both groups saw the same video lectures, in the
same room, and so on.)
- 2. If the experimenters used an independent-groups design, did they control for selection
effects by using random assignment or matching? (Random assignment controlled for
selection effects in the notetaking study.)
- 3. If the experimenters used a within-groups design, did they control for order effects by
counterbalancing? (Counterbalancing is not relevant in Mueller and Oppenheimer’s
design because it was an independent-groups design.)
Chapter 11- More on Experiments: Confounding and Obscuring Variables
Threats to internal validity
- Design confounds: an alternative explanation exists because the experiment was poorly designed
- Another variable varies systemically along with the intended independent
variable
- Selection effects- different independent variables have systematically different types of
participants
- Differences in participants and not intended independent variable caused the
changes
- Order effects- the outcome might be caused by the independent variable, but also by the order in which the levels of the variable are presented
- People could be getting tired, bored, or practiced
In only one-group, pretest/posttest design
- Maturation effects
- Behavior that emerges spontaneously over time, not because of any outside intervention (like our intended variable)
- Also note the book's example of spontaneous remission of depression in a woman receiving CBT therapy
- To prevent these, a comparison group is important
- History threats- something specific happened between the pretest and posttest
- External factor that affects most members of treatment group at same time as
treatment
- To prevent these, use comparison group
- Regression threats- a score that is extreme when measured at Time 1 tends to be closer to its usual average value at Time 2
- Occur only when a group has an extreme score at pretest
- To prevent, use comparison groups and, if initial extremes are observed, account for them
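Regression to the mean can be simulated in a few lines: each observed score is a stable true score plus random luck, and a group selected for extreme Time 1 scores drifts back toward the average at Time 2 with no treatment at all (all values simulated, not from the text):

```python
# Simulate a "really bad experiment": select extreme low scorers at Time 1,
# remeasure at Time 2, and watch their mean improve with no intervention.
import random

random.seed(42)
true_scores = [random.gauss(50, 5) for _ in range(500)]
time1 = [t + random.gauss(0, 5) for t in true_scores]  # true score + luck
time2 = [t + random.gauss(0, 5) for t in true_scores]  # fresh luck at Time 2

extreme = sorted(range(500), key=lambda i: time1[i])[:50]  # 50 lowest at Time 1
t1_mean = sum(time1[i] for i in extreme) / 50
t2_mean = sum(time2[i] for i in extreme) / 50
print(t1_mean, t2_mean)  # Time 2 mean is closer to 50, purely by chance
```

A comparison group selected the same way would show the same drift, which is how the threat is ruled out.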
Attrition threats: due to reduction in participant numbers; becomes problem if systematic
(if only a certain kind of participant drops out)
- To prevent, remove those participants from pretest average too
- Also check if they have extreme scores on pretest- more of attrition effect if they
do
Testing threats- change in participant due to the effects of taking a test more than once
- They become more practiced or bored
- To prevent, maybe abandon pretest and do posttest only design?
- Or comparison group also helps
Instrumentation threats- when measuring instrument changes over time; in observational
research, people might change standards for judging behavior
- To prevent, switch to posttest only or ensure pre and post tests are equivalent
- Could also counterbalance the tests (using some of each both times)
In any study
- Observer bias- researchers' expectations influence interpretation of results
Demand characteristics- participants guess what study is supposed to be about and change
behavior
To avoid both!- do a double-blind study
- If not possible, a masked/blind study- participants know which group they are in, but not
observers
Placebo effects- ppl who think they are receiving a valid treatment improve
- To avoid: include a placebo group, a no-therapy group, and a drug group
Combined Threats:
- Selection-history threat- an outside event affects only those at one level of the independent variable
- Selection-attrition threat- only one of the experimental groups experiences attrition
Null effect: no significant covariance between the independent and dependent variable
Possible reasons for it
- 1. Maybe the independent variable really doesn't affect the dependent variable
- 2. The study was not designed correctly
2 factors:
- Not enough between-groups difference (the levels of the independent variable don't differ enough)
- Weak manipulation- how did researchers operationalize the independent variable? (construct validity)
- Insensitive measures: researchers did not use an
operationalization of dependent variable w enough sensitivity
- So use detailed instruments
- Ceiling and floor effects
- Ceiling effects: all scores squeezed together at high end
- Floor effect: all scores cluster at low end
- Do a MANIPULATION CHECK to detect all of these
- This is a separate dependent variable that experimenters include
- Or design confounds could undo the independent variable's effect
- Too much within-groups variability (noise that obscures the group difference)
- For example: too much unsystematic variability, i.e., NOISE (error variance or unsystematic variance)
- Makes the effect size smaller and can obscure a group
difference
- Can be caused by
- Measurement error: human/instrument factor
that alters a person’s true score
- More random error = variability
- Solutions:
- Use reliable and precise tools
- Measure more instances- so all
possible errors can cancel each
other out
- Due to individual differences between people
- Solutions:
- Change the design to a within-groups design instead (each person serves as their own before/after control)
- Add more participants- you get a larger t-value and it's easier to find a significant t
- Situation noise- external distractions may make people react differently
- Solution:
- Carefully control surroundings
- All these solutions increase power (the likelihood a study will return a statistically significant result when the independent variable really has an effect)
- To increase power:
- Use a within-groups design
- Strong manipulation
- Larger number of participants
- Less situation noise
More power = detect more true patterns
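The link between sample size and power can be seen in a small simulation (a simplified z test with a known standard deviation; the effect size and ns are invented for illustration):

```python
# Simulate many experiments with a real group difference and count how
# often the difference comes out "significant" -- that proportion is power.
import random
from math import sqrt
from statistics import NormalDist

def simulated_power(n_per_group, true_diff, sd=1.0, alpha=0.05, runs=2000):
    crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed cutoff
    hits = 0
    for _ in range(runs):
        g1 = [random.gauss(0, sd) for _ in range(n_per_group)]
        g2 = [random.gauss(true_diff, sd) for _ in range(n_per_group)]
        se = sd * sqrt(2 / n_per_group)
        z = (sum(g2) / n_per_group - sum(g1) / n_per_group) / se
        hits += abs(z) >= crit
    return hits / runs

random.seed(0)
print(simulated_power(10, 0.5))   # small n: the true effect is often missed
print(simulated_power(100, 0.5))  # same effect, larger n: much higher power
```

The same simulation idea extends to the other remedies: shrinking the noise (sd) or strengthening the manipulation (true_diff) also raises the hit rate.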
Chapter 12- Experiments with more than 1 variable
Experiments with 2 independent variables can show interactions
- Interaction effect: whether effect of original independent variable depends on level of
other independent variable
- Interaction = difference in differences
Intuitive Interactions
- Crossover interaction: lines cross each other- “it depends”
- There is a difference in differences
- Ice cream vs pancake temperature
- Spreading interaction: not parallel and do not cross each other- “only when”
- Dog example: bribed and told to sit
- No difference when condition not met but large difference when it is
Factorial design- used to test for interactions
- One in which there are 2 or more independent variables (factors)
- Test for each combination of independent variables
- Participant variable- variable whose levels are selected or measured not manipulated
- Example: Age
- Form of external validity- tests whether effect of independent variable generalizes across
other variables
- Testing for moderators (variable that changes relationship between 2 other variables)
Interpreting factorial results- main effects and interactions
- Main effect- overall effect of 1 independent variable on the dependent variable averaging
over levels of other independent variable (simple difference)
- 2 main effects with 2 independent variables
- Marginal means are the arithmetic means for each level of an independent variable,
averaging over levels of the other independent variable
- Simple average
- Main effects may or may not be stat significant
- However, interaction is the most important
- Think of a main effect instead as an overall effect—the overall effect of one
independent variable at a time.
- Interaction effect = difference in differences
- To describe
- Either explain both patterns
- Or use key phrases like “it depends” or “especially in”
- Interaction ALMOST ALWAYS MORE IMPORTANT
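The marginal-means and difference-in-differences arithmetic above can be sketched for a 2 × 2 design (cell means invented, loosely modeled on the note-taking example):

```python
# A 2 x 2 factorial: main effects come from marginal means, and the
# interaction is the difference in differences between the simple effects.
cells = {
    ("laptop", "factual"): 7.0, ("laptop", "conceptual"): 4.0,
    ("longhand", "factual"): 7.0, ("longhand", "conceptual"): 6.5,
}

def marginal(level, position):
    """Mean for one level of a factor, averaging over the other factor."""
    vals = [v for k, v in cells.items() if k[position] == level]
    return sum(vals) / len(vals)

# Main effect of note-taking medium (averaging over test type)
print(marginal("laptop", 0), marginal("longhand", 0))
# Interaction: does the medium effect depend on test type?
diff_factual = cells[("longhand", "factual")] - cells[("laptop", "factual")]
diff_conceptual = cells[("longhand", "conceptual")] - cells[("laptop", "conceptual")]
print(diff_conceptual - diff_factual)  # nonzero -> an interaction
```

Here the medium makes no difference on factual questions but a large one on conceptual questions, so the difference in differences is nonzero: "it depends" on test type.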
Factorial Variations
- Independent-Groups (between subjects)
- Both independent variables are studied as independent-groups.
- Therefore, if the design is a 2 × 2, there are four different groups of participants in the experiment.
Within groups (repeated measures)
- both independent variables are manipulated as within-groups.
If the design is 2 × 2, there is only one group of participants, but they participate
in all four combinations, or cells, of the design.
- Requires fewer participants
Mixed- one independent variable is manipulated as independent-groups and the other is
manipulated as within-groups
Increasing # of levels of an independent variable
- Notation: _ x _ (each number is the number of levels of one independent variable)
- The quantity of numbers equals the number of independent variables
Increasing # of independent variables: 3 variables = _ x _ x _
- 3 main effects, 3 two-way interactions, and 1 three-way interaction
Identifying factorial designs in reading
- Empirical study
- Method section= study design
- Results section= are main effects and interactions significant
- Popular media
- Talk of “it depends”
- Participant variables like age, personality, gender, or ethnicity
Chapter 13- Quasi-Experiments and Small-N Designs
- Quasi-experiment- researchers do not have full experimental control; they study participants exposed to levels of an independent variable, but they might not be able to randomly assign them
- Examples of Independent-Groups Quasi-Experiments- there are different participants at each level of the independent variable (nonequivalent control group design)
- At least 1 treatment + control group, but no random assignment
- Ex: people walking by a church building
- Researchers couldn't randomly choose which people walked by the building
- No random assignment
- Examples of Repeated-Measures Quasi-Experiments- the same participants experience all levels of an independent variable
- Researcher relies on already scheduled or planned policy that manipulates the
independent variable
- Ex: judicial decision-making
- Example of an interrupted time-series design- a quasi-experimental study that measures participants repeatedly on a dependent variable (in this example, parole decision making) before, during, and after the "interruption" caused by some event (a food break)
- Ex: health insurance laws- The independent variables were not controlled by
researchers—people were not randomly assigned to states that had health
insurance laws and those that did not.
- Nonequivalent control group interrupted time-series design- combines two of the previous designs (the nonequivalent control group design and the interrupted time-series design)
Internal Validity in Quasi-Experiments
- Main concern is internal validity
- Threats
- Selection effects- relevant only for independent-groups not
repeated-measures
- Participants at one level are systematically different from others
- To prevent:
- can try to make matched groups to control for selection
effects
- Or use wait-list design- all participants plan to receive
treatment but are randomly assigned to do so at different
times (like different time slot for plastic surgery)
- Design Confounds: outside variable that systematically varies with
targeted independent variable
- To prevent, can track data to analyze for design confounds
- Maturation threat- observed change emerges spontaneously over time
- To prevent: make a comparison group
- History threat- external, historical event happens for everyone in study at
same time as the treatment
- To prevent: use comparison group however, vulnerable to
selection-history effect
- Selection-history effect- historical event affects only ppl in
treatment group/comp group
- Regression to the mean- an extreme mean caused by a combination of random factors unlikely to happen again
- Mainly a problem when a group is selected for extremely high or low scores at pretest
- Attrition threat- people drop out of the study over time, which becomes a problem if the dropout is systematic
- Easy to check for
- Testing and instrumentation threats- testing: an order effect, because participants change with repeated testing
- Instrumentation: the instrument could change over time, or a different test could be used
- Comparison group can rule these out
Observer bias, demand characteristics and placebo effects: all related to
human objectivity
- All easy to interrogate
- For observer bias, you simply ask who measured the behaviors.
Was the design blind (masked) or double-blind? For
experimental demand, you can think about whether the
participants were able to detect the study’s goals and respond
accordingly. For placebo effects, you can ask whether the design
of a study included a comparison group that received an inert, or
placebo, treatment.
Balancing priorities in Quasi-experiments
- Real-world opportunities
- External validity- likelihood patterns will generalize
- Ethics
- Construct validity and statistical validity
- Construct validity is generally excellent; for statistical validity, just analyze significance and effect size
Are they the same as correlational studies?
- They seem similar when a quasi-experiment uses an independent-groups design, that is, when it compares two groups without using random assignment
- Difference: researchers do more meddling in quasi-experiments
- Try harder to obtain groups by targeting key locations
Small-N designs: studying only a few individuals
- How the sample was selected matters more than its size, for external validity
Balancing priorities in case study research- how convinced can we be?
- Experimental control: can make conclusions with good research design
- Helps study special cases
Disadvantages of small-N studies
- Internal validity- hard to pin down actual problem and cause
- External validity- may not represent general population well
- To explore generalizability, it might be smart to triangulate (compare case study results to research using other methods)
3 Small-N designs- used frequently in behavior analysis
- Stable-baseline design- the researcher observes behavior for a baseline period before beginning treatment
- If behavior during the baseline is stable, the researcher is more certain of the treatment's effectiveness
- If the researchers had collected a single baseline measure before the new treatment and a single test afterward (a before-and-after comparison), the improvement could be explained by any number of factors, such as maturation (spontaneous change), regression to the mean, or a history effect. (In fact, a single baseline record followed by a single posttreatment measure would have been a small-N version of the "really bad experiment"; see Chapter 11.) Instead, the researchers recorded an extended, stable baseline, which made it unlikely that some sudden, spontaneous change just happened to occur right at the time the new therapy began. Furthermore, the stable baseline meant there was not a single, extreme low point from which improvement would almost definitely occur (a regression effect). Performance began low and stayed low until the experimental technique was introduced.
- Baseline gave study INTERNAL VALIDITY
- Multiple-baseline design- researchers stagger their introduction of an intervention across a variety of individuals, times, or situations to rule out alternative explanations
- Helps rule out alternative explanations, giving INTERNAL VALIDITY to support causality
Reversal design- a researcher observes a problem behavior both with and without treatment, but
takes the treatment away for a while (the reversal period) to see whether the problem behavior
returns (reverses)
- Reversal designs are appropriate mainly for situations in which a treatment may not
cause lasting change.
- Also, there are some ethical issues here
Evaluating 4 validities in small-N designs
- Internal validity, if conducted with good design, is quite high
- External validity- question of generalizability
- Enhanced by triangulating (combining with results of other small-N studies),
specifying population they want to generalize and sometimes, they don’t even
care about generalizing
- Construct validity- measurements reliable and valid
- Use multiple observers and checking for interrater reliability in case of
observation
- Statistical validity- generally not traditional stats
- But effect sizes are important
Chapter 14- Replication, Generalization and the Real World
- Results must be replicable
3 types of replication studies
- Direct replication- researchers repeat original study as closely as they can
- However, it is usually impossible to replicate every exact detail
- Problem: if there are threats to internal validity or construct validity in original
study, they are now repeated and also doesn’t test in a new context
- Conceptual replication- researchers explore same research question but with different
procedures
- Same conceptual variables but diff procedures to operationalize
- Replication-plus-extension- researchers replicate original experiment and add variables
to test additional questions
- New tests or conditions or introduce new independent variable
Replication debate in psychology- a replication crisis, because direct replication rates are poor
- Why do replication studies fail?
- One reason: contexts were manipulated to be too different
- Another: OSC only conducted one replication attempt per original study
- Other: problem with original studies
- Sample size too small or questionable research practices were employed
- HARKing- hypothesizing after the results are known
- P-hacking- running different analyses, or excluding individuals, until the results support the conclusion
- Improvements to scientific practice
- Require larger sample sizes
- Open science- practice of sharing data openly for replication and
verification
- Preregistration of methods in advance of data collection
Scientific literature- a series of related studies, conducted by various researchers, that have tested similar variables
- Meta-analysis: way of mathematically averaging the results of all published and
unpublished studies that have tested the same variables
- Strengths
- Can sort studies into moderators and compute separate effect sizes
- Usually uses all peer-reviewed data
- Can tell overall effect size
- Limitations
- File drawer problem- the idea that a meta-analysis might be overestimating the true size of an effect because null effects, or even opposite effects, have not been included in the collection process
- To combat this, researchers should contact colleagues for published and unpublished data
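The core arithmetic of a meta-analysis, and the file drawer problem, can be sketched with a sample-size-weighted average of effect sizes (all studies invented; real meta-analyses use more refined inverse-variance weights):

```python
# Combine several studies' effect sizes (d) into one overall estimate,
# weighting each study by its sample size.
studies = [
    {"d": 0.40, "n": 30},
    {"d": 0.25, "n": 120},
    {"d": 0.60, "n": 45},
    {"d": -0.05, "n": 200},  # a null/opposite result often left in the file drawer
]

def weighted_mean_d(studies):
    total_n = sum(s["n"] for s in studies)
    return sum(s["d"] * s["n"] for s in studies) / total_n

print(weighted_mean_d(studies))      # overall effect including the null study
print(weighted_mean_d(studies[:3]))  # dropping the null study inflates d
```

Leaving out the unpublished null result more than doubles the apparent overall effect, which is exactly the overestimation the file drawer problem describes.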
Reliability, importance and popular media
- Good journalists give sense of literature not only the recent study
- Some summarize press releases and don’t question the study
To be important, must a study have external validity?
- Direct replication studies don’t support this, but conceptual and repl+exten can
- Generalizing to other participants- to analyze this, ask how participants were obtained
- If convenience, can’t be sure
- How the sample was obtained is important, rather than its size
- You can choose which population you represent
- Generalizing to other situations
- Conceptual replications illustrate this
- Does it generalize to real-world?- ecological validity or mundane realism
- Does it have to be generalizable to many people?
- Not really; it depends on the researcher's research mode
- Theory-testing- designing correlational or experimental research to
investigate support for a theory
- Generalization- researchers want to generalize the findings from the
sample in a previous study to a larger population
- Frequency claims are always in generalization mode
- Association and causal claims are sometimes in generalization mode
Cultural psychology: a special case of generalization mode (subdiscipline of psychology focusing
on how cultural contexts shape the way a person thinks, feels, and behaves)
- challenged researchers who work exclusively in theory-testing mode by identifying
several theories that were supported by data in one cultural context but not in any other
- Müller-Lyer illusion: people in N America and Europe fall for this illusion, but not people around the whole world
- The Segall team used their global data to conclude that people who grow up in a "carpentered world" have more visual experience with right angles as cues for depth perception than people who grow up in other societies.
- Researchers cannot take generalization for granted
Figure and ground
- Japanese students tended to focus first on the background behind the fish, but N Americans noticed the fish itself first
Theory testing using WEIRD participants
- Western, Educated, Industrialized, Rich, Democratic
Does the study have to take place in a real-world setting?
- Real-world (field setting)
- Built in advantage for external validity
- However, can also be replicated in lab (experimental realism)
- Generalization mode and the real world
- Because external validity is of primary importance when researchers are in generalization mode, they might strive for a representative sample of a population. But they might also try to enhance the ecological validity of a study in order to ensure its generalizability to nonlaboratory settings.
- Theory-testing mode and the real world
- When a researcher is working in theory-testing mode, external validity and real-world applicability may be lower priorities.
- Theory-testing mode prioritizes internal validity at the expense of all other considerations, including ecological validity.