PART I: Introduction to Scientific Reasoning

Chapter 1: Psychology Is a Way of Thinking
- Empiricism: basing one's conclusions on systematic observations
- A psych student is either a PRODUCER or a CONSUMER of research
  - Producers: produce research studies
  - Consumers: consume info they later apply to their work
- How psychologists approach their work
  1. Act as empiricists and observe the world
     - Use evidence from senses and instruments to draw conclusions
  2. Test theories and revise them based on data
     - Theory: set of statements that describes general principles about how variables relate to one another
     - Hypothesis: specific outcome the researcher expects to observe in a study if the theory is accurate
     - Data: set of observations
     - Theory-data cycle: systematic steps to solve a problem
     - Ex: cupboard theory (mother is valuable because she is a source of food) vs. Harry Harlow's contact comfort theory (mother is valuable because of the comfort of cozy touch)
       - Harlow built 2 artificial monkey mothers, one offering food and one offering only comfort; the cupboard theory was proven wrong
  3. Take an empirical approach to both applied research and basic research
  4. Test why an effect works
  5. Make work public
- Good scientific theories:
  - Supported by data
  - Falsifiable
  - Exhibit parsimony (simplicity)
  - Theories don't prove anything: data is "consistent with" a theory
- Types of research
  - Applied research: done with a practical problem in mind
    - Ex: "An applied research study might ask, for example, if a school district's new method of teaching language arts is working better than the former one."
  - Basic research: enhances the body of knowledge
    - Ex: understand the structure of the visual system, memory, etc.
  - Translational research: use of lessons from basic research to develop and test applications such as treatments
    - Bridge between basic and applied research
- Scientists write papers and submit them to a JOURNAL that is peer-reviewed
  - The journal editor sends the submission to 3-4 experts
  - The peer-review process is anonymous
  - Other scientists can comment even after publishing
- From journal to journalism
  - Journalism: the kind of news we hear or see in public
  - Benefits and risks of journalism
    - Benefit: the public learns about research
    - However, 2 things are important:
      - Journalists need to report on the MOST IMPORTANT scientific stories
        - Is it sensational or actually important?
      - Journalists must ensure the data is reported accurately
        - Mozart Effect: example of journalists misrepresenting science when writing for a popular audience
          - So called because the media misrepresented data from a study of Mozart sonata listeners and an intelligence test; in reality, only one narrow aspect of intelligence was affected

Chapter 2: Sources of Information: Why Research Is Best and How to Find It

EXPERIENCE VS. RESEARCH
- Why you should not trust your experience over research
  - Experience has no COMPARISON group
    - Comparison group: enables us to compare what would happen with vs. without the thing we are testing
  - Experience is confounded
    - Confounds: several possible alternative explanations for an outcome
    - To fix these, we need to carefully control variables
  - Research is better than experience
    - Confederate: actor playing a role for an experiment
    - Ex: Steve and the catharsis experiment (catharsis not actually effective)
  - Experience could be an exception; research is probabilistic

INTUITION VS. RESEARCH
- Intuition: using hunches about what seems natural, or attempting to think about things logically
- Ways intuition is biased
  - Being swayed by good stories
    - Ex: Freud thought catharsis works; the "Scared Straight" prison youth program
  - Availability heuristic: things that come to mind easily tend to guide our thinking
  - Present/present bias: we fail to think about what we cannot see
  - Confirmation bias: tendency to look only at info that agrees with what we already believe
  - Bias blind spot: belief that we are unlikely to fall prey to the other biases previously described
- Authority
  - Only trust an authority on a subject if their conclusions are based on research
- 3 kinds of sources where psych scientists publish
  - Journal articles
    - Audience: other psych students and scientists
    - Types:
      - Empirical articles: report the results of an empirical research study for the first time
      - Review articles: summarize all published studies done in one research area
        - Can use a quantitative technique called meta-analysis, which combines the results of many studies and gives a number that summarizes the magnitude, or effect size, of a relationship
  - Chapters in edited books
    - Generally not the first place a study is reported; usually a summary of a collection of research
    - Not peer-reviewed as rigorously, but only experts are invited to write
    - Audience: other psychologists or psych students
  - Whole books
    - Not common for SCIENTIFIC work; can be for a general audience

Finding Scientific Sources
- PsycINFO: database for psych sources
- Google Scholar

Reading the Research
- Components of an empirical journal article
  - Abstract: concise summary, about 120 words long
  - Introduction: first paragraph explains the topic of the study; middle gives background for the research; final paragraph states the specific research goals, questions, and hypotheses
  - Method: how the researchers conducted the study
  - Results: quantitative and qualitative data + figures + statistical tests
  - Discussion: opens with the research question, methods, and how well the results support the hypotheses; then discusses the study's importance
  - References: sources cited

Read with a Purpose
- Empirical journal articles: 2 questions
  - What is the argument?
  - What is the evidence to support the argument?
- Chapter and review articles
  - Read the headings to get an idea and to categorize

Research in Less Scholarly Places
- Psych books for a general audience
  - To see if they are well written and well researched, check the REFERENCES
- Wikis as a research source
  - Not comprehensive, idiosyncratic (represent the preferences of contributors), could be incorrect or vandalized
- Popular media
  - Can be helpful if the journalist specializes in science writing, but if not, can be oversimplified or simply WRONG

Chapter 3: Three Claims, Four Validities: Interrogation Tools for Consumers of Research
- Variable: something that varies
  - Must have at least 2 levels or values
  - Measured variable: its value is observed and recorded
  - Manipulated variable: the researcher controls it by assigning study participants to different levels of the variable
    - Variables that cannot be manipulated: IQ or age, because you can't randomly assign them
  - Constant: has only one level in the study
  - Conceptual variable: abstract concept (construct)
    - Must be carefully defined at the theoretical level; these definitions are called conceptual definitions
    - Operational definition of a variable (operationalization): turning a concept of interest into a measured or manipulated variable
- 3 claims
  - Claim: argument someone is trying to make
  - Types:
    - Frequency claim: describes a particular rate or degree of a single variable
      - Involves only 1 variable, and the variable is MEASURED, not manipulated
      - Validities to interrogate:
        - Construct: how well the variable is measured
        - External: how well the sample represents the population
        - Statistical: margin of error
    - Association claim: argues that one level of a variable is likely to be associated with a particular level of another variable
      - 3 types: positive association, negative association, and zero association
      - Validities to interrogate:
        - Construct: how well each variable is measured
        - External: can it generalize to other populations or contexts?
        - Statistical: significance and strength
    - Causal claim: one variable is responsible for changing the other
      - Association → causality requires 3 criteria
        - Variables are correlated (covariance)
        - Causal variable came first and the outcome later
          - Temporal precedence: the proposed causal (independent) variable comes first in time
        - No other explanations exist for the relationship
          - Called INTERNAL VALIDITY, or the third-variable criterion
          - Random assignment can help control for alternative explanations
        - Only an experiment can enable researchers to support a causal claim
      - Validities to interrogate: construct, external, statistical, and internal
- Only research, not experience, can support a causal claim
- Validity: appropriateness of a conclusion or decision
  - Specify which validity we are talking about when we say a claim is "valid"
  - Types:
    - Construct validity: how well a conceptual variable is operationalized
      - Basically, how well a study measured or manipulated a variable
      - To ensure this is met: establish that each variable has been measured reliably
    - External validity: how well the results of a study generalize to people and contexts
    - Statistical validity: extent to which a study's statistical conclusions are accurate and reasonable
      - Whether or not the numbers support the claim
      - Errors:
        - Type I: mistakenly concluding there is an association when there really isn't ("false positive")
        - Type II: concluding there is no association when there really is (a "miss")

PART II: Research Foundations for Any Claim

Chapter 4: Ethical Guidelines for Psychology Research
- Historical examples
  - Tuskegee Syphilis Study
    - Many Black men had syphilis, and researchers wanted to study the effect of untreated syphilis on men's health over time
    - At the time, no treatment was reasonable because the available methods were risky
    - Lasted 40 years; men were recruited from churches
    - Men were told they had "bad blood" instead of syphilis
    - Dangerous tests such as spinal taps were conducted
    - Men were prevented from being treated even after penicillin became the accepted treatment for syphilis
    - 3 distinct categories of unethical choices:
      - Men were not treated respectfully, because they were lied to and could not make informed decisions
      - Men were harmed, because they were not given treatment
      - Researchers targeted a disadvantaged social group
  - Milgram obedience studies
    - Subjects thought they were punishing and torturing a learner who was really an actor
    - 65% of the participants obeyed, but obedience decreased when the learner sat in the room
    - Obedience also dropped when the experimenter gave instructions from down the hall or over the phone
    - Was it ethical?
      - Some say it was bad because it was stressful to the participants
      - Some say it was okay because the participants were debriefed and shown the learner wasn't harmed
- Core ethical principles
  - Nuremberg Code: a result of the Nuremberg Trials of Nazis
  - Declaration of Helsinki: guides ethics in medical research and practice
  - Belmont Report:
    - A commission of professionals gathered at the Belmont Conference Center in Elkridge, Maryland, at the request of Congress in response to Tuskegee
    - Outlines 3 main principles for guiding ethical decision making:
      - Respect for persons; 2 provisions:
        - Individuals should be treated as autonomous agents free to make up their own minds (with INFORMED CONSENT)
        - Some people have less autonomy and are entitled to special protection when it comes to informed consent
          - Ex: children, people with disabilities, and prisoners
      - Beneficence
        - Protect participants from harm
        - Examples of actions that follow this principle, to prevent release of private info:
          - Anonymous study: no potentially identifying info collected
          - Confidential study: some identifying info collected, but it is not disclosed
      - Justice
        - Fair balance between the people who participate in research and the ones who benefit from it
- Common Rule:
  - Describes detailed ways the Belmont Report should be applied
  - Federally funded agencies in the US must follow it
- Guidelines for psychologists
  - APA outlines 5 general principles for guiding individual aspects of ethical behavior:
    1. Respect for persons
    2. Beneficence
    3. Justice
    4. Fidelity and responsibility
       a. No sexual relations with students or clients, and avoid prior relations
    5. Integrity
       a. Professors obligated to teach accurately; therapists required to stay current
  - APA has 10 specific enforceable ethical standards
    - Ethical Standard 8 covers research:
      - 8.01: IRB (Institutional Review Board): committee responsible for interpreting ethical principles and ensuring that research using human participants is conducted ethically
        - Mandated by federal law in the US if federal money is involved
        - 5 or more people
        - At least one scientist
        - At least one person with academic interests outside the sciences
        - At least one community member with no ties to the institution
        - When prisoners are involved, one must be a prisoner advocate
      - 8.02: Informed consent: the researcher's obligation to explain the study to potential participants in everyday language and give them the chance to decide whether or not to participate
        - In some cases, APA standards indicate informed consent procedures are not necessary:
          - If the study is not likely to cause harm
          - If the study takes place in an educational setting
        - Obtaining informed consent also involves informing people whether the data they provide in a research study will be treated as private and confidential
      - 8.07: Deception
        - Researchers withholding some details of the study from participants: deception through omission
        - Researchers actively lying to participants: deception through commission
        - Is deception ethical?
        - Even in a deceptive study, the researcher must still uphold the principle of respect by informing participants of risks and benefits
        - Beneficence: weigh the ethical costs and benefits of the study
        - APA principles and federal guidelines require researchers to avoid using deceptive research designs except as a last resort, and to debrief participants after the study
      - 8.08: Debriefing
        - Researchers describe the nature of the deception, explain why it was necessary, and describe the study design
        - Nondeceptive studies sometimes include debriefing too
        - The purpose is to make participation in research an educational experience

RESEARCH MISCONDUCT
- Data fabrication (Standard 8.10) and data falsification
  - Unethical, with far-reaching consequences
  - Data fabrication (8.10): researchers invent data that fit their hypotheses
  - Data falsification: researchers influence a study's results
- Plagiarism (8.11)
  - Defined as representing the ideas or words of others as one's own
  - To prevent it, cite all ideas that are not one's own
  - Even when paraphrasing, cite the last name and year of publication
- Animal research (8.09)
  - Per the APA, researchers must:
    - Care for animals humanely
    - Use as few animals as possible
    - Be sure the research is valuable enough to justify using animals
  - Animal Welfare Act in the US
    - Institutions must have an IACUC (Institutional Animal Care and Use Committee) that must approve each project
      - 3 or more members
      - At least 1 veterinarian
      - At least 1 practicing scientist familiar with animal research
      - At least 1 member of the local community
    - First, researchers must submit a scientific justification; then the IACUC monitors treatment of the animals by inspecting labs every 6 months
  - Animal care guidelines:
    - Replacement: researchers should find alternatives to animals in research when possible
    - Refinement: researchers must modify experimental procedures to minimize distress
    - Reduction: adopt designs that use the fewest animals possible
  - Arguments of animal rights groups:
    - Animals are just as likely as humans to experience suffering
    - Animals have inherent rights equal to those of humans
Chapter 5: Identifying Good Measurement
- Ways to measure variables: 3 types
  - Self-report: recording people's answers to questions about themselves
  - Observational: recording observable behaviors
    - Such as IQ tests
  - Physiological: recording biological data such as brain activity, hormone levels, or heart rate
- Operational definition of a variable: how the researcher decides to measure or manipulate the conceptual variable
  - Levels of operational variables can be coded using different scales of measurement:
    - Categorical (nominal) variables
      - Levels are categories
      - Examples: sex (M and F), species
    - Quantitative variables
      - Coded with meaningful numbers
      - Ordinal scale: numbers represent ranked order
      - Interval scale: numbers represent equal intervals, but there is no true zero
      - Ratio scale: numbers represent equal intervals AND a value of 0 truly means none of the variable
- Reliability: how consistent the results of a measure are
  - Types:
    - Test-retest reliability: the researcher gets a consistent score every time with the measure
      - To assess: measure the same set of participants at least twice within a time frame, then compute r
    - Interrater reliability: consistent scores are obtained no matter who (which researcher) measures the variable
      - To assess: ask 2 observers to rate the same participants at the same time and compute r
      - Kappa measures the extent to which 2 raters place participants into the same categories
    - Internal reliability: a study participant gives consistent answers no matter how the researcher phrases the question
      - Relevant for measures that use more than 1 item to get at the same construct
        - Ex: a question phrased in multiple ways
      - Cronbach's alpha (coefficient alpha)
        - Statistic that measures whether or not a scale has internal reliability
        - Collect data on the scale from a large sample of participants and compute all possible correlations among the items
        - The closer alpha is to 1, the better the reliability; 0.7 or higher is considered sufficient
  - Using the correlation coefficient r to quantify reliability
    - r indicates how close the dots in a scatterplot are to a line drawn through them
    - The relationship is strong when the dots are close to the line
    - r is near 0 when the relationship is weak, and near 1 or -1 when the relationship is strong
- Validity: whether the operationalization is measuring what it is supposed to measure (construct validity)
  - Researchers have to evaluate a measure's validity through research
    - What is the weight of evidence in favor of this measure's validity?
  - Subjective ways to assess validity:
    - Face validity: is it subjectively considered a plausible operationalization of the conceptual variable?
    - Content validity: the measure must capture all parts of the defined construct
  - Empirical ways to assess validity:
    - Criterion validity: whether the measure under consideration is associated with a concrete BEHAVIORAL outcome that it should be associated with
      - Known-groups paradigm: researchers see whether scores on the measure can discriminate among two or more groups whose behavior is already confirmed
    - Convergent validity: pattern of strong correlations with measures of theoretically similar constructs
    - Discriminant (divergent) validity: pattern of weak correlations with measures of theoretically dissimilar constructs
      - The test should NOT correlate strongly with measures of constructs different from what we are testing for
      - For example: a depression measure should not mistake other disorders for depression
- Relationship between reliability and validity
  - Although a measure may be less valid than it is reliable, it cannot be more valid than it is reliable
    - If a measure does not even correlate with itself, how can it be strongly associated with some other variable?
  - Reliability is necessary (but not sufficient) for validity
  - You will usually find both reliability and validity info in the Method section of a journal article

Chapter 6: Surveys and Observations: Describing What People Do

Construct Validity of Surveys and Polls
- Survey or poll: when people are asked about their social or political opinions
- Question formats
  - Open-ended questions: can provide spontaneous and rich info
    - Drawback: responses must be coded and categorized, which can be time-consuming
  - Forced-choice questions: pick the best of 2 or more options
    - Ex: Narcissistic Personality Inventory: choose 1 statement of 2 for each of 40 pairs
  - Likert scale
    - People are presented with a statement and asked to rate it on a rating scale (strongly agree, agree, etc.)
    - If the format diverges slightly, it's a Likert-type scale
    - Ex: Rosenberg Self-Esteem Inventory
  - Semantic differential format: respondents rate a target object using a numeric scale anchored with adjectives (e.g., a 5-star scale)
- Writing well-worded questions
  - If questions suggest a particular viewpoint, some people change their answers
    - Ex: leading questions and questions that single out particular groups
    - If researchers want to measure how much wording matters for their topic, they word the question more than one way and compare
  - Double-barreled questions: ask 2 questions in 1
    - Poor construct validity, because people might be responding to the first half of the question, the second half, or both
  - Negatively worded questions
    - Can cause confusion, reducing construct validity
    - In this case, researchers can ask the question 2 ways and use Cronbach's alpha to see if people respond similarly
  - Question order
    - One way to control for order effects is to prepare different versions of the survey with the questions in different sequences
    - Or put the question of interest first
- Encouraging accurate responses
  - Self-reports are often ideal and cheap, and provide meaningful info
    - Might also be the only option
  - Response sets (nondifferentiation): shortcuts respondents take when answering survey questions
    - People might develop a consistent way of answering all the questions, especially toward the end of a long questionnaire
    - Weakens construct validity
    - One response set: acquiescence (yea-saying): people say yes to every item
      - Instead of measuring accurately, the survey could be measuring a tendency to agree or a lack of motivation to think clearly
      - How to look for this? Include reverse-worded items
        - Drawback: sometimes this creates negatively worded items
    - Fence sitting: playing it safe by answering in the middle of the scale
      - Likely when a question is controversial or unclear
      - Ways to address it:
        - Take away the neutral option
          - Drawback: if people really do not have an opinion, forcing them to choose a side is an invalid representation
        - Use forced-choice questions
          - Drawback: can frustrate people who feel their opinion is in the middle of the options
        - Offer an explicit "don't know" option recorded only if the person volunteers ambivalence
    - Socially desirable responding (faking good): if people are embarrassed by an unpopular opinion, they will not tell the truth
      - Less common opposite: faking bad
      - To prevent it, researchers tell respondents that their responses are anonymous
        - Not a perfect solution, because anonymous respondents may treat surveys less seriously
      - Another way: researchers include special survey items that target socially desirable responders
        - Respondents are flagged if they agree with too many of those items
      - Another way: researchers ask people's friends to rate them
      - Another: use computerized measures to evaluate people's implicit opinions about sensitive topics
        - Ex: Implicit Association Test
  - Self-reporting more than they can know
    - People can give unintentionally inaccurate responses even when they willingly volunteer a response
    - People's justifications for their behavior might be unintentionally misguided
  - Self-reporting memories of events
    - People's memories are not very accurate about events they participated in
      - Ex: flashbulb memories about activities during dramatic events
    - To test this, researchers administer a questionnaire the day after the event and again a few years later
    - Findings:
      - Overall accuracy is very low
      - People's confidence in their memories' accuracy is unrelated to their true accuracy
  - Rating products
    - One study found little correspondence between Amazon 5-star ratings and ratings by Consumer Reports, a product-testing organization
    - Customer ratings correlated with the cost of the product and the prestige of the brand

Construct Validity of Behavioral Observations
- Observational research: the researcher systematically observes and records behavior
  - Can be the basis for frequency claims
  - Can be used to operationalize variables in association and causal claims
  - Examples:
    - Observing how much people talk (Mehl et al.)
      - A device samples sound at 12.5-minute intervals and records everything the person says
      - The published data showed women and men speak about the same amount
    - Observing parent behavior at hockey games to test the stereotype about parent fights at youth hockey
      - The stereotype was false: mostly positive comments
    - Observing families in the evening
      - Video crews followed parents, and assistants coded behaviors
      - Ratings ranged from cold to neutral to happy
- Making reliable and valid observations
  - Construct validity is threatened by 3 problems:
    - Observer bias: observers' expectations influence their interpretations
    - Observer effects: observers change the behavior of those they are observing; participants change their behavior to meet expectations
      - Ex: "maze-bright" and "maze-dull" rats
      - Ex: Clever Hans, the horse that was actually very good at detecting the questioner's head movements
    - To prevent observer bias and observer effects:
      - Clear rating instructions, called codebooks: precise statements of how the variables are operationalized
      - Use multiple observers: allows a measure of interrater reliability
        - Still not perfect: observers might share the same bias
      - Masked (blind) research design: observers are unaware of the purpose and conditions of the study and of participants' assignments
    - Reactivity: a change in behavior when participants know someone is watching
      - Happens even with animals
      - Solutions:
        - Sol. 1: blend in: make unobtrusive observations (e.g., one-way mirror, acting like a person in the crowd)
        - Sol. 2: wait it out: participants eventually forget they are being watched
        - Sol. 3: measure the behavior's results: don't observe the behavior directly, just what it leaves behind
- Observing people ethically
  - Generally considered OK in locations where people are aware they are in public
  - Secret methods: ethical under some conditions
    - Researchers must obtain permission in advance, or, if observation is hidden, must explain the procedure at the conclusion of the study (and erase recordings if people object)

Chapter 7: Sampling: Estimating the Frequency of Behaviors and Beliefs

Generalizability: does the sample represent the population?
- External validity: can the study be generalized to the larger population?
  - Especially important for frequency claims
  - Of the sample: whether the sample is adequate to represent the entire population
  - "Generalizes to" or "is representative of"
- Sample types
  - Biased (unrepresentative) sample
    - Some members of the population have a much higher chance of being included than others
    - 2 ways to obtain one:
      - Sampling only those whom researchers can contact conveniently
        - Ex: people who respond to online surveys or polls
      - Sampling only those who volunteer to respond (self-selection)
  - Unbiased (representative) sample
    - All members of the population have an equal chance of being included

Obtaining a Representative Sample: Probability Sampling Techniques
- Probability (random) sampling: each member of the population has an equal chance of being selected
  - Simple random sampling
  - Cluster sampling: an option when people are already divided into arbitrary groups
    - Clusters of participants are randomly selected, and all individuals in each selected cluster are used
  - Multistage sampling
    - 2 random samples are selected: a random sample of clusters, and then a random sample of people within those clusters
  - Stratified random sampling
    - The researcher purposefully selects particular demographic categories (strata) and randomly selects individuals within each category, proportionate to each category's membership in the population
    - Differs from cluster sampling because:
      - The categories are more meaningful
      - The sample sizes reflect each category's proportion in the population
  - Variation: oversampling
    - The researcher intentionally overrepresents one or more groups
    - Later, the final results are adjusted (weighted) so that each group's weight corresponds to its actual proportion in the population
- Nonprobability sampling: results in a biased sample
  - Convenience sampling
  - Purposive sampling
    - Researchers who want to study only certain kinds of people recruit only those particular participants
    - When done in a NONRANDOM way, this is called purposive
    - Ex: a study testing the effectiveness of a drug to quit smoking would recruit only smokers
  - Snowball sampling: can help find rare individuals
    - Participants are asked to recommend acquaintances
  - Quota sampling
    - Target numbers are set for subsets of the population of interest
    - Differs from stratified random sampling because the participants are selected nonrandomly
- Random sampling vs. random assignment
  - Random sampling: enhances external validity
  - Random assignment: researchers assign participants to groups at random
    - Enhances internal validity by helping ensure that the comparison group and the treatment group have the same kinds of people in them, thereby controlling for alternative explanations

Interrogating External Validity: What Matters Most?
- Sometimes, unknown external validity is OK
- In FREQUENCY claims, external validity is a PRIORITY
- When external validity is a lower priority:
  - When random assignment is prioritized over random sampling
  - When sample bias isn't relevant to the claim
- Larger samples are not automatically more representative
  - For a rare phenomenon, we need a large sample to locate enough instances for valid statistical analysis
  - But otherwise, external validity is about HOW the sample was drawn, not HOW MANY were sampled
  - Only about 1,000-2,000 people are really needed for a public opinion poll

Chapter 8: Bivariate Correlational Research

Introducing Bivariate Correlations
- Bivariate association: involves exactly 2 variables
- 3 types of associations: positive, negative, and zero
- Authors present bivariate correlations between different pairs of variables separately
- Review: describing associations between 2 quantitative variables
  - Positive r = positive relationship
- Describing associations with categorical data
  - When at least one of the variables is categorical, researchers use a different statistical test
  - Most commonly, they test whether the difference between group averages is statistically significant using a t test
- A study in which all variables are measured is correlational
  - An association claim is not supported by a particular kind of statistic or graph; it is supported by a study design—correlational research—in which all the variables are measured

Interrogating Association Claims
- Important validities: construct and statistical
- Construct: how well was each variable measured?
  - Does it have good reliability? Is it measuring the intended construct?
  - What is the evidence for its face validity, its concurrent validity, its discriminant and convergent validity?
- Statistical: how well do the data support the conclusion?
  - Question 1: What is the effect size?
    - Larger effect size = r closer to 1 = more accurate predictions
    - Errors in estimation get larger when associations are weaker
    - Larger effect sizes are usually more important than small ones
  - Question 2: Is it statistically significant? (What is the likelihood of getting a correlation of that size just by chance?)
    - Conventionally, p < 0.05
    - Even a small effect size will be statistically significant if identified in a very large sample
  - Question 3: Could outliers be affecting the association?
    - In bivariate correlations, outliers are mainly problematic when they involve extreme scores on both variables
    - They matter most in small samples
  - Question 4: Is there restriction of range?
    - If there is not a full range of scores on one of the variables, it can make the correlation appear smaller than it really is
    - Fixes: apply a correction-for-restriction-of-range formula, or recruit more people at the extremes
    - Can apply whenever one variable has little variance for some reason
  - Question 5: Is the association curvilinear?
    - A curvilinear association could fail the r test, because r measures only linear relationships

Internal Validity: Can We Make a Causal Inference from an Association?
- We must guard against the powerful temptation to make a causal inference from any association claim we read
- A causal claim must satisfy 3 criteria:
  - Covariance
  - Temporal precedence (the directionality problem)
  - Internal validity (the third-variable problem)
- When we think of a reasonable third-variable explanation for an association claim, how do we know if it's an internal validity problem?
  - We check whether the correlation is present within subgroups
  - Spurious association: the bivariate correlation is there, but only because of some third variable

External Validity: To Whom Can the Association Be Generalized?
- Sample size does not matter as much as the way the sample was selected from the population of interest
- If the sample is not random, we can't be sure the results will generalize to the population

How important is external validity?
- You can accept a study's results and leave the generalization question to the next study
- Moderator: when the relationship between two variables changes depending on the level of another variable, that other variable is called a moderator
  - In correlational research, moderators can inform external validity
  - When an association is moderated by residential mobility, type of relationship, day of the week, or some other variable, we know it does not generalize from one of these situations to the others

Chapter 9: Multivariate Correlational Research

Reviewing the 3 causal criteria
- Covariance
- Temporal precedence
- Internal validity

Establishing temporal precedence with longitudinal designs
- A longitudinal design can provide evidence for temporal precedence by measuring the same variables in the same people at several points in time
- Interpreting results: a longitudinal design gives 3 types of correlations
  - Cross-sectional correlations: are 2 variables, measured at the same point in time, correlated?
  - Autocorrelations: evaluate the association of each variable with itself across time
  - Cross-lag correlations: show whether the earlier measure of one variable is associated with the later measure of the other variable
    - Help establish temporal precedence by addressing the directionality problem
    - 3 possible results:
      - One cross-lag correlation is significant
      - The opposite cross-lag correlation is significant
      - Both correlations are significant

Ruling out third variables with multiple-regression analyses
- Basically means measuring other potentially interfering variables
- By conducting a multivariate design, researchers can evaluate whether a relationship between two key variables still holds when they control for another variable
  - If researchers take the relationship with the third variable into account, is there still a correlation between the original two variables?
  - Similar to looking within subgroups
- Criterion variable (dependent variable): the one researchers want to predict
- Predictor variables (independent variables): the rest
- Using betas to test for third variables
  - There is one beta for each predictor variable
  - A positive beta, like a positive r, indicates a positive relationship between that predictor variable and the criterion variable when the other predictor variables are statistically controlled for
  - Betas can't describe effect size the way r can: betas change depending on which other predictor variables are being controlled for
- Coefficient b: an unstandardized coefficient
  - A b is similar to a beta in that its sign (positive or negative) denotes a positive or negative association (when the other predictors are controlled for)
  - But unlike two betas, we cannot compare two b values within the same table to each other
    - The reason is that b values are computed from the original measurements of the predictor variables (such as dollars, centimeters, or inches), whereas betas are computed from predictor variables that have been converted to standardized units
- The p-value indicates whether a beta is statistically significant
- Adding several predictors to a regression analysis can help answer two kinds of questions
  - First, it helps control for several third variables at once
  - Second, by looking at the betas for all the other predictor variables, we can get a sense of which factors most strongly predict the criterion (in the textbook example, the chance of pregnancy)
- Phrases signaling regression: "controlled for," "taking into account," "adjusting for"
- Regression does NOT establish causation
  - It cannot establish temporal precedence
  - But it can address some internal validity (third-variable) concerns
- Getting at causality with pattern and parsimony
  - Researchers can investigate causality by using a variety of correlational studies that all point in a single, causal direction.
This approach can be called "pattern and parsimony" because there is a pattern of results best explained by a single, parsimonious causal theory
  - Parsimony: the degree to which a theory provides the simplest explanation, making the fewest exceptions
- By converging studies, you can try to find the simplest explanation
  - The diversity of multiple studies makes it hard to raise third-variable explanations
- In popular media
  - When journalists write about science, they do not always fairly represent pattern and parsimony in research; instead, they may report only the results of the latest study
  - This is a selective presentation of the scientific process
- Mediation
  - Mediator: a variable that explains why two other variables are related
  - Mediator vs. third variable
    - When researchers propose a mediator, they are interested in isolating which aspect of the presumed causal variable is responsible for the relationship
    - A mediator variable is internal to the causal variable and often of direct interest to the researchers, rather than a nuisance
  - Mediators vs. moderators
    - Mediators ask: Why?
    - Moderators ask: Who is most vulnerable? For whom is the association strongest?
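The multiple-regression logic from this chapter (betas shrinking when a third variable is controlled for) can be sketched in a few lines of Python. The data below are simulated for illustration: a hypothetical third variable drives both the predictor and the criterion, producing a spurious bivariate association.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated (hypothetical) data: one third variable drives both
# the predictor of interest and the criterion.
third = rng.normal(size=n)
predictor = 0.6 * third + rng.normal(size=n)
criterion = 0.6 * third + rng.normal(size=n)

def standardized_betas(y, predictors):
    """Regress standardized y on standardized predictors; return betas."""
    z = lambda a: (a - a.mean()) / a.std()
    Zy = z(y)
    ZX = np.column_stack([z(x) for x in predictors])
    betas, *_ = np.linalg.lstsq(ZX, Zy, rcond=None)
    return betas

# Bivariate: the predictor alone looks related to the criterion
# (beta noticeably positive).
print(standardized_betas(criterion, [predictor]))

# Controlling for the third variable: the predictor's beta
# shrinks toward zero, suggesting a spurious association.
print(standardized_betas(criterion, [predictor, third]))
```

This mirrors the "is there still a correlation when we take the third variable into account?" question: the same beta is computed twice, once with and once without the additional predictor in the model.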
Chapter 10: Introduction to Simple Experiments
- Example 1: Taking notes
  - Laptop and longhand note-takers scored the same on factual questions, but longhand note-takers did better on conceptual questions
- Example 2: Serving pasta
  - Participants given a larger bowl ate more pasta overall
- How experiments establish causal claims
  - Covariance
    - Because every independent variable has at least two levels, true experiments are always set up to look for covariance
    - The results matter too
  - Temporal precedence
    - Manipulating a variable ensures that it came first in time
  - Internal validity
    - Achieved by keeping control variables constant
    - There can be several possible alternative explanations, known as confounds, or potential threats to internal validity
- Threats to internal validity
  - Design confound: an experimenter's mistake in designing the independent variable
    - A second variable that happens to vary systematically along with the intended independent variable and is therefore an alternative explanation for the results
    - Only a problem for internal validity if the second variable shows systematic variability with the independent variable
    - If the variability is unsystematic (random or haphazard), it is not a confound
      - But it can still obscure patterns
  - Selection effects: when the kinds of participants in one level of the independent variable are systematically different from those in the other
    - Can happen if experimenters let participants choose the group they want to be in
    - Can be avoided with:
      - Random assignment
      - Matched groups
        - Within each set matched on a particular variable, the researcher randomly assigns one person to one group and the other to the other group
        - Requires more time and resources
- Independent-groups vs. within-groups designs
  - Independent-groups design (between-subjects or between-groups): different groups of participants are placed into different levels of the independent variable
    - 2 basic forms
      - Posttest-only design (equivalent-groups, posttest-only): participants are randomly assigned to independent-variable groups and are tested on the
dependent variable once
        - Satisfies all three criteria for causation
          - They allow researchers to test for covariance by detecting differences in the dependent variable (having at least two groups makes it possible to do so). They establish temporal precedence because the independent variable comes first in time. And when they are conducted well, they establish internal validity: when researchers use appropriate control variables, there should be no design confounds, and random assignment takes care of selection effects.
      - Pretest/posttest design: participants are randomly assigned to at least two different groups and are tested on the key dependent variable twice, once before and once after exposure to the independent variable
        - Researchers might use a pretest/posttest design if they want to study improvement over time, or to be extra sure that two groups were equivalent at the start, as long as the pretest does not make participants change their more spontaneous behavior
  - Within-groups design (within-subjects): only one group, and each person is presented with all levels of the independent variable
    - The main advantage of a within-groups design is that it ensures the participants in the conditions will be equivalent
      - Treats each participant as his or her own control
    - Power: the probability that a study will show a statistically significant result when an independent variable truly has an effect in the population
      - Within-groups designs require fewer participants
    - 2 basic forms:
      - Repeated-measures design: participants are measured on a dependent variable more than once, after exposure to each level of the independent variable
      - Concurrent-measures design: participants are exposed to all the levels of an independent variable at roughly the same time, and a single attitudinal or behavioral preference is the dependent variable
    - Threat to internal validity: order effects
      - Being exposed to one condition changes how participants react to the other condition
      - Practice effects (fatigue):
participants either get better at the task or get tired of it
      - Carryover effects: some contamination carries over from one condition to the next
      - Use counterbalancing to cancel these out
        - Present the conditions in different sequences
        - Full counterbalancing: used if there are only 2 or 3 levels of the independent variable
          - All possible condition orders are represented
        - Partial counterbalancing: only some orders are represented
          - Can use a randomized order
          - Or a Latin square, to ensure that every condition appears in each position at least once
- 3 disadvantages of within-groups designs
  - Potential order effects
    - But these can be controlled through counterbalancing
  - Might not be possible or practical
  - A problem occurs when people see all levels of the independent variable and then change the way they would normally act
    - Demand characteristic: a cue that leads participants to guess an experiment's hypothesis
- Construct validity
  - Dependent variables: face validity
  - Independent variables: manipulation checks and pilot studies
    - A manipulation check is an extra dependent variable that researchers can insert into an experiment to convince themselves that their experimental manipulation worked
    - A pilot study is a simple study, using a separate group of participants, that is completed before (or sometimes after) conducting the study of primary interest
      - Researchers may use pilot study data to confirm the effectiveness of their manipulations before using them in a target study
- External validity
  - How were participants recruited? Random sampling?
  - Generalizing to other situations: it is sometimes necessary to consider the results of other research
- Statistical validity
  - If the result is not significant, there is no covariance
  - Effect size and sample size determine the strength of the evidence for covariance
  - The standardized effect size d represents how far apart two experimental groups are on the dependent variable
    - The standardized effect size, d, takes into account both the difference between means and the spread of scores within each group (the standard deviation).
- When d is larger, it usually means the independent variable caused the dependent variable to change for more of the participants in the study
- When d is smaller, it usually means the scores of participants in the two experimental groups overlap more
- 3 questions about potential threats to internal validity
  - 1. Did the experimental design ensure that there were no design confounds, or did some other variable accidentally covary along with the intended independent variable? (Mueller and Oppenheimer made sure people in both groups saw the same video lectures, in the same room, and so on.)
  - 2. If the experimenters used an independent-groups design, did they control for selection effects by using random assignment or matching? (Random assignment controlled for selection effects in the notetaking study.)
  - 3. If the experimenters used a within-groups design, did they control for order effects by counterbalancing? (Counterbalancing is not relevant in Mueller and Oppenheimer's design because it was an independent-groups design.)
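The standardized effect size d described above (difference between means divided by the pooled standard deviation) can be computed directly. The scores below are hypothetical, loosely echoing the longhand-vs.-laptop conceptual-question example:

```python
import statistics

def cohens_d(group1, group2):
    """Standardized effect size: mean difference / pooled standard deviation."""
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    s1, s2 = statistics.stdev(group1), statistics.stdev(group2)
    n1, n2 = len(group1), len(group2)
    pooled_sd = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (m1 - m2) / pooled_sd

# Hypothetical conceptual-question scores for two independent groups
longhand = [7, 8, 6, 9, 7, 8, 8, 7]
laptop   = [5, 6, 5, 7, 6, 5, 6, 6]
print(round(cohens_d(longhand, laptop), 2))  # prints 2.12
```

A larger d here means the two groups' score distributions overlap less, matching the interpretation in the notes above.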
Chapter 11: More on Experiments: Confounding and Obscuring Variables
- Threats to internal validity
  - Design confounds: an alternative explanation exists because the experiment was poorly designed
    - Another variable varies systematically along with the intended independent variable
  - Selection effects: the levels of the independent variable have systematically different types of participants
    - Differences among the participants, not the intended independent variable, caused the changes
  - Order effects: the outcome might be caused by the independent variable, but also by the order in which the levels of the variable are presented
    - People could be getting tired, bored, or practiced
- Threats in the one-group, pretest/posttest design
  - Maturation effects: behavior that emerges spontaneously over time, not because of any outside intervention (such as our intended variable)
    - Also note the related idea of spontaneous remission of depression (the example of the woman receiving CBT therapy in the book)
    - To prevent these, a comparison group is important
  - History threats: something specific happened between pretest and posttest
    - An external factor affects most members of the treatment group at the same time as the treatment
    - To prevent these, use a comparison group
  - Regression threats: a score that is extreme at Time 1 will tend to be closer to its usual average value at Time 2
    - Occur only when a group has an extreme score at pretest
    - To prevent, use comparison groups, and if extreme initial scores are observed, account for them
  - Attrition threats: due to a reduction in participant numbers; a problem if the attrition is systematic (if only a certain kind of participant drops out)
    - To prevent, remove those participants from the pretest average too
    - Also check whether the dropouts had extreme scores on the pretest; attrition is more of a threat if they did
  - Testing threats: a change in participants due to the effects of taking a test more than once
    - They become more practiced, or more bored
    - To prevent, perhaps abandon the pretest and use a posttest-only design
    - Or a comparison group also helps
  - Instrumentation threats: the measuring instrument changes over time; in observational research, coders might change their standards for judging behavior
    - To prevent, switch to a posttest-only design, or ensure the pretest and posttest are equivalent
    - Could also counterbalance the tests (using some of each version both times)
- Threats in any study
  - Observer bias: researchers' expectations influence their interpretation of the results
  - Demand characteristics: participants guess what the study is supposed to be about and change their behavior
  - To avoid both, do a double-blind study
    - If that is not possible, use a masked (blind) design: participants know which group they are in, but the observers do not
  - Placebo effects: people who think they are receiving a valid treatment improve
    - To avoid: include a group that gets a placebo, one that gets no therapy, and one that gets the drug
- Combined threats
  - Selection-history threat: an outside event affects only those at one level of the independent variable
  - Selection-attrition threat: only one of the experimental groups experiences attrition
- Null effect: no significant covariance between the independent and dependent variables
  - Possible reasons
    - 1. Maybe the independent variable really doesn't affect the dependent variable
    - 2. The study was not designed correctly; 2 factors:
      - Not enough between-groups difference (the levels of the independent variable don't differ enough)
        - Weak manipulation: how did the researchers operationalize the independent variable?
(construct validity)
        - Insensitive measures: the researchers did not use an operationalization of the dependent variable with enough sensitivity
          - So use detailed instruments
        - Ceiling and floor effects
          - Ceiling effect: all scores are squeezed together at the high end
          - Floor effect: all scores cluster at the low end
        - Do a MANIPULATION CHECK to detect all of these
          - This is a separate dependent variable that the experimenters include
        - A design confound acting in reverse could also cancel out the independent variable's effect
      - Too much within-groups variability (variability that obscures the effect because too much else is going on)
        - For example, too much unsystematic variability: NOISE (also called error variance or unsystematic variance)
          - Makes the effect size smaller and can obscure a group difference
        - Can be caused by:
          - Measurement error: a human or instrument factor that distorts a person's observed score relative to his or her true score
            - More random error = more variability
            - Solutions:
              - Use reliable and precise tools
              - Measure more instances, so the random errors can cancel each other out
          - Individual differences between people
            - Solutions:
              - Change the design to a within-groups design (where each person serves as his or her own before/after comparison)
              - Add more participants: you get a larger t value, and it is easier to find a significant t
          - Situation noise: external distractions that make people react differently
            - Solution: carefully control the surroundings
        - All these solutions increase power (the likelihood that a study will return a statistically significant result when the independent variable really has an effect)
      - To increase power:
        - Use a within-groups design
        - Use a strong manipulation
        - Use a larger number of participants
        - Reduce situation noise
      - More power = detect more true patterns

Chapter 12: Experiments with More Than One Independent Variable
- Experiments with two independent variables can show interactions
  - Interaction effect: whether the effect of the original independent variable depends on the level of the other independent variable
  - Interaction = a difference in differences
- Intuitive interactions
  - Crossover interaction: the lines
cross each other: "it depends"
    - There is a difference in differences
    - Example: liking for ice cream vs. pancakes depends on temperature
  - Spreading interaction: the lines are not parallel but do not cross each other: "only when"
    - Example: a dog sits when told to only when it is bribed
    - There is no difference when the condition is not met, but a large difference when it is
- Factorial design: used to test for interactions
  - One in which there are 2 or more independent variables (factors)
  - Tests each combination of the independent variables
  - Participant variable: a variable whose levels are selected or measured, not manipulated
    - Example: age
  - A form of external validity test: tests whether the effect of one independent variable generalizes across levels of another variable
  - Testing for moderators (a moderator is a variable that changes the relationship between two other variables)
- Interpreting factorial results: main effects and interactions
  - Main effect: the overall effect of one independent variable on the dependent variable, averaging over the levels of the other independent variable (a simple difference)
    - There are 2 main effects in a design with 2 independent variables
    - Marginal means are the arithmetic means for each level of an independent variable, averaging over the levels of the other independent variable
      - A simple average
    - Main effects may or may not be statistically significant
    - However, the interaction is the most important result
    - Think of a main effect as an overall effect: the overall effect of one independent variable at a time
  - Interaction effect = a difference in differences
    - To describe it, either explain both patterns, or use key phrases like "it depends" or "especially in"
    - The interaction is ALMOST ALWAYS MORE IMPORTANT than the main effects
- Factorial variations
  - Independent-groups (between-subjects) factorial: both independent variables are studied as independent-groups
    - Therefore, if the design is a 2 × 2, there are four different groups of participants in the experiment
  - Within-groups (repeated-measures) factorial: both independent variables are manipulated as within-groups.
If the design is a 2 × 2, there is only one group of participants, but they participate in all four combinations, or cells, of the design
    - Requires fewer participants
  - Mixed factorial: one independent variable is manipulated as independent-groups and the other as within-groups
- Increasing the number of levels of an independent variable
  - Notation: _ × _ (each number is the number of levels of one independent variable)
  - The quantity of numbers equals the number of independent variables
- Increasing the number of independent variables: 3 variables = _ × _ × _
  - Gives 3 main effects, 3 two-way interactions, and 1 three-way interaction
- Identifying factorial designs in reading
  - Empirical studies
    - Method section = study design
    - Results section = whether main effects and interactions are significant
  - Popular media
    - Talk of "it depends"
    - Participant variables like age, personality, gender, or ethnicity

Chapter 13: Quasi-Experiments and Small-N Designs
- Quasi-experiment: researchers do not have full experimental control; they study participants exposed to levels of an independent variable, but they might not be able to randomly assign them
- Examples of independent-groups quasi-experiments, where there are different participants at each level of the independent variable (the nonequivalent control group design)
  - At least one treatment group and one control group, but no random assignment
  - Example: people walking by a church building
    - Researchers couldn't choose random people to walk by the building
    - No random assignment
- Examples of repeated-measures quasi-experiments: the same participants experience all levels of an independent variable
  - The researcher relies on an already scheduled or planned event or policy that manipulates the independent variable
  - Example: judicial decision making
- Example of an interrupted time-series design: a quasi-experimental study that measures participants repeatedly on a dependent variable (in this example, parole decision making) before, during, and after the "interruption" caused by some event (a food break)
- Example: health insurance laws. The independent variables were not
controlled by the researchers: people were not randomly assigned to states that had health insurance laws and states that did not
- Nonequivalent control group interrupted time-series design: combines two of the previous designs (the nonequivalent control group design and the interrupted time-series design)
- Internal validity in quasi-experiments
  - The main concern is internal validity
  - Threats
    - Selection effects: relevant only for independent-groups designs, not repeated-measures designs
      - Participants at one level are systematically different from those at the others
      - To prevent: researchers can try to make matched groups to control for selection effects
      - Or use a wait-list design: all participants plan to receive the treatment, but they are randomly assigned to receive it at different times (like different time slots for plastic surgery)
    - Design confounds: an outside variable systematically varies with the targeted independent variable
      - To prevent, researchers can track data to analyze for design confounds
    - Maturation threat: an observed change emerges spontaneously over time
      - To prevent: use a comparison group
    - History threat: an external, historical event happens for everyone in the study at the same time as the treatment
      - To prevent: use a comparison group; however, this is vulnerable to a selection-history effect
      - Selection-history effect: a historical event affects only the people in the treatment group (or only those in the comparison group)
    - Regression to the mean: an extreme mean, caused by a combination of random factors, is unlikely to happen again
      - Mainly a problem when a group is selected for its unusually high or low scores at pretest
    - Attrition threat: people drop out of the study over time, and the dropout is systematic
      - Easy to check for
    - Testing and instrumentation threats
      - Testing: an order effect, because participants change with repeated testing
      - Instrumentation: the instrument could change, or a different test could be used
      - A comparison group can rule these out
    - Observer bias, demand characteristics, and placebo effects: all related to human objectivity
      - All easy to interrogate
      - For observer bias, you simply ask who measured the
behaviors. Was the design blind (masked) or double-blind? For experimental demand, you can think about whether the participants were able to detect the study's goals and respond accordingly. For placebo effects, you can ask whether the design of the study included a comparison group that received an inert, or placebo, treatment.
- Balancing priorities in quasi-experiments
  - Real-world opportunities
  - External validity: the likelihood that patterns will generalize
  - Ethics
  - Construct validity and statistical validity
    - Construct validity is generally excellent; for statistical validity, just analyze significance and effect size
- Are quasi-experiments the same as correlational studies?
  - They seem similar when a quasi-experiment uses an independent-groups design, that is, when it compares two groups without using random assignment
  - Difference: researchers do more meddling in quasi-experiments
    - They try harder to obtain specific groups, for example by targeting key locations
- Small-N designs: studying only a few individuals
  - How the sample was selected matters more than its size for external validity
- Balancing priorities in case study research: how convinced can we be?
  - Experimental control: conclusions can be made with a good research design
  - Small-N designs help researchers study special cases
- Disadvantages of small-N studies
  - Internal validity: hard to pin down the actual problem and cause
  - External validity: a few individuals may not represent the general population well
    - To explore generalizability, it might be smart to triangulate (compare case study results to research using other methods)
- 3 small-N designs, used frequently in behavior analysis
  - Stable-baseline design: the researcher observes behavior for a baseline period before beginning the treatment
    - If behavior during the baseline is stable, the researcher is more certain of the treatment's effectiveness
    - If the researchers had collected a single baseline measure before the new treatment and a single test afterward (a before-and-after comparison), the improvement could be explained by any number of factors, such as maturation (spontaneous change), regression to the mean, or a history effect. (In fact, a single baseline record followed by a single posttreatment measure would have been a small-N version of the "really bad experiment"; see Chapter 11.) Instead, the researchers recorded an extended, stable baseline, which made it unlikely that some sudden, spontaneous change just happened to occur right at the time the new therapy began. Furthermore, the stable baseline meant there was not a single, extreme low point from which improvement would almost definitely occur (a regression effect). Performance began low and stayed low until the experimental technique was introduced.
    - The baseline gave the study INTERNAL VALIDITY
  - Multiple-baseline design: researchers stagger their introduction of an intervention across a variety of individuals, times, or situations to rule out alternative explanations
    - Ruling out alternative explanations gives the design INTERNAL VALIDITY to support causality
  - Reversal design: a researcher observes a problem behavior both with and without treatment, but takes the treatment away for a while (the reversal period) to see whether the problem behavior returns (reverses)
    - Reversal designs are appropriate mainly for situations in which a treatment may not cause lasting change
    - There are also some ethical issues here
- Evaluating the 4 validities in small-N designs
  - Internal validity, if the study is conducted with a good design, is quite high
  - External validity: a question of generalizability
    - Enhanced by triangulating (combining with the results of other small-N studies) and by specifying the population to which researchers want to generalize; sometimes they do not even care about generalizing
  - Construct validity: are the measurements reliable and valid?
    - Use multiple observers and check for interrater reliability in the case of observation
  - Statistical validity: generally not traditional statistics
    - But effect sizes are important

Chapter 14: Replication, Generalization, and the Real World
- Results must be replicable
- 3 types of replication studies
  - Direct replication: researchers repeat the original study as closely as they can
    - Can usually never follow every exact detail, however
    - Problem: if there are threats to internal validity or construct validity in the original study, they are now repeated, and the replication does not test the finding in a new context
  - Conceptual replication: researchers explore the same research question but with different procedures
    - Same conceptual variables, but different procedures to operationalize them
  - Replication-plus-extension: researchers replicate the original experiment and add variables to test additional questions
    - New tests or conditions, or a new independent variable
- The replication
debate in psychology: there is a replication crisis because direct replication rates have been poor
  - Why do replication studies fail?
    - One reason: the replication contexts were too different from the originals
    - Another: the OSC (Open Science Collaboration) conducted only one replication attempt per original study
    - Another: problems with the original studies
      - Sample sizes were too small, or questionable research practices were employed
        - HARKing: hypothesizing after the results are known
        - p-hacking: running different analyses, or selectively including individuals, until the data support the conclusion
  - Improvements to scientific practice
    - Require larger sample sizes
    - Open science: the practice of sharing data openly for replication and verification
    - Preregistration of methods in advance of data collection
- Scientific literature: a series of related studies, conducted by various researchers, that have tested similar variables
  - Meta-analysis: a way of mathematically averaging the results of all published and unpublished studies that have tested the same variables
    - Strengths
      - Can sort studies by moderators and compute separate effect sizes
      - Usually uses all peer-reviewed data
      - Can tell the overall effect size
    - Limitations
      - File drawer problem: the idea that a meta-analysis might be overestimating the true size of an effect because null effects, or even opposite effects, have not been included in the collection process
        - To combat this, researchers should contact colleagues for both published and unpublished data
- Reliability, importance, and popular media
  - Good journalists give a sense of the literature, not only the most recent study
  - Some merely summarize press releases and don't question the study
- To be important, must a study have external validity?
  - Direct replication studies don't support external validity, but conceptual replications and replication-plus-extension studies can
  - Generalizing to other participants: to analyze this, ask how the participants were obtained
    - If it was a convenience sample, we can't be sure
    - How the sample was obtained matters more than the sample size
    - You can choose which population you represent
  - Generalizing to other situations
    - Conceptual replications illustrate this
    - Does it generalize to the real world? (ecological validity or mundane realism)
  - Does a study have to be generalizable to many people?
    - Not really; it depends on the research mode of the researcher
    - Theory-testing mode: designing correlational or experimental research to investigate support for a theory
    - Generalization mode: researchers want to generalize the findings from the sample in a study to a larger population
      - Frequency claims are always in generalization mode
      - Association and causal claims are sometimes in generalization mode
  - Cultural psychology: a special case of generalization mode (a subdiscipline of psychology focusing on how cultural contexts shape the way a person thinks, feels, and behaves)
    - Cultural psychology has challenged researchers who work exclusively in theory-testing mode by identifying several theories that were supported by data in one cultural context but not in others
      - Müller-Lyer illusion: people in North America and Europe fall for this illusion, but not people around the whole world
        - The Segall team used their global data to conclude that people who grow up in a "carpentered world" have more visual experience with right angles as cues for depth perception than people who grow up in other societies.
- Researchers cannot take generalization for granted
- Figure and ground
  - Japanese students tended to focus first on the background of a fish scene, whereas North American students recognized the fish (the focal figure) first
- Theory testing using WEIRD participants
  - WEIRD: Western, educated, industrialized, rich, democratic
- Does a study have to take place in a real-world setting?
  - Real-world (field) settings have a built-in advantage for external validity
  - However, findings can also be replicated in the lab (experimental realism)
- Generalization mode and the real world
  - Because external validity is of primary importance when researchers are in generalization mode, they might strive for a representative sample of a population. But they might also try to enhance the ecological validity of a study in order to ensure its generalizability to nonlaboratory settings.
- Theory-testing mode and the real world
  - When a researcher is working in theory-testing mode, external validity and real-world applicability may be lower priorities.
  - Theory-testing mode prioritizes internal validity at the expense of all other considerations, including ecological validity.
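As a closing illustration of Chapter 14's "mathematically averaging" idea behind meta-analysis, here is a rough sketch that combines effect sizes (d) from several hypothetical studies, weighting each by its sample size so larger studies count more. Real meta-analyses use inverse-variance weights and more careful models; this is only the averaging intuition.

```python
# Hypothetical studies: each has an effect size d and a sample size n.
studies = [
    {"d": 0.40, "n": 50},    # small study, medium effect
    {"d": 0.10, "n": 400},   # large study, small effect
    {"d": 0.55, "n": 80},    # small study, larger effect
]

# Sample-size-weighted average effect size (the averaging intuition).
weighted_sum = sum(s["d"] * s["n"] for s in studies)
total_n = sum(s["n"] for s in studies)
overall_d = weighted_sum / total_n
print(round(overall_d, 3))  # prints 0.196
```

Note how the large study pulls the overall estimate down toward its small effect; this is also why the file drawer problem matters, since leaving out unpublished null-effect studies would inflate the overall d.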