Research Methods Exam 2 Study Guide Spring 2020

Chapter 4 Psychological Measurement

Psychological Constructs
• Constructs: variables that cannot be observed directly
• E.g., personality traits, emotional states, attitudes, and abilities
• Psychological constructs cannot be observed directly because:
- They typically represent tendencies • E.g., someone who is extraverted is not being extraverted 100% of the time
- Oftentimes traits are internal processes • E.g., neurons and sensory systems

Conceptual Definition
• Conceptual Definition: describes the behaviors and internal processes that make up a construct, along with how it relates to other variables
• E.g., neuroticism can be related to many other variables and situations

Operational Definition
• Operational Definition: a definition of a variable in terms of precisely how it is to be measured
• Operational definitions generally fall into one of three broad categories:
1. Self-Report Measures: those in which participants report on their own thoughts, feelings, and actions (e.g., a self-esteem measure)
2. Behavioral Measures: those in which some other aspect of participants' behavior is observed and recorded (e.g., the Bobo Doll Study)
3.
Physiological Measures: those that involve recording any of a wide variety of physiological processes (e.g., heart rate and blood pressure)

Converging Operations
• Converging Operations: various operational definitions converge on the same construct

Psychometrics
• Psychometrics: the measurement of psychological variables and constructs

Self-Report Measures
• Self-Report Measures: those in which participants report on their own thoughts, feelings, and actions (e.g., a self-esteem measure)

Behavioral Measures
• Behavioral Measures: those in which some other aspect of participants' behavior is observed and recorded (e.g., the Bobo Doll Study)

Physiological Measures
• Physiological Measures: those that involve recording any of a wide variety of physiological processes (e.g., heart rate and blood pressure)

Levels of Measurement
• Four different levels of measurement
• Nominal Level: used for categorical variables; involves assigning scores that are category labels • E.g., marital status, ethnicity, color
• Ordinal Level: involves assigning scores so that they represent rank order • E.g., rankings, Likert scales
• Interval Level: involves assigning scores using numerical scales in which intervals have the same interpretation throughout; does not have an "absolute zero" • E.g., Fahrenheit or Celsius temperature, IQ tests, SAT scores
• Ratio Level: has equal intervals between values and a theoretical "true zero" • E.g., height, weight, money, number of siblings, test grades

Reliability
• Reliability: refers to the consistency of a measure
• Psychologists consider three types of consistency:
1. Test-retest reliability (consistency over time)
2. Internal consistency (consistency across items)
3.
Inter-rater reliability (consistency across raters)

Test-Retest Reliability
• Test-Retest Reliability: the extent to which scores on a measure are consistent across time for the same individual
• E.g., someone who is intelligent today will likely be intelligent later

Internal Consistency
• Internal Consistency: the extent to which the items on a multiple-item measure are consistent with each other
• Ensures that all the items on a measure reflect the same underlying construct
• E.g., on a self-esteem scale, people who agree that they are a person of worth should also tend to agree that they have a number of good qualities

Split-Half Correlation
• Split-Half Correlation: the correlation between scores based on one half of the items on a multiple-item measure and scores based on the other half of the items
• E.g., correlating responses on the even- and odd-numbered questions of a self-esteem questionnaire

Inter-Rater Reliability
• Inter-Rater Reliability: how similar two raters' scores are
• E.g., to measure university students' social skills, you could make video recordings of them as they interact, then have two or more observers watch the videos and rate each student. Ideally, the raters' scores should be highly correlated with each other.

Validity
• Validity: the extent to which scores on a measure represent the variable or construct they are intended to measure
• A measure can have great reliability but no validity
• How do we decide if a measure is valid? Three types of validity:
• Face validity
• Content validity
• Criterion validity

Face Validity
• Face Validity: the extent to which a measurement method appears "on its face" to measure the construct of interest. Does it look valid?
• E.g., most people would expect a self-esteem questionnaire to include items about whether they see themselves as a person of worth and as having good qualities
• Face validity is at best a very weak kind of evidence
• It's based on people's intuition

Content Validity
• Content Validity: the extent to which a measure "covers" the construct of interest
• Does it measure all aspects of the construct?
• E.g., if someone conceptually defines test anxiety as involving both anxiety and negative thoughts, then a measure should include both

Criterion Validity
• Criterion Validity: the extent to which people's scores on a measure are correlated with other variables that one would expect them to be correlated with
• How related are scores on the measure to scores on a related criterion?
• Criterion: any variable that one has reason to think should be correlated with the construct being measured
• Convergent Validity: comparing one test to another test measuring the same construct

Discriminant Validity
• Discriminant Validity: the extent to which scores on a measure are not correlated with measures of variables that are conceptually distinct
• E.g., self-esteem measures should not be highly correlated with measures assessing mood

Conceptually Defining the Construct
• Good measurement requires a clear and complete conceptual definition of a construct
• This allows us to make sound decisions about how to measure the construct
• E.g., memory: how would you define this? Psychologists have broken memory into multiple facets (e.g., verbal memory, working memory, episodic memory, etc.)
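Several of the checks above (split-half reliability, test-retest reliability, criterion validity) come down to correlating two sets of scores. A minimal sketch in Python, using made-up questionnaire data (the `pearson_r` helper and the half-scores are hypothetical, not from the text):

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical split-half check: each participant's total on the
# odd-numbered items vs. their total on the even-numbered items.
odd_half = [12, 15, 9, 18, 14, 11]
even_half = [13, 14, 10, 17, 15, 10]
print(round(pearson_r(odd_half, even_half), 2))  # high r suggests internal consistency
```

The same function could correlate a measure with an external criterion (criterion validity) or with scores from a second testing session (test-retest reliability).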
Operationally Defining the Construct
• Conceptual definitions must be transformed into something that can be directly observed and measured
• Most variables can be defined in many different ways
• E.g., stress can be defined as scores on a scale or as the cortisol concentration in saliva

Implementing the Measure
• You will generally want to implement any measure in a way that maximizes reliability and validity
• Test the measure under similar conditions for everyone (e.g., a quiet room)

Reactivity
• Reactivity: how people react when being measured
• Socially Desirable Responding: doing or saying things because they seem like the socially appropriate thing
• Demand Characteristics: subtle cues that reveal how the researcher expects participants to behave
• E.g., participants' attitudes toward exercise measured immediately after they read a passage about the dangers of heart disease

Chapter 5 Experimental Research

What is an Experiment?
• Experiment: a type of study designed specifically to answer the question of whether there is a causal relationship between two variables
• Do changes in the IV (independent variable) cause changes in the DV (dependent variable)?

Three Main Features
• There are three main features of an experiment:
- Manipulation of the independent variable
- All other variables are held constant
- Random assignment to groups
• If the IV is not manipulated, it is not an experiment!
• If all other variables are not held constant, it is not an experiment!
• If participants are not randomly assigned to groups, it is not an experiment!
• You have to have ALL three features in order for it to be an experiment

Conditions / Levels
• Conditions: the levels of the independent variable

Treatment Condition
• Treatment condition: receives some form of treatment
• Treatment: an intervention intended to change people's lives for the better

Experimental Condition
• Experimental condition: receives some form of the IV
• There can be multiple experimental conditions

Control Condition
• Control condition: participants do not receive any form of treatment or the IV

Types of Control Conditions
• No-treatment control condition: a control condition in which participants receive no treatment whatsoever, not even a placebo
• Placebo: a treatment that lacks any active ingredient or element that should make it effective
• Placebo effect: the positive effect of a placebo
• The placebo effect poses a serious problem for researchers who want to determine whether a treatment works
• Wait-list control condition: receives the treatment at a later point

Extraneous Variables
• Extraneous Variables: variables other than the IV and DV
• You generally want to control for extraneous variables
• E.g., keeping the setting the same for all participants

Confounding Variables
• When an extraneous variable changes with the IV, it is called a confounding variable
• Random assignment minimizes the influence of confounding variables on the DV, and therefore minimizes threats to internal validity

Internal Validity
• Internal Validity: the degree to which we can confidently infer a causal relationship between variables
• This is highest in experimental research and lowest in non-experimental research (e.g., correlational studies)

Manipulation of the Independent Variable
• Manipulate: change the IV's levels systematically so that different groups of participants are exposed to different levels of that variable, or the
same group of participants is exposed to different levels at different times
• E.g., to see whether expressive writing affects people's health, a researcher might instruct some participants to write about traumatic experiences and others to write about neutral experiences

Single-Factor Two-Level Design
• Single-Factor Two-Level Design: an experiment with a single IV with two levels

Single-Factor Multi-Level Design
• Single-Factor Multi-Level Design: an experiment with one IV with more than two conditions

Between-Subjects Design
• Between-subjects design and within-subjects design
• The two can also be combined into what is called a mixed design
• Between-subjects experiment: an experiment in which each participant is tested in only one condition
• Participants are randomly assigned to different groups
• E.g., Treatment A vs. Treatment B: participants are randomly assigned to either group

Random Assignment
• Random assignment: each participant has an equal chance of being assigned to each condition
• Random assignment is different from random selection
• Random assignment minimizes the influence of confounding variables on the DV, and therefore minimizes threats to internal validity
• In its strictest sense, random assignment should meet two criteria:
• One is that each participant has an equal chance of being assigned to each condition (e.g., a 50% chance of being assigned to each of two conditions)
• The second is that each participant is assigned to a condition independently of other participants

Block Randomization
• Block Randomization: all conditions occur once in the sequence before any of them is repeated

Matched Groups
• Matched-Groups Design: an alternative to random assignment.
Participants in the various conditions are matched on the dependent variable or on some extraneous variable prior to the manipulation of the IV
• This guarantees that these variables will not be confounded across the experimental conditions
• E.g., in a study examining whether expressive writing affects people's health, the experimenter could start by measuring various health-related variables, match participants on how healthy or unhealthy they are, and then assign them to groups

Types of Carryover Effects
• Carryover effect: an effect of being tested in one condition on participants' behavior in later conditions
• Practice effect: participants perform better on a task in later conditions because they have had practice
• Fatigue effect: participants perform worse on a task in later conditions because they have become tired or bored
• Context effect: participants perceive or interpret their task according to the context of previous tasks

Minimizing Carryover Effects with Counterbalancing
• Counterbalancing: systematically varying the order of conditions across participants
• Two ways to think about what counterbalancing accomplishes:
• It controls the order of conditions so that it is no longer a confounding variable
• If there are carryover effects, it makes it possible to detect them

Four Big Validities: Internal Validity
• Internal Validity: the degree to which we can confidently infer a causal relationship between variables
• This is highest in experimental research and lowest in non-experimental research (e.g., correlational studies)

External Validity
• External Validity: the extent to which findings can be generalized to people and contexts beyond the experiment
• Mundane Realism: when the participants and the situation studied are similar to those that the
researchers want to generalize to and encounter every day
• Psychological Realism: when the same mental process is used in both the laboratory and the real world

Construct Validity
• Construct Validity: ensuring the research question is clearly operationalized by the study's methods
• In the Darley and Latané study, the researchers were interested in the question "does helping behavior become diffused?"
• They hypothesized that participants in a lab would be less likely to help when they believed there were other potential helpers besides themselves. This conversion from research question to experimental design is called operationalization (the operational definition)
• They operationalized the IV of diffusion of responsibility by increasing the number of potential helpers (during a staged crisis, more bystanders means more potential helpers)

Statistical Validity
• Statistical Validity: concerns the proper statistical treatment of data and the soundness of the researchers' statistical conclusions
• Types of tests commonly used: t-tests, ANOVA, regression, correlation
• A study also needs enough participants to detect statistical significance
• This is generally determined through what is called a power analysis

Experimenter Expectancy Effect
• Rosenthal and Fode (1963) conducted a study in which students trained genetically similar rats to run a maze. Students were told their rats were either "maze-bright" or "maze-dull"
• What do you think happened?

Experimenter Bias
• Experimenter bias: when researchers' biases inadvertently affect participants' behaviors

Biosocial Effects
• Biosocial effects: when an experimenter's characteristics affect participants' behaviors

Psychosocial Effects
• Psychosocial effects: when an experimenter's attitude/personality affects participants' behavior

Blind and Double-Blind Studies
• Double-blind study: neither the experimenters nor the participants know what group the participants are in.
• Single-blind study: the participants do not know what condition they are in

Chapter 6 Non-Experimental Research

Non-Experimental Research
• Non-Experimental Research: research that lacks the manipulation of an independent variable
• Research methods that do not meet the three criteria of an experiment:
• Manipulation of an IV
• Holding extraneous variables constant
• Random assignment

When to Use Non-Experimental Research
• The research question or hypothesis relates to a single variable rather than a statistical relationship between two variables (e.g., how accurate are people's first impressions?)
• The research question pertains to a non-causal statistical relationship (e.g., the correlation between verbal intelligence and mathematical intelligence)
• The research question is about a causal relationship, but the independent variable cannot be manipulated or participants cannot be randomly assigned (e.g., does damage to the hippocampus impair one's ability to remember?)
• The research question is broad and exploratory (e.g., what is it like to be a working mother diagnosed with depression?)
Single-Variable Research
• Description of a single variable
• E.g., Milgram's obedience study

Correlational Research
• Correlational research: the IV is not manipulated and there is no random assignment
• No attempt to control for extraneous variables

Quasi-Experimental Research
• Quasi-experimental research: no random assignment, but the IV is manipulated across conditions

Observational Research
• A research technique in which you observe participants and phenomena in their most natural settings

Non-Experimental Research and Internal Validity
• Internal validity is compromised in non-experimental research because extraneous variables are not held constant
• In order of lowest internal validity to highest:
• Correlational (low)
• Quasi-experimental (moderate)
• Experimental (high)

Correlational Research
• A type of non-experimental research
• The researcher measures and assesses the relationship between two variables
• Why use correlational research?
• The researcher does not think the relationship is causal
• The researcher cannot manipulate the IV
• Variables can be quantitative or categorical
• E.g., the relationship between gender (categorical) and verbal fluency (quantitative)
• E.g., the relationship between age (quantitative) and verbal fluency (quantitative)

Pearson's Correlation Coefficient
• Scatterplots are often used to represent correlations
• Pearson's Correlation Coefficient (Pearson's r): used to represent the strength of a correlation

Quantitative Research
• Quantitative Research: starts with a focused research question or hypothesis
• Collect a small amount of numerical data from a large number of individuals
• Describe the resulting data using statistical techniques
• Draw general conclusions about some large population

Qualitative Research
• Qualitative Research: originated in the disciplines of anthropology and sociology
• Begins with a less focused research question
• Collect large amounts of relatively unfiltered data from a small number of individuals
• Describe data using nonstatistical
techniques such as grounded theory, thematic analysis, critical discourse analysis, or interpretative phenomenological analysis

Data Analysis in Qualitative Research
• Whether data are quantitative or qualitative depends more on what researchers do with the data they collected
• E.g., an interview about religion and alcohol
• What does qualitative data analysis look like?

Grounded Theory
• Grounded Theory: done in stages
• First, researchers identify ideas that are repeated throughout the data
• Then they organize these ideas into a smaller number of broader themes
• Then they write Theoretical Narratives: an interpretation of the data in terms of the themes they have identified
• These narratives focus on the subjective experiences of the participants and are usually supported by many direct quotations from the participants themselves

Mixed-Methods Research / Triangulation
• Mixed-Methods Research and Triangulation are two ways to combine quantitative and qualitative research
• E.g., use qualitative research for hypothesis generation and quantitative research for hypothesis testing

Naturalistic Observation
• Naturalistic Observation: an observational method that involves observing people's behavior in the environment in which it typically occurs
• A type of field research (e.g., Jane Goodall's chimpanzee research)

Disguised Naturalistic Observation
• Disguised Naturalistic Observation: when researchers engage in naturalistic observation by making their observations as unobtrusively as possible so that participants are not aware that they are being studied

Undisguised Naturalistic Observation
• Undisguised Naturalistic Observation: where
the participants are made aware of the researcher's presence and monitoring of their behavior

Drawbacks of Naturalistic Observation
• Reactivity: when a measure changes participants' behavior
• Hawthorne effect: the alteration of behavior by the subjects of a study due to their awareness of being observed
• People get used to being observed (e.g., reality TV shows)

Participant Observation
• Participant Observation: researchers become active participants in the group or situation they are studying
• The rationale is that there may be important information that is only accessible to, or can be interpreted only by, someone who is an active participant in the group or situation
• Disguised participant observation: the researchers pretend to be members of the social group they are observing and conceal their true identity as researchers
• Undisguised participant observation: researchers become part of the group they are studying and disclose their true identity as researchers to the group under investigation

Structured Observation
• Structured Observation: the investigator makes careful observations of one or more specific behaviors in a setting that is more structured than the settings used in naturalistic or participant observation
• E.g., Mary Ainsworth and the Strange Situation
• Helps investigate a limited set of behaviors
• Far more efficient than naturalistic and participant observation
• Decreased external validity
• Coding: a part of structured observation whereby observers use a clearly defined set of guidelines to "code" behaviors (assigning specific behaviors they are observing to a category) and count the number of times the behavior occurs or its duration
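The coding step described above is essentially mapping each observed behavior to a predefined category and counting occurrences. A minimal sketch, with a hypothetical coding scheme and made-up session data (nothing here comes from Ainsworth's actual protocol):

```python
from collections import Counter

# Hypothetical coding scheme for a Strange-Situation-style session:
# each observed behavior must be assigned to one predefined category.
CATEGORIES = {"approaches caregiver", "cries", "explores", "avoids caregiver"}

# One observer's coded record of a session (made-up data).
observations = [
    "explores", "explores", "cries", "approaches caregiver",
    "cries", "explores", "avoids caregiver", "cries",
]

# Every code must come from the agreed-upon scheme.
assert all(code in CATEGORIES for code in observations)

counts = Counter(observations)
print(counts["cries"])  # number of crying episodes coded in this session
```

Running two observers' records through the same tally (and correlating their counts) is one simple way to check inter-rater reliability.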
Case Study
• Case Study: an in-depth examination of an individual, a social unit (e.g., a cult), or an event (e.g., a natural disaster)
• Tends to be more qualitative in nature
• Useful because case studies provide a level of detailed analysis not found in many other research methods
• Often the only way to study rare conditions

Patient H.M.
• A 27-year-old man who experienced severe seizures
• Underwent brain surgery to remove his hippocampus and amygdala
• What do you think happened?
• His seizures were reduced, but he lost the ability to form new memories • E.g., he could not learn who the new president was
• However, he was able to learn new skills • E.g., learning how to use a computer without remembering having learned it

Archival Research
• Archival data: data that have already been collected for another purpose
• Newspapers, census data, institutional records, hospital records
• Content analysis: analyzing the content
• E.g., the number of times nature-based terms are used in a dictionary

Chapter 12 Descriptive Statistics

Descriptive Statistics
• Descriptive Statistics: a set of techniques for summarizing and displaying data

Distribution
• Distribution: the way the scores are distributed across the levels of a variable
• Example: in a sample of 100 university students, the distribution of the variable "sex" might be such that 44 have a score of "male" and 56 have a score of "female"

Frequency Tables
• Frequency Table: one way to display the distribution of a variable

Histograms
• Histogram: a graphical display of a distribution

Central Tendency
• Central Tendency: the point around which the scores in the distribution tend to cluster
• Mean: the sum of the scores divided by the number of scores
• Median: the middle score, in the sense that half the scores in the distribution are less than it and half are greater than it
• Mode: the most frequent score in a distribution

Variability
• Variability: the extent to which the scores vary around their central tendency
• Range: the difference between the highest and lowest scores in the distribution
• Standard deviation: the average distance between the scores and the mean
• Variance: the mean of the squared differences from the mean

Percentile Rank
• Percentile Rank: the percentage of scores in the distribution that are lower than a given score
• A percentile rank of 80 means you scored higher than 80 percent of people

Z-Score
• Z-Score: indicates how far above or below the mean a raw score is, expressed in terms of the standard deviation
• The difference between the individual's score and the mean of the distribution, divided by the standard deviation of the distribution
• Z = (X − M) / SD
• Example: a distribution of IQ scores with a mean of 100.
A single score is 110, and the standard deviation is 15
• (110 − 100) / 15 = +0.67
• So a score of 110 is 0.67 standard deviations above the mean
• Z-scores are important because they:
• Provide a way of describing where an individual's score is located within a distribution, and are sometimes used to report the results of standardized tests
• Provide one way of defining outliers • If a z-score is less than −3.00 or greater than +3.00, the score is probably an outlier, because it is more than 3 standard deviations away from the mean
• Play an important role in understanding and computing other statistics

Effect Size
• Effect Size: describes the strength of a statistical relationship
• Small = 0.20 • Medium = 0.50 • Large = 0.80 or greater

Cohen's d
• Cohen's d: a measure of the effect size for a difference between two groups or conditions
• The difference between the two means divided by the pooled standard deviation
• d = (M1 − M2) / SDpooled
• Typically compares a treatment group to a control group • The treatment group mean is usually M1 • Otherwise, the larger mean is usually M1 so that d is positive

Line Graphs
• Line Graph: a graph used to show the relationship between two variables
• In general, line graphs are used when the variable on the x-axis has (or is organized into) a small number of distinct values
• Scatterplots are used when the variable on the x-axis has a large number of values

Linear Relationships
• Linear Relationship: a statistical relationship in which, as the X variable increases, the Y variable changes at a constant rate • Best described by a straight line

Nonlinear Relationships
• Nonlinear Relationship: a statistical relationship in which, as the X variable increases, the Y variable does not increase or decrease at a constant rate • Best described by a curved line

Bar Graphs
• Bar Graphs: generally used to present and compare the mean scores for two or more groups or conditions
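The z-score and Cohen's d formulas above can be checked with a short sketch. The IQ numbers are the example from the text; the treatment/control means and pooled SD in the second call are made-up numbers for illustration:

```python
def z_score(x, mean, sd):
    """Z = (X - M) / SD: distance from the mean in SD units."""
    return (x - mean) / sd

def cohens_d(m1, m2, sd_pooled):
    """d = (M1 - M2) / SDpooled: standardized mean difference."""
    return (m1 - m2) / sd_pooled

# The IQ example: score 110, mean 100, SD 15.
print(round(z_score(110, 100, 15), 2))  # 0.67

# Hypothetical treatment (M1 = 85) vs. control (M2 = 80) with a
# pooled SD of 10: d = 0.5, a "medium" effect by Cohen's guidelines.
print(cohens_d(85, 80, 10))  # 0.5
```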
Line Graphs
• Line Graphs: used when the IV is measured in a more continuous manner (e.g., time), or to present correlations between quantitative variables when the IV has a relatively small number of distinct levels
• Each point should represent the mean score on the DV for participants at one level of the IV

Scatterplots
• Scatterplots: used to present correlations and relationships between quantitative variables when the variable on the x-axis has a large number of levels
• Each point represents an individual rather than the mean for a group of individuals
• No line connects the points

Chapter 13 Inferential Statistics

Statistics
• Statistics: descriptive summary values (e.g., means, correlation coefficients) computed by measuring one or more variables in a sample

Parameters
• Parameters: the corresponding values in the population

Sampling Error
• Sampling Error: the random variability in a statistic from sample to sample

Null Hypothesis Testing
• The purpose of null hypothesis testing is to help researchers decide between two interpretations:
• There is a relationship in the population, and the relationship in the sample reflects this
• There is no relationship in the population, and the relationship in the sample reflects only sampling error
• Null Hypothesis Testing: a formal approach to deciding between these two interpretations of a statistical relationship in a sample

Null Hypothesis
• Null Hypothesis (H0): the idea that there is no relationship in the population and that the relationship in the sample reflects only sampling error
• Informally, the null hypothesis is that the sample relationship "occurred by chance"

Alternative Hypothesis
• Alternative Hypothesis (HA or H1): the hypothesis that there is a relationship in the population and that the relationship in the sample reflects this relationship in the population

Logic of Null Hypothesis Testing
• Assume for the moment that the null hypothesis is
true: there is no relationship between the variables in the population
• Determine how likely the sample relationship would be if the null hypothesis were true
• If the sample relationship would be extremely unlikely, then reject the null hypothesis in favor of the alternative hypothesis. If it would not be extremely unlikely, then retain the null hypothesis

P-value
• p-value: the probability of obtaining the sample result, or a more extreme result, if the null hypothesis were true

α (alpha)
• α (alpha): the criterion for how low a p-value must be before the sample result is considered unlikely enough to reject the null hypothesis (usually set to .05)

Statistical Significance
• Statistically Significant: an effect that is unlikely to be due to random chance and therefore likely represents a real effect in the population

Sample Size and Relationship Strength
• The stronger the sample relationship and the larger the sample, the less likely the result would be if the null hypothesis were true
• Imagine a study in which a sample of 500 women is compared with a sample of 500 men in terms of some psychological characteristic, and Cohen's d is a strong 0.50
• If there were really no sex difference in the population, then a result this strong based on such a large sample would be highly unlikely
• Sometimes the result can be weak and the sample large, or the result can be strong and the sample small
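The interplay of relationship strength and sample size can be made concrete. For two independent groups of equal size n, the t statistic reduces algebraically to t = d × √(n/2), so the same small effect crosses the significance threshold once n is large enough. A sketch (the 1.96 cutoff is the large-sample two-tailed critical value at α = .05; the sample sizes are made-up):

```python
import math

def t_from_d(d, n_per_group):
    """Independent-samples t for equal group sizes n:
    t = (M1 - M2) / (SDpooled * sqrt(2/n)) = d * sqrt(n/2)."""
    return d * math.sqrt(n_per_group / 2)

CRITICAL = 1.96  # approximate two-tailed critical value at alpha = .05

# A weak effect (d = 0.20, Cohen's "small") with 50 per group:
print(t_from_d(0.20, 50) > CRITICAL)    # t = 1.0 -> False, not significant

# The same weak effect with 1000 per group:
print(t_from_d(0.20, 1000) > CRITICAL)  # t ≈ 4.5 -> True, significant
```

This is exactly why a statistically significant result is not necessarily a strong one: with a big enough n, even d = 0.20 rejects the null.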
Statistical Significance vs. Practical Significance
• A statistically significant result is not necessarily a strong one
• Even a weak result can be statistically significant if the n is big enough
• Practical Significance: the importance or usefulness of the result in some real-world context
• E.g., many sex differences are statistically significant but not practically significant

Basic Null Hypothesis Tests: t-Test
• t-Test: focuses on the difference between two means
• The one-sample t-test
• The dependent-samples t-test
• The independent-samples t-test

One-Sample t-test
• One-Sample t-test: used to compare a sample mean (M) with a hypothetical population mean (µ0) that provides some interesting standard of comparison
• The null hypothesis is that the mean of the population is equal to the hypothetical population mean (µ = µ0)
• HA: the mean of the population is different from the hypothetical population mean (µ ≠ µ0)

Dependent-Samples t-test
• Paired-Samples t-test (dependent-samples t-test): used to compare two means for the same sample tested at two different times or under two different conditions
• Most appropriate for pretest-posttest designs or within-subjects experiments
• H0 is that the population means at the two times or under the two conditions are the same
• HA is that they are not the same
• This test can be one-tailed if the researcher has good reason to expect the difference to be in a certain direction

Independent-Samples t-test
• Independent-Samples t-test: used to compare the means of two separate samples (M1 and M2)
• The two samples might have been tested under different conditions in a between-subjects experiment
• They could also be pre-existing groups in a cross-sectional design (e.g., men vs. women, extraverts vs. introverts)
• The null hypothesis is that the means of the two populations are the same

Two-Tailed Test
• Two-tailed test: we reject the null hypothesis if the test statistic for the sample is extreme in either direction (+/−)

One-Tailed
test •One-tailed test: Where we reject the null hypothesis only if the t-score for the sample is extreme in one direction that we specify before collecting the data • Advantage of the one-tailed test is that critical values are less extreme • However, if the sample mean differs from the hypothetical population mean in the unexpected direction, then there is no chance at all of rejecting the null Analysis of Variance •When there are more than two groups or condition means to compare, the most common null hypothesis test is the Analysis of Variance (ANOVA). One-way ANOVA • One-way ANOVA: used for between-subjects designs with a single independent variable •Used to compare the means of more than two samples in a between-subjects design • H0 is that all the means are equal in the population • The test statistic for the ANOVA is called F • mean squares between groups (MSB): estimate of the population variance and is based on the differences among the sample means • mean squares within groups (MSW) :based on the differences among the scores within each group One-way Repeated Measures ANOVA • One-Way Repeated Measures ANOVA: used for within-subjects design with a single IV •Repeated-Measures ANOVA: used for within-subjects design with a single IV • Imagine, for example, that the dependent variable in a study is a measure of reaction time. • In a between-subjects design, these stable individual differences would simply add to the variability within the groups and increase the value of MSW (which would, in turn, decrease the value of F). • In a within-subjects design, however, these stable individual differences can be measured and subtracted from the value of MSW. This lower value of MSW means a higher value of F and a more sensitive test. 
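The three t-tests and the one-way ANOVA above can all be run with SciPy. The sketch below uses invented, randomly generated data (the group means, SDs, and sample sizes are assumptions for illustration, not from the guide):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# One-sample t-test: compare a sample mean (M) to a hypothetical
# population mean (here, mu0 = 100)
sample = rng.normal(loc=105, scale=15, size=30)
t1, p1 = stats.ttest_1samp(sample, popmean=100)

# Paired-samples t-test: the same participants measured twice (pre/post)
pre = rng.normal(loc=50, scale=10, size=25)
post = pre + rng.normal(loc=3, scale=5, size=25)  # hypothetical improvement
t2, p2 = stats.ttest_rel(pre, post)

# Independent-samples t-test: two separate groups (M1 vs M2)
group1 = rng.normal(loc=60, scale=12, size=40)
group2 = rng.normal(loc=55, scale=12, size=40)
t3, p3 = stats.ttest_ind(group1, group2)

# One-way ANOVA: more than two group means in a between-subjects design
g1, g2, g3 = (rng.normal(m, 10, size=30) for m in (60, 65, 70))
f_stat, p_anova = stats.f_oneway(g1, g2, g3)
```

In each case we would reject H0 when the returned p value falls below α (conventionally .05).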
Factorial ANOVA
• Factorial ANOVA: used for between-subjects designs with more than one IV
• The main difference is that a factorial ANOVA produces an F ratio and p value for each main effect and for each interaction
• Returning to our calorie estimation example, imagine that the health psychologist tests the effect of participant major (psychology vs. nutrition) and food type (cookie vs. hamburger) in a factorial design
• A factorial ANOVA would produce separate F ratios and p values for the main effect of major, the main effect of food type, and the interaction between major and food type
Mixed ANOVA
• Mixed ANOVA: used to compare one or more between-subjects IVs and one or more within-subjects IVs
Post Hoc Comparisons
• When we reject the null hypothesis in a one-way ANOVA, we conclude that the group means are not all the same in the population. But this can indicate different things
• With three groups, it can indicate that all three means are significantly different from each other
• Or one mean might be significantly different from the other two, with no difference between the remaining two
• E.g., the mean calorie estimates of psychology majors, nutrition majors, and dieticians might all be significantly different from each other. Or the mean for dieticians might be significantly different from the means for psychology and nutrition majors, while the means for psychology and nutrition majors are not significantly different from each other
Errors in Null Hypothesis Testing
• Type I Error: rejecting the null hypothesis when it is true
• Occurs because even when there is no relationship in the population, sampling error alone will occasionally produce significant results
• When the null hypothesis is true and α is .05, we will mistakenly reject the null hypothesis 5% of the time
• In principle, it is possible to reduce the chance of a Type I error by setting α to something less than .05
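One common situation where α is deliberately lowered is post hoc comparisons after an ANOVA: running several pairwise t-tests inflates the overall chance of a Type I error, so the Bonferroni correction divides α by the number of comparisons. A sketch using the calorie estimation example (the group data here are invented for illustration, and Bonferroni is just one of several possible post hoc procedures):

```python
from itertools import combinations

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical calorie estimates for a cookie, by group (invented numbers)
groups = {
    "psych majors":     rng.normal(180, 30, size=30),
    "nutrition majors": rng.normal(190, 30, size=30),
    "dieticians":       rng.normal(250, 30, size=30),
}

pairs = list(combinations(groups, 2))     # all pairwise comparisons
alpha = 0.05 / len(pairs)                 # Bonferroni-adjusted alpha

results = {}
for a, b in pairs:
    t, p = stats.ttest_ind(groups[a], groups[b])
    results[(a, b)] = (t, p, p < alpha)   # significant only if p < adjusted alpha
```

With three groups there are three pairwise tests, so each is evaluated against α = .05 / 3 ≈ .0167.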
• Setting it to .01, for example, would mean that if the null hypothesis is true, there is only a 1% chance of mistakenly rejecting it
• Type II Error: retaining the null hypothesis when it is false
• In practice, Type II errors occur primarily because the research design lacks adequate statistical power to detect the relationship (e.g., the sample is too small)
• It is possible to reduce the chance of a Type II error by setting α to something greater than .05 (e.g., .10)
Problems with Null Hypothesis Testing
• Criticisms of Null Hypothesis Testing
• The convention of rejecting or failing to reject the null based on p < .05 is arbitrary
• E.g., one study has a p value of .04 and another a p value of .06. Although the two studies have produced essentially the same result, the former is likely to be considered interesting and worthy of publication and the latter simply not significant
• Null hypothesis testing is not very informative
• It typically just indicates that there is a relationship but does not describe it in detail
Replicability Crisis
• Replicability Crisis: a phrase that refers to the inability of researchers to replicate earlier research findings
• E.g., the results of the Reproducibility Project, which involved over 270 psychologists around the world coordinating their efforts to test the reliability of 100 previously published psychological experiments (Aarts et al., 2015)
• Although 97 of the original 100 studies had found statistically significant effects, only 36 of the replications did
Ways of Improving Scientific Rigor
1. Designing and conducting studies that have sufficient statistical power, in order to increase the reliability of findings
2. Publishing both null and significant findings (thereby counteracting publication bias and reducing the file drawer problem)
3. Describing one's research designs in sufficient detail to enable other researchers to replicate your study using an identical or at least very similar procedure
4.
Conducting high-quality replications and publishing these results (Brandt et al., 2014)
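Both error rates above can be made concrete by simulation. The sketch below (assumed setup, not from the guide) repeatedly draws two samples and runs an independent-samples t-test: with a true effect of zero, the rejection rate estimates the Type I error rate (≈ α); with a real effect, it estimates statistical power, which grows with sample size:

```python
import numpy as np
from scipy import stats

def rejection_rate(effect_size, n_per_group, alpha=0.05, n_sims=2000, seed=0):
    """Fraction of simulated independent-samples t-tests that reject H0.

    With effect_size == 0 (H0 true) this estimates the Type I error rate;
    with effect_size > 0 it estimates statistical power.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        a = rng.normal(0.0, 1.0, n_per_group)
        b = rng.normal(effect_size, 1.0, n_per_group)
        _, p = stats.ttest_ind(a, b)
        rejections += p < alpha
    return rejections / n_sims

type1 = rejection_rate(0.0, n_per_group=30)         # H0 true: rate is about .05
power_small = rejection_rate(0.5, n_per_group=20)   # medium effect, small n
power_large = rejection_rate(0.5, n_per_group=100)  # same effect, larger n
```

This illustrates rigor point 1: for the same medium effect (d = 0.5), the larger samples detect the relationship far more often, so underpowered designs are the main source of Type II errors.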