Psych 626 Data Analysis & NHST – Dr. Mascolo

What Happens When We Don't Know Population Parameters?

The Probability doc took us from Dice, Poker, & the California Lottery to Percentile Ranks – first for an individual score Y & then for a group's average score Ȳ. Now I've made 2 important points recently: 1) the Central Limit Theorem allows us to derive Sampling Distribution Parameters based upon Population Distribution Parameters, and 2) we almost never know the Population Distribution Parameters. Isn't that like giving a starving person a can of beans but no can opener? Or like giving you the combination for a safe containing $1 million but not telling you where the safe is?

Let's focus on one Sampling Distribution Parameter -- the standard deviation (i.e., the "standard error of the mean"). The Central Limit Theorem tells us:

σ_Ȳ = σ / √N

but we do not know the value of the numerator (σ), so we cannot derive the standard error (σ_Ȳ). So we can only, well, estimate it – and all we have to go on are our very own data. That brings up a 3rd point I've made recently: First and foremost, the purpose of statistical data analysis is to estimate population parameters.

So as an example, take the formula for a standard score:

z = (Ȳ − μ_Y) / σ_Ȳ

We could try simply inserting our sample standard deviation (s) into the denominator:

z = (Ȳ − μ_Y) / s_Ȳ

The numerator doesn't change because our sample mean is an unbiased estimate of the population mean. However, the denominator does change because the sample standard deviation is not unbiased – it underestimates the population standard deviation. We try to compensate for this underestimate by "tweaking" the formula for the sample variance (s²), and therefore the standard deviation (s) – that is, we use N − 1 in the denominator rather than simply N. Using a slightly smaller denominator in the formula for the sample variance will slightly inflate the result – again, trying to compensate for its underestimation of the population variance. This trick does not completely erase the underestimation problem, so we write the formula with s instead of σ in the denominator -- an admission that we are starting with an estimate of the population standard deviation – so this is not simply a direct derivation from the Central Limit Theorem.

There's a cost to this imperfect and biased solution – the Central Limit Theorem's 4th point is no longer true: the sampling distribution built with an estimate of the population standard deviation is not a normal distribution – in particular, it's too short in the head and too fat in the tails. So we really cannot begin the formula with "z =" (a z-score can be calculated for a score in any data set -- it does not magically transform that data set into a normal distribution); still, it's misleading. This class would end at this point -- if not for an employee at the Guinness Brewery in Ireland: William Gosset "did the math" and determined exactly how a distribution using s instead of σ in the denominator differs from the normal distribution. Gosset published his work under the pseudonym Student (there are competing explanations of this), and his mathematical result is called the Student t Distribution.
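If you'd like to see both of these ideas in action, here's a short, optional sketch in Python (it assumes the numpy and scipy packages; nothing here is required for the course). The simulation shows that dividing by N underestimates the population variance on average while dividing by N − 1 does not, and the last few lines show that the t distribution really does put more probability in its extreme tails than the normal distribution does.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(626)

# --- Part 1: why we divide by N - 1 ----------------------------------
# Draw many small samples from a population with a known sigma = 15,
# then average the two versions of the sample variance.
sigma = 15
N = 4
samples = rng.normal(loc=100, scale=sigma, size=(100_000, N))

var_div_N = samples.var(axis=1, ddof=0).mean()         # divide by N (biased)
var_div_Nminus1 = samples.var(axis=1, ddof=1).mean()   # divide by N - 1 (unbiased)

print(f"True population variance          : {sigma**2}")
print(f"Average variance dividing by N    : {var_div_N:.1f}")        # noticeably too small
print(f"Average variance dividing by N - 1: {var_div_Nminus1:.1f}")  # close to 225

# The estimated standard error of the mean uses the N - 1 version:
one_sample = samples[0]
s = one_sample.std(ddof=1)
se_mean = s / np.sqrt(N)            # s_Ybar = s / sqrt(N)
print(f"Estimated standard error of the mean for one sample: {se_mean:.2f}")

# --- Part 2: t has fatter tails than the normal -----------------------
# Probability of landing beyond +/- 2 under each distribution.
for df in (3, 6, 30):
    tail_t = 2 * stats.t.sf(2, df)              # two-tailed area beyond |t| = 2
    print(f"df = {df:2d}: P(|t| > 2) = {tail_t:.3f}")
print(f"Normal : P(|z| > 2) = {2 * stats.norm.sf(2):.3f}")
```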
So now we have the final formula with all its intellectual honesty:

t = (Ȳ − μ_Y) / s_Ȳ

In class we'll see how the t table in your text's appendix works – it'll depend upon the sample size N, or more correctly, the degrees of freedom, which equal N − 1 (that's right, just like the adjusted denominator we used to calculate s²).

Data Analysis and Hypothesis Testing

Your text refers to 2 hypotheses, and I have detailed the models that provide the basis for deciding between them. That is, we "fit" each model to our data (because science is empirical) and then compare the errors -- the "residuals". The Null Hypothesis (Ho) is represented by the Null Model, and the Research Hypothesis (H1) is represented by the Full Model. Here are 2 ways to see how the Null Model is simpler than the Full Model:

Logical: the Null Model says all the participants come from the same group, while the Full Model says the participants must be separated into 2 groups according to a "Predictor Variable."

Statistical: the Null Model has only 1 parameter to estimate, while the Full Model has 2 parameters to estimate.

So all things being equal, the Null Model is favored because it is simpler than the Full Model. This preference for simplicity has been stated in different ways, for example, the Law of Parsimony, Morgan's Canon, Occam's Razor. A scientific theory should not be more complicated than necessary to explain the data. Statistically, the Null Model requires a single parameter estimate: µ. However, the Full Model requires two parameter estimates: µ1 & µ2. Thus, the Null Model enjoys preferential treatment and is discarded in favor of the Full Model only when the data strongly warrant the move to the more complicated model. What would constitute such evidence? The evidence would be that the Full Model fits the data more accurately by significantly decreasing the errors (residuals). So we state the hypotheses and models this way:

Ho: µ1 = µ2     Null Model: Y = µ + e
H1: µ1 ≠ µ2     Full Model: Y1 = µ1 + e, Y2 = µ2 + e

Remember, those little e's are error -- and that's what we want to compare between models.

O.K. -- now time to "fit the models" by estimating these parameters: the Null Model's µ and the Full Model's µ1 & µ2. In math, estimates are designated with the symbol ' or ^ -- in our case, Y' or Ŷ (let's use Y' -- it's a whole lot easier for me to format). How can we possibly hope to estimate a parameter representing an entire population? Well, necessity is the mother of invention, and all we have available to us comes from our actual data, so we use our sample means:

For the Full Model, µ1 is estimated as Ȳ1, and µ2 is estimated as Ȳ2. That is, our sample means for each group in our 2-group study.

For the Null Model, we only have to estimate the overall mean -- as though we just have one big set of data -- not 2 different groups. In our course we assume that all groups in a study have equal size (N), so we can calculate the overall mean of our data by averaging the 2 group means:

Ȳ = (Ȳ1 + Ȳ2) / 2     (no subscript = no individual groups – the different levels of the Predictor Variable are ignored)

O.K., this may all sound theoretical, so what follows is a concrete example of all this -- and it introduces 2 very fundamental statistics used for data analysis.
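Before that worked example, here's a small optional Python sketch (my own illustration, with made-up scores rather than the example to come) of what "fitting" the two models amounts to: the Null Model predicts every score with one overall mean, the Full Model predicts each score with its own group mean, and the e's are whatever is left over.

```python
import numpy as np

def fit_models(y_group1, y_group2):
    """Fit the Null and Full Models to two equal-sized groups of scores.

    Returns the parameter estimates (Y') and the residuals (e) for each model.
    """
    y1 = np.asarray(y_group1, dtype=float)
    y2 = np.asarray(y_group2, dtype=float)

    # Full Model: two estimates, one mean per group
    mu1_hat, mu2_hat = y1.mean(), y2.mean()
    e_full = np.concatenate([y1 - mu1_hat, y2 - mu2_hat])

    # Null Model: one estimate, the overall mean (averaging the group means
    # works because the groups are the same size)
    mu_hat = (mu1_hat + mu2_hat) / 2
    e_null = np.concatenate([y1 - mu_hat, y2 - mu_hat])

    return (mu_hat, e_null), ((mu1_hat, mu2_hat), e_full)

# Hypothetical scores just to show the mechanics:
(null_est, e_null), (full_est, e_full) = fit_models([3, 5, 4, 4], [6, 7, 5, 6])
print("Null Model estimate :", null_est, "residuals:", e_null)
print("Full Model estimates:", full_est, "residuals:", e_full)
print("Residuals sum to 0 in both cases:", e_null.sum(), e_full.sum())
```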
The following example is fashioned after Lockhart (1999), in my estimation the most thorough & intelligible explanation of basic data analysis – including its historical development – ever written. Let's say we have a set of 8 scores from the Interpersonal Effectiveness Survey (IES): 4 from a Treatment group (T), 4 from a Control group (C). We've been using Y to represent the Response Variable (in this case, the IES scores), so we'll use X to represent the Predictor Variable (in this case, Treatment versus Control):

X:   T   C   C   T   C   T   T   C
Y:  12   7   9  14  11  14  16  13

Now the Null Model simplifies the analysis by ignoring the Predictor Variable X. So, for example, to calculate the mean for these data, we simply add up all the scores – like one group of 8 scores – ignoring the fact that 4 of the scores come from the Treatment group and the other 4 come from the Control group. The 8 scores total 96, so the mean of the group is:

Ȳ = ΣY / N = 96 / 8 = 12

So if we use 12 (the overall mean) as our estimate (Y'), we can calculate how much each of the 8 scores deviates from this estimate:

X:       T   C   C   T   C   T   T   C
Y:      12   7   9  14  11  14  16  13
Y':     12  12  12  12  12  12  12  12
Y − Y':  0  −5  −3  +2  −1  +2  +4  +1

These Y − Y' deviations are what we mean by e's. Take a moment to add these e's up -- they sum to zero -- hopefully this is not a surprise. In fact, we are going to continue our analysis by calculating the Sum of Squares (SS) for each model -- the same calculations you learned for the previous Section Exam. So, we continue by squaring each of these deviations and then adding them up:

Σ(Y − Ȳ)² = (0)² + (−5)² + (−3)² + (+2)² + (−1)² + (+2)² + (+4)² + (+1)² = 60 = SStotal

So this Sum of Squares represents the sum of squared deviations from the overall mean of the data -- as though we just had 1 group of 8 scores instead of 2 groups of 4 scores. This SS represents the amount of error (e's) associated with the Null Model, which ignores the Predictor Variable. This is our baseline amount of error -- the amount associated with the simpler model, and it is called the SStotal.

Now, our central question in data analysis is whether we can significantly reduce error when we use the Full Model instead, so we have to calculate a SS associated with that model. Remember, this model takes into account the Predictor Variable -- that is why it is more complex -- so we cannot adopt this model unless we can show it is worth the added complexity. The SS associated with the Full Model is called the SSe -- and we calculate it the same way we calculate any SS, except in this case the inclusion of the Predictor Variable demands we calculate a SS1 for the treatment group and a separate SS2 for the control group.

For the treatment group calculations, we use the mean for the first group, which equals 14:

X:        T   T   T   T
Y:       12  14  14  16
Ȳ1:      14  14  14  14
Y − Ȳ1:  −2   0   0  +2

Σ(Y1 − Ȳ1)² = (−2)² + (0)² + (0)² + (+2)² = 8 ….. This is SS1

For the control group calculations, we use the mean for the second group, which equals 10:

X:        C   C   C   C
Y:        7   9  11  13
Ȳ2:      10  10  10  10
Y − Ȳ2:  −3  −1  +1  +3

Σ(Y2 − Ȳ2)² = (−3)² + (−1)² + (+1)² + (+3)² = 20 ….. This is SS2

Our last step is to add these 2 SS's together to get the SS associated with the Full Model -- if this seems too good to be true, it kind of is, because we're supposed to meet a certain criterion called the "Assumption of Homogeneity of Variance" -- I'll explain that later in class.
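If you want to verify these hand calculations, here's a quick optional Python sketch (numpy assumed):

```python
import numpy as np

# The 8 IES scores, in the order presented above
x = np.array(["T", "C", "C", "T", "C", "T", "T", "C"])
y = np.array([12, 7, 9, 14, 11, 14, 16, 13], dtype=float)

# Null Model: one big group, deviations from the overall mean
grand_mean = y.mean()                       # 12.0
e_null = y - grand_mean                     # the e's; they sum to zero
ss_total = (e_null ** 2).sum()              # 60.0

# Full Model: deviations from each group's own mean
y_t, y_c = y[x == "T"], y[x == "C"]
ss1 = ((y_t - y_t.mean()) ** 2).sum()       # 8.0   (Treatment mean = 14)
ss2 = ((y_c - y_c.mean()) ** 2).sum()       # 20.0  (Control mean = 10)

print(grand_mean, ss_total, ss1, ss2)       # 12.0 60.0 8.0 20.0
```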
So for now, we just laze along and calculate SSe = SS1 + SS2 = 8 + 20 = 28.

Let's review:
• When we use the simpler (Null) Model we had this error measure: SStotal = 60
• When we use the more complex (Full) Model we had this error measure: SSe = 28

So the Full Model costs us the added complexity of having to take the Predictor Variable into account (and so having to estimate 2 parameters instead of just 1), but it also provides a better "fit" for the data, that is, a decrease in error. In fact, there's a measure for this better fit (decreased error); it's called the SSmodel: it is a measure of the reduction in error when we go from the Null Model to the Full Model: SSmodel = SStotal − SSe. In our example, SSmodel = 60 − 28 = 32.

And this leads to our 1st tool for data analysis: the Coefficient of Determination, or R². It is simply the reduction in error in proportion to the total error we started with. That is:

R² = SSmodel / SStotal

In our example, R² = 32/60 = .53. As a ratio, it is interpreted as "the proportion of variability in the Response Variable that is explained by the Predictor Variable." The higher it is, the more tempted we are to abandon the simpler Null Model in favor of the more complex Full Model – but there's no objective standard or cut-off that says "out with the Null, in with the Full."

And this leads to our 2nd tool for data analysis: the Magnitude of Effect, or Cohen's d. It measures how strongly the Predictor Variable is associated with changes in the Response Variable, in the statistical sense. Cohen is not pretending that he is measuring a causal connection between the PV and the RV – that depends on your study's design -- his statistic is calculated the same way whether the PV was experimentally manipulated with random assignment or merely measured. In fact, other than knowing the levels of measurement of your PV and RV, a statistician doesn't really need to know about your study at all. Again, internal validity depends upon your research design, not your data analysis. Cohen's d is expressed in terms of population parameters:

d = (µ1 − µ2) / σ

However, just like the story of William Gosset and the t distribution, we almost never know population parameters, so the calculation relies on estimates. Once again, sample means, variances, and standard deviations provide the estimates of the corresponding population parameters. And so the formula for Cohen's d that we actually calculate is:

d' = (Ȳ1 − Ȳ2) / √MSe

So in my example, d' = (14 − 10) / 2.16 = 1.85.

Like R², there's no objective standard or cut-off for rejecting Ho. Neither is designed to test the Null Hypothesis. However, Cohen does offer these guidelines for interpreting the Effect Size (ES):

d'     Effect Size
0.8    Large
0.5    Moderate
0.2    Small

Note: I'm not detailing error terms (the denominator) in our class -- MS stands for a "Mean Square" – another name for a variance (s²) – and is calculated as SSe / df. In a 2-group study, df = (n1 − 1) + (n2 − 1). You lose 1 df for each parameter estimate -- another example of how the Full Model is less parsimonious than the Null Model.
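Continuing the optional Python sketch with the same example data, this reproduces R², MSe, and Cohen's d (the √MSe in the code is the 2.16 in the denominator above):

```python
import numpy as np

y_t = np.array([12, 14, 14, 16], dtype=float)    # Treatment scores
y_c = np.array([7, 9, 11, 13], dtype=float)      # Control scores

ss_total = 60.0                      # Null Model error (from above)
ss_e = 8.0 + 20.0                    # SS1 + SS2 = 28, the Full Model's error
ss_model = ss_total - ss_e           # 32, the reduction in error

r_squared = ss_model / ss_total      # ≈ .53, proportion of variability explained

df = (len(y_t) - 1) + (len(y_c) - 1)                   # 6: one df lost per parameter estimate
ms_e = ss_e / df                                       # ≈ 4.67, the Mean Square (a variance)
cohens_d = (y_t.mean() - y_c.mean()) / np.sqrt(ms_e)   # 4 / 2.16 ≈ 1.85, a large effect

print(round(r_squared, 2), round(ms_e, 2), round(cohens_d, 2))
```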
Null Hypothesis Statistical Testing (NHST)

A statistic that is designed to test the Null Hypothesis is the t-test, which is calculated this way:

tobs = (Ȳ1 − Ȳ2) / s_(Ȳ1 − Ȳ2)

This t calculation looks like Cohen's d – and like Cohen's d it is based upon our data, so it is dubbed "tobs" ("t observed" – meaning based upon our data). What do we make of this statistic we calculated based upon our data?

The answer is based on the first part of this document -- recall that we put the "burden of proof" on the Research Hypothesis because its Full Model is more complex than the Null Model associated with the Null Hypothesis (Ho). In other words, we assume Ho is true unless we have sufficient evidence to reject it. So we want to know whether our tobs is rare enough that we may question our assumption that Ho is true. In fact, if our tobs is really rare, we may actually decide that our data provide sufficient evidence to reject Ho.

In my Probability document, we used the Central Limit Theorem and the Law of Large Numbers to construct a theoretical Sampling Distribution of the Mean. To determine how rare tobs is, we instead construct a theoretical Sampling Distribution of t. In particular, it is constructed with the assumption that Ho is true, and so the mean of this sampling distribution is 0. Why? If Ho is true, then µ1 = µ2, which is the same as saying µ1 − µ2 = 0. The numerator of tobs is Ȳ1 − Ȳ2, so if Ho is true, then Ȳ1 − Ȳ2 will likely be close to 0, and so tobs will likely be close to 0.

Now, the extreme tails of this distribution represent extremely improbable values of tobs if Ho is true – not impossible, but extremely improbable – so improbable that if tobs is large enough to fall in one of these extreme tails, we abandon our assumption that Ho is true and decide instead to reject Ho. For a t-test, these extreme tails constitute a "rejection region" for Ho. This rejection region, called alpha (α), is an objective & quantitative criterion for rejecting Ho, but it is also arbitrary. In Psychology, the very strong tradition arbitrarily sets α no larger than 5% (like with Gosset, there's a whole backstory here).

Before modern computer technology, researchers conducting a t-test consulted a Statistical Table (based on Gosset's work) to determine the value of t that cuts off the most extreme 5% of the Sampling Distribution. This value is called tcrit because it serves as the cut-off point for statistical significance. That is, in order to be considered statistically significant, tobs must meet or exceed tcrit. So this is the decision rule:

If tobs ≥ tcrit, then tobs falls in the rejection region -- Reject Ho
If tobs < tcrit, then tobs does not fall in the rejection region -- Accept Ho

Note: Again, I'm not detailing error terms (the denominator) in our class -- s_(Ȳ1 − Ȳ2) is the estimated standard error for the difference between means. The formula for t uses s_Ȳ for one group and s_(Ȳ1 − Ȳ2) for two groups.

Put another way, tobs is significant if its probability is even less than α (e.g., .05). So the decision rule can be restated this way:

If tobs ≥ tcrit, then p < .05 -- Reject Ho
If tobs < tcrit, then p > .05 -- Accept Ho

Statistical Tables were deleted from this edition of your text – I've attached a Table of t-values at the end of this document. The table requires 3 specifics: the alpha level, the degrees of freedom, & whether our test is one- or two-tailed (we will always be using a two-tailed test – I'll explain in lecture). For a two-group study, df = (n1 – 1) + (n2 – 1). That is, we lose 1 df per group (more specifically, 1 df per parameter estimate). Returning to my example, we are conducting a two-tailed test with 6 degrees of freedom and alpha set at .05 – the Table says tcrit = 2.447.
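You don't actually need the printed table – any statistics package will produce the same cut-off. A quick optional sketch (scipy assumed):

```python
from scipy import stats

alpha = 0.05
df = 6                                   # (4 - 1) + (4 - 1) for my two-group example

# Two-tailed test: split alpha between the two tails and find the cutoff
t_crit = stats.t.ppf(1 - alpha / 2, df)
print(round(t_crit, 3))                  # 2.447, matching the table
```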
Now we calculate the t statistic:

tobs = (Ȳ1 − Ȳ2) / s_(Ȳ1 − Ȳ2) = (14 − 10) / 1.528 = 2.619

So, tobs = 2.619, which exceeds the tcrit of 2.447, and so we reject Ho. Reporting this in the Results section of an APA journal would look like this: "t(6) = 2.619, p < .05". Modern statistical software will calculate your tobs and determine its exact p-value. For example, having already announced that your alpha level is .05, your audience reads "t(6) = 2.619, p = .0396" and knows you were able to reject the Null.

So really – what is α? Remember, the underlying assumption of our Sampling Distribution is that Ho is true, but α is the portion of the distribution we use as a justification for rejecting Ho. So α is the portion of the distribution (i.e., the probability) that we reject Ho when we shouldn't. Earlier I stated that the very strong tradition in Psychology arbitrarily sets α no larger than 5%. Now I can be more specific about this tradition: a research finding is considered statistically significant only if the probability of incorrectly rejecting Ho is less than .05.
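For the curious, the whole test can be reproduced in a few lines (optional Python sketch, scipy assumed). I'm not detailing the error term in class, but the pooled standard error computed below is one standard way to arrive at the 1.528 in the denominator:

```python
import numpy as np
from scipy import stats

y_t = np.array([12, 14, 14, 16], dtype=float)    # Treatment group
y_c = np.array([7, 9, 11, 13], dtype=float)      # Control group

n1, n2 = len(y_t), len(y_c)
df = (n1 - 1) + (n2 - 1)                         # 6

# Pooled error variance (MSe) and the estimated standard error of the difference
ms_e = (((y_t - y_t.mean()) ** 2).sum() + ((y_c - y_c.mean()) ** 2).sum()) / df
se_diff = np.sqrt(ms_e * (1 / n1 + 1 / n2))      # ≈ 1.528

t_obs = (y_t.mean() - y_c.mean()) / se_diff      # ≈ 2.619
p_two_tailed = 2 * stats.t.sf(abs(t_obs), df)    # ≈ .04, below alpha = .05

print(round(se_diff, 3), round(t_obs, 3), round(p_two_tailed, 4))

# The library routine gives the same t and p (equal variances assumed):
print(stats.ttest_ind(y_t, y_c, equal_var=True))
```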
NHST Errors

All the above is summarized and extended in this 4-square Truth Table:

                                      Reality
                            Ho True                    Ho False
Researcher   Reject Ho      Type I Error               Correct
Decision                    P(Type I) = α              P(Correctly Reject Ho) = Power
             Accept Ho      Correct                    Type II Error
                                                       P(Type II) = β

So let me ask you – which type of error is worse – a Type I Error or a Type II Error? That's right – you can't answer -- you need some context – a scenario so you could figure out what each type of error entails and then figure out the consequences of each. So for example, let's say a pharmaceutical company is testing a new drug that might be effective in treating advanced-stage Melanoma – which is quite likely fatal. Now add to this situation: the new drug is quite dangerous. So patients are randomly assigned into 2 groups – 1) treatment and 2) control, and their survival rates are tracked.

The Null Hypothesis (Ho) says the 2 groups will not differ in survival rate because the new drug doesn't work:
Ho: µ1 = µ2 (Drug is Ineffective)
The Research Hypothesis (H1) says the 2 groups will differ in survival rate because the new drug does work:
H1: µ1 ≠ µ2 (Drug is Effective)

Now we can specify the consequences of each type of error:
Type I Error – Ho is incorrectly rejected (Drug is called Effective but is actually Ineffective)
Type II Error – Ho is incorrectly accepted (Drug is called Ineffective but is actually Effective)

Which is worse? It's true the drug is dangerous, but it's also true the disease is fatal. Myself, I'd rather risk a Type I Error than a Type II Error. With a Type I Error, an Ineffective drug is given to terminally ill patients, but with a Type II Error, an Effective drug is withheld from terminally ill patients, which I think is much worse for the patients.

Now, does our answer change if the study is testing Safety rather than Effectiveness? Let's say we've already established the drug is Effective – but now we are measuring dangerous side effects. So here's how this plays out:

The Null Hypothesis (Ho) says the 2 groups will not differ in side effects because the new drug is Safe:
Ho: µ1 = µ2 (Drug is Safe)
The Research Hypothesis (H1) says the 2 groups will differ in side effects because the new drug is Dangerous:
H1: µ1 ≠ µ2 (Drug is Dangerous)

Now we can specify the consequences of each type of error:
Type I Error – Ho is incorrectly rejected (Drug is called Dangerous but really is Safe)
Type II Error – Ho is incorrectly accepted (Drug is called Safe but really is Dangerous)

Which is worse? Remember, the drug is Effective, and the disease is fatal. Myself, I'm going to switch from my earlier decision; I'd rather risk a Type II Error than a Type I Error. With a Type II Error, a Dangerous drug is given to terminally ill patients, but with a Type I Error, a Safe drug is withheld from terminally ill patients.

Another scenario: how would our analyses change if we were instead testing a new wrinkle cream that might also discolor the skin?

By the way, this kind of analysis is not limited to scientific data analysis. It also applies to decisions that clinicians make. Here's a table that parallels the one above:

                                      Reality
                            Disorder Absent            Disorder Present
Clinician    DX Yes         False Positive             Sensitivity
Decision     DX No          Specificity                False Negative

So the clinical equivalent of a Type I Error is a False Positive – like deciding a depressed patient needs to be hospitalized, but in truth he only has suicidal ideation, not intention. And the clinical equivalent of a Type II Error is a False Negative – like deciding a depressed inpatient can be given a weekend pass because his depression is lifting, but in truth his suicide risk is dangerously high for that same reason.

Problems/Limitations of NHST & Proposed Solutions

I'm returning to the example from my introduction to NHST and the t-test – 8 subjects, 4 in Treatment, 4 in Control – here's a summary. In my tiny example:

tobs = (Ȳ1 − Ȳ2) / s_(Ȳ1 − Ȳ2) = (14 − 10) / 1.528 = 2.619

We began with the assumption that the Null Hypothesis was true, and we constructed a Sampling Distribution of t based on that assumption. Now, the extreme tails of this distribution are values of t that are very improbable given the Null Hypothesis – not impossible, but really unlikely – like less than 5% (.05). So we collected our data and calculated tobs = 2.619, which was large enough to land in that most extreme .05. How did we know? A t table "in the back of the book" shows that a tcrit of 2.447 is the cutoff for the most extreme .05 when df = 6, and so we decided that our initial assumption (Ho is True) should be rejected (Ho is False). When you write this up for publication in an APA journal, your audience reads "t(6) = 2.619, p < .05" – the (6) being your degrees of freedom (8 subjects – 2 parameters that have to be estimated).

NHST Comes Under Fire

There have been grumblings for several decades about psychology's overreliance on NHST. I'll explain specifics in class, but here's a brief outline:
1. NHST has dominated psychology – to the virtual exclusion of any other form of analysis.
2. Scientists have become dependent on NHST – to the point of inappropriately extending its use (e.g., establishing replicability/reliability statistically instead of repeating the study).
3. Scientists have lapsed into mindless, robotic use of NHST, perhaps beguiled by the availability of easy-to-use software programs.
4. Scientists have forgotten how to interpret their statistical findings – including foundational concepts like the p value, Type II Error, and Power.
5. Scientists have exploited the Achilles' Heel of NHST – it can be used to disguise findings that are statistically significant but practically trivial.

One proposed solution: Confidence Intervals

We have been using sample statistics to estimate parameters – like when we want to calculate Cohen's d: it is defined in terms of population parameters, d = (µ1 − µ2) / σ, which we don't know and can only estimate with our sample statistics, d' = (Ȳ1 − Ȳ2) / √MSe.

These sample statistics are being used as Point Estimates – our best single guess – but a "shot in the dark" is pretty limited. Political surveys report Point Estimates but also a Margin of Error. Combined, these constitute a Confidence Interval, where Confidence = the probability that the interval includes the true value of the population parameter being estimated -- in this case, µ1 − µ2.

Now, Ho says these 2 population parameters are equal, so the difference between them is zero. That is, µ1 − µ2 = 0. So we can test Ho by building a 95% Confidence Interval based upon our data and then looking to see whether 0 is included – whether 0 lands somewhere between the Lower Limit and Upper Limit of the interval. If it does, then we are forced to accept Ho – after all, we are 95% confident that our interval includes the true value of µ1 − µ2, and if 0 is included, then 0 is a plausible value, and so Ho must be retained. If instead 0 lands below the Lower Limit or above the Upper Limit – so not in the interval – then 0 is an implausible value for the true difference between means, and so the Null Hypothesis can instead be rejected.

OK, so how do we use our data to build this Confidence Interval? We start with the basic structure of any Confidence Interval – a best single guess (Point Estimate), and then we add and subtract a margin of error (Error Estimate):

C.I. = Point Estimate ± Error Estimate

Now, the Point Estimate of µ1 − µ2 is of course based on the only thing we have available to us -- our sample means: Ȳ1 − Ȳ2 -- which served as the numerator of our tobs calculation. The Error Estimate is also based upon values from our t-test: tcrit multiplied by the same standard error which served as the denominator of our tobs calculation. So we end up with:

CI95 = (Ȳ1 − Ȳ2) ± tcrit × s_(Ȳ1 − Ȳ2)

The tcrit in the Confidence Interval is the same as the tcrit used in the t-test above because a two-tailed t-test with alpha = .05 corresponds to a 95% Confidence Interval. For my example:

CI95 = (14 − 10) ± 2.447 × 1.528, so CI95 = 4 ± 3.74

So adding and subtracting the error term (and rounding the results) gives us a Confidence Interval with a Lower Limit (LL) of +0.3 and an Upper Limit (UL) of +7.7.

The Confidence Interval improves upon the best single estimate (point estimate) of the difference between 2 means by specifying an interval bounded by lower and upper limits and, even better, by specifying a probability that the interval between these 2 boundaries includes the true difference between the two population parameters -- µ1 − µ2. In my tiny example, the interval clearly does not include 0, so my conclusion is the same as with the t-test: reject Ho.
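Here's that same interval in the optional Python sketch (the 1.528 is the standard error from the t-test above):

```python
from scipy import stats

point_estimate = 14 - 10        # Ybar1 - Ybar2
se_diff = 1.528                 # denominator of t_obs (estimated standard error)
df = 6

t_crit = stats.t.ppf(0.975, df)             # 2.447: alpha = .05, two-tailed
margin = t_crit * se_diff                   # ≈ 3.74, the Error Estimate
ci_low, ci_high = point_estimate - margin, point_estimate + margin

print(round(ci_low, 1), round(ci_high, 1))  # ≈ 0.3 and 7.7
# 0 is not inside the interval, so the decision matches the t-test: reject Ho
```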
Confidence Intervals illustrate the fundamental application of all Statistics and Data Analysis: estimating population parameters. They can be calculated to estimate a population mean or variance, a difference between two population means, or a ratio of two variances. These last two are examples of how Confidence Intervals can be used for NHST – like a t-test or an Analysis of Variance (ANOVA – next document).

So why are Confidence Intervals at least a partial solution to the NHST criticisms listed above? Like a t-test, a Confidence Interval can be used as an NHST, but Confidence Intervals are also like R² and Cohen's d in providing information about Effect Size. So a Confidence Interval is preferable to a t-test because it is more informative – more transparent. Researchers may be able to reject Ho and claim to have demonstrated a statistically significant finding because their calculated Confidence Interval does not include 0, but the Confidence Interval may also expose that finding as trivial.

Here's an example: researchers hope to show that Cognitive Behavior Therapy (CBT) is more effective in treating depression if 12 sessions of Yoga are added to the treatment regimen. They randomly assign depressed patients to be treated with either CBT alone or CBT plus Yoga; the response (outcome) variable is the Beck Depression Inventory (BDI) score. So here are the results:

            CBT            CBT + Yoga
Mean:       Ȳ1 = 12.8      Ȳ2 = 12.6
SS:         SS1 = 1.64     SS2 = 3.38
n:          n1 = 25        n2 = 25

SStotal = 5.52
MSe = 0.10458
s_(Ȳ1 − Ȳ2) = 0.09147
tcrit = 2.021 with α = .05 and df = 48
tobs = 2.19

So the Results section in an APA journal would read: "t(48) = 2.19, p < .05". In other words, Ho is rejected, so the results are statistically significant. However, if the results of a Confidence Interval were added, the statement would be: "t(48) = 2.19, p < .05, 95% CI [0.02, 0.39]". The Results would still show Ho is rejected, but would also show the interval came very close to including 0 -- and so very close to failing to reject Ho. Whether or not Ho is rejected, the Confidence Interval shows that the advantage of adding Yoga is, at best, only about a third of a BDI point. So these results show a great deal of precision (a very small LL – UL range) but a negligible Effect Size -- it seems clear there is no real clinical significance in these results.
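Finally, the CBT + Yoga numbers can be reproduced from the summary statistics alone (optional Python sketch, scipy assumed; software uses the exact tcrit for df = 48, which is slightly smaller than the 2.021 read from the table, so the interval it prints differs from the one reported above only in the second decimal):

```python
import numpy as np
from scipy import stats

mean1, mean2 = 12.8, 12.6        # BDI means: CBT alone vs. CBT + Yoga
ss1, ss2 = 1.64, 3.38            # within-group Sums of Squares
n1, n2 = 25, 25

df = (n1 - 1) + (n2 - 1)                        # 48
ms_e = (ss1 + ss2) / df                         # ≈ 0.1046
se_diff = np.sqrt(ms_e * (1 / n1 + 1 / n2))     # ≈ 0.0915

t_obs = (mean1 - mean2) / se_diff               # ≈ 2.19  -> statistically significant
t_crit = stats.t.ppf(0.975, df)                 # ≈ 2.011 (the table value used above is 2.021)
margin = t_crit * se_diff
ci_low, ci_high = (mean1 - mean2) - margin, (mean1 - mean2) + margin

# Significant, yes -- but the interval shows the advantage is a fraction of a BDI point
print(round(t_obs, 2), (round(ci_low, 2), round(ci_high, 2)))
```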