ComSt 311 Redmond Statistics Reference Sheet

Independent Variable: A variable/quality that does not depend on other variables. Sometimes manipulated, but often simply measured as a state condition (age, sex, education, etc.).

Dependent Variable: A variable (quality) that is affected and changed by independent variables. In the statement "women are more empathic than men," sex is the independent variable and empathy is the dependent variable. Another way of considering the relationship between two variables is to say that by knowing one variable (a person's sex) we can predict another variable (level of empathy).

Significance: p < .10, p < .05, p < .01, p < .001
The probability that the statistic happened purely by chance: .10 means 10 out of 100 times the result is by chance, .05 (5 out of 100), .01 (1 out of 100), .001 (1 out of 1,000).

Frequency (number): N = 245 (usually appears to show the number of respondents)

Mean (average): M = 3.45 (on some scale, such as 1-5)

Standard Deviation: SD = 2.18 (relative to the mean and scale). SD is an indication of how representative the mean is of the sample; the larger the SD relative to the mean, the less representative it is (and usually, the flatter the bell curve). Just over 68% of the sample falls within one standard deviation (+/-) of the mean. Thus in the example above, 68% fall between 1.27 (3.45 - 2.18) and 5.63 (3.45 + 2.18), which is above the 5.0 scale (thus also indicating a skewed sample).

Variance: The degree to which a set of numbers is spread out; their distribution. Thus, for a sample of five people responding on a five-point scale, if they all answered 3 there is no variance (SD = 0); if each chose a different number, there is complete variance (SD = 1.6). Notice that for both groups the mean would be 3, so for the first group the mean would be a very accurate predictor of a person's score. For the second group, it would accurately predict only 1 out of the 5 scores and not be a very good predictor. Standard deviation reflects this. (A short computational sketch of both samples appears at the end of this section.)

Tests of Differences

T-Test: Tests the difference between the means of two groups. Are the two groups under comparison significantly different on the measured quality?
Sample: t(153) = -1.83, p < .05
Format: t (degrees of freedom, based on the number of participants) = value, p < value
This means that a t value of -1.83 was obtained in calculating the difference between the two means, and that a difference that large would occur by chance only 5% of the time. The degrees of freedom (here, 153) reflect the size of the sample.

Analysis of Variance (ANOVA): Similar to the t-test, but instead of just two groups it tests the difference among three or more groups by examining the variance of each group against a grand mean (between group) and within each group (within group) on one independent variable (factor).
Format: F (df of categories, df of participants) = value, p < value. Sometimes eta squared (η²) is provided to further support the strength; it is interpreted like r², as the percent of variation accounted for.
Sample: F(2, 154) = 3.47, p < .05, η² = .38 (38% of the variance accounted for)

Multiple Factor Analysis of Variance (MANOVA): The same as ANOVA except that it can handle more than one independent variable (factor) and determine the interaction effects among those factors. Shows as F tests among the various configurations, as in the examples below. (A code sketch of the t-test and ANOVA appears at the end of this section.)

Interaction Effects
  Sex x Age x Education on Affectionate Communication   F(3, 65) = ...
  Sex x Age on Affectionate Communication               F(2, 65) = ...
  Sex x Education on Affectionate Communication         F(2, 65) = ...
  Age x Education on Affectionate Communication         F(2, 65) = ...

Main Effects (the effect of each independent variable by itself, controlling for the effects of the other independent variables)
  Sex on Affectionate Communication                     F(1, 65) = ...
  Age on Affectionate Communication                     F(5, 65) = ...
  Education on Affectionate Communication               F(7, 65) = ...
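A minimal sketch of the variance example above, in Python; the two five-person samples are the hypothetical ones from the text, and statistics.stdev computes the sample standard deviation:

    import statistics

    same   = [3, 3, 3, 3, 3]   # everyone answers 3: no spread
    spread = [1, 2, 3, 4, 5]   # every answer different: maximum spread

    print(statistics.mean(same),   statistics.stdev(same))    # 3, 0.0
    print(statistics.mean(spread), statistics.stdev(spread))  # 3, ~1.58

Both samples have the same mean (3), but the standard deviation shows how much better that mean represents the first sample than the second.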
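And a sketch of the two tests of differences using scipy.stats; the group scores here are invented purely for illustration:

    from scipy import stats

    # t-test: do two group means differ significantly?
    men   = [3.1, 2.8, 3.4, 2.9, 3.0]
    women = [3.9, 4.1, 3.6, 4.0, 3.8]
    t, p = stats.ttest_ind(men, women)

    # one-way ANOVA: do three or more group means differ?
    freshmen = [2.5, 2.9, 3.1, 2.7]
    juniors  = [3.4, 3.6, 3.2, 3.5]
    seniors  = [4.0, 3.8, 4.2, 3.9]
    F, p_anova = stats.f_oneway(freshmen, juniors, seniors)

Each call returns the test statistic and the p value that a results section would report in the formats shown above.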
Chi Square (χ²): Compares groups on the frequency of responses when the data are nominal/categorical (males/females; freshmen/sophomores/juniors/seniors; ComSt majors/nonmajors; etc.). Chi square simply involves comparing how two groups rated or ranked given options to see if they are different. The underlying assumption is that the distributions of the responses will be the same. Chi square can also be used to compare against "expected" distributions of responses, such as responses equally distributed across the choices. (A computational sketch appears after the correlation material below.)

Example (reading 3): Number indicating a given quality is part of "What is a date?"

  Quality          % of College Students   % of Single Adults (older)
  A couple/dyad    72.4%                   60.4%
  Heterosexual     22.0%                   29.2%
  Social           46.8%                   29.3%

Chi square is used to answer the question of whether the choices of these two groups are significantly different. (The actual study has more qualities.)
Finding: χ²(25, n = 1,149) = 71.45, p < .001. The 25 is the number of qualities and the 1,149 is the total number of responses.

Tests of Relationships

Correlation (Pearson's Product-Moment Correlation): Tests whether two variables vary together, either positively or negatively, and is a number between +1.0 and -1.0. It assesses the amount of variation that is common to both variables; in essence, how well responses on one variable match up with responses on the other. The square of the correlation coefficient indicates how much variance is accounted for between the two variables (called the coefficient of determination).
  r = .40 produces r² = .16, meaning 16% of the variance is accounted for (84% is due to other factors)
  r = .70 produces r² = .49, meaning 49% of the variance is accounted for (51% by other factors)
  r = -.20 produces r² = .04, meaning only 4% is accounted for
A correlation can be statistically significant but account for very little variance between the two variables.

Just as SD tells how well a mean reflects the general scores, a correlation reflects how well a sloped line reflects the relationship between pairs of numbers plotted in space. In the example below, the distance of each point from the line reflects the amount of variance that is not accounted for. For r = .95, the total distance of all the points from the line is small, while for r = .34 there is a lot more distance (variation) from the line. In essence, the ability to know the value of one variable (Y) based on knowing the value of another variable (X) is not very good in the second correlation. A smaller r² means that a lot of other factors are affecting variation in the Y scores. BUT, REMEMBER: CORRELATION IS NOT CAUSE AND EFFECT.

[Figure: two scatterplots of paired scores with fitted lines, one labeled r = .95 (points clustered tightly around the line) and one labeled r = .34 (points widely scattered).]

Multiple Correlation: A correlation calculated among three or more variables; reported as a capital R, from +1.0 to -1.0.

Partial Correlation: A correlation between two variables (X & Y) in which the effect of one or more other variables is removed from between the two variables.
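A minimal chi-square sketch using scipy.stats.chi2_contingency; the 2 x 2 counts are hypothetical, not the study's actual data:

    from scipy.stats import chi2_contingency

    # rows: college students, single adults
    # columns: mentioned the quality, did not mention it
    observed = [[47, 53],
                [29, 71]]
    chi2, p, df, expected = chi2_contingency(observed)

The call also returns the expected counts, i.e., the distribution the two groups would show if they did not differ.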
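And a correlation sketch with scipy.stats.pearsonr; the paired scores are invented for illustration:

    from scipy.stats import pearsonr

    empathy    = [2.1, 3.4, 3.0, 4.2, 3.8, 2.6]
    disclosure = [2.5, 3.1, 3.3, 4.4, 3.5, 2.9]

    r, p = pearsonr(empathy, disclosure)
    r_squared = r ** 2   # coefficient of determination

Squaring r by hand, as in the last line, gives the percent of shared variance described above.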
Factor Analysis: A method for determining which sets of variables are most closely related to one another, and which are not. Typically a set of items, such as the items on a questionnaire, is analyzed to see which items inter-correlate (multiple correlation) the best. Those items that relate well to one another constitute a "factor." Factor lists include the items and their correlation to the factor (the other items in the factor). Researchers label the factors, and you should carefully examine the items and the labels to ensure you understand what each label represents.

Regression Analysis

Regression analysis is a statistical way of demonstrating prediction based on a linear model that creates a slope, much like correlation. It attempts to answer the question, "What is the best predictor of...?" Which set of known independent variables best predicts a dependent variable, and with what weighting? It is similar to ANOVA (F-test) in that it attempts to account for the most variance possible by creating an equation based upon possible combinations and weightings of independent variables. An F-test is used to determine whether the regression analysis is statistically significant.

Regression equation format: value of Y = a constant value + some slope + error
  Y = a + b*X + e
  Y = dependent variable
  a = a constant (called the intercept: where the line crosses the Y axis)
  b = slope/beta weight, the value by which the known value is multiplied (shows relative contributions when more than one independent variable is included)
  X = the value of the independent variable
  e = error

Interpreting Regression Analysis: Usually the formula is not reported, but simply the beta weights, R² change values, and F test values. Beta weights (β) are listed for each variable to indicate its strength of contribution to the overall prediction of the dependent variable. Each independent variable is evaluated for whether it makes a significant improvement in the prediction (usually an F test); if it fails to reach significance, it is not included in the final regression equation. A coefficient of determination (R²) is calculated by comparing the predicted values of the dependent variable, based on each variable's contribution, with the actual/observed results reported by participants. This value falls between 0 and 1.0. (A regression sketch in code follows below.)

Example (reading 8): Coping strategies predict type of commitment in dating relationships. High moral commitment was predicted by reframing, β = .15, R² change = .02; modeling, β = .14, R² change = .01; and punishing, β = -.16, R² change = .02. (F results and significance are reported for each as well.) Notice the beta weights and R² changes are fairly small, making this only marginally meaningful.
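A minimal single-predictor regression sketch using scipy.stats.linregress; the x and y values are invented for illustration:

    from scipy.stats import linregress

    x = [1, 2, 3, 4, 5]              # independent variable
    y = [2.0, 2.9, 4.1, 4.9, 6.2]    # dependent variable

    result = linregress(x, y)
    # Y = a + b*X: result.intercept is a, result.slope is b
    predicted = result.intercept + result.slope * 3
    r_squared = result.rvalue ** 2   # variance accounted for

Multiple-predictor regressions of the kind described above (several betas, an R² change per step) require a multiple-regression routine, but the logic of intercept, slope, and R² is the same.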