Fundamental Statistics in Applied Linguistics Research Spring 2010 Weekend MA Program on Applied English Dr. Da-Fu Huang Part I: Statistical Ideas
1. Getting started with SPSS
1.1 Opening a data file • The initial pop-up menu or window • File > Open > Data > .sav file • The Data Editor • Variable View • Data View • The SPSS Viewer displays output data (.spv file or data)
1.2 Entering your own data • Rows in SPSS are cases (one for each research participant) • Columns are separate variables • Name each case by making the first column the ID number of each participant • Define the column variables in "Variable View" (a variable (or a factor) is a collection of data that all belong to the same sort) • Name • Type: type of variable (e.g. Numeric, etc.) • Label: giving a variable a more detailed or descriptive name • Value: giving categorical variables a numeric value • Align: data aligned to the right or left of the column • Missing: assigning a value for missing data (e.g. 999) • Width; Columns • Measure • Application activity: Q's 1-3, pp. 15-16
1.3 Importing data into SPSS • Conversion of data saved in another format (e.g. Excel, SAS, Access) into SPSS data • File > Open > New Query, or the "Database Wizard" in the pop-up window • See how the data will be arranged when imported into SPSS.
1.4 Saving data in SPSS • File > Save or File > Save As • Open your next file before closing the previous file to avoid having to open the SPSS program again. • Save (parts of) the output displayed in the SPSS Viewer window as .spv files, which can be opened later as "Output" (File > Open > Output). • Application activity: Q's 1-2, p. 18
1.5 Manipulating variables • Moving or deleting columns or rows • Combining or recalculating variables • Recoding group boundaries • Making cutpoints for groups • Excluding cases from your data (Select Cases) • Sorting variables • Generating random numbers in Excel
1.5.1 Moving or deleting columns or rows • Cut: delete and paste • Copy: copy and paste • Clear: delete entirely • Insert Variable: put in a new blank column
1.5.2 Combining or recalculating variables • Combining some of the original variables, or performing some math operation on the variables (e.g. calculating percentages) • Transform > Compute Variable • Move the variable(s) to the "Numeric Expression" box and add the appropriate math operators • Example dataset: Torres • Application activities: Q's 1-2, p. 21
1.5.3 Recoding group boundaries • Making groups different from the original ones • Transform > Recode Into Different Variables • Move the variable(s) you want to recode into the "Numeric Variable -> Output Variable" box and give the new variable a name in the "Output Variable" area. Press Change to name your new variable. • Press the Old And New Values button and define your old and new groups. • Avoid ticking the "Output variables are strings" box unless you do not want to use your new variable in statistical calculations. Use numbers rather than strings to label groups, because a string-labeled category is not treated as a variable that can enter statistical calculations. • You will most likely give your new groups numbers, but you can later label them informatively in the "Variable View" tab. • Example dataset: DeKeyser2000.sav file • Application activities: Q's 1-2, p. 24
1.5.4 Making cutpoints for groups • Transform > Visual Binning, which helps you decide how to make groups, collapsing a large range of values into a much smaller set of categories.
• The dialogue box displays a histogram of the scores on the test, with separate bins that are taller when there are more cases (scores) in that bin. • Make Cutpoints: 3 choices • Entering 2 cutpoints yields 3 groups, and so on. • Example dataset: DeKeyser2000.sav
1.5.5 Excluding cases from your data (Select Cases) • Exclude some part of the dataset you have gathered • Data > Select Cases (select the cases you want to keep, not the ones you want to get rid of!) • Press the If button to express a conditional argument • Specify in the "Output" section of the dialogue box what to do with the unselected cases • Example dataset: Obarow.sav • Application activities: Q's 1-2, p. 27
1.5.6 Sorting variables • Order the data in a column from smallest to largest or vice versa. • Data > Sort Cases • (cf.) The Data > Sort Variables choice will move the columns around • Copy the sorted column data to the beginning of the file and give it a name • Do not depend on the SPSS row numbers to define the cases or your participants. • Put in your own column with some kind of ID number for your participants, so you can still tell which row of data belongs to which participant even after the data are moved around. • Example dataset: the Obarow data file
Summative application activities for manipulating variables (as homework): Questions 1-5, CH2 / Larson-Hall (2010), pp. 28-29
1.5.7 Generating random numbers in Excel • Randomize participants, sentences, test items, and so on in a research experiment. • Generating random numbers in Excel: type into the formula bar the syntax = RANDBETWEEN(1,100) • The command will generate random numbers between 1 and 100 • Make sure the Analysis ToolPak has been installed in Excel, or find Add-Ins in the Tools menu.
2. Preliminaries to understanding Statistics • Descriptive statistics summarize and make understandable a group of numbers collected in a research study • Inferential statistics (or parametric statistics) make inferences about larger groups based on numbers from a particular group of people • Parametric vs. non-parametric statistics: non-parametric statistics do not rely on the data having a normal distribution. • Parametric statistics use what can actually be measured from a sample (the statistics) to estimate the thing of interest that we cannot actually measure from the population (the parameter). • Statistics are the measurements we take from our particular sample, but these statistics are really just guesses or estimates of the actual parameter of our population • Robust statistics, a kind of non-parametric statistics, deal better with violations of normality and with outliers by providing objective and replicable ways to remove outliers, such as using a trimmed mean.
2.1 Measurement Scales & Variables
• Nominal scale: purpose is counting frequency; example of use: finding the number of NSs of Chinese in an ESL class; variables measured: nominal or categorical variables
• Ordinal scale: purpose is rank ordering; example: ranking students according to scores on the vocabulary test; variables measured: ordinal variables
• Interval scale: purpose is measuring intervals; example: determining z-scores or standard scores on a grammar test; variables measured: continuous, numeric or interval variables
• Ratio scale: purpose is measuring intervals from a real zero point; example: measuring height, weight, speed, or absolute temperature; variables measured: continuous or numeric variables
Application activity: Q's 1-8, pp. 39-40
2.1.1 Dependent vs.
Independent Variables • The distinction lies in the way the variables function in the experiment • Independent variables are those that may have an effect on the dependent variables; e.g. L1 background, age, proficiency level, etc. • Dependent variables are those which are affected; e.g. scores of a voc test, reaction time, number of accurate grammatical forms • Accurate determination of the type of variables (categorical or continuous; dependent or independent) in a research question will lead to the appropriate stat test you need to analyze your data. • Application activity: Q’s 1-5, pp. 35-37 2.1.2 Frequency data vs. score data • Frequency data show how often a variable is present in the data. The data are noncontinuous and describe nominal (discrete, categorical) variables. • Score data show how much of a variable is present in the data. The data are continuous but the intervals of the scale may be either ordinal or interval measurements of how much. 2.1.2 Frequency data vs. score data *Practice 2.1: • Since her ESL advanced composition students seemed to use few types of cohesive ties, Wong (1986) wondered if they could recognize appropriate ties (other than those they already used). She constructed a passage with multiple-choice slots (a multiplechoice cloze test) for cohesive ties. The ties fell into four basic types (conjunctive, lexical, substitution, ellipsis). Her students were from seven major L1 groups. What are the variables? Do the measures yield frequencies or scores? • Your group consensus: 2.1.2 Frequency data vs. score data *Practice 2.2: • Brusasco (1984) wanted to compare how much information translators could give under two different conditions: when they could stop the tape they were translating by using the pause button and when they could not. There were a set number of information units in the taped text. What are the variables? If the number of information units are totaled under each condition, are these frequencies or scores (e.g. how often or how much)? What is your group consensus: 2.1.3 Operationalization of variables *Practice 2.3: • In many studies in the field of applied linguistics, bilingualism is a variable. Part of the operational definition of that variable will include the values that code the variable. It may be coded as 1 = yes, 2= no, or as 1 = French/English, 2= German/French, 3= French/Arabic, 4= Cantonese/Mandarin, 5= Spanish/Portuguese. In this case the variable has been scaled as a(n) ______ variable. Each number represents a ______. Bilingualism might be coded as 1= very limited, 2=limited, 3=good, 4=fluent, 5=very fluent. In this case, the variable has been measured as a(n) _______ variable. Bilingualism could be coded on the basis of a test instrument giving scores from 1 to 100. the variable has then been measured as a(n) _____ variable. 2.1.4 Moderator variables • Distinction between major independent variables and moderating independent variables • For example, in the study of compliments, gender is the most important variable to look at in explaining differences in student performance. However, length of residence might moderate the effect of gender on compliment offers / receipts. In this case, length of residence is a variable functioning as a moderator variable. • Moderator variables mediate or moderate the relationship between the independent and dependent variables 2.1.5 Control variables • A control variable is a variable that is not of central concern in a particular research project but which might affect the outcome. 
It is controlled by neutralizing its potential effect on the dependent variable. • For example, handedness can affect the ways that Ss respond in many tasks. In order not to worry about this variable, you could institute a control by including only right-handed Ss in your study. • If you are doing an experiment involving Spanish, you might decide to control for language similarity and not include any speakers of non-Romanic languages in your study. • Whenever you control a variable in this way, remember that you are also limiting the generalizability of your study • In the above examples, the effect of an independent variable is controlled by eliminating it. The control variables in the examples are nominal (discrete, discontinuous) 2.1.5 Control variables • For scored, continuous variables, it is possible statistically to control for the effect of a moderating variable. That is. We can adjust for preexisting differences in a variable. • This procedure is called ANCOVA, and the variable that is controlled is called a covariate. • For example, in a study on how well males and females from different L1 groups perform on a series of computer-aided tasks. The focus of the study is the evaluation of the CAI lessons and the possible effect of gender and L1 group membership on task performance. In collecting the data, we might notice that not all students read through the materials at the same speed. We would like to adjust the task performance scores taking reading speed into account. But this stat adjustment controls for preexisting differences in a variable which is not the focus of the study. Unlike previous examples, the variable is not deleted; i.e. slow readers or rapid readers are not deleted from the study. • While reading speed may be an important variable, it is not the focus of the research so, instead, its effect is neutralized by stat procedure. 2.1.6 Other intervening variables Tendency to draw a direct relation between independent and dependent variables. For example, additional education and increased income. Why the lack of a direct relationship between the two variables? There seems to be an intervening variable at work, a variable that was not included in the study (the variable of age group). An intervening variable is the same thing as a moderating variable. The only difference is that the intervening variable has not been or cannot be identified in a precise way for inclusion in the research. In actual research, intervening variables may be difficult to represent since they may reflect internal mental processes (e.g. L1/L2 transfer, etc). Intervening variables are a source of ‘”error” in our research. 2.1.7 Scale or variable transformation Comparing variables that are not measured in the same units and have different means and variability is difficult. For example, how would you compare someone's performance in the long jump to someone else's performance in a 1 mile run to evaluate who is the "better" athlete? Standard scores have a known mean and variability. Converting raw, observed scores to standard scores aids in comparisons. Four standard scores: percentiles , Z scores, T scores, and stanines. Z-score table: z-scores corresponding to cumulative area proportions of the normal distribution (z-distribution, M=0, SD=1) Stanines 2.1.7 Scale or variable transformation • Linear transformation: the simplest transformation since it entails altering each score in the distribution by a constant; (e.g.) 
conversion of raw scores to percentage scores • P = R/T × 100 • P: percentage score; R: raw score; T: total number of items or highest possible score (T and 100 are constants) • Normalization transformation: to standardize or normalize a distribution of scores; ordinal data are changed to interval data, with scores anchored to a norm or a group performance mean as a point of reference; (e.g.) the z-score and T-score transformations: z = (X – M) / S (z-distribution, M = 0, SD = 1); T = 10z + 50 (T-distribution, M = 50, SD = 10)
2.1.7 Scale or variable transformation *Practice 2.4: The mean of a reading test was 38 and the standard deviation 6. Find the z and T scores for each of the following raw scores: 38, 39, 50 (z = (X – M) / S; T = 10z + 50)
2.2 The workings of statistical testing
2.2.1 Hypothesis testing • A statistical test will not test the logical, intuitively understandable hypothesis but, instead, a null hypothesis. • Null hypothesis (H0): there is no difference between groups or there is no relationship between variables • "We can never prove something to be true, but we can prove something to be false" (Howell, 2002) • Rejection of the null hypothesis gets people to accept the alternative or research hypothesis (Ha) • Application activity: Q's 1-3, p. 43
2.2.1 Hypothesis testing: The most basic test, the z-test • Question: Is my sample part of the population? • z-test: What is the probability that my sample is typical of all possible samples of the same size that could be taken from the population? • Choice of α: What am I willing to accept as a minimum probability? • H0: There is no difference between my sample and any other sample of the same size that cannot be attributed to natural variation
2.2.1 Hypothesis testing: Research errors • Statistical tests provide decision support, not decisions • Results are probabilities; the researcher has to set the criteria for decisions • The choices can be correct or wrong, but we usually do not know which prevails, so we live with probabilities
2.2.2 Two-tailed hypothesis testing 2.2.3 One-tailed hypothesis testing 2.2.4 Hypothesis testing: critical value
2.2.5 Hypothesis testing: Consequences of decisions
• Decision: H0 accepted. If H0 is true: correct decision; if H0 is false: Type II error
• Decision: H0 rejected. If H0 is true: Type I error; if H0 is false: correct decision
2.2.5 The relation between the statistical decision and reality in hypothesis testing
• NH (H0) is true: not rejecting the NH gives 1 – α (correct decision); rejecting the NH gives α (Type I error)
• NH (H0) is false: not rejecting the NH gives β (Type II error); rejecting the NH gives 1 – β (correct decision)
*Common values are α = .05 and β = .20
2.2.6 Hypothesis testing: Analogy in legal trials *H0: there is no difference between the suspect and innocent people
• Decision: released (H0 not rejected). If the suspect is innocent (null true): correct decision; if guilty (null false): Type II error
• Decision: convicted (H0 rejected). If innocent (null true): Type I error; if guilty (null false): correct decision
2.2.7 Hypothesis testing: Type I (α) and Type II (β) errors
2.3 Power • Power is the probability of detecting a statistical (statistically significant; significant) result when there are in fact differences between groups or relationships between variables. • Sufficient power ensures that real differences are found and discoveries are not lost. • Calculating a priori power levels and using these to guide the choice of sample size avoids insufficient power to detect possible effects.
• Power should be above .50 and would be judged adequate at .80 (Murphy and Myors, 2004)
2.4 Effect size • The effect size is a measure of how much the independent variable or treatment changed the dependent variable. • Effect size is one way to judge whether the effect or association has any practical meaning or use. • It is possible to have a "statistically significant" (p < 0.05) result that actually has little practical value. For instance, a one-year study shows that diet A is "significantly" better than diet B for weight loss, but the subjects in diet A lost an average of only 1 pound more than the subjects in diet B over the year. Would you believe that diet A is really a "better" diet for weight loss?
2.4 Effect size • The p-value and significance testing depend on the power of a test and thus on group sizes, but effect sizes do not change no matter how many participants there are. • A null hypothesis significance test (NHST) merely tells whether the study had the power to find a difference that was greater than zero. An effect size gives insight into the size of this difference. If the size is large, the researcher has found something important to understand. • Understanding effect size measures (Larson-Hall, pp. 115-116) • Calculating effect sizes for power analysis (Larson-Hall, pp. 116-120) • Set the power of the test to detect a difference from the null hypothesis that is of practical importance
2.4.1 Effect size measures • Group difference indexes (mean difference indexes): the d family of effect sizes (Table 4.7, p. 118) • Cohen's d: measures the difference between two independent sample means and expresses how large the difference is in standard deviations • Relationship indexes: the r family of effect sizes measure how much an independent and a dependent variable vary together, or the amount of covariation in the two variables; the more closely the two variables are related, the higher the effect size (Table 4.8, p. 119) • Try to give effect sizes for all associations you report on, not just those that are statistical. • Associations that are not statistical may still have large enough effect sizes to be interesting and warrant future research
2.5 What influences power? (Factors that increase the power of a statistical test)
• Statistical choices: the statistical test (prefer parametric); the type of data (interval/ratio preferred); α (larger and/or one-tailed)
• Treatment: increased difference in means
• Design of instrument and measurement: increased instrument reliability, reduced error variance, reduced other sources of error, and therefore a reduced standard deviation of scores
• Sampling: increased sample size (n) and increased representativeness of the sample
• Together, an increased difference in means and a reduced standard error of the mean (SEM) give increased power of a statistical test
But power is not everything... Choosing α: Raising α from .05 to .10?
• The nature of the hypotheses • The consequences of making a Type I error • The choice of α and whether the test is one- or two-tailed • The power needed from a statistical test • The consequences of making a Type II error
2.6 Steps in hypothesis testing • State the null hypothesis • Decide whether to test it as a one- or two-tailed hypothesis: if there is no research evidence on the issue, select a two-tailed hypothesis; if previous research points to a direction, select a one-tailed hypothesis • Set the probability level (α level) • Select the appropriate statistical test(s) for the data • Collect the data and apply the statistical test(s) • Report the test results and interpret them correctly
2.2.7 Statistical reporting • The following numbers will often be reported in experimental results: the statistic, the degrees of freedom (df), the p-value, the effect size, (the 95% confidence interval)
2.2.2 Statistical reporting • One can tell what kind of statistical test was used by the statistic reported, which is usually represented by a symbol: a t-test has a t, a correlation has a Pearson's r, an ANOVA has an F, a chi-square has a chi-square (χ2). • The statistic is calculated and its result is a number. • In general, the higher this number is and/or the greater it is than 1, the more likely the p-value is to be very small.
2.2.2 Statistical reporting • The degrees of freedom count how many free components you have in your data set. • Imagine four contestants on a game show who each pick one of 4 doors behind which prizes can be found, with each door chosen only once. • Only 3 of the 4 people really have a choice; the last person's choice is fixed. • Only 3 choices include some variation, so there are 3 degrees of freedom. • The degrees of freedom are a piece of information necessary to determine the critical value for deciding statistical significance • The degrees of freedom provide information about the number of participants or groups (N) in the study, and a check on someone else's statistical work.
2.2.2 Statistical reporting • The p-value is the probability that we would find a statistic as large as the one we found if the null hypothesis were true. • The p-value represents the probability of the data given the hypothesis, written as p(D|H0). • The lower the p-value, the more confidence we would have in rejecting the null hypothesis and assuming there are some differences among the groups or some relationship between the variables. • The larger a statistic is, the more likely it is to have a small p-value. • Meaning of the p-value: "The probability of finding a [name of statistic] this large or larger if the null hypothesis were true is [p-value]."
2.2.2 Statistical reporting • Application activity: Q's 5-8, pp. 51-53
2.2.2 Statistical reporting • Reporting the confidence interval (or 95% confidence interval, CI), along with effect sizes, is vital to improving researchers' intuitions about what statistical testing means. • The CI provides more information than the p-value about effect size and is more useful for further testing and comparisons. The 95% confidence interval gives the range of values within which the mean difference would be expected to fall in 95 out of 100 replications of the study (e.g. a mean difference of 3 points with a 95% CI of (1, 4.28)). • The p-value shows whether the comparison found a significant difference between the groups, but the CI shows how far from zero the difference lies, giving an intuitively understandable measure of effect size. • The width of the CI indicates the precision with which the difference can be calculated, or the amount of sampling error.
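As a rough illustration of how the mean difference, its standard error, and the 95% CI relate, here is a minimal sketch in Python (not part of the original slides; the scores are invented, and the pooled-variance t procedure shown is only one way to build the interval):

# Minimal sketch: 95% CI for the difference between two independent group means,
# assuming equal variances; the scores below are invented example data.
import numpy as np
from scipy import stats

group1 = np.array([72, 80, 65, 78, 74, 69, 81, 77])
group2 = np.array([70, 66, 71, 64, 73, 68, 62, 69])
n1, n2 = len(group1), len(group2)

diff = group1.mean() - group2.mean()
# Pooled standard deviation and standard error of the difference
sp = np.sqrt(((n1 - 1) * group1.var(ddof=1) + (n2 - 1) * group2.var(ddof=1)) / (n1 + n2 - 2))
se = sp * np.sqrt(1 / n1 + 1 / n2)
df = n1 + n2 - 2
t_crit = stats.t.ppf(0.975, df)                      # two-tailed critical value, alpha = .05
ci_low, ci_high = diff - t_crit * se, diff + t_crit * se

t_stat, p_value = stats.ttest_ind(group1, group2)    # matching independent-samples t-test
print(f"difference = {diff:.2f}, 95% CI ({ci_low:.2f}, {ci_high:.2f}), t({df}) = {t_stat:.2f}, p = {p_value:.3f}")

Run on real data, the printed difference, CI, t, df, and p are exactly the pieces of information the reporting guidelines above ask for.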
(Figure slides: "Confidence interval" and "95% Confidence interval")
• The bigger the sample size, the higher the power, the smaller the CI, the less sampling error, and the more precise and certain the statistic is as an estimate of the parameter • Application activity: Q's 1-3, pp. 122-124
3. Describing data numerically and graphically
3.1 Numerical summaries of data • Central tendency of a data distribution: mean, mode, median • Variability of a data distribution: variance (s2), the average squared distance from the mean to any point (s2 = Σ(X – M)2 / (N – 1)); standard deviation (s; SD), the positive square root of the variance; standard error of the mean (SEM), the SD of the distribution of the mean (SEM = SD / √N), used to estimate confidence intervals; number of participants or observations (N); range, a single number that is the maximum data point minus the minimum data point
3.1 Using SPSS to get numerical summaries of data • Analyze > Descriptive Statistics > Descriptives. If you have groups, first separate them by going to Data > Split File, choosing the "Compare groups" option, and moving the group variable into the right-hand box • Analyze > Descriptive Statistics > Explore. If you have groups, put the splitting variable into the "Factor List" box. Choose whether to receive just numerical statistics, plots, or both • Example dataset: LarsonHall.Forgotten.sav • Application activities: Q2, p. 73
3.2 Graphic summaries of data: Examining the shape of distributions for normality • Looking at data can give one some idea about whether the data are normally distributed • Looking at data will give you a means of verifying what is happening • Graphics give you a way to look at your data that can replicate and even go beyond what you see in the numbers when you are doing your statistics • Researchers are urged to look at their data before undertaking any statistical analysis • The normality assumption can also be checked numerically by using formal tests such as the Shapiro-Wilk test
3.2.1 Histograms vs. bar plots • The histogram divides the data up into partitions and, in the usual case, gives a frequency distribution of the number of scores contained in each partition (or bin) • If proportions are used instead, the overall area of the graph will equal 1, and we can overlay the histogram with a normal curve in order to judge whether our data seem to follow a normal distribution • In contrast, a bar plot is a common plot in the field, but it shows the mean score of some group rather than a frequency or proportion count of the scores divided up into various breaks.
3.2.1 Histograms vs. bar plots • Histograms and bar plots are both graphics producing bars, but they answer very different questions, and only the histogram is appropriate for looking at the distribution of scores (cf. Figure 3.10, p. 78) • Histograms give information about whether distributions are symmetric or skewed, and whether they have one mode (peak) or several, and a histogram with an overlaid normal curve can be evaluated to see whether the data should be considered normally distributed. (Figure slides: example histograms and bar plots)
3.2.2 Skewness and kurtosis • Skewness and kurtosis are two ways to describe deviations in the shape of distributions. • If a sampling distribution is skewed, it is not symmetric around the mean.
Positively skewed, when scores are bunched up toward the left side of the graph (so that the tail goes to the right and toward larger numbers) Negatively skewed, when scores are bunched up toward the right side of the graph (so that the tail goes to the left and toward smaller and negative) Skewness describes the shape of the distribution as far as symmetry along a vertical line through the mean goes. 3.2.2 Skewness and kurtosis Kurtosis describes the shape of the distribution as far as the concentration of scores around the mean goes. Kurtosis refers to the relative concentration of scores in the center, the upper and lower ends (tails), and the shoulders (between the center and the tails) of a distribution. Platykurtic (like a plateau), when a distribution too flat at the peak of the normal curve Leptokurtic, when a curve has too many scores collected in the center of the distribution (cf. Figure3.13; P81) 3.2.3 Stem and leaf plots Stem and leaf plots display the same kind of information as a histogram that uses frequency counts, but use the data itself to show the distribution, and thus retain all of the data points. (cf. Table 3.5, P82) In a stem-and-leaf plot each data value is split into a stem and a leaf. The leaf is usually the last digit of the number and the other digits to the left of the leaf form the stem. The number 123 would be split as: stem12, leaf 3 It has the advantage over grouped frequency distribution of retaining the actual data while showing them in graphic form Stem and leaf plots Stem and leaf plots 3.2.4 Q-Q plots The quantile-quantile plot (Q-Q plot) plots the quantiles of the data under consideration against the quantiles of the normal distribution. A quantile means the fraction (or percent) of points below the given value. The 25th quantile notes the point at which 25% of the data are below it and 75% are above it. The Q-Q plot uses points at many different quantiles. If the sampling distribution and the normal distribution are similar, the points should fall in a straight line. If the Q-Q plot shows that there is not a straight line, this tells us it departs from a normal distribution, and can also give us some information about what kind of distribution it is (cf. Figure3.15; P83) 3.2.4 Q-Q plots The advantages of the q-q plots: The sample sizes do not need to be equal. Many distributional aspects can be simultaneously tested. Shifts in location, shifts in scale, changes in symmetry, and the presence of outliers can all be detected from this plot. If the two data sets come from populations whose distributions differ only by a shift in location, the points should lie along a straight line that is displaced either up or down from the 45-degree reference line. Q-Q Plots Detrended Q-Q Plots *These 2 batches do not appear to have come from populations with a common distribution. *The batch 1 values are significantly higher than the corresponding batch 2 values. *The differences are increasing from values 525 to 625. Then the values for the 2 batches get closer again. 3.2.5 Obtaining graphics to assess normality in SPSS Analyze > Descriptive Statistics > Explore, pick graphs in the “plots” button. Data > Split Files to split up groups first. Analyze > Descriptive Statistics > Frequencies. This can call up histograms with overlaid normal distribution curves. 
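For readers who want to run the same normality checks outside SPSS, here is a minimal sketch in Python (not from the slides; the data are simulated and the variable name is invented) producing a histogram with an overlaid normal curve, a Q-Q plot, and a Shapiro-Wilk test:

# Minimal sketch: graphical and numerical normality checks, roughly parallel
# to the SPSS Explore/Frequencies output described above.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)
scores = rng.normal(loc=50, scale=10, size=60)       # simulated test scores

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(scores, bins=10, density=True)              # histogram on a proportion scale (area = 1)
x = np.linspace(scores.min(), scores.max(), 200)
ax1.plot(x, stats.norm.pdf(x, scores.mean(), scores.std(ddof=1)))   # overlaid normal curve
stats.probplot(scores, dist="norm", plot=ax2)        # Q-Q plot against the normal distribution

w, p = stats.shapiro(scores)                         # Shapiro-Wilk test of normality
print(f"Shapiro-Wilk W = {w:.3f}, p = {p:.3f}")      # p < .05 would suggest non-normality
plt.show()

A roughly straight line in the Q-Q plot and a non-significant Shapiro-Wilk result are what the sections above describe for normally distributed data.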
Application activities: Q2, p. 86
3.2.5 Examining the shape of distributions: The assumption of homogeneity • The homogeneity of variances assumption, that the variances of the groups are equal, is another important assumption when parametric statistics are applied to group data. • For a statistical test of two groups, given equal distributions and equal variances of the groups, all we need to do is check whether their mean scores differ enough to consider the groups part of the same distribution or in fact two separate distributions (cf. Figure 3.18, p. 87, for an illustration with density plots). • Non-homogeneous variances can be a reason for not finding group differences which you thought would be there when performing a statistical test
3.2.5 Examining the shape of distributions: The assumption of homogeneity • Three ways of examining the homogeneity of variances: just look at the numerical output for the SDs; look at side-by-side boxplots of the groups, which show the amount of variability in the central part of the distribution (cf. Figure 3.19, p. 88); Levene's test: if p > .05, the null hypothesis is not rejected, meaning homogeneous or equal variances; if p < .05, the null hypothesis is rejected, meaning unequal variances • The sample size should be big enough to have enough power to detect violations of assumptions • Application activities: Q's 1-3, pp. 88-89
(Figure slides: box plots; box plots with outliers (Flege, Yeni-Komshian & Liu data); boxplots and the interquartile range (IQR))
4. Statistical tests 4.1 Correlation: A test of relationships • There are exactly 2 variables • The 2 variables have no levels within them • Only two averages of the variables • Both variables are continuous • The variables cannot necessarily be defined as independent and dependent, having no cause-and-effect relationship • RQ examples: pp. 131-132 • The non-parametric alternative to a (Pearson's) correlation is Spearman's rank order correlation test
4. Statistical tests 4.2 Partial correlation: A test of relationships • There are three or more variables (the influence of more than one variable can be factored out at a time) • The variables have no levels within them • Three or more averages of the variables • All variables are continuous • The variables cannot necessarily be defined as independent and dependent, having no cause-and-effect relationship • RQ examples: p. 133 • There is no non-parametric alternative to a partial correlation
4. Statistical tests 4.3 Multiple regression: A test of relationships • There are 2 or more variables • The variables have no levels within them • Three or more averages of the variables • All variables are continuous • One variable must be dependent and the others are independent • RQ examples: pp. 134-135 • There is no non-parametric alternative to a multiple regression
4. Statistical tests 4.4 Chi-square: A test of relationships • Exactly 2 variables • Each variable has 2 or more levels (categories) within it (e.g. gender, experimental group, L1 background) • Averages of the variables cannot be calculated, only frequencies of each category • All variables are categorical • The variables cannot necessarily be defined as independent and dependent, having no cause-and-effect relationship • RQ examples: p. 136 • The chi-square test of independence is itself a non-parametric test (there is no separate parametric alternative)
4. Statistical tests 4.5 T-test: A test of group differences (independent-samples t-test) • Exactly 2 variables • One variable is categorical with only 2 levels and is the independent variable.
• People in each group must be different from each other • The other variable is continuous and is the dependent variable • Only 2 averages of the variables • RQ examples: p. 138 • The non-parametric alternative to an independent-samples t-test is the Mann-Whitney U-test
4. Statistical tests 4.5 T-test: A test of group differences (paired-samples t-test) • Exactly 2 variables • One variable is categorical with only 2 levels and is the independent variable. • People in each group must be the same. • The other variable is continuous and is the dependent variable. • Only 2 averages of the variables • RQ examples: pp. 138-139 • The non-parametric alternative to a paired-samples t-test is the Wilcoxon signed ranks test
4. Statistical tests 4.6 One-way ANOVA: A test of group differences • Exactly 2 variables • One variable is categorical with 3 or more levels and is the independent variable. • The other variable is continuous and is the dependent variable. • 3 or more averages of the variables • RQ examples: p. 140 • The non-parametric alternative to a one-way ANOVA is the Kruskal-Wallis test
4. Statistical tests 4.7 Factorial ANOVA: A test of group differences • More than 2 variables • Two-way ANOVA (2 IVs), three-way ANOVA (3 IVs), etc. • 2 or more variables are categorical and they are independent variables, e.g. a 2 (gender) x 3 (condition) ANOVA • Only one variable is continuous and is the dependent variable. • One advantage over a simple t-test or one-way ANOVA is that the interaction between variables can be explored • RQ examples: p. 142 • There is no non-parametric alternative to a factorial ANOVA
4. Statistical tests 4.8 Repeated-measures ANOVA: A test of group differences • More than 2 variables • 2 or more variables are categorical and they are independent variables. • At least one independent variable is within-groups, with the same people tested more than once and in more than one of the groups • At least one independent variable is between-groups, splitting people so that each person is found in only one group • Only one variable is continuous and is the dependent variable • RQ examples: p. 144 • There is no non-parametric alternative to a repeated-measures ANOVA
4. Statistical tests 4.9 ANCOVA: A test of group differences • More than 2 variables • One or more variables are categorical and they are independent variables. • 2 or more variables are continuous • Exactly one is the dependent variable • The other variable or variables are the ones being controlled for (the covariates) • RQ examples: p. 143 • There is no non-parametric alternative to an ANCOVA
4. Statistical tests 4.10 Repeated-measures ANCOVA: A test of group differences • More than 2 variables • 2 or more variables are categorical and they are independent variables.
• At least one independent variable is within-groups, with the same people tested more than once and in more than one of the groups • At least one independent variable is between-groups, splitting people so that each person is found in only one group • 2 or more variables are continuous • Exactly one is the dependent variable • The other variable or variables are the ones being controlled for (the covariates) • RQ examples: p. 144
5. Finding relationships using correlation 5.1 Scatterplots: Visual inspection of your data • Examining the linearity assumption • Graphs > Legacy Dialogs > Scatter/Dot > Simple Scatter (for a two-variable scatterplot) > Define > put one variable on the x-axis and another on the y-axis, press OK • Adding a regression line (fit line) or a Loess line to a scatterplot: open the Chart Editor by double-clicking the created scatterplot > Elements > Fit Line at Total > Properties > Linear (for a straight regression line) OR Loess (for a line fitting the data more closely)
5. Finding relationships using correlation • Example dataset: DeKeyser (2000) • Viewing simple scatterplot data by categories: Simple Scatter > Set Markers By, adding a categorical variable > customize the graph by adding fit lines, changing labels, or changing properties of the plotting characters from the Chart Editor • Application activities: Q's 1-5, pp. 156-157
5. Finding relationships using correlation 5.2 Multiple scatterplots • Graphs > Legacy Dialogs > Scatter/Dot > Matrix Scatter (for more than two variables) > Define
5. Finding relationships using correlation 5.3 Assumptions of parametric correlation (Pearson's r) (cf. Table 6.1, p. 160) • Linearity between each pair of variables • Independence of observations • Normal distribution of the variables • Homoscedasticity (constant variance): the variance of the residuals is equal at every point on the independent variable
5. Finding relationships using correlation 5.4 Effect size for correlation • R2 as a measure of how much of the variance in one variable is accounted for by the other variable • R2 as a measure of how tightly the points in a scatterplot fit the regression line • R2 is a percentage of variance (PV) effect size, from the r family of effect sizes • Cohen's (1992) definition of effect sizes for R2: R2 = .01 (small), R2 = .09 (medium), R2 = .25 (large)
5. Finding relationships using correlation 5.5 Calculating correlation coefficients • Analyze > Correlate > Bivariate > move the variables on the left to the Variables box on the right > choose the correlation coefficient type • Application activities: Q's 1-4, p. 165
5. Finding relationships using correlation 5.6 Output and reporting of a correlation • 4 pieces of information desired in the output: the correlation coefficient (Pearson's r, Spearman's rho, etc.), the 95% CI, the sample size (N) involved in the correlation, and the p-value • Calculation of the CI (by typing in r and N) at http://glass.ed.asu.edu/stats/analysis/rci.html • Double-click on the table > SPSS Pivot Table > Format > Table Looks
Output of a correlation (example, N = 200 for each pair): the total score on the GJT (gjtscore) correlates with the total score on the aptitude test at r = .079, p = .267 (2-tailed), and with totalhrs at r = .184, p = .009 (significant at the 0.01 level, 2-tailed); the aptitude total correlates with totalhrs at r = .075, p = .293.
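The same four pieces of information (coefficient, CI, N, p-value) can also be computed outside SPSS. Here is a minimal sketch in Python, with invented scores standing in for variables such as gjtscore and the aptitude total, using the Fisher z transformation for an approximate 95% CI:

# Minimal sketch: Pearson's r with its p-value and an approximate 95% CI
# via the Fisher z transformation; the scores below are invented.
import numpy as np
from scipy import stats

aptitude = np.array([23, 31, 28, 35, 40, 22, 30, 33, 37, 26])
gjtscore = np.array([110, 125, 118, 130, 142, 105, 120, 128, 138, 115])
n = len(aptitude)

r, p = stats.pearsonr(aptitude, gjtscore)
z = np.arctanh(r)                        # Fisher z transformation of r
se = 1 / np.sqrt(n - 3)                  # standard error of z
lo, hi = np.tanh(z - 1.96 * se), np.tanh(z + 1.96 * se)

print(f"r = {r:.2f}, p = {p:.3f}, N = {n}, 95% CI [{lo:.2f}, {hi:.2f}], R2 = {r**2:.2f}")

The squared coefficient printed at the end is the R2 effect size discussed in section 5.4.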
5. Finding relationships using correlation 5.7 Sample of reporting a correlation • Larson-Hall (2010, pp. 165-166) • Written and tabular forms
5. Finding relationships using correlation 5.8 Partial correlation • Analyze > Correlate > Partial • Put the variable you want to control for in the Controlling For box • Put the other variables in the Variables box • Reporting the results of a partial correlation (p. 168)
5. Finding relationships using correlation 5.9 Point-biserial correlations (rpb) & test analysis • Correlation between a dichotomous variable (only two choices) and a continuous variable • One way to determine item discrimination in classical test theory is to conduct a corrected point-biserial correlation, crossing the scores for an item with the scores for the entire test minus that particular item • Analyze > Scale > Reliability Analysis • Put the score for the total test and the individual items in the "Items" box. Open the Statistics button and tick "Scale if item deleted."
5. Finding relationships using correlation 5.10 Inter-rater reliability • Inter-rater reliability, or the measurement of Cronbach's alpha as an intraclass correlation, for cases of judges rating persons • The problem with using the average inter-item correlation as a measurement of reliability between judges is that we are not sure whether the judges rated the same people the same way, or just whether the trend of higher and lower scores for the same participant was followed
5. Finding relationships using correlation 5.10 Inter-rater reliability • Analyze > Scale > Reliability Analysis • Put the items which contain the judges' ratings of the participants in the "Items" box. • Open the Statistics button and tick the "Intraclass correlation coefficient" box; choose Two-Way Random. Also tick "Scale if item deleted" and "Correlations." • Look for Cronbach's alpha in the output • For overall test reliability, put all of the dichotomous test items into the "Items" box in the Reliability Analysis and obtain Cronbach's alpha, which for dichotomous items is also known as the KR-20 measure of reliability
6. Looking for groups of explanatory variables through multiple regression • Explanatory variables (EV) vs. response variables (RV) • Yi = α + β1xi1 + … + βkxik + errori • TOEFL score = some constant number (the intercept) + aptitude score + a number which fluctuates for each individual (the error) • MR examines whether the explanatory variables we have posited explain very much of what is going on in the response variable • MR can also predict how people in the future will score on the response variable
6. Looking for groups of explanatory variables through multiple regression 6.1 Standard multiple regression (SMR) • In SMR, the importance of each EV depends on how much it uniquely overlaps with the RV. • SMR answers two questions: What are the nature and size of the relationship between the RV and the set of EVs? How much of the relationship is contributed uniquely by each EV?
6.2 Sequential (hierarchical) multiple regression (HMR) • In HMR, all of the areas of the EVs that overlap with the RV will be counted, but the way they are included depends on the order in which the researcher enters the variables into the equation • The importance of any variable can be emphasized in HMR, depending on the order in which it is entered.
• If two variables overlap to a large degree, then entering one of them first will leave little room for explanation for the second variable • HMR answers the question: Do the variables entered in each subsequent step add to the prediction of the RV after differences in the variables from the previous step have been eliminated?
6. Looking for groups of explanatory variables through multiple regression 6.4 Starting the MR • Analyze > Regression > Linear • Put the RV in the "Dependent" box • For standard regression: put all the EVs into the "Independent" box with the Method set at "Enter" • For sequential regression: enter the EVs into the "Independent" box one at a time, in the order you want them to enter the regression equation, pushing the Next button after each one, with the Method set at "Enter" • Open the buttons: Statistics, Plots, and Options
6.5 Regression output in SPSS • Analyze > Regression > Linear
Regression output: Descriptive Statistics (N = 54): Final score M = 74.46, SD = 10.386; Student English proficiency M = 2.185, SD = .7024; results of the motivation scale M = 3.0370, SD = .97057; results of the course evaluation by teachers M = 3.0741, SD = .98770; LangAnxiety M = 2.7315, SD = .77163.
Regression output: Correlations (Pearson, 1-tailed, N = 54): Final score correlates with Student English proficiency at .565, with the motivation scale at .616, with the course evaluation by teachers at .374, and with LangAnxiety at .032; the explanatory variables correlate with one another only weakly (between -.088 and .211).
Regression output: Variables Entered/Removed: Model 1 enters Student English proficiency, Model 2 adds the motivation scale, Model 3 adds the course evaluation by teachers, and Model 4 adds LangAnxiety (Method: Enter; dependent variable: Final score).
Regression output: Model Summary (dependent variable: Final score):
Model 1: R = .565, R Square = .319, Adjusted R Square = .306, Std. Error of the Estimate = 8.653; R Square Change = .32, F Change (1, 52) = 24.355, Sig. F Change = .000
Model 2: R = .760, R Square = .577, Adjusted R Square = .561, Std. Error = 6.885; R Square Change = .26, F Change (1, 51) = 31.141, Sig. F Change = .000
Model 3: R = .797, R Square = .635, Adjusted R Square = .613, Std. Error = 6.460; R Square Change = .06, F Change (1, 50) = 7.933, Sig. F Change = .007
Model 4: R = .800, R Square = .640, Adjusted R Square = .611, Std. Error = 6.479; R Square Change = .01, F Change (1, 49) = .707, Sig. F Change = .404
Regression output: Coefficients (unstandardized B, standard error in parentheses, standardized Beta):
Model 1: (Constant) 56.214 (3.881); Student English proficiency 8.351 (1.692), Beta = .565
Model 2: (Constant) 42.866 (3.906); proficiency 6.728 (1.377), Beta = .455; motivation scale 5.563 (.997), Beta = .520
Model 3: (Constant) 36.815 (4.248); proficiency 6.175 (1.307), Beta = .418; motivation 5.346 (.939), Beta = .500; course evaluation 2.577 (.915), Beta = .245
Model 4: (Constant) 33.913 (5.482); proficiency 6.269 (1.316), Beta = .424; motivation 5.301 (.943), Beta = .495; course evaluation 2.629 (.920), Beta = .250; LangAnxiety .977 (1.162), Beta = .073
The Coefficients output also reports t, Sig., and the 95% confidence interval for B for each coefficient (e.g. Model 1: constant t = 14.485, Sig. = .000, 95% CI [48.426, 64.001]; proficiency t = 4.935, Sig. = .000, 95% CI [4.956, 11.747]).
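For comparison with the SPSS menus and output above, here is a minimal sketch of a standard and a sequential (hierarchical) regression in Python with statsmodels. The variable names are invented stand-ins for the slide example (final score predicted from proficiency, motivation, teacher evaluation, and language anxiety) and the data are simulated, not the real dataset:

# Minimal sketch: standard and sequential multiple regression with statsmodels.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 54
df = pd.DataFrame({
    "proficiency": rng.normal(3.0, 0.7, n),
    "motivation":  rng.normal(3.0, 1.0, n),
    "evaluation":  rng.normal(3.1, 1.0, n),
    "anxiety":     rng.normal(2.7, 0.8, n),
})
df["final"] = 35 + 6 * df["proficiency"] + 5 * df["motivation"] + 2.5 * df["evaluation"] + rng.normal(0, 6, n)

# Standard regression: all explanatory variables entered at once
full = smf.ols("final ~ proficiency + motivation + evaluation + anxiety", data=df).fit()
print(full.summary())      # B coefficients, t-tests, 95% CIs, R-squared

# Sequential regression: track the change in R-squared as variables are added step by step
step1 = smf.ols("final ~ proficiency", data=df).fit()
step2 = smf.ols("final ~ proficiency + motivation", data=df).fit()
print(f"R2 step 1 = {step1.rsquared:.3f}, R2 step 2 = {step2.rsquared:.3f}, "
      f"R2 change = {step2.rsquared - step1.rsquared:.3f}")

The summary() output plays roughly the role of the SPSS Model Summary and Coefficients tables, and the R2-change comparison mirrors the Change Statistics reported above.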
Dependent Variable: Final score Regression Output Coefficients a Standardized Unstandardized Coefficients Model 1 B (Constant) Student English proficiency 2 (Constant) Student English proficiency results of the motivation scale 3 (Constant) Student English proficiency results of the motivation scale results of the course evaluation by teachers 4 (Constant) Student English proficiency results of the motivation scale results of the course evaluation by teachers Coefficients Std. Error Beta 56.214 3.881 8.351 1.692 42.866 3.906 6.728 1.377 .455 5.563 .997 .520 36.815 4.248 6.175 1.307 .418 5.346 .939 .500 2.577 .915 .245 33.913 5.482 6.269 1.316 .424 5.301 .943 .495 2.629 .920 .250 .977 1.162 .073 LangAnxiety .565 a. Dependent Variable: Final score Coefficients a 95.0% Confidence Interval for B Model 1 t (Constant) Student English proficiency 2 (Constant) Student English proficiency results of the motivation scale 3 (Constant) Sig. Lower Bound Upper Bound 14.485 .000 48.426 64.001 4.935 .000 4.956 11.747 10.975 .000 35.024 50.707 4.884 .000 3.963 9.493 5.580 .000 3.562 7.564 8.666 .000 28.282 45.347 Regression Output Residuals Statistics Minimum Predicted Value Maximum a Mean Std. Deviation N 58.93 94.72 74.46 8.311 54 -1.869 2.437 .000 1.000 54 1.000 2.685 1.942 .344 54 59.85 94.85 74.49 8.305 54 Residual -9.760 21.291 .000 6.230 54 Std. Residual -1.506 3.286 .000 .962 54 Stud. Residual -1.600 3.417 -.002 1.006 54 -11.029 23.017 -.025 6.816 54 -1.627 3.875 .010 1.050 54 Mahal. Distance .282 8.120 3.926 1.624 54 Cook's Distance .000 .189 .019 .034 54 Centered Leverage Value .005 .153 .074 .031 54 Std. Predicted Value Standard Error of Predicted Value Adjusted Predicted Value Deleted Residual Stud. Deleted Residual a. Dependent Variable: Final score Regression Output: P-P plot for diagnosing normal distribution of data Regression Output: Plot of studentized residuals crossed with fitted values 6. Looking for groups of explanatory variables through multiple regression 6.6 Reporting the results of regression analysis Correlations between the explanatory variables and the response variable Correlations among the explanatory variables Correlation matrix with r-value, p-value, and N Standard or sequential regression? R square or R square change for each step of the model Regression coefficients for all regression models (esp. unstandarized coefficients, labeled B, and the coefficient for the intercept, labeled “constant” in SPSS output) For standard regression, report the t-tests for the contribution of each variable to the model 6. Looking for groups of explanatory variables through multiple regression 6.6 Reporting the results of regression analysis The multiple correlation coefficient, R2, expresses how much of the variable in scores of the response variable can be explained by the variance in the statistical explanatory variables The squared semipartial correlations (sr2) provideds a way of assessing the unique contribution of each variable to the overall R. These numbers are already a percentage variance effect size (of the r family) Example reporting on Lafrance & Gottardo (2005): P198 7. Finding group differences with Chi-Square when all variables are categorical