Fundamental Statistics in
Applied Linguistics Research
Spring 2010
Weekend MA Program on Applied English
Dr. Da-Fu Huang
Part I: Statistical Ideas
1. Getting started with SPSS
1.1 Opening a data file
• The initial pop-up menu or window
• File > Open > Data > .sav file
• The Data Editor
• Variable View
• Data View
• The SPSS Viewer displays output data
(.spv file or data)
1.2 Entering your own data
• Rows in SPSS are cases (for each research participant)
• Columns are separate variables
• Name each case making the first column the ID number of each
participant
• Define the column variables in “Variable View”
(A variable (or a factor) is a collection of data points that are all of the
same sort.)
• Name: the variable’s name
• Type: type of variable (e.g. Numeric, etc.)
• Label: giving a variable a more detailed or descriptive name
• Values: giving categorical variables a numeric value
• Align: data aligned to the right or left of the column
• Missing: assigning a value for missing data (e.g. 999)
• Width; Columns: the width of the variable and of its displayed column
• Measure: the measurement scale (Scale, Ordinal, or Nominal)
• Application activity: Q’s 1-3, pp. 15-16
1.3 Importing data into SPSS
• Conversion of data saved in another format like
Excel, SAS, Access, etc. into SPSS data.
• File > Open > New Query or the “Database
Wizard” in the pop-up window
• See how the data will be arranged when
imported into SPSS.
1.4 Saving data in SPSS
• File > Save or File > Save as
• Open your next file before closing the previous file to
avoid opening the SPSS program again.
• Save (parts of) output data displayed in the SPSS
Viewer window as .spv files, which can be opened later
as “Output” (File > Open > Output).
• Application activity: Q’s 1-2, P18
1.5 Manipulating variables
• Moving or deleting columns or rows
• Combining or recalculating variables
• Recoding group boundaries
• Making cutpoints for groups
• Excluding cases from your data (Select Cases)
• Sorting variables
• Generating random numbers in Excel
1.5.1 Moving or deleting columns or rows
• Cut: delete and paste
• Copy: copy and paste
• Clear: delete entirely
• Insert Variable: put in a new blank column
1.5.2 Combining or recalculating variables
• Combination of some of the original variables,
or performance of some math operation on the
variables (e.g. calculating percentages)
• Transform > Compute Variable
• Move the variable(s) to the “Numeric
Expression” box and add the appropriate math
operators
• Example dataset: Torres's
• Application activities: Q’s 1-2, P21
1.5.3 Recoding group boundaries
• Making groups different from the original ones
• Transform > Recode Into Different Variables
• Move the variable(s) you want to recode into the “Numeric
Variable -> Output Variable” box and give the new variable a
name in the “Output Variable” area. Press Change to name your
new variable.
• Press the Old And New Values button and define your old and
new groups.
• Avoid ticking the “Output variables are strings” box unless you do
not want to use your new variable in statistical calculations. Use
numbers rather than strings to label groups, because if the labels are
strings SPSS will not treat the category as a variable.
• You will most likely give your new variables numbers but you
can later informatively label them in the “Variable View” tab.
• Example Dataset: DeKeyser2000.sav file
• Application activities: Q’s 1-2, P24
1.5.4 Making cutpoints for groups
• Transform > Visual Binning, helping you to decide
how to make groups or collapse your large range of
values into a much smaller choice of categories.
• The dialogue box displays a histogram of scores on the
test, which shows separate bins which are taller when
there are more cases of scores in that bin.
• Make Cutpoints:
• 3 choices
• Entering 2 cuts yields 3 groups, and so on.
• Example Dataset: DeKeyser2000.sav
1.5.5 Excluding cases from your data
(Select Cases)
• Exclude some part of the dataset you have
gathered
• Data > Select Cases (select the cases we want to
keep, not the ones we want to get rid of!)
• Press If button to express a conditional argument
• Specify in the “Output” section of the dialogue
box as to what to do with the unselected cases
• Example Dataset: Obarow.sav
• Application activities: Q’s 1-2, P27
1.5.6 Sorting variables
• Order the data in a column from smallest to largest or
vice versa.
• Data > Sort Cases
• (c.f.) Data > Sort Variables choice will move the columns
around
• Copy the sorted column data to the beginning of the file
and give it a name
• Do not depend on the SPSS row numbers to define the
cases or your participants.
• Put in your own column with some kind of ID number for
your participants, so you can still remember which row of
data belongs to which participant even though the data
are moved around.
• Example Dataset: the Obarow data file
Summative Application activities for
manipulating variables
(As Homework)
Questions 1-5
CH2 / Larson-Hall (2010); pp. 28-29
1.5.7 Generating random numbers in Excel
• Randomize participants, sentences, test items,
and so on in a research experiment.
• Generating random numbers in Excel
• Type into the formula bar the syntax:
• = RANDBETWEEN(1,100)
• The command will generate random numbers
between 1 and 100
• Make sure the Analysis ToolPak has been installed in Excel, or find
it under Add-Ins in the Tools menu.
2. Preliminaries to understanding Statistics
• Descriptive statistics summarize and make understandable a
group of numbers collected in a research study
• Inferential statistics (or parametric statistics) make inferences
about larger groups based on numbers from a particular group of
people
• Parametric vs. non-parametric statistics: non-parametric statistics
do not rely on the data having a normal distribution
• Parametric statistics use what can actually be measured
(the statistics) from an actual sample to estimate the thing of
interest that we can’t actually measure from the population
(the parameter)
• Statistics are the measurements we take from our particular
sample, but these statistics are really just guesses or estimates of
the actual parameter of our population
• Robust statistics, a kind of non-parametric statistics, deal with
violations of normality and with outliers by providing objective and
replicable ways to remove outliers, such as using a trimmed mean
2.1 Measurement Scales & Variables

Type of scale | Purpose of scale | Example of scale use | Variables measured
Nominal | Counting frequency | Finding the number of NS of Chinese in an ESL class | Nominal or categorical variables
Ordinal | Rank ordering | Ranking students according to scores on the voc test | Ordinal variables
Interval | Measuring intervals | Determining z-scores or standard scores on a grammar test | Continuous, numeric or interval variables
Ratio | Measuring intervals from a real zero point | Measuring height, weight, speed, or absolute temperature | Continuous or numeric variables

Application activity: Q’s 1-8, pp. 39-40
2.1.1 Dependent vs. Independent Variables
• The distinction lies in the way the variables function
in the experiment
• Independent variables are those that may have an
effect on the dependent variables; e.g. L1 background,
age, proficiency level, etc.
• Dependent variables are those which are affected; e.g.
scores of a voc test, reaction time, number of accurate
grammatical forms
• Accurate determination of the type of variables
(categorical or continuous; dependent or independent)
in a research question will lead to the appropriate stat
test you need to analyze your data.
• Application activity: Q’s 1-5, pp. 35-37
2.1.2 Frequency data vs. score data
• Frequency data show how often a variable is present in the data.
The data are non-continuous and describe nominal (discrete,
categorical) variables.
• Score data show how much of a variable is present in the data.
The data are continuous, but the intervals of the scale may be either
ordinal or interval measurements of how much.
2.1.2 Frequency data vs. score data
*Practice 2.1:
• Since her ESL advanced composition students seemed
to use few types of cohesive ties, Wong (1986)
wondered if they could recognize appropriate ties
(other than those they already used). She constructed
a passage with multiple-choice slots (a multiple-choice
cloze test) for cohesive ties. The ties fell into
four basic types (conjunctive, lexical, substitution,
ellipsis). Her students were from seven major L1
groups. What are the variables? Do the measures
yield frequencies or scores?
• Your group consensus:
2.1.2 Frequency data vs. score data
*Practice 2.2:
• Brusasco (1984) wanted to compare how much
information translators could give under two different
conditions: when they could stop the tape they were
translating by using the pause button and when they
could not. There was a set number of information
units in the taped text. What are the variables? If the
number of information units is totaled under each
condition, are these frequencies or scores (e.g. how
often or how much)? What is your group consensus?
2.1.3 Operationalization of variables
*Practice 2.3:
• In many studies in the field of applied linguistics,
bilingualism is a variable. Part of the operational
definition of that variable will include the values that
code the variable. It may be coded as 1 = yes, 2= no, or
as 1 = French/English, 2= German/French, 3=
French/Arabic, 4= Cantonese/Mandarin, 5=
Spanish/Portuguese. In this case the variable has been
scaled as a(n) ______ variable. Each number represents
a ______. Bilingualism might be coded as 1= very
limited, 2=limited, 3=good, 4=fluent, 5=very fluent. In
this case, the variable has been measured as a(n)
_______ variable. Bilingualism could be coded on the
basis of a test instrument giving scores from 1 to 100.
The variable has then been measured as a(n) _____
variable.
2.1.4 Moderator variables
• Distinction between major independent variables and
moderating independent variables
• For example, in the study of compliments, gender is
the most important variable to look at in explaining
differences in student performance. However, length
of residence might moderate the effect of gender on
compliment offers / receipts. In this case, length of
residence is a variable functioning as a moderator
variable.
• Moderator variables mediate or moderate the
relationship between the independent and dependent
variables
2.1.5 Control variables
• A control variable is a variable that is not of central concern in
a particular research project but which might affect the outcome.
It is controlled by neutralizing its potential effect on the
dependent variable.
• For example, handedness can affect the ways that Ss respond in
many tasks. In order not to worry about this variable, you
could institute a control by including only right-handed Ss in
your study.
• If you are doing an experiment involving Spanish, you might
decide to control for language similarity and not include any
speakers of non-Romance languages in your study.
• Whenever you control a variable in this way, remember that
you are also limiting the generalizability of your study
• In the above examples, the effect of an independent variable is
controlled by eliminating it. The control variables in the
examples are nominal (discrete, discontinuous)
2.1.5 Control variables
• For scored, continuous variables, it is possible to statistically control
for the effect of a moderating variable. That is, we can adjust for
preexisting differences in a variable.
• This procedure is called ANCOVA, and the variable that is
controlled is called a covariate.
• For example, consider a study on how well males and females from
different L1 groups perform on a series of computer-aided tasks.
The focus of the study is the evaluation of the CAI lessons and the
possible effect of gender and L1 group membership on task
performance. In collecting the data, we might notice that not all
students read through the materials at the same speed. We would
like to adjust the task performance scores taking reading speed into
account. This statistical adjustment controls for preexisting differences
in a variable which is not the focus of the study. Unlike the previous
examples, the variable is not deleted; i.e. slow readers or rapid
readers are not removed from the study.
• While reading speed may be an important variable, it is not the focus
of the research, so its effect is instead neutralized by a statistical procedure.
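To make the idea concrete, here is a minimal Python sketch of an ANCOVA using statsmodels (not part of the original slides); the data file and the variable names task_score, gender, l1_group, and reading_speed are hypothetical.

# ANCOVA sketch: adjust task scores for the covariate reading_speed (hypothetical data).
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("cai_tasks.csv")  # hypothetical file with task_score, gender, l1_group, reading_speed

# gender and l1_group are categorical independent variables; reading_speed is the covariate.
model = smf.ols("task_score ~ C(gender) + C(l1_group) + reading_speed", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # effects of gender and L1 group adjusted for reading speed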
2.1.6 Other intervening variables
 Tendency to draw a direct relation between independent
and dependent variables. For example, additional
education and increased income. Why the lack of a direct
relationship between the two variables?
 There seems to be an intervening variable at work, a
variable that was not included in the study (the variable
of age group).
 An intervening variable is the same thing as a moderating
variable. The only difference is that the intervening
variable has not been or cannot be identified in a precise
way for inclusion in the research.
 In actual research, intervening variables may be difficult
to represent since they may reflect internal mental
processes (e.g. L1/L2 transfer, etc).
 Intervening variables are a source of “error” in our
research.
2.1.7 Scale or variable transformation
Comparing variables that are not measured in
the same units and have different means and
variability is difficult.
For example, how would you compare
someone's performance in the long jump to
someone else's performance in a 1 mile run to
evaluate who is the "better" athlete?
 Standard scores have a known mean and
variability. Converting raw, observed scores to
standard scores aids in comparisons.
 Four standard scores: percentiles, z scores,
T scores, and stanines.
Z-score table:
z-scores corresponding to cumulative area proportions of
the normal distribution (z-distribution, M=0, SD=1)
Stanines
2.1.7 Scale or variable transformation
• Linear transformation: the simplest transformation since it
entails altering each score in the distribution by a constant;
(e.g.) conversion of raw scores to percentage scores
• P = R/T*100
• P: Percentage score; R: Raw score;
T: Total number of items or highest possible score
(T and 100 are constants)
• Normalization transformation: to standardize or normalize a
distribution of scores; ordinal data will be changed to
interval data, scores being anchored to a norm or a group
performance mean as a point of reference; (e.g.) the z-score,
T-score transformation
*z = (X – M) / S; z-distribution, M=0, SD=1
*T = 10z + 50; T-distribution, M=50, SD=10
2.1.7 Scale or variable transformation
*Practice 2.4
The mean of a reading test was 38 and the standard
deviation 6. Find the z and T scores for each of the
following raw scores: 38, 39, 50
( z = (X – M) / S ; T = 10z + 50 )
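For reference (not part of the original practice), a minimal Python sketch of the computation, applying the two formulas above:

# z- and T-score transformation for the raw scores in Practice 2.4 (M = 38, SD = 6).
M, S = 38, 6
for X in (38, 39, 50):
    z = (X - M) / S        # z = (X – M) / S
    T = 10 * z + 50        # T = 10z + 50
    print(X, round(z, 2), round(T, 1))
# 38 -> z = 0.0, T = 50.0; 39 -> z ≈ 0.17, T ≈ 51.7; 50 -> z = 2.0, T = 70.0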
2.2 How statistical testing works
2.2.1 Hypothesis testing
• A stat test will not test the logical, intuitively
understandable hypothesis, but instead, a null
hypothesis.
• Null hypothesis (H0): there is no difference
between groups or there is no relationship between
variables
• “We can never prove something to be true, but we
can prove something to be false” (Howell, 2002)
• Rejection of the null hypothesis gets people to
accept the alternative or research hypothesis (Ha)
• Application activity: Q’s 1-3, P43
2.2.1 Hypothesis testing
The most basic test: z-test
• Question: Is my sample part of the population?
• z-test: What is the probability that my sample is
typical of all possible samples of the same size
that could be taken from the population?
• Choice of α : What am I willing to accept as a
minimum probability?
• H0 : There is no difference between my sample
and any other sample of the same size that cannot
be attributed to natural variation
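As an illustration (not part of the original slides), a minimal one-sample z-test sketch in Python; the population parameters and sample values are hypothetical:

# One-sample z-test: is my sample plausibly drawn from the population?
from math import sqrt
from scipy import stats

pop_mean, pop_sd = 500, 100     # assumed (hypothetical) population parameters
sample_mean, n = 530, 25        # hypothetical sample mean and size

z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))   # standard error = pop_sd / sqrt(n)
p = 2 * (1 - stats.norm.cdf(abs(z)))                # two-tailed probability under H0
print(z, p)   # z = 1.5, p ≈ .13, so H0 is not rejected at α = .05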
2.2.1 Hypothesis testing
Research errors
• Statistical tests provide decision support,
not decisions
• Results are probabilities; the researcher has
to set the criteria for decisions
• The choices can be correct or wrong, but
we usually do not know which prevails,
thus we live with probabilities
2.2.2 Two-tailed hypothesis testing
2.2.3 One-tailed hypothesis testing
2.2.4 Hypothesis testing: critical value
2.2.5 Hypothesis testing:
Consequences of decisions
Decision \ Possible Realities | H0 True          | H0 False
H0 Accepted                   | Correct decision | Type II error
H0 Rejected                   | Type I error     | Correct decision
2.2.5 The relation between the statistical
decision and reality in hypothesis testing
Statistical decision \ Possible Realities | NH (H0) is True          | NH (H0) is False
Not reject NH (H0)                        | 1 – α (correct decision) | β (Type II error)
Reject NH (H0)                            | α (Type I error)         | 1 – β (correct decision)

*Common values: α = .05 and β = .20
2.2.6 Hypothesis testing:
Analogy in legal trials
*H0: there is no difference between the suspect and innocent people

Statistical decision \ Possible Realities | Null true (innocent) | Null false (guilty)
Released (null not rejected)              | Correct decision     | Type II error
Convicted (null rejected)                 | Type I error         | Correct decision
2.2.7 Hypothesis testing
Type I (α) and Type II (β) errors
2.3 Power
• Power is the probability of detecting a
statistical (statistically significant; significant)
result when there are in fact differences
between groups or relationships between
variables.
• Sufficient power ensures that real differences
are found and discoveries are not lost.
• Calculating a priori power levels and using
these to guide the choice of sample size avoids
insufficient power to detect possible effects.
• Power should be above .50 and would be
judged adequate at .80 (Murphy and Myors,
2004)
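As an illustration of an a priori power calculation (not part of the original slides), a minimal Python sketch using statsmodels; the effect size of d = 0.5 is an assumed, hypothetical value:

# A priori power analysis: sample size per group for an independent-samples t-test.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.5,   # assumed medium effect (Cohen's d)
                                          alpha=0.05,        # significance level
                                          power=0.80)        # desired power
print(n_per_group)   # roughly 64 participants per group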
2.4 Effect size
The effect size is a measure of how much the
independent variable or treatment changed the
dependent variable.
Effect size is one way to judge whether the effect
or association has any practical meaning or use.
It is possible to have a “statistically significant” (“p <
0.05”) result that actually has little practical value
For instance, a one-year study shows that diet A is
“significantly” better than diet B for weight loss, but
the subjects in diet A lost an average of only 1
pound more than subjects in diet B over a one-year
period. Would you believe that diet A is really a
“better” diet for weight loss?
2.4 Effect size
• P-value and significance testing depend on the power
of a test and thus on group sizes, but effect sizes do
not change no matter how many participants there are.
• A null hypothesis significance test (NHST) merely
tells whether the study had the power to find a
difference that was greater than zero. An effect size
gives insight into the size of this difference. If the
size is large, the researcher has found something
important to understand.
• Understanding effect size measures (Larson-Hall, pp. 115-116)
• Calculating effect sizes for power analysis (Larson-Hall, pp. 116-120)
• Set the power of the test to detect a difference from
the null hypothesis that is of practical importance
2.4.1 Effect size measures
• Group difference index (mean difference index): the d
family of effect sizes (Table 4.7, P118)
• Cohen’s d: measure the difference between two
independent sample means, and express how large the
difference is in SD
• Relationship indexes: the r family of effect sizes:
measure how much an independent and dependent
variable vary together or the amount of covariation in the
two variables; the more closely the two variables are
related, the higher the effect size (Table 4.8, P119)
• Try to give effect sizes for all associations you report on,
not just those that are statistical.
• Associations that are not statistical may still have effect sizes
large enough to be interesting and to warrant future research
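To illustrate the d family (not part of the original slides), a minimal Python sketch of Cohen's d for two independent groups; the scores are hypothetical:

# Cohen's d: the difference between two independent group means expressed in pooled-SD units.
import numpy as np

group1 = np.array([72, 75, 80, 68, 77, 74])   # hypothetical scores
group2 = np.array([65, 70, 66, 72, 60, 68])   # hypothetical scores

n1, n2 = len(group1), len(group2)
s_pooled = np.sqrt(((n1 - 1) * group1.var(ddof=1) + (n2 - 1) * group2.var(ddof=1)) / (n1 + n2 - 2))
d = (group1.mean() - group2.mean()) / s_pooled
print(d)   # by convention d ≈ 0.2 small, 0.5 medium, 0.8 large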
2.5 What influences power?

Factors that increase the power of a statistical test:
• Statistical choices: prefer a parametric statistical test; interval/ratio data are preferred; a larger α and/or a one-tailed test
• Treatment: an increased difference in means and a reduced standard error of the mean (SEM)
• Design of instrument: a reduced standard deviation of scores
• Measurement: increased instrument reliability; reduced error variance; reduced other sources of error
• Sampling: an increased sample size (n); increased representativeness of the sample
But power is not everything...

The choice of α (e.g. raising α from .05 to .10?) and whether the test is one- or two-tailed depend on:
• The nature of the hypotheses
• The consequences of making a Type I error
• The consequences of making a Type II error
• The power needed from the statistical test
2.6 Steps in hypothesis testing
State the null hypothesis
Decide whether to test it as a one- or two-tailed hypothesis
  If there is no research evidence on the issue, select a two-tailed hypothesis
  With research evidence on the issue, select a one-tailed hypothesis
Set the probability level (α level)
Select the appropriate statistical test(s) for the data
Collect the data and apply the statistical test(s)
Report the test results and interpret them correctly
2.7 Statistical reporting
• The following numbers will often be reported
in experimental results:
• The statistic
• The degrees of freedom (df)
• The p-value
• The effect size
• (The 95% confidence interval)
2.7 Statistical reporting
• One can tell what kind of statistic test was used by the
statistic reported, which is usually represented by a
symbol.
• A t-test has a t
• Correlation has a Pearson’s r
• An ANOVA has an F
• A chi-square has a chi (χ)
• The statistic is calculated and its result is a number.
• In general, the higher this number is and/or the greater
it is than 1, the more likely the p-value is to be very
small.
2.7 Statistical reporting
• The degrees of freedom counts how many free
components you have in your data set.
• Imagine four contestants on a game show who each pick one of
4 doors behind which prizes can be found.
• Only 3 of the 4 people really have a choice; the
last person’s choice is fixed.
• Only 3 choices include some variation, so there are 3
degrees of freedom.
• Degrees of freedom is a piece of information
necessary to determine the critical value for finding
statistical significance or not
• Degrees of freedom provides information about the
number of participants or groups (N) in the study,
and a check on someone else’s stat work.
2.7 Statistical reporting
• The p-value is the probability that we would find a statistic
as large as the one we found if the null hypothesis were true.
• The p-value represents the probability of the data given the
hypothesis, written as p(D | H0).
• The lower the p-value, the more confidence we have in
rejecting the null hypothesis and assuming there are some
differences among the groups or some relationship between
the variables.
• The larger a statistic is, the more likely it is to have a small
p-value.
• Meaning of the p-value:
The probability of finding a [name of a statistic] this
large or larger if the null hypothesis were true is [the p-value].
2.7 Statistical reporting
• Application activity: Q’s 5-8, pp. 51-53
2.7 Statistical reporting
Reporting the confidence interval (or 95%
confidence interval (CI)), along with effect sizes is
vital to improving researchers’ intuitions about what
statistical testing means.
• The CI provides more info than p-value about effect size
and is more useful for further testing comparisons. The 95%
confidence interval gives the range of values that the mean
difference would take if the study were replicated 100 times
(e.g. a mean difference of 3 points when the 95% CI is (1,4.28) )
• The p-value shows if the comparison found a significant
difference between the groups, but the CI shows how far
from zero the difference lies, giving an intuitively
understandable measure of effect size.
• The width of the CI indicates the precision with which the
difference can be calculated or the amount of sampling error.
Confidence interval
95% Confidence interval
Confidence interval
• The bigger the sample size, the higher the
power, the smaller the CI, the less
sampling error, the more precise and
certain the statistic is as an estimate of the
parameter
• Application activity: Q’s 1-3, pp. 122-124
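For illustration (not part of the original slides), a minimal Python sketch of a 95% confidence interval for the difference between two independent group means, using hypothetical scores and the classic equal-variance formula:

# 95% CI for a mean difference between two independent groups (pooled-variance formula).
import numpy as np
from scipy import stats

g1 = np.array([14, 18, 16, 20, 17, 15, 19, 16])   # hypothetical group 1 scores
g2 = np.array([12, 15, 13, 14, 16, 11, 13, 14])   # hypothetical group 2 scores

n1, n2 = len(g1), len(g2)
diff = g1.mean() - g2.mean()
sp = np.sqrt(((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2))
se = sp * np.sqrt(1 / n1 + 1 / n2)                 # standard error of the difference
t_crit = stats.t.ppf(0.975, n1 + n2 - 2)           # critical t for a 95% CI
print(diff - t_crit * se, diff + t_crit * se)      # a CI that excludes 0 indicates a statistical difference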
3. Describing data numerically and graphically
3.1 Numerical summaries of data
 Central tendency of data distribution
Mean
Mode
Median
 Variability of data distribution
Variance (s²): the average squared distance from the mean to
any point; s² = Σ(X – M)² / (N – 1)
Standard deviation (s; SD): the positive square root of the
variance
Standard error of the mean (SEM): the SD of the sampling distribution
of the mean; SEM = SD / √N; used to estimate confidence
intervals
 Number of participants or observations (N)
 Range: a single number that is the maximum data point minus the minimum
data point
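A minimal Python sketch of these numerical summaries (not part of the original slides), using a hypothetical set of scores:

# Central tendency and variability for a hypothetical set of scores.
import statistics as st
from math import sqrt

scores = [12, 15, 15, 18, 20, 22, 25, 15, 17, 19]

print(st.mean(scores))                        # mean
print(st.median(scores))                      # median
print(st.mode(scores))                        # mode (most frequent score)
print(st.variance(scores))                    # sample variance, s² = Σ(X – M)² / (N – 1)
print(st.stdev(scores))                       # standard deviation, s
print(st.stdev(scores) / sqrt(len(scores)))   # SEM = SD / √N
print(max(scores) - min(scores))              # range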
3.1 Using SPSS to get numerical summaries
of data
 Analyze > Descriptive Statistics > Descriptives. If you
have groups, first separate them by going to Data >
Split File, choosing the “Compare groups” option, and
moving the group variable into the right-hand box
 Analyze > Descriptive Statistics > Explore. If you have
groups, put the splitting variable into the “Factor List”
box. Choose whether to receive just numerical statistics,
plots, or both
 Example data set: LarsonHall.Forgotten.sav
 Application activities: Q2, P73
3.2 Graphic summaries of data:
Examining the shape of distributions
for normality
 Looking at data can give one some idea about whether
data are normally distributed
 Looking at data will give you a means of verifying what
is happening
 Graphics give you a way to look at your data that can
replicate and even go beyond what you see in the
numbers when you are doing your statistics
 Researchers are urged to look at their data before
undertaking any statistical analysis
 Numerical checking of the normality assumption by using
formal tests such as the Shapiro-Wilk test
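As a brief illustration (not part of the original slides), the Shapiro-Wilk test in Python on hypothetical data:

# Shapiro-Wilk test of normality on a hypothetical set of scores.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
scores = rng.normal(loc=50, scale=10, size=30)   # hypothetical, roughly normal scores

w, p = stats.shapiro(scores)
print(w, p)   # p > .05: no evidence against normality; p < .05: the distribution departs from normal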
3.2.1 Histograms vs. bar plots
The histogram divides data up into partitions and, in
the usual case, gives a frequency distribution of
the number of scores contained in each partition (or
bin)
If proportions are used instead, the overall area of
the graph will equal 1, and we can overlay the
histogram with a normal curve in order to judge
whether our data seem to follow a normal
distribution
In contrast, a bar plot is a common plot in the field,
but it shows the mean score of some group rather
than a frequency or proportion count of the score
divided up into various breaks.
3.2.1 Histograms vs. bar plots
Histograms and bar plots are both graphics
producing bars, but they answer very different
questions, and only the histogram is appropriate
for looking at the distribution of scores (cf.
Figure 3.10, P78)
Histograms give information about whether
distributions are symmetrically distributed or
skewed, and whether they have one mode (peak)
or several, and a histogram with an overlaid
normal curve can be evaluated to see whether
the data should be considered normally
distributed.
Histograms
Bar plots
3.2.2 Skewness and kurtosis
 Two ways to describe deviations in the shape of
distributions.
 If a sampling distribution is skewed, it is not symmetric
around the mean.
Positively skewed, when scores are bunched up toward the
left side of the graph (so that the tail goes to the right and
toward larger numbers)
Negatively skewed, when scores are bunched up toward the
right side of the graph (so that the tail goes to the left and
toward smaller and negative numbers)
 Skewness describes the shape of the distribution as far as
symmetry along a vertical line through the mean goes.
3.2.2 Skewness and kurtosis
 Kurtosis describes the shape of the distribution as far as
the concentration of scores around the mean goes.
 Kurtosis refers to the relative concentration of scores in
the center, the upper and lower ends (tails), and the
shoulders (between the center and the tails) of a
distribution.
Platykurtic (like a plateau), when a distribution is too flat at the
peak relative to the normal curve
Leptokurtic, when a curve has too many scores collected in
the center of the distribution (cf. Figure 3.13, p. 81)
3.2.3 Stem and leaf plots
 Stem and leaf plots display the same kind of information as
a histogram that uses frequency counts, but use the data
itself to show the distribution, and thus retain all of the data
points. (cf. Table 3.5, P82)
 In a stem-and-leaf plot each data value is split into a stem
and a leaf. The leaf is usually the last digit of the number
and the other digits to the left of the leaf form the stem. The
number 123 would be split as: stem 12, leaf 3
 It has the advantage over grouped frequency distribution of
retaining the actual data while showing them in graphic
form
Stem and leaf plots
3.2.4 Q-Q plots
 The quantile-quantile plot (Q-Q plot) plots the quantiles
of the data under consideration against the quantiles of the
normal distribution.
 A quantile means the fraction (or percent) of points below
the given value.
 The 25th quantile notes the point at which 25% of the data
are below it and 75% are above it. The Q-Q plot uses
points at many different quantiles.
 If the sampling distribution and the normal distribution are
similar, the points should fall in a straight line. If the Q-Q
plot shows that the points do not form a straight line, this tells us
the distribution departs from a normal distribution, and it can also give us
some information about what kind of distribution it is (cf.
Figure 3.15, p. 83)
3.2.4 Q-Q plots
The advantages of the q-q plots:
The sample sizes do not need to be equal.
Many distributional aspects can be simultaneously
tested.
Shifts in location, shifts in scale, changes in
symmetry, and the presence of outliers can all be
detected from this plot.
If the two data sets come from populations whose
distributions differ only by a shift in location, the
points should lie along a straight line that is
displaced either up or down from the 45-degree
reference line.
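A brief illustration (not part of the original slides): drawing a Q-Q plot against the normal distribution in Python, with hypothetical scores:

# Q-Q plot of hypothetical scores against the theoretical normal distribution.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(2)
scores = rng.normal(loc=70, scale=8, size=40)   # hypothetical scores

stats.probplot(scores, dist="norm", plot=plt)   # points close to the straight line suggest normality
plt.show()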
Q-Q Plots
Detrended Q-Q Plots
*These 2 batches do not appear to have come from populations
with a common distribution.
*The batch 1 values are significantly higher than the
corresponding batch 2 values.
*The differences are increasing from values 525 to 625. Then
the values for the 2 batches get closer again.
3.2.5 Obtaining graphics to assess normality in SPSS
Analyze > Descriptive Statistics > Explore, pick
graphs in the “plots” button.
Data > Split Files to split up groups first.
Analyze > Descriptive Statistics > Frequencies.
This can call up histograms with overlaid normal
distribution curves.
Application activities: Q2, P86
3.2.6 Examining the shape of distributions:
The assumption of homogeneity
 The homogeneity of variances assumption: the variances
of the groups are equal; another important assumption
when parametric statistics are applied to group data.
 For a stat test of two groups, given equal distributions
and equal variances of the groups, all we need to do is to
check whether their mean scores differ enough to
consider the groups part of the same distribution or in fact
as two separate distributions. (cf. Figure 3.18, P87, for
illustration of density plots)
 Non-homogeneous variances can be a reason for not
finding group differences which you thought would be
there when performing a stat test
3.2.6 Examining the shape of distributions:
The assumption of homogeneity
 Three ways of examining the homogeneity of variances:
Just look at the numerical output for the SD
Look at side-by-side boxplots of groups, which show
the amount of variability in the central part of the
distribution. (cf. Figure 3.19, P88)
Levene’s test: if p > .05, the null hypothesis is not
rejected, meaning homogeneous or equal variances;
if p < .05, the null hypothesis is rejected, meaning not
equal variances
Sample size should be big enough to have enough
power to detect violations of assumptions
Application activities: Q’s 1-3, PP 88-89
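As an illustration of the third option above (not part of the original slides), Levene's test in Python with two hypothetical groups:

# Levene's test for homogeneity (equality) of variances across groups.
from scipy import stats

group1 = [22, 25, 27, 30, 24, 26, 29, 23]   # hypothetical scores
group2 = [18, 35, 12, 40, 22, 31, 15, 38]   # hypothetical scores with a much larger spread

w, p = stats.levene(group1, group2)
print(w, p)   # p < .05: reject H0 of equal variances; p > .05: variances may be treated as equal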
Box plots
Box plots with outliers
(FlegeYenikKomshianLiu)
Boxplots & Interquartile Range (IQR)
4. Statistical tests
4.1 Correlation: A test of relationships
There are exactly 2 variables
The 2 variables have no levels within them
Only two averages of the variables
Both variables are continuous
The variables cannot necessarily be defined as
independent and dependent, having no cause
and effect relationship
RQ examples: PP131-132
The non-parametric alternative to a (Pearson’s)
correlation is Spearman’s rank order
correlation test
4. Statistical tests
4.2 Partial correlation: A test of relationships
 There are three or more variables (the influence of more
than one variable can be factored out at a time)
 The variables have no levels within them
 Three or more averages of the variables
 All of the variables are continuous
 The variables cannot necessarily be defined as
independent and dependent, having no cause and effect
relationship
 RQ examples: P133
 No non-parametric alternative to a partial correlation
4. Statistical tests
4.3 Multiple regression: A test of
relationships
There are 2 or more variables
The variables have no levels within them
Three or more averages of the variables
All variables are continuous
One variable must be dependent and the others
are independent
RQ examples: PP134-135
No non-parametric alternative to a multiple
regression
4. Statistical tests
4.4 Chi-square: A test of relationships
 Exactly 2 variables
 Each variable has 2 or more levels (categories) within
it (e.g. gender, experimental group, L1 background)
 Cannot calculate averages of the variables, only
frequencies of each category
 All variables are categorical
 The variables cannot necessarily be defined as
independent and dependent, having no cause and effect
relationship
 RQ examples: P136
 The chi-square test of independence is itself a non-parametric
test, and there is no parametric alternative to it
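A brief illustration (not part of the original slides): a chi-square test of independence on a hypothetical 2 x 3 table of frequency counts:

# Chi-square test of independence on a hypothetical gender-by-L1-group frequency table.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[20, 15, 10],    # hypothetical counts: males in three L1 groups
                     [10, 25, 20]])   # hypothetical counts: females in the same three L1 groups

chi2, p, df, expected = chi2_contingency(observed)
print(chi2, df, p)   # report as chi-square (df, N) = ..., p = ...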
4. Statistical tests
4.5 T-test: A test of group differences
Independent-samples t-test
 Exactly 2 variables
 One variable is categorical with only 2 levels and is the
independent variable. People in each group must be
different from each other
 The other variable is continuous and is the dependent
variable
 Only 2 averages of the variables
 RQ examples: P138
 The non-parametric alternative to an independent-samples t-test is the Mann-Whitney U-test
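A minimal Python sketch (not part of the original slides) of an independent-samples t-test and its non-parametric alternative, with hypothetical scores:

# Independent-samples t-test, plus the Mann-Whitney U-test as the non-parametric alternative.
from scipy import stats

experimental = [78, 85, 90, 72, 88, 95, 80, 84]   # hypothetical scores, group 1
control = [70, 75, 68, 82, 74, 79, 71, 77]        # hypothetical scores, group 2

t, p = stats.ttest_ind(experimental, control)        # parametric test
u, p_u = stats.mannwhitneyu(experimental, control)   # non-parametric alternative
print(t, p)
print(u, p_u)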
4. Statistical tests
4.5 T-test: A test of group differences
Paired-samples t-test
Exactly 2 variables
One variable is categorical with only 2 levels and
is the independent variable. People in each group
must be the same.
The other variable is continuous and is the
dependent variable.
Only 2 averages of the variables
RQ examples: PP138-139
The non-parametric alternative to a paired-samples t-test is the Wilcoxon signed ranks test
4. Statistical tests
4.6 One-way ANOVA: A test of group
differences
Exactly 2 variables
One variable is categorical with 3 or more levels
and is the independent variable.
The other variable is continuous and is the
dependent variable.
3 or more averages of the variables
RQ examples: P140
The non-parametric alternative to a one-way
ANOVA is the Kruskal-Wallis test
4. Statistical tests
4.7 Factorial ANOVA: A test of group differences
 More than 2 variables
 Two-way ANOVA (2 IVs)
 Three-way ANOVA (3 IVs)
 2 or more variables are categorical and they are
independent variables.
 2 (gender) x 3 (condition) ANOVA
 Only one variable is continuous and is the dependent
variable.
 One advantage over a simple t-test or one-way ANOVA
is that the interaction between variables can be explored
 RQ examples: P142
 No non-parametric alternative to a factorial ANOVA
4. Statistical tests
4.8 Repeated-measures ANOVA: A test of group
differences
 More than 2 variables
 2 or more variables are categorical and they are
independent variables.
At least one independent variable is within-groups,
the same people tested more than once and in more
than one of the groups
At least one independent variable is between-groups,
splitting people so each is found in only one group
 Only one variable is continuous and is the dependent
variable
 RQ examples: P144
 No non-parametric alternative to a repeated-measures
ANOVA
4. Statistical tests
4.9 ANCOVA: A test of group differences
More than 2 variables
One or more variables are categorical and they
are independent variables.
2 or more variables are continuous
Exactly one is the dependent variable
The other variable or variables are the ones being
controlled for (the covariates)
RQ examples: P143
No non-parametric alternative to an ANCOVA
4. Statistical tests
4.10 Repeated-measures ANCOVA: A test of
group differences
 More than 2 variables
 2 or more variables are categorical and they are
independent variables.
At least one independent variable is within-groups,
the same people tested more than once and in more
than one of the groups
At least one independent variable is between-groups,
splitting people so each is found in only one group
 2 or more variables are continuous
Exactly one is the dependent variable
The other variable or variables are the ones being
controlled for (the covariates)
 RQ examples: P144
5.Finding relationships using correlation
5.1 Scatterplots : Visual inspection of your data
 Examining the linearity assumption
 Graphs > Legacy Dialogs > Scatter / Dot
> Simple Scatter (for two-variable SP) > Define
> one variable in the x-axis, and another in the y-axis,
press OK
 Adding a regression line (fit line) or a Loess line to a
SP
 Open the Chart Editor by double-clicking the created
SP > Elements > Fit Line at Total
> Properties > Linear (for a straight regression line)
OR Loess (for a line fitting the data more closely)
5.Finding relationships using correlation
Example datasets: DeKeyser (2000)
Viewing simple SP data by categories
Simple SP > Set Markers By, adding a
categorical variable > customize the graph by
adding fit lines, changing labels, or changing
properties of the plotting characters from the
Chart Editor
Application activities: Q’s 1-5, PP156-157
5.Finding relationships using correlation
5.2 Multiple Scatterplots
Graphs > Legacy Dialogs > Scatter / Dot
> Matrix Scatter (for more than two variables)
> Define
5.Finding relationships using correlation
5.3 Assumptions of parametric correlation
(Pearson’s r) (cf. Table 6.1, P160)
Linearity between each pair of variables
Independence of observations
Normal distribution of variables
Homoscedasticity (constant variance)
(the variance of the residuals for every pair of
points on the independent variable is equal)
5.Finding relationships using correlation
5.4 Effect size for correlation
R2 as a measure of how much of the variance in
one variable is accounted for by the other variable
R2 as a measurement of how tightly the points in a
scatterplot fit the regression line.
R2 is a percentage of variance (PV) effect size,
from the r family of effect sizes.
Cohen (1992)’s definition of effect size for R2 :
 R2 = .01 (small)
 R2 = .09 (medium)
 R2 = .25 (large)
Effect size for correlation (R2)
5.Finding relationships using correlation
5.5 Calculating correlation coefficients
Analyze > Correlate > Bivariate
> Move variables on the left to the Variables
on the right > Choose correlation coefficient
type
 Application activities: Q’s 1-4, P165
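A brief illustration (not part of the original slides): Pearson's r and Spearman's rho in Python with hypothetical data, plus R2 as the effect size:

# Pearson and Spearman correlation coefficients for two hypothetical continuous variables.
from scipy import stats

aptitude = [55, 60, 62, 70, 58, 75, 68, 64, 72, 59]   # hypothetical scores
gjt = [40, 44, 43, 52, 41, 55, 50, 47, 53, 42]        # hypothetical scores

r, p = stats.pearsonr(aptitude, gjt)
rho, p_rho = stats.spearmanr(aptitude, gjt)
print(r, p, r ** 2)   # r, its p-value, and R2 (the percentage-of-variance effect size)
print(rho, p_rho)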
5.Finding relationships using correlation
5.6 Output and reporting of a correlation
 4 pieces of info desired in the output
Correlation coefficient (Pearson’s r, Spearman’s
rho, etc)
95% CI
Sample size (N) involved in the correlation
p-value
Calculation of the CI (by typing in r and N) at
the http://glass.ed.asu.edu/stats/analysis/rci.html
Double-click on the table > SPSS Pivot Table >
Format > Table Looks
Output of a correlation

Correlations (N = 200 for each pair)

                                              gjtscore   Total score on aptitude test   totalhrs
gjtscore                    Pearson r         1          .079                           .184**
                            Sig. (2-tailed)              .267                           .009
Total score on              Pearson r         .079       1                              .075
aptitude test               Sig. (2-tailed)   .267                                      .293
totalhrs                    Pearson r         .184**     .075                           1
                            Sig. (2-tailed)   .009       .293

**. Correlation is significant at the 0.01 level (2-tailed).
5.Finding relationships using correlation
5.7 Sample of reporting a correlation
 Larson-Hall (2010, PP165-166)
Written and tabular forms
5.Finding relationships using correlation
5.8 Partial correlation
 Analyze > Correlate > Partial
Put the variable you want to control for in the
Controlling For box
Put the other variables in the Variables box
Reporting results of partial correlation (P168)
5.Finding relationships using correlation
5.9 Point-Biserial correlations (rpb) & Test
Analysis
 Correlation between a dichotomous variable (only two
choices) and a continuous variable
 One way to determine item discrimination in classical
test theory is to conduct a corrected point-biserial
correlation: scores for the item are crossed with scores for
the entire test minus that particular item
Analyze > Scale > Reliability Analysis
Put the score for the total test and the individual
items in the “Items” box. Open the Statistics and
tick “Scale if item deleted.”
5.Finding relationships using correlation
5.10 Inter-rater Reliability
Inter-rater reliability or the measurement of
Cronbach’s alpha as intraclass correlation for cases
of judges rating persons
A problem with using the average inter-item
correlation as a measurement of reliability between
judges is that we are not sure whether the judges
rated the same people the same way, or only whether the
trend of higher and lower scores for the same
participant was followed
5.Finding relationships using correlation
5.10 Inter-rater Reliability
Analyze > Scale > Reliability Analysis
Put the items which contain judges’ ratings of the
participants in the “Items” box. Open the
Statistics and tick “intraclass correlation
coefficient” box
Choose Two-Way Random. Also tick “Scale if
item deleted” and “Correlations.”
Look for Cronbach’s alpha in the output
For overall test reliability, put all of the dichotomous test
items into the “Items” box in the Reliability analysis and
obtain Cronbach’s alpha, also known as the KR-20
measure of reliability
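A minimal sketch (not part of the original slides) of Cronbach's alpha computed directly from its definition, with a hypothetical participants-by-items score matrix:

# Cronbach's alpha from a (participants x items) score matrix.
import numpy as np

# Hypothetical data: 6 participants rated on 4 items (or by 4 judges).
scores = np.array([[4, 5, 4, 4],
                   [3, 3, 2, 3],
                   [5, 5, 4, 5],
                   [2, 3, 3, 2],
                   [4, 4, 5, 4],
                   [3, 2, 3, 3]])

k = scores.shape[1]                                   # number of items / judges
item_vars = scores.var(axis=0, ddof=1).sum()          # sum of the item variances
total_var = scores.sum(axis=1).var(ddof=1)            # variance of participants' total scores
alpha = (k / (k - 1)) * (1 - item_vars / total_var)   # Cronbach's alpha
print(alpha)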
6. Looking for groups of explanatory
variables through multiple regression
Explanatory variables (EV) vs. response variables (RV)
 Y_i = α + β1·X_i1 + … + βk·X_ik + error_i
(e.g.) TOEFL score = some constant number (the
intercept) + aptitude score + a number which
fluctuates for each individual (the error)
MR examines whether the explanatory variables
(EV) we have posited explain very much of what is
going on in the response variable (RV)
MR can also predict how people in the future will
score on the response variable
6. Looking for groups of explanatory
variables through multiple regression
6.1 Standard multiple regression (SMR)
In SMR, the importance of the EV variable depends
on how much it uniquely overlaps with the RV.
SMR answers the two questions:
What are the nature and size of the relationship between
the RV and the set of EV?
How much of the relationship is contributed uniquely by
each EV?
6. Looking for groups of explanatory
variables through multiple regression
6.2 Sequential (Hierarchical) multiple regression (HMR)
 In HMR, all of the areas of the EV’s that overlap with the
RV will be counted, but the way that they will be included
depends on the order in which the researcher enters the
variables into the equation
 The importance of any variable can be emphasized in HMR,
depending on the order in which it is entered. If two
variables overlap to a large degree, then entering one of
them first will leave little room for explanation for the
second variable
 HMR answers the question:
 Do the subsequent variables entered in each step add to the
prediction of the RV after differences in the variables from the
previous step have been eliminated?
6. Looking for groups of explanatory
variables through multiple regression
6.4 Starting the MR
Analyze > Regression > Linear
Put the RV in the box “Dependent”
For Standard regression: put all EV into the
“Independent” box with the Method set at “Enter”
For sequential regression: put all EV’s into the
“Independent” box with the Method set at
“Enter”. Push the Next button after entering each
one. Enter the EV in the order you want them
into the regression equation.
Open the buttons: Statistics, Plots, and Options
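A brief illustration (not part of the original slides) of a standard multiple regression in Python with statsmodels; the data file and the variable names final_score, proficiency, motivation, and anxiety are hypothetical:

# Standard multiple regression: all explanatory variables entered at once.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("course_scores.csv")   # hypothetical file containing the variables below

model = smf.ols("final_score ~ proficiency + motivation + anxiety", data=df).fit()
print(model.summary())    # R-squared, coefficients (B), t-tests, and 95% CIs for each EV
print(model.rsquared)     # proportion of variance in the response variable explained by the EVs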
6. Looking for groups of explanatory
variables through multiple regression
6.5 Regression output in SPSS
Analyze > Regression > Linear
Regression Output

Descriptive Statistics (N = 54 for all variables)

Variable                                        Mean      Std. Deviation
Final score                                     74.46     10.386
Student English proficiency                     2.185     .7024
Results of the motivation scale                 3.0370    .97057
Results of the course evaluation by teachers    3.0741    .98770
LangAnxiety                                     2.7315    .77163
Regression Output

Correlations (Pearson r; N = 54 for each pair)

                                               Final score   Proficiency   Motivation   Course eval.   LangAnxiety
Final score                                    1.000         .565          .616         .374           .032
Student English proficiency                    .565          1.000         .211         .170           -.088
Results of the motivation scale                .616          .211          1.000        .115           .031
Results of the course evaluation by teachers   .374          .170          .115         1.000          -.077
LangAnxiety                                    .032          -.088         .031         -.077          1.000

Sig. (1-tailed): Final score with proficiency, motivation, and course evaluation: .000, .000, .003; Final score with LangAnxiety: .410; proficiency with motivation, course evaluation, LangAnxiety: .063, .109, .265; motivation with course evaluation and LangAnxiety: .203, .411.
Regression Output

Variables Entered/Removed (b)

Model   Variables Entered                                  Variables Removed   Method
1       Student English proficiency (a)                    .                   Enter
2       Results of the motivation scale (a)                .                   Enter
3       Results of the course evaluation by teachers (a)   .                   Enter
4       LangAnxiety (a)                                    .                   Enter

a. All requested variables entered.
b. Dependent Variable: Final score
Regression Output

Model Summary (e)

Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .565 a   .319       .306                8.653
2       .760 b   .577       .561                6.885
3       .797 c   .635       .613                6.460
4       .800 d   .640       .611                6.479

a. Predictors: (Constant), Student English proficiency
b. Predictors: (Constant), Student English proficiency, results of the motivation scale
c. Predictors: (Constant), Student English proficiency, results of the motivation scale, results of the course evaluation by teachers
d. Predictors: (Constant), Student English proficiency, results of the motivation scale, results of the course evaluation by teachers, LangAnxiety
e. Dependent Variable: Final score
Model Summary (e): Change Statistics

Model   R Square Change   F Change   df1   df2   Sig. F Change
1       0.32              24.355     1     52    .000
2       0.26              31.141     1     51    .000
3       0.06              7.933      1     50    .007
4       0.01              .707       1     49    .404

e. Dependent Variable: Final score
Regression Output

Coefficients (a)

                                                         Unstandardized B   Std. Error   Standardized Beta
Model 1   (Constant)                                     56.214             3.881
          Student English proficiency                    8.351              1.692        .565
Model 2   (Constant)                                     42.866             3.906
          Student English proficiency                    6.728              1.377        .455
          Results of the motivation scale                5.563              .997         .520
Model 3   (Constant)                                     36.815             4.248
          Student English proficiency                    6.175              1.307        .418
          Results of the motivation scale                5.346              .939         .500
          Results of the course evaluation by teachers   2.577              .915         .245
Model 4   (Constant)                                     33.913             5.482
          Student English proficiency                    6.269              1.316        .424
          Results of the motivation scale                5.301              .943         .495
          Results of the course evaluation by teachers   2.629              .920         .250
          LangAnxiety                                    .977               1.162        .073

a. Dependent Variable: Final score
Coefficients (a), continued

Model                                          t        Sig.   95.0% CI for B (Lower, Upper)
1   (Constant)                                 14.485   .000   48.426, 64.001
    Student English proficiency                4.935    .000   4.956, 11.747
2   (Constant)                                 10.975   .000   35.024, 50.707
    Student English proficiency                4.884    .000   3.963, 9.493
    Results of the motivation scale            5.580    .000   3.562, 7.564
3   (Constant)                                 8.666    .000   28.282, 45.347
Regression Output

Residuals Statistics (a)

                                    Minimum   Maximum   Mean     Std. Deviation   N
Predicted Value                     58.93     94.72     74.46    8.311            54
Std. Predicted Value                -1.869    2.437     .000     1.000            54
Standard Error of Predicted Value   1.000     2.685     1.942    .344             54
Adjusted Predicted Value            59.85     94.85     74.49    8.305            54
Residual                            -9.760    21.291    .000     6.230            54
Std. Residual                       -1.506    3.286     .000     .962             54
Stud. Residual                      -1.600    3.417     -.002    1.006            54
Deleted Residual                    -11.029   23.017    -.025    6.816            54
Stud. Deleted Residual              -1.627    3.875     .010     1.050            54
Mahal. Distance                     .282      8.120     3.926    1.624            54
Cook's Distance                     .000      .189      .019     .034             54
Centered Leverage Value             .005      .153      .074     .031             54

a. Dependent Variable: Final score
Regression Output:
P-P plot for diagnosing normal distribution of
data
Regression Output:
Plot of studentized residuals crossed with
fitted values
6. Looking for groups of explanatory
variables through multiple regression
6.6 Reporting the results of regression analysis
 Correlations between the explanatory variables and the
response variable
 Correlations among the explanatory variables
 Correlation matrix with r-value, p-value, and N
 Standard or sequential regression?
 R square or R square change for each step of the model
 Regression coefficients for all regression models (esp.
unstandardized coefficients, labeled B, and the coefficient
for the intercept, labeled “constant” in SPSS output)
 For standard regression, report the t-tests for the
contribution of each variable to the model
6. Looking for groups of explanatory
variables through multiple regression
6.6 Reporting the results of regression analysis
 The multiple correlation coefficient, R2, expresses how
much of the variance in scores of the response variable
can be explained by the variance in the statistical
explanatory variables
 The squared semipartial correlations (sr2) provide a
way of assessing the unique contribution of each
variable to the overall R2
 These numbers are already a percentage variance effect
size (of the r family)
 Example reporting on Lafrance & Gottardo (2005):
P198
7. Finding group differences with Chi-Square
when all variables are categorical