Litosseliti Reading Notes Chapters 3 and 4

Litosseliti http://www.city.ac.uk/people/academics/evangelia-litosseliti
Chapter 3 by Sebastian M. Rasinger
About the author: http://www.sebastianrasinger.com/?page_id=5
“Quantitative Methods: Concepts, Frameworks and Issues”
Overview/Outline: (Repeated in 3.1 Introduction)
1. General characteristics of quantitative research, based on the key differences between quantitative and qualitative methodology
2. Real linguistic examples for discussion of the quantifiability of data (the quality of being measurable), comparing it to qualitative approaches, such as many discourse analytic frameworks
3. (Section 3.2) The concepts of quantitative linguistic variables, hypotheses, theories and laws, reliability and validity
4. (Section 3.3) Critical evaluation of the most frequently used research designs in quantitative research: longitudinal, cross-sectional, and experimental designs
5. (Section 3.4) Using questionnaires in quantitative research: design features, phrasing questions, sequencing questions, and measuring tools for different variables of interest in linguistic studies. Questionnaire coding is also discussed.
3.2
Quantitative versus qualitative methods
In order to explain the difference, the discussion begins with an exchange of utterances between a mom and a 2.5-year-old.
The section distinguishes between a qualitative discussion of the exchange and what the mom does, etc., looking at patterns, sequences, and the characteristics of both the mom’s and the child’s utterances.
And then it introduces MLU. I don’t know why she doesn’t cite Roger Brown, whose metric this originally was.
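MLU can be made concrete with a few lines of code. Brown’s original metric counts morphemes; the word-based version below is a simplification, and the child utterances are hypothetical.

```python
# Word-based sketch of MLU (mean length of utterance). Brown's original
# metric counts morphemes, so counting words, as here, is a simplification.
def mlu_words(utterances):
    """Average number of words per utterance."""
    counts = [len(u.split()) for u in utterances]
    return sum(counts) / len(counts)

sample = ["more juice", "doggie go", "where doggie go"]  # hypothetical child utterances
print(round(mlu_words(sample), 2))  # 7 words over 3 utterances
```

A morpheme-based version would need a segmenter, which is exactly why MLU counts are an operationalization decision.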
Qualitative vs Quantitative
Qualitative: concerned with structures and patterns; “how” something is. By their very nature, inductive.
Rampton’s 1995 study of “crossing” was inductive. (Note that below we have a more recent meta-study: Rampton and Charalambous 2010,
http://www.kcl.ac.uk/sspp/departments/education/research/ldc/publications/workingpapers/58.pdf)
Quantitative—how much or how many or… Allows us to compare large numbers of
instances….statistically. Typically presumed to be deductive—based on a theoretical framework, we
develop hypotheses, which we then TEST—against the idea of a null hypothesis (which we don’t learn
about until chapter 4). So, “a good hypothesis must have the potential of being wrong” p. 53.
Scott and Marshall 2005. Dictionary of Sociology.
Example of results in quantitative question asking—onset of SLA or Critical Period Hypothesis?
Mention of Birdsong and Molis, 2001; Johnson and Newport, 1991.
Two values are extracted: proficiency levels (using some metric) and age.
“Talking about quantitative methods inevitably means talking about variables.”
p. 53: from OED, “a variable is something which is liable to vary or change; a changeable factor, feature, or element.”
Gender is a frequent variable in linguistics, but we need to rethink our assumptions about how it can be
measured.
Important feature (p. 54): “whenever we want to quantitatively measure something—that is, assign a
variable value to a particular case, we need to thoroughly think about a reliable way to make this decision.
We need a set of clear and objective definitions for each category or outcome. Moreover, our measure
must be designed in such a way that it comprises as many cases as possible.”
Dialectology and sociolinguistics have looked at the presence, absence, or different realization of certain linguistic features.
Milroy 1987. It comes down to counting the number of different realizations of the vowels being studied: how often does some vowel appear vs another, and what vowel value do we get?
The procedure that leads to the “translation” of (physical) properties of a case into a numerical value is known as “operationalization.”
Reliability and Validity
Reliability refers to our measure repeatedly delivering the same or near same results.
Replication should give us the same results. But we can’t really replicate with the same population; people
learn. So, there’s the “split half” method. Take a measure across a group, then split the group randomly
and see how things turn out.
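A minimal sketch of the split-half idea as the notes describe it: measure once across a group, split the group randomly in two, and check that both halves give near-identical summary results. The scores below are hypothetical, and a fuller split-half analysis would correlate the halves rather than just compare their means.

```python
import random

# Sketch of the "split half" check described above: measure once across a
# group, split the group randomly in two, and see whether both halves give
# near-identical results. The scores are hypothetical test scores.
def split_half_means(scores, seed=0):
    rng = random.Random(seed)       # fixed seed so the split is reproducible
    shuffled = scores[:]
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    half_a, half_b = shuffled[:mid], shuffled[mid:]
    return sum(half_a) / len(half_a), sum(half_b) / len(half_b)

scores = [72, 68, 75, 70, 69, 74, 71, 73, 67, 76]
a, b = split_half_means(scores)
print(a, b)  # similar half-means suggest the measure is reliable
```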
Validity refers to our measure actually measuring what it is supposed to measure—hence measurement or
instrument validity.
Research design, theoretical background and actual methods used are inseparably linked and form the
overall framework for any study. These three parts must work well together. p. 57.
One type of design is cross-sectional. A selection of a population across a relatively homogenous group at
one point in time—a “snapshot.”
Another is longitudinal. These can be panel designs or cohort designs. Bryman 2004 is cited as saying that the crucial difference between the two is that panel designs allow us to observe both cohort and aging effects, but cohort studies will only identify aging effects, hence allowing us to control for third variables.
Longitudinal studies require attention to sample retention. A way to circumvent this challenge is to have
“apparent time studies.” Woods 2000 collected data from three generations of a family at about the same
time.
Data collection issues
Is language observed in its natural environment? (sociolinguistic research in longitudinal and cross-sectional
studies.) But we can only observe (look for and hope to find) variables—does a particular speech
community show a particular linguistic feature?
What about experimental design? An experimental design manipulates the variables.
Experimental groups (EG) treatment
Control Groups (CG) no treatment
You can do within subject effects: before and after getting it (“pre-and post-stimulus”)
And between-subject effects—some get it and some don’t (EG and CG once)
Between-subject designs pose the challenge of making sure that the two groups are similar enough not to introduce other uncontrolled/uncontrollable variables.
3.4
Panacea Questionnaires: Design, Use, and Abuse
Questionnaires are frequently used to measure people’s attitudes to and perceptions of languages.
(Matched Guise tests described in Chapter 4, for example.)
The concept of “ethnolinguistic vitality”
Bourhis, et al. 1981
Giles, et al., 1977
language use
Extra and Yagmur, 2004
Rasinger, 2007
The data from questionnaires is easy to use. Questionnaires are difficult to design well—so that they
reliably generate valid data.
How many questions? What data do we need from the questionnaire? Which questions are directed to
the research questions? And how do we design them to focus on these?
Questions:
Are your students interested in learning xyz?
To what extent are students interested in learning xyz?
On a scale from 1 to 5, where 5 indicates “very interested,” and 1 indicates “not interested at all,” to what extent do you think your students are interested in learning English?
1 2 3 4 5
Likert Scales—are about a spectrum of agreement vs disagreement
Notice how important it is to know what you’re looking for, so you can figure out how to find it (p. 62)
Comprehensive (comprehensible?) and “objective” questions
People have to understand the question
Questionnaires should be (merely) scientific tools that help us to measure different aspects of “reality,” very similar to a voltmeter measuring an electric potential. And as such, a questionnaire must measure neutrally and objectively.
Open versus closed questions and multiple item responses
Multiple choice/Likert scale? Or open answer? Interviews are like open answers. Can we get smaller samples? Focus groups?
What about response sets and acquiescence responses?
People may respond “agree” when they think they’re supposed to be in agreement, rather than when they
are in agreement. One must be very clever in posing the questions.
The Oviatt has an electronic copy of Rasinger’s 2007 book Bengali-English in East London: A Study in Urban Multilingualism.
Coding the questionnaire = turning things into numbers.
Creating data matrices
The coding of all other variables works analogously: every potential variable value is assigned a particular numerical value. Likert scales get numbers. Look at Table 3.3.
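A minimal sketch of what coding a questionnaire into a data matrix might look like. The column names, response labels, and numeric codes here are hypothetical, not the book’s Table 3.3.

```python
# Sketch of turning questionnaire responses into a coded data matrix.
# Column names, response labels, and numeric codes are hypothetical,
# not the book's Table 3.3.
gender_codes = {"female": 1, "male": 2}
likert_codes = {
    "not interested at all": 1,
    "slightly interested": 2,
    "neutral": 3,
    "interested": 4,
    "very interested": 5,
}

responses = [  # one row per respondent
    {"gender": "female", "interest": "very interested"},
    {"gender": "male", "interest": "not interested at all"},
]

matrix = [[gender_codes[r["gender"]], likert_codes[r["interest"]]] for r in responses]
print(matrix)  # [[1, 5], [2, 1]]
```

Each row of the matrix is one case; each column is one variable, now in purely numerical form.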
Section 3.5
Summary
Chapter 4 by Erez Levon
About the author: http://webspace.qmul.ac.uk/elevon/
4.1
What quantitative analyses do
They’re about counting and measuring. For something to be counted, two conditions are “normally
considered to be necessary:”
a. What we want to count must be countable.
b. What we want to count must have the potential to be variable (in other words, the results we expect to get when we count must not be constant; who would care? And see below).
What does this mean?
“The condition of quantifiability requires that you operationalize the possible set of responses so that they
can be counted in a clear and coherent way.” (and we’re referred back to 3.2).
An example is given of creating categories for the question about which issues most affected voters’ choice
of candidates, the categories being environment, economy, and education.
Using categories gives structure to the diversity of responses; we count with a purpose, and enter the numbers we reap into the structured categories (with outliers in other places), and this piece of operationalization is called coding.
Think of the range of research questions you could ask in the context of this example—and see what the
implications for the design might be.
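A rough sketch of coding open responses into the example’s three categories, with anything that fits none of them going to “other.” The keyword lists and responses are hypothetical, invented for illustration.

```python
# Sketch of coding open voter responses into the chapter's three example
# categories (environment, economy, education), with an "other" bucket.
# The keyword lists and the answers are hypothetical.
from collections import Counter

keywords = {
    "environment": {"climate", "pollution", "environment"},
    "economy": {"jobs", "taxes", "economy"},
    "education": {"schools", "tuition", "education"},
}

def code_response(text):
    words = set(text.lower().split())
    for category, cues in keywords.items():
        if words & cues:       # any cue word present -> assign that category
            return category
    return "other"

answers = ["Jobs and taxes", "Climate change", "Better schools", "Foreign policy"]
counts = Counter(code_response(a) for a in answers)
print(counts)
```

In a real study the coding scheme would be written out as explicit, objective rules before counting begins, for exactly the reliability reasons raised in 3.2.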
And more on variability: the variability requirement is about the possibility of variation—if we actually
find none, it’s ok; variability is a requirement about the possible existence of variation—it must be
assumed to exist in principle.
Statistics fall into two general categories: descriptive statistics and inferential statistics.
Descriptive statistics provide indices about the general shape or quality of the data—they include mean
and median.
Note the claim: “What these calculations (mean/median) allow us to do is identify potential patterns in our
data set.” We need to find out if the pattern has any weight to it. So, we must turn to inferential
statistics to learn whether what looks like a pattern (a correlation) is one—whether the apparent
dependency is justifiable.
The earrings vs shoes example works (does the result that more people without earrings bought red shoes mean anything, or is it just a consequence of the fact that more people overall weren’t wearing earrings, so people with no earrings bought more of everything, not just red shoes?). To test whether or not there may be a pattern, we turn to inferential statistics.
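The point can be made concrete by comparing proportions rather than raw counts. The shopper numbers below are hypothetical, not from the book.

```python
# The earrings point, made concrete with hypothetical counts: raw totals
# can mislead when the groups differ in size, so compare proportions.
no_earrings = {"red": 30, "other": 70}  # 100 shoppers without earrings
earrings = {"red": 15, "other": 35}     # 50 shoppers with earrings

rate_no = no_earrings["red"] / sum(no_earrings.values())
rate_yes = earrings["red"] / sum(earrings.values())
print(rate_no, rate_yes)  # twice as many raw red-shoe buyers without
# earrings (30 vs 15), but identical rates, so no real pattern here
```

Whether an observed difference in rates is bigger than chance would allow is precisely the question inferential statistics answer.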
And we need a hypothesis to test in the context of collecting and evaluating the data.
An experimental hypothesis is a(n educated) guess about what might be going on with the data. Note
that many linguists argue that the hypothesis should be active even before the data are collected. We don’t
wander out, and collect lots of data to see what might be there…
An experimental hypothesis suggests that the variation in our data (dependent variability) will depend on
some other factors (factors that we select and vary independently).
And the null hypothesis avers that there is no relationship—that whatever pattern we (think we) see is
accidental. What we’re doing, in the analysis of data we collect in the presence of a hypothesis, in fact, is
trying to falsify the null hypothesis. And that’s the result we’re looking for. Levon makes a point of
stressing that: “Inferential statistics provide a probabilistic measure that allows us to gauge the extent to which the null hypothesis is true—or false.” What we want is for the likelihood that it is true to be very, very, very small; that’s why we like p-values with lots of zeros after the decimal point, well below the conventional cutoff of 0.05 (at most, not at least). We like to find statements such as
p ≤ 0.01
Let’s look at the summary on p. 72.
So, how do we get those “p-values”?
4.2
What Quantitative Method to Use
In other words….
“So, how do we get those ‘p-values’?”
There are “hundreds of different inferential statistical tests.” Really? Goodness.
The choice, however, depends on the kind and number of variables we consider.
Categorical Variables—is it an x or a y or a z ? Are they wearing earrings or not?
Continuous Variables—Anything scalar. How old? (a scale)—35 or 40?
In this chapter, we’re looking only at studies with one independent and one dependent variable. But either of these can be categorical or continuous.
When the independent variable is continuous, correlation analyses are used.
When there are combinations of categorical and continuous independent variables, other kinds of
statistical tests are used (Generalized Linear Models, Linear Mixed Models).
Many studies have more than one independent and more than one dependent variable, and these can get matched in a variety of different ways. The examples given are ANOVAs (Analyses of Variance); MANOVAs (Multivariate ANOVAs, used when there is more than one dependent variable, to find out more about the relationships among the independent variables, between them as a group and the dependent variables as a group, and to compare the effects of the multiple independent variables on the multiple dependent ones); and Linear Regressions.
When the independent variable is categorical, the dependent variable can still be either categorical or continuous. The chapter looks at both, as the character of the dependent variable with regard to this issue determines the choice of the statistical test selected:
chi-square (χ2) is selected when the dependent variable is categorical
t-test is used when the dependent variable is continuous
Chi-square tests calculate what the distribution of variable values would be if the null hypothesis were
true for the sample being studied. This distribution is compared to what is actually found, so that the null
hypothesis can be tested (and falsified).
They cannot be used to examine data from continuous dependent variables.
t-tests are used with continuous dependent variables.
Descriptive methodologies are also used, and are the first steps in t-tests. The example, with the varying range of heights of 10 people (heights are the variables, of course), could involve a set of dependent variables (we want to see what relationship, if any, there is between taking some vitamin and the height of some group of people) or of independent variables (we want to see what relationship height has to shoe preference: tie or slip-on). In either case, we can describe the variation in height using the mean, the median, and the standard deviation.
mean: the average value in a given set of values
median: the midpoint in a set of values, where half of the values are below that one and half are above
standard deviation: the typical extent to which any single value deviates from the mean; this tells us how well the mean actually represents the set of values, and experimenters usually treat a small standard deviation relative to the mean as a “good sign.”
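All three measures are available in Python’s standard library. The ten heights below are hypothetical, standing in for the chapter’s ten-person example.

```python
import statistics

# The three descriptive measures on a hypothetical set of ten heights (cm).
heights = [158, 160, 163, 165, 168, 170, 172, 175, 178, 181]

mean = statistics.mean(heights)      # the average value
median = statistics.median(heights)  # the midpoint of the sorted values
sd = statistics.pstdev(heights)      # population standard deviation

print(mean, median, round(sd, 2))
```

Note that `pstdev` is the population standard deviation; `statistics.stdev` gives the sample version, which divides by n − 1 instead of n.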
t-tests examine the means and standard deviations of two sample populations. The goal is to find
out if the respective values of the means are significantly (or not significantly) different from one another
(beyond what their respective face values say).
A note:
Not to belabor the obvious, but “significant” and “significantly” have formal definitions here—we are
talking about computed numerical values that are considered reliable in deeming results significant or not.
It seems important to remind ourselves of that.
4.3
4.3.1
Processing the data
Chi-square tests
Jason Baldridge on definiteness and indefiniteness in Hindi
http://www.ling.upenn.edu/~jason2/papers/hindidef.htm
Note: LING and Computers course at UT Austin—Jason Baldridge:
http://lnc-s11.utcompling.com/course-notes
The Hindi indefiniteness example. Note the term calquing. What does it mean? How is it used here?
Note the dependent variable: use of Hindi-derived articles [Null article vs Overt article]
Independent variable: speakers’ educational and functional level in English [Groups 1, 2, and 3]
One each, both categorical.
So, chi-square is the evaluative measure of choice.
It works on “raw data”: raw numbers. It requires at least five tokens per cell (what’s a cell?).
Steps:
1. Create a table of observed data with cells holding the data. Notice how the data from the independent and dependent variables are arranged.
2. Make sure that there are at least 5 tokens in each cell, or a total of tokens from the study that is equal to five times the total number of cells. We have 6 cells in Table 4.1, and our total of tokens (= null and overt articles found, i.e., all sentences that needed articles) is 380. That’s way more. And we already have more than 5 tokens (= null vs overt articles found) in each cell. This is about the robustness of the data.
3. Construct a table of expected values. The expected values reflect what we’d get if the independent variable had no effect on the dependent variable.
Computing expected values: for every cell, multiply that cell’s column total by that cell’s row total and then divide that number by the grand total of values (the grand total of all collected tokens; it’s 380 for us). So, for cell one, it’s 42 × 177 / 380. Do that for each cell.
4. Next, we compute the difference between the observed value and the expected value, we square that difference, and then we divide that square by the expected value:
(Observed − Expected)² / Expected
5. Find the chi-square value (the sum of these per-cell results), and based on that value, the determination of the p-value is undertaken.
Cowart, Wayne (1997). Experimental Syntax: Applying Objective Methods to Sentence Judgments. Thousand Oaks, CA: SAGE. P291 .C68 1997eb
6. The next step is to determine the degrees of freedom: df. DF may be thought of as setting the general parameters under which the statistical test holds true.
df = (# of chart rows − 1) × (# of chart columns − 1)
7. We look at a significance chart, next, to find out what p-value is associated with our chi-square statistic with what we discover are its “available” degrees of freedom. [Look at the chart on p. 81.]
What we want to find is a high percentage of chance that the null hypothesis is false. In the book’s example, we can be 95% confident that it is; that is what a p-value of p = 0.05 means. In fact, the data presented yield a p-value of 0.0001; our chapter author stopped at what is “usually expected in the humanities and social sciences.”
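The steps above can be sketched in code. The observed counts below are hypothetical placeholders, not the book’s Table 4.1; only the procedure is the same.

```python
# The chi-square steps above, sketched in code. The observed counts are
# hypothetical, not the book's Table 4.1.
observed = [
    [30, 45, 60],  # e.g., null article across Groups 1-3
    [50, 40, 35],  # e.g., overt article across Groups 1-3
]

row_totals = [sum(row) for row in observed]            # step 1: the observed table
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# step 3: expected[i][j] = column total x row total / grand total
expected = [[col_totals[j] * row_totals[i] / grand_total
             for j in range(len(col_totals))]
            for i in range(len(row_totals))]

# steps 4-5: sum (observed - expected)^2 / expected over all cells
chi_square = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
                 for i in range(len(observed))
                 for j in range(len(observed[0])))

# step 6: degrees of freedom = (rows - 1) x (columns - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi_square, 2), df)  # step 7: look these up in a significance chart
```

Step 7 still needs the significance chart (or a chi-square distribution function) to turn the statistic and df into a p-value.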
Another example is given on pp. 81ff, using data about African American English.
Note that from the table on p. 82, the first “expected value” would be
62 (column total) × 23 (row total) / 88 (study total) = 16.2. Next we begin computing the chi-square statistic, which is the result of adding all the computed values of each cell’s
(ov − ev)² / ev
So, for the first cell we have (20 − 16.2)² / 16.2 ≈ 0.89.
Make certain we understand the summary on p. 83.
Note that Blake and Cutler’s article on AAE is available for looking at—put it up on Moodle.
4.3.2 t-tests
used for data with one independent and one dependent variable, where the independent variable is
categorical, but the dependent variable is continuous.
Carmen Fought’s 1999 analysis of vowel fronting among Latina/o speakers in Los Angeles.
Very roughly,
F1 is pharynx
F2 is mouth
Table on p 84
The independent variable: Class
The dependent variable: ratios of /u/ to /i/ F2s
Step 1
Calculate the mean and standard deviation for both categories of F2 ratios
The mean is the average
The SD is calculated thusly: first, calculate the difference of each data point from the mean, and square each result; next, calculate the mean of these squared values, and take the square root. This quantity is the population standard deviation, and is equal to the square root of the variance.
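The recipe above, step by step, on hypothetical F2-ratio values (not Fought’s data):

```python
from math import sqrt

# The SD recipe above, step by step, on hypothetical F2-ratio values.
values = [1.2, 1.5, 1.1, 1.4, 1.3]
mean = sum(values) / len(values)

squared_diffs = [(v - mean) ** 2 for v in values]   # square each deviation from the mean
variance = sum(squared_diffs) / len(squared_diffs)  # mean of the squared deviations
population_sd = sqrt(variance)                      # square root of the variance

print(round(mean, 3), round(population_sd, 3))
```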
Step 2 Calculating the t-test statistic
Knowing which of the many to use depends on two things:
a. We need to know if we have paired or unpaired data. Paired data refer to experiments where there is some natural relationship between subjects in each of the two groups before the data are even collected. The most common example of paired data is what is called a repeated measures experiment, e.g., “before and after” tests.
b. Determine whether the two groups in the analysis are of equal or unequal size.
The formula for independent (unpaired) equal samples:
t = (the difference between the means of the two groups’ values) / (the pooled standard deviation)
The pooled standard deviation is computed this way:
√((s₁² + s₂²) / n), where n = the number in each group (they’re equal; it’s 13 in this example)
Follow the steps on pp 85-86
Then, calculate the degrees of freedom. For t-tests of independent samples with equal sample size, we
calculate df by taking the total number of subjects in both groups and subtracting 2. In these data, the
total was 26, so 26-2=24
Next, we take this value to the significance chart—one that is designed for t-tests.
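The whole calculation, through the degrees of freedom, can be sketched as follows. The two groups of F2 ratios are hypothetical (and of size 5 here rather than the book’s 13); the formula is the equal-sample one from the notes.

```python
from math import sqrt

# Sketch of the equal-sample unpaired t-test described above. The two
# groups of F2 ratios are hypothetical (size 5 here, not the book's 13).
def t_equal_samples(group1, group2):
    n = len(group1)
    assert n == len(group2), "this formula assumes equal sample sizes"
    mean1, mean2 = sum(group1) / n, sum(group2) / n
    # sample variances (divide by n - 1)
    var1 = sum((x - mean1) ** 2 for x in group1) / (n - 1)
    var2 = sum((x - mean2) ** 2 for x in group2) / (n - 1)
    pooled_sd = sqrt((var1 + var2) / n)  # the denominator from the notes
    t = (mean1 - mean2) / pooled_sd
    df = 2 * n - 2                       # total subjects in both groups minus 2
    return t, df

t, df = t_equal_samples([1.2, 1.3, 1.1, 1.4, 1.2], [1.6, 1.7, 1.5, 1.8, 1.6])
print(round(t, 2), df)  # take t and df to a t-test significance chart
```

The sign of t just reflects which group’s mean is larger; the significance chart is consulted with its absolute value.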
Let’s look a bit at the discussion on p. 87. The last example given is from a classic study by Wallace Lambert, et al. in Montreal: the matched guise experiment.
The groups of subjects were of different sizes, so a different t-test calculation is needed.
Research Methods in Sociolinguistics: A Practical Guide. Edited by Janet Holmes and Kirk Hazen. Wiley-Blackwell, 2014. P40.3 .R47 2014
t-test is summarized on p. 89
4.4
Interpreting the results
Looking again…
Carmen Fought’s results:
Statistical significance and “real world significance” are not always the same thing. But…you’d have to do
a matched guise test to see how sensitive speakers are to the differences—as small as they seem. Or,
introduce and discuss qualitative data—did she do this?
“The basic point is that quantitative methods can only take you so far. They can act as a crucial first step in mapping out the sociolinguistic terrain and in telling you what people are doing with language.”
The Blackwell Guide to Research Methods in Bilingualism and Multilingualism. Li Wei and Melissa G. Moyer, eds.
Blackwell. 2008. P115 .B575 2008