Stats

Bridget Smith
Response to and analysis of statistics for “Eth – forsake thigh name”
Throughout the course of my research dealing with variability in the dental fricative,
especially voicing, I have attempted a number of different ways to analyze and display my data. The
earliest attempts were abysmal, but at the time, I had only one, then two speakers to look at. Going
from studies such as Stevens, et al. (1992), and Pierello, et al. (1997), I looked at voicing, but also
included measurements such as intensity and duration, and took into account such factors as phonetic
environment, and to some extent stress. I tried to graph the environments very explicitly at first, such
as whether the preceding sound was a liquid or a nasal, a palatal or a velar, in addition to whether or
not it was voiced, hoping to remove all exceptions. With only two subjects, there was no statistical
analysis.
For my second year paper, I had 7 usable speakers analyzed. I continued to play around with
varying measures of intensity, due to their importance in previous studies of fricatives, but finally
dropped these because they were not remotely descriptive of voicing, which I had decided was the
issue I wanted to focus on. Intensity ratios may again come into play later, when dealing with
variation in sonority. I also realized that my interpretation of stress was lacking, and gave that up to
focus on phonetic environment, specifically voiced and voiceless segments immediately adjacent to the
dental fricatives in question.
I would like to point out that these meanderings have given me more than I had hoped for. By not
looking directly at what I had thought I might find (that is, weirdness in the voicing of <th>), but
rather at all the various possibilities surrounding <th>, I have stumbled upon wondrous kinds of
variation that I have been able to explore from an historical perspective. What
began as an exercise in phonetic analysis has expanded to include phonological and historical
methodologies, which have become even more exciting than the purely phonetic analysis.
The data presented in my second year paper included, for both theta and eth and for /f/ and /v/,
the duration of the fricative and the duration of voicing within the fricative. (Scatterplots of
intensity and duration were created, but dropped because they contained no useful information. It was
simpler to state that intensity did not distinguish phonemes or voicing.) Two charts were created for
both /f/ and /v/ and theta and eth for each speaker. One showed the duration of the fricative and
voicing sorted by phoneme, and the other sorted by phonetic environment. The graphs from only one speaker were
used in the final paper, along with assurances that, although the other speakers were not as clear as
the exemplar I’d chosen, they did pattern in the same basic way. No statistics were used.
The figures show duration of the fricative in milliseconds on the horizontal axis, duration of
voice bar on the vertical axis. In figure 1, the filled circles represent /f/ and the empty circles
represent /v/. Those to the right have longer duration. Because the duration of the voice bar was
measured within the period of frication, its total possible duration is limited to the total duration of
the fricative. Tokens that are completely voiced will therefore appear along a 45 degree
angle, having a 1:1 correspondence, or slope. Tokens for which 50% or less of the duration
shows a voice bar appear along or below the shallower line with a slope of 1:2, which I refer to as
the 23 degree angle (with equal axis scales, a 1:2 slope actually sits at about 27 degrees).
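The geometry of the two reference lines can be checked with a few lines of R. This is only a sketch: the token durations below are invented for illustration, the column names fric_dur and voice_dur are hypothetical, and the angles assume that both axes are drawn to the same scale.

```r
# Hypothetical token measurements in milliseconds (values invented for illustration)
tokens <- data.frame(
  fric_dur  = c(120, 80, 150, 60),  # duration of frication (x axis)
  voice_dur = c(120, 40, 30, 0)     # duration of voice bar (y axis)
)

# The slope of the line from the origin through a token is its proportion voiced
tokens$prop_voiced <- tokens$voice_dur / tokens$fric_dur

# Angle of that line in degrees, assuming equal scaling of both axes
tokens$angle_deg <- atan(tokens$prop_voiced) * 180 / pi

# Reference lines: fully voiced (slope 1:1) and half voiced (slope 1:2)
full_voicing_angle <- atan(1) * 180 / pi
half_voicing_angle <- atan(1 / 2) * 180 / pi

# Tokens on or below the half-voiced line
tokens$half_or_less <- tokens$prop_voiced <= 0.5

print(tokens)
print(c(full = full_voicing_angle, half = half_voicing_angle))
```

Working with the proportion voiced directly avoids having to read angles off the plot, since the apparent angle changes whenever the two axes are scaled differently.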
The selected speaker shows fairly good distribution of /f/ and /v/ according to phoneme, with a
few exceptions. /f/s are generally longer in duration, clustered to the right, and have less voice bar,
which is shown by their clustering below the 23 degree angle.
Figure 1: <f/v> duration of frication (x axis) and voice bar (y axis), sorted by phoneme.
Figure 2 shows the same tokens, sorted by phonetic environment. The filled circles are
surrounded on both sides by voiced sounds. The empty circles represent an adjacent voiceless sound
including a pause, in which the vocal folds do not vibrate. By comparing figures 1 and 2, I pointed
out that many occurrences of phonemic /f/ coincide with the presence of a voiceless sound
immediately preceding or following. Likewise, many instances of /v/ occur between voiced sounds.
These environmental conditions may be similar to those responsible for originally conditioning the
voicing status of the fricative, and have remained unchanged. This distribution resembles the
variation in voicing described by Stevens, et al. (1992). There are, however, many tokens that do not
seem to be affected by environment, and are better described phonemically.
Figure 2: <f/v> duration of frication (x axis) and voice bar (y axis), sorted by phonetic environment.
Figure 3 shows the dental fricative divided by phoneme. Theta tends to have generally less voice
bar than eth; however there is a great deal of overlap. Duration also varies greatly. Some canonical
thetas (filled circles) are voiced and many canonical eths (open circles) are voiceless.
Figure 3: <th> duration of frication (x axis) and voice bar (y axis), sorted by phoneme.
When the tokens are marked for voicing of surrounding segments, their voice bar is much better
accounted for, as in figure 4. Here the filled circles represent an adjacent voiceless segment or pause.
Empty circles represent surrounding voiced sounds. Unlike with /f/ and /v/, the distribution of
voicing in dental fricatives is far better accounted for by phonetic environment than by phonemic
sorting.
Figure 4: <th> duration of frication (x axis) and voice bar (y axis), sorted by phonetic environment.
The benefit of this display is that it shows duration of voicing compared to overall duration.
This makes it easy to find individual exceptions, and was effective in presentation format, as I could
show spectrograms and play sound files of the exceptions in situ. That duration did not pattern as
expected supports the theory that eth and theta do not pattern as /f/ and /v/ do, that is,
phonemically.
The downside includes difficulty explaining the significance of the 23 and 45 degree angles.
It also limits my ability to demonstrate that the pattern occurs on a larger scale. When pooled
together, the data from the 7 subjects do not present a compelling visual pattern because each
individual had different patterns of variation.
In my analysis for the statistical methods course, I presented only the voicing data in boxplots,
comparing the phonemic sorting with environmental sorting. These results were even more
astounding. With all 7 subjects’ data combined, a pattern emerged that was not visible in the
scatterplots. The boxplot shows percentages of voicing (as opposed to raw duration) with the main
boxes containing values up to 2 standard deviations from the mean (95% of the data), which is
shown by the black bar across each box. This particular boxplot is actually a composite of two
separate boxplots, which highlights their differences. The grey boxes represent phonemes. Phonemic
theta is shown to the left, and has longer “whiskers” (one additional standard deviation) and more
circles representing outliers. (This can be seen more easily on the individual boxplot, but to conserve
space, you’ll just have to take my word on it.) The mean is around 20% voiced. Phonemic eth is
represented by the grey box on the right. Its “whiskers” extend all the way to zero, and its mean is
around 50% voiced.
Figure 5: <th> accounted for by phoneme in grey, and by phonetic environment in white. 7 speakers, 396 tokens total.
The white boxes represent environmental voicing, so that the value of “1,” which is overlaid
on theta, represents those tokens which had an immediately adjacent voiceless segment, including a
pause. The white box to the right, with the value of “2,” indicates those tokens which were
completely surrounded by voiced segments. (At this point, I must confess that I need a line of code to
help me disentangle “th” from “1” and “eth” from “2” so that this is more clear.) According to this
figure, theta is only slightly better accounted for by environment than by phoneme. This may have to
do with the fact that there were disproportionately more eths (286) than thetas (110), but it
may also point toward greater stability of phonemic theta. This deserves looking into, and was not as
apparent in the scatterplots, so it is a benefit of using the boxplots. The value “2,” for a completely
voiced environment, has only 174 tokens, while “1,” for any voiceless environment, contains 222
tokens, but falls into a range similar to, though slightly smaller than, that of the voiceless theta
tokens. This does point to better accounting for variance than with phoneme. Eth, however, shows
drastic improvement when environment is accounted for. True, there are fewer tokens, but the range is
so greatly diminished, and raised to the voiced end of the spectrum, with a mean of 100% voiced, that
we cannot account for this by number alone.
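As for the line of code wished for above, one possible approach is to turn the numeric codes into labelled factors before plotting, so that the phoneme boxes and the environment boxes carry their own names. The sketch below runs on invented values, not the real measurements. The column names follow the R code used for the regressions in this paper. The coding of phonenumber (1 = theta, 2 = eth) is an assumption, while the coding of vless follows the description of the figure.

```r
# Invented stand-in data; only the column names come from the analysis in this paper
set.seed(1)
dat <- data.frame(
  percVoice   = runif(40, min = 0, max = 100),
  phonenumber = rep(c(1, 2), each = 20),  # ASSUMED coding: 1 = theta, 2 = eth
  vless       = rep(c(1, 2), times = 20)  # 1 = adjacent voiceless segment or pause, 2 = voiced surroundings
)

# Replace the bare numbers with descriptive labels
dat$phoneme <- factor(dat$phonenumber, levels = c(1, 2),
                      labels = c("theta", "eth"))
dat$environment <- factor(dat$vless, levels = c(1, 2),
                          labels = c("voiceless env.", "voiced env."))

pdf(NULL)  # draw to a null device; remove this line to see the plot on screen
# Grey boxes: sorted by phoneme, drawn at x positions 1 and 4
bp <- boxplot(percVoice ~ phoneme, data = dat, at = c(1, 4),
              xlim = c(0.5, 5.5), col = "grey", ylab = "Percent voiced")
# White boxes: sorted by environment, drawn beside them at positions 2 and 5
boxplot(percVoice ~ environment, data = dat, at = c(2, 5),
        col = "white", add = TRUE)
dev.off()
```

Placing the two sets of boxes side by side rather than overlaying them is a design choice that keeps all four labels legible. The overlay used in figure 5 could be approximated by giving both calls the same `at` positions.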
The boxplots have the advantage of dovetailing nicely with a statistical measurement. For one
thing, the results for all of the speakers are combined into one graph, as they should be combined for
any kind of statistical measurement. Also, the comparison between phoneme and environment can be
displayed in this one single graph. Voicing, or “voice bar,” is represented by percentages, which are
easier to understand than the 23 and 45 degree lines arbitrarily imposed by the raw duration. The
downside is that individual tokens are not accounted for, and duration is no longer mentioned
(because it is only interesting in comparison with /f/ and /v/).
For the statistical methods paper, I followed Mary’s lead by producing a “squared partial r” as
described by Cohen and Cohen (1983). If I understand it correctly, the equation takes r^2 from the
linear regression of y against x1 and x2 (the proportion of variance accounted for by x1 and x2
together), subtracts the r^2 from a linear regression of y against only x2 (the proportion accounted
for by x2), and then divides the result by 1 minus the proportion from x2.
# Proportion of variance in percVoice accounted for by both environment
# (vless, x1) and phoneme (phonenumber, x2)
R2Y.12 <- summary(lm(percVoice ~ vless + phonenumber, data = dat))$r.squared
R2Y.12
# [1] 0.6432622
# Proportion of variance accounted for by phoneme (x2) alone
r2Y2 <- summary(lm(percVoice ~ phonenumber, data = dat))$r.squared
r2Y2
# [1] 0.1365537
# Subtracting the one from the other
R2Y.12 - r2Y2
# [1] 0.5067084
# The denominator is 1 minus the proportion from phoneme (x2), that is,
# the total variance left unaccounted for by phoneme
(R2Y.12 - r2Y2) / (1 - r2Y2)
# [1] 0.5868442
# Swapping the roles of vless and phonenumber gives the value for phoneme:
# [1] 0.2019833
This is the proportion of the variance left unaccounted for by x2 that x1 does account for.
Proportions (or percentages) are nice because they are easy to read and understand. So, for instance,
phoneme accounts for only 20% of the remaining variance after phonetic environment is taken into
account (0.2019833). The proportion accounted for by environment after phoneme is taken into account,
however, is 0.5868442, almost 59%.
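The same computation can be wrapped in a small function so that the roles of x1 and x2 are easy to swap. This is a sketch rather than the code actually used for the paper: the function name squared_partial_r is hypothetical, and the data frame sim is simulated purely to exercise it. The last line redoes the arithmetic from the r-squared values reported above.

```r
# Squared partial r of x1 with y, partialling out x2 (after Cohen & Cohen 1983):
# (R^2 of y ~ x1 + x2  minus  r^2 of y ~ x2) / (1 - r^2 of y ~ x2)
squared_partial_r <- function(data, y, x1, x2) {
  full    <- summary(lm(reformulate(c(x1, x2), response = y), data = data))$r.squared
  reduced <- summary(lm(reformulate(x2, response = y), data = data))$r.squared
  (full - reduced) / (1 - reduced)
}

# Simulated data (NOT the real measurements): environment is given a much
# larger effect on percent voicing than phoneme
set.seed(42)
n <- 200
sim <- data.frame(
  vless       = sample(1:2, n, replace = TRUE),
  phonenumber = sample(1:2, n, replace = TRUE)
)
sim$percVoice <- 40 * (sim$vless - 1) + 10 * (sim$phonenumber - 1) + rnorm(n, sd = 10)

# Environment after partialling out phoneme, and phoneme after partialling out environment
pr2_env  <- squared_partial_r(sim, "percVoice", "vless", "phonenumber")
pr2_phon <- squared_partial_r(sim, "percVoice", "phonenumber", "vless")
print(c(environment = pr2_env, phoneme = pr2_phon))

# Arithmetic check using the r-squared values reported above
reported_env <- (0.6432622 - 0.1365537) / (1 - 0.1365537)
print(reported_env)
```

Passing the variable names as strings keeps the function independent of any particular data frame, so the two directions differ only in the order of the last two arguments.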
Numbers feel like a more tangible way of describing the data, but they can be easily misused.
Looking at these figures, I begin to wonder if this is the best way to present the data. There is more
nuance in the scatterplots, and exceptions can be accounted for. If you are looking for the “big
picture,” the numbers suggest that phonemic categorization of <th> is nearly non-existent. But these
numbers also allow us to overlook the fine details, which suggest a much more complex pattern than
the numbers do. Looking at exceptions in the scatterplots leads to more detailed
examination of the spectrograms, which yields interesting information about voicing, duration, and
sonority, and possibly more. Even the boxplot, with greater accuracy for theta than for eth, leads to
more questions about why this is the case, which may shed even more light on these patterns.
It is interesting and significant that phonetic environment is a much better predictor of
voicing than phoneme for <th>, but it doesn’t tell the whole story. It may have statistical power, but,
to my mind, it loses its explanatory power. If I had not gone in search of some of the exceptions, I
would not have discovered some of the most interesting patterns, such as gemination, resyllabification, devoicing of vowels and sonorants, pre-voicing, etc., that have led me to search
deeper, into the historical development of the entire language family, and wider, into place and
manner (in addition to voicing), stress and prosody. I don’t think my original question (going way
back to the beginning), “what’s up with <th>?” is answered by these statistics. That environment
better predicts voicing than phonemic category is not the only answer to this open-ended question.
Simple questions are great for focusing research into reasonable segments, but they must imply an
hypothesis. You have to have a reasonable assumption about how it’s going to turn out. You could
be wrong, but usually by the time you narrow the question enough to have a manageable study, you
have a pretty good idea about the kinds of answers you’re going to get. I suppose that is great for
getting grants and acceptance to conferences, and I can do that, too, but I much prefer the exploratory
methods I’ve been using. I’m afraid that when I no longer have the excuse that I’m new and
inexperienced, I won’t be able to get away with the exploratory scatterplots and the messing around
with various statistical procedures because nobody wants to fund you unless you’re halfway to your
answer already.
That being said, each method has its benefits and downsides. Which one you choose depends
on what question you hope to answer, and how you want to get to that answer. The scatterplots were
not the best method for what became my contribution to the ICHL proceedings and my second-year
paper, because that paper focused more on the answer obtained by the second analysis, that is, on
blowing up the myth that <th> can be reasonably considered two distinct phonemes. My case would
have been stronger had I used the statistics above (or what follows in my late-breaking report) and
the boxplot. However, because I made mention of a follow-up focusing on the variation expressed in
the scatterplots, I hope to lure people into reading my GLAC paper, for which the scatterplots are
perfect. The GLAC paper delves into the variation and compares it to the historical development of
dental fricatives in the Germanic languages. The question in that paper deals more with historical
methodology (with a modern twist) and tackles some assumptions about sound change. It is more
expository in nature, perhaps best answering the question, “What does sound change look like?”
There can be room for both kinds of questions, and the trick, I suppose, is knowing when, and
in front of whom, to ask each type of question, and then being prepared to follow it up with the
appropriate figures and statistics. For instance, as Mary has pointed out, I did not take a firm stand on
the (not) phonemic results of theta and eth. Using the statistics and boxplots would have been a good
way to stake out a strong position on that issue. In the meantime, I have come up with lots of ideas
for smaller studies based on the variation I found in my exploration. The journey, though interesting,
is probably not something suitable for publication.
***late breaking news***
I finally got SPSS to do partial correlation. As expected, the numbers did not match up to my R
analysis. Here are the SPSS tables.
Correlations (control variable: vless)
                                       percVoice   phonenumber
percVoice     Correlation                1.000        .449
              Significance (1-tailed)    .            .000
              df                         0            393
phonenumber   Correlation                .449         1.000
              Significance (1-tailed)    .000         .
              df                         393          0

Correlations (control variable: phonenumber)
                                       percVoice   vless
percVoice     Correlation                1.000        .766
              Significance (1-tailed)    .            .000
              df                         0            393
vless         Correlation                .766         1.000
              Significance (1-tailed)    .000         .
              df                         393          0
So, then I looked at various sources for formulas for partial correlation. Either I didn’t understand the
formulas as they were written, or SPSS uses a different one, but I figured that what SPSS used was
an actual correlation, so I used Wikipedia’s verbal description of partial correlation, which is “The
obvious way to compute a (sample) partial correlation is to solve the two associated linear regression
problems, get the residuals, and calculate the correlation between the residuals.” Then I messed
around in R, and came up with some very basic code:
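# Regress each variable of interest on the control variable (vless)
XZ <- lm(percVoice ~ vless, data = dat)
resXZ <- residuals(XZ)
YZ <- lm(phonenumber ~ vless, data = dat)
resYZ <- residuals(YZ)
# The partial correlation is the correlation between the two sets of residuals
part <- cor(resXZ, resYZ)
part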
which yielded the correlation of percentage voicing (percVoice) on phoneme (phonenumber)
partialling out environment (vless):
[1] 0.4494255
and changing around the code to get correlation of percentage voicing on environment, partialling
out phoneme:
[1] 0.7660576
The best news is that I can compare this value with the squared partial r, to see if the name is true to
the value. So, if I take the square root of the squared partial r, which I got from Mary’s code, I should
get the partial correlation:
> sqrt(0.5868442)
[1] 0.7660576
> sqrt(0.2019833)
[1] 0.4494255
And what do you know, that’s what they are.
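For comparison with the formulas found in other sources, the usual closed-form expression for a first-order partial correlation, written in terms of the three pairwise correlations, can be checked against the residual method and against the squared partial r. The sketch below uses simulated variables x, y and z, not the study data, and the function names are hypothetical.

```r
# Closed-form first-order partial correlation of x and y, controlling for z
partial_cor_formula <- function(x, y, z) {
  rxy <- cor(x, y)
  rxz <- cor(x, z)
  ryz <- cor(y, z)
  (rxy - rxz * ryz) / sqrt((1 - rxz^2) * (1 - ryz^2))
}

# Residual method: correlate what is left of x and y after regressing each on z
partial_cor_residuals <- function(x, y, z) {
  cor(residuals(lm(x ~ z)), residuals(lm(y ~ z)))
}

# Simulated variables (NOT the study data)
set.seed(7)
n <- 150
z <- rnorm(n)
x <- 0.6 * z + rnorm(n)
y <- 0.5 * x + 0.3 * z + rnorm(n)

by_formula   <- partial_cor_formula(x, y, z)
by_residuals <- partial_cor_residuals(x, y, z)

# Squared partial r from the two regressions of y, for comparison
r2_full    <- summary(lm(y ~ x + z))$r.squared
r2_reduced <- summary(lm(y ~ z))$r.squared
pr2 <- (r2_full - r2_reduced) / (1 - r2_reduced)

print(c(formula = by_formula, residuals = by_residuals, sqrt_pr2 = sqrt(pr2)))
```

Note that the square root of the squared partial r recovers only the magnitude of the partial correlation. The sign has to come from the correlation itself.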
This would be the additional analysis I was looking for to confirm my results. As far as additional
data, there are a lot of things that I would like to follow up on. Regarding the issue of phonemic
voicing, I think the case is made pretty convincingly with what I have here; however, voicing will
continue to be under investigation in the next few studies I do because I want to compare these
results with studies of place and of manner (realizations from stop through approximant) which will
also need to take into account such other factors as stress and prosodic hierarchy. I also want to
amass a corpus of speakers to compare patterns of variation, the stuff that’s not accounted for by
environmental voicing, and just how much is. I want to see if those patterns fit any kind of
sociolinguistic grouping, such as age, gender, or dialect. I would also like to compare patterns
between parents and children, to find out what kind of possible misinterpretation of the signal, if any,
or analogizing from one pattern to another may be involved. I would like to pinpoint the effects of
literacy, instead of just theorizing about them, using children and semi-literate populations, if
possible. This study has answered one very limited question, but it has raised many more.
References
Cohen, Jacob and Cohen, Patricia (1983) Applied multiple regression/correlation analysis for the
behavioral sciences. 2nd ed. Hillsdale, NJ: L. Erlbaum Associates.
Stevens et al. (1992) “Acoustic and perceptual characteristics of voicing in
fricatives and fricative clusters,” J. Acoust. Soc. Am. 91, 2979-3000.
New! Analysis of f/v
partial correlation for phoneme: [1] 0.7197954
and for environment: [1] 0.4650595
almost the inverse of what I got for <th>.