How (not) to write up quantitative results

Results
... of quantitative research
... A work in progress
Names of chapters after Method
• What key words might we find?
– Data Analysis
– Findings
– Results
– Interpretation
– Discussion
– Implications
– Conclusion
Results and Discussion separate
• Which would go in which?
1) Graphs derived from the data
2) Tables of means and percentages
3) Significance tests
4) Talk about what the graphs and other stats show
5) Talk about whether the stats show what was expected
6) Comparison/synthesis of what one instrument showed with what another showed about the same thing
7) Talk about whether what is shown fits teacher experience
8) References to the RQs / RHs
9) References to what other studies found (covered in lit review)
10) Talk about how the qualitative data support/complement the quantitative
11) Talk about implications for theory
Organisation of Results
• Whether or not Results are first given ‘bare’...
how to organise?
A) Instrum1…Instrum2…Instrum3
B) Descriptive stats….Inferential stats
C) RQ1….RQ2….RQ3…. or RH1...RH2...RH3...
D) Interesting theme1...theme2...
Refer back to the research Qs and
hypotheses
• Not…
– The first research hypothesis assumes the teachers in
a private school use more English....
• But…
– I expected that private school teachers would explain
meaning in English more than state school teachers
(2.3.2). Accordingly my first hypothesis (2.5.1) was
that there would be a difference between state school
and private school in the extent to which teachers
explain the meaning of new words in English....
Refer back to the variables
• Not…
– The two explanatory variables are the state school and the
private school
• But…
– One of our explanatory variables is school (state and
private)
• Not….
– This essay presents the results of correlating the two types
of school with the participants’ reading scores
• But….
– We first present the result for the relationship of school
with reading score
Results: Three things to report... And
interpret
• Descriptive stats
• Graphs and tables
• Inferential stats
How to persuade with percentages
• Which sounds more impressive, A or B?
A) 2 out of 4 subjects agreed
B) 50% of subjects agreed

A) 80 out of 160 subjects agreed
B) 50% of subjects agreed
• OK, but which result would you actually trust
more? How should one report such results?
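One possible answer to the reporting question is to give the percentage together with the raw counts it comes from. The sketch below illustrates that idea; the helper name `report_proportion` is hypothetical, not part of the slides.

```python
def report_proportion(agreed: int, total: int) -> str:
    """Format a proportion so the reader sees the percentage AND the counts."""
    if total <= 0:
        raise ValueError("total must be positive")
    percent = 100 * agreed / total
    return f"{percent:.0f}% ({agreed}/{total})"


# The same 50% reads very differently once the sample size is visible.
print(report_proportion(2, 4))     # 50% (2/4)
print(report_proportion(80, 160))  # 50% (80/160)
```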
How to perplex with percentages: What’s
it out of?
• What is unclear? How to restate this better?
In our survey we polled 50 people, though 10
declined to participate. …. 60% said yes to the
question ‘Do you like the English class?’…
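The ambiguity can be made concrete by working out what "60%" means under two different denominators. This is only a sketch: the two readings are assumptions about what the writer might have meant, and other readings are possible (for example, that 50 is the number left after the 10 declined).

```python
# Figures taken from the survey example above.
polled = 50
declined = 10
percent_yes = 60

responded = polled - declined

# Reading 1 (assumption): 60% of everyone polled.
yes_of_polled = percent_yes * polled // 100
# Reading 2 (assumption): 60% of those who actually answered.
yes_of_responded = percent_yes * responded // 100

print(f"Reading 1: {yes_of_polled}/{polled} said yes")
print(f"Reading 2: {yes_of_responded}/{responded} said yes")
```

Stating the denominator explicitly ("60% of the 40 respondents") removes the need for the reader to guess.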
How to perplex with percentages:
Report percentages or mean ratings?
• A five point Likert scale for agreement has been used. Any
problem?
52% of the student sample strongly agrees and 43% of them agree that
it is important for them to have the least number of errors in
writing, which leaves 2% only who disagree to this.
While 33% of the sample disagrees and 13 % strongly disagree that
correcting and discussing their writing errors in front of the class is
embarrassing for them, however, 25% agree and 22% strongly
agree. Adding up the percentages of the agreements and
disagreements together, it turns out there is a close outcome on
this point, almost a tie.
On our 1-5 agreement scale the mean rating was 3.1, close to the
midpoint
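The "almost a tie" example can be condensed into a single mean rating. The sketch below recomputes it from the quoted percentages; the 7% for the neutral category is an assumption (the remainder of 100%), since the example does not state it.

```python
# Percent of students choosing each point on the 1-5 agreement scale.
# 1 = strongly disagree ... 5 = strongly agree.
# The 7% at the midpoint is an ASSUMPTION: 100 - (13 + 33 + 25 + 22).
percent_by_rating = {1: 13, 2: 33, 3: 7, 4: 25, 5: 22}

total_percent = sum(percent_by_rating.values())
weighted_sum = sum(rating * pct for rating, pct in percent_by_rating.items())
mean_rating = weighted_sum / total_percent

print(f"Mean rating = {mean_rating:.1f} on a 1-5 scale")  # Mean rating = 3.1 on a 1-5 scale
```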
How to perplex with percentages:
Slippage of interpretation
Teachers were asked (item 10) whether they agree that
the students achieve the minimum level of the
objectives of the program. From figure 4.5 we can see
that there is no noticeable difference between the % of
teachers who 'strongly agree', 'agree', or 'disagree'
(around 31% each). ........30% of teachers report that
their students do not achieve the minimum level of the
program objectives. Thus, 70% of the pupils are
considered to have achieved most of the objectives
regardless of their level of proficiency, and only a third
of the pupils fail to achieve most of the objectives.
How to confuse with counts:
Percentage scores versus
group/aggregate percent
• Two ways of handling data arising from different
numbers of potential occurrences for different people.
Imaginary example of data where three subjects have
been recorded in quasi-natural conversation, and
counts have been made of their NS-like/correct use of
third person –s.
• Why do the percent differ in A and B? Which would the
statistician prefer and why?
A) Analysis with subjects as cases: percentage
scores and their mean
Case | Correct | Incorrect | Total | Percent correct
Lnr1 | 12 | 12 | 24 | 50
Lnr2 | 8 | 12 | 20 | 40
Lnr3 | 3 | 9 | 12 | 25
Total | 23 | 33 | 56 |
Mean % correct: 38.3%
B) Analysis with occurrences as cases: group
percent
 | Group frequency | Percent
Correct | 23 | 41.1%
Incorrect | 33 | 58.9%
Total | 56 |
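The difference between A and B can be reproduced from the counts in the tables. This is a minimal sketch; the variable names are illustrative only.

```python
# (correct, incorrect) counts of third person -s for each learner.
learners = {"Lnr1": (12, 12), "Lnr2": (8, 12), "Lnr3": (3, 9)}

# A) Subjects as cases: one percentage score per learner, then their mean.
percent_scores = {
    name: 100 * correct / (correct + incorrect)
    for name, (correct, incorrect) in learners.items()
}
mean_percent = sum(percent_scores.values()) / len(percent_scores)

# B) Occurrences as cases: pool all occurrences, then take one percentage.
total_correct = sum(correct for correct, _ in learners.values())
total_occurrences = sum(c + i for c, i in learners.values())
group_percent = 100 * total_correct / total_occurrences

print(f"A) mean of percentage scores: {mean_percent:.1f}%")  # 38.3%
print(f"B) group percent:             {group_percent:.1f}%")  # 41.1%
```

In B each learner is weighted by how many occurrences they happened to produce, so Lnr1 (24 occurrences) counts twice as much as Lnr3 (12); in A every learner counts equally.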
How to muddle with means: Appropriate
scales?
• Reporting background information about teachers of
young learners. What has gone wrong?
Variable | Mean | SD | N
AGE | 2.0727 | .83565 | 55
EDU | 1.4182 | .49781 | 55
MAJOR | .6727 | .47354 | 55
Years of TEYL | 2.4182 | | 55
Years of teaching in Primary school | 2.2000 | .70448 | 55
Difficulty of teaching mixed ability class | 3.1273 | .88306 | 55
• The questionnaire items were of the following types:
- What is your age? 21-25 / 26-30 / 31-35
- What is your educational level? BA / MA / PhD
- How many years have you been teaching in Primary school? Less than 2 / 2-5 / 6-10
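Because these items record category codes rather than measured quantities, a frequency table is usually more informative than a mean of the codes. The sketch below uses the age bands from the questionnaire; the coded responses are hypothetical, invented purely for illustration.

```python
from collections import Counter

# Category codes and labels as in the age item above.
AGE_BANDS = {1: "21-25", 2: "26-30", 3: "31-35"}

# HYPOTHETICAL coded responses, not data from the study.
coded_responses = [1, 2, 2, 3, 2, 1, 3, 2]

counts = Counter(coded_responses)
n = len(coded_responses)

# Report how many respondents fall in each band, not the mean of the codes.
for code, label in AGE_BANDS.items():
    print(f"{label}: {counts[code]} of {n}")
```

A "mean age" of 2.07 only says that the average code was near category 2; it is not an age in years.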
How to muddle with means: Slippage of
interpretation
Students produced each answer based on a Likert scale ranging
from 1 (strongly disagree) to 5 (strongly agree). See table X.
Table X – Students’ answers to writing quality items
Item number | Item wording | Average number of students
2 | I don’t pay more attention to spelling when I use the computer instead of writing by hand | 1.9
6 | The computer cannot help me write my papers better | 2.9
7 | I am more careful about punctuation when writing by hand than with the computer | 2.5
Etc.
How to muddle with means: What does
mean mean?
• Do these make sense?
– We can see from the mean in Table 5 that all our participants scored above
average on the reading test
Test | Min | Max | Mean | SD
Reading (/60) | 33 | 57 | 42.8 | 8.35
Writing (/50) | 22 | 43 | 30.7 | 5.46
- We can see from the mean in Fig. 2 (4.72) that most of our participants scored
above average
Joke from WWW: Most of us have A Greater Than Average Number of Legs
The great majority of people have more than the average number of legs.
Amongst the 57 million people in Britain there are probably 5,000 people who
have only one leg. Therefore the average number of legs is
((5000 x 1) + (56,995,000 x 2)) / 57,000,000 = 1.9999123.
Since most people have two legs... need I say more?
- We can see from the mean in table 3 that the most popular score was 21
[Figure: histogram of PARENTAL ENCOURAGEMENT scores in bins from 10-15 to 45-50; Mean = 21, Std. Dev = 5.08, N = 72]
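The point of the legs joke, that the mean need not be the typical or most frequent value, can be checked with the figures quoted in it. A minimal sketch:

```python
# Figures from the joke: 5,000 people with one leg out of 57 million.
people_by_legs = {1: 5_000, 2: 56_995_000}
population = sum(people_by_legs.values())

# Mean: total legs divided by total people.
mean_legs = sum(legs * n for legs, n in people_by_legs.items()) / population

# Mode: the number of legs held by the most people.
mode_legs = max(people_by_legs, key=people_by_legs.get)

# Share of people whose leg count exceeds the mean.
above_mean = sum(n for legs, n in people_by_legs.items() if legs > mean_legs)

print(f"Mean legs = {mean_legs:.7f}")  # Mean legs = 1.9999123
print(f"Mode legs = {mode_legs}")      # Mode legs = 2
print(f"People above the mean: {above_mean} of {population}")
```

So "most participants scored above average" is not self-contradictory, but it cannot be read off from the mean alone, and "the most popular score" is the mode, not the mean.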
Talk about means… but not like this
• Not...
– This table shows the two figures collected from the thirteen participants.
• But...
– Table 4.2 shows the means and SDs from our sample
• Not...
– Custom tables present the averages (means) of the two languages
• But...
– Table 4.2 presents the means for the two languages
• Not...
– To support what I have found in the error bars <of the three groups>, now
I will check the information provided in Table 4. As is easily seen, the
averages of the three groups... are similar to the means in the error bars.
• But...
– From the means (Fig 3 and Table 4) we can see....
• Not...
– The difference between mean scores is 17,9250%
• But...
– The difference between mean scores is 17.93%
• Not...
– The difference between the means is instinctively large (0.4907 and
2.6111).
• But...
– The difference between the means is intuitively large at 2.12, given the
length of the scale (5 points)
• Not...
– Moreover, the averages/means of the state school is .46,
and the private school is .54 which shows that the results
are not as significant as we expect...
• But...
– The means of the schools are similar (state school M=0.46,
private school M = 0.54). This is not a pedagogically
substantial difference and furthermore it is not significant
(t=1.34 , p=0.563): hence our hypothesis is not supported
Talk about standard deviations… but..
• Figure 2 shows the total number of respondents, their means and SDs
• Ok as far as it goes, but what does the SD really tell us? Is any of these near
the truth?
1) Both of the standard deviations for the two variables seem fairly large….
This would seem to infer that ..... there seems to be distribution of the
two variables throughout the speech community.
2) The SD of present tense is larger than that of past tense…., so the
population of present tense is more spread.
3) The standard deviation between the two schools is not significant
4) All SDs are large which suggests that the students generally agree on the
answers they gave...
5) The SDs of the two schools are .59 and .64, which are close to each
other....and it denotes that students agree with each other on the matter
and that there is no clear difference between the two groups.
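What the SD does tell us is how spread out the scores are around the mean: a small SD means respondents gave similar answers, a large SD means they differed. The sketch below shows two sets of ratings with the same mean but very different SDs; both data sets are hypothetical.

```python
from statistics import mean, stdev

# HYPOTHETICAL ratings on a 1-5 scale, invented for illustration.
agreeing = [2, 3, 3, 3, 3, 4]  # respondents mostly give the same answer
divided = [1, 1, 1, 5, 5, 5]   # respondents split between the extremes

for label, ratings in (("agreeing", agreeing), ("divided", divided)):
    print(f"{label}: M = {mean(ratings):.2f}, SD = {stdev(ratings):.2f}")
# agreeing: M = 3.00, SD = 0.63
# divided: M = 3.00, SD = 2.19
```

Similar SDs in two groups therefore say that the groups are equally spread, not that the groups agree with each other or that their means are similar.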
How to cheat with charts
• http://privatewww.essex.ac.uk/~scholp/onevardesc.htm
Talk about graphs... but..
• Not...
– For rating scale scores, we need to use a histogram instead of a bar chart because we have
two EVs to compare with and the scale of scores need to be shown. Besides, we need a
graph for each school separately. So we can go through the routine to make a graph. First of
all, go to the Graph…Histogram, and choose the “teacher explaining words in English”
column and click it to the Variable item. Then put “school” in Column and press OK the
graph will come out.
• But...
– <nothing>
• Not...
– As the above graph indicates....in general there were rather more 5s than 4s
• But...
– As we see in fig 4.6, the number of students who rated English at school highly useful (5) far
exceeded those who gave it lower ratings.
• What are the key terms to avoid these clumsy
statements describing graphs?
1) From the graphs we can see that…The scores are pushed
in different directions against one side in each graph
2) There are about 40 and 45 students stand on the first two
scale, respectively... heaped to the left/at the end of the
scale
3) Figure 2 provides us with the responses which are
inclined in different positions
http://privatewww.essex.ac.uk/~scholp/distrib.htm
• Premature interpretation of graphs
– According to the histogram, it represents most state school
teachers do not use English to explain words as much as
private school teachers do. So all of a sudden, we may feel
that the research hypothesis is correct.
– Because of the large similarity between the two
histograms we can assume that the hypothesis does not hold
water.
– Both from graph 11 and 12, we see there is a significant
difference between the averages.
Talk about correlation and the
associated scatterplot
• Not...
– The two variables are not comparable, so a Pearson
correlational design was chosen
• But...
– Since we are interested in the relationship between two
continuous variables, the Pearson correlation was chosen
• Not...
– To investigate the correlation, the Pearson r correlation
coefficient is used as a method.
• But...
– In order to quantify this relationship, the Pearson r
correlation was calculated
• Not...
– The correlations suggest that writers' linguistic knowledge is more likely to
have an influence on their use of affective behaviors than their writing
performance.
• But...
– The correlations suggest that there is a stronger relationship between writers’
linguistic knowledge and their affective behavior than their writing
performance
• Not...
– Table 3: Mann-Whitney significance test of correlation between school type
and teacher's word explanation in Greek and English.
• But...
– Table 3: Mann-Whitney significance test of the relationship between school
type and teacher's word explanation in Greek and English.
• Not...
– As you can see in the graph, there are some squares particularly off-line…in
fact, these two subjects were the ones plotted far off the trendy line.
• But...
– As seen in Figure 4.3, there are two outliers which are markedly distant from
the overall linear trend
Talk about normality... wrong
1) The bars and the mode are skewed to the end in both histograms and
the population is concentrated around the median.... so we do not
see a nice bell-shaped heap
2) From the “bell-shaped heap”, we suddenly find out the heap of the
scale in both schools are the same, both on 0.5 which tells us that the
frequency the teachers explain words in English are the similar.
3) sd in the state <school> is .59 and sd in the private is .64. Again this
proofs that the variables are not normally distributed in the two
schools.
4) As the distribution is so non-normal for both variables, it is clear that
these results would have a significant effect upon the overall picture.
5) The test distribution seems normal based on the figures <of the K-S
test>. That is why you use the Wilcoxon test not the t test.
Talk about normality... right
http://privatewww.essex.ac.uk/~scholp/distrib.htm
• The distribution of our sample data is skewed (fig 5),
indicating that the population sampled is unlikely to be
normally distributed
• The K-S test was significant, indicating that the
distribution of our data is not consistent with the normal
distribution
• Since our data is not distributed normally, we do not use
parametric significance tests / t tests / ANOVA but instead
use the Mann-Whitney tests / etc. / the ordinal option of
the Generalized Linear Model
Talk about the need for inferential
stats...
• Not good thinking
1) To ensure the above results to show the right things, I need to do the
inferential test of the difference... to look for what the p or sig value is...
2) A Wilcoxon test was used to provide more details about our results. This
kind of test can provide more explanation of the use of language
3) …to see if this relationship is one which bears some significance and
goes some way to supporting the hypothesis, some descriptive statistics
are needed.
• On the right lines but badly worded
4) Hence this needs to be tested inferentially... guiding us to the way we get
our results from some population. To assure the given results,... a T-test
was taken.
5) In addition a Mann-Whitney test was taken to ensure that there is a
sampling error that causes this slight difference.
• Better
6) The Mann Whitney test will indicate whether the null hypothesis might be
supported, i.e. the difference between the means might be so small that it
could easily arise by chance due to sampling error. Alternatively RH1 might
be supported.
7) In order to check if our results can be generalised to the wider population
which was sampled, we use inferential statistics.
Talk about p values and significance
• Not...
– It can therefore be said that the differences between the
different types of use are significant, as p<0.000
• But...
– The difference between the two kinds of use is therefore
significant (p<.001).
• Not...
– The P value, in Fig. 4, it calls sig. value, is .000. But
actually it is not equal to zero, as here it only shows 3
decimal places, it should be something a little bit more
than zero.
• But...
– From table 4 we see that p<.001
• Not...
– In fig. 4, the sig. value (p value) is less than 0.5 for
each gender.
• But...
– The difference between genders is significant (p<.05)
• Not...
– The sig value is (.001), and therefore, the difference
is significant, since the sig value is substantially
smaller than the conventional significance level of
.005.
• But...
– The difference is significant at the alpha level of .05
(p=.001)
• Not...
– The p value seems to be significant enough to
reject the null hypothesis. LED <Level of Education
of learners of English> makes a significant
difference with this type of information <keeping
notes of the translation and English definition of
new words>
• But…
– There is a significant difference between LEDs
<Levels of Education of learners of English> in this
type of information <keeping notes of the
translation and English definition of new words>.
Hence we reject the null hypothesis.
• Not...
– The value of the significance test result is .170
and cannot considered to be significant.
• But...
– The result was non-significant (p=0.170)
• Not...
– This is well above the level of .05 and so the
difference here is insignificant.
• But...
– This is above our alpha threshold of .05 and so is
nonsignificant.
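The reporting conventions in the "But..." versions above can be summarised in a small helper. This is a sketch only: the function names `format_p` and `verdict` are hypothetical, and the alpha of .05 is the threshold used in the examples.

```python
def format_p(p: float) -> str:
    """Report a p value to three decimals; never print it as .000."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie between 0 and 1")
    if p < 0.001:
        # Software may display ".000"; the value is small, not zero.
        return "p<.001"
    return f"p={p:.3f}".replace("0.", ".", 1)


def verdict(p: float, alpha: float = 0.05) -> str:
    """Say whether the result is significant at the chosen alpha level."""
    return "significant" if p < alpha else "non-significant"


for p in (0.0004, 0.001, 0.170):
    print(f"The result was {verdict(p)} ({format_p(p)})")
```

Reporting the exact p value alongside the verdict lets readers apply a different alpha if they wish.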
Talking about various inferential
stats: Interpretation of significance
• Not...
– The Pearson correlation coefficient shows no significant difference.
• But...
– The Pearson correlation showed no significant relationship
• Not...
– In the Wilcoxon test, the figure of z is -.8928. And the sig <.001 is
smaller than .05 so we can infer the samples are chosen randomly
from the population
• But...
– There was a significant difference between the two conditions
(Wilcoxon z = -.893, p<.001)
• Not...
– I applied the Wilcoxon test to make sure that the samples are
normally distributed from populations or not.
• But...
– The Wilcoxon test was used to test the significance of the difference
between the two conditions / occasions
• Not...
– t test show that the samples can't represent a real population.
• But...
– The t test shows a significant difference between the two groups
• Not...
– <In a study comparing various aspects of writing
quality of two groups of students on two
occasions, before and after a term in which one
had written compositions on computer and the
other group by hand > Spelling was significant,
but content was not.
• But...
– Pre-post improvement was significantly different
between groups for spelling but not for content.
Talk about implications of results for
hypotheses and research questions
• Which is better?
– This proves the hypothesis.
– This reinforces my hypothesis.
– Our hypothesis is therefore supported.
• Which is better?
– The null hypothesis will be refused
– It is assumed that the null hypothesis, according to which
there is no difference…. will be rejected.
– The null hypothesis is not supported
– Therefore... I infer that there will be a null hypothesis to
the RQ.