Chapter 17b - Analysing and Presenting Quantitative Data

Analysing and Presenting
Quantitative Data:
Inferential Statistics
Objectives
After this session you will be able to:
• Choose and apply the most appropriate
statistical techniques for exploring
relationships and trends in data
(correlation and inferential statistics).
Stages in hypothesis testing
• Hypothesis formulation.
• Specification of significance level (to see
how safe it is to accept or reject the
hypothesis).
• Identification of the probability distribution
and definition of the region of rejection.
• Selection of appropriate statistical tests.
• Calculation of the test statistic and
acceptance or rejection of the hypothesis.
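The five stages above can be sketched with a simple one-sample t-test in Python. This is an illustrative sketch only: the stress scores are invented, and the use of SciPy and a 0.05 significance level are assumptions, not part of the slides.

```python
from scipy import stats

# Stage 1: hypothesis formulation - null hypothesis: the population mean
# stress score is 10.
# Stage 2: significance level - we accept a 5% risk of a Type I error.
alpha = 0.05

# Hypothetical sample of stress scores.
scores = [12, 9, 11, 14, 10, 13, 12, 11, 15, 10]

# Stages 3-4: with a small sample and unknown population variance, the
# t distribution with n-1 degrees of freedom applies; the appropriate
# test is a one-sample t-test.
t_stat, p_value = stats.ttest_1samp(scores, popmean=10)

# Stage 5: calculate the test statistic and accept or reject the
# hypothesis - reject if p falls in the region of rejection.
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```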
Hypothesis formulation
Hypotheses come in essentially three forms. Those
that:
• Examine the characteristics of a single
population (and may involve calculating the
mean, median and standard deviation and the
shape of the distribution).
• Explore contrasts and comparisons between
groups.
• Examine associations and relationships between
groups.
Specification of significance
level – potential errors
• Significance level is not about importance – it is
the risk we accept that a result is due to
chance alone.
• Typical significance levels:
– p = 0.05 (a 5% risk that the findings are due to chance)
– p = 0.01 (a 1% risk that the findings are due to chance)
Identification of the probability
distribution
Selection of statistical tests –
examples
Research question | Independent variable | Dependent variable | Statistical test
Is stress counselling effective in reducing stress levels? | Nominal groups (experimental and control) | Attitude scores (stress levels) | Paired t-test
Do women prefer skin care products more than men? | Nominal (gender) | Attitude scores (product preference levels) | Mann-Whitney U (data not normally distributed)
Does gender influence choice of coach? | Nominal (gender) | Nominal (choice of coach) | Chi-square
Do two interviewers judge candidates the same? | Nominal | Rank order scores | Spearman's rho (data not normally distributed)
Is there an association between rainfall and sales of face creams? | Rainfall (ratio data) | Ratio data (sales) | Pearson Product Moment (data normally distributed)
Nominal groups and quantifiable
data (normally distributed)
To compare the performance/attitudes of two
groups, or to compare the performance/attitudes
of one group over a period of time using
quantifiable variables such as scores.
Use the paired t-test, which compares the means of
the two sets of paired scores to see whether any
difference between them is significant.
Assumption: data are normally distributed.
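A minimal sketch of this procedure using SciPy. The stress scores are invented for illustration; the slides' own example measures the same group at two time points, which is what the paired t-test assumes.

```python
from scipy import stats

# Hypothetical stress scores for the same group before and after
# counselling (paired observations).
stress_time1 = [12, 10, 14, 9, 11, 13, 15, 10, 12, 14]
stress_time2 = [10, 9, 11, 8, 10, 11, 12, 9, 10, 12]

# Check the normality assumption first (Shapiro-Wilk, as in the
# SPSS output shown in the slides).
w, p_norm = stats.shapiro(stress_time1)

# Paired t-test: compares the means of the paired scores to see if
# the difference between them is significant.
t_stat, p_value = stats.ttest_rel(stress_time1, stress_time2)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```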
Paired t-test data set
Data outputs: test for normality
Case Processing Summary

                Valid         Missing       Total
                N    Percent  N    Percent  N    Percent
  StressTime1   92   98.9%    1    1.1%     93   100.0%
  StressTime2   92   98.9%    1    1.1%     93   100.0%
Tests of Normality

                Kolmogorov-Smirnov(a)     Shapiro-Wilk
                Statistic  df   Sig.      Statistic  df   Sig.
  StressTime1   .095       92   .041      .983       92   .289
  StressTime2   .096       92   .034      .985       92   .363
  a Lilliefors Significance Correction
Data outputs: visual test for
normality
Statistical output
Paired Samples Statistics

                         Mean      N    Std. Deviation   Std. Error Mean
  Pair 1   StressTime1   10.3587   92   3.48807          .36366
           StressTime2   8.7500    92   3.19555          .33316

Paired Samples Test

  Paired Differences (StressTime1 – StressTime2):
  Mean 1.60870; Std. Deviation 2.12239; Std. Error Mean .22127;
  95% Confidence Interval of the Difference: Lower 1.16916, Upper 2.04823
  t = 7.270, df = 91, Sig. (2-tailed) = .000
Nominal groups and quantifiable
data (not normally distributed)
To compare the performance/attitudes of two
groups, or to compare the performance/attitudes
of one group over a period of time using
quantifiable variables such as scores.
Use Mann-Whitney U.
Assumption: data are not normally distributed.
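A sketch of the Mann-Whitney U test in SciPy, following the gender/product-preference example in the slides. The attitude scores are invented for illustration.

```python
from scipy import stats

# Hypothetical attitude scores (product preference levels) for two
# independent groups.
women = [18, 20, 17, 19, 21, 16, 20, 18]
men = [12, 14, 11, 15, 13, 12, 14, 10]

# Mann-Whitney U is non-parametric: it compares two independent groups
# via ranks, so it does not assume normally distributed data.
u_stat, p_value = stats.mannwhitneyu(women, men, alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
```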
Example of data gathering
instrument
Mann-Whitney U data set
Statistical output
Tests of Normality

                    Kolmogorov-Smirnov(a)     Shapiro-Wilk
  Attitude   Sex    df   Sig.                 Statistic  df   Sig.
             1      32   .000                 .815       32   .167
             2      68   .000                 .909       68   .298
  a Lilliefors Significance Correction

Ranks

  Attitude   Sex      N     Mean Rank   Sum of Ranks
             1        32    31.89       1020.50
             2        68    59.26       4029.50
             Total    100

Test Statistics(a)

                            Attitude
  Mann-Whitney U            492.500
  Wilcoxon W                1020.500
  Z                         -4.419
  Asymp. Sig. (2-tailed)    .000
  a Grouping Variable: Sex
Association between two
nominal variables
We may want to investigate relationships between
two nominal variables – for example:
• Educational attainment and choice of career.
• Type of recruit (graduate/non-graduate) and
level of responsibility in an organization.
• Use chi-square when you have two nominal
variables, each of which contains two or
more categories.
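A sketch of a chi-square test of association using SciPy. The contingency table is invented, loosely echoing the recruit-type/responsibility example above.

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table: rows = type of recruit
# (graduate / non-graduate), columns = level of responsibility
# (high / low).
observed = [[30, 20],
            [15, 35]]

# chi2_contingency returns the test statistic, p-value, degrees of
# freedom, and the table of expected counts (with Yates' continuity
# correction applied by default for a 2x2 table).
chi2, p_value, df, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.3f}, df = {df}, p = {p_value:.4f}")
```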
Chi-square data set
Statistical output
Chi-Square Tests

                              Value     df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                             (2-sided)     (2-sided)    (1-sided)
  Pearson Chi-Square          .382(b)   1    .536
  Continuity Correction(a)    .221      1    .638
  Likelihood Ratio            .383      1    .536
  Fisher's Exact Test                                      .556         .320
  Linear-by-Linear
  Association                 .380      1    .537
  N of Valid Cases            201
  a Computed only for a 2x2 table
  b 0 cells (.0%) have expected count less than 5. The minimum expected count is 33.08.

Symmetric Measures

                               Value   Approx. Sig.
  Nominal by    Phi            .044    .536
  Nominal       Cramer's V     .044    .536
  N of Valid Cases             201
  a Not assuming the null hypothesis.
  b Using the asymptotic standard error assuming the null hypothesis.
Correlation analysis
Correlation analysis is concerned with associations
between variables, for example:
• Does the introduction of performance management
techniques to specific groups of workers improve morale
compared to other groups? (Relationship: performance
management/morale.)
• Is there a relationship between size of company
(measured by size of workforce) and efficiency
(measured by output per worker)? (Relationship:
company size/efficiency.)
• Do measures to improve health and safety inevitably
reduce output? (Relationship: health and safety
procedures/output.)
Perfect positive and perfect
negative correlations
Highly positive correlation
Strength of association based
upon the value of a coefficient
  Correlation figure   Description
  0.00                 None
  0.01-0.09            Negligible
  0.10-0.29            Weak
  0.30-0.59            Moderate
  0.60-0.74            Strong
  0.75-0.99            Very strong
  1.00                 Perfect
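The banding above can be expressed as a small helper function. The bands follow the table; the function name and the treatment of boundary values are my own choices for illustration.

```python
def describe_strength(r: float) -> str:
    """Map a correlation coefficient to the verbal descriptions of
    strength used in the table (based on its absolute value)."""
    r = abs(r)
    if r == 0.0:
        return "None"
    if r < 0.10:
        return "Negligible"
    if r < 0.30:
        return "Weak"
    if r < 0.60:
        return "Moderate"
    if r < 0.75:
        return "Strong"
    if r < 1.00:
        return "Very strong"
    return "Perfect"

print(describe_strength(-0.813))  # prints "Very strong"
```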
Calculating a correlation for a
set of data
We may wish to explore a relationship when:
• The subjects are independent and not chosen
from the same group.
• The values for X and Y are measured
independently.
• X and Y values are sampled from populations
that are normally distributed.
• Neither of the values for X or Y is controlled (in
which case, linear regression, not correlation,
should be calculated).
Associations between two
ordinal variables
For data that are ranked, or in circumstances
where relationships are non-linear,
Spearman’s rank-order correlation
(Spearman’s rho) can be used.
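A sketch of Spearman's rho in SciPy, using the two-interviewers example from the test-selection table. The rank scores are invented for illustration.

```python
from scipy import stats

# Hypothetical rank-order scores from two interviewers judging the
# same ten candidates.
mr_jones = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
mrs_smith = [2, 1, 4, 3, 5, 7, 6, 8, 10, 9]

# Spearman's rho works on ranks, so it does not assume normality or
# a linear relationship - only a monotonic one.
rho, p_value = stats.spearmanr(mr_jones, mrs_smith)
print(f"rho = {rho:.3f}, p = {p_value:.5f}")
```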
Spearman’s rho data set
Statistical output
Correlations
MrJones
Spearman's rho
MrJones
Correlation Coefficient
Sig. (2-tailed)
N
MrsSmith
Correlation Coefficient
Sig. (2-tailed)
N
** Correlation is significant at the 0.01 level (2-tailed).
MrsSmith
1.000
.
30
.779(**)
.000
30
.779(**)
.000
30
1.000
.
30
Association between numerical
variables
We may wish to explore a relationship when there
are potential associations between, for example:
• Income and age.
• Spending patterns and happiness.
• Motivation and job performance.
Use Pearson Product-Moment (if the
relationships between variables are linear).
If the relationship is curvilinear (for example,
U-shaped or inverted U-shaped), use
Spearman’s rho.
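A sketch of the Pearson Product-Moment correlation in SciPy, echoing the rainfall/sales example that follows. The monthly figures are invented for illustration.

```python
from scipy import stats

# Hypothetical monthly rainfall (mm) and face-cream sales figures.
rainfall = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
sales = [175, 168, 160, 150, 145, 135, 128, 118, 110, 102]

# Pearson Product-Moment correlation assumes a linear relationship
# between normally distributed variables; here sales fall steadily
# as rainfall rises, so r is strongly negative.
r, p_value = stats.pearsonr(rainfall, sales)
print(f"r = {r:.3f}, p = {p_value:.6f}")
```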
Pearson Product-Moment data
set
Relationship between variables
[Scatter plot of Sales (80.00–180.00) against Rainfall (20.00–70.00)]
Statistical output
Descriptive Statistics

              Mean     Std. Deviation   N
  Rainfall    48.17    11.228           30
  Sales       132.47   28.311           30

Correlations

                                    Rainfall    Sales
  Rainfall   Pearson Correlation    1           -.813(**)
             Sig. (2-tailed)                    .000
             N                      30          30
  Sales      Pearson Correlation    -.813(**)   1
             Sig. (2-tailed)        .000
             N                      30          30
  ** Correlation is significant at the 0.01 level (2-tailed).
Summary
• Inferential statistics are used to draw conclusions from
the data and involve the specification of a hypothesis
and the selection of appropriate statistical tests.
• Some of the inherent danger in hypothesis testing lies in
making Type I errors (rejecting the null hypothesis when it
is, in fact, true) and Type II errors (accepting the null
hypothesis when it is false).
• For categorical data, non-parametric statistical tests can
be used, but for quantifiable data, more powerful
parametric tests need to be applied. Parametric tests
usually require that the data are normally distributed.