Projects Involving Statistics (& SPSS)

Academic Skills Advice
Starting a project that involves statistics can feel confusing because there seem to be
many different things you can do (charts, graphs, tests etc.) and many ways of looking at
your data. This summary suggests how to get started when using SPSS for
your project.
Using Statistics:
You can use statistics to:
• look at what has already happened,
• draw conclusions,
• predict what is likely to happen in the future.
Before you start:
You need to know exactly what your question is – what do you want
to know and why? Only collect data that will help you to answer your
question.
You will need to decide what type of sampling to use.
(You probably already use sampling in everyday life: for example, you might taste a small cube of
cheese in the supermarket and conclude that you will enjoy the whole block of
cheese.)
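One common type of sampling is a simple random sample, where every member of the population has the same chance of being chosen. The sketch below is only an illustration: the list of 200 student IDs, the sample size of 20 and the seed are all made-up assumptions, not part of this handout.

```python
import random

# Hypothetical population: 200 numbered students (an assumption for illustration).
population = [f"student_{i:03d}" for i in range(1, 201)]

# A fixed seed makes the draw repeatable, which is useful when writing up a project.
rng = random.Random(42)

# Simple random sample: 20 students drawn without replacement.
sample = rng.sample(population, k=20)

print(len(sample), "of", len(population), "selected")
print(sample[:5])
```

Other sampling methods (e.g. stratified or systematic sampling) would need a different approach.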
Experiment:
• Look at what has already happened
• Collect, describe and organise your data
• Look at averages, spread, shape etc.
Predict:
• Make general conclusions about the whole population based on your sample
• Test your data and use your results to predict what might happen in the future.
© H Jackson 2012 /2014/ Academic Skills
A brief overview of SPSS:
SPSS is a useful stats package which helps you to analyse your data and draw
conclusions from it.
It has 2 different windows:
the input window, where you set up and enter your data,
the output window, which appears every time you ask SPSS to do
something (e.g. produce a report or chart).
The Input window has 2 tabs:
the variable view (for specifying what type of data should be entered)
the data view (for inputting your data (like a spreadsheet)).
A suggested order to follow:
Experiment:

Set up your variables in the variable view tab.

Enter your data in the data view tab (1 row = 1 case (e.g. 1 subject’s data)).

It’s often good to start with “Explore”, as this can give you ideas about your data and
where to start (menu path: Analyze > Descriptive Statistics > Explore).

Generate any appropriate charts or graphs that help to see what is happening with
your data (e.g. bar charts to compare frequencies, box plots to compare distributions).

Look at descriptive statistics (e.g. mean, max, min, standard deviation etc) and make
comparisons (decide what it tells you).

If your questions are about the relationships between data look at scatter graphs,
correlation, crosstabs, regression etc.
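SPSS produces these figures for you, but it can help to see what they are. The sketch below, written in Python rather than SPSS, computes the descriptive statistics named above and a Pearson correlation between two variables. The data (hours revised and exam scores) and the function names are hypothetical and only there for illustration.

```python
import statistics
from math import sqrt


def describe(values):
    """Return basic descriptive statistics for one variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "sd": statistics.stdev(values),  # sample standard deviation (divides by n - 1)
    }


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: one row per case, as in the SPSS data view.
hours_revised = [2, 4, 4, 4, 5, 5, 7, 9]
exam_score = [50, 55, 52, 60, 62, 58, 70, 75]

print(describe(hours_revised))
print(round(pearson_r(hours_revised, exam_score), 3))
```

A value of r close to +1 or -1 suggests a strong linear relationship; a value near 0 suggests little or no linear relationship.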
Predict:
Once you have done all the comparisons and drawn some conclusions you need to decide
how likely your results are to happen again in the future. You can test your idea
(hypothesis) by doing hypothesis testing:

Check for normality to see what sort of data you have and, therefore, which tests can
be performed (using histograms, a Normal Q-Q plot and the Kolmogorov-Smirnov or
Shapiro-Wilk test). This helps you decide whether parametric tests are appropriate.

Decide on the test to use (see the decision making flow chart). The test you choose will
tell you how significant your results are, i.e. whether they reflect a real effect that is likely
to happen again or whether they could just be due to chance.
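To show the idea behind a Normal Q-Q plot, the sketch below pairs each sorted data value with the standard normal value it would be plotted against, then measures how close the points are to a straight line. It is a rough illustration in Python, not what SPSS does internally: the plotting positions (rank - 0.5)/n are one common convention and SPSS may use a different one, and both data sets are made up.

```python
from math import sqrt
from statistics import NormalDist, mean


def qq_points(values):
    """Pair each sorted observation with the standard normal quantile it is plotted against."""
    ordered = sorted(values)
    n = len(ordered)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        (standard_normal.inv_cdf((rank - 0.5) / n), observed)
        for rank, observed in enumerate(ordered, start=1)
    ]


def qq_correlation(values):
    """Correlation between the Q-Q points; values near 1 mean the plot is close to a straight line."""
    pairs = qq_points(values)
    theoretical = [t for t, _ in pairs]
    observed = [o for _, o in pairs]
    mean_t, mean_o = mean(theoretical), mean(observed)
    s_to = sum((t - mean_t) * (o - mean_o) for t, o in pairs)
    s_tt = sum((t - mean_t) ** 2 for t in theoretical)
    s_oo = sum((o - mean_o) ** 2 for o in observed)
    return s_to / sqrt(s_tt * s_oo)


# Hypothetical data sets for comparison.
roughly_symmetric = [48, 50, 51, 52, 53, 54, 55, 56, 58, 60]
strongly_skewed = [1, 1, 1, 1, 2, 2, 3, 5, 9, 40]

print(round(qq_correlation(roughly_symmetric), 3))
print(round(qq_correlation(strongly_skewed), 3))
```

The skewed data gives a noticeably lower value, which matches what you would see on the plot: the points bend away from the line. This is only a visual aid; the Kolmogorov-Smirnov and Shapiro-Wilk tests in SPSS give a formal ‘p’ value.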
Some useful information:
The Null Hypothesis (often denoted 𝐻0): this is the assumption that what you were testing is
not true and that things just happened by chance. Instead of trying to prove that your idea
is right you will be trying to show that the null hypothesis is probably wrong.
E.g. of null hypothesis: 𝐻0: there is no difference between the means (𝜇1 = 𝜇2).
The Alternative Hypothesis (often denoted 𝐻𝐴 or 𝐻1): this is your idea – what you think is
true. You have to assume this is wrong until you find evidence to say otherwise.
E.g. of alternative hypothesis: 𝐻𝐴: there is a difference between the means (𝜇1 ≠ 𝜇2).
The ‘p’ value is the common name for the ‘sig’ value produced by the various tests. It is
“the probability of obtaining results at least as extreme as yours if 𝐻0 is true”. The ‘p’ value
provides evidence for us to decide whether we can reject 𝐻0. Commonly if 𝒑 ≤ 𝟎.𝟎𝟓 then you reject 𝑯𝟎 and
accept 𝑯𝑨 (normally this means that you have found a significant difference at the 5%
level).
If your test statistic falls in the critical region (the range of values that would be very unlikely if 𝐻0 were true) you would reject 𝐻0.
When SPSS reports test statistics (e.g. ‘F’ or ‘t’ values) we tend to just look at the
significance (‘p’) value because SPSS has done all the hard work for us and found the
probability corresponding to the test statistic (we used to have to look this up in tables).
Generally if p ≤ 0.05 we “reject the null hypothesis”; otherwise we say “there is not
enough evidence to reject the null hypothesis” (we cannot say that we “accept the null
hypothesis”). However, when we reject the null hypothesis we can “accept the alternative hypothesis”.
An example scenario:
You have done some investigating and think that the population of the village you live in has
above average intelligence. If the average intelligence rating is 100 then your hypothesis
is that the mean rating for your village is greater than 100.
Remember that the null hypothesis says that you are wrong and everything is equal:
𝐻0: your village population has average intelligence (𝜇 = 100).
𝐻1: your village population has above average intelligence (𝜇 > 100).
You run an appropriate test and SPSS reports a ‘p’ value of 0.0065 (this is equivalent to
0.65%, i.e. less than 1%).
The ‘p’ value is less than 0.05 so you will reject the null hypothesis and report that you are
confident that your village has above average intelligence.
(The ‘p’ value is saying that if 𝐻0 is true then there is only a 0.65% chance of getting data at least
as extreme as yours. This is such a small chance that we conclude that 𝐻0 is probably not true and reject it.)
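The handout does not say which test was run in this scenario, so the sketch below is only one possibility: a one-sided z-test in Python. Every figure in it apart from the null value of 100 is an assumption made for illustration: a sample of 36 villagers, a sample mean of 106.2 and a known population standard deviation of 15. It shows how a test statistic is turned into a ‘p’ value and a decision.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_z_test(sample_mean, n, mu0, sigma):
    """Test H0: mu = mu0 against H1: mu > mu0, assuming sigma is known."""
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Probability of a z value at least this large if H0 is true.
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Hypothetical sample figures (assumptions, not taken from the handout).
z, p = one_sided_z_test(sample_mean=106.2, n=36, mu0=100, sigma=15)

# Start of the critical region for a one-sided test at the 5% level.
critical_z = NormalDist().inv_cdf(0.95)

decision = "reject H0" if p <= 0.05 else "not enough evidence to reject H0"
print(f"z = {z:.2f}, critical value = {critical_z:.3f}, p = {p:.4f}: {decision}")
```

With these made-up figures the test statistic falls in the critical region and the ‘p’ value is well below 0.05, so the decision is the same as in the scenario. With real data you would more often use a t-test, because the population standard deviation is rarely known.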
Hypothesis Test Decision Making Flow Chart
Data type?
• Categorical: Chi-Squared test (one sample or two sample)
• Continuous: Questions about relationships or difference between outcome means?
    • Relationships: How many variables?
        • 2: Correlation Analysis (Parametric or Non-Parametric?)
            • Parametric: Pearson’s r or simple regression
            • Non-Parametric: Spearman’s Rank Correlation
        • More than 2: Multiple Regression Analysis
    • Differences: How many groups? (2 or more?)
        • 2: Parametric or Non-Parametric?
            • Parametric: T-test (independent or paired)
            • Non-Parametric: Mann-Whitney U Test (different groups) or Wilcoxon Signed-Rank Test (same group)
        • More than 2: Parametric or Non-Parametric?
            • Parametric: ANOVA
            • Non-Parametric: Kruskal-Wallis Test
Please note that this flow chart is neither definitive nor exhaustive. There are other tests and approaches,
and you should bear this in mind with regard to your own data.
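The decision path in the flow chart can also be written as a small lookup function. The sketch below is a hypothetical Python helper (the name choose_test and its arguments are made up for illustration). It only covers the branches shown in the chart, and it names the same-group non-parametric option as the Wilcoxon signed-rank test, which is the usual name for the paired version.

```python
def choose_test(data_type, question=None, count=None, parametric=None):
    """Suggest a test by following the branches of the flow chart.

    data_type:  "categorical" or "continuous"
    question:   "relationships" or "differences" (continuous data only)
    count:      number of variables (relationships) or groups (differences)
    parametric: True if the parametric assumptions look reasonable
    """
    if data_type == "categorical":
        return "Chi-Squared test"
    if data_type != "continuous":
        raise ValueError("data_type must be 'categorical' or 'continuous'")
    if count is None or count < 2:
        raise ValueError("count must be 2 or more")

    if question == "relationships":
        if count > 2:
            return "Multiple Regression Analysis"
        return ("Pearson's r or simple regression" if parametric
                else "Spearman's Rank Correlation")

    if question == "differences":
        if count == 2:
            return ("T-test (independent or paired)" if parametric
                    else "Mann-Whitney U (different groups) or "
                         "Wilcoxon signed-rank (same group)")
        return "ANOVA" if parametric else "Kruskal-Wallis Test"

    raise ValueError("question must be 'relationships' or 'differences'")


# Example: comparing the means of three groups when the data is not normal.
print(choose_test("continuous", "differences", count=3, parametric=False))
```

Like the chart itself, this is a starting point rather than a rule: check the assumptions of whichever test it suggests against your own data.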
Glossary of terms:
There are lots of technical (and sometimes confusing) terms used in statistics. This glossary gives a basic idea, in “layman’s terms”,
of what things mean.
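Categorical data
Data that is sorted into distinct groups or categories (e.g. eye colour, yes/no answers). In this handout the
term also covers discrete data, which can only take certain values with gaps between them. E.g. shoe size
(you can buy size 5 or 5 ½ but not size 5.3), counting (we count 1, 2, 3, etc.), people in a class (there may
be 22 or 23 but not 22.5).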
Continuous data
Data that can take any value – there are no gaps between the possible values. E.g. height (a person can be 150cm or
150.1cm (or even 150.15cm)).
Differences
Looking at the difference between the means (averages) of different sets of data.
Non-parametric data
Does not satisfy the assumptions of parametric data.
Parametric data
Data that satisfies the assumptions of parametric tests. A large part of this is checking that the data is normally distributed.
Strictly speaking we should also check that the data:
• Has similar variances in each group
• Is at least interval (scale) level
• Is independent (e.g. answers from one participant do not affect another’s. Dependent data, such
as one person doing the same test twice, is OK if you use a paired test)
Relationships
Looking at the relationship between sets of data, for example does one set of data increase as the
other increases, or vice versa? Or is there no pattern (relationship)? This is often called correlation.
𝝁
The arithmetic mean (average) of the population