Topic 2
Summarising data /
Levels of measurement /
Introduction to SPSS
Main Issues for this session

- Levels of measurement
  - Data types: nominal, ordinal, interval, ratio
  - Linking data types to statistical analyses
- Introduction to SPSS

2
Reading
Chapters 2 and 3: Frequency Distributions and Graphic Representation
Fundamentals of Statistical Reasoning in Education, Coladarci et al.
3
Levels of measurement

Level      Sequential   Magnitude   Zero point   Example   Descriptive statistics
Nominal                                                    Percent, ratio, frequency
Ordinal    x
Interval   x            x           Arbitrary
Ratio      x            x           Absolute               Mean, SD, Min, Max
4
Preparing a questionnaire and codebook
- Example questionnaire: WB_Pupil_MP.doc
- Example codebook: Pupil_codebooks.xls
- Example codebooks: http://pisa2006.acer.edu.au/downloads.php
5
Codebook - 1
- A codebook should be prepared as a questionnaire is developed.
- The purposes of a codebook are:
  - To facilitate data entry, with codes shown on the questionnaire if possible.
  - To plan for analysis, helping to determine the types of analyses that are appropriate.
6
Codebook - 2
- Numeric codes are easier to enter than alphabetic codes.
- Consider the appropriate field width and range of answers. These can also give useful feedback on the questionnaire design.
- Decide how to handle missing responses.
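
As an illustration of these points, a codebook can be thought of as a small lookup table. The sketch below is only an assumption about what one entry might look like: the variable name PSEX comes from the pupil data set, but the codes (1, 2), the labels and the missing-value code (9) are made up for the example.

```python
# Hypothetical codebook entry: the codes, labels and missing-value code
# are assumptions for illustration, not taken from Pupil_codebooks.xls.
CODEBOOK = {
    "PSEX": {
        "label": "Sex of pupil",
        "values": {1: "Boy", 2: "Girl"},  # numeric codes are quick to enter
        "missing": {9},                   # code reserved for no response
    },
}


def check_code(variable, code):
    """Classify an entered code as 'valid', 'missing' or 'out of range'."""
    entry = CODEBOOK[variable]
    if code in entry["missing"]:
        return "missing"
    if code in entry["values"]:
        return "valid"
    return "out of range"


def decode(variable, code):
    """Return the value label for a valid code, or None otherwise."""
    return CODEBOOK[variable]["values"].get(code)


entered = [1, 2, 2, 9, 3]  # made-up data entry; 3 is a typing error
report = [check_code("PSEX", code) for code in entered]
print(report)
```

Checking entered codes against the allowed range is one way the codebook can catch data-entry errors before any analysis is run.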

7
Getting data into SPSS - 1
- The Excel file Pupil_data.xls contains the pupil questionnaire data.
- Import this data set into SPSS:
  - Start SPSS

8
Getting data into SPSS - 2

- Select from the menu:
  - File -> Open -> Data
9
Getting data into SPSS - 3


- Find the folder where the Excel file is stored.
- In the file open dialog box, make sure the file type is set to “xls”.
- Select the file Pupil_data.xls.
10
Getting data into SPSS - 4

- Make sure the check box for “Read variable names from the first row of data” is checked. (The Excel file has variable names in the first row, and these will be read in as SPSS variable names as well.)
11
Toggle between data view and variable view

- The tab at the bottom left corner shows the data view or variable view.
12
Add Variable labels for variables 4 to 9
(PDOBDD to PHOMLANG)
Variable label
13
Add Value labels for variable PSEX
- Value labels are in the column after Variable Labels.
- Click in the Value Labels cell and a dialog box appears.

14
Add missing values for variable PSEX

- Missing values are in the column after Value Labels.
- Click in the Missing values cell and a dialog box appears.
- Enter the values that represent missing values.
15
Practice for other variables
- Set variable labels, value labels and missing values for some other variables.
- Value labels and missing values can be copied from one set of cells and pasted into other cells.
- Make sure you save the file often!

16
Frequencies
- For which types of variables is it appropriate to compute frequencies: nominal, ordinal, interval or ratio?
- For which types of variables is it appropriate to compute averages: nominal, ordinal, interval or ratio?

17
Compute frequencies in SPSS -1
- Select from the menu:
  - Analyze -> Descriptive Statistics -> Frequencies

18
Compute frequencies in SPSS -2
- Select the variables in the left-hand box and move them to the right-hand box.
- Press OK.

19
Compute frequencies in SPSS -3
- Explore the options under the Statistics and Charts buttons, and see what kinds of output you can produce.
- Compute frequencies for other variables as practice.
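
To see what a frequency table contains, independent of SPSS, here is a minimal Python sketch. The responses and the missing-value code 9 are made up. The function frequency_table is a hypothetical helper; it reports a count, a percentage of all cases, and a percentage of valid (non-missing) cases for each code.

```python
from collections import Counter


def frequency_table(values, missing=()):
    """Return rows of (value, count, percent, valid percent)."""
    counts = Counter(values)
    total = len(values)
    valid_total = sum(n for v, n in counts.items() if v not in missing)
    rows = []
    for value in sorted(counts):
        n = counts[value]
        percent = 100 * n / total
        # Missing codes get no valid percent.
        valid_percent = None if value in missing else 100 * n / valid_total
        rows.append((value, n, percent, valid_percent))
    return rows


responses = [1, 2, 2, 1, 2, 9, 2, 1, 2, 2]  # made-up codes; 9 = missing
table = frequency_table(responses, missing={9})
for row in table:
    print(row)
```

Note how the percentage changes depending on whether missing cases are included in the denominator.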

20
Constructs in a questionnaire - 1



- Sometimes we are interested in a measure that cannot be obtained or observed directly through a question like “are you a boy or a girl”.
- For example, socio-economic status is something that we have an interest in, but it is a concept (like well-being) rather than something that we can see and directly measure.
- Such concepts are often called constructs, or latent variables.
21
Constructs in a questionnaire - 2
- Sociologists and statisticians have developed methodologies to measure constructs (or latent variables).
- Psychometrics is the science of the measurement of latent variables.
- The field of psychometrics includes classical test theory (CTT) and item response theory (IRT).

22
Constructs in a questionnaire - 3
- To measure a construct, typically a number of observable indicators are collected (e.g., through a questionnaire).
- The data from these indicators are aggregated in some way (e.g., to form a total score) to be used as a measure of the construct for each individual.

23
Constructs in a questionnaire - 4



- A simple way to aggregate the indicators into a measure for a construct is just to sum the scores for the set of questions for each student.
- These sums (or measures of the constructs) can then be used as new variables as the basis of further statistical analysis.
- There are more sophisticated ways to aggregate the indicator scores into a construct score (e.g., using item response theory models).
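
The simple sum-score approach can be sketched in a few lines of Python. Everything in this example is hypothetical: the construct name, the item names q1 to q3 and the responses are made up, and returning no score when an item is unanswered is just one possible rule for handling missing responses.

```python
# Hypothetical construct, item names and responses; none come from the data set.
CONSTRUCT_ITEMS = {"wellbeing": ["q1", "q2", "q3"]}

students = [
    {"id": 1, "q1": 3, "q2": 4, "q3": 2},
    {"id": 2, "q1": 1, "q2": 2, "q3": 2},
    {"id": 3, "q1": 4, "q2": None, "q3": 3},  # one unanswered item
]


def sum_score(student, items):
    """Sum the item scores; return None if any item is missing."""
    scores = [student[item] for item in items]
    if any(score is None for score in scores):
        return None
    return sum(scores)


# Add one new variable per construct to every student record.
for student in students:
    for construct, items in CONSTRUCT_ITEMS.items():
        student[construct] = sum_score(student, items)

print([(s["id"], s["wellbeing"]) for s in students])
```

The new variable can then be summarised like any other variable in the data set.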
24
Constructs in a questionnaire - 5
- In SPSS, calculate sum scores for each construct you identified, for each student.
- You can then use these new variables for further analyses.
- Watch the animated demo on how to compute sum scores: HowToComputeSumScores_demo.swf

25
Outline

Categorical variables (ordinal and nominal)

Continuous variables (interval and ratio)
26
Download from subject website

- Data file from the TIMSS 2003 study for Australia: TIMSS2003AUS.sav
- Student Questionnaire from the TIMSS 2003 study for Australia: T03_Student_8.pdf
27
Categorical data

Nominal - numbers are used only as labels for different objects within a set. For example,
- gender
- idbook (there are 12 different test booklets)

Ordinal - numbers are used to reflect the rank order of objects within a set according to a specific criterion
- bsbgbook (number of books in the home)
- bsbgmfed (mother’s education level)
28
Summary of categorical variables

In general, summary of categorical variables addresses the questions:
- How many categories?
- How many cases are in each category, or what are the proportions of cases in each of the categories?
- If a variable is ordinal, questions regarding trends and association can be considered.

Examples: for data file TIMSS2003AUS.sav, the possible questions could be:
- What are the proportions of female and male students in the study?
- What are the levels of education of parents for the students surveyed?
- Is there an association between levels of education of parents and number of books in the home?
29
Hands-on (1)
- Are there more girls than boys?
- Is there an association between father’s education level and the number of books at home?
- Follow the animated demos:
  - frequency_1_demo
  - frequency_2_demo
  - Explore_1_demo
  - Explore_1_output_demo
30
Hands-on (2)
- Is there a difference between girls and boys in terms of whether they enjoy mathematics (variable bsbmtenj)?
- Follow the animated demos:
  - Crosstab_1_demo
  - Crosstab_1_output_demo

31
Hands-on (3)

- Is there a difference between girls and boys in terms of whether they enjoy science (variable bsbstenj, var 67)?
32
Things to watch out for in comparing frequencies - 1
- Consider whether you should compare raw frequencies or percentages.
- For percentages, make sure the denominator (total) is the appropriate one to use. For example, check whether it is the row total, the column total or the overall total.
- Check the scale of any graph to make sure differences are not exaggerated.
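
The point about denominators can be made concrete with a small cross-tabulation. The counts below are made up, and percentages is a hypothetical helper. It shows that the same cell gives three different percentages depending on which total is used.

```python
from collections import Counter

# Made-up (sex, response) pairs for a question such as "I enjoy mathematics".
pairs = (
    [("girl", "agree")] * 30
    + [("girl", "disagree")] * 20
    + [("boy", "agree")] * 10
    + [("boy", "disagree")] * 40
)
counts = Counter(pairs)


def percentages(counts, row, col):
    """Express one cell as a percentage of its row, its column and the overall total."""
    cell = counts[(row, col)]
    row_total = sum(n for (r, _), n in counts.items() if r == row)
    col_total = sum(n for (_, c), n in counts.items() if c == col)
    overall = sum(counts.values())
    return {
        "row %": 100 * cell / row_total,
        "column %": 100 * cell / col_total,
        "overall %": 100 * cell / overall,
    }


result = percentages(counts, "girl", "agree")
print(result)
```

Here 60% of girls agree (row total), 75% of those who agree are girls (column total), and girls who agree make up 30% of all students (overall total). Which figure is appropriate depends on the question being asked.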

33
Things to watch out for in comparing frequencies –
Raw score or percentage?
34
Things to watch out for in comparing frequencies –
Raw score or percentage?

Percentages are better because there are
many more students speaking the test
language at home than those who do not.
35
Things to watch out for in comparing
frequencies – Check magnitude of scale


The graph on the right shows large differences. But
check the scale on the vertical axis. There are only a
few students. We can’t say there is a great difference.
Beware of visual deception.
36
Continuous data

Interval - numbers reflect both the rank order of objects and the extent of the differences between them (e.g. temperature)

Ratio - scale has an absolute zero and hence a ratio of scores is independent of the units of the scale (e.g. height, weight, age)
37
Summary of continuous variables
Example questions
1. What is the average score of the students surveyed? (mean)
2. What is the middle score? (median)
3. Which is the most frequent score? (mode)
4. What is the highest score? (max)
5. What is the lowest score? (min)
6. What is the range of students’ scores? (range)
7. To what extent are the scores close to the mean? (variance and standard deviation)
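
Each of these questions maps onto one summary statistic. As a sketch, the made-up scores below are summarised with Python's standard statistics module. The scores are not from the TIMSS data file.

```python
import statistics

scores = [12, 15, 15, 18, 20, 22, 31]  # made-up scores

summary = {
    "mean": statistics.mean(scores),          # question 1
    "median": statistics.median(scores),      # question 2
    "mode": statistics.mode(scores),          # question 3
    "max": max(scores),                       # question 4
    "min": min(scores),                       # question 5
    "range": max(scores) - min(scores),       # question 6
    "variance": statistics.variance(scores),  # question 7 (divides by n - 1)
    "sd": statistics.stdev(scores),           # question 7
}

for name, value in summary.items():
    print(name, value)
```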
38
Mean and Median

Mean (average, expected value)
- Sum of observations / number of observations

Median
- 50% of subjects below and 50% of subjects above
39
Variance and Standard deviation

$$\text{variance} = \frac{\sum_{i} (x_i - \mu)^2}{n - 1}$$

where $\mu$ is the mean and $n$ is the number of observations.

$$\text{standard deviation} = \sqrt{\text{variance}}$$
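
The variance is the sum of squared deviations from the mean divided by n - 1, and the standard deviation is its square root. A minimal sketch of that calculation, step by step, on made-up scores:

```python
import math


def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    squared_deviations = [(x - mean) ** 2 for x in xs]
    return sum(squared_deviations) / (n - 1)


def standard_deviation(xs):
    """Square root of the variance."""
    return math.sqrt(sample_variance(xs))


scores = [2, 4, 4, 6, 9]  # made-up scores with mean 5
print(sample_variance(scores))     # squared deviations 9, 1, 1, 1, 16 sum to 28
print(standard_deviation(scores))
```

For these five scores the squared deviations sum to 28, so the variance is 28 / 4 = 7 and the standard deviation is the square root of 7.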
40
Normal Distribution

Many variables have a distribution shaped
like a bell curve.
41
Example descriptive statistics
- Variable 154 (bsmmat01) is an estimate of a student’s mathematics achievement.
- Follow the animated demo:
  - descriptive_1_demo
42
Histogram of continuous variable
- Frequency analysis and bar charts may fail for a continuous variable because there are too many categories (almost every score is different).
- Use a histogram instead.
- Variable 154 (bsmmat01) is an estimate of a student’s mathematics achievement.
- Follow the animated demo:
  - histogram_1_demo
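
A histogram avoids the too-many-categories problem by counting scores within intervals rather than counting each distinct value. The sketch below illustrates the idea with made-up scores. The starting point (400) and interval width (50) are arbitrary choices for the example, and histogram_counts is a hypothetical helper.

```python
from collections import Counter


def histogram_counts(values, start, width):
    """Count the values falling in each interval [lower, lower + width)."""
    bins = Counter((value - start) // width for value in values)
    return {start + int(k) * width: bins[k] for k in sorted(bins)}


# Made-up achievement scores; every value is distinct, so a plain
# frequency table would have one row per student.
scores = [412.5, 433.1, 447.8, 451.0, 468.9, 472.3, 479.6, 503.2, 515.7, 561.4]
counts = histogram_counts(scores, start=400, width=50)

for lower, n in counts.items():
    print(f"{lower}-{lower + 50}: {'#' * n}")
```

Changing the interval width changes the shape of the histogram, so it is worth trying more than one width.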
43
Compare histograms for groups
- Compare mathematics achievement distributions between groups based on father’s education level.
- Follow the animated demo:
  - histogram_2_demo
44
Box-Plots

Box-plots are graphical representations of the data in a five-number summary (minimum, lower quartile, median, upper quartile, maximum), with the addition of ‘cutoffs’ or ‘fences’ for the identification of possible outliers. Individual data points beyond the fences are plotted separately if they occur.
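
The numbers behind a box-plot can be computed directly. This sketch uses made-up scores and assumes the common convention of placing the fences 1.5 times the interquartile range beyond the quartiles. Software packages differ in how they compute quartiles and fences, so the output of a given program may not match exactly.

```python
import statistics

scores = [38, 42, 45, 47, 50, 52, 55, 58, 95]  # made-up scores

# Quartiles by linear interpolation over the data, end points included.
q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
summary = (min(scores), q1, median, q3, max(scores))

# Assumed convention: fences at 1.5 * IQR beyond the quartiles.
iqr = q3 - q1
fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
outliers = [x for x in scores if x < fences[0] or x > fences[1]]

print("five-number summary:", summary)
print("fences:", fences)
print("possible outliers:", outliers)
```

Here the score of 95 lies beyond the upper fence of 70, so it would be plotted as an individual point.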
45
Box plot for mathematics achievement

- Follow the animated demos:
  - boxplot_1_demo
  - boxplot_2_demo
46
Output of Box-plot of mathematics
scores
47
Output of Box-plot of mathematics
scores by father’s education level
48
Parametric and Non-parametric

Mean and Median
- Mean: average
- Median: the middle value; the score at the 50th percentile.
49
Mean and Median
- If the distribution of scores is symmetrical, the mean and median will be close.
- If the distribution is skewed, then the mean and median will be quite different.
- The mean is sensitive to outliers.
- The median is not sensitive to outliers.
- Example: income distribution

50
Examples of income distribution
- What will be the mean?
- What will be the median?

51
Robust statistics
- The mean will be much higher than the median, because there are four people with very high salaries.
- The median will not shift if the four highest salaries are in the 150K range instead of the 280K range, but the mean will change by a great deal.
- The median is said to be “robust”.
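
The robustness of the median can be checked numerically. The salaries below (in thousands) are made up to mimic the situation described: most people earn modest salaries and four people earn very high ones.

```python
import statistics

typical = [30, 32, 35, 38, 40, 42, 45]  # made-up salaries, in thousands

high = typical + [280] * 4  # four very high salaries
lower = typical + [150] * 4  # same people, less extreme salaries

mean_high, median_high = statistics.mean(high), statistics.median(high)
mean_lower, median_lower = statistics.mean(lower), statistics.median(lower)

print("four salaries at 280K:", round(mean_high, 1), median_high)
print("four salaries at 150K:", round(mean_lower, 1), median_lower)
```

The median is 42 in both cases, while the mean drops from about 125.6 to about 78.4.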

52
Percentile Rank

The percentile rank of a raw score s is the percentage of people whose scores are less than or equal to s. Example:

Raw      12    14    28    34    47    50
Rank     1     2     3     4     5     6
% Rank   1/6   2/6   3/6   4/6   5/6   6/6
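
The definition translates directly into code. This sketch applies it to the six raw scores in the example. percentile_rank is a hypothetical helper name.

```python
def percentile_rank(score, scores):
    """Percentage of scores that are less than or equal to the given score."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)


raw = [12, 14, 28, 34, 47, 50]
ranks = {s: percentile_rank(s, raw) for s in raw}

for score, rank in ranks.items():
    print(score, round(rank, 1))
```

The raw score 28 is third of six, so its percentile rank is 3/6, or 50%.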
53
Advantages and disadvantages of percentile ranks
- Simple to communicate.
- More “robust” (not affected by extreme scores in the distribution).
- Raw scores turned into ranks:
  - Reduce raw scores to ordinal measurement.
  - Percentile ranks have a uniform distribution, not a normal one.
- Percentile differences in the middle of the score range can exaggerate small differences.

54
Compute percentile ranks in SPSS
- Compute percentile ranks using the mathematics achievement score.
- Follow the animated demo:
  - percentile_1_demo

55
Histogram of percentile ranks
- Plot a histogram of the percentile ranks. What do you see?
- Plot a scatter graph of mathematics achievement (variable 154) against the newly created variable of percentile ranks.
- How do you interpret the graph?

56