Computer Use for Statistics - Goldsmiths Homepages Server

Computer Use Classes Accompanying Statistics Lectures 2007-2008:
Coordinator: Prof. Alan Pickering; Tutors: Ian Tharp; Elizabeth Ward
Weeks 1-5, Autumn Term: Revision of Basic Concepts
Time and venue: Monday 4-5.15 pm in WB304 (Psychology Department Computer Classroom) for
postgraduate students and Tuesday 4-5pm in same venue for undergraduate students
For students on the following courses:
Computer Use (PS71021A) MSc Research Methods in Psychology
Computer Use (PS81021A) Research Training for 1st Year PhD students
Advanced Statistics (PS71015A) MSc Occupational Psychology
Statistics (PS71042A) MSc Cognitive and Clinical Neuroscience
Optional for:
Multivariate Statistical Methods in Psychology (PS53011B) BSc Psychology
WEEK 1: Descriptive Statistics
Learning Outcomes:
After the session students should be able to:
1. Use SPSS to generate artificial datasets (with specific statistical properties) using the COMPUTE command.
2. Use SPSS HELP to find out how specific SPSS COMPUTE functions are used.
3. Use SPSS to report descriptive statistics (i.e., measures of central tendency and dispersion) for a variable.
4. Use SPSS to plot histograms of variables.
5. Use SPSS to test whether distributions of variables are normal.
6. Use SPSS to look at frequency distribution tables for a variable and to inspect extreme values in a variable’s distribution.
Dataset to Be Used:
Via Goldsmiths computer network: J:\psycholo\APstats\comp_classes\dicedata.sav
or via web: http://homepages.gold.ac.uk/aphome/basestat.html
Specific Tasks and Questions:
(i) Use SPSS to generate a dataset of 500 observations which could represent the kind of data
one would obtain by rolling a fair die 500 times. You will create a variable called dicescor
containing integer values (i.e., whole numbers) from the set {1,2,3,4,5,6}, with each integer
value occurring approximately equally often (a so-called discrete uniform distribution).
The COMPUTE function: The COMPUTE function (accessed through the Transform
dropdown menu) can be used to create data in SPSS datasets. Usually, this function would be
employed to create a “derived” score from variables already manually entered into the dataset.
However, you can also create simulated data from scratch, as we will do today. Note that the
COMPUTE function will not work unless the dataset already contains some data, because it creates
one score for each row (i.e., each subject) already in the dataset. Try the COMPUTE
function on an empty dataset to see what happens.
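To make the row-by-row behaviour concrete, here is a minimal Python sketch (not SPSS syntax) that models a dataset as a list of rows and uses a hypothetical `compute` helper to add a variable to each existing row. With no rows there is nothing to assign values to, which mirrors why COMPUTE needs some data in the dataset first.

```python
def compute(rows, name, expression):
    """Hypothetical helper: add variable `name` to every existing row.

    Values are produced one row at a time, so an empty dataset
    receives no new values at all.
    """
    for row in rows:
        row[name] = expression(row)
    return rows


empty = compute([], "newvar", lambda row: 1)
print(empty)  # prints [] (there were no rows to compute into)

three_subjects = [{"id": 1}, {"id": 2}, {"id": 3}]
compute(three_subjects, "double_id", lambda row: row["id"] * 2)
print(three_subjects)
```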
Using COMPUTE, how could you generate a variable called group for 40 subjects, with a value of
1 for each subject?
Answer:
To generate “dice” scores: Load the prepared dataset called dicedata.sav into SPSS.
This dataset already contains a dummy variable with 500 observations, which means that any new
variable created by COMPUTE will have 500 observations. Use the name dicescor for the dice
score variable you will create.
To generate random variables (RVs): You can generate random variables in SPSS via the
COMPUTE command. First set the value of the random number seed (or “starting point”) to 1000
using the RANDOM NUMBER GENERATORS command from the Transform menu. SPSS has
a number of random variable functions available to the COMPUTE command, all named in the form
RV.xxxxx. Because the expected distribution of dice scores is uniform, the function you
will need is RV.UNIFORM. Try using the RV.UNIFORM function with the COMPUTE
command to see how the function works and what kind of values it generates.
To get integer values in the range 1-6 inclusive, you will need to carry out further COMPUTE
operations. The arithmetic function TRUNC (which truncates values) will be handy. Why is it
useful? What is the role of the random number seed?
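As an illustration outside SPSS, the following Python sketch mimics a uniform-then-truncate construction with the standard `random` module. It assumes that drawing uniformly on [1, 7) and truncating is the intended approach, and it reuses the seed 1000 from the instructions above. Python's generator is not SPSS's, so the scores will differ from those SPSS produces even with the same seed; `dice_scores` is a hypothetical name.

```python
import math
import random


def dice_scores(n, seed=1000):
    """Simulate n rolls of a fair die by truncating uniform values on [1, 7)."""
    rng = random.Random(seed)  # fixed seed: the same "random" data on every run
    scores = []
    for _ in range(n):
        u = 1 + 6 * rng.random()      # continuous uniform on [1, 7)
        scores.append(math.trunc(u))  # drop the fractional part, giving 1..6
    return scores


dicescor = dice_scores(500)
print(dicescor[:10])
print(dicescor == dice_scores(500))  # True: same seed, same sequence
```

Re-running with a different seed gives a different, equally valid sample; fixing the seed is what makes a simulated dataset reproducible.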
Answers:
Using HELP to find out how to use COMPUTE functions: You can find out about all aspects of
SPSS through the TOPICS command on the Help dropdown menu. You need to select the “Index”
tab on the window and type in the item about which you need information. For the various
functions available for the COMPUTE command, type “functions” and select the appropriate type.
(ii) For the variable dicescor:
Use the “Frequencies” option of the SPSS DESCRIPTIVE STATISTICS command from the
Analyse dropdown menu to look at the frequencies of dice scores. Record the frequencies in
the box below.
Dice score:   1     2     3     4     5     6
Frequency:   ___   ___   ___   ___   ___   ___
(iii) For the variable dicescor:
What do you expect the mean to be?
What do you expect the median to be?
What do you expect the mode to be?
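Outside SPSS, the frequency table in (ii) amounts to counting how often each score occurs. A minimal Python sketch, assuming the same uniform-then-truncate construction and an assumed seed of 1000 (Python's generator will not reproduce SPSS's values):

```python
import math
import random
from collections import Counter

rng = random.Random(1000)  # assumed seed, mirroring the value used above
dicescor = [math.trunc(1 + 6 * rng.random()) for _ in range(500)]

freq = Counter(dicescor)  # score -> number of occurrences
for score in range(1, 7):
    count = freq[score]
    print(f"{score}: {count:4d}  ({100 * count / len(dicescor):.1f}%)")
```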
(iv) Use the SPSS DESCRIPTIVE STATISTICS command from the Analyse dropdown menu to
calculate descriptive statistics (measures of central tendency and dispersion).
Hint: The “Descriptives” option of the DESCRIPTIVE STATISTICS command on the
Analyse dropdown menu will not enable you to find the median value or mode of a variable;
one of the other options will give you everything you need.
Write down the following values that you obtained for dicescor:
Number of Observations (N):
Mean:
Median:
Mode:
Variance (Var):
Standard Deviation (S.D.):
Standard Error (S.E.) of mean:
What is the expected value for the mean of this sample?
[As a check, the values for the random sample of 500 observations that you generated
should be close to the expected values.]
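For reference, the quantities listed above can be computed with Python's `statistics` module. This is a sketch, not SPSS output: it assumes the sample (n - 1) formulas for variance and standard deviation, takes S.E. of the mean as S.D. / sqrt(N), and reports the smallest value when several modes exist (as SPSS does). `describe` is a hypothetical helper name.

```python
import math
import statistics


def describe(values):
    """Return N, central tendency and dispersion for a list of numbers."""
    n = len(values)
    sd = statistics.stdev(values)  # sample S.D. (n - 1 denominator)
    return {
        "N": n,
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": min(statistics.multimode(values)),  # smallest mode if tied
        "variance": statistics.variance(values),    # sample variance (n - 1)
        "sd": sd,
        "se_mean": sd / math.sqrt(n),               # standard error of the mean
    }


print(describe([1, 2, 2, 3, 6]))
```

For the toy list [1, 2, 2, 3, 6] this gives N = 5, mean = 2.8, median = 2, mode = 2 and variance = 3.7.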
(v) Create a random normal variable called randnvar with a standard normal distribution (mean
= 0; s.d. = 1). You can do this with the COMPUTE command and the RV.NORMAL function.
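A Python counterpart of drawing from RV.NORMAL with mean 0 and s.d. 1, as a sketch: `random.Random.gauss(mu, sigma)` draws normal deviates. The seed 1000 is an assumption carried over from the dice task, and the values will not match those SPSS generates.

```python
import random
import statistics

rng = random.Random(1000)  # assumed seed
randnvar = [rng.gauss(0.0, 1.0) for _ in range(500)]  # 500 standard normal deviates

print(f"mean = {statistics.mean(randnvar):.3f}, sd = {statistics.stdev(randnvar):.3f}")
```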
(vi) Use the HISTOGRAM command on the Graphs menu to plot a histogram of randnvar. Are
the descriptive statistics for this variable close to the intended values? You can get another
interesting histogram from the “Frequencies” option of the DESCRIPTIVE STATISTICS
command. Click on the “Charts” button and select the option to plot a histogram with a normal
curve.
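The idea behind a histogram with a normal curve is to compare the observed count in each bin with the count a normal distribution would predict. The Python sketch below prints a text histogram of a simulated randnvar (one `*` per two cases) next to the expected count under a standard normal distribution; the bin width of 0.5 and the seed are arbitrary choices for illustration.

```python
import math
import random
from collections import Counter
from statistics import NormalDist

rng = random.Random(1000)  # assumed seed
randnvar = [rng.gauss(0.0, 1.0) for _ in range(500)]

width = 0.5
# Bin k covers the interval [k * width, (k + 1) * width).
bins = Counter(math.floor(x / width) for x in randnvar)
std_normal = NormalDist(0.0, 1.0)

for k in range(min(bins), max(bins) + 1):
    lo, hi = k * width, (k + 1) * width
    # Expected count = N * probability that a standard normal value falls in the bin.
    expected = len(randnvar) * (std_normal.cdf(hi) - std_normal.cdf(lo))
    bar = "*" * (bins[k] // 2)
    print(f"[{lo:5.1f}, {hi:5.1f})  {bar:<50s} observed {bins[k]:3d}, expected ~{expected:5.1f}")
```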
(vii) Test the normality of randnvar’s distribution. To do this you can use the “Explore” option of
the DESCRIPTIVE STATISTICS command on the Analyse menu. Click on the “Plots”
button. The “Boxplots” and “Stem-and-leaf” options should be selected by default. Also
select the “Normality plots with tests” option. You also should click on the “Statistics”
button and select the “Outliers” option to look at the 5 largest and 5 smallest values of
randnvar.
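Only the “Outliers” part of this step is easy to mimic without extra libraries: the Python sketch below lists the five largest and five smallest values together with their case (row) numbers. It does not reproduce the normality tests, and the simulated data (seed assumed) will differ from those in SPSS.

```python
import random

rng = random.Random(1000)  # assumed seed
randnvar = [rng.gauss(0.0, 1.0) for _ in range(500)]

# Pair each value with its 1-based case number, then sort by value.
cases = sorted(enumerate(randnvar, start=1), key=lambda case: case[1])
smallest = cases[:5]         # lowest values, most extreme first
largest = cases[-5:][::-1]   # highest values, most extreme first

print("Highest:", [(case, round(v, 3)) for case, v in largest])
print("Lowest: ", [(case, round(v, 3)) for case, v in smallest])
```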
(viii) Use the COMPUTE command to find out what percentage of the sample of randnvar lies
below -2.0 and what percentage lies above +2.0. To do this, you will need to create two new
variables which have a value of 1 when randnvar lies below -2.0 (or above +2.0) and have a
value of 0 otherwise.
What are the percentages and their expected values?
Below -2.0:   Actual % ______   Expected % ______
Above +2.0:   Actual % ______   Expected % ______
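The indicator approach can be sketched in Python: the mean of each 0/1 variable is the proportion of cases meeting the condition, and the expected percentages come from the standard normal cumulative distribution function via `statistics.NormalDist`. The data are simulated with an assumed seed, so the actual percentages will not match those from SPSS.

```python
import random
from statistics import NormalDist

rng = random.Random(1000)  # assumed seed
randnvar = [rng.gauss(0.0, 1.0) for _ in range(500)]

# Indicator variables: 1 if the condition holds for a case, 0 otherwise.
below = [1 if x < -2.0 else 0 for x in randnvar]
above = [1 if x > 2.0 else 0 for x in randnvar]

# The mean of a 0/1 variable is the proportion of 1s.
actual_below = 100 * sum(below) / len(below)
actual_above = 100 * sum(above) / len(above)

std_normal = NormalDist()  # mean 0, s.d. 1
expected_below = 100 * std_normal.cdf(-2.0)
expected_above = 100 * (1 - std_normal.cdf(2.0))

print(f"Below -2.0: actual {actual_below:.1f}%, expected {expected_below:.2f}%")
print(f"Above +2.0: actual {actual_above:.1f}%, expected {expected_above:.2f}%")
```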
(ix) Use SPSS to find the 95th percentile of the randnvar variable, using the Frequencies option of
the DESCRIPTIVE STATISTICS command.
What is the value of this statistic?
Do you understand why SPSS reports the value it does?
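One common percentile definition places the pth percentile at rank (n + 1)p in the sorted data and interpolates between the neighbouring values. We assume this matches the default used by SPSS Frequencies, but the SPSS documentation should be checked. The Python sketch below implements that rule (`percentile_n_plus_1` is a hypothetical name) and cross-checks it against `statistics.quantiles`, whose default “exclusive” method uses the same rule. For a standard normal population, the 95th percentile is about 1.645 (`NormalDist().inv_cdf(0.95)`).

```python
import random
import statistics


def percentile_n_plus_1(values, p):
    """Percentile at rank (n + 1) * p, interpolating between neighbours."""
    x = sorted(values)
    n = len(x)
    pos = (n + 1) * p   # 1-based rank, e.g. 501 * 0.95 = 475.95 when n = 500
    k = int(pos)        # rank of the lower neighbour
    frac = pos - k      # weight given to the upper neighbour
    if k < 1:
        return x[0]
    if k >= n:
        return x[-1]
    return x[k - 1] + frac * (x[k] - x[k - 1])


rng = random.Random(1000)  # assumed seed
randnvar = [rng.gauss(0.0, 1.0) for _ in range(500)]

p95 = percentile_n_plus_1(randnvar, 0.95)
print(f"95th percentile: {p95:.4f}")
# Cross-check: the 95th of the 99 cut points from statistics.quantiles.
print(f"statistics.quantiles: {statistics.quantiles(randnvar, n=100)[94]:.4f}")
```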