Computer Use Classes Accompanying Statistics Lectures 2007-2008

Coordinator: Prof. Alan Pickering
Tutors: Ian Tharp; Elizabeth Ward

Weeks 1-5 Autumn Term: Revision of Basic Concepts

Time and venue: Monday 4-5.15 pm in WB304 (Psychology Department Computer Classroom) for postgraduate students and Tuesday 4-5 pm in same venue for undergraduate students

For students on the following courses:
Computer Use (PS71021A) MSc Research Methods in Psychology
Computer Use (PS81021A) Research Training for 1st Year PhD students
Advanced Statistics (PS71015A) MSc Occupational Psychology
Statistics (PS71042A) MSc Cognitive and Clinical Neuroscience

Optional for:
Multivariate Statistical Methods in Psychology (PS53011B) BSc Psychology

WEEK 1: Descriptive Statistics

Learning Outcomes:
After the session students should be able to:
1. Use SPSS to generate artificial datasets (with specific statistical properties) using the COMPUTE command.
2. Use SPSS HELP to find out how specific SPSS COMPUTE functions are used.
3. Use SPSS to report descriptive statistics (i.e., measures of central tendency and dispersion) for a variable.
4. Use SPSS to plot histograms of variables.
5. Use SPSS to test whether distributions of variables are normal.
6. Use SPSS to look at frequency distribution tables for a variable and to inspect extreme values in a variable’s distribution.

Dataset To be Used:
Via Goldsmiths computer network: J:\psycholo\APstats\comp_classes\dicedata.sav
or via web: http://homepages.gold.ac.uk/aphome/basestat.html

Specific Tasks and Questions:

(i) Use SPSS to generate a dataset of 500 observations which could represent the kind of data one would obtain by rolling a fair dice 500 times. You will create a variable called dicescor containing integer values (i.e. whole numbers) from the set {1,2,3,4,5,6}, with each integer value occurring approximately equally often (a so-called random uniform distribution).
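
For orientation, the kind of dataset described in task (i) can be sketched outside SPSS. The following is a minimal Python sketch, not SPSS syntax and not the method used in this class. It assumes Python's standard random module as a stand-in for the SPSS random number generator, and the names roll_dice, n_rolls and seed are hypothetical. The scores it produces will not match the values SPSS generates, whatever seed is used.

```python
import random
from collections import Counter


def roll_dice(n_rolls=500, seed=1000):
    """Return n_rolls simulated scores, each an integer from 1 to 6."""
    rng = random.Random(seed)  # seeded generator, so a rerun repeats the scores
    # randint(1, 6) includes both endpoints, so every face is equally likely.
    return [rng.randint(1, 6) for _ in range(n_rolls)]


dicescor = roll_dice()
frequencies = Counter(dicescor)  # how often each face occurred

# Print a small frequency table: face, then count.
for face in range(1, 7):
    print(face, frequencies[face])
```

With 500 rolls, each count should fall somewhere near 500/6, but the counts will not be exactly equal.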

The COMPUTE function:
The COMPUTE function (accessed through the Transform dropdown menu) can be used to create data in SPSS datasets. Usually, this function would be employed to create a “derived” score from variables already manually entered into the dataset. However, you can also create simulated data from scratch, as we will do today. Note that the COMPUTE function will not work without some data in the dataset already (because it creates scores for the number of rows -- subjects -- that are already in the dataset). Try the COMPUTE function on an empty dataset to see what happens.

Using COMPUTE, how could you generate a variable called group for 40 subjects, with a value of 1 for each subject?

Answer:

To generate “dice” scores:
Load the prepared dataset called dicedata.sav into SPSS. This dataset already contains a dummy variable with 500 observations, which means that any new variable created by COMPUTE will have 500 observations. Use the name dicescor for the dice score variable you will create.

To generate random variables (RVs):
You can generate random variables in SPSS via the COMPUTE command. First set the value of the random number seed (or “starting point”) to 1000 using the RANDOM NUMBER GENERATORS command from the Transform menu. SPSS has a number of random variable functions available to the COMPUTE command, and they all look like this: RV.xxxxx. Because the expected distribution of dice scores is uniform, the function you will need is RV.UNIFORM. Try out using the RV.UNIFORM function with the COMPUTE command to see how the function works and what kind of values it generates. To get integer values in the range 1-6 inclusive you will need to carry out further COMPUTE operations. The arithmetic function TRUNC (truncate values) will be handy.

Why? What is the role of the random number seed?

Answers:

Using HELP to find out how to use COMPUTE functions:
You can find out about all aspects of SPSS through the TOPICS command on the Help dropdown menu.
You need to select the “Index” tab on the window and type in the item about which you need information. For the various functions available for the COMPUTE command, type “functions” and select the appropriate type.

(ii) For the variable dicescor: Use the “Frequencies” option of the SPSS DESCRIPTIVE STATISTICS command from the Analyse dropdown menu to look at the frequencies of dice scores. Record the frequencies in the box below.

Dice score    1    2    3    4    5    6
Frequency

(iii) For the variable dicescor:
Write down what you expect the mean to be:
Write down what you expect the median to be:
Write down what you expect the mode to be:

(iv) Use the SPSS DESCRIPTIVE STATISTICS command from the Analyse dropdown menu to calculate descriptive statistics (measures of central tendency and dispersion).

Hint: The “Descriptives” option of the DESCRIPTIVE STATISTICS command on the Analyse dropdown menu will not enable you to find the median value or mode of a variable; one of the other options will give you everything you need.

Write down the following values that you obtained for dicescor:
Number of Observations (N):
Mean:
Median:
Mode:
Variance (Var):
Standard Deviation (S.D.):
Standard Error (S.E.) of mean:

What is the expected value for the mean of this sample?
[As a check, the values for the random sample of 500 observations that you generated should be close to the expected values.]

(v) Create a random normal variable called randnvar with a standard normal distribution (mean = 0; s.d. = 1). You can do this with the COMPUTE command and the RV.NORMAL function.

(vi) Use the HISTOGRAM command on the Graphs menu to plot a histogram of randnvar. Are the descriptive statistics for this variable close to the intended values? You can get another interesting histogram from the “Frequencies” option of the DESCRIPTIVE STATISTICS command: click on the “Charts” button and select the histogram with normal curve.

(vii) Test the normality of randnvar’s distribution.
To do this you can use the “Explore” option of the DESCRIPTIVE STATISTICS command on the Analyse menu. Click on the “Plots” button. The “Boxplots” and “Stem-and-leaf” options should be selected by default. Also select the “Normality plots with tests” option. You should also click on the “Statistics” button and select the “Outliers” option to look at the 5 largest and 5 smallest values of randnvar.

(viii) Use the COMPUTE command to find out what percentage of the sample of randnvar lies below -2.0 and what percentage lies above +2.0. To do this, you will need to create 2 new variables: one which has a value of 1 when randnvar lies below -2.0 and 0 otherwise, and one which has a value of 1 when randnvar lies above +2.0 and 0 otherwise. What are the percentages and their expected values?

Below -2.0:    Actual %:          Expected %:
Above +2.0:    Actual %:          Expected %:

(ix) Use SPSS to find the 95th percentile of the randnvar variable. Use the “Frequencies” option of the DESCRIPTIVE STATISTICS command. What is the value of this statistic? Do you understand why SPSS reports the value it does?
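
The 0/1 indicator approach in task (viii) works because the mean of a 0/1 variable is the proportion of cases scored 1, and the expected values in tasks (viii) and (ix) come from the standard normal distribution. A minimal Python sketch of that reasoning follows. It is not SPSS syntax, it assumes Python's standard random and statistics modules (Python 3.8 or later) as stand-ins for SPSS, and the names percent_flagged, below and above are hypothetical. The simulated values will differ from those SPSS produces.

```python
import random
from statistics import NormalDist, mean, quantiles

rng = random.Random(1000)  # seeded so that a rerun gives the same sample
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# 500 simulated draws standing in for randnvar.
randnvar = [rng.gauss(0.0, 1.0) for _ in range(500)]

# Indicator variables: 1 when the condition holds, 0 otherwise.
below = [1 if x < -2.0 else 0 for x in randnvar]
above = [1 if x > 2.0 else 0 for x in randnvar]


def percent_flagged(indicator):
    """The mean of a 0/1 variable, times 100, is the percentage of 1s."""
    return 100.0 * mean(indicator)


actual_below = percent_flagged(below)
actual_above = percent_flagged(above)

# Expected percentages from the standard normal cumulative distribution.
expected_below = 100.0 * standard_normal.cdf(-2.0)
expected_above = 100.0 * (1.0 - standard_normal.cdf(2.0))

# 95th percentile: theoretical value and one sample estimate.
expected_p95 = standard_normal.inv_cdf(0.95)
sample_p95 = quantiles(randnvar, n=100)[94]  # 95th of the 99 cut points

print(actual_below, expected_below)
print(actual_above, expected_above)
print(sample_p95, expected_p95)
```

Percentile definitions differ in how they interpolate between neighbouring observations, so a sample estimate from one program need not match the value another program reports for the same data.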