National Food Policy Capacity Strengthening Programme

Topic 4
STATA and SPSS: Introductory User Guide

Training Workshop on Analysis of Data for Measuring Food Availability, Access and Nutritional Status
14-26 January 2012
Bangladesh Academy for Rural Development, Comilla, Bangladesh

R. Srinivasulu
Centre for the Study of Regional Development (CSRD)
Jawaharlal Nehru University (JNU)
New Delhi 110067, India

Topic 4: Introduction to SPSS and STATA Software Packages

The objective of Topic 4 is to outline the basic characteristics of the SPSS and STATA software packages, and their relative advantages and disadvantages.

4.1 Basic Design Features of the SPSS Software Package

SPSS (Statistical Package for the Social Sciences) is a statistical analysis and data management software package. It can read data from most common file formats. SPSS can generate tabulated reports, charts, plots of distributions and trends, and descriptive statistics, and it can conduct complex statistical analyses. “SPSS originally stood for “Statistical Package for the Social Sciences”, but the name has since been changed to reflect the marketing of SPSS outside the academic community” (Harvard-MIT Data Center).

SPSS is widely used by researchers doing quantitative research in the social sciences because it is easy to use and can be a good starting point for learning more advanced statistical packages. Researchers can also write and run commands in the SPSS syntax editor. The following sections concentrate on the menu system, the types of windows and data manipulation.

In the new version, SPSS 17.0, the syntax editor has been completely redesigned with features such as auto-completion, colour coding, bookmarks, and breakpoints. Auto-completion provides you with a list of valid command names, subcommands, and keywords, so you will spend less time referring to syntax charts.
Colour coding allows you to quickly spot unrecognized terms as well as some common syntactical errors. Bookmarks allow you to quickly navigate large command syntax files. Breakpoints allow you to stop execution at specified points so you can inspect data or output before proceeding.

Most SPSS users prefer its graphical Windows interface, that is, pointing with the mouse and clicking on the options they want. Users who want the greater control that comes from typing commands tend to use other statistical packages, e.g. STATA. Nonetheless, SPSS also lets you type commands, and you can switch between the command-based and the point-and-click methods. While this module focuses on the latter, the command code is mentioned briefly as well.

4.1.1 Why Do We Use SPSS?

There are several reasons to use SPSS:

i) It is easy to handle and widely used for descriptive statistics and basic statistical analysis.
ii) It can be used either through a Windows point-and-click approach or through syntax (i.e., writing out SPSS commands). Each has its own advantages, and the user can switch between the approaches.
iii) Many widely used social science data sets come in SPSS format, which significantly reduces the workload of transferring data into SPSS.
iv) SPSS is friendlier for performing simple statistical analysis.
v) SPSS is reasonably strong on ANOVA-related procedures.
vi) SPSS is the easiest package to learn.

Overall, SPSS is a friendly package for novice users, but has important limitations for experts in the field of econometrics.

4.1.2 Important Limitations

The Harvard-MIT Data Center identifies two important limitations.

Firstly, SPSS users have less control over statistical output than users of other packages, for example STATA.
For novice users, this hardly causes a problem, but once a researcher wants greater control over the equations or the output, she or he will need to either choose another package or learn techniques for working around SPSS’s limitations.

Secondly, SPSS has problems with certain types of data manipulation, and it has some built-in quirks that seem to reflect its early creation. The best-known limitation is its weak lag functions, that is, how it transforms data across cases. For new users working with standard data sets, this is rarely a problem. However, once a researcher begins wanting to significantly alter data sets, he or she will have to either learn a new package or develop greater skill at manipulating SPSS.

4.1.3 The Main SPSS Window

Six different windows can be opened when using SPSS: i) Data Editor, ii) Output Navigator, iii) Pivot Table Editor, iv) Chart Editor, v) Text Output Editor and vi) Syntax Editor. Each is described below. The main window also contains toolbars, a collection of menus and a status bar. If one of the sub-windows becomes larger than the main window, or if it shifts outside the area of the main window, the main window will develop scroll bars (for details see Babu and Sanyal, 2009).

(i). Data Editor

The Data Editor is a spreadsheet in which you modify your data. Each row corresponds to a case while each column represents a variable. The title bar displays the name of the open data file, or "Untitled" if the file has not yet been saved. This window opens automatically when SPSS is started. It contains 11 menus: File, Edit, View, Data, Transform, Analyze, Graphs, Utilities, Add-ons, Window and Help.

a. Sample File

The dataset “MCG_hhexpenditure_0980608.sav” can be found in the IFPRI dataset. To open it in SPSS, follow the steps below.

b. Opening a Data File

From the menus, choose File\Open\Data…; alternatively, use the Open File button on the toolbar. A dialog box for opening files is displayed. By default, SPSS Statistics data files (.sav extension) are listed. Here, we use the file MCG_hhexpenditure_0980608.sav. Once the file has been opened through the dialog box, it is displayed in the Data Editor. If you place the mouse cursor on a variable name (the column heading), a more descriptive variable label is displayed (if a label has been defined for that variable). To see value labels, choose “Value Labels” from the View menu; descriptive value labels are then displayed, making the responses easier to interpret.

(ii). Output Navigator or Viewer

The Output Navigator window displays the statistical results, tables, and charts from the analysis you performed. It opens automatically when you run a procedure that generates output. In the Output Navigator window, you can edit, move, delete and copy your results in a Windows Explorer-like environment.

The Analyze menu contains a list of general reporting and statistical analysis categories. To create a simple frequency table (table of counts), click Analyze\Descriptive Statistics\Frequencies…; the Frequencies dialog box is then displayed. An icon next to each variable provides information about data type and level of measurement. If a variable label or name appears truncated in the list, the complete label/name is displayed when the cursor is positioned over it, as for the variable Category of household [categ_96]. The variable name categ_96 is displayed in square brackets after the descriptive variable label.
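
To make concrete what a frequency table (table of counts) contains, here is a minimal sketch in Python rather than SPSS syntax. It is only an illustration: the function name frequency_table is hypothetical, and the category codes are invented rather than taken from the workshop dataset.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count, percent) rows, sorted by value."""
    counts = Counter(values)
    total = sum(counts.values())
    return [
        (value, n, round(100.0 * n / total, 1))
        for value, n in sorted(counts.items())
    ]


# Hypothetical household category codes (invented for illustration only).
categ_96 = [1, 2, 2, 3, 1, 2, 3, 3, 3, 1]

print(f"{'Category':>8} {'Count':>6} {'Percent':>8}")
for value, n, pct in frequency_table(categ_96):
    print(f"{value:>8} {n:>6} {pct:>8.1f}")
```

Each row gives one distinct value of the variable, the number of cases with that value, and its share of all cases, which is the core of what the Frequencies procedure reports.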
In the dialog box, you choose the variables that you want to analyze from the source list on the left and drag and drop them into the Variable(s) list on the right. The OK button, which runs the analysis, is disabled until at least one variable is placed in the Variable(s) list. You can also obtain additional information by right-clicking on any variable name in the list (for details see the SPSS 17.0 Brief Guide). The detailed analysis will be carried out in the hands-on exercise classes.

(iii). Pivot Table Editor

Output displayed in pivot tables can be modified in many ways with the Pivot Table Editor. You can edit text, swap data in rows and columns, add colour, create multidimensional tables, and selectively hide and show results.

(iv). Chart Editor

You can modify and save high-resolution charts and plots by invoking the Chart Editor for a chart (by double-clicking it) in an Output Navigator window. You can change the colours, select different fonts or sizes, switch the horizontal and vertical axes, rotate 3-D scatter plots, and change the chart type.

(v). Text Output Editor

Text output not displayed in pivot tables can be modified with the Text Output Editor. You can edit the output and change font characteristics (type, style, colour, size).

(vi). Syntax Editor

You can paste your dialog box selections into a Syntax Editor window, where your selections appear in the form of command syntax. To work with syntax directly: first, open the syntax window by clicking on File, moving down to New, and choosing Syntax. Secondly, type the SPSS syntax that you want to run. Finally, click on Run and choose All. (Alternatively, to run only a few commands, highlight those commands, click on Run, and choose Selection.)

4.1.4 Creating and Manipulating Data – Defining Variables, Reading Data, Transforming Data and Creating Tables

There are various ways of creating a dataset.
One can enter the data directly into the Data Editor; alternatively, data can be transferred to SPSS from Excel, Microsoft Access, etc. (for details see the SPSS 17.0 Brief Guide). Information about the file can be obtained from File\Display Data File Information. In addition to saving a data file in SPSS format, one can save data in a variety of external formats, including Excel and other spreadsheet formats, tab-delimited and CSV text files, SAS, STATA, and database tables. The active dataset can also be merged with other datasets, and one can work on multiple datasets by opening them at the same time. These operations can also be performed with commands in the syntax window.

In an ideal situation, your raw data is perfectly suitable for the type of analysis you want to perform, and any relationships between variables are either conveniently linear or neatly orthogonal. Unfortunately, this is rarely the case. Preliminary analysis may reveal inconvenient coding schemes or coding errors, or data transformations may be required in order to expose the true relationship between variables.

Data files are not always organized in the ideal form for your specific needs. You may want to combine data files, sort the data in a different order, select a subset of cases, or change the unit of analysis by grouping cases together. A wide range of file transformation capabilities is available, including the ability to sort data, transpose cases and variables, merge files, select subsets of cases, aggregate data, weight data and restructure data.

In the results window, pivot tables allow the format of a result table to be changed by transposing rows and columns, moving rows and columns, creating multidimensional layers, grouping and ungrouping rows and columns, showing and hiding rows, columns and other information, rotating row and column labels, and finding definitions of terms (for details see the SPSS 17.0 Brief Guide).
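
The file transformations named in this section (selecting a subset of cases, sorting, and aggregating to change the unit of analysis) can be sketched as follows. This is an illustrative Python sketch, not SPSS syntax; the records, field names and helper functions are all hypothetical and do not come from the workshop dataset.

```python
from collections import defaultdict

# Hypothetical household records (invented values and field names).
households = [
    {"hh_id": 1, "district": "A", "members": 5, "food_exp": 4200.0},
    {"hh_id": 2, "district": "B", "members": 3, "food_exp": 3900.0},
    {"hh_id": 3, "district": "A", "members": 7, "food_exp": 5800.0},
    {"hh_id": 4, "district": "B", "members": 4, "food_exp": 4100.0},
]


def select_cases(rows, predicate):
    """Keep only the cases for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def sort_cases(rows, key):
    """Return the cases sorted in ascending order of one variable."""
    return sorted(rows, key=lambda row: row[key])


def aggregate_mean(rows, group_key, value_key):
    """Change the unit of analysis: mean of value_key within each group."""
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        acc = totals[row[group_key]]
        acc[0] += row[value_key]
        acc[1] += 1
    return {group: total / n for group, (total, n) in totals.items()}


large = select_cases(households, lambda r: r["members"] >= 4)
by_exp = sort_cases(households, "food_exp")
mean_exp = aggregate_mean(households, "district", "food_exp")

print([r["hh_id"] for r in large])
print([r["hh_id"] for r in by_exp])
print(mean_exp)
```

In the aggregation step the result has one entry per district instead of one per household, which is what "changing the unit of analysis by grouping cases together" means.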
Finally, the results can be produced in the required format.

4.2 Basic Design Features of the STATA Software Package

4.2.1 Introduction to STATA

STATA is a general-purpose statistical software package created in 1985 by StataCorp. It is used by many businesses and academic institutions around the world. There are four major builds of each version of STATA:

a. STATA/MP for multiprocessor computers,
b. STATA/SE for large databases,
c. STATA/IC, which is the standard version,
d. Small STATA, which is a smaller, student version for educational purchase only.

STATA/MP is the fastest and largest version of STATA. Most computers purchased since mid-2006 can take advantage of the advanced multiprocessing of STATA/MP. This includes the Intel Core 2 Duo, i3, i5, i7, and the AMD X2 dual-core chips. On dual-core chips, STATA/MP runs 40% faster overall and 72% faster where it matters, on the time-consuming estimation commands. With more than two cores or processors, STATA/MP is even faster.

STATA/SE, STATA/IC, and Small STATA differ only in the dataset size that each can analyze. STATA/SE and STATA/MP can fit models with more independent variables than STATA/IC (up to 10,998). STATA/IC allows datasets with as many as 2,047 variables and at most 798 right-hand-side variables in a model; the number of observations is limited only by the amount of RAM in your computer. Small STATA is limited to analyzing datasets with a maximum of 99 variables and 1,200 observations, and can have at most 99 right-hand-side variables in a model. Numerics by STATA can support any of the data sizes listed above in an embedded environment.

4.2.2 Computer Features

Package       Max. no. of   Max. no. of    Max. no. of    64-bit       Fastest: designed   Platforms
              variables     right-hand     observations   version      for parallel
                            variables                     available?   processing?
STATA/MP      32,767        10,998         unlimited*     Yes          Yes                 Windows, Mac (64-bit Intel), or Unix
STATA/SE      32,767        10,998         unlimited*     Yes          No                  Windows, Mac, or Unix
STATA/IC      2,047         798            unlimited*     Yes          No                  Windows, Mac, or Unix
Small STATA   99            99             1,200          Yes          No                  Windows, Mac, or Unix

*The maximum number of observations is limited only by the amount of available RAM on your system.
Source: StataCorp.

4.2.3 Requirements

Package       Memory    Disk space
STATA/MP      512 MB    500 MB
STATA/SE      512 MB    500 MB
STATA/IC      512 MB    500 MB
Small STATA   512 MB    500 MB

Source: StataCorp.

4.2.4 Why Do We Use STATA?

According to Acock (2005):

i) STATA is more efficient for performing advanced statistical analysis.
ii) STATA is suitable for people who develop and modify econometric models.
iii) STATA can execute advanced econometric models easily and in less time.
iv) STATA has a command structure that is simple and consistent.
v) User-developed procedures can be installed over the Internet without leaving STATA.
vi) The documentation for STATA is excellent, and the ability to download the datasets used in the documentation's examples is very helpful.
vii) STATA can also handle large datasets in less time when the memory allocated to it is expanded.

4.2.5 STATA Operating System

(i). Overview

A basic overview of the STATA interface is necessary to use this manual effectively. Open STATA, which you have already installed on your computer. Figure 1 shows the STATA interface that should appear. Within the main STATA window, there are four smaller windows:

1. “Results”
2. “Command”
3. “Review”
4. “Variables”

The “Results” window is where all of the procedural output is generated; it is referred to hereafter as the “Results window.” The “Command” window is where the user inputs the commands that STATA will then execute; it is referred to hereafter as the “Command window.” The “Review” window is where all previous commands are stored; it is referred to as the “Review window.” It is particularly useful for rerunning or slightly modifying a previous command: simply click the desired command in the list, without having to retype or copy and paste it each time. Finally, the “Variables” window is where the variables stored in STATA are listed for quick reference. When you click on a variable name, STATA adds that variable name to the text in the Command window, so that you can avoid typing it.

(ii). Open Log File

A log file keeps a record of your work and lets you retrieve its output later. An existing log file can be appended to or replaced.

(iii). Close Log File

(iv). Create DO-Files

Do-files are ASCII files that contain STATA commands to run specific procedures. It is highly recommended to use do-files to store your commands so that you do not have to type them again. You can use any word processor and save the file in ASCII format, or you can use STATA’s Do-File Editor, which has the advantage that you can run the commands from there. To open it, type “doedit” in the Command window.

[Figure: Do-File Editor window]

(v). Viewing Your Data Editor

[Figure: Data Editor window]

(vi). View Data Browser

4.3 References

1. Shin, Kilman (1995): “SPSS Guide”, 2nd Edition, McGraw-Hill Higher Education, October.
2. Acock, A. (2005): “SAS, STATA, SPSS: A Comparison”, Journal of Marriage & Family, 67 (4), pp. 1093-1095.
3. Mitchell, M. (2007): “Strategically using General Purpose Statistics Packages: A Look at STATA, SAS and SPSS”, Report No. 1, Technical Report Series, UCLA Academic Technology Services.
4. Some helpful websites concerning SPSS are:
   www.cofc.edu/academic/admin/acomp/Guides/spss.html
   www.spss.com
   www.leeds.ac.uk/ucs/docs/beg14/beg14.html
   www.nyu.edu/acf/pubs/SPSS_Win/SPSSwindoc_ToC.html
   www.indiana.edu/~statmath/smdoc/index.html
5. Some helpful websites concerning STATA are:
   http://fmwww.bc.edu/GStat/docs/statavspss.html
   http://www.stata.com/whystata/
   http://www.ats.ucla.edu/stat/stata/
6. Harvard-MIT Data Center: http://hmdc.harvard.edu/projects/SPSS_Tutorial/spsstut.shtml