Analysis of Data for Measuring Food Availability, Access

advertisement
In collaboration with
Supported by
National Food Policy Capacity Strengthening Programme
Topic 4
STATA and SPSS: Introductory User Guide
Training Workshop on
Analysis of Data for Measuring Food Availability,
Access and Nutritional Status
14-26 January 2012
Bangladesh Academy for Rural Development,
Comilla, Bangladesh
R. Srinivasulu
Centre for the Study of Regional Development (CSRD)
Jawaharlal Nehru University (JNU)
New Delhi 110067
India
1
Topic 4: Introduction to SPSS and STATA Software packages
The objective of Topic 4 is to outline the basic characteristics of SPSS and /STATA
software packages, and their relative advantages and disadvantages.
4.1 Basic Design Features of the SPSS Software Package
SPSS (Statistical Package for the Social Sciences) is a statistical analysis and data
management software package. One can extract data through SPSS from any type of
file format. SPSS can generate tabulated reports, charts, and plots of distributions and
trends, descriptive statistics, and conduct complex statistical analyses. “SPSS originally
stood for “Statistical Package for the Social Sciences”, but the name has since been
changed to reflect the marketing of SPSS outside the academic community” (HarvardMIT data center). SPSS is a widely used package where researchers perform
quantitative research in social science because it is easy to use and can be a good
starting point to learn more advanced statistical packages. Researchers can perform
syntax by using syntax editor in SPSS.
The following sections concentrate on menu systems, type of windows and data
manipulation. In the new version, SPSS 17.0, the syntax editor has been completely
redesigned with features such as auto-completion, colour coding, bookmarks, and
breakpoints. Auto-completion provides you with a list of valid command names,
subcommands, and keywords; so you will spend less time referring to syntax charts.
Colour coding allows you to quickly spot unrecognized terms as well as some common
syntactical errors. Bookmarks allow you to quickly navigate large command syntax
files. Breakpoints allow you to stop execution at specified points so you can inspect
data or output before proceeding. Most SPSS users prefer to use its windows graphic
interface that is, pointing with the mouse and clicking on the options they want.
However, if someone wants to have greater control of typing in commands tend to use
other statistical packages e.g. STATA. Nonetheless, SPSS provides a way to not only
type commands but also switch between command based and the Windows point and
click method. While this module will focus on the later, the command code will be
mentioned briefly as well.
2
4.1.1 Why Do We Use SPSS?
There are several reasons to use SPSS such as i) It is easier to handle and widely used
for descriptive statistics and basic statistical analysis ii) One can use it with either a
Windows point-and-click approach or through syntax (i.e., writing out of SPSS
commands). Each has its own advantages, and the user can switch between the
approaches. iii) Many of the widely used social science data sets come with SPSS
format; this significantly reduces the work load for transferring the data into SPSS
format. iv) SPSS is friendlier for performing simple statistical analysis. v) SPSS is
reasonably strong on ANOVA related procedure. vi) SPSS is easiest to package to
learn.
Overall, SPSS is a friendly package for novice users, but has important limitations for
experts in the field of econometrics.
4.1.2 Important Limitations
There are two important limitations identified by Harvard-MIT Data Center as follows:
Firstly, SPSS users have less control over statistical output than any other packages, for
example, STATA users. For novice users, this hardly causes a problem, but once a
researcher wants greater control over the equations or the output, she or he will need to
either choose another package or learn techniques for working around SPSS’s
Limitations.
Secondly, SPSS has problems with certain types of data manipulations and it has some
built in quirks that seem to reflect its early creation. The best known limitation is its
weak lag functions, that is, how it transforms data across cases. For new users working
off standard data sets, this is rarely a problem. However, once a researcher begins
wanting to significantly alter data sets, he or she will have to either learn a new package
or develop greater skills at manipulating SPSS.
4.1.3 The Main SPSS Window
There are six different windows that can be opened when using SPSS. The following
will give a description of each of them. The window contains separate windows such as
i) Data Editor, ii) Output Navigator, iii) Pivot Table Editor, iv) Chart Editor, v) Text
3
Output Editor and vi) Syntax Editor. It also contains tool bars, a collection of menus
and a status bar. If one of the sub-windows becomes larger than the main window, or if
it shifts outside the area of the main window, then the main window will be develop
scroll bars (see details Babu and Sanyal, 2009).
(i). Data Editor
The Data Editor is a spreadsheet in which you modify your data. Each row corresponds
to a case while each column represents a variable. The title bar displays the name of the
open data file or "Untitled" if the file has not yet been saved. This window opens
automatically when SPSS is started. This window contains 11 menus such as File, Edit,
View, Data, Transform, Analyze, Graphs, Utilities, Add-ons, Window and Help.
a. Sample File
The dataset name: “MCG_hhexpenditure_0980608.sav” can be found in the IFPRI
dataset. To open this dataset through SPSS, the following steps needs to be pursued:
b. Opening a Data File
From the menu one can choose the file through File\open\data…alternatively you
can use the Open File button on the toolbar, after that a dialog box for opening files
is displayed. By default, SPSS statistics data files (.sav extension) are displayed.
Here, we use the file MCG_hhexpenditure_0980608.sav. Once open the file through
the dialog box, the data file is displayed in the Data Editor. In the Data Editor, if
you put the mouse cursor on a variable name (the column heading), a more
descriptive variable label is displayed (if a label has been defined for that variable).
Further, to view the label one can also choose the “view” and “value labels”.
Descriptive value labels are now displayed to make it easier to interpret the
responses.
(ii). Output Navigator or Viewer
The Output Navigator window displays the statistical results, tables, and charts from
the analysis you performed. An Output Navigator window opens automatically when
you run a procedure that generates output. In the Output Navigator windows, you can
edit, move, delete and copy your results in a Microsoft Explorer-like environment.
4
The analyze menu contains a list of general reporting and statistical analysis categories.
To do statistical analysis by creating a simple frequency table (table of counts), one has
to click “Analyze\Descriptive Statistics\Frequencies.... after that, the frequencies dialog
box is displayed. An icon next to each variable provides information about data type
and level of measurement. When we click the variable Category of household
[categ_96] the complete label/name is displayed when the cursor is positioned over it
provided the variable label and/or name appears truncated in the list, otherwise we can
see only variable name. The variable name categ_96 is displayed in square brackets
after the descriptive variable label. In the dialog box, you choose the variables that you
want to analyze from the source list on the left and drag and drop them into the variable
(s) list on the right. The OK button, which runs the analysis, is disabled until at least
one variable is placed in the variable (s) list. You can also obtain additional information
by right-clicking on any variable name in the list (see details in SPSS brief guide 17.0).
The detailed analysis will be carried out in the hand-on exercise classes.
(iii). Pivot Table Editor
Output displayed in pivot tables can be modified in many ways with the Pivot Table
Editor. You can edit text, swap data in rows and columns, add color, create
multidimensional tables, and selectively hide and show results.
(iv). Chart Editor
You can modify and save high-resolution charts and plots by invoking the Chart Editor
for a certain chart (by double-clicking the chart) in an Output Navigator window. You
can change the colours, select different type of fonts or sizes, switch the horizontal and
vertical axes, rotate 3-D scatter plots, and change the chart type.
(v). Text Output Editor
Text output not displayed in pivot tables can be modified with the Text Output Editor.
You can edit the output and change font characteristics (type, style, colour, size).
(vi). Syntax Editor
You can paste your dialog box selections into a Syntax Editor window, where your
selections appear in the form of command syntax. You can open the syntax window by
clicking on file, dragging down to New, and choosing the Syntax. Secondly, type the
5
SPSS syntax that you want to run.
Finally, click on Run and drag down to All
(Alternatively, if someone want to run only a few commands, highlight those
commands, click on Run, and drag down to Selection)
4.1.4 Creating and Manipulating Data – Defining Variables, Reading Data,
Transforming Data and Creating Tables
There are various ways of creating a dataset. One can create a variable by entering the
data directly; secondly, data can be transferred from EXCEL, MS Office Access, etc to
SPSS (see the details in the SPSS 17.0 Brief Guide). File information also can be
obtained from the “file\Display Data File Information”. In addition to saving data file
in SPSS format, one can also save data in a variety of external format including excel
and other spreadsheet formats, tab-delimited and CSV text files, SAS, STATA,
database tables. The current/active dataset can be merged with other data set as well.
Further, one can also work on multiple dataset by opening at the same time in the single
window.
This activity can be performed in the syntax window also by creating
commands. In an ideal situation, your raw data is perfectly suitable for the type of
analysis you want to perform, and any relationships between variables are either
conveniently linear or neatly orthogonal. Unfortunately, this is rarely the case.
Preliminary analysis may reveal inconvenient coding schemes or coding errors, or data
transformations may be required in order to expose the true relationship between
variables. Data files are not always organized in the ideal form for your specific needs.
You may want to combine data files, sort the data in a different order, select a subset of
cases, or change the unit of analysis by grouping cases together. A wide range of file
transformation capabilities is available, including the ability to sort data, transpose case
and variables, merge files, select subsets of cases, aggregate data, weight data and
restructure data.
In the results window, by using pivot tables, the result table format can be changed
through transposing rows and columns, moving rows and columns, creating
multidimensional layers, grouping and ungrouping rows and columns, showing and
hiding rows, columns and other information. Rotating rows and columns information,
6
finding definitions of the terms (see details in SPSS brief guide 17.0). Finally, the
results can be produced in a required format.
4.2. Basic Design Features of the STATA Software Package
4.2.1 Introduction to STATA
STATA is a general-purpose statistical software package created in 1985 by
STATACorp. It is used by many businesses and academic institutions around the
world. There are four major builds of each version of STATA namely, a. STATA/MP
for multiprocessor computers, b. STATA/SE for large databases, c. STATA/IC which is
the standard version, d. Small STATA which is a smaller, student version for
educational purchase only.
STATA/MP is the fastest and largest version of STATA. Most computers purchased
since mid 2006 can take advantage of the advanced multiprocessing of STATA/MP.
This includes the Intel CoreTM 2 Duo, i3, i5, i7, and the AMD X2 dual-core chips. On
dual-core chips, STATA/MP runs 40% faster overall and 72% faster where it matters,
on the time-consuming estimation commands. With more than two cores or processors,
STATA/MP is even faster. STATA/SE, STATA/IC, and Small STATA differ only in
the dataset size that each can analyze. STATA/SE and STATA/MP can fit models with
more independent variables than STATA/IC (up to 10,998). STATA/IC allows datasets
with as many as 2,047 variables. The number of observations is limited only by the
amount of RAM in your computer. STATA/IC can have at most 798 right-hand-side
variables in a model.
Small STATA is limited to analyzing datasets with a maximum of 99 variables and
1,200 observations. Small STATA can have at most 99 right-hand-side variables in a
model. Numeric by STATA can support any of the data sizes listed above in an
embedded environment.
7
4.2.2 Computer Feature
Max. no.
of
variables
Max. no.
of righthand
variables
Max. no. of
observations
64-bit
version
available?
Fastest:
designed
for parallel
processing?
Platforms
STATA/MP
32,767
10,998
unlimited*
Yes
Yes
Windows,
Mac (64-bit
Intel), or Unix
STATA/SE
32,767
10,998
unlimited*
Yes
No
Windows,
Mac, or Unix
STATA/IC
2,047
798
unlimited*
Yes
No
Windows,
Mac, or Unix
99
99
1,200
Yes
No
Windows,
Mac, or Unix
Package
Small
STATA
*The maximum number of observations is limited only by the amount of available RAM on your system.
Source: STATA Co.
4.2.3 Requirements
Package
Memory
Disk space
STATA/MP
512 MB
500 MB
STATA/SE
512 MB
500 MB
STATA/IC
512 MB
500 MB
Small STATA
512 MB
500 MB
Source: STATA Co.
4.2.4 Why Do We Use STATA?
According to Acock (2005) i) STATA is more efficient to perform advanced statistical
analysis. ii) STATA is suitable for people those who develop and modify econometric
models. iii) STATA can easily execute advanced econometric models in less time. iv)
STATA has a command structure that is simple and consistent. v) User-developed
procedures can be installed over the Internet without leaving STATA vi) the
documentation for STATA is excellent, and the ability to download datasets that are
used in the examples in the documentation is very helpful vii) STATA can also handle
large dataset in less time by expanding the memory size.
8
4.2.5 STATA Operating System
(i). Overview
A basic overview of the STATA interface is necessary to effectively use this manual.
Open STATA, which you have already installed on your computer. Figure 1 shows the
STATA interface that should appear.
Within the main STATA window, there are four smaller windows:
1. “Results”
2. “Command”
3. “Review”
4. “Variables”
The “Results” window is where all of the procedural output will be generated and will
be referred to hereafter as the “Results window.” The “Command” window is where the
user inputs the commands that STATA will then execute and will be referred to
hereafter as the “Command window.” The “Review” window is where all previous
commands are stored and will be referred to as the “Review window.” This is
particularly useful to reprocess or to slightly modify a set of previous commands by
simply clicking the desired command in the list without having to retype or even copy
and paste each time. Finally, the “Variables” window is where the variables stored in
STATA are listed for quick reference. When you click on a variable name, STATA will
9
add that variable name to the text in the Command window so that you may avoid
typing the variable name.
(ii). Open Log File
Log file helps to retrieve the output of your work and keep a record of your work. This
file can be appended and replaced
(iii). Close Log File
(iv). Create DO-Files
Do-Files are ASCII files that contain STATA commands to run specific procedures. It
is highly recommended to use Do-Files to store your commands so that you do not have
10
to type them again. You can use any word processor and save the file in ASCII format
or you can use STATA’s ‘Do-File Editor’ with the advantage that you can run the
commands from there: “doedit” in the command window.
Click here to get DoFile Editor Window
Do-File Editor
Window
11
(v). Viewing Your Data Editor
Click here to get
Data Editor
Window
Data Editor
Window
12
(vi). View Data Browser
4.3 References
1. Shin, Kilman (1995): “SPSS Guide”, 2nd Edition, McGraw-Hill Higher Education,
October.
2. Acock, A (2005): “SAS, STATA, SPSS: A Comparison”, Journal of Marriage &
Family, 67 (4), pp. 1093-1095
3. Mitchell, M (2007): Strategically using General Purpose Statistics Packages: A
Look at STATA, SAS and SPSS, Report No. 1, Technical Report Series, UCLA
Academic Technology Services.
4. Some helpful websites concerning SPSS are:
www.cofc.edu/academic/admin/acomp/Guides/spss.html
www.spss.com
www.leeds.ac.uk/ucs/docs/beg14/beg14.html
www.nyu.edu/acf/pubs/SPSS_Win/SPSSwindoc_ToC.html
www.indiana.edu/~statmath/smdoc/index.html
5. Some helpful websites concerning STATA are:
http://fmwww.bc.edu/GStat/docs/statavspss.html
http://www.stata.com/whystata/
http://www.ats.ucla.edu/stat/stata/
Harvard-MIT Data center http://hmdc.harvard.edu/projects/SPSS_Tutorial/spsstut.shtml
13
Download