MICS Data Processing Workshop

Multiple Indicator Cluster Surveys
Data Processing Workshop
SPSS general commands
Overview
IBM SPSS Statistics
Statistical Package for the Social Sciences
• SPSS is a full-featured data analysis program that offers a
variety of applications, including
database management,
statistical analysis and
graphics
• The SPSS program runs on a wide variety of mainframe,
mini, and microcomputers
• The most recent version is SPSS 21, which runs on
Windows, Linux and Mac OS desktop platforms
• www.ibm.com/software/analytics/spss
Data management using the SPSS
Statistics command language
• Getting Data into SPSS Statistics
• Merging data
• Aggregating data
• Weighting data
• And many more
.SAV File Extension
• A data file created by SPSS is saved in a proprietary
binary format and contains a dataset as well as a
dictionary that describes the dataset; data are stored by
"cases" (rows) and "variables" (columns)
• .SAV files can store data extracted from other
databases and Microsoft Excel spreadsheets
• .SAV files can also save data that has been entered
manually by the user or data that has been generated
by the software
• SPSS datasets can be manipulated in a variety of ways
using the SPSS engine
Programming with SPSS Statistics
• Although many of the tasks can be
performed with the menus and dialog
boxes, some very powerful features are
available only with command syntax
• Build and run command syntax
• Get data, add new variables, and append
cases to the active dataset
• Create new datasets
• Concurrently access multiple open datasets
• Get output results
• Create tables
Creating Command Syntax Files
.SPS File Extension
• An .SPS file is a program file used by SPSS, a statistical
analysis application. It is saved in plain text format,
contains instructions written in SPSS syntax, is
generally developed with the SPSS Syntax Editor, and is
used for manipulating datasets and automating
statistical analyses
• You can use any text editor to create a command
syntax file, but SPSS Statistics provides a number of
tools to make your job easier
• SPSS program commands follow very specific syntax
rules, which are described in various SPSS
publications:
• All commands must begin in the first column of a
line and be spelled correctly
• Most commands include additional
information (e.g., names of variables the command
is to be applied to, options for processing data,
displaying results, etc.), which may be continued on
the same line using the appropriate delimiter (e.g.,
blank space, comma, slash) or continued on
additional lines, provided that each
continuation begins after column 1
• Commands can be typed in either upper or lower
case
• Most SPSS commands have default specifications,
i.e., the options that will be used unless you tell SPSS
to use something else
• Use the Paste button. Make selections from the
menus and dialog boxes, and then click the Paste
button instead of the OK button. This will paste the
underlying commands into a command syntax
window
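The rules above can be illustrated with a short sketch. It assumes a data file containing the variables hh6 and hh7 is already open; the choice of variables and statistics is only an example.

```spss
* A command starts in column 1 and ends with a period.
frequencies variables = hh6 hh7
  /statistics = mean stddev.
* Case does not matter, so the next command is equivalent.
FREQUENCIES VARIABLES = HH6 HH7
  /STATISTICS = MEAN STDDEV.
```

The continuation line is indented so that it begins after column 1, and the subcommand is introduced with a slash.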
Overview of the commands
• Data definition
• File interfaces
• Analyze data
• Modify data
Data definition
These commands:
1. bring raw data into SPSS, either from
another file, or by typing it in
yourself, and
2. enter descriptive information about
the data
Commands:
DATA LIST
VARIABLE LABELS
VALUE LABELS
MISSING VALUES
Data list
• DATA LIST defines a raw data file (data file
containing numbers and other alphanumeric
characters) by assigning names and formats to
each variable in the file
EXAMPLE:
DATA LIST FILE='C:\MICS5\SPSS\MYHH.DAT'
RECORDS=1
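A complete DATA LIST command also names the variables and says where each one is located in the record. The sketch below shows one possible shape for a fixed-format file. The column positions are hypothetical and would have to be taken from the actual data dictionary.

```spss
* Column positions below are hypothetical and must match the real file layout.
data list file = 'C:\MICS5\SPSS\MYHH.DAT' fixed records = 1
  /1 HH1 1-3 HH2 4-5 HH11 6-7.
execute.
```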
Variable and value labels
• VARIABLE and VALUE LABELS commands delete all
existing variable and value labels for the specified
variable(s) and assign new variable and value labels.
• ADD VALUE LABELS can be used to add new labels or
alter labels for specified values without deleting other
existing labels.
EXAMPLE
variable labels type "Main source of drinking water".
value labels type
  1 "Improved sources"
  2 "Unimproved sources".
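As a minimal sketch of ADD VALUE LABELS, the command below adds one label to the variable type without removing the labels for values 1 and 2. The code 9 and its label are hypothetical and used only for illustration.

```spss
* Code 9 and its label are hypothetical.
add value labels type 9 "Missing".
```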
File interfaces
File interfaces commands access and save
SPSS system files
Commands:
GET FILE
SAVE OUTFILE
Get file
• GET FILE opens an SPSS data file.
• SAVE produces a data file in SPSS Statistics format,
which contains data plus a dictionary. The dictionary
contains a name for each variable in the data file
plus any assigned variable and value labels, missing-value flags, and variable print and write formats.
EXAMPLE:
get file = 'hh.sav'.
save outfile = 'hh.sav'.
Analyze data
• Commands that actually perform statistical
analysis
EXAMPLE
frequencies
  variables=hc2 hc3 hc4 hc5 hc6 hc8 hc8a
            ws1 ws2 ws7
  /statistics=stddev mean
  /order=analysis.
Modify data
• Commands that alter data and change file
characteristics.
Commands:
COMPUTE
RECODE
IF
SELECT IF
Compute
• Creates a new variable in the dataset:
COMPUTE target variable=expression
EXAMPLE
compute persroom = 99.
if (hc2 < 98) persroom = hh11/hc2.
variable label persroom 'Persons per sleeping room'.
missing values persroom (99).
Recode
• RECODE changes, rearranges, or consolidates the values of an
existing variable. RECODE can be executed on a value-by-value
basis or for a range of values.
• Where it can be used, RECODE is much more efficient than the
series of IF commands that produce the same transformation.
• With RECODE, you must specify the new values.
EXAMPLE.
recode improved (100 = 1) (else = 2) into type.
variable labels WS1 "".
variable labels type "Main source of drinking water".
value labels type
  1 "Improved sources"
  2 "Unimproved sources".
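RECODE can also work on ranges of values, as mentioned above. The following sketch groups the number of household members (hh11) into three categories. The target variable hhsize, the cut-offs and the labels are hypothetical.

```spss
* hhsize and the cut-offs are hypothetical.
recode hh11 (1 thru 3 = 1) (4 thru 6 = 2) (7 thru highest = 3) into hhsize.
variable labels hhsize "Household size group".
value labels hhsize
  1 "1-3 members"
  2 "4-6 members"
  3 "7 or more members".
```

Values of hh11 that match none of the specifications leave hhsize system-missing, since INTO creates a new variable.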
IF
• The IF command conditionally executes a single
transformation based on a logical expression, which
may combine several conditions.
EXAMPLE.
compute improved = 0.
if (WS1 = 11 or WS1 = 12 or WS1 = 13 or WS1 = 14 or WS1 = 15 or WS1 = 21 or
    WS1 = 31 or WS1 = 41 or WS1 = 51) improved = 100.
* When WS1 = 91, the source recorded in WS2 is checked instead.
if ((WS2 = 11 or WS2 = 12 or WS2 = 13 or WS2 = 14 or WS2 = 15 or WS2 = 21 or
     WS2 = 31 or WS2 = 41 or WS2 = 51) and WS1 = 91) improved = 100.
variable label improved "Percentage of household population using improved sources of drinking water".
SELECT IF
• SELECT IF permanently selects cases for analysis based on
logical conditions that are found in the data. These conditions
are specified in a logical expression.
• For temporary case selection, it is necessary to specify a
TEMPORARY command before SELECT IF.
EXAMPLE.
* Each selection belongs to a different data file, so run only the one that matches the open file.
select if (hh9 = 1).
select if (wm7 = 1).
select if (mwm7 = 1).
select if (uf9 = 1).
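Temporary case selection, mentioned above, can be sketched as follows. The example assumes the women's file is open and that it contains the variables wm7 and welevel. The FREQUENCIES command is only a stand-in for whatever procedure is needed.

```spss
temporary.
select if (wm7 = 1).
frequencies variables = welevel.
* All cases are available again after the FREQUENCIES procedure has run.
```

The selection applies only to the first procedure that follows TEMPORARY, which makes it safer than a permanent SELECT IF when the full file is still needed afterwards.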
MERGING FILES IN SPSS
MATCH FILES command
• MATCH FILES combines variables from 2 up to 50 SPSS Statistics
data files.
• MATCH FILES can make parallel or nonparallel matches
between different files or perform table lookups.
• Parallel matches combine files sequentially by case (they are
sometimes referred to as sequential matches). Nonparallel
matches combine files according to the values of one or more
key variables.
• In general, MATCH FILES is used to combine files containing the
same cases but different variables.
MERGING FILES IN MICS5
• Between 4 and 9 SPSS MICS5 data files are produced for each survey,
corresponding to the main units of analysis:
o Households - hh.sav
o Household members - hl.sav
o Women of reproductive age (15-49 years of age) - wm.sav
o FGM - fg.sav
o Birth history - bh.sav
o Treated nets - tn.sav
o Maternal mortality - mm.sav
o Men (15-49 years of age) - mn.sav
o Children under the age of five - ch.sav
HH.sav
• Relations with:
hl.sav, wm.sav, ch.sav, bh.sav, fg.sav, tn.sav, mn.sav
• Base key variables:
HH1 (cluster number) and
HH2 (household number)
HL.sav
• Relations with:
wm.sav, ch.sav, bh.sav, fg.sav, mn.sav
• Base key variables:
HH1 (cluster number) and
HH2 (household number)
LN (HL1) (member’s line number)
WM.sav, CH.sav, MN.sav
• Relations with: hh.sav, hl.sav
• Base key variables:
HH1 (cluster number),
HH2 (household number) and
LN (HL1) (member’s line number)
IMPORTANT NOTE: the variable HL1 in the hl.sav data file is named LN in the wm.sav, ch.sav and
mn.sav files. The variable must be renamed before merging.
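The renaming step in the note above could be sketched as follows, here for merging household member data onto the women's file. The file name hl_ln.sav is a hypothetical working copy, and the sketch assumes that hl.sav does not already contain a variable named LN.

```spss
* hl_ln.sav is a hypothetical working copy, so hl.sav itself is left unchanged.
get file = 'hl.sav'.
rename variables (HL1 = LN).
sort cases by HH1 HH2 LN.
save outfile = 'hl_ln.sav'.

get file = 'wm.sav'.
sort cases by HH1 HH2 LN.
match files
  /file = *
  /table = 'hl_ln.sav'
  /by HH1 HH2 LN.
execute.
```

Working on a copy keeps the original member file intact, so other syntax files that expect HL1 continue to run.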
BH.sav
• Relations with: hh.sav, hl.sav, wm.sav
• Base key variables:
HH1 (cluster number),
HH2 (household number) and
HL1 (member’s line number)
MM.sav
• Relations with: hh.sav, hl.sav, wm.sav
• Base key variables:
HH1 (cluster number),
HH2 (household number) and
LN (member’s line number)
TN.sav
• Relations with: hh.sav, hl.sav
• Base key variables:
HH1 (cluster number),
HH2 (household number) and
HL1 (member’s line number)
Example of how to merge hh.sav onto
wm.sav
• Make sure both files are sorted in ascending order by key
variables before trying to merge.
• From the menus choose: Data > Merge Files > Add
Variables...
• Select the file you wish to merge:
If the file is already open, select it from the list under "An open
dataset"; if it is not, browse for the file.
• Select the key variables (HH1 and HH2)
• SPSS will give you a warning regarding sorted key
variables. Make sure both files were sorted in ascending
order before trying to do a file merge.
* open the women file.
get file = "wm.sav".
* sort cases by ID variables.
sort cases by HH1 HH2 LN.
save outfile = "wm.sav".
* open the household file.
get file = "hh.sav".
* sort cases by ID variables.
sort cases by HH1 HH2.
save outfile = "hh.sav".
* merge the household data file onto the women file.
match files
  /file = "wm.sav"
  /table = "hh.sav"
  /by HH1 HH2.
* save the women's file.
save outfile = "wm.sav".
Aggregate data
• Aggregate data aggregates groups of
cases in the active dataset into single
cases and creates a new, aggregated file
or creates new variables in the active
dataset that contain aggregated data
• Cases are aggregated based on the value
of zero or more break (grouping) variables
• If no break variables are specified, then
the entire dataset is a single break group
EXAMPLE
AGGREGATE
  /OUTFILE='tmp1.sav'
  /BREAK=HH1 HH2
  /hhmem=N(HL1).
• AGGREGATE creates a new SPSS Statistics data file, tmp1.sav, that
contains the two break variables (cluster and household number) and the new
aggregate variable.
• BREAK specifies cluster and household numbers as the break variables.
• One aggregate variable is created: hhmem contains the total number of
household members in each household.
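The other option described above, adding the aggregated values to the active dataset instead of writing a new file, might look like the sketch below. It assumes the household members file hl.sav is the file being aggregated.

```spss
get file = 'hl.sav'.
* Add the household size to every member record instead of writing a new file.
aggregate
  /outfile = * mode = addvariables
  /break = HH1 HH2
  /hhmem = n.
```

With this form every member record keeps its own variables and gains hhmem, so no separate merge step is needed afterwards.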
Creating tables using SPSS
CTABLES command
• The Custom Tables procedure produces tables in
one, two, or three dimensions
• Command provides a lot of flexibility for organizing
and displaying the contents
• Syntax for the CTABLES
command can be generated
from the Custom Tables
dialog
Creating tables using CTABLES command
CTABLES
  /FORMAT EMPTY= {ZERO   }
                 {BLANK  }
                 {'chars'}
  /TABLE rows BY columns BY layers
  /SLABELS POSITION= {COLUMN}  VISIBLE= {YES}
                     {ROW   }           {NO }
                     {LAYER }
  /TITLES CAPTION= ['text' 'text'...]
          CORNER= ['text' 'text'...]
          TITLE= ['text' 'text'...]
CTABLES Command Example
ctables
  /vlabels variables = tot2 display = none
  /table hh7 [c] + hh6 [c] + mslbrthr [c] + welevel [c] + tot1 [c] by
         ebrf [s] [mean,'',f5.1] + tot2 [c] [count,'',f5.0]
  /slabels position = column visible = no
  /categories var = all empty = exclude missing = exclude
  /titles title = "Table NU.2: Initial breastfeeding"
    "Percentage of last-born children in the 2 years preceding the survey " +
    "who were ever breastfed, percentage who were breastfed within one " +
    "hour of birth and within one day of birth, and percentage who received " +
    "a prelacteal feed, " + surveyname
    caption=
    "[1] MICS indicator 2.4