Multiple Indicator Cluster Surveys
Data Processing Workshop

SPSS general commands
Overview

MICS Data Processing Workshop

IBM SPSS Statistics
Statistical Package for the Social Sciences
• SPSS is a full-featured data analysis program that offers a variety of applications, including database management, statistical analysis and graphics
• The SPSS program runs on a wide variety of mainframe, mini, and microcomputers
• The most recent version is SPSS 21, which runs on Windows, Linux and Mac OS desktop platforms
• www.ibm.com/software/analytics/spss

Data management using the SPSS Statistics command language
• Getting data into SPSS Statistics
• Merging data
• Aggregating data
• Weighting data
• And many more

.SAV File Extension
• A data file created by SPSS is saved in a proprietary binary format and contains a dataset as well as a dictionary that describes the dataset; data are stored by "cases" (rows) and "variables" (columns)
• .SAV files can store data extracted from other databases and Microsoft Excel spreadsheets
• .SAV files can also store data that has been entered manually by the user or generated by the software
• SPSS datasets can be manipulated in a variety of ways using the SPSS engine

Programming with SPSS Statistics
• Although many tasks can be performed with the menus and dialog boxes, some very powerful features are available only through command syntax
• Build and run command syntax
• Get data, add new variables, and append cases to the active dataset
• Create new datasets
• Concurrently access multiple open datasets
• Get output results
• Create tables

Creating Command Syntax Files
.SPS File Extension
• An .SPS file is a program file used by SPSS. It is saved in plain text format, contains instructions written in SPSS syntax, is generally developed with the SPSS Syntax Editor, and is used for manipulating datasets and automating statistical analyses
• You can use any text editor to create a command syntax file, but SPSS Statistics provides a number of tools to make your job easier
• SPSS program commands follow very specific syntax rules, which are described in various SPSS publications:
• All commands must begin in the first column of a line and be spelled correctly
• Most commands include additional information (e.g., names of the variables the command is to be applied to, options for processing data, displaying results, etc.), which may be continued on the same line using the appropriate delimiter (e.g., blank space, comma, slash) or on additional lines, provided that the continuation begins after column 1
• Commands can be typed in either upper or lower case
• Most SPSS commands have default specifications, i.e., the options that will be used unless you tell SPSS to use something else
• Use the Paste button.
  Make selections from the menus and dialog boxes, and then click the Paste button instead of the OK button. This pastes the underlying commands into a command syntax window.

Overview of the commands
• Data definition
• File interfaces
• Analyze data
• Modify data

Data definition
These commands:
1. bring raw data into SPSS, either from another file or by typing it in yourself, and
2. enter descriptive information about the data
Commands:
DATA LIST
VARIABLE LABELS
VALUE LABELS
MISSING VALUES

Data list
• DATA LIST defines a raw data file (a data file containing numbers and other alphanumeric characters) by assigning names and formats to each variable in the file
EXAMPLE:
DATA LIST FILE='C:\MICS5\SPSS\MYHH.DAT' RECORDS=1

Variable and value labels
• The VARIABLE LABELS and VALUE LABELS commands delete all existing variable and value labels for the specified variable(s) and assign new variable and value labels.
• ADD VALUE LABELS can be used to add new labels or alter labels for specified values without deleting other existing labels.
EXAMPLE:
variable labels type "Main source of drinking water".
value labels type 1 "Improved sources" 2 "Unimproved sources".

File interfaces
File interface commands access and save SPSS system files
Commands:
GET FILE
SAVE OUTFILE

Get file
• GET FILE opens an SPSS data file.
• SAVE produces a data file in SPSS Statistics format, which contains data plus a dictionary. The dictionary contains a name for each variable in the data file plus any assigned variable and value labels, missing-value flags, and variable print and write formats.
EXAMPLE:
get file = 'hh.sav'.
save outfile = 'hh.sav'.

Analyze data
• Commands that actually perform statistical analysis
EXAMPLE:
frequencies variables=hc2 hc3 hc4 hc5 hc6 hc8 hc8a ws1 ws2 ws7
  /statistics=stddev mean
  /order=analysis.

Modify data
• Commands that alter data and change file characteristics.
Commands:
COMPUTE
RECODE
IF
SELECT IF

Compute
• Creates a new variable in the dataset, or replaces the values of an existing one: COMPUTE target variable=expression
EXAMPLE:
compute persroom = 99.
if (hc2 < 98) persroom = hh11/hc2.
variable label persroom 'Persons per sleeping room'.
missing values persroom (99).

Recode
• RECODE changes, rearranges, or consolidates the values of an existing variable. RECODE can be executed on a value-by-value basis or for a range of values.
• Where it can be used, RECODE is much more efficient than the series of IF commands that produce the same transformation.
• With RECODE, you must specify the new values.
EXAMPLE:
recode improved (100 = 1) (else = 2) into type.
variable labels WS1 "".
variable labels type "Main source of drinking water".
value labels type 1 "Improved sources" 2 "Unimproved sources".

IF
• The IF command conditionally executes a single transformation based on a logical expression, which may combine several conditions.
EXAMPLE:
compute improved = 0.
if (WS1 = 11 or WS1 = 12 or WS1 = 13 or WS1 = 14 or WS1 = 15 or WS1 = 21 or WS1 = 31 or WS1 = 41 or WS1 = 51) improved = 100.
if ((WS2 = 11 or WS2 = 12 or WS2 = 13 or WS2 = 14 or WS2 = 15 or WS2 = 21 or WS2 = 31 or WS2 = 41 or WS2 = 51) and WS1 = 91) improved = 100.
variable label improved "Percentage of household population using improved sources of drinking water".

SELECT IF
• SELECT IF permanently selects cases for analysis based on logical conditions that are found in the data. These conditions are specified in a logical expression.
• For temporary case selection, it is necessary to specify a TEMPORARY command before SELECT IF.
EXAMPLE:
select if (hh9 = 1).
select if (wm7 = 1).
select if (mwm7 = 1).
select if (uf9 = 1).

MERGING FILES IN SPSS
MATCH FILES command
• MATCH FILES combines variables from 2 to 50 SPSS Statistics data files.
• MATCH FILES can make parallel or nonparallel matches between different files or perform table lookups.
• Parallel matches combine files sequentially by case (they are sometimes referred to as sequential matches). Nonparallel matches combine files according to the values of one or more key variables.
• In general, MATCH FILES is used to combine files containing the same cases but different variables.

MERGING FILES IN MICS5
• 4 to 9 SPSS MICS5 data files are produced for each survey, corresponding to the main units of analysis:
  o Households - hh.sav
  o Household members - hl.sav
  o Women of reproductive age (15-49 years of age) - wm.sav
  o FGM - fg.sav
  o Birth history - bh.sav
  o Treated nets - tn.sav
  o Maternal mortality - mm.sav
  o Men (15-49 years of age) - mn.sav
  o Children under the age of five - ch.sav

HH.sav
• Relations with: hl.sav, wm.sav, ch.sav, bh.sav, fg.sav, tn.sav, mn.sav
• Base key variables: HH1 (cluster number) and HH2 (household number)

HL.sav
• Relations with: wm.sav, ch.sav, bh.sav, fg.sav, mn.sav
• Base key variables: HH1 (cluster number), HH2 (household number) and LN (HL1) (member's line number)

WM.sav, CH.sav, MN.sav
• Relations with: hh.sav, hl.sav
• Base key variables: HH1 (cluster number), HH2 (household number) and LN (HL1) (member's line number)
IMPORTANT NOTE: the variable named HL1 in the hl.sav data file is named LN in the wm.sav, ch.sav and mn.sav files. The variable must be renamed before merging.
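The renaming described in the note above can be sketched as follows. This is a minimal illustration, not the official MICS syntax: it assumes both files sit in the current working directory, and wm_hl.sav is a hypothetical output file name. The RENAME subcommand of MATCH FILES applies to the file named just before it, so HL1 is renamed to LN only for the purpose of the merge and hl.sav itself keeps its original variable name.

```spss
* Sort the women file by its key variables.
get file = 'wm.sav'.
sort cases by HH1 HH2 LN.
save outfile = 'wm.sav'.

* Sort the household members file by its key variables.
get file = 'hl.sav'.
sort cases by HH1 HH2 HL1.
save outfile = 'hl.sav'.

* Merge household member data onto the women file.
* HL1 is renamed to LN on the fly so that the key names match.
match files
  /file = 'wm.sav'
  /table = 'hl.sav'
  /rename = (HL1 = LN)
  /by HH1 HH2 LN.

* wm_hl.sav is a hypothetical name chosen to avoid overwriting wm.sav.
save outfile = 'wm_hl.sav'.
```

Where both files contain a variable with the same name, MATCH FILES keeps the values from the first file listed (here wm.sav).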
BH.sav
• Relations with: hh.sav, hl.sav, wm.sav
• Base key variables: HH1 (cluster number), HH2 (household number) and HL1 (member's line number)

MM.sav
• Relations with: hh.sav, hl.sav, wm.sav
• Base key variables: HH1 (cluster number), HH2 (household number) and LN (member's line number)

TN.sav
• Relations with: hh.sav, hl.sav
• Base key variables: HH1 (cluster number), HH2 (household number) and HL1 (member's line number)

Example: how to merge hh.sav onto wm.sav
• Make sure both files are sorted in ascending order by the key variables before trying to merge.
• From the menus choose: Data > Merge Files > Add Variables...
• Select the file you wish to merge: if the file is already open, select it from the list under "An open dataset"; if it is not, browse for the file.
• Select the key variables.
• SPSS will give you a warning regarding sorted key variables. Make sure both files were sorted in ascending order before trying to do a file merge.

The same merge using command syntax:
* open the women file.
get file = "wm.sav".
* sort cases by ID variables.
sort cases HH1 HH2 LN.
save outfile = "wm.sav".
* open the household file.
get file = "hh.sav".
* sort cases by ID variables.
sort cases HH1 HH2.
save outfile = "hh.sav".
* merge the household data file onto the women file.
match files
  /file = "wm.sav"
  /table = 'hh.sav'
  /by HH1 HH2.
* save the women's file.
save outfile = 'wm.sav'.
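After a table lookup like the one above, it can be useful to confirm that every woman found a household record. One possible check, sketched here under assumptions, uses the IN subcommand of MATCH FILES to create a flag variable. The name inhh is hypothetical, and the sketch assumes that your SPSS version accepts IN after TABLE; check the Command Syntax Reference if the command is rejected.

```spss
* Repeat the merge and flag women whose household was found in hh.sav.
* inhh is a hypothetical variable name, coded 1 for a match and 0 otherwise.
match files
  /file = 'wm.sav'
  /table = 'hh.sav'
  /in = inhh
  /by HH1 HH2.
execute.

* Cases with inhh = 0 have no matching household record.
frequencies variables = inhh.
```

If the frequency table shows any cases with a value of 0, the key variables or the sort order of the two files are worth re-checking before the merged file is saved.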
Aggregate data
• AGGREGATE aggregates groups of cases in the active dataset into single cases and either creates a new, aggregated file or creates new variables in the active dataset that contain aggregated data
• Cases are aggregated based on the value of zero or more break (grouping) variables
• If no break variables are specified, then the entire dataset is a single break group
EXAMPLE:
AGGREGATE
  /OUTFILE='tmp1.sav'
  /BREAK=HH1 HH2
  /hhmem=N(HL1).
• AGGREGATE creates a new SPSS Statistics data file, tmp1.sav, that contains two break variables (cluster and household number) and one new aggregate variable.
• BREAK specifies cluster and household numbers as the break variables.
• One aggregated variable is created: hhmem contains the total number of household members in each household.

Creating tables using SPSS CTABLES command
• The Custom Tables procedure produces tables in one, two, or three dimensions
• The command provides a lot of flexibility for organizing and displaying the contents
• Syntax for the CTABLES command can be generated from the Custom Tables dialog

CTABLES
  /FORMAT EMPTY= {ZERO   }
                 {BLANK  }
                 {'chars'}
  /TABLE rows BY columns BY layers
  /SLABELS POSITION= {COLUMN}  VISIBLE= {YES}
                     {ROW   }           {NO }
                     {LAYER }
  /TITLES CAPTION= ['text' 'text'...]
          CORNER= ['text' 'text'...]
          TITLE= ['text' 'text'...]
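As a minimal sketch of the general form above, the following command cross-tabulates two categorical variables with default counts. It assumes that hh7 and hh6 exist in the active dataset; the title text is only a placeholder.

```spss
* Cross-tabulate hh7 (rows) by hh6 (columns) using default counts.
* The [c] marker tells CTABLES to treat a variable as categorical.
ctables
  /table hh7 [c] by hh6 [c]
  /slabels position = column visible = no
  /titles title = 'hh7 by hh6'.
```

Setting visible = no on SLABELS hides the summary label ("Count") so that only the category labels appear in the column headings.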
CTABLES command example
ctables
  /vlabels variables = tot2 display = none
  /table hh7 [c] + hh6 [c] + mslbrthr [c] + welevel [c] + tot1 [c] by ebrf [s] [mean,'',f5.1] + tot2 [c] [count,'',f5.0]
  /slabels position = column visible = no
  /categories var=all empty=exclude missing=exclude
  /title title="Table NU.2: Initial breastfeeding" "Percentage of last-born children in the 2 years preceding the survey who were ever breastfed, percentage who were breastfed within one hour of birth and within one day of birth, and percentage who received a prelacteal feed, " + surveyname
    caption= "[1] MICS indicator 2.4