Session 2 Housekeeping: Variable labels, value labels, calculations and recoding

Review
• You have used Stata
• Largely through the menus and dialogues
• But also with a few commands
• We hope you found it (surprisingly?) easy
• Discuss what you liked
• And difficulties so far

Housekeeping tasks
By housekeeping, we mean the small jobs to organise and add labels to the data. They make life easier later. This includes:
• labelling and adding notes to datasets
• labelling variables
• labelling categories (or values) taken by the variable
• recoding variables and dealing with codes for missing values
• using log files to keep a record of what you have done

Labels and notes
Open the file named E_HouseholdComposition.dta
Use Data > Labels > Label dataset

Dialogue for labelling data set
Type the label into the dialogue, or use the command
label data "Young Lives Study……"

Labelling variables
Use the menu sequence Data > Labels > Label variable
Or type the command:
label variable relcare "What is your relationship to child?"

Defining value labels
Use: Data > Labels > Label values > Define or modify value labels and complete the dialogue box that follows.
The corresponding commands show that two steps are needed to label the values.
• First, a label must be defined, e.g.
label define sexlabel 1 "male" 2 "female"
• Then this label is attached to the variable, e.g. for the variable called sex use the command
label values sex sexlabel

Your turn
Work through Section 4.1 of the Stata Guide
Note down any difficulties you have and clarify them with a resource person

Recoding a variable
Use Data > Create or change variables > Other variable transformation commands > Recode categorical variable
Also use the options to define a new variable

Information on the recoded variable
It is always safer to recode into a new variable, e.g. seedad2.
The effect of the recoding can be seen by typing
codebook seedad2
If seedad is later no longer needed, it can be dropped.
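
Recoding into a new variable can be sketched as follows. This is only an illustration: the toy data, the codes 1, 2, 3 and 99, and the labels attached to them are assumptions made up for the example, not the actual coding of seedad in E_HouseholdComposition.dta.

```stata
* Toy data standing in for the real file (codes are hypothetical)
clear
input id seedad
1 1
2 2
3 3
4 99
5 1
end

* Recode into a NEW variable so that the original seedad is untouched.
* Assumed meaning: 1 and 2 become 1, 3 becomes 0, and 99 becomes the
* extended missing value .a
recode seedad (1 2 = 1 "sees father") (3 = 0 "does not see father") ///
    (99 = .a), generate(seedad2)

* Check the effect of the recoding
codebook seedad2
tabulate seedad seedad2, missing

* Only when the original is no longer needed:
* drop seedad
```

The cross-tabulation of the old variable against the new one is a quick way to confirm that every old code went where it was meant to go before the original is dropped.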
Use File > Save to save information on the new variable in the data set.

Your turn again
Work through Section 4.2 of the Stata Guide
Note down any difficulties you have and clarify them with a resource person

Missing values
Symbols for missing values in Stata: . and .a, .b, .c and so on, up to .z
These are used to distinguish between the different reasons for values to be missing.
When making calculations, comparisons or sorting, the following rules are observed:
• all non-missing numbers are less than .
• . is less than .a
• .a is less than .b, and so on, up to .z

Memory
The initial memory in Stata is 1 megabyte
This can be changed, but first type clear to clear memory
To increase the current memory to 20 megabytes, type
set memory 20m
To set the memory permanently, use
set memory 20m, permanently
For problems processing large datasets, use the compress command.
These memory settings apply to older versions of Stata; from Stata 12 onwards memory is allocated automatically and set memory is no longer needed.

Log files
To keep a record of the output while using Stata, open a log file by clicking on the Log icon.
This opens a dialogue in your working directory so you can name the log file.
It suggests the extension .smcl, which stands for Stata Markup and Control Language.
• Log files in Stata record both commands and output.

Remarks
• You can change the extension to "log" to produce a simple ASCII file
• Other packages use the idea of a log file to record just the commands, not the output as well
• You can do this in Stata (but not from the menus)
Notice that the command Stata used for its log file was
. log using "name of file"
Do the same again, but using
. cmdlog using "name of file"
If at a later stage you need to append to or replace this file, add the option append or replace, after a comma, at the end of the above commands.

Your turn
Practise the above ideas by working through Sections 4.6, 4.7 and 4.8 of the Stata Guide.
Then either read your own data into Stata and perform some simple analyses using methods covered so far
Or use a dataset suggested by the resource persons.
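
The missing-value ordering and the log commands from this session can be tried together in a short sketch. The log file name and the toy variable age are hypothetical, chosen only for illustration; the log is written to the current working directory.

```stata
* Open a plain-text log (hypothetical name); replace overwrites an old copy
log using "session2_practice.log", text replace

* Toy data with ordinary and extended missing values
clear
input id age
1 12
2 .
3 .a
4 8
5 .b
end

* Sorting puts non-missing values first, then . then .a then .b
sort age
list

* Because missing values are larger than any number, this comparison
* also counts the three missing observations (4 in total)
count if age > 10

* Excluding missing values explicitly counts only the real value 12
count if age > 10 & !missing(age)

log close
```

The two count commands show why comparisons such as "greater than" need care when a variable contains missing values.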

So if you have a dataset…
• Open, within Stata, the data file in Stata format that you created in the previous session.
• Identify the key variables in your data set and set up labels for each of these variables.
• Identify any categorical variables in your data set. Then define and attach value labels that describe the levels of each categorical variable.
• Finally, re-save your data file.
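
The steps above might look like this for a small dataset. This is a sketch under stated assumptions: the toy data, the dataset label, the label for sex and the file name my_labelled_data.dta are all hypothetical, and with your own file you would start with use instead of input.

```stata
* Toy data standing in for your own file (values are hypothetical)
clear
input id sex relcare
1 1 1
2 2 3
3 2 1
end

* Label the dataset and the key variables
label data "Toy household data (illustration only)"
label variable sex "Sex of child"
label variable relcare "What is your relationship to child?"

* Two steps for value labels: define the label, then attach it
label define sexlabel 1 "male" 2 "female"
label values sex sexlabel

* Check the result
describe
tabulate sex

* Re-save; replace overwrites an existing file of the same name
save "my_labelled_data.dta", replace
```

Running describe and tabulate before saving confirms that the variable labels and value labels were attached as intended.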