Getting Started With Stata
Session 1

Jim Anthony
John Troost
Department of Epidemiology
Michigan State University

Windowing and the Edit submenus

The Review window displays a record of the commands implied by your changes in the Data Editor. You can save these commands so that you do not have to enter the basic data structure next time.

VARIABLES window, with variables being created

COMMAND window

The ‘Log’ or ‘Output’ window echoes back each command and its result. You’ll learn to save a log file that you can use to document your work, copy and paste tables into emails, or print.

Enter the 1 0 1 0 sequence in the first four rows of var1 as shown here, and then click on the first row of the second column. In that cell of the dataset, enter 1 as shown below.

Repeat that process for the var2 variable by double-clicking on ‘var2’ at the top of the second column of values, and make the changes as shown below in order to label the unprotected sex variable. This variable is also a ‘dummy-coded’ exposure variable, which means that 1 is the code for exposed and 0 is the code for unexposed. In the jargon of our field, any 0/1 coded variable is a ‘dummy-coded’ variable.

Question for you: Is the aids variable a ‘dummy-coded’ variable even though it has to do with ‘case’ status and not with ‘exposure’ status? Think about it. The answer is on the next slide.

Yes, the aids variable is a dummy-coded variable as well because it has the 0/1 coding scheme. ANY 0/1 variable might be thought of as a dummy-coded variable, whether it applies to case status, exposure status, or any other kind of variable. A dummy-coded variable is always a ‘binary’ or ‘dichotomous’ variable.

I think it is helpful to reserve the term ‘dummy-coded’ for variables that are ‘nominal’ in their level of measurement. This concept of ‘level of measurement’ is an important one. Nominal variables are at a very low level of measurement.
The values are names, and the 0/1 coding may have nothing to do with units of measurement, unlike the values of an ‘ordinal’ variable, which convey rank. Class rank is an example of an ordinal variable. The best student might get a class rank value of 1 (first in the class; best grade). The next best student would get a class rank value of 2 (second in class; next best score), and so on. The integers convey the ‘distance’ or ranking of each student in relation to an underlying scale, and we can interpret the meaning of a unit change in the class rank variable. In this sense, we can look across the values of a nominal variable, but we cannot compare levels. Nominal variables can reveal group differences, but not levels of the variable.

Actually, you can leave this at float type and change the format to %4.0f, which gives a bit more generality. But name the variable struc1 (your first data structure).

The data structure you just created corresponds to a null association between aids and unprotected sex. One way to think about this data structure is that it is the kind of structure we might generate if we had flipped two fair heads/tails coins for each of the 400 people, and then used the pattern of heads and tails to place each person into a case-exposure cell of the table.

Suppose the laws of chance worked exactly as they should for these 400 people, and we flip Coin #1 and then Coin #2 and look at the combinations of heads and tails on the two paired coins. How many combinations of each type should we see? How many ‘head-head’ combinations? How many ‘tail-tail’ combinations? How many ‘head-tail’ combinations? And how many ‘tail-head’ combinations?

Hint: The chance of a ‘head-head’ combination = the chance of a ‘tail-tail’ combination = the chance of a ‘head-tail’ combination = the chance of a ‘tail-head’ combination.
ANSWER IS ON THE NEXT PAGE

If the laws of chance work exactly as they should for these 400 people, then how many of each paired coin-flip combination should be generated? 100 of each combination type, because each of the four combinations has a chance of 1/2 x 1/2 = 1/4, and 400 x 1/4 = 100.

This is the data structure we just created using the Stata Data Editor, for an initial look at the association between being a case of AIDS and prior unprotected sex exposure.

Here we leave it as a float variable, but change the format to: %4.0f. Name it struc2.

When you go back to the main Stata windows after closing the Data Editor window, you will find that Stata has kept a record of all your commands in the REVIEW window. You can save them and study the program syntax later. The next slide explains how to do it.

Your VARIABLES window now has a list of all the variables you created in the dataset, which you can save.

The BIG window is the OUTPUT or LOG window, and it shows your commands and their execution results.

You probably won’t see anything in the COMMAND window at this time.

To save your commands for later inspection, move your cursor into the Review window and right-click. Slide your cursor to ‘SAVE ALL’ and you will be prompted for a location and file name where you can save them. By tradition, the extension for Stata syntax files is ‘.do’ because these are ‘do’ files. Save them with an informative name, such as ‘build26feb11.do’, so that you can remember how to ‘build’ a dataset from scratch using this file. We can go over the other options later.

Sometimes, however, if you have issued some incorrect commands, you may want to separate the correct commands from the incorrect ones before you save your program syntax for later use. Incorrect commands show up in the Review window in red font.
To partition the incorrect ones (if you have made any mistakes in issuing commands), slide the Review window to the right (red arrow at bottom of the snapshot), and then click on the _rc letters printed at the top of the Review window. Now you can either select only the correct commands and save the selected ones, using the menu from the last slide, or save ALL of the commands, sorted by whether they ran correctly.

Now, let’s have you look at the data structures you built, using a basic tabulate command, which Stata abbreviates as: tab

Start by typing tab in the COMMAND window, and follow the instructions below, step by step. Then go to the next slide.

The result should look like what you see down below. Now, position your cursor in the COMMAND window to the right of the word ‘aids’ as shown above. Use the space bar to add a space and enter this phrase:

[fweight=struc1]

Then press the ENTER key. This command applies the ‘struc1’ values as ‘frequency weights’ and builds the 2x2 aids-by-u_sex table, as shown in the log window (next slide).

The result should look like what you see down below, but the command line should be empty. Look at the table before going to the bottom of this slide.

Now, let’s apply the struc2 weights and see the positive association table. Do this by pressing the PgUp key to retrieve your just-issued command, and change struc1 to struc2. (You can just change the number. No need to type the entire word again.) Press ENTER to issue this command.
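For readers who prefer typing commands to clicking, the data entry and tabulation steps above can be sketched as a short do-file. This is a minimal sketch, not the authors' own file. It assumes the two dummy-coded variables are named aids and u_sex, that the four rows are entered in the order shown in the comments, and that u_sex is the row variable in the tab command. The struc1 counts of 100 per cell come from the null-association structure described earlier. struc2 is left out because its cell counts appear only in the slide images.

```stata
* Minimal sketch: rebuild the four-row dataset and the null-association table.
* ASSUMPTIONS: variable names aids and u_sex, the row order below, and the
* order of the variables in the tabulate command are illustrative.
clear

* One row per case-exposure cell; struc1 holds the number of people per cell.
* input stores numeric variables as float by default.
input aids u_sex struc1
1 1 100
0 1 100
1 0 100
0 0 100
end

* Show the counts without decimal places, as suggested for the Data Editor.
format struc1 %4.0f

label variable aids  "AIDS case status (1 = case, 0 = non-case)"
label variable u_sex "Unprotected sex (1 = exposed, 0 = unexposed)"

* Frequency weights make each row count as struc1 people.
tabulate u_sex aids [fweight=struc1]
```

Swapping struc1 for struc2 in the last line would give the second table, once a struc2 variable with the counts from the slides has been added.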
ANNOTATING THE OUTPUT WITH COMMENTS

SAVING THE DATASET

SAVING THE OUTPUT IN A LOG FILE YOU CAN EDIT WITH NOTEPAD

SAVING THE COMMAND FILE YOU CAN EDIT WITH NOTEPAD
SEE SLIDES 24-25, AS SHOWN BEFORE

Second Part of Session 1
• As a work group or on your own, view the UCLA Stata introductory streaming video on other ways to bring data into the Stata environment (e.g., if you have an .xls spreadsheet version of the data): http://www.ats.ucla.edu/stat/stata/notes_old/movies/IntroStata1.html
• This video also teaches some nifty Stata tricks about describing datasets, etc. Information about importing SPSS and SAS files into Stata can be found here: http://www.ats.ucla.edu/stat/stata/faq/convert_pkg.htm
• Other Stata aids at the UCLA site are here: http://www.ats.ucla.edu/stat/stata/

Session 2 Overview
• An overview of the Stata epitab commands will be provided.
• The ‘immediate’ commands will be covered in detail: http://www.stata.com/help.cgi?epitab

In advance of Session 2, read Chapter 1 (3 pages) of this online text if you are new to epidemiology or need a quick refresher overview. http://www.epi.msu.edu/janthony/Epidemiologic%20Analysis%20with%20a%20Programmable%20Calculator.pdf

End of Session 1
A copy of this PPT and an annotated Stata do-file with these commands can be found at the following URL: http://www.epi.msu.edu/janthony/stata/session1/
Try the .zip file if you cannot access the individual files.
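The authors' annotated do-file is not reproduced here. As a rough idea of what the annotating, logging, and saving steps from this session might look like in command form, here is a minimal sketch. The file names session1_sketch.log and session1_sketch.dta are hypothetical, and the variable names and row order are assumptions rather than details confirmed by the slides.

```stata
* Lines that begin with an asterisk are comments: Stata echoes them in the
* log but does not execute them, so they are a simple way to annotate output.

capture log close                            // ignore the error if no log is open
log using session1_sketch.log, text replace  // hypothetical name; text logs open in Notepad

* Rebuild the null-association data structure (assumed names and row order).
clear
input aids u_sex struc1
1 1 100
0 1 100
1 0 100
0 0 100
end

* Annotated result: 2x2 table under the null association.
tabulate u_sex aids [fweight=struc1]

save session1_sketch.dta, replace            // hypothetical name for the dataset
log close
```

The // comments work when the file is run as a do-file (for example with do followed by the file name), but not when a line is typed directly into the Command window.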