Getting to Know Stata


Yale University Social Science Statistical Laboratory (StatLab)

Outline

I. Motivation: Why Statistics

II. Navigating Stata

III. Inputting Data

IV. Perusing the Data

V. Modifying and Analyzing the Data

VI. Merging Data

VII. Tips

I. Motivation: Why Statistics

Introduction to Stata Workshop

Brian Fried, Spring 2010

II. Navigating Stata

The Windows version of Stata opens with four windows and a toolbar.

Stata provides menu choices and buttons for almost any task, but most users eventually prefer to type commands in the Command window.

Automatically displayed windows: when you start Stata, the following windows open:

o Command window: The Command window at the bottom of the screen is where you type commands. Press Enter to execute a command. This window is not available while the Data Editor is open.
o Results window: The black window labeled Results displays output from commands. It is for screen display only. Its contents cannot be edited, but they can be directed to a log file.
o Review window: The Review window displays previously entered commands. To re-run a command without retyping it, click it. The command then appears in the Command window, and pressing Enter executes it. You can also double-click a command in the Review window to run it right away. The contents of the Review window can be saved to a file, then edited and used as a do-file: right-click in the Review window and select Save All…
o Variables window: The Variables window lists the variables in the currently active dataset. Clicking a variable name in the Variables window inserts that name into the Command window.

Other windows, accessible through the Window menu or by clicking the corresponding toolbar button:

o Data Editor: allows entry or correction of data. Use it with caution, although Stata asks you to confirm changes when you close the window.
o Data Browser: allows viewing but not editing of data, so it is safer than the Data Editor.
o Do-File Editor: allows editing or creating lists of commands in do-files.
o Viewer: allows viewing (but not editing) log files and help files. Log files record what appears in the Results window. Logs saved in text format can be edited in any text editor.

III. Inputting Data

As mentioned, you can enter data by hand in the Data Editor. Why is this usually a bad idea?

If you have a Stata dataset, you can:

o Select File > Open…, then browse to and select the file.
o In the Command window, type use c:\user\olddata to open the existing Stata file olddata.dta saved in C:\user\.
   - The command is use, followed by the full file path and name. If the path or file name includes any spaces (as in C:\Documents and Settings\), Stata requires double quotes around the path and file name, e.g., use "C:\Documents and Settings\olddata".
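The quoting rule can be illustrated with a minimal sketch. It saves a copy of the auto dataset that ships with Stata under a made-up file name containing spaces (my old data.dta, written to the current working directory) and then opens that file with use. This file name is hypothetical and only stands in for a path such as C:\Documents and Settings\. It is not the workshop's file.

```stata
* Hypothetical demo: save a shipped example dataset under a file name that contains spaces
sysuse auto, clear
save "my old data.dta", replace

* clear empties memory so that use does not refuse because of unsaved changes
clear

* The double quotes are required because the file name contains spaces
use "my old data.dta"
describe, short
```

Writing use "my old data.dta", clear performs the clear and the use in a single step.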

   - Let's practice. In the Command window, type:
        clear
        use t:\stataclass\sample.dta
     Then type browse to view the data in the Data Browser.

If you have data in another format, you can use Stat/Transfer to convert it into Stata's format.

You can also import some files directly into Stata. For example, if you are starting with an Excel file, the best option is to save it as a .csv (comma-delimited) file and import that file directly into Stata.

o Let's practice:
   - In the Command window, type clear
   - Select File > Import > ASCII data created by a spreadsheet
   - Select "comma-delimited" and browse until you find T:\StataClass\sample.txt
   - You can also type the command that Stata displays for this import (it appears in the Review window) directly into the Command window.
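You can run the same import entirely from the Command window or from a do-file. The sketch below assumes a small, made-up comma-delimited file called sample_demo.csv. The code writes this file itself so that it runs anywhere. With the workshop file, you would point insheet at T:\StataClass\sample.txt instead. insheet is the import command of this Stata era. Later versions of Stata also provide import delimited.

```stata
* Write a tiny comma-delimited file so the example is self-contained
* (the file name sample_demo.csv and its values are hypothetical)
file open fh using "sample_demo.csv", write text replace
file write fh "id,read,write" _n
file write fh "1,57,52" _n
file write fh "2,68,59" _n
file close fh

* Import it: comma sets the delimiter, names reads variable names
* from the first line, and clear empties memory first
insheet using "sample_demo.csv", comma names clear
describe
list
```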

IV. Perusing the Data

Descriptive statistics: It’s good to begin by getting a feel for the data. Try out the following commands and note in the available space what they produce.

o codebook

o describe

o list
   - You can modify the list command to return a subset of the data. Try:
        list write read
        list write read in 10/20
        list write read if read <= 60

o summarize write read

o tabulate science

StatLab: Introduction to Stata Workshop 2/05/2010 2

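o tab schtyp science
   - Note: many commands can be abbreviated (tab is short for tabulate).

o sort prgtype
  by prgtype: summarize math
   - Note: some commands, such as the by prefix, require the data to be sorted first. bysort prgtype: summarize math sorts the data and runs the command in one step.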
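You can try these perusing commands on any small dataset. The sketch below builds a hypothetical five-student dataset with input. The values are invented and do not come from sample.dta. It then runs the same commands, including both the sort-then-by form and the one-step bysort form. The in range is 2/4 because the toy dataset has only five observations.

```stata
* Hypothetical toy data standing in for sample.dta (values are made up)
clear
input id read write prgtype
1 57 52 1
2 68 59 2
3 44 33 1
4 63 44 3
5 47 52 2
end

codebook read
describe
list write read in 2/4
list write read if read <= 60
summarize write read
tabulate prgtype

* by needs sorted data...
sort prgtype
by prgtype: summarize write
* ...while bysort sorts and runs the command in one step
bysort prgtype: summarize write
```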

V. Modifying and Analyzing the Data

Before you do any analysis OPEN A LOG FILE

o Option 1: Click the Log button on the toolbar to open a dialog box and begin a log file.
   - Or select File > Log > Begin
   - Close with File > Log > Close

o Option 2: Type in the Command window: log using c:\temp\mylogname
   - By default Stata writes a .smcl log. Add the text option (log using c:\temp\mylogname, text) for a plain-text log.
   - Close by typing log close
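Here is a minimal sketch of Option 2 as it might appear at the top of a do-file. The file name mylogname.log is hypothetical, and the auto dataset shipped with Stata stands in for your own data. The capture log close line only prevents an error if a log is already open.

```stata
* Close any log that is already open (capture ignores the error if none is open)
capture log close

* Start a plain-text log; the file name is a hypothetical example
log using mylogname.log, text replace

sysuse auto, clear
summarize price mpg

log close
```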

o Basic syntax of a Stata command:
     command varlist [restrictions] [, options]
   - For example: regress y x1 x2 x3 if x4==2, noconstant

o In the Stata manuals, command syntax is formatted as follows:
     regress depvar [indepvars] [if] [in] [weight] [, options]
   - Text within brackets [] is optional (restrictions or options).
   - In the example above, we use the command regress, the dependent variable y, the list of independent variables x1 x2 x3, the optional if restriction x4==2, and the option noconstant, which fits the model without an intercept.

Underlined sections of a command name show the shortest acceptable abbreviation (e.g., reg instead of regress).

Note: Stata is case sensitive

To add variables to the command line, you can either type a variable name or click it in the Variables window.
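The sketch below shows the command / varlist / restriction / option pattern on real data. It uses the auto dataset that ships with Stata, so price, mpg, weight, and foreign are that dataset's variables, not the workshop's. It also demonstrates the reg abbreviation and what happens when you type a variable name in the wrong case.

```stata
sysuse auto, clear

* command: regress | depvar: price | indepvars: mpg weight
* restriction: if foreign == 0 | option: noconstant
regress price mpg weight if foreign == 0, noconstant

* reg is the shortest abbreviation of regress (the underlined part in the manual)
reg price mpg weight if foreign == 0, noconstant

* Stata is case sensitive: PRICE is not the variable price, so this command fails
capture noisily summarize PRICE
display "return code: " _rc
```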

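Modify Data

o Labeling variables (can also be done in a do-file)
   - Type codebook schtyp
        label variable schtyp "The type of school the student attended."
        label define scl 1 public 2 private
        label values schtyp scl
   - label define creates a set of value labels named scl, and label values attaches it to schtyp.
   - Type codebook schtyp again to see the labels.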
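You can check the labeling steps on a tiny made-up dataset. The three observations below are hypothetical and exist only to give the commands something to label.

```stata
* Hypothetical toy data: schtyp coded 1 or 2
clear
input id schtyp
1 1
2 2
3 1
end

* Variable label: describes the variable itself
label variable schtyp "The type of school the student attended."

* Value labels: define the mapping once, then attach it to the variable
label define scl 1 public 2 private
label values schtyp scl

codebook schtyp
```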

o Recoding values of variables (can also be done in a do-file)
        tab race
        recode race (5 = .)
        tab race
   - You could also use: replace race = . if race == 5 (note the double equals sign, which tests for equality)
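The minimal sketch below compares the two approaches on a made-up variable. It assumes that code 5 means "unknown". In an if condition, Stata tests equality with ==. A single = is reserved for assignment and is rejected inside if.

```stata
* Hypothetical toy data in which race code 5 is assumed to mean "unknown"
clear
input id race
1 1
2 5
3 4
4 5
end

* Keep an untouched copy so both approaches can be compared
generate race2 = race

* Approach 1: recode sets code 5 to missing (.)
recode race (5 = .)

* Approach 2: replace with an equality test (double equals sign)
replace race2 = . if race2 == 5

tab race, missing
```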


o Dropping variables (be careful: this cannot be undone, which is one reason why do-files help!)
        keep varlist     keeps only the listed variables
        drop varlist     drops the listed variables
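Here is a minimal sketch on the auto dataset shipped with Stata, whose variables stand in for the workshop's. These changes cannot be undone, so the safest approach is to run them from a do-file that starts by loading the original data.

```stata
sysuse auto, clear

* drop removes the listed variables
drop rep78 foreign

* keep retains only the listed variables and drops all others
keep price mpg weight

* drop (and keep) also work on observations when given an if restriction
drop if missing(price)

describe, short
```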

o Generate new variables
        generate total = math + science + write
        generate tot_sq = total^2
        generate lowmath = math<=52
   - The last command creates a 0/1 indicator equal to 1 when math is 52 or less.
   - To use Stata's extended functions, such as mean, min, max, and tag, which work across observations or groups, use the egen command rather than generate:
        egen rmean = mean(read), by(ses)
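The sketch below runs these commands on a hypothetical four-student dataset with invented values. The if !missing(math) qualifier is an addition to the original commands. Stata treats missing values as larger than any number, so without the qualifier a missing math score would be coded 0 in lowmath.

```stata
* Hypothetical toy data (values are made up)
clear
input id math science write read ses
1 41 47 52 57 1
2 53 63 59 68 2
3 54 58 33 44 2
4 47 53 44 63 3
end

generate total = math + science + write
generate tot_sq = total^2

* 0/1 indicator; left missing when math is missing
generate lowmath = math <= 52 if !missing(math)

* egen: mean of read within each ses group, repeated on every row of the group
egen rmean = mean(read), by(ses)

list
```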

Analyze

o T-tests
        ttest science == math          Paired t-test of whether the mean of science equals the mean of math.
        ttest science, by(gender)      Two-sample t-test with pooled (equal) variances; add the unequal option to drop that assumption.

o Regression (What is a regression?)
        regress write read gender                     OLS regression
        xi: regress write read gender i.prgtype       Creates indicator (dummy) variables for each program type except the omitted base category and includes them in the regression.
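This minimal sketch runs the analyses on an invented eight-student dataset. The values are hypothetical and far too few for meaningful inference; they only give the commands something to run on. If prgtype is numeric, Stata 11 and later also accept regress write read gender i.prgtype without the xi: prefix.

```stata
* Hypothetical toy data (values are made up)
clear
input id read write math science gender prgtype
1 57 52 41 47 0 1
2 68 59 53 63 1 2
3 44 33 54 58 0 3
4 63 44 47 53 1 1
5 47 52 57 53 0 2
6 44 52 51 63 1 3
7 50 59 42 53 0 1
8 34 46 45 39 1 2
end

* Paired t-test: same students, two scores
ttest science == math

* Two-sample t-test: pooled variances, then unequal variances
ttest science, by(gender)
ttest science, by(gender) unequal

* OLS regression
regress write read gender

* xi creates _Iprgtype_2 and _Iprgtype_3 (category 1 is the omitted base)
xi: regress write read gender i.prgtype
```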

o Graphs
   - The easiest way to make graphs is through the menus.
   - For example, go to Graphics > Histogram. On the Main tab, choose science as the variable, and select Frequency under Y axis.
   - The graph displays in a separate window, and the Stata command for the graph you created appears in the Review window, in this case: histogram science, frequency

Some basic graph commands are listed below:

scatter write read, title(Write / Read)

graph box write

twoway lfit read write

graph box varname1 varname2 varname3

twoway (scatter varname1 varname2) (lfitci varname1 varname2), by(varname3)

graph matrix varname1 varname2 varname3
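You can also run the same commands from a do-file. In this sketch, variables from the auto dataset shipped with Stata replace varname1, varname2, and varname3. The output file name mymatrix.gph is hypothetical.

```stata
* The auto dataset shipped with Stata stands in for the workshop data
sysuse auto, clear

* Command equivalent of the Graphics > Histogram dialog with Frequency selected
histogram mpg, frequency

scatter price weight, title("Price / Weight")
graph box mpg turn trunk
twoway lfit price weight
twoway (scatter price weight) (lfitci price weight), by(foreign)

* Each new graph replaces the previous one unless you give it a name()
graph matrix price mpg weight, name(pmw, replace)

* Save the named graph to disk in Stata's .gph format
graph save pmw mymatrix.gph, replace
```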

For examples of graphs and the code that produced them, see:

 http://www.ats.ucla.edu/stat/stata/topics/graphics.htm

 http://www.ats.ucla.edu/stat/Stata/examples/rwg/rwgstata5/rwgstata5.htm

 http://survey-design.com.au/tips.html


VI. Merging Data

Stata provides three different commands for combining datasets: append, joinby, and merge.

o append
   Adds records to a data file. For example, suppose you had collected identical information on students from two different high schools and originally placed it in two different Stata data files. You could use append to combine them into one large dataset with records from both schools.

o joinby
   Links group attributes to individual attributes. For example, suppose you had collected information on high schools (average GPA, funding, etc.) in one Stata file and information on individual students in another. You could use joinby to attach each school's attributes to the observation of every student at that school.

o merge
   Adds variables to the current data file. For example, suppose you had two different sources of information on high school students, such as a file of school records and a file of survey answers. You could use merge to create a file that combines, for each student, his or her academic performance and survey answers. This merge requires a unique identifier for each case (for example, student ID), and the identifier must be coded the same way in both files. You can create identifiers from a combination of variables (such as first and last name), but make sure these combinations are actually unique.
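The minimal sketch below demonstrates all three commands. Every file name, variable, and value in it is hypothetical. The small datasets are typed in with input and saved to the current working directory. The merge line uses the merge 1:1 syntax that Stata 11 introduced.

```stata
* All file names, variables, and values below are hypothetical

* --- append: stack the records from two schools ---
clear
input id read
1 57
2 68
end
save school_a, replace

clear
input id read
3 44
4 63
end
save school_b, replace

use school_a, clear
append using school_b
save students, replace

* --- joinby: attach school-level attributes to each student at that school ---
clear
input schoolid avg_gpa
1 3.1
2 2.8
end
save schools, replace

clear
input id schoolid
1 1
2 1
3 2
4 2
end
joinby schoolid using schools
list

* --- merge: add survey answers for the same students, matched on id ---
clear
input id likes_school
1 1
2 0
3 1
4 1
end
save survey, replace

use students, clear
* isid stops with an error if id does not uniquely identify the records
isid id
merge 1:1 id using survey
tabulate _merge
```

A _merge value of 3 marks records found in both files. Values 1 and 2 flag records that exist in only the master file or only the using file.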

VII. Tips

o Use do-files! (Replication)
   - You can store commands in do-files (.do extension). A do-file serves as a record of your data manipulation and estimation steps, and it lets you replicate the analysis, including on other datasets.
   - The Review window stores, in order, all commands entered in the Command window. To save its contents as a .do file, right-click in the Review window, choose Save All…, and follow the prompts. This gives you a record of all commands used during that session.
   - You can edit this file in the Do-File Editor. Open the editor by clicking its icon in the toolbar or through Window > Do-File Editor.
   - You can also edit a .do file with any text editor, such as Notepad. If you use a word processor such as Word, save the file as plain text.
   - You can also type commands directly into the Do-File Editor.

o Running do-files
   - Use the buttons on the toolbar. Select either:
        Run: to run quietly, i.e., without displaying output
        Do: to run with output displayed in the Results window
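Here is a minimal sketch of what a short do-file might contain. The file names mywork.do and mywork.log are hypothetical, and the auto dataset shipped with Stata stands in for your data. The do and run commands correspond to the two toolbar buttons: type do mywork in the Command window to run the file with output, or run mywork to run it silently.

```stata
* mywork.do: a hypothetical do-file; save these lines as plain text with a .do extension
version 11
capture log close
log using mywork.log, text replace

sysuse auto, clear
summarize price mpg

* Data changes are recorded in the do-file, so they can be re-run from scratch
generate price_k = price / 1000
regress price_k mpg weight

log close
```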


o Saving your data in Stata format
   - File > Save opens a dialog box to save the currently active dataset.
   - save c:\temp\myfile saves the active data in a new Stata dataset, myfile.dta, located in C:\temp\.
   - If a file named myfile.dta already exists in that directory, Stata displays an error message and does not save the file. To replace the existing file, add the replace option: save c:\temp\myfile, replace (don't forget the comma).
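This minimal sketch shows the overwrite behavior. It uses the auto dataset shipped with Stata and a hypothetical file myfile.dta, written to the current working directory rather than to C:\temp\.

```stata
sysuse auto, clear

* This save creates myfile.dta (replace only lets the demo be re-run safely)
save myfile, replace

* Saving again without replace stops with an error because myfile.dta already exists
capture noisily save myfile
display "return code: " _rc
```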

o Stata help; StatLab FAQ

More tips:
   - Help is available by clicking Help > Search in the menu. Type in keywords for a specific search.
   - Click Help > Contents to browse topics. Alternatively, type help followed by a keyword or command name in the Command window, e.g., help graph. This opens the Viewer window with the relevant help file.
   - For FAQs, help, and this workshop, see the StatLab software help page http://statlab.stat.yale.edu/statlab/software.html
   - Stata and other statistics and econometrics books are available at the StatLab.

How do I paste Stata tables from the Results window into Word?

Highlight your table in the Results window. Right-click and select either "Copy" or "Copy Table." Go to your Word document, right-click where you want the table, and select "Paste." To line up the columns, highlight the table and switch the font to Courier New, size 9 or 9.5.

How do I paste Stata tables from the Results window into Excel?

A simple copy-and-paste yields an unreadable table in which each row is a single cell in Excel. To get a table with one entry per cell, do the following:
o Highlight the table in the Results window. Highlight only one table, include all text on every row of the table, and do not include any rows above or below the table. Stata is very picky about this.
o Right-click the highlighted table and select "Copy Table." Do not use the Ctrl-C shortcut here.
o Right-click the cell in Excel where you want the data, and select "Paste." This should give you a table with one number per cell. If it does not, make sure you have highlighted the entire table, not a space more and not a space less.
