Outline
I. Motivation: Why Statistics
II. Navigating Stata
III. Inputting Data
IV. Perusing the Data
V. Modifying and Analyzing the Data
VI. Merging Data
VII. Tips
I.
Motivation: Why Statistics
Introduction to Stata Workshop
Brian Fried, Spring 2010
II.
Navigating Stata
The Windows version of Stata opens with four windows and a toolbar.
Stata provides menu choices and buttons for almost any task, but most users eventually prefer to type commands in the Command window.
Automatically displayed windows: When you start Stata, the following windows open:
o Command window: The Command window at the bottom of the screen is where you type commands. Execute them with the Enter key. This window is not available while the Data Editor is open.
o Results window: The black window labeled Results displays output from commands. The Results window is for screen display only; its contents cannot be edited, but they can be directed to a log file.
o Review window: The Review window displays previously entered commands. To re-run a command without typing it again, click it; it will appear in the Command window, and pressing Enter executes it. You can also double-click a command in the Review window to run it right away. The contents of the Review window can also be saved to a file, edited, and used as a do-file: right-click in the Review window and select Save All…
o Variables window: The Variables window lists the variables in the currently active dataset. Clicking a variable name in the Variables window inserts that name into the Command window.
Other windows, accessible through the Window menu or by clicking the respective toolbar button:
o Data Editor: allows entry or correction of data (use cautiously; you will, however, be asked to confirm changes when you close the window)
o Data Browser: allows viewing but not editing of data (safer than the Data Editor)
o Do-File Editor: allows editing or creating lists of commands in do-files
o Viewer window: allows viewing (not editing) log files and help files. Log files are text versions of the Results window suitable for editing in a text editor.
III.
Inputting Data
As mentioned, one can enter data by hand in the Data Editor. Why is this usually a bad idea?
If you have a Stata dataset, you can:
o Select File > Open…, then browse to and select the file.
o In the Command window, type use c:\user\olddata to open the existing Stata file ‘olddata.dta’ saved in ‘C:\user\’.
The command is use, followed by the full path and file name. If the path or file name includes any spaces (as in ‘C:\Documents and Settings\’), Stata requires double quotes around the path and file name, e.g. use "C:\Documents and Settings\olddata".
Let’s practice. In the Command window, type:
clear
use t:\stataclass\sample.dta
browse
If you have data in another format, you can use Stat/Transfer to convert it into Stata’s format.
You can also import some files directly into Stata. For example, if you are starting with an Excel file, the best option is to save it as a .csv (comma-delimited) file and import that file directly into Stata.
Let’s practice. In the Command window, type clear, then:
Select File > Import > ASCII data created by a spreadsheet
Select “comma-delimited” and browse until you find T:\StataClass\sample.txt
You can also type in the command that appears on the screen.
IV.
Perusing the Data
Descriptive statistics: It’s good to begin by getting a feel for the data. Try out the following commands and note in the available space what they produce.
codebook
describe
list
o You can modify the list command to return a subset of the data. Try:
list write read
list write read in 10/20
list write read if read <= 60
summarize write read
tabulate science
StatLab: Introduction to Stata Workshop 2/05/2010 2
tab schtyp science
o Note: many commands can be abbreviated (tab is short for tabulate).
sort prgtype
by prgtype: summarize math
o Note: some commands require that the data be sorted first.
V.
Modifying and Analyzing the Data
Before you do any analysis OPEN A LOG FILE
Option 1: Click the Log button on the toolbar to open a dialog box and begin a log file, or select File > Log > Begin.
Close it with File > Log > Close.
Option 2: In the Command window, type log using c:\temp\mylogname
Close it by typing log close
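A minimal sketch of Option 2 as it might look in a do-file, assuming the folder C:\temp exists. The text option asks for a plain-text .log file rather than Stata's default SMCL format, and replace overwrites a log left over from an earlier session:

```stata
* Close any log left open by an earlier run; capture ignores the error if none is open
capture log close
* Start a plain-text log at C:\temp\mylogname.log, overwriting any older copy
log using c:\temp\mylogname, text replace
* Commands and their output from here on are copied into the log
display "Session logged at " c(current_time)
log close
```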
Basic syntax of a Stata command
COMMAND variables restrictions, options
For example: regress y x1 x2 x3 if x4==2, noconstant
In Stata manuals, command syntax is written as follows:
regress depvar [indepvars] [if] [in] [weight] [, options]
Text within brackets [] is an optional restriction or option.
In the example above, we use the command regress, the dependent variable y, the list of independent variables x1 x2 x3, the optional if restriction x4==2, and the option noconstant.
Underlined sections indicate acceptable abbreviations (e.g., reg instead of regress).
Note: Stata is case sensitive.
To add variables to the command line, you may either type the variable name or click on the variable name in the Variables window.
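The same syntax pattern can be tried on auto.dta, the example dataset that ships with Stata, so price, mpg, weight, and foreign below are its variables, not the workshop's. The sketch mirrors regress y x1 x2 x3 if x4==2, noconstant and then uses the reg abbreviation with an in range:

```stata
* Load Stata's bundled example data, replacing any data in memory
sysuse auto, clear
* command, dependent variable, independent variables, if restriction, then the option after the comma
regress price mpg weight if foreign==1, noconstant
* reg abbreviates regress; in 1/50 restricts the estimation to observations 1 through 50
reg price mpg weight in 1/50
```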
Modify Data
Labeling Variables (can also be done in a do file)
Type codebook schtyp
label variable schtyp "The type of school the student attended."
label define scl 1 public 2 private
label values schtyp scl
Type codebook schtyp
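A self-contained sketch of the labeling steps, run as a do-file on three hypothetical students typed in with input (the values are made up). The final display line reads back the name of the attached value label and the text for code 2, a reminder that the stored values stay 1 and 2 and only their display changes:

```stata
clear
* Hypothetical toy data: schtyp is coded 1 or 2
input id schtyp
1 1
2 2
3 1
end
label variable schtyp "The type of school the student attended."
label define scl 1 public 2 private
label values schtyp scl
* tabulate now shows public/private; add the nolabel option to see the codes 1 and 2
tabulate schtyp
* Extended macro functions: name of the value label on schtyp, and the label for code 2
display "`: value label schtyp'" " / " "`: label scl 2'"
```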
Recoding values of variables (can also be done in a do file)
tab race
recode race 5 = .
tab race
Alternatively, use replace race = . if race == 5 (a comparison in an if condition takes a double equals sign)
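A small sketch of the recoding step on hypothetical data in which the code 5 marks an invalid entry (the values are made up). The rule is written in recode's parenthesized form, and the replace version is shown as a comment:

```stata
clear
* Hypothetical toy data in which race code 5 should be treated as missing
input id race
1 1
2 5
3 4
4 5
end
* Map the value 5 to missing (.)
recode race (5 = .)
* Equivalent alternative: replace race = . if race == 5
* The missing option makes tabulate list the missing values as their own row
tabulate race, missing
```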
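Dropping variables (be careful: this cannot be undone, which is one reason do-files help!)
keep varlist: keeps only the listed variables and drops all others
drop varlist: drops the listed variables
(Both commands also accept an if or in restriction, e.g. drop if read < 30, to keep or drop observations rather than variables.)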
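Because keep and drop cannot be undone, one safety net (not covered above) is Stata's preserve and restore pair: preserve takes a snapshot of the data in memory and restore brings it back. A sketch on hypothetical toy data, run as a do-file:

```stata
clear
* Hypothetical toy data
input id read write math
1 57 52 41
2 68 59 53
3 44 33 54
end
* Take a snapshot of the data so the changes below can be reversed
preserve
* Drop one variable, then keep only the observations with read of at least 50
drop math
keep if read >= 50
list
* Return to the snapshot: math and all three observations come back
restore
```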
Generate new variables
generate total = math + science + write
generate tot_sq = total^2
generate lowmath = math<=52
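A sketch of the three generate commands on three hypothetical students (made-up scores), assuming tot_sq is meant as the square of total. The comment notes a common pitfall with the lowmath indicator:

```stata
clear
* Hypothetical scores for three students
input math science write
41 47 52
53 63 59
54 58 33
end
generate total  = math + science + write
generate tot_sq = total^2
* A comparison evaluates to 1 (true) or 0 (false)
generate lowmath = math <= 52
* Caution: Stata treats missing values as larger than any number, so a missing math
* score would give lowmath = 0; adding "if !missing(math)" leaves lowmath missing instead
list
```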
To use Stata’s extended functions, such as min(), max(), mean(), and tag(), use the egen command rather than generate.
egen rmean = mean(read), by(ses)
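A sketch of a few egen functions on hypothetical reading scores in two ses groups (made-up values). It uses the bysort ses: prefix, which computes each statistic within ses groups as the by(ses) option above does; tag() is written without by because it does not accept that prefix:

```stata
clear
* Hypothetical reading scores in two ses groups
input ses read
1 40
1 50
2 60
2 70
2 80
end
* Group mean and maximum of read within each ses group
bysort ses: egen rmean = mean(read)
bysort ses: egen rmax  = max(read)
* tag() marks exactly one observation per ses group with a 1, all others with 0
egen first = tag(ses)
list
```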
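Analyze
T-tests
ttest science==math: paired t-test of whether the mean of science equals the mean of math (that is, whether the mean paired difference is zero)
ttest science, by(gender): two-sample t-test, with pooled (equal) variances, of whether mean science scores differ between the two gender groups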
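A sketch of both tests on six hypothetical students (the scores are made up), run as a do-file. It adds the unequal option, which drops the equal-variance assumption, and displays r(p), the two-sided p-value Stata stores after the most recent ttest:

```stata
clear
* Hypothetical paired scores for six students in two gender groups
input id gender science math
1 1 47 41
2 1 63 53
3 1 58 54
4 2 53 47
5 2 53 57
6 2 63 51
end
* Paired t-test: is the mean of (science - math) zero?
ttest science == math
* Two-sample t-test assuming equal (pooled) variances
ttest science, by(gender)
* The unequal option drops the equal-variance assumption (Satterthwaite's degrees of freedom)
ttest science, by(gender) unequal
* r(p) holds the two-sided p-value of the most recent test
display "two-sided p = " r(p)
```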
Regression (What is a regression?)
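regress write read gender: OLS regression of write on read and gender
xi: regress write read gender i.prgtype: the xi: prefix turns i.prgtype into dummy (indicator) variables for the program types, omitting one as the base category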
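A sketch comparing the xi: form above with the factor-variable syntax introduced in Stata 11, on ten hypothetical students (made-up values). Here prgtype is typed in as text to show one assumption of the newer syntax: i. needs a numeric variable, so encode first creates a numeric copy, coding the program names in alphabetical order:

```stata
clear
* Hypothetical toy data; prgtype is stored as text for this illustration
input write read gender str10 prgtype
52 57 1 general
59 68 2 academic
33 44 1 vocational
44 63 2 academic
52 47 1 general
52 44 2 vocational
59 50 1 academic
46 34 2 general
57 63 1 academic
55 57 2 vocational
end
* Older syntax, as in the text: xi creates _I-prefixed dummy variables itself
xi: regress write read gender i.prgtype
* Stata 11 or later: factor variables need a numeric variable, so encode the text first
encode prgtype, generate(prog)
regress write read gender i.prog
```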
Graphs
o The easiest way to make graphs is through the menus.
For example, go to Graphics > Histogram. Choose science for your histogram in the ‘Main’ tab, then click the Frequency toggle button under ‘Y axis’.
The graph will display in a separate window. The Stata command for the graph you created will appear in the Review window, in this case histogram science, frequency.
Some basic graph commands are listed below:
scatter write read, title(Write / Read)
graph box write
twoway lfit read write
graph box varname1 varname2 varname3
twoway (scatter varname1 varname2) (lfitci varname1 varname2), by(varname3)
graph matrix varname1 varname2 varname3
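To see the placeholder pattern twoway (scatter varname1 varname2) (lfitci varname1 varname2), by(varname3) with real variables, here is a sketch on Stata's bundled auto.dta (not the workshop data), assuming C:\temp exists. It also saves the graph to a file with graph export:

```stata
* Stata's bundled example data (not the workshop data)
sysuse auto, clear
* Scatter plot plus a fitted line with confidence band, one panel per value of foreign
twoway (scatter mpg weight) (lfitci mpg weight), by(foreign)
* Save the graph currently in memory as a PNG file, overwriting any older copy
graph export c:\temp\mpg_weight.png, replace
```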
For examples of graphs and the code that produced them, see:
http://www.ats.ucla.edu/stat/stata/topics/graphics.htm
http://www.ats.ucla.edu/stat/Stata/examples/rwg/rwgstata5/rwgstata5.htm
http://survey-design.com.au/tips.html
VI.
Merging Data
Stata provides three different commands for merging datasets together: append, joinby, and merge.
o append: adds records (observations) to a data file. For example, if you had collected identical information on students from two different high schools and had originally placed the information in two different Stata data files, you could use append to combine them into one large dataset with records from both schools.
o joinby: links group attributes to individual records. For example, if you had collected information on high schools (average GPA, funding, etc.) in one Stata file and information on individual students in another, you could use joinby to attach each school’s attributes to every student observation from that school.
o merge: adds variables to the current data file. For example, if you had two different sources of information on high school students, such as a data file of school records and a data file of survey answers, you could use merge to create a file that combines, for each student, his or her academic performance and survey answers. This merge requires a unique identifier for each case (for example, student ID), and the identifier variable must have the same name and coding in both files. Identifiers may be created from a combination of variables (such as first and last name); however, make sure these combinations are indeed unique.
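A sketch of all three commands on tiny hypothetical files (the IDs, scores, survey answers, and funding figures are made up). It assumes Stata 11 or later, whose merge 1:1 syntax differs from earlier releases, and it must be run all at once as a do-file because tempfile names are local macros:

```stata
* School A students (hypothetical)
clear
input id schid read
1 10 57
2 10 68
3 20 44
end
tempfile schoolA
save `schoolA'
* School B students with the same variables (hypothetical)
clear
input id schid read
4 20 63
5 10 47
end
tempfile schoolB
save `schoolB'
* Survey answers keyed by the same student id (hypothetical)
clear
input id likes_school
1 1
2 0
3 1
4 1
5 0
end
tempfile survey
save `survey'
* School-level attributes keyed by schid (hypothetical)
clear
input schid funding
10 500
20 350
end
tempfile schools
save `schools'
* append: stack the two schools' records into one dataset
use `schoolA', clear
append using `schoolB'
* merge: add the survey variables, matching exactly one record per id
merge 1:1 id using `survey'
assert _merge == 3        // every student was found in both files
drop _merge
* joinby: attach each school's attributes to all of its students
joinby schid using `schools'
list
```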
VII.
Tips
Use do-files! (Replication)
Commands can be stored in do-files (.do extension). These files are useful both as records of data manipulation and estimation and for replicating the analysis on other datasets.
The Review window stores, in order, all commands entered in the Command window. You can save the contents of the Review window as a .do file by right-clicking in the Review window, choosing ‘Save All…’, and following the prompts. This gives you a record of all commands used during that session.
This file can be edited in the Do-File Editor, available by clicking its icon on the toolbar or through the menus: Window > Do-File Editor.
You can also edit a .do file in any text editor (Notepad, etc.; a word processor such as Word must save it as plain text).
Commands can also be typed directly into the Do-File Editor.
Running do-files
Use the buttons on the Do-File Editor’s toolbar. Select either:
Run: runs the file quietly, i.e., without displaying output
Do: runs the file with output displayed in the Results window
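A minimal do-file skeleton, as a sketch; the file name analysis.do and the folder C:\temp are hypothetical, and it loads Stata's bundled auto.dta so it runs anywhere. The version line asks later releases of Stata to interpret the commands as Stata 10 did, which helps replication:

```stata
* analysis.do: a minimal do-file skeleton (file and folder names are hypothetical)
version 10
* Close any open log, then start a fresh plain-text one
capture log close
log using c:\temp\analysis, text replace
* Load Stata's bundled example data; swap in use t:\stataclass\sample.dta, clear for the class data
sysuse auto, clear
summarize price mpg
log close
```

Saved as c:\temp\analysis.do, the file can also be run from the Command window with do c:\temp\analysis.do, or with run c:\temp\analysis.do to suppress the output, matching the Do and Run buttons.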
Saving your data in Stata format
o File > Save opens a dialog box to save the currently active dataset.
o save c:\temp\myfile saves the active data into a new Stata dataset, myfile.dta, located in C:\temp\.
If a file named myfile.dta already exists in that directory, Stata will display an error message and the file will not be saved. To replace the file, add replace to the end of the command, as follows: save c:\temp\myfile, replace (Don’t forget the comma.)
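A sketch of the error and the fix, assuming C:\temp exists and using Stata's bundled auto.dta as stand-in data. capture noisily shows the "file already exists" message (return code 602) but lets the do-file keep going:

```stata
* Stata's bundled example data (not the workshop data)
sysuse auto, clear
* Create (or overwrite) C:\temp\myfile.dta
save c:\temp\myfile, replace
* Without replace, saving again fails because the file already exists
capture noisily save c:\temp\myfile
display "return code: " _rc
* With replace, the existing file is overwritten
save c:\temp\myfile, replace
```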
Stata help; Statlab FAQ
More tips:
Help is available by clicking Help > Search in the menu. Type in keywords for a specific search.
Click Help > Contents to browse topics. Alternatively, type help followed by a keyword or command name in the Command window (e.g., help graph) to open the Viewer window with the selected help file.
For FAQs, Help, and this workshop, see the StatLab software help page http://statlab.stat.yale.edu/statlab/software.html
Stata and other statistics and econometrics books are available at the StatLab.
How do I paste Stata tables from the Results window into Word?
Highlight your table in the Stata Results window. Right-click and select either "Copy" or "Copy Table." Go to your Word document, right-click where you want the table, and select "Paste." To line up the columns, highlight the table and switch the font to Courier New, size 9 or 9.5.
How do I paste Stata tables from the Results window into Excel?
A simple copy-and-paste will yield an unreadable table in which each row is a single cell in Excel. To get a table with one entry per cell, do the following:
o Highlight the table in the Results window. Highlight only one table, include all text on every row of the table, and do not include any rows above or below the table. Stata is very picky about this.
o Right-click the highlighted table and select "Copy Table." Do not use the Ctrl-C shortcut here.
o Right-click the cell in Excel where you want the data, and select "Paste." This should yield a table with one number per cell. If not, make sure you have highlighted every bit of the table, not a space more and not a space less.