STATA Tutorial

LEARNING STATA
KEY COMMANDS
1) From the Start menu, launch the STATA 9 program (Intercooled Stata 9).
2) use \statafiles\caschool.dta
Datasets in STATA are called dta files. Create a folder called statafiles on the C: drive and
download the dataset caschool.dta from the class web site
http://www.econ.iastate.edu/classes/econ371/McPhail/lab.html to this folder. The
above command then loads the dataset into STATA’s memory.
3) describe
This command tells STATA to “describe” the dataset. This command produces a list of
the variable names and any variable descriptions stored in the dataset.
4) generate income = avginc*1000
The command tells STATA to create a new variable called income. The new variable is
constructed by multiplying the variable avginc by 1000. The variable avginc is
contained in the dataset and is the average household income in a school district
expressed in thousands of dollars. The new variable income will be the average
household income expressed in dollars instead of thousands of dollars.
5) summarize income
This command tells STATA to compute summary statistics for income: the number of
observations, the mean, the standard deviation, and the minimum and maximum values.
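To make the summarize output concrete, here is a minimal Python sketch of the statistics it reports, applied to a few made-up values of avginc rescaled the same way as in step 4. The helper name summarize_values and the data are hypothetical (not part of STATA or caschool.dta), and the sketch assumes the standard deviation uses the sample divisor n - 1.

```python
import math


def summarize_values(values):
    """Return (obs, mean, std_dev, min, max) for a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation: squared deviations divided by n - 1
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return n, mean, math.sqrt(variance), min(values), max(values)


# Hypothetical avginc values (thousands of dollars) for four districts
avginc = [12.0, 15.0, 18.0, 21.0]
# Same transformation as: generate income = avginc*1000
income = [a * 1000 for a in avginc]
obs, mean, sd, lowest, highest = summarize_values(income)
print(f"Obs={obs}  Mean={mean:.2f}  Std. Dev.={sd:.2f}  Min={lowest}  Max={highest}")
```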
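6) clear
This command erases any data already in STATA’s memory, including caschool.dta. If you
run it at this point, reload the dataset with the use command from step 2 before running
the commands below.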
7) correlate str testscr
Two of the variables in the dataset are testscr (the average test score in a school district)
and str (the district’s average class size or student-teacher ratio). The above command
tells STATA to compute the correlation between str and testscr.
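The number that correlate reports is the Pearson correlation coefficient. A minimal Python sketch of that formula follows. The function name pearson_corr and the five districts' numbers are hypothetical, and str is renamed str_ratio because str is a built-in name in Python.

```python
import math


def pearson_corr(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # Cross-products and sums of squares of deviations from the means
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical student-teacher ratios and test scores for five districts
str_ratio = [17.0, 18.5, 20.0, 21.5, 23.0]
testscr = [665.0, 660.0, 655.0, 652.0, 645.0]
r = pearson_corr(str_ratio, testscr)
print(f"corr(str, testscr) = {r:.3f}")
```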
8) scatter testscr str
This command generates a scatter plot of testscr versus str.
9) regress testscr str
This command tells STATA to run an OLS regression with testscr as the dependent
variable and str as the explanatory variable. Note that the first variable appearing after
the regress command is always the dependent variable to be explained, while the
variables following it will be the explanatory variables.
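With a single explanatory variable, the OLS estimates reported by regress come from the usual formulas: slope = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2) and intercept = mean(y) - slope * mean(x). The Python sketch below applies them to hypothetical data. The function name ols_fit is made up, and its argument order (dependent variable first) simply mirrors regress testscr str.

```python
def ols_fit(y, x):
    """Simple OLS of y on x with an intercept; returns (intercept, slope)."""
    n = len(y)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


# Hypothetical data; dependent variable first, as in "regress testscr str"
testscr = [665.0, 660.0, 655.0, 652.0, 645.0]
str_ratio = [17.0, 18.5, 20.0, 21.5, 23.0]
b0, b1 = ols_fit(testscr, str_ratio)
print(f"testscr = {b0:.2f} + ({b1:.2f}) * str")
```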
HYPOTHESIS TESTING
We are now interested in constructing tests and confidence intervals for the mean of a
population or the difference between the means of two different populations.
10) ttest testscr=0
This command computes the sample mean and standard deviation of the variable testscr,
computes a t-test that the population mean is equal to zero, and computes a 95%
confidence interval for the population mean.
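The t-statistic behind ttest testscr=0 is (sample mean - 0) / (s / sqrt(n)), and the 95% confidence interval is the sample mean plus or minus a t critical value (with n - 1 degrees of freedom) times that standard error. The Python sketch below assumes the critical value is supplied by hand, since Python's standard library has no t-distribution quantile function; STATA looks it up for you. The function name one_sample_t and the data are hypothetical.

```python
import math


def one_sample_t(values, mu0=0.0, t_crit=None):
    """Return (mean, std_dev, std_err, t_stat, ci) for H0: population mean == mu0.

    t_crit is the two-sided critical value of a t distribution with
    n - 1 degrees of freedom; if it is None, no interval is computed.
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    se = sd / math.sqrt(n)  # standard error of the sample mean
    t_stat = (mean - mu0) / se
    ci = None
    if t_crit is not None:
        ci = (mean - t_crit * se, mean + t_crit * se)
    return mean, sd, se, t_stat, ci


# Hypothetical test scores for five districts (4 degrees of freedom)
scores = [650.0, 655.0, 660.0, 665.0, 670.0]
# 2.776 is the 97.5th percentile of a t distribution with 4 degrees of freedom
mean, sd, se, t_stat, ci = one_sample_t(scores, mu0=0.0, t_crit=2.776)
print(f"mean={mean:.2f} se={se:.3f} t={t_stat:.2f} 95% CI=({ci[0]:.2f}, {ci[1]:.2f})")
```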
11) generate testscr_lo = testscr if (str<20)
generate testscr_hi = testscr if (str>=20)
ttest testscr_lo = testscr_hi, unequal unpaired
To test hypotheses regarding the difference between the means of different populations,
we first define two new variables: testscr_lo, which contains the average test scores
for districts with an average class size (str) of less than twenty students, and
testscr_hi, which contains the average test scores for districts with an average class
size of twenty students or more. In each district, the variable whose condition is not
met is set to missing.
The command ttest testscr_lo = testscr_hi, unequal unpaired is then used to test the
hypothesis that testscr_lo and testscr_hi come from populations with the same mean.
That is, the command computes the t-statistic for the null hypothesis that the mean test
score for districts with class sizes of less than twenty students is the same as the mean
test score for districts with class sizes of twenty students or more. The option unpaired
tells STATA to treat the two variables as independent samples, and the option unequal
allows the two populations to have different variances.
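With the unequal option, the t-statistic divides the difference in sample means by sqrt(s_lo^2/n_lo + s_hi^2/n_hi), and by default the degrees of freedom come from Satterthwaite's approximation. The Python sketch below reproduces both pieces for a handful of hypothetical districts, splitting them at str = 20 in the same way as the two generate ... if commands. The function name ttest_unequal and the data are made up for illustration.

```python
import math


def ttest_unequal(a, b):
    """Two independent samples, unequal variances.

    Returns (mean difference, standard error, t statistic, Satterthwaite df).
    """
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((v - ma) ** 2 for v in a) / (na - 1)  # sample variance of a
    vb = sum((v - mb) ** 2 for v in b) / (nb - 1)  # sample variance of b
    qa, qb = va / na, vb / nb  # squared standard errors of each mean
    se = math.sqrt(qa + qb)
    t_stat = (ma - mb) / se
    df = (qa + qb) ** 2 / (qa ** 2 / (na - 1) + qb ** 2 / (nb - 1))
    return ma - mb, se, t_stat, df


# Hypothetical (str, testscr) pairs for six districts
districts = [(17.0, 665.0), (18.5, 660.0), (19.5, 657.0),
             (20.0, 655.0), (21.5, 652.0), (23.0, 645.0)]
# Same split as the generate ... if commands: str < 20 versus str >= 20
testscr_lo = [t for s, t in districts if s < 20]
testscr_hi = [t for s, t in districts if s >= 20]
diff, se_diff, t_stat, df = ttest_unequal(testscr_lo, testscr_hi)
print(f"diff={diff:.2f} se={se_diff:.3f} t={t_stat:.2f} df={df:.2f}")
```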
12) generate d = (str<20)
regress testscr d, robust
The first line creates the binary variable d using the command generate. The variable d
takes a value of 1 if the expression in parentheses is true (that is, when str < 20) and is
equal to 0 if the expression is false.
The second line runs an OLS regression with testscr as the dependent variable and d as
the explanatory variable. The option robust tells STATA to use heteroskedasticity-robust
formulas for the standard errors of the regression coefficient estimators.
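Because d is binary, the OLS slope on d equals the difference between the mean test score of districts with d = 1 and those with d = 0, and the intercept equals the mean for d = 0. The Python sketch below checks this on hypothetical data and also computes a heteroskedasticity-robust standard error for the slope using the HC1 formula (White's estimator scaled by n / (n - 2) in a one-regressor model), which is assumed here to correspond to what the robust option reports. The function name ols_robust and the data are illustrative only.

```python
import math


def ols_robust(y, x):
    """OLS of y on x with an intercept; returns (intercept, slope, robust SE of slope).

    The robust SE is the HC1 form: sum((x - mean(x))^2 * e^2) / sxx^2,
    scaled by n / (n - k) with k = 2 estimated coefficients.
    """
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [b - intercept - slope * a for a, b in zip(x, y)]
    meat = sum(((a - mx) * e) ** 2 for a, e in zip(x, resid))
    var_slope = n / (n - 2) * meat / sxx ** 2
    return intercept, slope, math.sqrt(var_slope)


# Hypothetical (str, testscr) pairs for six districts
districts = [(17.0, 665.0), (18.5, 660.0), (19.5, 657.0),
             (20.0, 655.0), (21.5, 652.0), (23.0, 645.0)]
# Same rule as: generate d = (str<20)
d = [1.0 if s < 20 else 0.0 for s, _ in districts]
testscr = [t for _, t in districts]
b0, b1, se_b1 = ols_robust(testscr, d)
print(f"testscr = {b0:.2f} + {b1:.2f} * d   (robust SE of slope: {se_b1:.3f})")
```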
USING DO AND LOG FILES
LOG FILES
To store the results of your subsequent commands, you can create a “log” file. To do this,
type log using C:\statafiles\stata1.log, which creates a log file named stata1.log.
To view this file, click on “File”, then on “Log”, then on “View” and finally on “Browse”.
You must close your log file before you exit STATA; if you don’t, your results may be
lost. To close the log file, type log close.
DO FILES
The problem with the interactive approach outlined above is that it is cumbersome to
type out the commands over and over again every time you need to use them. It can also
be difficult to fix errors when they occur. These problems can be remedied by using
“do” files. To create such a file, open the Do-file Editor from the toolbar, type in your
commands and then save the file. Suppose the saved file is stata1.do. To execute the file,
click on “File”, then on “Do” and finally pick the required file. The program will then be
executed. If errors occur, the error messages will appear in red. Clear the data from
STATA’s memory (with the clear command), fix the error and re-execute the file.