Econometrics Stata Good introduction

Sacha Kapoor - Masters Metrics
Address: Max Gluskin House, 150 St.George, Rm 329
Email: [email protected]
Where do you get data? Erasmus has a data service center (the EDSC), which gives students access to various
data sets. Its website is found here:
The EDSC is helpful both for downloading data and for working out which data you need. You can contact
the EDSC data team for help with your data needs.
There are many different types of data:
• Financial markets data:
– CRSP Database - access NYSE/AMEX/Nasdaq daily and monthly security prices and other
historical data related to over 20,000 companies
– Canadian Financial Markets Research Centre - Toronto Stock Exchange trading info about specific securities
– Fundata Mutual Fund Database
• Company financial data:
– Financial Post Corporate Database
– COMPUSTAT Database - Income Statement, Balance Sheet, Flow of Funds, and supplemental
data items on more than 10,000 active and 9,400 inactive companies
• National income statistics:
– OECD National Accounts Database
– World Bank databases
– Penn World Tables
It is now common for economics journals to post the data used in the articles they publish. You should make
use of these websites. They are a great source of data that you can use in your thesis. For example, the
American Economic Association publishes several journals that have empirical articles. Visit its website, click on Journals, then on American Economic Review or American Economic
Journal: Economic Policy or American Economic Journal: Applied Economics. Look through the papers
in these journals. Many will have a data folder attached to the paper. In the data folder you can find a
Readme file. That file should tell you more about the availability of the data. If the data is unavailable,
the Readme file will tell you. Otherwise you can assume that it is available.
What is Stata?
• A high level general purpose statistical software package (built on a C environment), with lots of
built in functions.
– Caveat: Functions are not substitutes for understanding.
• 3 ways to use Stata:
– Interactively, through the command prompt (enter the commands one by one).
– Batch files, by collecting commands and running them all at once.
– Point and Click.
How to collect commands? Use a do file.
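As a rough illustration of what a do file can look like, here is a minimal sketch. The file name example.do and the use of the auto dataset that ships with Stata are assumptions made for the example; they are not part of this tutorial's data.

```stata
* example.do (hypothetical file name); run it with: do example.do
* Interpret the commands below as Stata 11 would
version 11
* Start with no data in memory
clear
* Do not pause the output at --more--
set more off
* Load an example dataset that ships with Stata (not the tutorial data)
sysuse auto
summarize price mpg
```

Keeping every command in a do file means the whole analysis can be rerun from scratch with a single command.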
To track results/output you should use a log file:
cd ../../../../Documents/TA/2010-2011/Masters_Metrics
log using "tutorial_091610.log", replace
where the first command changes the working directory to the data location and the second command
opens the log file. To examine the current working directory:
pwd
To import comma delimited data (.csv) use the insheet command:
insheet using "S&P_data.csv"
To examine attributes of the data:
describe
Another way to obtain the same information and more:
edit
Note that in Stata 11, as opposed to previous versions, you can run commands and have the editor open
at the same time.
Before proceeding, label the data and variables:
label data "S&P (01-31-80 to 12-31-99)"
label variable eps "Earnings per share"
label variable price "Price per share"
label variable weather "Weather"
To convert the data into Stata format:
save sacha_S&P.dta, replace
To import data already in Stata format use the ‘use’ command:
use sacha_S&P.dta
To destring the date variable, let’s try:
destring date, replace
list date in 1/10
destring date, force replace
list date in 1/10
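Two issues arise: (1) missing data, since the force option replaces every entry it cannot convert with a
missing value; and (2) destring is not the proper command for converting dates. To deal with the first problem,
take the necessary precautions in your preamble by reloading the data and preserving it before changing it:
use "sacha_S&P.dta", clear
preserve
destring date, force replace
list date in 1/10
restore
list date in 1/10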
To deal with the second problem:
generate date2 = date(date,"MDY")
list date2 in 1/10
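To see what the date() function does, here is a small self-contained sketch. The three dates are made up for illustration and are assumed to be written month-day-year with four-digit years. If the years had only two digits, a mask such as "MD19Y" would be needed to tell Stata the century.

```stata
clear
* Three made-up dates stored as strings
input str10 date
"01-31-1980"
"02-29-1980"
"03-31-1980"
end
* date() returns the number of days since 01jan1960
generate date2 = date(date, "MDY")
* A %td format displays the numeric date in readable form
format date2 %td
list
```

Without the format command, date2 is listed as a plain number of days, which is correct but hard to read.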
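Now let’s tell Stata that this is a monthly time series. Since date2 counts days, first convert it to a monthly date with mofd():
generate mdate = mofd(date2)
tsset mdate, monthly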
To extract more detailed date information:
generate year = year(date2)
generate month = month(date2)
generate day = day(date2)
label variable year "Year"
label variable month "Month"
label variable day "Day"
list in 1/10
To drop variables (preserve the data first so that it can be restored afterwards):
preserve
drop day
To keep variables:
keep year
To drop observations 5 through 15:
drop in 5/15
Let’s restore the data:
restore
Still on the topic of time series data, to generate a trend:
generate x = _n
list x in 1/10
To generate lags (for x):
generate x_1 = x[_n-1]
replace x_1=0 if x_1==.
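Once the data have been declared a time series, lags can also be written with Stata's time-series operators. The sketch below uses a small made-up dataset; the variable names t, x_lag and x_diff are assumptions for the example. Note that, unlike the replace command above, it leaves the first lag as missing rather than setting it to 0.

```stata
clear
set obs 5
* A simple time index, declared as the time variable
generate t = _n
tsset t
generate x = 10*t
* L.x is the first lag of x; it is missing in the first period
generate x_lag = L.x
* D.x is the first difference, x minus L.x
generate x_diff = D.x
list
```

The operators respect gaps in the time variable, whereas x[_n-1] simply takes the previous row regardless of its date.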
Let’s take a closer look at the weather variable.
des weather
edit weather
One way to turn this into a dummy variable:
generate weather2 = 0
replace weather2=1 if weather =="yes"
replace weather2=0 if weather =="no"
list weather2 in 1/10
Notice how the replace command conditions on a logical expression. For future reference conditional
statements can involve any one of the following:
• <, less than
• >, greater than
• <=, less than or equal to
• >=, greater than or equal to
• ==, equal to in a logical expression
• ~=, not equal to in a logical expression
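The operators above can be tried out on a small made-up dataset. The sketch below is only illustrative: the four observations are invented, and the & (and) and | (or) operators used to combine conditions are not in the list above. It also shows that a logical expression itself evaluates to 1 or 0, which gives a one-line way to build a dummy.

```stata
clear
* Four made-up observations; the last one has a missing weather value
input str3 weather price
"yes" 120
"no"  160
"yes" 155
""    140
end
* A logical expression evaluates to 1 (true) or 0 (false);
* the if qualifier leaves weather2 missing where weather is empty
generate weather2 = (weather == "yes") if weather != ""
* Combine conditions with & (and) and | (or)
count if price <= 150 & weather2 == 1
count if price > 150 | weather2 == 0
list
```

Leaving weather2 missing when weather is empty avoids silently coding unknown weather as "no".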
Some Basic (Mostly) Statistical Commands
To check the current memory allocation:
query memory
To set a new allocation (here 100 megabytes; in Stata 11 this must be done before data are loaded):
set memory 100m
Note that the set command can be used to change many basic defaults in Stata.
I always begin investigations with the following command:
tabulate weather
Why is it nonsensical to tabulate price?
tabulate price
To present continuous data:
histogram price
An even better way:
histogram price, kdensity
Compare this with:
histogram eps, kdensity
Coarser evidence is obtained with the following command:
summarize price eps
To include a summary of a categorical variable we can use the ‘xi’ environment, which expands i.weather into dummy variables:
xi: summarize price eps i.weather
To calculate means for price and eps under good and bad weather:
by weather, sort: summarize price eps
To summarize a subset of values:
summarize price if price <=150
To collapse the data and create a new dataset (preserving the original data so that it can be restored afterwards):
preserve
collapse (mean) price, by(weather)
save "price.dta", replace
restore
To test the hypothesis that the mean of price equals 150, with a 95 percent confidence level:
ttest price == 150, level(95)
To test the equality of means:
gen price_g = price if weather2==1
gen price_b = price if weather2==0
ttest price_g = price_b, unequal unpaired
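The same kind of two-sample test can be run without creating separate variables, by using the by() option of ttest. The sketch below uses the auto dataset that ships with Stata as a stand-in for the S&P data, with the 0/1 variable foreign playing the role of weather2.

```stata
sysuse auto, clear
* by() splits price into two groups according to the values of foreign;
* unequal allows the two groups to have different variances
ttest price, by(foreign) unequal
```

The grouping variable in by() must take exactly two values, which is why a 0/1 dummy such as weather2 is convenient here.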
Suppose our interest is in the relationship between price and eps:
twoway (scatter price eps)
twoway (scatter price eps) (lfit price eps)
Fitting a line through these points is equivalent to:
regress price eps
Controls are easy to add:
regress price eps x
The ‘xi’ environment works here as well, for example to add weather dummies:
xi: regress price eps x i.weather
One way to deal with persistence in the dependent variable:
generate price_1 = price[_n-1]
xi: regress price eps x price_1 i.weather
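From Stata 11 onwards, categorical regressors can also be entered with factor-variable notation, without the xi prefix. The sketch below uses the auto dataset that ships with Stata rather than the S&P data. A string variable such as weather would first have to be converted to a numeric variable (for example with encode) before it could be used this way.

```stata
sysuse auto, clear
* i.rep78 enters one dummy per repair-record category, omitting a base category
regress price mpg i.rep78
* ib3. chooses category 3 as the base instead
regress price mpg ib3.rep78
```

Factor-variable notation does not add dummy variables to the dataset, which keeps the data in memory uncluttered.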
Merging Data Sets
Let’s access online example data with webuse:
webuse odd
webuse even1
Merges can be one-to-one
merge using
or can match observations across datasets
webuse even1, clear
merge number using, sort
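Stata 11 also offers a newer merge syntax that states the type of match explicitly. The sketch below is self-contained and uses two tiny made-up datasets, not the webuse files above; the names oddfile, odd and even are assumptions for the example.

```stata
clear
* First made-up dataset, saved to a temporary file
input number odd
1 1
2 3
3 5
end
tempfile oddfile
save `oddfile'
clear
* Second made-up dataset, kept in memory as the master data
input number even
2 4
3 6
4 8
end
* Match observations one-to-one on the key variable number
merge 1:1 number using `oddfile'
* _merge records where each observation came from
tabulate _merge
list
```

Tabulating _merge after every merge is a cheap check that the match worked as intended.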
Let’s generate data, starting from an empty dataset:
clear
set obs 100
To create a variable with draws from a uniform distribution:
generate y = runiform()
list y in 1/10
To generate many variables with draws from the uniform distribution:
forvalues i = 1(1)100 {
    generate x`i' = runiform()
}
Note: the (1) gives the increment, so the loop generates 100 variables, x1 through x100, each containing
uniform draws over (0,1). To check for consistency of an estimator:
webuse census2, clear
generate x = rnormal(1000,100)
generate e = rnormal()
list x e in 1/10
generate y = 100+1*x + e
regress y x
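Consistency is about what happens as the sample grows, so one way to extend the check above is to repeat it for several sample sizes. The sketch below is one possible version; the sample sizes and the seed are arbitrary choices made for the example.

```stata
* Fix the seed so the draws are reproducible
set seed 12345
foreach n of numlist 100 1000 10000 {
    clear
    quietly set obs `n'
    * Same data generating process as above: true slope 1, true intercept 100
    generate x = rnormal(1000, 100)
    generate e = rnormal()
    generate y = 100 + 1*x + e
    quietly regress y x
    * Report the slope estimate and its standard error for this sample size
    display "n = `n'   slope = " _b[x] "   se = " _se[x]
}
```

If the estimator is consistent, the slope estimates should settle near 1 and the standard errors should shrink as n increases.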
Panel Data
Tell Stata you have a panel:
xtset distid year
To run regressions using panel data:
xtreg math4 y93 y94 y95 y96 y97 y98 lrexpp lenrol lunch, fe
xtreg math4 y93 y94 y95 y96 y97 y98 lrexpp lenrol lunch, fe robust
To obtain predictions for the dependent variable and residuals, respectively:
predict yhat, xb
predict resid, e
To compare predictions with actual values:
edit yhat math4
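The panel commands above need the school district data in memory. For a version that runs on its own, the sketch below uses the nlswork example dataset from Stata's website instead; the choice of ln_wage, age and tenure as variables is only for illustration.

```stata
* Example panel of individuals (idcode) observed over years
webuse nlswork, clear
xtset idcode year
* Fixed effects regression with robust standard errors
xtreg ln_wage age tenure, fe vce(robust)
* xb gives the linear prediction; e gives the idiosyncratic residual
predict yhat, xb
predict resid, e
summarize ln_wage yhat resid
```

Note that predict with no option returns the linear prediction, so the residuals must be requested explicitly.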
To close the log file:
log close