WRDS SAS User Guide - West Virginia University

advertisement
WRDS
SAS User Guide
West Virginia University
Available Data in WRDS
By Subject
Data
Contents
CRSP
Security price, return, and volume data for main indexes
Dow Jones
Dow Jones averages and total return
CBOE
Key measure of market expectations of volatility
Compustat North
America
U.S. and Canadian accounting and market information
Compustat Global
Financial and accounting data of publicly traded companies
SEC
Information about Disclosure of Order Execution Statistics
Bank Regulatory
Financial information about U.S. banking institutions
TRACE
Transaction data for all eligible corporate bonds
CRSP Treasury
Historical info and market data including yields, and durations
Interest Rate
FRB
Databases collected from Federal Reserve Banks
Mergers & Acquisition
Bank Regulatory
Merger information concerning U.S. banking institutions
Currency Option
PHLX
Philadelphia Stock Exchange's United Currency Options Market
Marketing
DEMF
Customer buying history
Ownership
Blockholders
Standardized data for blockholders
Stocks
Fundamentals
Bank
Bond
Advantage for using SAS
• WRDS is built using SAS data sets, so manipulating
data through SAS is easier than almost any other
querying tool.
• Any combination of two databases can be
constructed.
• The web interface deletes observations for which the
chosen variables have missing values and there is no
simple way of finding out what observations were
deleted
SAS Sample programs
• For simple SAS codes:
► SAS Sample Programs on WRDS Info Home
• For advanced SAS codes:
► Support → WRDS Datasets and
Sample program → SAS
► Support → Research Applications
Connect to WRDS
%let wrds = wrds.wharton.upenn.edu 4016; options
comamid=TCP remote=wrds; signon username=_prompt_;
rsubmit;
*------------------- * * your code here *------------------------- *;
endrsubmit;
► Note: you always need this code since it has
SAS connected to WRDS.
Autoexe.sas
data _NULL_;
file 'autoexec.sas';
put "%include '!SASROOT/wrdslib.sas';";
run;
► A list of important libnames already is assigned
by WRDS through this statement.
► You may run this code only when libname error
happens.
Libname
• SAS library names are already defined in all user accounts of
Unix.
• For example,
► Bank regulatory bank: /wrds/bank/sasdata
► Compustat: comp /wrds/compustat/sasdata
► CRSP CCM: crsp /wrds/crsp/sasdata/cc
► CRSP Monthly stock: crsp /wrds/crsp/sasdata/sm
• Unix home directory:
► Temp /home/wvu/min06/temp
Data set
• To set up data steps, use the LIBNAME statement and then
name the dataset. This is enough to create it. Example:
• For CRSP monthly stock file:
► set crsp.msf
• For Compustat Industrial Fundamental file:
► set comp.ina
• All data set are from SAS data files stored in Unix.
► A good way to fix your error is checking
variables and directory name of SAS files
in Unix.
Finding variables
• Web – Based:
► Documents
► Tools (Searching variables)
• Using SAS:
► proc contents data=crsp.dsf;
Finding identifiers
• Web – Based:
► Code lookup
► Tools
• Using SAS:
► For example, to find identifiers in Compustat;
data names;
set comp.namesann;
where coname contains 'IBM' or SMBL contains 'IBM'; run;
proc print data = names; run;
► For CRSP: the file name is “stocknames”.
• Using Unix command: grep
a. Merge CRSP / Compustat
using CUSIP
When merging two databases, we need a common
ID
• Best way is to match them with CUSIP:
► Names and Tickers are problematic since they
change though time, can be re-used. and therefore
have different entries in different databases.
► CUSIP changes through time but are not re-used.
► should be historical one (NCUSIP)
Understanding CUSIP
• Example for IBM
459200 10 1
Compustat:
► CNUM=459200 (6 digits of CUSIP)
CRSP:
► CUSIP=45920010
Matching identifiers
Database
Ticker
CUSIP
GVKEY
PERMNO
CRSP
YES
CUSIP NCUSIP
NO
YES
Main identifier
COMPUSTAT
YES(SMBL)
CNUM
YES
Main identifier
NO
► To create a common identifier (cnum), we use CUSIP
and subtract 6 digits from it.
Step 1. Headers from Compustat
• From Compustat header file “namesann”:
► Find “cnum” and “gvkey” for IBM
► Then exclude missing data
► Sort data by cnum
• proc sort data=comp.namesann(keep = gvkey cnum) out=comp nodupkey;
where missing(cnum)=0 where smbl in IBM;
by cnum gvkey;
run;
Step 2. Headers from CRSP
• From CRSP header file “stocknames”;
► Find “ncusip” and “permno” for IBM
► Then exclude missing data
► Sort data by ncusip
► Define the output “mse”
• proc sort data=crsp.stocknames(keep = permco ncusip) out=mse
nodupkey;
where missing(ncusip)=0;
by permco ncusip;
run;
Step 3. Creating cnum from ncusip
• Create 6 digits identifier (cnum) from ncusip in order CRSP
and Compustat to be matched on cnum:
► Using functions “length” and “subtr”,
create cnum from ncusip in “mse”
► Sort data by cnum
► Define the output as “mse3”
data mse2;
length cnum $6.;
set crsp.mse;
cnum= substr(ncusip,1,6); run;
proc sort data=mse2 out=mse3(keep = permco cnum) nodupkey;
by cnum permco; run;
Step 4. Merging
► Create temporary variables “aa” and “bb” using option “in” in order to
track whether that data set contributed to the current observation
data joint2;
merge comp(in=aa) mse3(in=bb);
by cnum;
/* Create Dummies to test source of merging*/
if aa=1 then compustat=1;
else compustat=0;
if bb=1 then crsp=1;
else crsp=0; run;
b. Extract data from CCM
Concepts needed:
• Historical identifier (NPERMNO)
• Linking file (cstlink2)
► see CCM guide
• Most of the SAS procedures on WRDS use SQL
► see SQL references
Step 1. Libname & Years
• Libname: ‘wrds/crsp/sasdata/cc’
• Sepecify beginning and ending years:
%let beg_yr = 1995;
%let end_yr = 2003;
Step 1
BEGFYR
ENDFYR
Step 2. Link file (cstlink2)
Specify link information:
► Create a data (temp1) which is set on “cstlink2”
/* data temp1;
set crsp.CSTLINK2;
run; */
► Select link types and link dates (1995 < date < 2004)
► Sort data by GVKEY
► Define the output as “link”
/* proc sort data=temp1 out=lnk;
where NPERMNO in (12490, 11081, 10107) and
LINKTYPE in ("LU", "LC", "LD", "LF", "LN", "LO", "LS", "LX") and
(&end_yr+1 >= year(LINKDT) or LINKDT = .B) and
(&beg_yr-1 <= year(LINKENDDT) or LINKENDDT = .E);
by GVKEY LINKDT; run; */
Step 2
LINKDT
BEGFYR
LINKENDDT
ENDFYR
Step 3.a
A,B part → (LINKDT <= FYENDDT or LINKDT = .B) and
(FYENDDT <= LINKENDDT or LINKENDDT =.E)
B part →(LINKDT <= FYBEGDT or LINKDT = .B) and
(LINKENDDT >= FYENDDT or LINKENDDT = .E)
A,B,C part → (LINKDT <= FYENDDT or LINKDT = .B) and
(LINKENDDT >= FYBEGDT or LINKENDDT = .E)
LINKDT
BEGFYR
LINKENDDT
ENDFYR
A
Step 3
BEGFYR
B
ENDFYR
C
Step 3b. Specify overlapping periods
► Create table (defined “mydata”) which has following variables from the file
“link”
► Name “crsp.CSTANN” as “cst” (* CSTANN is a file which contains all
compustat data)
► Specify date requirements. (Select A, B or C)
► With GVEKY we found, extract data we need from “CSTANN“ by the
corresponding GVKEY (lnk.GVKEY = cst.GVKEY)
proc sql;
create table mydata(keep=GVKEY NPERMNO NPERMCO SMBL YEARA LINKDT
LINKENDDT LINKTYPE DATA6) as select *
from lnk, crsp.CSTANN as cst
where lnk.GVKEY = cst.GVKEY and
(&beg_yr <= YEARA <= &end_yr) and
(LINKDT <= cst.FYENDDT or LINKDT = .B) and (cst.FYENDDT <= LINKENDDT
or LINKENDDT = .E);
quit;
References
To see all references and SAS programs:
► http://www.be.wvu.edu/wrds/home/index.html
Download