Haas MFE SAS Workshop
Lecture 3: Commonly used PROCedures in Financial Economics
Peng Liu
http://faculty.haas.berkeley.edu/peliu/computing
Haas School of Business, Berkeley, MFE 2006
Basic Statistical Analysis

Univariate statistics
PROC MEANS
PROC UNIVARIATE
PROC FREQ

Bivariate and Multivariate Statistics
PROC CORR
PROC NPAR1WAY
PROC TTEST

Peng Liu AND Alexander Vedrashko 3
Comparison of PROC MEANS and PROC UNIVARIATE

PROC MEANS
DESCRIPTIVE STATISTICS
CLM CSS CV KURTOSIS LCLM MAX
MEAN MIN N NMISS RANGE SKEWNESS
STD STDERR SUM SUMWGT UCLM USS
VAR
QUANTILE STATISTICS
MEDIAN|P50 Q1|P25 Q3|P75 P1 P5 P10
P90 P95 P99 QRANGE
HYPOTHESIS TESTING
PROBT T

PROC UNIVARIATE
DESCRIPTIVE STATISTICS
CSS CV KURTOSIS MAX MEAN MIN
MODE N NMISS RANGE SKEWNESS
STD STDMEAN SUM SUMWGT USS
VAR
QUANTILE STATISTICS
MEDIAN P1 P5 P10 P90 P95 P99 Q1 Q3
QRANGE
HYPOTHESIS TESTING
NORMAL PROBN MSIGN PROBM
SIGNRANK PROBS T PROBT
ROBUST STATISTICS
GINI MAD QN SN STD_GINI STD_MAD
STD_QN STD_QRANGE STD_SN
PROC MEANS
PROC MEANS DATA=mfe.loan;
VAR appraisal ltv;
CLASS state;
RUN;






PROC MEANS DATA=mfe.loan max min;
VAR appraisal ltv;
OUTPUT OUT=m
max=maxvalue maxltv
min=minvalue minltv;
RUN;
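By default, PROC MEANS prints N, Mean, Std Dev, Minimum, and Maximum for each analysis variable, along with the variable label.
Statistic keywords and options such as MEDIAN, MIN, MAX, CLM, and ALPHA=0.05 can be specified in the PROC MEANS statement.
You can get summary statistics for many variables at once by listing them in the VAR statement.
A CLASS statement produces summary statistics for each level of the grouping variable.
You can suppress printed output with the NOPRINT option.
You can save the results in a SAS dataset of your own choosing with the OUTPUT statement.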
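The points above can be combined in one step. The following is a minimal sketch, assuming the same mfe.loan dataset; the output dataset name work.loan_stats and the output variable names are hypothetical.

```sas
/* First step: print selected statistics, including 95% confidence limits for the mean */
PROC MEANS DATA=mfe.loan N MEAN MEDIAN MIN MAX CLM ALPHA=0.05;
  VAR appraisal ltv;
RUN;

/* Second step: no printed output; save per-state statistics to a dataset */
/* work.loan_stats and the mean_/med_ names are made up for illustration  */
PROC MEANS DATA=mfe.loan NOPRINT;
  CLASS state;
  VAR appraisal ltv;
  OUTPUT OUT=work.loan_stats
         MEAN=mean_appraisal mean_ltv
         MEDIAN=med_appraisal med_ltv;
RUN;
```

The names after each statistic keyword in OUTPUT are matched to the VAR variables in order. With a CLASS statement, the output dataset also contains an overall row (_TYPE_=0) in addition to one row per state.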
PROC UNIVARIATE
PROC UNIVARIATE
DATA=mfe.loan ;
VAR ltv; ID id;
RUN;




PROC UNIVARIATE DATA=mfe.loan;
VAR ltv; HISTOGRAM;
QQPLOT /normal;
RUN;
Use VAR to specify which variables you want to analyze; otherwise, the procedure analyzes all numeric variables.
Use ID to label the extreme observations; without an ID statement, they are identified by observation number.
Can plot histograms, quantile-quantile plots, etc.
Reports a two-sided t test for the mean, among other tests.
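As a sketch of the testing features, the step below asks for tests of location against a hypothesized mean and for normality tests. The value MU0=80 is an arbitrary illustration, not a value from the lecture.

```sas
/* MU0= sets the null value for the location tests (t, sign, signed rank) */
/* NORMAL requests tests for normality; 80 is a hypothetical null value   */
PROC UNIVARIATE DATA=mfe.loan MU0=80 NORMAL;
  VAR ltv;
  ID id;
RUN;
```

Without MU0=, the location tests use a null value of zero.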
PROC FREQ
PROC FREQ DATA=mfe.loan;
TABLE term;
RUN;




PROC FREQ DATA=mfe.loan;
TABLE state state*term/nocol norow;
RUN;
The first example produces a one-way frequency table; the second adds a two-way table (state by term).
The /CHISQ or /BINOMIAL option can be used to test for equal proportions.
One TABLE statement can request more than one frequency table.
You can suppress column and/or row percentages with the /NOCOL NOROW options.
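A short sketch of the testing option, assuming the same dataset and variables as in the examples above:

```sas
PROC FREQ DATA=mfe.loan;
  /* One-way table: chi-square test of equal proportions across levels of term */
  TABLES term / CHISQ;
  /* Two-way table: chi-square test of association between state and term */
  TABLES state*term / CHISQ NOCOL NOROW;
RUN;
```

TABLES and TABLE are interchangeable spellings of the same statement.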
PROC CORR
PROC CORR
DATA=mfe.loan;
VAR rate ltv fico_orig;
RUN;



PROC CORR DATA=mfe.loan
COV SPEARMAN;
VAR rate ltv fico_orig;
RUN;
The CORR procedure computes Pearson correlation coefficients, three nonparametric measures of association (Spearman rank-order correlation, Kendall's tau-b, and Hoeffding's measure of dependence, D), and the probabilities associated with these statistics for numeric variables.
The default is Pearson correlation.
The COV option requests the computation of covariances.
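All four measures can be requested in a single step. This is a sketch only; the output dataset name work.corr_pearson is hypothetical.

```sas
/* Request every measure of association named above                  */
/* OUTP= saves the Pearson correlations; the dataset name is made up */
PROC CORR DATA=mfe.loan PEARSON SPEARMAN KENDALL HOEFFDING
          OUTP=work.corr_pearson;
  VAR rate ltv fico_orig;
RUN;
```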
PROC TTEST
DATA;
INPUT a b @@ ;
DATALINES;
51 55 64 61 75 74 86 90
95 93 68 71 73 72 90 95
;
RUN;
PROC TTEST;
PAIRED a*b;
RUN;




If no dataset name is given, the DATA step names the dataset automatically (DATA1, DATA2, ...).
The @@ in INPUT holds the input line so that SAS keeps reading observations from the same line of DATALINES.
DATALINES; is a SAS statement followed by lines of raw data.
Data values are typed consecutively, separated by blanks; you can break them across lines any way you like.
The semicolon that ends the data must be on a line by itself.
If no dataset is specified, a PROC step runs on the most recently created dataset.
PAIRED a*b requests a paired t test.
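The same analysis can be written with an explicit dataset name, so the PROC step does not depend on which dataset was created last. A minimal sketch; the name work.scores is hypothetical.

```sas
/* Name the dataset explicitly instead of relying on automatic naming */
DATA work.scores;
  INPUT a b @@;
  DATALINES;
51 55 64 61 75 74 86 90
95 93 68 71 73 72 90 95
;
RUN;

/* Paired t test of the difference a - b */
PROC TTEST DATA=work.scores;
  PAIRED a*b;
RUN;
```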
PROC NPAR1WAY
PROC NPAR1WAY DATA=mfe.loan;
CLASS state;
VAR ltv;
RUN;



NONPARAMETRIC TESTS FOR DIFFERENCES ACROSS A ONE-WAY CLASSIFICATION.
If the normality assumption does not hold, we may use nonparametric tests.
PROC NPAR1WAY performs nonparametric tests for location and scale differences across a one-way classification, based on the following scores: Wilcoxon, Median, Van der Waerden, Savage, Siegel-Tukey, Ansari-Bradley, Klotz, and Mood scores.
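By default the procedure runs its full set of analyses. To restrict the output to particular scores, name them in the PROC statement, as in this sketch:

```sas
/* Only the Wilcoxon and median score analyses are produced */
PROC NPAR1WAY DATA=mfe.loan WILCOXON MEDIAN;
  CLASS state;
  VAR ltv;
RUN;
```

When the CLASS variable has more than two levels, the Wilcoxon analysis is reported as a Kruskal-Wallis test.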
Financial Econometrics using SAS

Linear Models (OLS, GLS and their variants)
PROC REG
PROC GLM (Skip)

Logistic Regression
PROC LOGISTIC
PROC GENMOD

Hazard Regression (Cox-P.H.)
PROC PHREG
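Linear Model: Theory

Data: $(y_i, x_i = (x_{i1}, x_{i2}, \dots, x_{ik}))$ for $i = 1, \dots, n$ and $y_i \in \mathbb{R}$

Model: $y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + \varepsilon_i$ for $i = 1, \dots, n$

For short, $y = X\beta + \varepsilon$, where $y = (y_1, \dots, y_n)^T$, $X$ is the $n \times (k+1)$ matrix whose $i$-th row is $(1, x_{i1}, \dots, x_{ik})$, $\beta = (\beta_0, \beta_1, \dots, \beta_k)^T$, and $\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)^T$

Assumption: $\varepsilon_i$ are i.i.d. normal $N(0, \sigma^2)$

Ordinary Least Squares Estimation
$\hat{\beta} = (X^T X)^{-1} X^T y$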
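A brief sketch of where the least squares formula comes from. This is standard textbook reasoning rather than something taken from the slides, and it assumes $X^T X$ is invertible: the estimator minimizes the residual sum of squares, and setting the gradient to zero gives the normal equations.

```latex
\begin{aligned}
S(\beta) &= (y - X\beta)^T (y - X\beta) \\
\frac{\partial S}{\partial \beta} &= -2\, X^T (y - X\beta) = 0 \\
X^T X \hat{\beta} &= X^T y \\
\hat{\beta} &= (X^T X)^{-1} X^T y
\end{aligned}
```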
PROC REG
PROC REG is a SAS procedure for simple or multiple linear regression models with continuous dependent variables.
Part of SAS/STAT

Model fitting (parameters, residuals, confidence limits, influence statistics, etc.)
Model selection (forward, backward, stepwise, etc.)
Hypothesis testing
Model diagnostics
Plotting
Outputting estimates and statistics
PROC REG –Examples
PROC REG DATA=mfe.loan;
MODEL ltv = rate;
PLOT ltv * rate;
RUN;
QUIT;

Other forms of the MODEL statement:
MODEL ltv = rate fico_orig;
OLS: MODEL ltv term = rate fico_orig;
MODEL ltv = rate fico_orig term / SELECTION=F;

Begin with PROC REG; end with QUIT;
Multiple independent or dependent variables are separated by spaces.
The label "OLS" is optional; labels are useful when one PROC REG contains several MODEL statements.
By default, a constant (intercept) is included.

Use / options after the model to request additional statistics or to specify a model selection method.

PLOT creates a scatter plot of your regression data and automatically adds the regression line.
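The step below sketches how model options, a hypothesis test, and saved output fit together in one PROC REG. The output dataset name work.reg_out and the variable names ltv_hat and ltv_resid are hypothetical.

```sas
PROC REG DATA=mfe.loan;
  /* CLB: confidence limits for the estimates; VIF: variance inflation factors */
  OLS: MODEL ltv = rate fico_orig / CLB VIF;
  /* Joint F test that both slope coefficients are zero */
  TEST rate = 0, fico_orig = 0;
  /* Save predicted values and residuals (names are made up) */
  OUTPUT OUT=work.reg_out P=ltv_hat R=ltv_resid;
RUN;
QUIT;
```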



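Logistic Regression – Theory

Data: $(y_i, x_i = (x_{i1}, x_{i2}, \dots, x_{ik}))$ for $i = 1, \dots, n$ and $y_i$ is a binary or ordinal response variable, e.g. $y_i \in \{0, 1\}$
Model: $\log\frac{p_i}{1 - p_i} = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}$, where $p_i = P(y_i = 1 \mid x_i)$ (binary case)
Maximum Likelihood estimate of $\beta$
Assumption: binomial variation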
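For the binary case, the likelihood being maximized can be sketched as follows. This is the standard textbook form rather than something shown on the slide; $p_i$ denotes $P(y_i = 1 \mid x_i)$, and $x_i^T \beta$ is shorthand for the linear predictor including the intercept.

```latex
\begin{aligned}
p_i &= \frac{\exp(x_i^T \beta)}{1 + \exp(x_i^T \beta)} \\
\ell(\beta) &= \sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right]
             = \sum_{i=1}^{n} \left[ y_i\, x_i^T \beta - \log\left(1 + \exp(x_i^T \beta)\right) \right]
\end{aligned}
```

The maximizer has no closed form, so it is found by numerical optimization.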
Logistic Regression – SAS procedure
SAS has several procedures that perform logistic regression, e.g. GENMOD, CATMOD and LOGISTIC
PROC LOGISTIC
Works for binary or ordinal response variables
Performs MLE using different optimization algorithms
4 model selection methods: forward, backward, stepwise, score
Outputs statistics to a dataset
Tests linear hypotheses about parameters
PROC LOGISTIC –Examples
PROC LOGISTIC DATA=mfe.loan;
CLASS state edu;
MODEL default = ltv age edu term rate state/LINK=LOGIT;
RUN;





Begin with PROC LOGISTIC; end with RUN; as in the example above.
LINK=LOGIT is the default and can be omitted; other options: PROBIT, CLOGIT, CLOGLOG
Use a CLASS statement to avoid creating dummy variables in a DATA step.
Options after the / can be used to request additional statistics or to specify a selection method.
Use the TEST statement to test linear hypotheses about the parameters.
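A sketch that adds a TEST statement and saved predictions to the example. It assumes default is coded 0/1 with 1 meaning default; the label rate_ltv, the dataset work.logit_out, and the variable p_default are hypothetical names.

```sas
PROC LOGISTIC DATA=mfe.loan;
  /* PARAM=REF: reference-cell (dummy variable) coding for the class variables */
  CLASS state edu / PARAM=REF;
  /* EVENT='1' models the probability that default = 1 (assumed coding) */
  MODEL default(EVENT='1') = ltv age edu term rate state;
  /* Joint Wald test that the ltv and rate coefficients are both zero */
  rate_ltv: TEST ltv = 0, rate = 0;
  /* Save predicted probabilities (names are made up) */
  OUTPUT OUT=work.logit_out P=p_default;
RUN;
```

Without the EVENT= option, the procedure models the probability of the lower ordered response value, which for a 0/1 variable is 0.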
Survival Analysis – Background 1
Survival Analysis – Background 2
Cox Proportional Hazard Regression
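For reference, a sketch of the standard textbook form of the Cox model. The notation is assumed here, not taken from the slides: $h_0(t)$ is an unspecified baseline hazard, $\delta_i$ indicates that the event (rather than censoring) was observed for subject $i$, and $R(t_i)$ is the set of subjects still at risk at time $t_i$. The partial likelihood is written for the case of no tied event times.

```latex
h(t \mid x_i) = h_0(t) \exp(x_i^T \beta), \qquad
L(\beta) = \prod_{i:\, \delta_i = 1} \frac{\exp(x_i^T \beta)}{\sum_{j \in R(t_i)} \exp(x_j^T \beta)}
```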
PROC PHREG - Example
PROC PHREG DATA=mfe.loan;
MODEL loanage*prepay(0) = age edu race rate ltv
fico_orig state;
RUN;





Use a WHERE statement to restrict the sample used in the regression.
You can define or group variables inside PHREG, after the MODEL statement, using IF-THEN/ELSE programming statements.
Handling tied data: /TIES=EXACT; other options include DISCRETE.
To run PHREG separately for different groups, use a BY statement; the data must be sorted by the BY variable first.
Use a CLASS statement to create dummy variables.
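The notes above can be put together as in the following sketch. The WHERE condition, the cutoff of 80, the variable highltv, and the dataset work.loan_sorted are hypothetical, and the CLASS statement assumes a SAS release in which PROC PHREG supports it.

```sas
/* BY-group processing requires the data to be sorted by the BY variable */
PROC SORT DATA=mfe.loan OUT=work.loan_sorted;
  BY state;
RUN;

PROC PHREG DATA=work.loan_sorted;
  WHERE ltv > 0;                 /* hypothetical sample restriction */
  CLASS edu race;                /* dummy variables created by the procedure */
  /* prepay(0): observations with prepay = 0 are treated as censored */
  MODEL loanage*prepay(0) = age edu race rate fico_orig highltv / TIES=EXACT;
  /* Programming statement after MODEL: group ltv into an indicator */
  IF ltv > 80 THEN highltv = 1; ELSE highltv = 0;
  BY state;                      /* one regression per state */
RUN;
```

TIES=EXACT can be slow on large samples with many tied times; TIES=EFRON is a common approximation.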