Haas MFE SAS Workshop
Lecture 3: Commonly Used PROCedures in Financial Economics
Peng Liu and Alexander Vedrashko
http://faculty.haas.berkeley.edu/peliu/computing
Haas School of Business, Berkeley, MFE 2006

Basic Statistical Analysis
Univariate statistics: PROC MEANS; PROC UNIVARIATE; PROC FREQ
Bivariate and multivariate statistics: PROC CORR; PROC NPAR1WAY; PROC TTEST

Comparison of PROC MEANS and PROC UNIVARIATE
PROC MEANS
    Descriptive statistics: CLM CSS CV KURTOSIS LCLM MAX MEAN MIN N NMISS RANGE SKEWNESS STD STDERR SUM SUMWGT UCLM USS VAR
    Quantile statistics: MEDIAN|P50 Q1|P25 Q3|P75 P1 P5 P10 P90 P95 P99 QRANGE
    Hypothesis testing: PROBT T
PROC UNIVARIATE
    Descriptive statistics: CSS CV KURTOSIS MAX MEAN MIN MODE N NMISS RANGE SKEWNESS STD STDMEAN SUM SUMWGT USS VAR
    Quantile statistics: MEDIAN P1 P5 P10 P90 P95 P99 Q1 Q3 QRANGE
    Hypothesis testing: NORMAL PROBN MSIGN PROBM SIGNRANK PROBS T PROBT
    Robust statistics: GINI MAD QN SN STD_GINI STD_MAD STD_QN STD_QRANGE STD_SN

PROC MEANS
    PROC MEANS DATA=mfe.loan;
      VAR appraisal ltv;
      CLASS state;
    RUN;

    PROC MEANS DATA=mfe.loan MAX MIN;
      VAR appraisal ltv;
      OUTPUT OUT=m MAX=maxvalue maxltv MIN=minvalue minltv;
    RUN;
By default, PROC MEANS prints, for each variable, its name and label, N, Mean, Std Dev, Minimum and Maximum.
MEDIAN, MIN, MAX, CLM and ALPHA=0.05 are examples of options you can specify on the PROC MEANS statement.
You can get summary statistics for many variables at once by listing them in the VAR statement.
The CLASS statement produces summary statistics for each group of the class variable.
You can suppress printed output with the NOPRINT option.
You can save the results in a SAS dataset of your own naming with OUTPUT OUT=.
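The PROC MEANS notes above (CLASS, NOPRINT, OUTPUT OUT=) can be combined in one step. The following is a minimal sketch, not part of the original slides: mfe.loan, ltv and state come from the examples above, while the output dataset name ltv_by_state and the variable names mean_ltv, median_ltv and n_ltv are assumptions chosen for illustration.

```sas
/* Sketch: per-state summary of ltv, saved to a dataset instead of printed. */
/* ltv_by_state, mean_ltv, median_ltv and n_ltv are hypothetical names.     */
PROC MEANS DATA=mfe.loan NOPRINT NWAY;
  CLASS state;                      /* one output row per state             */
  VAR ltv;
  OUTPUT OUT=ltv_by_state
         MEAN=mean_ltv MEDIAN=median_ltv N=n_ltv;
RUN;

/* Inspect the saved summary. */
PROC PRINT DATA=ltv_by_state;
RUN;
```

The NWAY option keeps only the rows for the full CLASS combination; without it, the output dataset also contains an overall row (with _TYPE_ = 0) for the whole sample.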

PROC UNIVARIATE
    PROC UNIVARIATE DATA=mfe.loan;
      VAR ltv;
      ID id;
    RUN;

    PROC UNIVARIATE DATA=mfe.loan;
      VAR ltv;
      HISTOGRAM;
      QQPLOT / NORMAL;
    RUN;
Use VAR to specify which variables you want to analyze; otherwise the procedure analyzes all numeric variables.
Use ID to identify extreme observations; without an ID statement, the observation number is used by default.
The procedure can plot histograms, quantile-quantile plots, etc.
It can also perform a two-sided t test, etc.

PROC FREQ
    PROC FREQ DATA=mfe.loan;
      TABLE term;
    RUN;

    PROC FREQ DATA=mfe.loan;
      TABLE state state*term / NOCOL NOROW;
    RUN;
One-way vs. two-way frequency tables.
The /CHISQ or /BINOMIAL option can be used to test for equal proportions.
One TABLE statement can request more than one frequency table.
You can suppress column and/or row percentages with the options / NOCOL NOROW.

PROC CORR
    PROC CORR DATA=mfe.loan;
      VAR rate ltv fico_orig;
    RUN;

    PROC CORR DATA=mfe.loan COV SPEARMAN;
      VAR rate ltv fico_orig;
    RUN;
The CORR procedure computes Pearson correlation coefficients, three nonparametric measures of association (Spearman rank-order correlation, Kendall's tau-b and Hoeffding's measure of dependence D), and the probabilities associated with these statistics for numeric variables.
The default is the Pearson correlation.
The COV option requests the computation of covariances.

PROC TTEST
    DATA;
      INPUT a b @@;
      DATALINES;
    51 55 64 61 75 74 86 90
    95 93 68 71 73 72 90 95
    ;
    RUN;

    PROC TTEST;
      PAIRED a*b;
    RUN;
A DATA statement with no dataset name creates an automatically named dataset (DATA1, DATA2, ...).
The trailing @@ in INPUT holds the current line, so SAS keeps reading observations from the same data line.
DATALINES; is a SAS statement followed by lines of raw data.
Data values are separated by blanks and can be split across lines in any way you like.
The semicolon that ends the data must stand on a line by itself.
If no dataset name is specified, a PROC step runs on the most recently created dataset.
PAIRED a*b requests a paired t-test.

PROC NPAR1WAY
    PROC NPAR1WAY DATA=mfe.loan;
      CLASS state;
      VAR ltv;
    RUN;
Nonparametric test for differences across a one-way classification.
If the normality assumption does not hold, we may use nonparametric tests.
PROC NPAR1WAY performs nonparametric tests for location and scale differences across a one-way classification, based on the following scores: Wilcoxon, Median, Van der Waerden, Savage, Siegel-Tukey, Ansari-Bradley, Klotz and Mood.

Financial Econometrics using SAS
Linear Models (OLS, GLS and their variants): PROC REG; PROC GLM (skip)
Logistic Regression: PROC LOGISTIC; PROC GENMOD
Hazard Regression (Cox P.H.): PROC PHREG

Linear Model: Theory
Data: $(y_i, x_i = (x_{i1}, x_{i2}, \ldots, x_{ik}))$ for $i = 1, \ldots, n$, with $y_i \in \mathbb{R}$
Model: $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \varepsilon_i$ for $i = 1, \ldots, n$
For short: $y = X\beta + \varepsilon$, where $X$ is the $n \times (k+1)$ matrix whose $i$-th row is $(1, x_{i1}, \ldots, x_{ik})$, $\beta = (\beta_0, \beta_1, \ldots, \beta_k)^T$ and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)^T$
Assumption: the $\varepsilon_i$ are i.i.d. normal $N(0, \sigma^2)$
Ordinary Least Squares estimate: $\hat{\beta} = (X^T X)^{-1} X^T y$

PROC REG
PROC REG is a SAS procedure for simple or multivariate linear regression models with continuous dependent variables.
Part of SAS/STAT
Model fitting (parameters, residuals, confidence limits, influence statistics, etc.)
Model selection (forward, backward, stepwise, etc.)
Hypothesis testing
Model diagnostics
Plotting
Outputting estimates and statistics

PROC REG - Examples
    PROC REG DATA=mfe.loan;
      MODEL ltv = rate;
      PLOT ltv * rate;
    QUIT;
Other MODEL statements:
    MODEL ltv = rate fico_orig;
    OLS: MODEL ltv term = rate fico_orig;
    MODEL ltv = rate fico_orig term / SELECTION=F;
Begin with PROC REG; end with QUIT;
Multiple independent or dependent variables are separated by spaces.
The label "OLS" is optional; labels are useful when one PROC REG contains several MODEL statements.
By default, a constant (intercept) is included.
Use / options to request additional statistics or to specify the model selection method.
PLOT creates a scatter plot of your regression data and automatically adds the regression line.

Logistic Regression - Theory
Data: $(y_i, x_i = (x_{i1}, x_{i2}, \ldots, x_{ik}))$ for $i = 1, \ldots, n$, where $y_i$ is a binary or ordinal response variable, e.g. $y_i \in \{0, 1\}$
Model (binary case, logit link): $\log \frac{p_i}{1 - p_i} = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}$, where $p_i = P(y_i = 1 \mid x_i)$
Estimation: maximum likelihood estimate of $\beta$
Assumption: binomial variation

Logistic Regression - SAS procedure
SAS has several procedures that perform logistic regression, e.g.
GENMOD, CATMOD and LOGISTIC
PROC LOGISTIC
    Works for binary or ordinal response variables
    Performs maximum likelihood estimation using different optimization algorithms
    Four model selection methods: forward (F), backward (B), stepwise and score
    Outputs statistics to a dataset
    Tests linear hypotheses about the parameters

PROC LOGISTIC - Examples
    PROC LOGISTIC DATA=mfe.loan;
      CLASS state edu;
      MODEL default = ltv age edu term rate state / LINK=LOGIT;
    RUN;
Begin with PROC LOGISTIC; end with RUN;
The / LINK=LOGIT option can be omitted because it is the default; other options: PROBIT, CLOGIT, CLOGLOG.
Use the CLASS statement to avoid creating dummy variables in a DATA step.
The / options can be used to request additional statistics or to specify the selection method.
Use the TEST statement to test linear hypotheses about the parameters.

Survival Analysis - Background 1

Survival Analysis - Background 2

Cox Proportional Hazard Regression

PROC PHREG - Example
    PROC PHREG DATA=mfe.loan;
      MODEL loanage*prepay(0) = age edu race rate ltv fico_orig state;
    RUN;
Use a WHERE statement to restrict the regression to the subsample you want.
You can define or group variables inside PHREG, after the MODEL statement, using IF-THEN-ELSE statements.
Handling tied data: / TIES=EXACT; other option: DISCRETE.
To run PHREG separately for different groups, use the BY statement; the data need to be sorted first.
Use the CLASS statement to create dummy variables.
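The PHREG notes above (WHERE, IF-THEN-ELSE after MODEL, TIES=, BY with sorted data) can be put together as follows. This is a minimal sketch under stated assumptions, not the authors' own code: the dataset and variable names come from the slide example, while loan_sorted, high_ltv and the cutoff of 80 are hypothetical and chosen only for illustration.

```sas
/* BY-group processing needs the data sorted by the BY variable. */
/* loan_sorted is a hypothetical work dataset name.              */
PROC SORT DATA=mfe.loan OUT=loan_sorted;
  BY state;
RUN;

PROC PHREG DATA=loan_sorted;
  WHERE ltv IS NOT MISSING;          /* subset the sample             */
  MODEL loanage*prepay(0) = rate fico_orig high_ltv / TIES=EXACT;
  /* Programming statements after MODEL define the covariate.   */
  /* high_ltv and the cutoff 80 are illustrative assumptions.   */
  IF ltv > 80 THEN high_ltv = 1;
  ELSE high_ltv = 0;
  BY state;                          /* one Cox regression per state  */
RUN;
```

TIES=EXACT can be slow when many loans share the same event time; the DISCRETE option mentioned above is used in the same position on the MODEL statement.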