Sugi-92-190 Chou Wu

advertisement
A SASIIML0 PROGRAM TO SIMULATE DATA WITH NONNORMAL DISTRIBUTIONS
Chih-Ping Chou, University of Southern California
Shinn-Tzong Wu, University of California, Los Angeles
OPTIONS PAGESIZE=58 LlNESIZE=130;
A method to simulate nonnormal distributions has been proposed by
Reichman (1978).
This method generates data based on the
conceptualization that distributions of variables can be represented by
their first four moments, i.e., mean, variance, skewness, and kurtosis.
Fleishman's procedure basically transforms normal deviates, X with
N(O,l), to a linear function of X up to its cube for univariatenonnorrnal
distribution.
More specifically, Y=a+bx+ci+dX'.
..........................•.................................
*
SIMNNM.SAS
*.................
~
........................................•.
Vale and
Maurelli (1983) extended Fleishman's procedure to generate multivariate
random numbers with pre-specified intercorrelation structure in addition
to the first four moments. To incorporate the desired correlation matrix
in simulating multivariate nonnorrnal distributions with the first four
moments within reasonable range, an intennediate correlation matrix
needs to be detennined. This procedure involves solving a third-degree
polynomial. A SASIIML program, SIMNNM.SAS, has been developed
using Vale and Maurelli's procedure to simulate multivariate nonnonnal
data.
•
Monte Carlo studies using three variables were conducted to
evaluate this procedure in generating nonnonnal data. The three
variables were assumed to have low, medium, and high correlation, or
correlation coefficients of .2, .5, and .8, respectively.
Three
combinations of skewness and kurtoses were also assumed. Skewness
is ranged from 0 to 1 while lrurtoses ranged from () to 3. Nonnormal
data based on certain combination of correlation matrix and nannarmal
distribution were then simulated with samples sizes of 200, 500, and
1000. A hundred samples, or replications, were genarated for each of
the (3 correlation matrices x 3 nonnonnal distribution x 3 sample sizes
=) 27 combinations. Based on the means of skewness and kurtoses
across 100 replications, this procedure generates nonnonnal data with
smaller skewness and mrtoses than the hypothesized values. The
correlation coefficients obtained from the simulated data were, however,
quite acurate. The biasness on nonnonnal conditions reflected by
skewness and kurtoses seemed to be positively related with the size of
correlation, especially for kurtoses. The size of nonnonnal condition
also have impact on the degree of biasness. Sample sizes, however, did
not seem to create significant impact on the biasness.
•
PROGRAMMER,
CHill-PING CHOU & SHINN-TZONG WU
*
*
DATE CREATED, JULY, 1991
•
*
*
*
*
PURPOSE:
THIS SASIIML PROGRAM IS TO SIMULATE
DISTRIBUTIONS WITH PRE-SPECIFIED
INTERCORRELATION STRUCTURE AS WELL AS
MEANS, VARIANCES, SKEWNESS, AND KURTOSES.
THE PROCEDURE IS BASED ON VALE & MAURELLI
(1983), AND FLEISHMAN (1978) APPEARED IN
PSYCHOMETRIKA.
*
*
.................•...........................•..............
"'++++++++++++++++++++++++++++++++
'" USERS SPECIFY
•
1. A p.p COVRAIANCE MATRIX.
'"
2. A ROW VECTOR OF MEAN.
* 3. A ROW VECTOR OF SKEWNESS.
* 4. A ROW VECTOR OF KURTOSES.
••• ++++++++++++++++++++++++++++++;
DATA A;
INPUT VI-V3 @@;
CARDS;
I .20.20
.20 I .20
.20.20 I
000
References
o
o
Fleishman, A. I., 1978. A method for simulating non-normal
distributions. Psychometrika, 43, 521-532.
Vale, C. D., & Maurelli, V. A., 1983. Simulating mullivariate
non-nonnal distributions. Psychometrika, 48,465-471.
.5 .5
.5
PROC IML;
START FUNSK(B,D,SOK,EQ,F);
···PRINT "BEGINNING FUNSK ROUTINE. ";
BB=B+B;
DD=D+D;
BD=B·O;
CC ~0.S*(I-(BB+6*BD + IS*DD»;
IF (EQ~ I) THEN
F=4.0*CC*(BB+24*BD+ I05·DD+2)"'·2-S0K*SOK;
ELSE
F=24"'(BO +CC*(l + BB+ 28*BD)+ DD"'(12+48"'BD+ 141"'CC+225
1122
B2~COE[2,B];
'DD»-SOK;
FINISH;
CI ~COE[3,A];
C2~COE[3,B];
D1 ~COE[4,A];
START FLEISH(NN ,NV ,X,R,SKEW,KURT);
PRINT 'BEGINNING FLEISH ROUTINE.';
EPS= 1.0/(10.0··10);
D2~COE[4,B];
D6~6#DI#D2;
RC[K,I]~(l.#CI#C2)/D6;
COE~J(4,NV,O);
RC[K,2]~(BI#B2+3#BI#D2+3#DI#B2+9#DI#D2)1D6;
DOA~ITONV;
RC[K,3] ~-R[A,B]/D6;
SK~SKEW[I,AJ;
END;
KU~KURT[I,A];
DO E~I TO A;
IF (E < A) THEN DO;
IF (SK~SKEW[I,E] &
END;
Q~(RC[,I]##2-3#RC[,2])/9;
KU~KURT[I,EJ)
R~(l.#RC[,I]##3-9#RC[,I]#RC[,2]+27#RC[,3ll/54;
THEN DO;
B~COE[2,E];
AD3~RC[,I]I3;
D~COE[4,E];
RC~J(PI,3>999);
GO TO SKDONE;
QR~Q##3-R##2+EPS;
QR~(QR>O);
END;
END;
DO A ~ I TO PI;
IF QR[A,I] >0 THEN DO;
END;
B= 1.0.
TH~ARCOS(R[A,I]/SQRT(Q[A,I]##3»;
D~O.OI;
QT2~2#SQRT(Q[A,IJ);
K=O;
RC[A,I] ~-QT2#COS(THl3);
RC[A,2] ~-QT2#COS«TH + 2#PI)I3);
RC[A,3] ~-QT2#COS«TH +4#PI)I3);
UPDATE,
K=K+l;
IF (K > 50) THEN DO;
PRINT 'FLEISHMAN COEFFICIENTS NOT FOUND.';
RETURN;
END;
RUN FUNSK(B,D,SK,I,FSO);
RUN FUNSK(B,D,KU,2,FKO);
IF(ABS(FSO)<~EPS &
ABS(FKO)<~EPS) THEN GO TO
SKDONE;
EPB=EPS*B;
EPD=EPS"'D;
END;
ELSE DO;
TH ~(SQRT(R[A,I]##2-Q[A, 1]##3)+ ABS(R[A,I]))##(1.0/3);
SGNR~I;
IF R[A,I] <0 THEN
SGNR~-I;
RC[A,I]~-SGNR#(TH +Q[A,I]rrH);
END;
END;
RC~RC-AD3·J(I,3,1);
T~ABS(RC);
BU~B+EPB;
T~T[,>,<];
DU~D+EPD;
RHO~J(NV ,NV ,0);
RUN FUNSK(BU,D,SK,I,FSB);
RUN FUNSK(B,DU,SK,I,FSD);
K=O;
DOA ~ 2TO NV;
AI=A-l;
DO B ~ I TO AI;
DSB~(FSB-FSO)IEPB;
DSD~(FSD-FSO)/EPD;
RUN FUNSK(BU,D,KU,2,FKB);
RUN FUNSK(B,DU,KU,2,FKD);
K=K+l;
DKB~(FKB-FKO)/EPB;
TT~RC[K,L];
L~T[K,I];
DKD ~ (FKD-FKO)/EPD;
DELTA=DSB"'DKD-DSO+DKB;
DTB ~ (FSO'DKD-FKO·DSD)/DELTA;
RHO[A,B] ~TT;
END;
END;
DTD~(FKO·DSB-FSO·DKB)IDELTA;
RHO~RHO+RHO'+I(NV);
B~B-DTB;
U~ROOT(RHO);
D~D-DTD;
X=X"'U;
T~J(NN,I,I);
GO TO UPDATE;
SKDONE,
COE[2,A] ~ B;
Y - X#(T"COE[2,J) + X#X#(T·COE[3,ll + X#X#X#(T·COE[4,ll;
X~Y+T·COE[I,];
COE[4,A]~D;
PRINT
FINISH;
COE(3,A] =O.5+SKI(B+B+ 24"'8"'0+ 105+0+0+2);
~FLEISHMAN
TRANSFORMATION COMPLETED. to;
COE[I,A]~-COE[3,A];
END;
START READlN(SS,NN,NV,SEED);
PRINT 'BEGINNING READIN ROUTINE.';
PI~3.1415926536;
PI ~NV·(NV-1)/2;
NR~NROW(SS);
RC~J(PI,3,O);
NC~NCOL(SS);
K~O;
NV~NC;
DOA ~ 2TONV;
AI=A-l;
DO B ~ I TO AI;
PRINT NR NC SS;
NN~NR;
RETURN;
END;
ELSE DO;
K~K+I;
B1 ~COE[2,A];
1123
JJ~J(I,NV,O);
MEAN~JJ;
VAR =JJ;
SKEW=JJ;
KURT~JJ;
IF«NR-NC) > ~ I) THEN MEAN ~SS[NV +\,];
IF«NR-NC)> ~2) THEN SKEW~SS[NV+2,];
IF«NR-NC»~3)THEN KURT~SS[NV+3,];
SS~SS[I;NV,];
V AR ~ DIAG(SS);
SQTIVAR~SQRT(I.Of(VECDIAG(V AR)));
DiV ~ DIAG(SQTIVAR);
R=DIV*SS·DIV;
SS~J(NN,NV,SEED);
SS~NORMAL(SS);
IF
(SKEW~JJ
&
KURT~JJ)
THEN DO;
U~ROOT(R);
SS=SS·U;
END;
ELSE DO;
PRINT "DATA ARE NOT NORMAL";
RUN FLEISH(NN,NV,SS,R,SKEW,KURT);
END;
SS~SS'SQRT(VAR)+J(NN,I,i)"MEAN;
·PRINT SSt
END;
FINISH;
START MAIN;
PRINT "BEGINNING MAIN ROUTINE.";
USE A;
READ ALL INTO SS;
PRINT "DATA READ INTO SS.";
*++++++++++++++++++++++++++++++++
•
• SPECIFY THE FOLLOWING PARAMEfERS FOR READIN
• SUBROUTINE
• I. SS;TOKEEPTHESIMUALTEDDATAMATRIX.
• 2. NUMBER OF CASES.
• 3. SEED TO GENERATE RANDOM NUMBERS.
•
*++++++++++++++++++++++++++++++++;
RUN READIN(SS,IOOOO,I23456789);
CREATE B FROM SS;
APPEND FROM SS;
FINISH;
RUN MAIN;
DATAC;
SETB;
PROC MEANS MEAN VAR SKEWNESS KURTOSIS MIN MAX;
PROCCORR;
RUN;
1124
Download