A SASIIML0 PROGRAM TO SIMULATE DATA WITH NONNORMAL DISTRIBUTIONS Chih-Ping Chou, University of Southern California Shinn-Tzong Wu, University of California, Los Angeles OPTIONS PAGESIZE=58 LlNESIZE=130; A method to simulate nonnormal distributions has been proposed by Reichman (1978). This method generates data based on the conceptualization that distributions of variables can be represented by their first four moments, i.e., mean, variance, skewness, and kurtosis. Fleishman's procedure basically transforms normal deviates, X with N(O,l), to a linear function of X up to its cube for univariatenonnorrnal distribution. More specifically, Y=a+bx+ci+dX'. ..........................•................................. * SIMNNM.SAS *................. ~ ........................................•. Vale and Maurelli (1983) extended Fleishman's procedure to generate multivariate random numbers with pre-specified intercorrelation structure in addition to the first four moments. To incorporate the desired correlation matrix in simulating multivariate nonnorrnal distributions with the first four moments within reasonable range, an intennediate correlation matrix needs to be detennined. This procedure involves solving a third-degree polynomial. A SASIIML program, SIMNNM.SAS, has been developed using Vale and Maurelli's procedure to simulate multivariate nonnonnal data. • Monte Carlo studies using three variables were conducted to evaluate this procedure in generating nonnonnal data. The three variables were assumed to have low, medium, and high correlation, or correlation coefficients of .2, .5, and .8, respectively. Three combinations of skewness and kurtoses were also assumed. Skewness is ranged from 0 to 1 while lrurtoses ranged from () to 3. Nonnormal data based on certain combination of correlation matrix and nannarmal distribution were then simulated with samples sizes of 200, 500, and 1000. A hundred samples, or replications, were genarated for each of the (3 correlation matrices x 3 nonnonnal distribution x 3 sample sizes =) 27 combinations. Based on the means of skewness and kurtoses across 100 replications, this procedure generates nonnonnal data with smaller skewness and mrtoses than the hypothesized values. The correlation coefficients obtained from the simulated data were, however, quite acurate. The biasness on nonnonnal conditions reflected by skewness and kurtoses seemed to be positively related with the size of correlation, especially for kurtoses. The size of nonnonnal condition also have impact on the degree of biasness. Sample sizes, however, did not seem to create significant impact on the biasness. • PROGRAMMER, CHill-PING CHOU & SHINN-TZONG WU * * DATE CREATED, JULY, 1991 • * * * * PURPOSE: THIS SASIIML PROGRAM IS TO SIMULATE DISTRIBUTIONS WITH PRE-SPECIFIED INTERCORRELATION STRUCTURE AS WELL AS MEANS, VARIANCES, SKEWNESS, AND KURTOSES. THE PROCEDURE IS BASED ON VALE & MAURELLI (1983), AND FLEISHMAN (1978) APPEARED IN PSYCHOMETRIKA. * * .................•...........................•.............. "'++++++++++++++++++++++++++++++++ '" USERS SPECIFY • 1. A p.p COVRAIANCE MATRIX. '" 2. A ROW VECTOR OF MEAN. * 3. A ROW VECTOR OF SKEWNESS. * 4. A ROW VECTOR OF KURTOSES. ••• ++++++++++++++++++++++++++++++; DATA A; INPUT VI-V3 @@; CARDS; I .20.20 .20 I .20 .20.20 I 000 References o o Fleishman, A. I., 1978. A method for simulating non-normal distributions. Psychometrika, 43, 521-532. Vale, C. D., & Maurelli, V. A., 1983. Simulating mullivariate non-nonnal distributions. Psychometrika, 48,465-471. .5 .5 .5 PROC IML; START FUNSK(B,D,SOK,EQ,F); ···PRINT "BEGINNING FUNSK ROUTINE. "; BB=B+B; DD=D+D; BD=B·O; CC ~0.S*(I-(BB+6*BD + IS*DD»; IF (EQ~ I) THEN F=4.0*CC*(BB+24*BD+ I05·DD+2)"'·2-S0K*SOK; ELSE F=24"'(BO +CC*(l + BB+ 28*BD)+ DD"'(12+48"'BD+ 141"'CC+225 1122 B2~COE[2,B]; 'DD»-SOK; FINISH; CI ~COE[3,A]; C2~COE[3,B]; D1 ~COE[4,A]; START FLEISH(NN ,NV ,X,R,SKEW,KURT); PRINT 'BEGINNING FLEISH ROUTINE.'; EPS= 1.0/(10.0··10); D2~COE[4,B]; D6~6#DI#D2; RC[K,I]~(l.#CI#C2)/D6; COE~J(4,NV,O); RC[K,2]~(BI#B2+3#BI#D2+3#DI#B2+9#DI#D2)1D6; DOA~ITONV; RC[K,3] ~-R[A,B]/D6; SK~SKEW[I,AJ; END; KU~KURT[I,A]; DO E~I TO A; IF (E < A) THEN DO; IF (SK~SKEW[I,E] & END; Q~(RC[,I]##2-3#RC[,2])/9; KU~KURT[I,EJ) R~(l.#RC[,I]##3-9#RC[,I]#RC[,2]+27#RC[,3ll/54; THEN DO; B~COE[2,E]; AD3~RC[,I]I3; D~COE[4,E]; RC~J(PI,3>999); GO TO SKDONE; QR~Q##3-R##2+EPS; QR~(QR>O); END; END; DO A ~ I TO PI; IF QR[A,I] >0 THEN DO; END; B= 1.0. TH~ARCOS(R[A,I]/SQRT(Q[A,I]##3»; D~O.OI; QT2~2#SQRT(Q[A,IJ); K=O; RC[A,I] ~-QT2#COS(THl3); RC[A,2] ~-QT2#COS«TH + 2#PI)I3); RC[A,3] ~-QT2#COS«TH +4#PI)I3); UPDATE, K=K+l; IF (K > 50) THEN DO; PRINT 'FLEISHMAN COEFFICIENTS NOT FOUND.'; RETURN; END; RUN FUNSK(B,D,SK,I,FSO); RUN FUNSK(B,D,KU,2,FKO); IF(ABS(FSO)<~EPS & ABS(FKO)<~EPS) THEN GO TO SKDONE; EPB=EPS*B; EPD=EPS"'D; END; ELSE DO; TH ~(SQRT(R[A,I]##2-Q[A, 1]##3)+ ABS(R[A,I]))##(1.0/3); SGNR~I; IF R[A,I] <0 THEN SGNR~-I; RC[A,I]~-SGNR#(TH +Q[A,I]rrH); END; END; RC~RC-AD3·J(I,3,1); T~ABS(RC); BU~B+EPB; T~T[,>,<]; DU~D+EPD; RHO~J(NV ,NV ,0); RUN FUNSK(BU,D,SK,I,FSB); RUN FUNSK(B,DU,SK,I,FSD); K=O; DOA ~ 2TO NV; AI=A-l; DO B ~ I TO AI; DSB~(FSB-FSO)IEPB; DSD~(FSD-FSO)/EPD; RUN FUNSK(BU,D,KU,2,FKB); RUN FUNSK(B,DU,KU,2,FKD); K=K+l; DKB~(FKB-FKO)/EPB; TT~RC[K,L]; L~T[K,I]; DKD ~ (FKD-FKO)/EPD; DELTA=DSB"'DKD-DSO+DKB; DTB ~ (FSO'DKD-FKO·DSD)/DELTA; RHO[A,B] ~TT; END; END; DTD~(FKO·DSB-FSO·DKB)IDELTA; RHO~RHO+RHO'+I(NV); B~B-DTB; U~ROOT(RHO); D~D-DTD; X=X"'U; T~J(NN,I,I); GO TO UPDATE; SKDONE, COE[2,A] ~ B; Y - X#(T"COE[2,J) + X#X#(T·COE[3,ll + X#X#X#(T·COE[4,ll; X~Y+T·COE[I,]; COE[4,A]~D; PRINT FINISH; COE(3,A] =O.5+SKI(B+B+ 24"'8"'0+ 105+0+0+2); ~FLEISHMAN TRANSFORMATION COMPLETED. to; COE[I,A]~-COE[3,A]; END; START READlN(SS,NN,NV,SEED); PRINT 'BEGINNING READIN ROUTINE.'; PI~3.1415926536; PI ~NV·(NV-1)/2; NR~NROW(SS); RC~J(PI,3,O); NC~NCOL(SS); K~O; NV~NC; DOA ~ 2TONV; AI=A-l; DO B ~ I TO AI; PRINT NR NC SS; NN~NR; RETURN; END; ELSE DO; K~K+I; B1 ~COE[2,A]; 1123 JJ~J(I,NV,O); MEAN~JJ; VAR =JJ; SKEW=JJ; KURT~JJ; IF«NR-NC) > ~ I) THEN MEAN ~SS[NV +\,]; IF«NR-NC)> ~2) THEN SKEW~SS[NV+2,]; IF«NR-NC»~3)THEN KURT~SS[NV+3,]; SS~SS[I;NV,]; V AR ~ DIAG(SS); SQTIVAR~SQRT(I.Of(VECDIAG(V AR))); DiV ~ DIAG(SQTIVAR); R=DIV*SS·DIV; SS~J(NN,NV,SEED); SS~NORMAL(SS); IF (SKEW~JJ & KURT~JJ) THEN DO; U~ROOT(R); SS=SS·U; END; ELSE DO; PRINT "DATA ARE NOT NORMAL"; RUN FLEISH(NN,NV,SS,R,SKEW,KURT); END; SS~SS'SQRT(VAR)+J(NN,I,i)"MEAN; ·PRINT SSt END; FINISH; START MAIN; PRINT "BEGINNING MAIN ROUTINE."; USE A; READ ALL INTO SS; PRINT "DATA READ INTO SS."; *++++++++++++++++++++++++++++++++ • • SPECIFY THE FOLLOWING PARAMEfERS FOR READIN • SUBROUTINE • I. SS;TOKEEPTHESIMUALTEDDATAMATRIX. • 2. NUMBER OF CASES. • 3. SEED TO GENERATE RANDOM NUMBERS. • *++++++++++++++++++++++++++++++++; RUN READIN(SS,IOOOO,I23456789); CREATE B FROM SS; APPEND FROM SS; FINISH; RUN MAIN; DATAC; SETB; PROC MEANS MEAN VAR SKEWNESS KURTOSIS MIN MAX; PROCCORR; RUN; 1124