BSTA 670 – Statistical Computing
27 October 2010
Lecture 14:
Random Number Generation
Presented by:
Paul Wileyto, Ph.D.
A good analog
random number
generator.
Anyone who uses software to produce random
numbers is in a state of sin.
John von Neumann
Why do we need Random
Numbers?
• Simulation Input
– Statistical Sampling
• Assignment in Trials
• Games
Where do you get random numbers?
• Uniform Random Numbers
– Published Tables
– Make Them
• Computer Algorithms
• Harvest from Nature
• Random draws from a specific
distribution
– Make them from Uniform Random
Numbers
Software Random Number Generators
• There are no true random number
generators but
• There are Pseudo-Random Number
Generators
– Computers have only a limited number of
bits to represent a number
– Sooner or later, the sequence of random
numbers will repeat itself (period of the
generator)
– The trick is to be “good enough” to look like
random numbers
Algorithms for Uniform Random Numbers
Good pseudo-random numbers:
• Independent of the previous number
• Long period
• Sequence reproducible if started with
same initial conditions
• Fast
Good pseudo-random numbers:
• Equal probability for any number inside
interval [a,b]
Probability Density:
$$f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b \\ 0, & x < a \text{ or } x > b \end{cases}$$
We are interested primarily in
uniform random numbers in the
interval [0,1].
• We’ll refer to the realization of a uniform
random number over [0,1] as U.
• Many of the algorithms produce integer
valued random numbers over interval
[0,b].
– Transform to interval [0,1]
Linear Congruential Generator (LCG)
Mod in SAS
Most common
$X_n = (a X_{n-1} + c) \bmod m$
$X_0$ = seed, modulus $m$ (large prime),
multiplier $a$, and increment $c$
Repeats due to the modular
arithmetic that forces wrapping
of values into the desired range.
proc iml; /* begin IML session */
q={20,30,40,50,70,90,160};
t=mod(q,7);
qt=q||t;
print qt; /* print matrix */
quit;
qt
 20   6
 30   2
 40   5
 50   1
 70   0
 90   6
160   6
SAS
Linear Congruential Generator (LCG)
Mod in R
q<-matrix(seq(10,100, by=10),10,1)
qm=q%%13
qall<-cbind(q,qm)
qall
      [,1] [,2]
 [1,]   10   10
 [2,]   20    7
 [3,]   30    4
 [4,]   40    1
 [5,]   50   11
 [6,]   60    8
 [7,]   70    5
 [8,]   80    2
 [9,]   90   12
[10,]  100    9
R
Unit Random Variates in SAS
proc iml; /* begin IML session */
seed = 123456;
c = j(5,1,seed);
b = uniform(c);
print b;
quit;
b
0.73902
0.2724794
0.7095326
0.3191636
0.367853
SAS
Unit Random Variates in R
RNGkind()
[1] "Mersenne-Twister" "Inversion"
set.seed(as.integer(format(Sys.time(), "%S%M%H")))
c<-matrix(runif(5),5,1)
c
[,1]
[1,] 0.9919911
[2,] 0.2598466
[3,] 0.1818524
[4,] 0.3357782
[5,] 0.2754353
R
Unit Random Variates in SAS
proc iml; /* begin IML session */
seed = 0; /* Set seed to 0 to grab a seed value from the system clock. */
c = j(5,1,seed);
b = uniform(c);
print b;
quit;
b
0.73902
0.2724794
0.7095326
0.3191636
0.367853
SAS
RANUNI() and IML UNIFORM() use a multiplicative
linear congruential generator (from SAS docs) where
SEED = mod( SEED * 397204094, 2**31-1 )
and then returns
SEED / (2**31-1)
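The recurrence quoted above is short enough to sketch directly. The following Python is only an illustration of the formula as stated on the slide, not SAS's actual implementation; the function name `sas_style_uniform` is made up for this sketch.

```python
def sas_style_uniform(seed, n):
    """Return n values from SEED = mod(SEED * 397204094, 2**31 - 1), scaled to (0, 1)."""
    m = 2**31 - 1        # modulus quoted on the slide
    a = 397204094        # multiplier quoted on the slide
    values = []
    for _ in range(n):
        seed = (seed * a) % m      # multiplicative congruential step
        values.append(seed / m)    # returned value is SEED / (2**31 - 1)
    return values

u = sas_style_uniform(123456, 5)
print(u)
```

With seed 123456 the first value works out to 1587033266 / (2**31 - 1), about 0.73902, which agrees with the first entry of the IML UNIFORM output shown earlier.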
SAS
Testing Randomness
• Is it Uniform?
[Figure: histograms of x over the interval 0 to 1]
Testing Randomness
• Generate two sets and plot against each other
• Might see correlation in higher dimensions
• Plot Xi versus Xi+k for serial correlation
[Figure: scatter plots of paired draws on the unit square (axes x, y and x1, y2)]
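The two checks named on these slides (is it uniform, and is Xi correlated with Xi+k) can be sketched numerically as well as graphically. This is a minimal sketch using Python's built-in generator; the helper name `lag_correlation`, the seed, the sample size and the number of bins are all arbitrary choices made for illustration.

```python
import random

def lag_correlation(x, k):
    """Pearson correlation between x[i] and x[i+k]."""
    a, b = x[:-k], x[k:]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5

rng = random.Random(2010)                  # arbitrary seed
x = [rng.random() for _ in range(20000)]

# Uniformity: counts in 10 equal-width bins should all be near 2000.
counts = [0] * 10
for v in x:
    counts[int(v * 10)] += 1
print(counts)

# Serial correlation: lags 1 to 5 should all be close to zero.
lag_corrs = [lag_correlation(x, k) for k in range(1, 6)]
print(lag_corrs)
```

A formal test (chi-square on the bin counts, for example) would be the next step; the sketch only produces the raw quantities.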
Linear Congruential Generator
• The good
– Fast
– Up to period of m random numbers
• The Bad
– Sequential correlation
• Plots in more than 1 dimension do not fill in the
space uniformly, but tend to form bands
• Not cryptographically secure
– Selections of m, a, and c are important
Linear Congruential Generator
• Good “magic” numbers for the linear
congruential method:
– a = 16,807, c = 0, m = 2,147,483,647
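Why the choice of multiplier matters can be seen on a toy modulus. The sketch below counts the period of a multiplicative generator for a good and a poor multiplier with m = 31, and then iterates the "magic" constants above. The function names are hypothetical, and the toy values (m = 31, a = 3, a = 5) are chosen here purely for illustration.

```python
def lcg_period(a, c, m, seed):
    """Steps until x -> (a*x + c) mod m first returns to the seed.

    Assumes the map is a bijection (true for c = 0, prime m, a not
    divisible by m, and a seed in 1..m-1), so the loop terminates.
    """
    x = (a * seed + c) % m
    steps = 1
    while x != seed:
        x = (a * x + c) % m
        steps += 1
    return steps

def minimal_standard(seed, n):
    """Apply X <- 16807 * X mod (2**31 - 1) n times and return the state."""
    x = seed
    for _ in range(n):
        x = (16807 * x) % 2147483647
    return x

full_period = lcg_period(a=3, c=0, m=31, seed=1)    # visits all 30 nonzero values
short_period = lcg_period(a=5, c=0, m=31, seed=1)   # 5**3 = 125 = 4*31 + 1, so only 3 values
state = minimal_standard(seed=1, n=10000)
print(full_period, short_period, state)
```

With c = 0 the state 0 is never visited, so the longest possible period is m - 1 rather than m.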
Overflow Method for integers
$I_{j+1} = a I_j + c$
• Multiply two 32-bit numbers to get a 64-bit
integer that cannot be represented
in 32-bit space.
• Low-order 32 bits remain after the
overflow.
• Divide by $2^{32}$ to get floating point values
between 0 and 1.
• Very Fast
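Python integers do not overflow, so the overflow method has to be imitated by masking off everything above the low-order 32 bits. The constants a = 1664525 and c = 1013904223 are an assumption borrowed from a commonly cited "quick and dirty" generator; the slide itself does not name any values, and `overflow_lcg` is a made-up name.

```python
MASK32 = 0xFFFFFFFF   # keeps the low-order 32 bits, mimicking unsigned 32-bit overflow

def overflow_lcg(seed, n, a=1664525, c=1013904223):
    """I_{j+1} = a*I_j + c with the result truncated to 32 bits.

    a and c are assumed example constants, not taken from the slides.
    """
    state = seed & MASK32
    out = []
    for _ in range(n):
        state = (a * state + c) & MASK32   # overflow discards the high bits
        out.append(state / 2**32)          # scale to [0, 1)
    return out

vals = overflow_lcg(seed=0, n=5)
print(vals)
```

In C the mask is unnecessary because unsigned 32-bit arithmetic wraps on its own, which is where the speed comes from.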
Blum, Blum, Shub
$X_{n+1} = X_n^2 \bmod M, \quad M = pq,$
where p and q are large primes
• Very slow
– Not suited to simulation
• Passes all tests
• Cryptographically secure
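A toy version of the squaring recurrence shows the mechanics. The primes below are far too small to be secure and were picked only so the arithmetic can be checked by hand; the usual extra requirement that p and q both leave remainder 3 when divided by 4 is an addition not stated on the slide, and taking the lowest bit of each state as output is one common convention.

```python
def blum_blum_shub_bits(seed, n_bits, p=11, q=23):
    """Toy Blum-Blum-Shub: X_{n+1} = X_n**2 mod (p*q), output the low bit.

    p = 11 and q = 23 are illustrative toy primes (both are 3 mod 4);
    a real generator needs large primes.
    """
    m = p * q
    x = seed % m
    bits = []
    for _ in range(n_bits):
        x = (x * x) % m
        bits.append(x & 1)
    return bits

bits = blum_blum_shub_bits(seed=3, n_bits=6)
print(bits)   # states 9, 81, 236, 36, 31, 202 give [1, 1, 0, 0, 1, 0]
```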
Mersenne Twister
• By Matsumoto and Nishimura (1997)
• Caused a great deal of excitement in
1997.
• Good statistical properties
• Not good for cryptography
• SAS IML RANDGEN function
• Default technique for R runif()
Mersenne Twister
• I’m just going to give you the flavor of it
• It’s a bit shifting algorithm
32 bit word: 0 1 0 0 0 1 … 1 1
Mersenne Twister
• XOR
– Logical bitwise comparison
function
– Compares two bits
• If they are different, value is
1
• If they are the same, value is
zero
>> a=[0 0 1 1]
a =
     0     0     1     1
>> b=[0 1 1 0]
b =
     0     1     1     0
>> c=xor(a,b)
c =
     0     1     0     1
>>
MATLAB
Mersenne Twister
> a<-c(0, 0, 1, 1)
> b<-c(0, 1, 1, 0)
> c<-xor(a,b)
> c
[1] FALSE  TRUE FALSE  TRUE
> as.integer(c)
[1] 0 1 0 1
R
Mersenne Twister
• Bit shifting algorithm
– Use XOR function to flip values
32 bit word: 1 1 0 0 0 [XOR] 1 … 1 1
Mersenne Twister
• Use 624 32 bit words to make one
19937 bit word (623*32 + 1)
– XOR flip function in each 32-bit word
32 bit word: 0 1 0 0 0 [XOR] 1 … 1 1
(input arrives from the last word; output passes to the next word)
Mersenne Twister
[Figure from John Savard’s Cryptology Page, http://www.quadibloc.com]
Mersenne Twister
• By Matsumoto and Nishimura (1997)
• Mersenne Prime Numbers (a power of 2, minus 1)
give period length: $2^{19937}-1$ for 32 bit
numbers
• Free C source code
• Fast
• Passes all randomness smell tests
• Not cryptographically secure
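Python's standard `random` module is also built on the Mersenne Twister, so it makes a convenient place to see the "reproducible if started with the same initial conditions" property in action. The seed 12345 is arbitrary; the values produced will not match the SAS or R output on these slides, because each package turns a seed into the generator's internal state in its own way.

```python
import random

random.seed(12345)
first = [random.random() for _ in range(5)]

random.seed(12345)                 # restart from the same seed
second = [random.random() for _ in range(5)]

same_stream = first == second      # identical seeds give identical sequences
print(first)
print(same_stream)
```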
proc iml; /* begin IML session */
r = j(10,1,.);
call randgen(r,'uniform');
print r;
quit;
r
0.0151013
0.5743561
0.5829185
0.6437729
0.1823678
0.3977417
0.476881
0.9845982
0.3211301
0.9623223
SAS
> RNGkind()
[1] "Mersenne-Twister" "Inversion"
> r=matrix(runif(10), 10,1)
>r
[,1]
[1,] 0.14645262
[2,] 0.04558767
[3,] 0.79254901
[4,] 0.57810786
[5,] 0.57831079
[6,] 0.30258424
[7,] 0.08682622
[8,] 0.77980499
[9,] 0.34161593
[10,] 0.98705945
R
Grabbing a Seed from the
System Clock (SAS)
Both R and SAS automatically grab a seed value
from the system clock at first use, unless you call
set.seed (in R) or randseed (in SAS) to set a specific
starting point
SAS
proc iml; /* begin IML session */
call randseed(12345);
r = j(10,1,.);
call randgen(r,'uniform');
print r;
quit;
r
0.5832971
0.9936254
0.5878877
0.8574689
0.8246889
0.2805668
0.6473969
0.3819192
0.4489572
0.8757847
SAS
> set.seed(12345)
> r=matrix(runif(10), 10,1)
>r
[,1]
[1,] 0.7209039
[2,] 0.8757732
[3,] 0.7609823
[4,] 0.8861246
[5,] 0.4564810
[6,] 0.1663718
[7,] 0.3250954
[8,] 0.5092243
[9,] 0.7277053
[10,] 0.9897369
R
Obtaining Random Numbers
from Specific Distributions
• Inverse Probability Transform
Methods
• Rejection Methods
• Mixed Rejection and Transform
• Methods for Correlated Random
Numbers
Obtaining Random Numbers
from Specific Distributions
• Inverse Probability Transform methods
– Let X be a random variable described by CDF F(x)
– We wish to generate values of X distributed
according to F(x).
– Given a continuous Uniform Random Variable U in
[0,1], the Random Variable $X = F^{-1}(U)$ has CDF F.
$$F^{-1}(u) = \inf\{x \mid F(x) \ge u\}, \quad 0 < u < 1$$
What this means is:
• Solve for X in the CDF, so that when
you plug in U for F, you get a random
number from that specific distribution.
Example: Exponential Distribution
$f(x) = \lambda e^{-\lambda x} I_{(0,\infty)}(x), \quad F(x) = (1 - e^{-\lambda x}) I_{(0,\infty)}(x)$
Let $y = 1 - e^{-\lambda x}$
Solving for x: $x = -\dfrac{\log(1-y)}{\lambda}$
So, $F^{-1}(y) = -\dfrac{\log(1-y)}{\lambda}$
which means that $-\dfrac{\log(1-U)}{\lambda}$ is an rv distributed
as Exponential($\lambda$).
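The derivation above translates into one line of code. The sketch below uses rate 0.04, the value that appears in the SAS and R examples that follow, and checks the sample mean against the theoretical mean 1/0.04 = 25; the function name, seed and sample size are arbitrary.

```python
import math
import random

def exponential_inverse_transform(rate, n, rng):
    """Draw n Exponential(rate) values as -log(1 - U) / rate."""
    # rng.random() lies in [0, 1), so 1 - U lies in (0, 1] and the log is finite.
    return [-math.log(1.0 - rng.random()) / rate for _ in range(n)]

rng = random.Random(14)            # arbitrary seed
draws = exponential_inverse_transform(0.04, 100000, rng)
sample_mean = sum(draws) / len(draws)
print(sample_mean)                 # should be close to 1 / 0.04 = 25
```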
proc iml; /* begin IML session */
u = j(1000,1,.);
call randgen(u,'uniform');
exrand=-log(1-u)/.04;
tbl=u||exrand;
print tbl;
varnames={"u","erand"};
create erand from tbl [colname=varnames];
append from tbl;
quit;
proc means data=erand;
var u erand;
run;
title 'Analysis Exponential RVs';
proc univariate data=erand noprint;
histogram erand / midpoints=5 to 205 by 10 exp;
run;
tbl
0.115794    3.0766303
0.754043    35.064963
0.157732    4.2914261
0.0431113   1.1017045
0.1086405   2.8751851
0.2632565   7.6378865
0.9448316   72.434124
0.3589581   11.116512
0.7109185   31.026164
0.4665676   15.710572
The MEANS Procedure
Variable      N       Mean    Std Dev    Minimum     Maximum
u          1000     0.4819     0.2877     0.0010      0.9996
erand      1000    23.4993    23.7694     0.0240    194.5330
>r=matrix(runif(10000), 10000,1)
>exrand=-log(1-r)/.04
>hist(exrand, freq = FALSE)
> mean(exrand)
[1] 24.55222
> 1/mean(exrand)
[1] 0.04072951
> hist(exrand, freq = FALSE)
> help.search("means")
R
[Figure: Histogram of exrand; x-axis exrand (0 to 250), y-axis Density]
R
Weibull Survival
Survival Time: $S(t) = U = \exp(-\lambda t^{\gamma})$
Inverse Prob Transform: $t = \left(\dfrac{-\ln U}{\lambda}\right)^{1/\gamma}$
$\gamma = 1.5, \quad \lambda = 0.001$
SAS
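As a language-neutral sketch of the same inverse transform, the Python below draws Weibull survival times with the two parameter values from this slide (0.001 and 1.5, called `rate` and `shape` here, names chosen for this sketch) and compares the sample median with the value obtained by setting S(t) = 0.5.

```python
import math
import random
import statistics

def weibull_times(rate, shape, n, rng):
    """Invert S(t) = exp(-rate * t**shape): t = (-log(1 - U) / rate) ** (1 / shape)."""
    return [(-math.log(1.0 - rng.random()) / rate) ** (1.0 / shape) for _ in range(n)]

rng = random.Random(670)           # arbitrary seed
times = weibull_times(0.001, 1.5, 100000, rng)

sample_median = statistics.median(times)
theory_median = (math.log(2.0) / 0.001) ** (1.0 / 1.5)   # solves S(t) = 0.5
print(sample_median, theory_median)
```

Using 1 - U instead of U changes nothing distributionally, since both are uniform on the unit interval; it simply matches the SAS and R code on the following slides.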
proc iml; /* begin IML session */
u = j(2000,1,.);
call randgen(u,'uniform');
wrand=(-log(1-u)/.001)##(1/1.5);
tbl=u||wrand;
print tbl;
varnames={"u","weibrand"};
create wrand from tbl [colname=varnames];
append from tbl;
Quit;
proc means data=wrand;
var u weibrand;
run;
title 'Analysis of Weibull RVs';
proc univariate data=wrand noprint;
histogram weibrand / midpoints=5 to 205 by 10 weibull;
run;
SAS
SAS
Weibull Survival
R
> r=matrix(runif(10000), 10000,1)
> wrand=(-log(1-r)/.001)^(1/1.5)
> hist(wrand, freq = FALSE, main = paste("Histogram of
Survival Times"), breaks=50, xlab = "Survival Time")
R
R
proc lifetest data=Work.Wrand method=pl OUTSURV=work._surv;
time WEIBRAND * CENS (0);
run; quit;
goptions reset=all device=WIN;
data work._surv; set work._surv;
if survival > 0 then _lsurv = -log(survival);
if _lsurv > 0 then _llsurv = log(_lsurv);
run;
** Survival plots **;
goptions reset=symbol;
goptions ftext=SWISS ctext=BLACK htext=1 cells;
proc gplot data=work._surv ;
label weibrand = 'Survival Time';
axis2 minor=none major=(number=6)
label=(angle=90 'Survival Distribution Function');
symbol1 i=stepj c=BLUE l=1 width=1;
plot survival * weibrand=1 /
description="SDF of weibrand"
frame cframe=CXF7E1C2 caxis=BLACK vaxis=axis2 hminor=0 name='SDF';
run;
symbol1 i=join c=BLUE l=1 width=1;
quit;
goptions ftext= ctext= htext= reset=symbol;
SAS
SAS
> r=matrix(runif(1000), 1000,1)
> wrand=(-log(1-r)/.001)^(1/1.5)
> event=wrand<=200
> wrand2=wrand*(event)+200*(1-event)
> fit <- survfit(Surv(wrand2, event) ~ 1, data = aml)
> plot(fit, lty = 2:3,xlab = "Days", ylab="Survival")
>
R
R
Simulating Weibull Regression Data, with
Proportional Hazards
Survival Time: $S(t) = P = \exp(-\lambda t^{\gamma}), \quad \lambda = \exp(\beta' x)$
Inverse Prob Transform: $t = \left(\dfrac{-\ln P}{\lambda}\right)^{1/\gamma}$
$\gamma = 1.5, \quad \lambda = \exp(\beta' x)$
$\beta_0 = \ln(0.001) = -6.91$
$HR(\text{Drug}) = 0.5, \quad \beta_1 = -0.69$
SAS
proc iml; /* begin IML session */
u = j(400,1,.);
d = j(200,1,0) // j(200,1,1);   /* Data, drug treatment (0,1) */
call randgen(u,'uniform');
/* hazard scale = exp(constant value + drug effect * drug) */
wrand=(-log(1-u)/exp(log(.001)-0.69*d))##(1/1.5);
c = wrand<=200;
/* # is element-wise multiplication: keep event times, censor the rest at 200 */
wrand=wrand#c + 200*(1-c);
tbl=u || wrand || d || c ;
print tbl;
varnames={"u","weibrand","treat", "cens"};
create wrand from tbl [colname=varnames];
append from tbl;
quit;
SAS
options pageno=1;
proc lifetest data=Work.Wrand method=pl OUTSURV=work._surv;
time WEIBRAND * CENS (0); strata TREAT;
run; quit;
goptions reset=all device=WIN;
data work._surv; set work._surv;
if survival > 0 then _lsurv = -log(survival);
if _lsurv > 0 then _llsurv = log(_lsurv);
run;
** Survival plots **;
title;
footnote;
goptions reset=symbol;
goptions ftext=SWISS ctext=BLACK htext=1 cells;
proc gplot data=work._surv ;
label weibrand = 'Survival Time';
axis2 minor=none major=(number=6)
label=(angle=90 'Survival Distribution Function');
symbol1 i=stepj l=1 width=1; symbol2 i=stepj l=2 width=1; symbol3 i=stepj l=3 width=1;
plot survival * weibrand = treat /
description="SDF of weibrand by treat"
frame cframe=CXF7E1C2 caxis=BLACK vaxis=axis2 hminor=0 name='SDF';
run;
symbol1 i=join l=1 width=1; symbol2 i=join l=2 width=1; symbol3 i=join l=3 width=1;
quit;
SAS
SAS
*** Proportional Hazards Models *** ;
options pageno=1;
proc phreg data=Work.Wrand;
model WEIBRAND * CENS (0) = TREAT;
run; quit;
Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio      202.5356     1        <.0001
Score                 209.0491     1        <.0001
Wald                  201.2807     1        <.0001
Analysis of Maximum Likelihood Estimates
Parameter   DF   Parameter Estimate   Standard Error   Chi-Square   Pr > ChiSq   Hazard Ratio
treat        1             -0.70413          0.04963     201.2807       <.0001          0.495
SAS
Simulating Weibull Regression Data, with
Proportional Hazards
R
r=matrix(runif(400), 400,1)
# Data, drug treatment (0,1)
drug=rbind(matrix(1,200,1),matrix(0,200,1))
# hazard scale = exp(constant value + drug effect * drug)
wrand=(-log(1-r)/exp(log(.001)-0.69*drug))^(1/1.5)
event = wrand<=200
wrand=wrand*event + 200*(1-event)
survreg(Surv(wrand, event) ~ drug, dist='weibull', model=TRUE, scale=1)
Call:
survreg(formula = Surv(wrand, event) ~ drug, dist = "weibull",
scale = 1, model = TRUE)
Coefficients:
(Intercept)        drug
  4.6042014   0.5978962
Scale fixed at 1
Loglik(model)= -1918.1 Loglik(intercept only)= -1932.6
Chisq= 29.06 on 1 degrees of freedom, p= 7e-08
n= 400
lsurv2 <- survfit(Surv(wrand, event) ~ drug, aml, type='fleming')
plot(lsurv2, lty=2:3,xlab = "Days", ylab="Survival")
Package survival
R
R
Package eha
> enter=matrix(0,400,1)
> fit <- phreg(Surv(enter, wrand, event) ~ drug)
> fit
Call:
phreg(formula = Surv(enter, wrand, event) ~ drug)
Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
drug                0.586    -0.731     0.481     0.113     0.000
log(scale)                    4.663   105.903     0.050     0.000
log(shape)                    0.402     1.495     0.047     0.000
Events                    327
Total time at risk        44359
Max. log. likelihood      -1886.9
LR test statistic         42.4
Degrees of freedom        1
Overall p-value           7.38224e-11
> plot(fit, fn="sur")
R
R
Generating Numbers from
Specific Distributions
• Rejection Method
– Fast
– Good for Count Models
– Good when you cannot find F-1 , but have
f(x)
– Generally Use Pairs of Random Numbers
– Just like playing the game “Battleship”
The Rejection Method is Like Playing the Game Battleship
Rejection
• Choose pairs of uniform random
numbers
– xU between Xmin and Xmax
– yU between Ymin and Ymax
• Reject xU if yU > f(x) at xU
Rejection
[Figure: f(x) inside the box from Xmin to Xmax and Ymin to Ymax; points under the curve are hits, points above it are misses]
Sample the area (two dimensions) containing the
probability distribution or density function uniformly.
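The "Battleship" version of rejection sampling fits in a few lines. This is a minimal sketch with Ymin fixed at 0; the function name and the example target (a triangular density 2x on the unit interval, with Ymax = 2) are chosen here for illustration and do not come from the slides.

```python
import random

def rejection_sample(f, x_min, x_max, y_max, n, rng):
    """Draw n values with density proportional to f on [x_min, x_max].

    Requires 0 <= f(x) <= y_max on the interval. Returns the accepted
    draws and the total number of (x, y) pairs that were tried.
    """
    accepted, tries = [], 0
    while len(accepted) < n:
        tries += 1
        x = rng.uniform(x_min, x_max)   # xU
        y = rng.uniform(0.0, y_max)     # yU
        if y <= f(x):                   # hit: the point lies under the curve
            accepted.append(x)
    return accepted, tries

samples, tries = rejection_sample(lambda x: 2.0 * x, 0.0, 1.0, 2.0,
                                  20000, random.Random(1))
sample_mean = sum(samples) / len(samples)
acceptance = len(samples) / tries
print(sample_mean, acceptance)   # mean near 2/3, acceptance near 1/2
```

The acceptance rate equals the area under f divided by the area of the box, which is why a loosely fitting box wastes draws.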
Rejection
• Simple version becomes inefficient if
the rejection area is large.
[Figure: g(x) inside a box with a large dead zone]
Binomial Count:
p=0.2
Trials=20
239 In, 761 Rejected
[Figure: histogram of accepted draws; x-axis Count, y-axis Frequency]
Matlab
Rejection
• Can be made more efficient by uniform
sampling over a smaller target area.
The trick is to sample uniformly over
the smaller area.
[Figure: g(x) with a smaller dead zone]
Rejection
First, define a "dominating function" f(x),
and the corresponding integral or Cumulative
Distribution F(x).
F(x) need not be normalized.
[Figure: g(x) under the dominating function f(x), with a smaller dead zone]
Rejection
$f(x) = a - bx, \quad 0 \le x \le a/b$
$F(x) = ax - \dfrac{b}{2}x^2$
$F(a/b) = \dfrac{a^2}{2b}$
[Figure: g(x) under the line f(x), which falls from a at x = 0 to 0 at xMax = a/b; smaller dead zone]
Rejection
• Choose xU based on inverse transform of the
integrated dominance function (F(x)).
Choose a uniform random number U1 in the range: $0 \le U_1 \le \dfrac{a^2}{2b}$
Calculate x by setting F(x)=U1, and solving (the quadratic) for x.
[Figure: xU marked under g(x) and f(x)]
Rejection
• Evaluate f(x), choose a second uniform
random number U2 between 0 and f(x).
• Reject if U2 > g(x)
[Figure: xU marked under g(x) and f(x)]
315 In, 685 Rejected
[Figure: histogram of accepted draws; x-axis Count, y-axis Frequency]
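The two-step recipe with a linear dominating function f(x) = a - bx can be sketched as follows. The target used here, g(x) = 3(1 - x)^2 on the unit interval with a = b = 3, is an assumption chosen for illustration because the line 3(1 - x) lies above it everywhere; the slides' own target is a binomial count.

```python
import math
import random

def sample_linear_envelope(a, b, rng):
    """Inverse-transform draw of x with density proportional to a - b*x on [0, a/b]."""
    u1 = rng.uniform(0.0, a * a / (2.0 * b))
    # Solve a*x - (b/2)*x**2 = u1 and keep the root inside [0, a/b].
    disc = max(0.0, a * a - 2.0 * b * u1)   # guard against tiny negative rounding error
    return (a - math.sqrt(disc)) / b

def rejection_with_envelope(g, a, b, n, rng):
    """Accept x when a uniform height under f(x) = a - b*x also falls under g(x)."""
    accepted, tries = [], 0
    while len(accepted) < n:
        tries += 1
        x = sample_linear_envelope(a, b, rng)
        u2 = rng.uniform(0.0, a - b * x)
        if u2 <= g(x):
            accepted.append(x)
    return accepted, tries

def target(x):
    return 3.0 * (1.0 - x) ** 2             # assumed example target, dominated by 3 - 3x

samples, tries = rejection_with_envelope(target, 3.0, 3.0, 20000, random.Random(7))
sample_mean = sum(samples) / len(samples)
acceptance = len(samples) / tries
print(sample_mean, acceptance)   # mean near 1/4, acceptance near 2/3
```

The acceptance rate is the area under g divided by the area under f (here 1 / 1.5), compared with 1/3 if the same target were sampled from the plain box of height 3.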
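Weibull Function?
$f(x) = \dfrac{\beta}{\alpha}\left(\dfrac{x}{\alpha}\right)^{\beta-1}\exp\left[-\left(\dfrac{x}{\alpha}\right)^{\beta}\right]$
$F(x) = 1 - \exp\left[-\left(\dfrac{x}{\alpha}\right)^{\beta}\right]$
$F^{-1}(u) = x = \alpha\left(-\ln u\right)^{1/\beta}$
Dominating function: f(x) times 2, $\alpha = 6.5$, $\beta = 1.8$
574 In, 426 Rejected
[Figure: histogram of accepted draws; x-axis Count, y-axis Frequency]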
Binomial Distribution
(Bernoulli Trials are the simplest
example of the rejection method.)
Probability Pr(X=1): p
proc iml; /* begin IML session */
r = j(10,1,.);
call randgen(r,'uniform');
b=r>0.5;
print b;
quit;
b
1
0
1
1
0
0
0
0
0
1
SAS
> r=matrix(runif(10), 10,1)
> b=r<=0.5
> cbind(r,b)
           [,1] [,2]
 [1,] 0.4919652    1
 [2,] 0.5088624    0
 [3,] 0.5955355    0
 [4,] 0.5243394    0
 [5,] 0.5923056    0
 [6,] 0.1610980    1
 [7,] 0.9663659    0
 [8,] 0.2548106    1
 [9,] 0.4582953    1
[10,] 0.1170421    1
>
R
• But then, you never have just one
value of “p” for your Bernoulli Trials
Simulating Outcomes from a
Logistic Model
• Placebo Controlled Drug Trial
– 25% Success for Placebo
– Odds Ratio of 2.0 for Treatment
• Two different success probabilities,
based on logistic model
Logistic Model
Placebo:
$0.25 = \dfrac{\exp(\beta_0)}{1 + \exp(\beta_0)}, \quad \beta_0 = -1.0986$
Drug (0,1): OR=2.0, ln(OR)=0.6931
$\text{CDF}(\text{Success}) = \dfrac{\exp(-1.0986 + 0.6931 \cdot \text{Drug})}{1 + \exp(-1.0986 + 0.6931 \cdot \text{Drug})}$
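A quick numerical check of the model above: with an intercept of -1.0986 and a drug coefficient of 0.6931, the success probabilities should come out at 0.25 for placebo and 0.40 for drug (odds of 1/3 doubled to 2/3). The sketch uses many more subjects per arm than the 200 on the slides so that the simulated proportions are stable; names and seed are arbitrary.

```python
import math
import random

def expit(z):
    """Inverse logit: exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))

beta0, beta1 = -1.0986, 0.6931
rng = random.Random(2010)          # arbitrary seed
n_per_arm = 20000                  # assumption: larger than the slides' 200 per arm

rates = {}
for drug in (0, 1):
    p = expit(beta0 + beta1 * drug)
    # Bernoulli outcome: success when a uniform draw falls at or below p.
    successes = sum(rng.random() <= p for _ in range(n_per_arm))
    rates[drug] = successes / n_per_arm

print(expit(beta0), expit(beta0 + beta1))   # about 0.25 and 0.40
print(rates)
```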
proc iml; /* begin IML session */
u = j(400,1,.);
/* d: column of 1s (constant) || treatment (0,1), so d*bta = bta1 + bta2*treat */
d = j(400,1,1)||(j(200,1,0) // j(200,1,1));
bta= {-1.0986 , 0.6931};
call randgen(u,'uniform');
expit=exp(d*bta)/(1+exp(d*bta));
outcome=u<=expit;
tbl=u || d || expit || outcome ;
varnames={"u","const","treat", "expit","outcome"};
create erand from tbl [colname=varnames];
append from tbl;
quit;
proc logistic data=Work.Erand DESCEND;
model OUTCOME = TREAT;
run;
SAS
The LOGISTIC Procedure
Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept    1    -1.2367           0.1693           53.3383       <.0001
treat        1     0.5510           0.2261            5.9402       0.0148
Odds Ratio Estimates
Effect   Point Estimate   95% Wald Confidence Limits
treat             1.735        1.114        2.702
SAS
r=matrix(runif(400), 400,1)
drug=rbind(matrix(0,200,1),matrix(1,200,1))
d=cbind(matrix(1,400,1), drug)
parms=matrix(c(-1.0986 , 0.6931),2,1)
expit=exp(d%*%parms)/(1+exp(d%*%parms))
outcome=r<=expit
# d: column of 1s (constant) and drug (0,1), so d %*% parms = parm1 + parm2*drug
R
> drugtrial<-glm(outcome~drug, family = binomial(link="logit"))
> summary(drugtrial)
Call:
glm(formula = outcome ~ drug, family = binomial(link = "logit"))
Deviance Residuals:
   Min      1Q  Median      3Q     Max
-1.036  -1.036  -0.776   1.326   1.641
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.0460     0.1612  -6.488 8.68e-11 ***
drug          0.7026     0.2158   3.255  0.00113 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 511.49 on 399 degrees of freedom
Residual deviance: 500.67 on 398 degrees of freedom
AIC: 504.67
Number of Fisher Scoring iterations: 4
R
Normally Distributed Random Numbers
• Inverse transform methods inefficient for
normal random numbers
• Box-Muller Transform
– z transformation of two random uniform
variates [X1,X2~U(0,1)]
• Random radius, random $\theta$
Get two z variates from two uniform
random numbers, $X_1$ and $X_2$:
$z_1 = r\cos(\theta) = \sqrt{-2\ln(X_1)}\,\cos(2\pi X_2)$
$z_2 = r\sin(\theta) = \sqrt{-2\ln(X_1)}\,\sin(2\pi X_2)$
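The Box-Muller transform written out as code, as a minimal sketch: one pair of uniforms goes in, one pair of independent standard normal values comes out. The only liberty taken is drawing the first uniform as 1 - U so that the logarithm never sees zero; function name, seed and sample size are arbitrary.

```python
import math
import random
import statistics

def box_muller(rng):
    """Return two independent N(0,1) values from two uniform draws."""
    x1 = 1.0 - rng.random()        # in (0, 1], avoids log(0)
    x2 = rng.random()
    r = math.sqrt(-2.0 * math.log(x1))   # random radius
    theta = 2.0 * math.pi * x2           # random angle
    return r * math.cos(theta), r * math.sin(theta)

rng = random.Random(27)            # arbitrary seed
z = []
for _ in range(50000):
    z.extend(box_muller(rng))

z_mean = statistics.fmean(z)
z_sd = statistics.pstdev(z)
print(z_mean, z_sd)                # near 0 and 1
```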
Normally Distributed Random Numbers
• Box-Muller Transform
– X1, X2, specify a position within the unit circle
• Random angle, random radius
– Would be more efficient if it did not make calls to
trigonometric functions.
• Marsaglia Method
– Places the Unit Circle within a square, -1 to +1,
and samples the square uniformly.
• Rejects draws that fall outside the circle.
• But it avoids calls to trig functions.
$s = X_1^2 + X_2^2 < 1$
$z_1 = X_1\sqrt{\dfrac{-2\ln(s)}{s}}, \quad z_2 = X_2\sqrt{\dfrac{-2\ln(s)}{s}}$
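The Marsaglia (polar) variant, sketched under the same conventions: sample the square from -1 to +1, reject points outside the unit circle, then rescale. The function name, seed and sample size are arbitrary, and the loop also counts attempts so the rejection rate can be seen.

```python
import math
import random
import statistics

def marsaglia_polar(rng):
    """Return two independent N(0,1) values and the number of attempts used."""
    attempts = 0
    while True:
        attempts += 1
        x1 = rng.uniform(-1.0, 1.0)
        x2 = rng.uniform(-1.0, 1.0)
        s = x1 * x1 + x2 * x2
        if 0.0 < s < 1.0:                  # keep only points inside the unit circle
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return x1 * factor, x2 * factor, attempts

rng = random.Random(27)            # arbitrary seed
z, total_attempts = [], 0
for _ in range(50000):
    z1, z2, attempts = marsaglia_polar(rng)
    z.extend((z1, z2))
    total_attempts += attempts

z_mean = statistics.fmean(z)
z_sd = statistics.pstdev(z)
accept_rate = 50000 / total_attempts       # should be near pi/4
print(z_mean, z_sd, accept_rate)
```

Roughly 21% of the pairs are thrown away (the corners of the square), which is the price paid for avoiding the sine and cosine calls.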
Generating Numbers from
Specific Distributions
• Normal, Using CLT (quick & dirty)
– Sum several iterations of u
– Standardize
• Recall that Var(u)=1/12
$X = \sum_{i=1}^{12} u_i - 6$
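The quick-and-dirty recipe in code: summing 12 uniforms gives mean 6 and variance 12 × (1/12) = 1, so subtracting 6 standardizes the sum. A minimal sketch with arbitrary names and seed:

```python
import random
import statistics

def clt_normal(rng):
    """Approximate N(0,1) draw: sum of 12 uniforms minus 6."""
    return sum(rng.random() for _ in range(12)) - 6.0

rng = random.Random(12)            # arbitrary seed
x = [clt_normal(rng) for _ in range(50000)]

x_mean = statistics.fmean(x)
x_sd = statistics.pstdev(x)
print(x_mean, x_sd, min(x), max(x))
```

The approximation cannot produce values beyond ±6, so the extreme tails are missing; that is one reason it is only "quick & dirty".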
Correlated Multivariate Random Numbers
• Simulating panel data, repeated
measures
• Mixture distributions
Generating Multivariate Normal
Random Numbers
Desired Covariance Matrix V
V = R'R, where R is the (upper-triangular) Cholesky factor of V
Begin with independent standard normal RVs $Z \sim N(0,1)$
Correlated (Multivariate) Normal RVs: $X = R'Z + \mu$
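The recipe above can be sketched without any matrix library. The hand-written `cholesky_upper` below is an illustrative stand-in for `half()` in SAS/IML or `chol()` in R, and the covariance matrix is built from the correlation matrix and standard deviations (53, 36, 12, 47) used in the examples that follow; the seed and sample size are arbitrary.

```python
import math
import random

def cholesky_upper(v):
    """Return upper-triangular R with V = R'R (V symmetric positive definite)."""
    n = len(v)
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = v[i][j] - sum(r[k][i] * r[k][j] for k in range(i))
            r[i][j] = math.sqrt(s) if i == j else s / r[i][i]
    return r

def mvn_draw(r, mu, rng):
    """One draw of X = R'Z + mu, with Z a vector of independent N(0,1) values."""
    n = len(mu)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [mu[j] + sum(r[k][j] * z[k] for k in range(j + 1)) for j in range(n)]

def sample_corr(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))

corr = [[1, .3, .2, .1], [.3, 1, .3, .2], [.2, .3, 1, .3], [.1, .2, .3, 1]]
sig = [53, 36, 12, 47]
v = [[corr[i][j] * sig[i] * sig[j] for j in range(4)] for i in range(4)]
r_upper = cholesky_upper(v)

rng = random.Random(670)           # arbitrary seed
draws = [mvn_draw(r_upper, [0.0] * 4, rng) for _ in range(20000)]
rho12 = sample_corr([d[0] for d in draws], [d[1] for d in draws])
print(r_upper[0], rho12)           # first row 53, 10.8, 2.4, 4.7; rho12 near 0.3
```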
Generating Multivariate Normal
Random Numbers
proc iml; /* begin IML session */
rmat={1 .3 .2 .1, .3 1 .3 .2, .2 .3 1 .3 , .1 .2 .3 1};
sigvec={53 36 12 47};
cvmat=rmat#(sigvec`*sigvec);
upr=half(cvmat);
print rmat;
print sigvec;
print cvmat;
print upr;
/* Let’s be wasteful */
r1 = j(1000,4,.);
r2 = j(1000,4,.);
call randgen(r1,'uniform');
call randgen(r2,'uniform');
pi= 4*atan(1);
print pi;
z1=sqrt(-2*log(r1))#cos(2*pi*r2);
z1=z1*upr;
varnames={"x1","x2","x3","x4"};
create nrand from z1 [colname=varnames];
append from z1;
quit; /* end the IML session only after the data set is written */
proc corr data=work.nrand pearson;
var x1 x2 x3 x4;
run;
SAS
rmat
  1     0.3   0.2   0.1
  0.3   1     0.3   0.2
  0.2   0.3   1     0.3
  0.1   0.2   0.3   1
sigvec
  53   36   12   47
cvmat
  2809    572.4   127.2   249.1
  572.4   1296    129.6   338.4
  127.2   129.6   144     169.2
  249.1   338.4   169.2   2209
upr
  53   10.8        2.4         4.7
  0    34.341811   3.0190603   8.3757958
  0    0           11.36333    11.672016
  0    0           0           44.503035
SAS
The CORR Procedure
4 Variables: x1 x2 x3 x4
Simple Statistics
Variable      N       Mean    Std Dev          Sum      Minimum     Maximum
x1         1000   -0.86090   51.78291   -860.89502   -167.70178   157.51299
x2         1000   -0.21592   36.41244   -215.92386   -122.58068   120.05335
x3         1000   -0.06176   11.60953    -61.75755    -37.09589    43.83908
x4         1000    0.46483   46.63351    464.82762   -152.65527   143.41509
Pearson Correlation Coefficients, N = 1000
Prob > |r| under H0: Rho=0
           x1        x2        x3        x4
x1    1.00000   0.30338   0.20341   0.11397
                 <.0001    <.0001    0.0003
x2    0.30338   1.00000   0.28186   0.24150
       <.0001              <.0001    <.0001
x3    0.20341   0.28186   1.00000   0.34421
       <.0001    <.0001              <.0001
x4    0.11397   0.24150   0.34421   1.00000
       0.0003    <.0001    <.0001
SAS
Generating Multivariate Normal
Random Numbers
cmat<-rbind(c(1, .3, .2, .1), c(.3, 1, .3, .2), c(.2, .3, 1, .3) , c(.1, .2, .3, 1))
sigvec=c(53, 36, 12, 47)
vv=cmat*(sigvec%*%t(sigvec))
rr=chol(vv)
r1=matrix(runif(1000), 250,4)
r2=matrix(runif(1000), 250,4)
z1=sqrt(-2*log(r1))*cos(2*pi*r2)
rvs=z1%*%rr
> cmat
[,1] [,2] [,3] [,4]
[1,] 1.0 0.3 0.2 0.1
[2,] 0.3 1.0 0.3 0.2
[3,] 0.2 0.3 1.0 0.3
[4,] 0.1 0.2 0.3 1.0
> vv
       [,1]   [,2]  [,3]   [,4]
[1,] 2809.0  572.4 127.2  249.1
[2,]  572.4 1296.0 129.6  338.4
[3,]  127.2  129.6 144.0  169.2
[4,]  249.1  338.4 169.2 2209.0
R
> rr
     [,1]     [,2]      [,3]      [,4]
[1,]   53 10.80000  2.400000  4.700000
[2,]    0 34.34181  3.019060  8.375796
[3,]    0  0.00000 11.363330 11.672016
[4,]    0  0.00000  0.000000 44.503035
> cov(rvs)
          [,1]      [,2]     [,3]      [,4]
[1,] 2832.4200  561.2585 134.0656  533.7351
[2,]  561.2585 1235.7616 124.2373  382.5441
[3,]  134.0656  124.2373 127.4132  160.2173
[4,]  533.7351  382.5441 160.2173 2205.5903
> cor(rvs)
          [,1]      [,2]      [,3]      [,4]
[1,] 1.0000000 0.2999969 0.2231676 0.2135426
[2,] 0.2999969 1.0000000 0.3130961 0.2317137
[3,] 0.2231676 0.3130961 1.0000000 0.3022317
[4,] 0.2135426 0.2317137 0.3022317 1.0000000
> sd(rvs)
[1] 53.22048 35.15340 11.28774 46.96371
R
Subject-specific Random Effects
For the i-th subject in the j-th measurement:
$y_{ij} = \alpha + \beta x + e_{ij} + k_i$
• We have an error term (eij) for
measurement j in subject i.
• We also have a subject specific random
effect (ki)
Recipe for Subject-specific Random Effects
• Create subjects for study
–N
– Assign treatment, covariates
• Give each subject a random effect
– Drawn from, say, N(0,V)
• Generate predicted values based on
regression + random effects
• Generate outcomes for each repeated
measure from specific distribution
Logistic Model
Placebo:
$0.25 = \dfrac{\exp(\beta_0)}{1 + \exp(\beta_0)}, \quad \beta_0 = -1.0986$
Drug: OR=2.0, ln(OR)=0.6931
Time (0,1,2): OR 1.5, ln(OR)=0.4055
$K_i \sim N(0,1)$
$\text{CDF}(\text{Success}) = \dfrac{\exp(-1.0986 + 0.6931 \cdot \text{Drug} + 0.4055 \cdot \text{Time} + K_i)}{1 + \exp(-1.0986 + 0.6931 \cdot \text{Drug} + 0.4055 \cdot \text{Time} + K_i)}$
[Coefficient vector: bta1, bta2, …, 1]
proc iml; /* begin IML session */
u = j(600,1,.);
d1=j(100,1,0)//j(100,1,1);
d1=d1//d1//d1;
id=j(200,1,0);
do i=1 to 200 by 1;
id[i,1]=i;
end;
id=id//id//id;
t=j(200,1,0)//j(200,1,1)//j(200,1,2);
k=j(200,1,.);
call randgen(k,'normal');
k=k//k//k;
/* d: constant || treatment || time || subject effect k; the last element of bta is 1 so k enters unscaled */
bta= {-1.0986 , 0.6931,.4055,1};
d = j(600,1,1)||d1||t||k;
call randgen(u,'uniform');
expit=exp(d*bta)/(1+exp(d*bta));
y=u<=expit;
tbl=id||u || d || expit || y ;
varnames={"id","u","const","treat","t","k", "expit","outcome"};
create erand from tbl [colname=varnames];
append from tbl;
quit;
SAS
. xtlogit outcome treat t, i(id)
Random-effects logistic regression          Number of obs      =       600
Group variable: id                          Number of groups   =       200
Random effects u_i ~ Gaussian               Obs per group: min =         3
                                                           avg =       3.0
                                                           max =         3
                                            Wald chi2(2)       =     11.07
Log likelihood = -394.754                   Prob > chi2        =    0.0040
------------------------------------------------------------------------------
     outcome |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |   .6023501   .2279107     2.64   0.008     .1556533    1.049047
           t |    .235297   .1123071     2.10   0.036      .015179    .4554149
       _cons |  -.9334699   .2040373    -4.57   0.000    -1.333376   -.5335642
-------------+----------------------------------------------------------------
    /lnsig2u |  -.1281394   .3971684                     -.9065751    .6502964
-------------+----------------------------------------------------------------
     sigma_u |   .9379396     .18626                      .6355353    1.384236
         rho |   .2109869   .0661172                      .1093476    .3680594
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) = 13.01 Prob >= chibar2 = 0.000
. di -394.754*(-2)
789.508
Stata, SAS data
Finally got this to run in SAS. I had forgotten that SAS requires you to sort.
Stata does not require sorted data for their mixed models.
proc sort data=erand;
by id t;
run;
proc nlmixed data=erand qpoints=5 ;
parms b0=0 b1=-.7 b2=.6 sig=0 ;
theta2 = b0+b1*treat+b2*t+u;
prb= exp(theta2)/(1+exp(theta2));
model outcome ~ binary(prb);
random u ~normal(0,sig) subject=id ;
run;
SAS
The NLMIXED Procedure
Fit Statistics
-2 Log Likelihood            789.5
AIC (smaller is better)      797.5
AICC (smaller is better)     797.6
BIC (smaller is better)      810.7
Parameter Estimates
Parameter   Estimate   Standard Error    DF   t Value   Pr > |t|   Alpha      Lower     Upper    Gradient
b0           -0.9332           0.2039   199     -4.58     <.0001    0.05    -1.3354   -0.5310     0.00039
b1            0.6021           0.2278   199      2.64     0.0089    0.05     0.1529    1.0513    -0.00007
b2            0.2353           0.1123   199      2.10     0.0374    0.05    0.01382    0.4567    0.000737
sig           0.8781           0.3480   199      2.52     0.0124    0.05     0.1917    1.5644    0.000363
We see tiny differences between this and Stata results, owing to differences in optimization
specs.
SAS
id=matrix(seq(1:200), 200,1)
k1=matrix(runif(200), 200,1)
k2=matrix(runif(200), 200,1)
k=sqrt(-2*log(k1))*cos(2*pi*k2)
id=rbind(id,id,id)
k=rbind(k,k,k)
drug =rbind(matrix(0,100,1), matrix(1,100,1))
drug=rbind(drug,drug,drug)
# d: constant, drug (0,1), and subject effect k; the last element of parms is 1 so k enters unscaled
d=cbind(matrix(1,600,1),drug,k)
parms=matrix(c(-1.0986 , 0.6931,1),3,1)
expit=exp(d%*%parms)/(1+exp(d%*%parms))
# draw 600 fresh uniforms; the earlier r had only 400 rows
r=matrix(runif(600), 600,1)
outcome=r<=expit
R
. xtlogit outcome drug, i(id)
Random-effects logistic regression          Number of obs      =       600
Group variable: id                          Number of groups   =       200
Random effects u_i ~ Gaussian               Obs per group: min =         3
                                                           avg =       3.0
                                                           max =         3
                                            Wald chi2(1)       =     14.55
Log likelihood = -368.112                   Prob > chi2        =    0.0001
------------------------------------------------------------------------------
     outcome |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        drug |   .8331252   .2183881     3.81   0.000     .4050924    1.261158
       _cons |  -1.240213   .1708219    -7.26   0.000    -1.575018   -.9054085
-------------+----------------------------------------------------------------
    /lnsig2u |  -.6325958   .5624138                     -1.734907     .469715
-------------+----------------------------------------------------------------
     sigma_u |   .7288423   .2049555                      .4200198    1.264729
         rho |   .1390212   .0673177                       .050895    .3271436
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) = 5.08 Prob >= chibar2 = 0.012
R
Got this to run in R, using a mixed effects package called Zelig.
z.out1 <- zelig(outcome ~ drug + tag(1 | id),data=NULL, model="logit.mixed")
Delia Bailey and Ferdinand Alimadhi. 2007. "logit.mixed: Mixed effects logistic
model" in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical
Software," http://gking.harvard.edu/zelig
summary(z.out1)
Generalized linear mixed model fit by the Laplace approximation
Formula: outcome ~ drug + tag(1 | id)
   AIC   BIC logLik deviance
 743.4 756.6 -368.7    737.4
Random effects:
 Groups Name        Variance Std.Dev.
 id     (Intercept) 0.39486  0.62838
Number of obs: 600, groups: id, 200
Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.2172     0.1511  -8.054 8.02e-16 ***
drug          0.8174     0.2023   4.041 5.32e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
     (Intr)
drug -0.747
R
General Approach to Correlated
Multivariate Random Numbers
• “Copulas”
– allow us to draw correlated random
numbers from different distributions
• Random effects in Mixture Models
– They use CDF probabilities of correlated
variables on the inside to map to
correlated uniform random numbers on
the margins
• Those correlated uniform RVs may be used
to marry vastly different distributions.
• Maintain Marginal Distributions
Generating Multivariate Random
Numbers
From SAS documentation, a Gaussian Copula
• Independent Normal (N(0,1) ) random variables are
generated
• These variables are transformed to a correlated set of
z-scores by using the Cholesky Decomposition of the
covariance matrix.
• These correlated normal RVs are transformed to a
uniform by using $\Phi(z)$.
• $F^{-1}()$ is used to compute the final sample value
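The four steps above can be sketched for two variables using only the Python standard library. Two simplifications are assumptions made for this sketch: the 2×2 Cholesky factor is written out by hand, and the margins are exponential (closed-form inverse CDF) rather than the gamma margins used in the SAS example, because the standard library has no inverse gamma CDF. All names, rates and the seed are illustrative.

```python
import math
import random
from statistics import NormalDist

def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))

def gaussian_copula_pairs(rho, n, rng):
    """Correlated uniforms: independent normals -> Cholesky mix -> normal CDF."""
    phi = NormalDist().cdf
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Second row of the 2x2 Cholesky factor of [[1, rho], [rho, 1]].
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        pairs.append((phi(z1), phi(z2)))
    return pairs

def exponential_quantile(u, rate):
    """Inverse CDF of Exponential(rate); u is capped just below 1 to keep the log finite."""
    u = min(u, 1.0 - 2.0 ** -53)
    return -math.log(1.0 - u) / rate

rng = random.Random(14)            # arbitrary seed
uu = gaussian_copula_pairs(0.4, 20000, rng)
x = [exponential_quantile(u1, 0.04) for u1, _ in uu]   # margin 1: mean 25
y = [exponential_quantile(u2, 1.0) for _, u2 in uu]    # margin 2: mean 1

corr_u = pearson([p[0] for p in uu], [p[1] for p in uu])
corr_xy = pearson(x, y)
mean_x = sum(x) / len(x)
print(corr_u, corr_xy, mean_x)
```

The correlation of the uniforms, and of the final variables, comes out somewhat below the 0.4 specified for the normals; the same shrinkage is visible in the SAS and MATLAB output in this section.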
Generating Multivariate Random
Numbers
proc iml; /* begin IML session */
rmat={1 .3 .2 .1, .3 1 .3 .2, .2 .3 1 .3 , .1 .2 .3 1};
sigvec={1 1 1 1};
cvmat=rmat#(sigvec`*sigvec);
/* # is element-wise multiplication */
upr=half(cvmat);
print rmat;
print sigvec;
print cvmat;
print upr;
r1 = j(1000,4,.);
r2 = j(1000,4,.);
call randgen(r1,'uniform');
call randgen(r2,'uniform');
pi= 4*atan(1);
print pi;
z1=sqrt(-2*log(r1))#cos(2*pi*r2);
/* Note – I could have gotten another z here */
z1=z1*upr;
z1=cdf('Normal',z1);
z1=gaminv(z1,3.0);
/* Standardized gamma parameter, also the
mean */
varnames={"x1","x2","x3","x4"};
create nrand from z1 [colname=varnames];
append from z1;
quit;
proc corr data=work.nrand pearson;
var x1 x2 x3 x4;
run;
SAS
rmat
  1     0.3   0.2   0.1
  0.3   1     0.3   0.2
  0.2   0.3   1     0.3
  0.1   0.2   0.3   1
sigvec
  1   1   1   1
cvmat
  1     0.3   0.2   0.1
  0.3   1     0.3   0.2
  0.2   0.3   1     0.3
  0.1   0.2   0.3   1
SAS
The CORR Procedure
4 Variables: x1 x2 x3 x4
Simple Statistics
Variable      N      Mean   Std Dev    Sum   Minimum    Maximum
x1         1000   2.96320   1.73566   2963   0.11528   12.19072
x2         1000   3.01249   1.68236   3012   0.14039   10.20117
x3         1000   3.00336   1.68803   3003   0.34496   13.72023
x4         1000   3.08106   1.79858   3081   0.11148   13.25409
Pearson Correlation Coefficients, N = 1000
Prob > |r| under H0: Rho=0
           x1        x2        x3        x4
x1    1.00000   0.25874   0.19005   0.10052
                 <.0001    <.0001    0.0015
x2    0.25874   1.00000   0.22622   0.13944
       <.0001              <.0001    <.0001
x3    0.19005   0.22622   1.00000   0.32082
       <.0001    <.0001              <.0001
x4    0.10052   0.13944   0.32082   1.00000
       0.0015    <.0001    <.0001
SAS
Generating Multivariate Random
Numbers
cmat<-rbind(c(1, .4, .4, .4), c(.4, 1, .4, .4), c(.4, .4, 1, .4) , c(.4, .4, .4, 1))
rr=chol(cmat)
r1=matrix(runif(1000), 250,4)
r2=matrix(runif(1000), 250,4)
z1=rbind(sqrt(-2*log(r1))*cos(2*pi*r2),sqrt(-2*log(r2))*cos(2*pi*r1))
rvs=z1%*%rr
cd=pnorm(rvs,mean=0,sd=1)
g<-qinvgamma(cd,2,3)
corr(cd)
[1] 0.4150358
corr(rvs)
[1] 0.4188932
corr(g)
[1] 0.2337756
R
>> U = copularnd('Gaussian',.4,10)
U =
    0.8017    0.9388
    0.3650    0.2250
    0.8104    0.6253
    0.3467    0.0988
    0.6067    0.6561
    0.4743    0.6723
    0.6273    0.7427
    0.9905    0.8249
    0.4427    0.6925
    0.3443    0.2711
>> U = copularnd('Gaussian',.4,10000);
>> corr(U)
ans =
    1.0000    0.3765
    0.3765    1.0000
>> X = norminv(U,0,1);
>> corr(X)
ans =
    1.0000    0.3901
    0.3901    1.0000
>> Xg = gaminv(U,2,3);
>> corr(Xg)
ans =
    1.0000    0.3721
    0.3721    1.0000
>>
Matlab
(a little more clear)
Old Slides
Grabbing a Seed from the
System Clock (Stata)
program define seedset
local ct =c(current_time)
local s1=substr("`ct'",7,2)
local s2=substr("`ct'",4,2)
local s3=substr("`ct'",2,1)
global newseed=real("`s1'" +"`s2'" +"`s3'")
di $newseed
set seed $newseed
end
LCG is default for Stata
. set obs 100
obs was 0, now 100
. gen x0=ceil(uniform()*100)
. gen m=ceil(uniform()*10)
. gen x1=mod(x0,m)
. list in 1/10

     +--------------+
     | x0    m   x1 |
     |--------------|
  1. | 70    2    0 |
  2. | 62    7    6 |
  3. | 92    7    1 |
  4. | 53    1    0 |
  5. | 37    3    1 |
     |--------------|
  6. | 78    1    0 |
  7. | 47    2    1 |
  8. | 91    2    1 |
  9. | 98    1    0 |
 10. | 71    9    8 |
     +--------------+
Testing Randomness (Stata)
• Correlogram of X_i versus X_{i+k} for serial correlation
. gen tv=_n
. tsset tv
        time variable:  tv, 1 to 20000
                delta:  1 unit
. corrgram x, lags(40)

                                          -1       0       1 -1       0       1
 LAG       AC       PAC        Q   Prob>Q  [Autocorrelation]  [Partial Autocor]
-------------------------------------------------------------------------------
 1       0.0026   0.0026   .13551  0.7128          |                  |
 2      -0.0011  -0.0011   .15995  0.9231          |                  |
 3      -0.0004  -0.0004   .16301  0.9833          |                  |
 4      -0.0131  -0.0131   3.5987  0.4630          |                  |
 5      -0.0008  -0.0007   3.6115  0.6066          |                  |
 6       0.0119   0.0118   6.4238  0.3774          |                  |
 7      -0.0060  -0.0061   7.1533  0.4131          |                  |
 8       0.0004   0.0003   7.1571  0.5198          |                  |
 9       0.0057   0.0057    7.815  0.5529          |                  |
 10     -0.0049  -0.0046   8.2893  0.6006          |                  |
 11     -0.0097  -0.0099    10.19  0.5134          |                  |
 12      0.0044   0.0043   10.581  0.5651          |                  |
 13      0.0087   0.0090   12.102  0.5193          |                  |
 14      0.0081   0.0079   13.424  0.4935          |                  |
 15      0.0068   0.0064   14.357  0.4986          |                  |
 16      0.0068   0.0071   15.285  0.5039          |                  |
 17     -0.0029  -0.0025   15.449  0.5632          |                  |
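To make the AC and Q columns concrete, here is a minimal Python sketch of the lag-k sample autocorrelation and a portmanteau Q statistic in the Ljung-Box form. It is an illustration under assumptions, not Stata's implementation: the function names and the seed are hypothetical, and the p-value column is left out because it needs a chi-squared CDF.

```python
import random


def autocorr(x, k):
    """Lag-k sample autocorrelation of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[i] - mean) * (x[i + k] - mean) for i in range(n - k))
    return num / denom


def ljung_box_q(x, max_lag):
    """Q = n (n + 2) * sum over k of r_k^2 / (n - k), for k = 1..max_lag."""
    n = len(x)
    total = 0.0
    for k in range(1, max_lag + 1):
        r = autocorr(x, k)
        total += r * r / (n - k)
    return n * (n + 2) * total


if __name__ == "__main__":
    rng = random.Random(2010)  # arbitrary seed, chosen for reproducibility
    x = [rng.random() for _ in range(20000)]
    for lag in range(1, 6):
        print(lag, round(autocorr(x, lag), 4), round(ljung_box_q(x, lag), 4))
```

For a good generator the autocorrelations should sit near zero at every lag, roughly within a couple of multiples of 1/sqrt(n), and Q should stay unremarkable relative to a chi-squared distribution with max_lag degrees of freedom.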
(Stata)
. seedset
23491
. set obs 2000
obs was 0, now 2000
. gen P=uniform()
. gen enum=-ln(P)/.04
. ci

    Variable |    Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+---------------------------------------------------------
           P |   2000    .5030619    .0065298      .4902559    .5158678
        enum |   2000    24.79822     .553316      23.71309    25.88336
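The enum variable above is an inverse-transform draw from an exponential distribution with rate 0.04, so its mean should be near 1/0.04 = 25. A minimal Python sketch of the same step follows; the function names and seed are hypothetical, and 1 - random() is used only so the uniform can never be exactly zero.

```python
import math
import random


def exponential_from_uniform(u, rate):
    """Inverse-transform step: solve exp(-rate * t) = u for t."""
    return -math.log(u) / rate


def draw_exponential(n, rate, seed=23491):
    """Draw n exponential values from uniforms on (0, 1]."""
    rng = random.Random(seed)
    return [exponential_from_uniform(1.0 - rng.random(), rate) for _ in range(n)]


if __name__ == "__main__":
    sample = draw_exponential(2000, 0.04)
    print("mean =", round(sum(sample) / len(sample), 3))
```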
. set obs 200
obs was 0, now 200
(Stata)
. gen P=uniform()
. gen tte=(-ln(P)/0.1)^1.5
. gen fail=1
. replace fail=0 if tte>200
. replace tte=200 if tte>200
. stset tte, fail(fail)

     failure event:  fail != 0 & fail < .
obs. time interval:  (0, tte]
 exit on or before:  failure

------------------------------------------------------------------
      200  total obs.
        0  exclusions
------------------------------------------------------------------
      200  obs. remaining, representing
      188  failures in single record/single failure data
 9019.163  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =       200
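The Weibull draw and the censoring at t = 200 can be sketched in Python as follows. This is an illustration under assumptions rather than a translation of the Stata session: the function names and seed are hypothetical, while the rate 0.1, the power 1.5 and the cutoff 200 are taken from the commands above.

```python
import math
import random


def weibull_time(u, rate=0.1, power=1.5):
    """Solve exp(-rate * t**(1/power)) = u for t, as in (-ln(P)/0.1)^1.5."""
    return (-math.log(u) / rate) ** power


def simulate_censored(n, cutoff=200.0, seed=2010):
    """Return (time, fail) pairs with administrative censoring at cutoff."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        t = weibull_time(1.0 - rng.random())  # uniform on (0, 1]
        if t > cutoff:
            rows.append((cutoff, 0))  # censored: still event-free at cutoff
        else:
            rows.append((t, 1))       # failure observed at time t
    return rows


if __name__ == "__main__":
    rows = simulate_censored(200)
    print("failures:", sum(fail for _, fail in rows))
    print("time at risk:", round(sum(t for t, _ in rows), 3))
```

With these parameters the chance of surviving past 200 is exp(-0.1 * 200^(2/3)), about 3%, so most simulated subjects fail before the cutoff.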
(Stata)
[Figure: Kaplan-Meier survival estimate. x-axis: analysis time, 0 to 200; y-axis: 0.00 to 1.00]
(Stata)
. streg, d(w) nohr

         failure _d:  fail
   analysis time _t:  tte

Weibull regression -- log relative-hazard form

No. of subjects =          200                Number of obs   =       200
No. of failures =          188
Time at risk    =  9019.163067
                                              LR chi2(0)      =      0.00
Log likelihood  =   -403.39593                Prob > chi2     =         .

-------------------------------------------------------------------
          _t |   Coef.      SE        z     P>|z|        [95% CI]
-------------+-----------------------------------------------------
       _cons |  -2.245    0.167   -13.47    0.000    -2.572   -1.918
-------------+-----------------------------------------------------
       delta |   0.625    0.036                       0.558    0.701
-------------------------------------------------------------------
. gen P=uniform()
. gen tte=(-ln(P)/(exp(log(0.1)+log(0.5)*drug)))^1.5
(Stata)
. gen fail=1
. replace fail=0 if tte>200
(39 real changes made)
. replace tte=200 if tte>200
(39 real changes made)
. stset tte, fail(fail)

     failure event:  fail != 0 & fail < .
obs. time interval:  (0, tte]
 exit on or before:  failure

------------------------------------------------------------------
      400  total obs.
        0  exclusions
------------------------------------------------------------------
      400  obs. remaining, representing
      361  failures in single record/single failure data
 25170.04  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =       200
(Stata)
. list drug P tte fail in 1/15

     +-----------------------------------+
     | drug          P        tte   fail |
     |-----------------------------------|
  1. |    1    .842721   6.331312      1 |
  2. |    0   .3839878    29.6119      1 |
  3. |    1   .3483792    96.8484      1 |
  4. |    0   .6035132   11.34804      1 |
  5. |    1   .8460417   6.114305      1 |
     |-----------------------------------|
  6. |    1   .4935982   53.06192      1 |
  7. |    1   .5173908   47.84433      1 |
  8. |    1    .385052   83.39208      1 |
  9. |    0   .8726683   1.589515      1 |
 10. |    0   .0356283   192.5611      1 |
     |-----------------------------------|
 11. |    0   .8018837   3.280757      1 |
 12. |    0   .6059877   11.21039      1 |
 13. |    1   .7919235   10.07838      1 |
 14. |    0   .1920578   67.02081      1 |
 15. |    0   .0819428   125.1301      1 |
     +-----------------------------------+
(Stata)
[Figure: Kaplan-Meier survival estimates, by group (drug = 0, drug = 1). x-axis: analysis time, 0 to 200; y-axis: 0.00 to 1.00]
(Stata)
. streg drug, d(w) nohr

         failure _d:  fail
   analysis time _t:  tte

Weibull regression -- log relative-hazard form

No. of subjects =          400                Number of obs   =       400
No. of failures =          361
Time at risk    =  25170.03819
                                              LR chi2(1)      =     53.70
Log likelihood  =   -757.69677                Prob > chi2     =    0.0000

-------------------------------------------------------------------
          _t |   Coef.      SE        z     P>|z|        [95% CI]
-------------+-----------------------------------------------------
        drug |  -0.788    0.107    -7.34    0.000    -0.998   -0.577
       _cons |  -2.458    0.143   -17.19    0.000    -2.738   -2.177
-------------+-----------------------------------------------------
       delta |   0.706    0.031                       0.648    0.768
-------------------------------------------------------------------
(Stata)
Generating Multivariate Normal
Random Numbers
In Stata, gennorm (webseek to download):
Typing
. gennorm a b c, corr(.2 .3 .4)
creates a, b, and c with values drawn from a N(0,S) distribution, where

        +-          -+
        |  1         |
    S = | .2   1     |
        | .3  .4   1 |
        +-          -+

That is, corr(a,b)=.2, corr(a,c)=.3, and corr(b,c)=.4
(Stata)
Generating Multivariate Normal
Random Numbers
In Stata:
Example
-------
. set obs 10000
obs was 0, now 10000
. set seed 6819
. gennorm a b c, corr(.2 .3 .4)
. summarize a b c

    Variable |    Obs        Mean   Std. Dev.         Min        Max
-------------+------------------------------------------------------
           a |  10000   -.0105333    1.005723   -3.694448   3.775433
           b |  10000   -.0042212    1.000254   -3.695302   3.648826
           c |  10000   -.0069625    .9989002   -3.996779   3.606923

. corr a b c
(obs=10000)

             |        a        b        c
-------------+---------------------------
           a |   1.0000
           b |   0.2137   1.0000
           c |   0.3035   0.3952   1.0000
(Stata)
Generating Multivariate Normal
Random Numbers
In Stata, drawnorm:
. clear
. matrix C=(1, 0.2, 0.3 \ 0.2, 1, 0.4 \ 0.3, 0.4, 1)
. drawnorm a b c, n(10000) corr(C)
(obs 10000)
. summarize a b c

    Variable |    Obs        Mean   Std. Dev.         Min        Max
-------------+------------------------------------------------------
           a |  10000   -.0176275    .9920181   -3.701594     3.7838
           b |  10000    .0009005    1.003002   -3.709259   3.518793
           c |  10000   -.0149926    .9925292   -3.716346   4.009713

. corr a b c
(obs=10000)

             |        a        b        c
-------------+---------------------------
           a |   1.0000
           b |   0.1937   1.0000
           c |   0.3051   0.4056   1.0000
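What gennorm and drawnorm do can be sketched with a Cholesky factor: if C = L L' and z holds independent N(0,1) draws, then x = L z has correlation matrix C. The Python block below is a minimal stdlib-only illustration under that assumption. It is not the code behind either Stata command, and the function names and seed are hypothetical.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L * L' = a, for a positive definite matrix a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def draw_mvn(corr, n, seed=6819):
    """Draw n vectors from N(0, corr) as x = L z with z independent N(0,1)."""
    rng = random.Random(seed)
    low = cholesky(corr)
    dim = len(corr)
    draws = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        draws.append([sum(low[i][k] * z[k] for k in range(i + 1))
                      for i in range(dim)])
    return draws


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    C = [[1.0, 0.2, 0.3], [0.2, 1.0, 0.4], [0.3, 0.4, 1.0]]
    draws = draw_mvn(C, 10000)
    cols = list(zip(*draws))
    print("corr(a,b) =", round(pearson(cols[0], cols[1]), 4))
    print("corr(a,c) =", round(pearson(cols[0], cols[2]), 4))
    print("corr(b,c) =", round(pearson(cols[1], cols[2]), 4))
```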
Simulating Weibull Regression Data, with
Time-Dependency in Drug Effect
Survival Time: $S(t) = P = \exp(-\lambda t^{\gamma})$, $\lambda = \exp(\beta x)$
$\beta x = -2.30 + 0.3 \cdot \text{drug} - 0.004 \cdot \text{drug} \cdot t$
Inverse Prob Transform: ?????????
How do you solve for t? (Not all answers are in the book.)
Remember Newton's Method?
Iterates: $t_0, t_1, \ldots$
$f(t) = \exp\left(-\exp(-2.30 + 0.3 \cdot \text{drug} - 0.004 \cdot \text{drug} \cdot t)\, t^{\gamma}\right) - P$
$f(t^{*}) = 0$
$f'(t) \approx \dfrac{f(t + d) - f(t)}{d}$
$t_1 = t_0 - \dfrac{f(t_0)}{f'(t_0)}$
clear
set obs 400
gen drug=_n>200
gen double P=uniform()
gen double t=1
gen double tpd=t+.0001
gen double f=exp(-exp(-2.30+0.3*drug-0.004*drug*t)*(t^0.67))-P
gen double fp=exp(-exp(-2.30+0.3*drug-0.004*drug*tpd)*(tpd^0.67))-P
gen double slope=(fp-f)/0.0001
forvalues i=1/50 {
qui replace f=exp(-exp(-2.30+0.3*drug-0.004*drug*t)*(t^0.67))-P
qui replace fp=exp(-exp(-2.30+0.3*drug-0.004*drug*tpd)*(tpd^0.67))-P
qui replace slope=(fp-f)/0.0001
qui replace t=t-f/slope
qui replace tpd=t+.0001
}
(Stata)
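The Newton iteration above can be sketched per observation in Python. This is an illustration under assumptions, not a line-by-line port of the Stata loop. The function names, the relative step d = 1e-6 * t (the Stata code uses a fixed 0.0001), the convergence tolerance, and the safeguard that halves t when a step would go negative are all additions made here. The safeguard matters because a negative t makes t^0.67 undefined. For drug = 1 the exponent is not monotone in t (the -0.004 * drug * t term eventually dominates), so very small P may have no root; the sketch returns None in that case.

```python
import math

GAMMA = 0.67  # Weibull shape used in the Stata code above


def survival(t, drug):
    """S(t) = exp(-exp(-2.30 + 0.3*drug - 0.004*drug*t) * t**GAMMA)."""
    lam = math.exp(-2.30 + 0.3 * drug - 0.004 * drug * t)
    return math.exp(-lam * t ** GAMMA)


def solve_time(p, drug, t0=1.0, max_iter=100, tol=1e-10):
    """Newton iteration on f(t) = S(t) - p with a finite-difference slope."""
    t = t0
    for _ in range(max_iter):
        f = survival(t, drug) - p
        if abs(f) < tol:
            return t
        d = 1e-6 * t  # relative step (assumption; the slide uses 0.0001)
        slope = (survival(t + d, drug) - survival(t, drug)) / d
        if slope == 0.0:
            return None  # flat spot: no usable Newton step
        t_new = t - f / slope
        if not math.isfinite(t_new):
            return None
        if t_new <= 0.0:
            t_new = t / 2.0  # safeguard: stay in the domain t > 0
        t = t_new
    return None  # did not converge within max_iter


if __name__ == "__main__":
    for drug in (0, 1):
        for p in (0.9, 0.5, 0.2):
            print(drug, p, solve_time(p, drug))
```

With drug = 0 the time-dependent term vanishes and the root has the closed form t = (-ln(P) / exp(-2.30))^(1/0.67), which gives a quick check on the iteration.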
Matlab
>> drug=[zeros(1000,1);ones(1000,1)];
>> P=rand(2000,1);
>> cdf0=exp(-1.0986+0.6931*drug)./(1+exp(-1.0986+0.6931*drug));
>> outcome=P<=cdf0;
>> b = glmfit(drug,outcome,'binomial')
b=
-1.0616
0.6562
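The same simulation can be sketched in Python: compute the success probability from the logistic model, draw a uniform, and set the outcome to 1 when the uniform falls at or below that probability. The function names and seed are hypothetical, and the log odds ratio is recovered from the 2x2 table rather than from a fitted regression. For a single binary covariate the two agree, but that shortcut is a choice made here to stay within the standard library.

```python
import math
import random


def logistic_prob(drug, intercept=-1.0986, slope=0.6931):
    """P(outcome = 1) under the logistic model used on the slide."""
    eta = intercept + slope * drug
    return math.exp(eta) / (1.0 + math.exp(eta))


def simulate_outcomes(n_per_arm, seed=2010):
    """Return (drug, outcome) pairs, n_per_arm rows for each drug value."""
    rng = random.Random(seed)
    rows = []
    for drug in (0, 1):
        p = logistic_prob(drug)
        for _ in range(n_per_arm):
            rows.append((drug, 1 if rng.random() <= p else 0))
    return rows


def log_odds_ratio(rows):
    """Log odds ratio for drug = 1 versus drug = 0 from the 2x2 table."""
    counts = {(d, y): 0 for d in (0, 1) for y in (0, 1)}
    for drug, outcome in rows:
        counts[(drug, outcome)] += 1
    odds1 = counts[(1, 1)] / counts[(1, 0)]
    odds0 = counts[(0, 1)] / counts[(0, 0)]
    return math.log(odds1 / odds0)


if __name__ == "__main__":
    rows = simulate_outcomes(1000)
    print("log odds ratio =", round(log_odds_ratio(rows), 4))
```

The intercept -1.0986 and slope 0.6931 correspond to probabilities of about 0.25 and 0.40, so the estimated log odds ratio should land near ln(2) = 0.6931.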
(Stata)
. gen P=uniform()
. gen cdf0=exp(-1.0986+0.6931*drug)/(1+exp(-1.0986+0.6931*drug))
. list in 1/10

     +----------------------------+
     | drug          P       cdf0 |
     |----------------------------|
  1. |    0   .2865897   .2500023 |
  2. |    0   .3788754   .2500023 |
  3. |    1   .3597057   .3999916 |
  4. |    1   .7182508   .3999916 |
  5. |    1   .4315197   .3999916 |
     |----------------------------|
  6. |    1   .2963237   .3999916 |
  7. |    1   .7961193   .3999916 |
  8. |    0    .056983   .2500023 |
  9. |    0   .4622037   .2500023 |
 10. |    0   .5336403   .2500023 |
     +----------------------------+
(Stata)
. gen outcome=P<=cdf0
. list in 1/10

     +--------------------------------------+
     | drug          P       cdf0   outcome |
     |--------------------------------------|
  1. |    0   .2865897   .2500023         0 |
  2. |    0   .3788754   .2500023         0 |
  3. |    1   .3597057   .3999916         1 |
  4. |    1   .7182508   .3999916         0 |
  5. |    1   .4315197   .3999916         0 |
     |--------------------------------------|
  6. |    1   .2963237   .3999916         1 |
  7. |    1   .7961193   .3999916         0 |
  8. |    0    .056983   .2500023         1 |
  9. |    0   .4622037   .2500023         0 |
 10. |    0   .5336403   .2500023         0 |
     +--------------------------------------+
(Stata)
. gen outcome=P<=cdf0
. logistic outcome drug

Logistic regression                               Number of obs   =       2000
                                                  LR chi2(1)      =      58.65
                                                  Prob > chi2     =     0.0000
Log likelihood = -1245.3138                       Pseudo R2       =     0.0230

--------------------------------------------------------------------
     outcome |      OR       SE        z     P>|z|        [95% CI]
-------------+------------------------------------------------------
        drug |   2.084    0.202     7.57    0.000     1.723    2.519
--------------------------------------------------------------------
       _cons |  -1.077    0.073   -14.83    0.000    -1.220   -0.935
--------------------------------------------------------------------