A Simple Example of Finding MLE’s Using SAS PROC NLP
With the Nelder-Mead Simplex Method, Followed by Newton’s Method
We want to obtain MLE’s for a simple example in which a relatively small (n = 5) data set has been
sampled from a normal distribution with mean μ and finite variance σ². The data values are: x₁ = 1,
x₂ = 3, x₃ = 4, x₄ = 5, and x₅ = 7.
We will use SAS PROC NLP to obtain simultaneous MLE’s for both parameters using the Nelder-Mead
Simplex algorithm, followed by refinement of the parameter estimates using Newton’s method.
The log-likelihood function to be maximized is:

   l(μ, σ; x) = −(n/2) ln(2π) − n ln(σ) − (1/(2σ²)) Σᵢ₌₁ⁿ (xᵢ − μ)².     (1)
In PROC NLP, we enter this function for a single observation; SAS then recognizes that the data
represent n observations from the distribution.
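Because the normal MLE’s have closed forms (μ̂ = x̄ and σ̂² = (1/n) Σ(xᵢ − μ̂)²), the answer the optimizer should reach can be computed directly. The short Python check below (standard library only; the variable names are my own, not part of the SAS example) gives μ̂ = 4 and σ̂ = 2 for these five values, consistent with the estimates SAS reports later:

```python
import math

data = [1, 3, 4, 5, 7]
n = len(data)

# closed-form MLEs for a normal sample: the sample mean and the
# (biased, divide-by-n) sample standard deviation
mu_hat = sum(data) / n                                        # 4.0
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)  # 2.0

# log-likelihood (1) evaluated at the MLEs
loglik = (-0.5 * n * math.log(2 * math.pi)
          - n * math.log(sigma_hat)
          - sum((x - mu_hat) ** 2 for x in data) / (2 * sigma_hat**2))
```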
The Nelder-Mead method (without modification) does not necessarily converge to a minimum point for
general non-convex functions. However, it does tend to produce “a rapid initial decrease in function
values”¹ regardless of the form of the function.
In this example, we are finding the extreme point of a smooth, unimodal function; hence the Nelder-Mead
method should work well. The SAS program for implementing the first algorithm is shown below, followed
by an explanation of the options and statements in the PROC, and the output.
data Ex1;
   input x @@;
   datalines;
1 3 4 5 7
;
proc nlp data=Ex1 tech=nmsimp vardef=df covariance=h pcov phes;
   max loglik;
   parms mean=0., sigma=1.;
   bounds sigma > 1e-12;
   loglik = -0.5*((x-mean)/sigma)**2 - log(sigma);
run;
Options in the PROC NLP statement:
a) “data=” tells the procedure where to find the data set.
b) “tech” specifies the method for numeric optimization; in this case “nmsimp” tells SAS to use the
Nelder-Mead simplex method. “newrap” would specify the use of the Newton-Raphson method.
If no method is specified, the default method is Newton-Raphson.
c) “vardef” specifies the divisor to be used in the calculation of the covariance matrix and standard
errors. The default value is N, the sample size. Here, “df” specifies that the degrees of freedom
should be used as the divisor. (See p. 371 of the manual.)
d) “covariance” specifies that an approximate covariance matrix for the parameter estimates is to be
computed. There are several ways to do this (see p. 370 of the manual). The option “h” used
here specifies that, since the “max” statement is used in the procedure, the covariance matrix
should be computed as (n / max{1, n − df}) G⁻¹, where G is the Hessian matrix (the matrix of
second partial derivatives of the objective function (1) with respect to the parameters).
e) “pcov” tells SAS to display the covariance matrix.
f) “phes” tells SAS to display the Hessian matrix.
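To see where the reported standard errors come from, here is an illustrative Python sketch (standard library only; the variable names are mine) that forms the analytic Hessian G of the summed objective at the MLEs (μ̂, σ̂) = (4, 2) and inverts its negative. With the scale factor shown in the output (“Factor sigm = 1”), this reproduces the covariance diagonal 0.8 and 0.4 and standard errors of about 0.8944 and 0.6325, matching the “Approx Std Err” column of the output below:

```python
data = [1, 3, 4, 5, 7]
n = len(data)
mu, sigma = 4.0, 2.0                     # MLEs of the normal model for these data

# analytic Hessian G of the summed objective at (mu, sigma)
s1 = sum(x - mu for x in data)           # = 0 at the MLE
s2 = sum((x - mu) ** 2 for x in data)    # = 20
h_mm = -n / sigma**2                     # d2l/dmu2        = -1.25
h_ms = -2 * s1 / sigma**3                # d2l/dmu dsigma  = 0
h_ss = -3 * s2 / sigma**4 + n / sigma**2 # d2l/dsigma2     = -2.5

# covariance = inv(-G), times the factor 1 shown in the SAS output
det = h_mm * h_ss - h_ms**2
cov = [[-h_ss / det,  h_ms / det],
       [ h_ms / det, -h_mm / det]]

se_mu = cov[0][0] ** 0.5                 # approx std err of mean,  ~0.894427
se_sigma = cov[1][1] ** 0.5              # approx std err of sigma, ~0.632456
```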
Other statements:
a) The “max” statement specifies that the maximum point for the objective function (1) is to be
found. The “loglik” variable denotes the objective function.
b) The “parms” statement specifies initial values of the parameters, to be used in this case to
generate the initial simplex for starting the Nelder-Mead algorithm. (See p. 374 of the manual.)
c) The “bounds” statement specifies constraints on the values of the parameters. Here, the only
constraint is that σ must be positive. (See p. 325 of the manual.)
d) The objective function to be maximized is defined by a statement in the procedure. Here, we
define loglik(μ, σ) = −(1/2)·((x − μ)/σ)² − ln(σ).
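As a cross-check outside SAS, the same maximization can be sketched with a bare-bones Nelder-Mead implementation in Python (a textbook version with the standard reflect/expand/contract/shrink steps, not SAS’s NMSIMP internals; all names are my own). It minimizes the negative of the summed per-observation objective, starting from the same point (0, 1) as the PARMS statement:

```python
import math

data = [1, 3, 4, 5, 7]

def negloglik(p):
    """Negative of the summed per-observation objective used in the PROC."""
    mu, sigma = p
    if sigma <= 0:
        return float("inf")   # enforce the bound sigma > 0
    return sum(0.5 * ((x - mu) / sigma) ** 2 + math.log(sigma) for x in data)

def nelder_mead(f, x0, step=1.0, tol=1e-10, max_iter=500):
    """Textbook simplex search: reflect, expand, contract, shrink."""
    n = len(x0)
    simplex = [list(x0)] + [
        [x0[j] + (step if i == j else 0.0) for j in range(n)] for i in range(n)
    ]
    for _ in range(max_iter):
        simplex.sort(key=f)
        if abs(f(simplex[-1]) - f(simplex[0])) < tol:
            break
        best, second, worst = simplex[0], simplex[-2], simplex[-1]
        centroid = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]
        refl = [2 * centroid[i] - worst[i] for i in range(n)]
        if f(refl) < f(best):                       # try expanding further
            expa = [3 * centroid[i] - 2 * worst[i] for i in range(n)]
            simplex[-1] = expa if f(expa) < f(refl) else refl
        elif f(refl) < f(second):                   # accept the reflection
            simplex[-1] = refl
        else:                                       # contract toward centroid
            contr = [0.5 * (centroid[i] + worst[i]) for i in range(n)]
            if f(contr) < f(worst):
                simplex[-1] = contr
            else:                                   # shrink toward best vertex
                simplex = [best] + [
                    [0.5 * (v[i] + best[i]) for i in range(n)] for v in simplex[1:]
                ]
    simplex.sort(key=f)
    return simplex[0]

mu_hat, sigma_hat = nelder_mead(negloglik, [0.0, 1.0])  # same start as PARMS
```

Note that this objective omits the additive constant −(n/2) ln(2π) from (1), which is why its maximized value is about −5.9657 rather than the full log-likelihood value; the location of the maximum is unaffected.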
The output of the program is listed below:

                        The SAS System
               PROC NLP: Nonlinear Maximization

Gradient is computed using analytic formulas.
Hessian is computed using finite difference approximations based on analytic gradient.

                     Optimization Start
                     Parameter Estimates
                                 Gradient      Lower         Upper
                                 Objective     Bound         Bound
 N  Parameter    Estimate        Function      Constraint    Constraint
 1  mean         0               20.000000     .             .
 2  sigma        1.000000        95.000000     1E-12         .

Value of Objective Function = -50

                  Hessian Matrix
                  mean            sigma
 mean             -5              -40.00000001
 sigma            -40.00000001    -295.0000001

Determinant = -124.9999997
Matrix has 1 Positive Eigenvalue(s)
WARNING: Second-order optimality condition violated.
              Nelder-Mead Simplex Optimization

 Parameter Estimates            2
 Functions (Observations)       5
 Lower Bounds                   1
 Upper Bounds                   0

                     Optimization Start
 Active Constraints       0     Objective Function       -50

 Iter  Restarts  Function  Active       Objective  Objective  Std Dev of      Restart Vertex  Simplex
                 Calls     Constraints  Function   Function   Simplex Values  Length          Size
                                                   Change
 1     0         11        0            -6.40098   1.7805     0.7758          1.000           2.750
 2     0         20        0            -6.08749   0.3135     0.1291          1.000           1.438
 3     0         29        0            -5.97918   0.0496     0.0218          1.000           0.262
 4     0         38        0            -5.96671   0.00693    0.00312         1.000           0.101
 5     0         48        0            -5.96581   0.000095   0.000039        1.000           0.0213
 6     0         58        0            -5.96574   1.57E-6    6.954E-7        1.000           0.00459

                     Optimization Results
 Iterations              6              Function Calls               60
 Restarts                0              Active Constraints           0
 Objective Function      -5.965738194   Std Dev of Simplex Values    6.9541052E-7
 Deltax                  1              Size                         0.0045923843

FCONV2 convergence criterion satisfied.
NOTE: At least one element of the (projected) gradient is greater than 1e-3.
                     Optimization Results
                     Parameter Estimates
                                                                  Gradient
                               Approx                   Approx    Objective
 N  Parameter    Estimate      Std Err      t Value     Pr > |t|  Function
 1  mean         3.998087      0.894406     4.470100    0.006579  0.002392
 2  sigma        1.999952      0.632418     3.162391    0.025028  0.000123

Value of Objective Function = -5.965738194

                  Hessian Matrix
                  mean            sigma
 mean             -1.250060277    -0.002392044
 sigma            -0.002392044    -2.500304826

Determinant = 3.1255260207
Matrix has Only Negative Eigenvalues

        Covariance Matrix 2: H = (NOBS/d) inv(G)
                  mean            sigma
 mean             0.7999628893    -0.000765325
 sigma            -0.000765325    0.399951966

Factor sigm = 1
Determinant = 0.3199461445
Matrix has 2 Positive Eigenvalue(s)

        Approximate Correlation Matrix of Parameter Estimates
                  mean            sigma
 mean             1               -0.001353029
 sigma            -0.001353029    1

Determinant = 0.9999981693
Matrix has 2 Positive Eigenvalue(s)
The final parameter estimates obtained using the Nelder-Mead method were μ̂ = 3.998087 and
σ̂ = 1.999952. We will use these as our initial guesses of the parameters, and refine these values by
using Newton’s method.
The program for implementing Newton’s method is listed below, followed by the output. The final
MLE’s for the parameters (after 1 iteration) were found to be μ̂ = 4.000000 and σ̂ = 1.999999.
data Ex1;
   input x @@;
   datalines;
1 3 4 5 7
;
proc nlp data=Ex1 tech=newrap vardef=n covariance=h pcov phes;
   max loglik;
   parms mean=3.998087, sigma=1.999952;
   bounds sigma > 1e-12;
   loglik = -0.5*((x-mean)/sigma)**2 - log(sigma);
run;
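What the Newton-Raphson refinement does here can be sketched in a few lines of Python (standard library only; the names are my own). This is a plain Newton update p ← p − H⁻¹g with the analytic gradient and Hessian of the summed objective, without PROC NLP’s line search or ridging. Starting from the Nelder-Mead estimates, the first step already lands at roughly (4.000000, 1.999999), matching the output:

```python
data = [1, 3, 4, 5, 7]
n = len(data)

def newton_step(mu, sigma):
    """One Newton update p <- p - inv(H) g for the summed objective."""
    s1 = sum(x - mu for x in data)
    s2 = sum((x - mu) ** 2 for x in data)
    # analytic gradient
    g_mu = s1 / sigma**2
    g_sg = s2 / sigma**3 - n / sigma
    # analytic Hessian entries
    h_mm = -n / sigma**2
    h_ms = -2 * s1 / sigma**3
    h_ss = -3 * s2 / sigma**4 + n / sigma**2
    # d = inv(H) g via the 2x2 adjugate, then move against it
    det = h_mm * h_ss - h_ms**2
    d_mu = (h_ss * g_mu - h_ms * g_sg) / det
    d_sg = (h_mm * g_sg - h_ms * g_mu) / det
    return mu - d_mu, sigma - d_sg

# start from the Nelder-Mead estimates, as the PROC above does
mu, sigma = 3.998087, 1.999952
for _ in range(2):
    mu, sigma = newton_step(mu, sigma)
```

Because the Hessian is negative definite near the maximum, the step −H⁻¹g points uphill, and Newton’s quadratic convergence makes one or two steps sufficient from so close a start.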
                        The SAS System
               PROC NLP: Nonlinear Maximization

Gradient is computed using analytic formulas.
Hessian is computed using analytic formulas.

                     Optimization Start
                     Parameter Estimates
                                 Gradient      Lower         Upper
                                 Objective     Bound         Bound
 N  Parameter    Estimate        Function      Constraint    Constraint
 1  mean         3.998087        0.002391      .             .
 2  sigma        1.999952        0.000122      1E-12         .

Value of Objective Function = -5.965738193

                  Hessian Matrix
                  mean            sigma
 mean             -1.250060002    -0.002391422
 sigma            -0.002391422    -2.500303451

Determinant = 3.125523618
Matrix has Only Negative Eigenvalues
         Newton-Raphson Optimization with Line Search
                Without Parameter Scaling

 Parameter Estimates            2
 Functions (Observations)       5
 Lower Bounds                   1
 Upper Bounds                   0

                     Optimization Start
 Active Constraints        0              Objective Function    -5.965738193
 Max Abs Gradient Element  0.0023913648

 Iter  Restarts  Function  Active       Objective  Objective  Max Abs    Step    Slope of
                 Calls     Constraints  Function   Function   Gradient   Size    Search
                                                   Change     Element            Direction
 1     0         2         0            -5.96574   2.29E-6    2.294E-6   1.000   -458E-8

                     Optimization Results
 Iterations                 1              Function Calls             3
 Hessian Calls              2              Active Constraints         0
 Objective Function         -5.965735903   Max Abs Gradient Element   2.2942694E-6
 Slope of Search Direction  -4.580223E-6   Ridge                      0

ABSGCONV convergence criterion satisfied.
                     Optimization Results
                     Parameter Estimates
                                                                  Gradient
                               Approx                   Approx    Objective
 N  Parameter    Estimate      Std Err      t Value     Pr > |t|  Function
 1  mean         4.000000      0.894427     4.472138    0.006566  0.000000113
 2  sigma        1.999999      0.632455     3.162280    0.025031  0.000002294

Value of Objective Function = -5.965735903

                  Hessian Matrix
                  mean            sigma
 mean             -1.250001147    -1.125884E-7
 sigma             -1.125884E-7   -2.500005736

Determinant = 3.1250100374
Matrix has Only Negative Eigenvalues

        Covariance Matrix 2: H = (NOBS/d) inv(G)
                  mean            sigma
 mean             0.7999992658    -3.602817E-8
 sigma            -3.602817E-8    0.3999990823

Factor sigm = 1
Determinant = 0.3199989722
Matrix has 2 Positive Eigenvalue(s)

        Approximate Correlation Matrix of Parameter Estimates
                  mean            sigma
 mean             1               -6.368951E-8
 sigma            -6.368951E-8    1

Determinant = 1
Matrix has 2 Positive Eigenvalue(s)
¹ Lagarias, J. C.; Reeds, J. A.; Wright, M. H.; and Wright, P. E. (1998). “Convergence Properties of the Nelder-Mead Simplex Method in Low Dimensions,” SIAM Journal on Optimization, 9(1), pp. 112-147.