Robert Fullér | Péter Majlender

On obtaining minimal variability OWA operator weights

TUCS Technical Report No 429, November 2001

Robert Fullér
Department of Operations Research, Eötvös Loránd University, Pázmány Péter sétány 1C, P. O. Box 120, H-1518 Budapest, Hungary, e-mail: rfuller@cs.elte.hu
and
Institute for Advanced Management Systems Research, Åbo Akademi University, Lemminkäinengatan 14B, Åbo, Finland, e-mail: rfuller@mail.abo.fi

Péter Majlender
Turku Centre for Computer Science and Institute for Advanced Management Systems Research, Åbo Akademi University, Lemminkäinengatan 14B, Åbo, Finland, e-mail: peter.majlender@mail.abo.fi

Abstract

One important issue in the theory of Ordered Weighted Averaging (OWA) operators is the determination of the associated weights. One of the first approaches, suggested by O'Hagan, determines a special class of OWA operators having maximal entropy of the OWA weights for a given level of orness; algorithmically it is based on the solution of a constrained optimization problem. Another consideration that may be of interest to a decision maker involves the variability associated with a weighting vector. In particular, a decision maker may desire low variability associated with a chosen weighting vector. In this paper, using the Kuhn-Tucker second-order sufficiency conditions for optimality, we shall analytically derive the minimal variability weighting vector for any level of orness.

Keywords: Multiple criteria analysis, Fuzzy sets, OWA operator, Lagrange multiplier

1 Introduction

The process of information aggregation appears in many applications related to the development of intelligent systems. One sees aggregation in neural networks, fuzzy logic controllers, vision systems, expert systems and multi-criteria decision aids. In [6] Yager introduced a new aggregation technique based on the ordered weighted averaging operators.

Definition 1.1. An OWA operator of dimension $n$ is a mapping $F \colon \mathbb{R}^n \to \mathbb{R}$ that has an associated weighting vector $W = (w_1, \ldots, w_n)^T$ having the properties
\[
w_1 + \cdots + w_n = 1, \qquad 0 \le w_i \le 1, \quad i = 1, \ldots, n,
\]
and such that
\[
F(a_1, \ldots, a_n) = \sum_{i=1}^n w_i b_i,
\]
where $b_j$ is the $j$th largest element of the collection of the aggregated objects $\{a_1, \ldots, a_n\}$.

A fundamental aspect of this operator is the re-ordering step; in particular, an aggregate $a_i$ is not associated with a particular weight $w_i$, but rather a weight is associated with a particular ordered position of the aggregates. When we view the OWA weights as a column vector we shall find it convenient to refer to the weights with the low indices as weights at the top and those with the higher indices as weights at the bottom. It is noted that different OWA operators are distinguished by their weighting function.

In [6], Yager introduced a measure of orness associated with the weighting vector $W$ of an OWA operator, defined as
\[
\mathrm{orness}(W) = \sum_{i=1}^n \frac{n-i}{n-1}\, w_i,
\]
which characterizes the degree to which the aggregation is like an or operation. It is clear that $\mathrm{orness}(W) \in [0,1]$ holds for any weighting vector.

It is clear that the actual type of aggregation performed by an OWA operator depends upon the form of the weighting vector [9]. A number of approaches have been suggested for obtaining the associated weights, e.g., quantifier guided aggregation [6, 7], exponential smoothing [3] and learning [10].
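For illustration, Definition 1.1 and the orness measure translate directly into code. The following Python sketch (the helper names owa and orness are ours) applies an OWA operator to a sample input and evaluates its degree of orness; the two extreme weighting vectors act as max and min:

import numpy as np

def owa(weights, values):
    # Definition 1.1: b_j is the j-th largest of the aggregated values
    b = np.sort(np.asarray(values, dtype=float))[::-1]
    return float(np.dot(np.asarray(weights, dtype=float), b))

def orness(weights):
    # orness(W) = sum_i (n - i)/(n - 1) * w_i
    w = np.asarray(weights, dtype=float)
    n = len(w)
    i = np.arange(1, n + 1)
    return float(np.dot((n - i) / (n - 1), w))

print(owa([1, 0, 0], [3, 1, 2]), orness([1, 0, 0]))              # 3.0 1.0 (pure "or": max)
print(owa([0, 0, 1], [3, 1, 2]), orness([0, 0, 1]))              # 1.0 0.0 (pure "and": min)
print(owa([1/3, 1/3, 1/3], [3, 1, 2]), orness([1/3, 1/3, 1/3]))  # 2.0 0.5 (arithmetic mean)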
Another approach, suggested by O'Hagan [5], determines a special class of OWA operators having maximal entropy of the OWA weights for a given level of orness. This approach is based on the solution of the following mathematical programming problem:
\[
\begin{aligned}
\text{maximize } & \mathrm{disp}(W) = -\sum_{i=1}^n w_i \ln w_i \\
\text{subject to } & \mathrm{orness}(W) = \sum_{i=1}^n \frac{n-i}{n-1}\, w_i = \alpha, \quad 0 \le \alpha \le 1, \\
& w_1 + \cdots + w_n = 1, \quad 0 \le w_i, \quad i = 1, \ldots, n.
\end{aligned}
\tag{1}
\]
In [4] we transformed problem (1) into a polynomial equation, which was then solved to determine the maximal entropy OWA weights.

Another interesting question is to determine the minimal variability weighting vector under a given level of orness [8]. We shall measure the variance of a given weighting vector as
\[
D^2(W) = \frac{1}{n}\sum_{i=1}^n \bigl(w_i - E(W)\bigr)^2
       = \frac{1}{n}\sum_{i=1}^n w_i^2 - \left(\frac{1}{n}\sum_{i=1}^n w_i\right)^2
       = \frac{1}{n}\sum_{i=1}^n w_i^2 - \frac{1}{n^2},
\]
where $E(W) = (w_1 + \cdots + w_n)/n$ stands for the arithmetic mean of the weights. To obtain minimal variability OWA weights under a given level of orness we need to solve the following constrained mathematical programming problem:
\[
\begin{aligned}
\text{minimize } & D^2(W) = \frac{1}{n}\sum_{i=1}^n w_i^2 - \frac{1}{n^2} \\
\text{subject to } & \mathrm{orness}(W) = \sum_{i=1}^n \frac{n-i}{n-1}\, w_i = \alpha, \quad 0 \le \alpha \le 1, \\
& w_1 + \cdots + w_n = 1, \quad 0 \le w_i, \quad i = 1, \ldots, n.
\end{aligned}
\tag{2}
\]
Using the Kuhn-Tucker second-order sufficiency conditions for optimality, we shall solve problem (2) analytically and derive the exact minimal variability OWA weights for any level of orness.

2 Obtaining minimal variability OWA weights

Let us consider the constrained optimization problem (2). First we note that if $n = 2$ then from $\mathrm{orness}(w_1, w_2) = \alpha$ the optimal weights are uniquely defined as $w_1^* = \alpha$ and $w_2^* = 1 - \alpha$. Furthermore, if $\alpha = 0$ or $\alpha = 1$ then the associated weighting vectors are uniquely defined as $(0, 0, \ldots, 0, 1)^T$ and $(1, 0, \ldots, 0, 0)^T$, respectively, with variability
\[
D^2(1, 0, \ldots, 0, 0) = D^2(0, 0, \ldots, 0, 1) = \frac{1}{n} - \frac{1}{n^2}.
\]
Suppose now that $n \ge 3$ and $0 < \alpha < 1$. Let
\[
L(W, \lambda_1, \lambda_2) = \frac{1}{n}\sum_{i=1}^n w_i^2 - \frac{1}{n^2}
 + \lambda_1\left(\sum_{i=1}^n w_i - 1\right)
 + \lambda_2\left(\sum_{i=1}^n \frac{n-i}{n-1}\, w_i - \alpha\right)
\]
denote the Lagrange function of constrained optimization problem (2), where $\lambda_1$ and $\lambda_2$ are real numbers. Then the partial derivatives of $L$ are computed as
\[
\begin{aligned}
\frac{\partial L}{\partial w_j} &= \frac{2 w_j}{n} + \lambda_1 + \frac{n-j}{n-1}\,\lambda_2 = 0, \quad 1 \le j \le n, \\
\frac{\partial L}{\partial \lambda_1} &= \sum_{i=1}^n w_i - 1 = 0, \\
\frac{\partial L}{\partial \lambda_2} &= \sum_{i=1}^n \frac{n-i}{n-1}\, w_i - \alpha = 0.
\end{aligned}
\tag{3}
\]
We shall suppose that the optimal weighting vector has the following form
\[
W = (0, \ldots, 0, w_p, \ldots, w_q, 0, \ldots, 0)^T,
\tag{4}
\]
where $1 \le p < q \le n$, and use the notation $I_{\{p,q\}} = \{p, p+1, \ldots, q-1, q\}$ for the indices from $p$ to $q$. So, $w_j = 0$ if $j \notin I_{\{p,q\}}$ and $w_j \ge 0$ if $j \in I_{\{p,q\}}$.

For $j = p$ we find that
\[
\frac{\partial L}{\partial w_p} = \frac{2 w_p}{n} + \lambda_1 + \frac{n-p}{n-1}\,\lambda_2 = 0,
\]
and for $j = q$ we get
\[
\frac{\partial L}{\partial w_q} = \frac{2 w_q}{n} + \lambda_1 + \frac{n-q}{n-1}\,\lambda_2 = 0.
\]
That is,
\[
\frac{2(w_p - w_q)}{n} + \frac{q-p}{n-1}\,\lambda_2 = 0,
\]
and therefore the optimal values of $\lambda_1$ and $\lambda_2$ (denoted by $\lambda_1^*$ and $\lambda_2^*$) should satisfy the following equations
\[
\lambda_1^* = \frac{2}{n}\left(\frac{n-q}{q-p}\, w_p - \frac{n-p}{q-p}\, w_q\right)
\quad\text{and}\quad
\lambda_2^* = \frac{n-1}{q-p} \cdot \frac{2}{n} \cdot (w_q - w_p).
\tag{5}
\]
Substituting $\lambda_1^*$ for $\lambda_1$ and $\lambda_2^*$ for $\lambda_2$ in (3) we get
\[
\frac{2}{n}\, w_j + \frac{2}{n}\left(\frac{n-q}{q-p}\, w_p - \frac{n-p}{q-p}\, w_q\right)
 + \frac{n-j}{n-1} \cdot \frac{n-1}{q-p} \cdot \frac{2}{n}\,(w_q - w_p) = 0.
\]
That is, the $j$th optimal weight should satisfy the equation
\[
w_j^* = \frac{q-j}{q-p}\, w_p + \frac{j-p}{q-p}\, w_q, \quad j \in I_{\{p,q\}}.
\tag{6}
\]
From representation (4) we get
\[
\sum_{i=p}^q w_i^* = 1,
\]
that is,
\[
\sum_{i=p}^q \left(\frac{q-i}{q-p}\, w_p + \frac{i-p}{q-p}\, w_q\right) = 1,
\quad\text{i.e.}\quad
w_p + w_q = \frac{2}{q-p+1}.
\]
From the constraint $\mathrm{orness}(W) = \alpha$ we find
\[
\sum_{i=p}^q \frac{n-i}{n-1}\, w_i
 = \sum_{i=p}^q \frac{n-i}{n-1} \cdot \frac{q-i}{q-p}\, w_p
 + \sum_{i=p}^q \frac{n-i}{n-1} \cdot \frac{i-p}{q-p}\, w_q = \alpha,
\]
that is,
\[
w_p^* = \frac{2(2q + p - 2) - 6(n-1)(1-\alpha)}{(q-p+1)(q-p+2)}
\tag{7}
\]
and
\[
w_q^* = \frac{2}{q-p+1} - w_p^* = \frac{6(n-1)(1-\alpha) - 2(q + 2p - 4)}{(q-p+1)(q-p+2)}.
\tag{8}
\]
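Formulas (7) and (8), together with the interpolation formula (6), already determine every weight once the support $\{p, \ldots, q\}$ is fixed. As a quick numerical cross-check, the following Python sketch (exact rational arithmetic; the name weights_on_support is ours) evaluates them and verifies both constraints of problem (2); with n = 5, p = 1, q = 5 and alpha = 0.3 it reproduces the weights computed in Section 3:

from fractions import Fraction

def weights_on_support(n, p, q, alpha):
    # formulas (7) and (8) for the boundary weights w_p* and w_q*
    d = (q - p + 1) * (q - p + 2)
    wp = (2 * (2 * q + p - 2) - 6 * (n - 1) * (1 - alpha)) / d
    wq = (6 * (n - 1) * (1 - alpha) - 2 * (q + 2 * p - 4)) / d
    w = [Fraction(0)] * n
    for j in range(p, q + 1):
        # formula (6): interior weights interpolate linearly between w_p* and w_q*
        w[j - 1] = Fraction(q - j, q - p) * wp + Fraction(j - p, q - p) * wq
    return w

w = weights_on_support(5, 1, 5, Fraction(3, 10))
assert sum(w) == 1                                                   # weights sum to one
assert sum(Fraction(5 - i, 4) * x for i, x in enumerate(w, 1)) == Fraction(3, 10)  # orness
print([float(x) for x in w])   # [0.04, 0.12, 0.2, 0.28, 0.36]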
The optimal weighting vector
\[
W^* = (0, \ldots, 0, w_p^*, \ldots, w_q^*, 0, \ldots, 0)^T
\]
is feasible if and only if $w_p^*, w_q^* \in [0,1]$, because according to (6) any other $w_j^*$, $j \in I_{\{p,q\}}$, is computed as their convex linear combination. Using formulas (7) and (8) we find
\[
w_p^*, w_q^* \in [0,1] \iff
\alpha \in \left[\, 1 - \frac{1}{3}\cdot\frac{2q+p-2}{n-1},\ 1 - \frac{1}{3}\cdot\frac{q+2p-4}{n-1} \,\right].
\]
The following (disjunctive) partition of the unit interval $(0,1)$ will be crucial in finding an optimal solution to problem (2):
\[
(0,1) = \bigcup_{r=2}^{n-1} J_{r,n} \cup J_{1,n} \cup \bigcup_{s=2}^{n-1} J_{1,s},
\tag{9}
\]
where
\[
\begin{aligned}
J_{r,n} &= \left(\, 1 - \frac{1}{3}\cdot\frac{2n+r-2}{n-1},\ 1 - \frac{1}{3}\cdot\frac{2n+r-3}{n-1} \,\right], \quad r = 2, \ldots, n-1, \\
J_{1,n} &= \left(\, 1 - \frac{1}{3}\cdot\frac{2n-1}{n-1},\ 1 - \frac{1}{3}\cdot\frac{n-2}{n-1} \,\right], \\
J_{1,s} &= \left(\, 1 - \frac{1}{3}\cdot\frac{s-1}{n-1},\ 1 - \frac{1}{3}\cdot\frac{s-2}{n-1} \,\right], \quad s = 2, \ldots, n-1.
\end{aligned}
\]
Consider again problem (2) and suppose that $\alpha \in J_{r,s}$ for some $r$ and $s$ from partition (9). Such $r$ and $s$ always exist for any $\alpha \in (0,1)$; furthermore, $r = 1$ or $s = n$ should hold. Then
\[
W^* = (0, \ldots, 0, w_r^*, \ldots, w_s^*, 0, \ldots, 0)^T,
\tag{10}
\]
where
\[
\begin{aligned}
w_j^* &= 0 \quad \text{if } j \notin I_{\{r,s\}}, \\
w_r^* &= \frac{2(2s + r - 2) - 6(n-1)(1-\alpha)}{(s-r+1)(s-r+2)}, \\
w_s^* &= \frac{6(n-1)(1-\alpha) - 2(s + 2r - 4)}{(s-r+1)(s-r+2)}, \\
w_j^* &= \frac{s-j}{s-r}\, w_r^* + \frac{j-r}{s-r}\, w_s^* \quad \text{if } j \in I_{\{r+1,s-1\}},
\end{aligned}
\tag{11}
\]
and $I_{\{r+1,s-1\}} = \{r+1, \ldots, s-1\}$. Furthermore, from the construction of $W^*$ it is clear that
\[
\sum_{i=1}^n w_i^* = \sum_{i=r}^s w_i^* = 1, \qquad w_i^* \ge 0, \quad i = 1, 2, \ldots, n,
\]
and $\mathrm{orness}(W^*) = \alpha$; that is, $W^*$ is feasible for problem (2).
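In computational form, the Python sketch below (the name minimal_variability_weights is ours) evaluates (11) on every admissible support, i.e. those with r = 1 or s = n, and keeps the feasible candidate of smallest variance. Every feasible candidate satisfies the constraints of (2), and by the above the optimal vector is among them, so the sketch recovers W* without locating alpha in partition (9) by hand:

from fractions import Fraction

def candidate(n, r, s, alpha):
    # weights from (11) on support {r, ..., s}; None if infeasible
    d = (s - r + 1) * (s - r + 2)
    wr = (2 * (2 * s + r - 2) - 6 * (n - 1) * (1 - alpha)) / d
    ws = (6 * (n - 1) * (1 - alpha) - 2 * (s + 2 * r - 4)) / d
    if wr < 0 or ws < 0:
        return None
    w = [Fraction(0)] * n
    for j in range(r, s + 1):
        w[j - 1] = Fraction(s - j, s - r) * wr + Fraction(j - r, s - r) * ws
    return w

def minimal_variability_weights(n, alpha):
    # analytic solution of problem (2) for n >= 3 and 0 < alpha < 1
    alpha = Fraction(alpha)
    d2 = lambda w: sum(x * x for x in w) / n - Fraction(1, n * n)   # D^2(W)
    cands = [candidate(n, r, n, alpha) for r in range(1, n)]
    cands += [candidate(n, 1, s, alpha) for s in range(2, n)]
    return min((w for w in cands if w is not None), key=d2)

print([round(float(x), 4) for x in minimal_variability_weights(5, Fraction(1, 10))])
# [0.0, 0.0, 0.0333, 0.3333, 0.6333]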
We will show now that $W^*$, defined by (10), satisfies the Kuhn-Tucker second-order sufficiency conditions for optimality ([2], page 58). Namely,

(i) There exist $\lambda_1^*, \lambda_2^* \in \mathbb{R}$ and $\mu_1^* \ge 0, \ldots, \mu_n^* \ge 0$ such that
\[
\frac{\partial}{\partial w_k}\left[ D^2(W) + \lambda_1^*\left(\sum_{i=1}^n w_i - 1\right)
 + \lambda_2^*\left(\sum_{i=1}^n \frac{n-i}{n-1}\, w_i - \alpha\right)
 + \sum_{j=1}^n \mu_j^* (-w_j) \right]_{W = W^*} = 0
\tag{12}
\]
for $1 \le k \le n$, and $\mu_j^* w_j^* = 0$, $j = 1, \ldots, n$.

(ii) $W^*$ is a regular point of the constraints.

(iii) The Hessian matrix
\[
\frac{\partial^2}{\partial W^2}\left[ D^2(W) + \lambda_1^*\left(\sum_{i=1}^n w_i - 1\right)
 + \lambda_2^*\left(\sum_{i=1}^n \frac{n-i}{n-1}\, w_i - \alpha\right)
 + \sum_{j=1}^n \mu_j^* (-w_j) \right]_{W = W^*}
\]
is positive definite on
\[
\hat{X} = \left\{ y \mid h_1^T y = 0,\ h_2^T y = 0 \text{ and } g_j^T y = 0 \text{ for all } j \text{ with } \mu_j > 0 \right\},
\]
where
\[
h_1 = \left(\frac{n-1}{n-1}, \frac{n-2}{n-1}, \ldots, \frac{1}{n-1}, 0\right)^T
\tag{13}
\]
and
\[
h_2 = (1, 1, \ldots, 1, 1)^T
\tag{14}
\]
are the gradients of the linear equality constraints, and
\[
g_j = (0, 0, \ldots, 0, \underbrace{-1}_{j\text{th}}, 0, 0, \ldots, 0)^T
\tag{15}
\]
is the gradient of the $j$th linear inequality constraint of problem (2).

Proof. (i) According to (5) we get
\[
\lambda_1^* = \frac{2}{n}\left(\frac{n-s}{s-r}\, w_r^* - \frac{n-r}{s-r}\, w_s^*\right)
\quad\text{and}\quad
\lambda_2^* = \frac{n-1}{s-r} \cdot \frac{2}{n} \cdot (w_s^* - w_r^*),
\]
and
\[
\frac{2}{n}\, w_k^* + \lambda_1^* + \frac{n-k}{n-1}\,\lambda_2^* - \mu_k^* = 0
\]
for $k = 1, \ldots, n$. If $k \in I_{\{r,s\}}$ then
\[
\begin{aligned}
\mu_k^* &= \frac{2}{n}\left(\frac{s-k}{s-r}\, w_r^* + \frac{k-r}{s-r}\, w_s^*\right)
 + \frac{2}{n}\left(\frac{n-s}{s-r}\, w_r^* - \frac{n-r}{s-r}\, w_s^*\right)
 + \frac{n-k}{n-1} \cdot \frac{n-1}{s-r} \cdot \frac{2}{n}\,(w_s^* - w_r^*) \\
 &= \frac{2}{n} \cdot \frac{1}{s-r}\Bigl[ (s-k+n-s-n+k)\, w_r^* + (k-r-n+r+n-k)\, w_s^* \Bigr] = 0.
\end{aligned}
\]
If $k \notin I_{\{r,s\}}$ then $w_k^* = 0$. Then from the equality
\[
\lambda_1^* + \frac{n-k}{n-1}\,\lambda_2^* - \mu_k^* = 0
\]
we find
\[
\mu_k^* = \lambda_1^* + \frac{n-k}{n-1}\,\lambda_2^*
 = \frac{2}{n}\left(\frac{n-s}{s-r}\, w_r^* - \frac{n-r}{s-r}\, w_s^*\right)
 + \frac{n-k}{n-1} \cdot \frac{n-1}{s-r} \cdot \frac{2}{n}\,(w_s^* - w_r^*)
 = \frac{2}{n} \cdot \frac{1}{s-r}\Bigl[ (k-s)\, w_r^* + (r-k)\, w_s^* \Bigr].
\]
We need to show that $\mu_k^* \ge 0$ for $k \notin I_{\{r,s\}}$. That is,
\[
(k-s)\, w_r^* + (r-k)\, w_s^*
 = (k-s) \cdot \frac{2(2s+r-2) - 6(n-1)(1-\alpha)}{(s-r+1)(s-r+2)}
 + (r-k) \cdot \frac{6(n-1)(1-\alpha) - 2(s+2r-4)}{(s-r+1)(s-r+2)} \ge 0.
\tag{16}
\]
If $r = 1$ and $s = n$ then we get that $\mu_k^* = 0$ for $k = 1, \ldots, n$. Suppose now that $r = 1$ and $s < n$. In this case the inequality $k > s > 1$ should hold, and (16) leads to the following requirement for $\alpha$:
\[
\alpha \ge 1 - \frac{(s-1)(3k - 2s - 2)}{3(n-1)(2k - s - 1)}.
\]
On the other hand, from $\alpha \in J_{1,s}$ and $s < n$ we have
\[
\alpha \in \left(\, 1 - \frac{1}{3}\cdot\frac{s-1}{n-1},\ 1 - \frac{1}{3}\cdot\frac{s-2}{n-1} \,\right],
\]
and, therefore,
\[
\alpha \ge 1 - \frac{1}{3}\cdot\frac{s-1}{n-1}.
\]
Finally, from the inequality
\[
1 - \frac{1}{3}\cdot\frac{s-1}{n-1} \ge 1 - \frac{(s-1)(3k-2s-2)}{3(n-1)(2k-s-1)}
\]
we get that (16) holds. The proof of the remaining case ($r > 1$ and $s = n$) is carried out analogously.

(ii) The gradient vectors of the linear equality and inequality constraints are computed by (13), (14) and (15), respectively. If $r = 1$ and $s = n$ then $w_j^* \ne 0$ for all $j = 1, \ldots, n$, and it is easy to see that $h_1$ and $h_2$ are linearly independent. If $r = 1$ and $s < n$ then $w_j^* = 0$ for $j = s+1, \ldots, n$, and in this case
\[
g_j = (0, 0, \ldots, 0, \underbrace{-1}_{j\text{th}}, 0, 0, \ldots, 0)^T, \quad j = s+1, \ldots, n.
\]
Consider the matrix
\[
G = [h_1, h_2, g_{s+1}, \ldots, g_n] \in \mathbb{R}^{n \times (n-s+2)}.
\]
Then the determinant of the lower-left submatrix of dimension $(n-s+2) \times (n-s+2)$ of $G$ is
\[
\begin{vmatrix}
\frac{n-s+1}{n-1} & 1 & 0 & \cdots & 0 \\
\frac{n-s}{n-1} & 1 & 0 & \cdots & 0 \\
\frac{n-s-1}{n-1} & 1 & -1 & \cdots & 0 \\
\vdots & \vdots & & \ddots & \vdots \\
0 & 1 & 0 & \cdots & -1
\end{vmatrix}
= (-1)^{n-s}
\begin{vmatrix}
\frac{n-s+1}{n-1} & 1 \\
\frac{n-s}{n-1} & 1
\end{vmatrix}
= (-1)^{n-s}\,\frac{1}{n-1},
\]
which means that the columns of $G$ are linearly independent, and therefore the system
\[
\{h_1, h_2, g_{s+1}, \ldots, g_n\}
\]
is linearly independent.

If $r > 1$ and $s = n$ then $w_j^* = 0$ for $j = 1, \ldots, r-1$, and in this case
\[
g_j = (0, 0, \ldots, 0, \underbrace{-1}_{j\text{th}}, 0, 0, \ldots, 0)^T, \quad j = 1, \ldots, r-1.
\]
Consider the matrix
\[
F = [h_1, h_2, g_1, \ldots, g_{r-1}] \in \mathbb{R}^{n \times (r+1)}.
\]
Then the determinant of the upper-left submatrix of dimension $(r+1) \times (r+1)$ of $F$ is
\[
\begin{vmatrix}
\frac{n-1}{n-1} & 1 & -1 & \cdots & 0 \\
\vdots & \vdots & & \ddots & \vdots \\
\frac{n-r+1}{n-1} & 1 & 0 & \cdots & -1 \\
\frac{n-r}{n-1} & 1 & 0 & \cdots & 0 \\
\frac{n-r-1}{n-1} & 1 & 0 & \cdots & 0
\end{vmatrix}
= (-1)^{r-1}
\begin{vmatrix}
\frac{n-r}{n-1} & 1 \\
\frac{n-r-1}{n-1} & 1
\end{vmatrix}
= (-1)^{r-1}\,\frac{1}{n-1},
\]
which means that the columns of $F$ are linearly independent, and therefore the system
\[
\{h_1, h_2, g_1, \ldots, g_{r-1}\}
\]
is linearly independent. So $W^*$ is a regular point for problem (2).

(iii) Let us introduce the notation
\[
K(W) = D^2(W) + \lambda_1^*\left(\sum_{i=1}^n w_i - 1\right)
 + \lambda_2^*\left(\sum_{i=1}^n \frac{n-i}{n-1}\, w_i - \alpha\right)
 + \sum_{j=1}^n \mu_j^* (-w_j).
\]
The Hessian matrix of $K$ at $W^*$ is
\[
\left.\frac{\partial^2 K(W)}{\partial w_k \partial w_j}\right|_{W=W^*}
 = \left.\frac{\partial^2 D^2(W)}{\partial w_k \partial w_j}\right|_{W=W^*}
 = \frac{2}{n}\,\delta_{kj},
\quad\text{where}\quad
\delta_{kj} = \begin{cases} 1 & \text{if } j = k, \\ 0 & \text{otherwise.} \end{cases}
\]
That is,
\[
K''(W^*) = \frac{2}{n}
\begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix},
\]
which is a positive definite matrix on $\mathbb{R}^n$.

So, the objective function $D^2(W)$ has a local minimum at the point $W = W^*$ on
\[
X = \left\{ W \in \mathbb{R}^n \,\middle|\, W \ge 0,\ \sum_{i=1}^n w_i = 1,\ \sum_{i=1}^n \frac{n-i}{n-1}\, w_i = \alpha \right\},
\tag{17}
\]
where $X$ is the set of feasible solutions of problem (2). Taking into consideration that $D^2 \colon \mathbb{R}^n \to \mathbb{R}$ is a strictly convex, bounded and continuous function, and $X$ is a convex and compact subset of $\mathbb{R}^n$, we can conclude that $D^2$ attains its (unique) global minimum on $X$ at the point $W^*$.

3 Example

In this Section we will determine the minimal variability five-dimensional weighting vector under orness levels $\alpha = 0, 0.1, \ldots, 0.9$ and $1.0$. First, we construct the corresponding partition as
\[
(0,1) = \bigcup_{r=2}^{4} J_{r,5} \cup J_{1,5} \cup \bigcup_{s=2}^{4} J_{1,s},
\]
where
\[
J_{r,5} = \left(\frac{1}{3}\cdot\frac{5-r-1}{5-1},\ \frac{1}{3}\cdot\frac{5-r}{5-1}\right]
 = \left(\frac{4-r}{12},\ \frac{5-r}{12}\right], \quad r = 2, 3, 4,
\]
\[
J_{1,5} = \left(\frac{1}{3}\cdot\frac{5-2}{5-1},\ \frac{1}{3}\cdot\frac{10-1}{5-1}\right]
 = \left(\frac{3}{12},\ \frac{9}{12}\right],
\]
and
\[
J_{1,s} = \left(1 - \frac{1}{3}\cdot\frac{s-1}{5-1},\ 1 - \frac{1}{3}\cdot\frac{s-2}{5-1}\right]
 = \left(\frac{13-s}{12},\ \frac{14-s}{12}\right], \quad s = 2, 3, 4,
\]
and, therefore, we get
\[
(0,1) = \left(0, \frac{1}{12}\right] \cup \left(\frac{1}{12}, \frac{2}{12}\right] \cup \left(\frac{2}{12}, \frac{3}{12}\right]
 \cup \left(\frac{3}{12}, \frac{9}{12}\right]
 \cup \left(\frac{9}{12}, \frac{10}{12}\right] \cup \left(\frac{10}{12}, \frac{11}{12}\right] \cup \left(\frac{11}{12}, 1\right).
\]
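These classes can also be tabulated mechanically. The short Python sketch below (the helper name partition_classes is ours) evaluates the interval endpoints of (9) for n = 5 and reproduces the seven classes just listed:

from fractions import Fraction

def partition_classes(n):
    # interval endpoints of J_{r,n}, J_{1,n} and J_{1,s} from (9)
    u = Fraction(1, 3 * (n - 1))
    J = [((r, n), 1 - (2 * n + r - 2) * u, 1 - (2 * n + r - 3) * u) for r in range(2, n)]
    J.append(((1, n), 1 - (2 * n - 1) * u, 1 - (n - 2) * u))
    J += [((1, s), 1 - (s - 1) * u, 1 - (s - 2) * u) for s in range(2, n)]
    return sorted(J, key=lambda t: t[1])   # order by left endpoint

for (r, s), lo, hi in partition_classes(5):
    print(f"J_{r},{s}: {lo} .. {hi}")
# J_4,5: 0 .. 1/12,  J_3,5: 1/12 .. 1/6,  J_2,5: 1/6 .. 1/4,  J_1,5: 1/4 .. 3/4,
# J_1,4: 3/4 .. 5/6, J_1,3: 5/6 .. 11/12, J_1,2: 11/12 .. 1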
Without loss of generality we can assume that $\alpha < 0.5$, because if a weighting vector $W$ is optimal for problem (2) under some given degree of orness $\alpha < 0.5$, then its reverse, denoted by $W^R$ and defined as
\[
w_i^R = w_{n-i+1},
\]
is also optimal for problem (2) under degree of orness $1 - \alpha$. Indeed, as was shown by Yager [7], we find that $D^2(W^R) = D^2(W)$ and $\mathrm{orness}(W^R) = 1 - \mathrm{orness}(W)$. Therefore, for any $\alpha > 0.5$, we can solve problem (2) by solving it for level of orness $1 - \alpha$ and then taking the reverse of that solution. Then we obtain the optimal weights from (11) as follows:

• If $\alpha = 0$ then $W^*(\alpha) = W^*(0) = (0, 0, \ldots, 0, 1)^T$ and, therefore, $W^*(1) = (W^*(0))^R = (1, 0, \ldots, 0, 0)^T$.

• If $\alpha = 0.1$ then
\[
\alpha \in J_{3,5} = \left(\frac{1}{12}, \frac{2}{12}\right],
\]
and the associated minimal variability weights are
\[
\begin{aligned}
w_1^*(0.1) &= 0, \qquad w_2^*(0.1) = 0, \\
w_3^*(0.1) &= \frac{2(10 + 3 - 2) - 6(5-1)(1-0.1)}{(5-3+1)(5-3+2)} = \frac{0.4}{12} = 0.0333, \\
w_5^*(0.1) &= \frac{2}{5-3+1} - w_3^*(0.1) = 0.6333, \\
w_4^*(0.1) &= \frac{1}{2}\, w_3^*(0.1) + \frac{1}{2}\, w_5^*(0.1) = 0.3333.
\end{aligned}
\]
So, $W^*(\alpha) = W^*(0.1) = (0, 0, 0.033, 0.333, 0.633)^T$ and, consequently, $W^*(0.9) = (W^*(0.1))^R = (0.633, 0.333, 0.033, 0, 0)^T$, with variance $D^2(W^*(0.1)) = 0.0627$.

• If $\alpha = 0.2$ then
\[
\alpha \in J_{2,5} = \left(\frac{2}{12}, \frac{3}{12}\right],
\]
and in a similar manner we find that the associated minimal variability weighting vector is $W^*(0.2) = (0.0, 0.04, 0.18, 0.32, 0.46)^T$ and, therefore, $W^*(0.8) = (0.46, 0.32, 0.18, 0.04, 0.0)^T$, with variance $D^2(W^*(0.2)) = 0.0296$.

• If $\alpha = 0.3$ then
\[
\alpha \in J_{1,5} = \left(\frac{3}{12}, \frac{9}{12}\right],
\]
and in a similar manner we find that the associated minimal variability weighting vector is $W^*(0.3) = (0.04, 0.12, 0.20, 0.28, 0.36)^T$ and, therefore, $W^*(0.7) = (0.36, 0.28, 0.20, 0.12, 0.04)^T$, with variance $D^2(W^*(0.3)) = 0.0128$.

• If $\alpha = 0.4$ then
\[
\alpha \in J_{1,5} = \left(\frac{3}{12}, \frac{9}{12}\right],
\]
and in a similar manner we find that the associated minimal variability weighting vector is $W^*(0.4) = (0.12, 0.16, 0.20, 0.24, 0.28)^T$ and, therefore, $W^*(0.6) = (0.28, 0.24, 0.20, 0.16, 0.12)^T$, with variance $D^2(W^*(0.4)) = 0.0032$.

• If $\alpha = 0.5$ then $W^*(0.5) = (0.2, 0.2, 0.2, 0.2, 0.2)^T$, with variance $D^2(W^*(0.5)) = 0$.
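The weights and variances above can be reproduced with the minimal_variability_weights sketch given in Section 2, using the reversal property for orness levels above 0.5:

from fractions import Fraction

def variance(w):
    # D^2(W) = (1/n) sum w_i^2 - 1/n^2
    n = len(w)
    return sum(x * x for x in w) / n - Fraction(1, n * n)

for k in range(1, 6):                      # alpha = 0.1, 0.2, 0.3, 0.4, 0.5
    a = Fraction(k, 10)
    w = minimal_variability_weights(5, a)  # sketch from Section 2
    r = w[::-1]                            # optimal for orness 1 - alpha
    print(float(a), [round(float(x), 3) for x in w], round(float(variance(w)), 4))
    print(float(1 - a), [round(float(x), 3) for x in r])
# e.g. alpha = 0.1 yields (0, 0, 0.033, 0.333, 0.633) with variance 0.0627,
# and alpha = 0.9 its reverse (0.633, 0.333, 0.033, 0, 0)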
4 Summary

We have extended the power of decision making with OWA operators by introducing considerations of minimizing variability into the process of selecting the optimal alternative. Of particular significance in this work is the development of a methodology to calculate the minimal variability weights in an analytic way.

References

[1] R. G. Brown, Smoothing, Forecasting and Prediction of Discrete Time Series (Prentice-Hall, Englewood Cliffs, 1963).

[2] V. Chankong and Y. Y. Haimes, Multiobjective Decision Making: Theory and Methodology (North-Holland, 1983).

[3] D. Filev and R. R. Yager, On the issue of obtaining OWA operator weights, Fuzzy Sets and Systems, 94(1998) 157-169.

[4] R. Fullér and P. Majlender, An analytic approach for obtaining maximal entropy OWA operator weights, Fuzzy Sets and Systems, 124(2001) 53-57.

[5] M. O'Hagan, Aggregating template or rule antecedents in real-time expert systems with fuzzy set logic, in: Proc. 22nd Annual IEEE Asilomar Conf. Signals, Systems, Computers, Pacific Grove, CA, 1988, 681-689.

[6] R. R. Yager, Ordered weighted averaging aggregation operators in multicriteria decision making, IEEE Trans. on Systems, Man and Cybernetics, 18(1988) 183-190.

[7] R. R. Yager, Families of OWA operators, Fuzzy Sets and Systems, 59(1993) 125-148.

[8] R. R. Yager, On the inclusion of variance in decision making under uncertainty, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 4(1995) 401-419.

[9] R. R. Yager, On the analytic representation of the Leximin ordering and its application to flexible constraint propagation, European Journal of Operational Research, 102(1997) 176-192.

[10] R. R. Yager and D. Filev, Induced ordered weighted averaging operators, IEEE Trans. on Systems, Man and Cybernetics – Part B: Cybernetics, 29(1999) 141-150.