Stochastic Proximal Gradient Consensus Over
Time-Varying Multi-Agent Networks
Mingyi Hong
Joint work with Tsung-Hui Chang
IMSE and ECE Department,
Iowa State University
Presented at INFORMS 2015
Main Content
Setup: Optimization over a time-varying multi-agent network
Main Results
An algorithm for a large class of convex problems with rate guarantees
Connections among a number of popular algorithms
Outline
1. Review of Distributed Optimization
2. The Proposed Algorithm
   - The Proposed Algorithms
   - Distributed Implementation
   - Convergence Analysis
3. Connection to Existing Methods
4. Numerical Results
5. Concluding Remarks
Review of Distributed Optimization
Basic Setup
Consider the following convex optimization problem

    min_{y ∈ ℝ^M}  f(y) := ∑_{i=1}^N f_i(y)        (P)

Each f_i(y) is a convex and possibly nonsmooth function.
A collection of N agents connected by a network:
1. Network defined by an undirected graph G = {V, E}
2. |V| = N vertices and |E| = E edges
3. Each agent can only communicate with its immediate neighbors
Review of Distributed Optimization
Basic Setup
Numerous applications in optimizing networked systems:
1. Cloud computing [Foster et al 08]
2. Smart grid optimization [Gan et al 13] [Liu-Zhu 14] [Kekatos 13]
3. Distributed learning [Mateos et al 10] [Boyd et al 11] [Bekkerman et al 12]
4. Communication and signal processing [Rabbat-Nowak 04] [Schizas et al 08] [Giannakis et al 15]
5. Seismic tomography [Zhao et al 15]
6. ...
Review of Distributed Optimization
The Algorithms
Many algorithms are available for problem (P):
1. The distributed subgradient (DSG) based methods
2. The Alternating Direction Method of Multipliers (ADMM) based methods
3. The distributed dual averaging based methods
4. ...
The algorithm families differ in the problems they can handle and in their convergence conditions.
Review of Distributed Optimization
The DSG Algorithm
Each agent i keeps a local copy of y, denoted x_i.
Each agent i iteratively computes

    x_i^{r+1} = ∑_{j=1}^N w_{ij}^r x_j^r − γ^r d_i^r,    ∀ i ∈ V.

We use the following notation:
1. d_i^r ∈ ∂f_i(x_i^r): a subgradient of the local function f_i
2. w_{ij}^r ≥ 0: the weight for the link e_{ij} at iteration r
3. γ^r > 0: a stepsize parameter
Review of Distributed Optimization
The DSG Algorithm (Cont.)
Compactly, the algorithm can be written in vector form

    x^{r+1} = W x^r − γ^r d^r

1. x^r: the vector stacking the agents' local variables x_i^r
2. d^r: the vector stacking the subgradients d_i^r
3. W: a row-stochastic weight matrix
Review of Distributed Optimization
The DSG Algorithm (Cont.)
Convergence has been analyzed in many works [Nedić-Ozdaglar 09a] [Nedić-Ozdaglar 09b]
The algorithm converges with a rate of O(ln(r)/√r) [Chen 12]
Usually a diminishing stepsize is required
The algorithm has been generalized to problems with
1. constraints [Nedić-Ozdaglar-Parrilo 10]
2. quantized messages [Nedić et al 08]
3. directed graphs [Nedić-Olshevsky 15]
4. stochastic gradients [Ram et al 10]
5. ...
Accelerated versions achieve rates of O(ln(r)/r) [Chen 12] [Jakovetić et al 14]
Review of Distributed Optimization
The EXTRA Algorithm
Recently, [Shi et al 14] proposed the EXTRA algorithm

    x^{r+1} = W x^r − (1/β) d^r + (1/β) d^{r−1} + x^r − Ŵ x^{r−1},

where Ŵ = (1/2)(I + W); f is assumed to be smooth; W is symmetric.

EXTRA is an error-corrected version of DSG:

    x^{r+1} = W x^r − (1/β) d^r + ∑_{t=1}^{r} (W − Ŵ) x^{t−1}

The accumulated correction term removes the steady-state error that forces DSG to use a diminishing stepsize.
It is shown that
1. A constant stepsize β can be used (with a computable lower bound)
2. The algorithm converges with an (improved) rate of O(1/r)
Review of Distributed Optimization
The ADMM Algorithm
The general ADMM solves the following two-block optimization problem

    min_{x,y}  f(x) + g(y)
    s.t.       Ax + By = c,  x ∈ X,  y ∈ Y

The augmented Lagrangian

    L(x, y; λ) = f(x) + g(y) + ⟨λ, c − Ax − By⟩ + (ρ/2) ‖c − Ax − By‖²

The algorithm
1. Minimize L(x, y; λ) w.r.t. x
2. Minimize L(x, y; λ) w.r.t. y
3. λ ← λ + ρ (c − Ax − By)
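A minimal sketch of these three steps on an assumed toy instance in which both subproblems have closed-form minimizers: f(x) = (1/2)‖x − p‖², g(y) = (1/2)‖y − q‖², A = I, B = −I, c = 0.

```python
import numpy as np

# ADMM sketch for the assumed toy two-block problem
#   min 0.5*||x - p||^2 + 0.5*||y - q||^2   s.t.  x - y = 0,
# i.e. A = I, B = -I, c = 0; the optimum is x = y = (p + q)/2.
n = 3
rng = np.random.default_rng(1)
p, q = rng.normal(size=n), rng.normal(size=n)

rho = 1.0
x, y, lam = np.zeros(n), np.zeros(n), np.zeros(n)
for _ in range(200):
    x = (p + lam + rho * y) / (1.0 + rho)   # 1) minimize L(x, y; lam) over x
    y = (q - lam + rho * x) / (1.0 + rho)   # 2) minimize L(x, y; lam) over y
    lam = lam + rho * (y - x)               # 3) lam <- lam + rho*(c - A x - B y)

print(np.allclose(x, (p + q) / 2), np.allclose(x, y))
```

Both updates here are exact minimizations of the augmented Lagrangian; it is this exactness of the x-step that the proximal-gradient variant proposed later relaxes.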
Review of Distributed Optimization
The ADMM for Network Consensus
For each link e_{ij}, introduce two link variables z_{ij}, z_{ji}.
Reformulate problem (P) as [Schizas et al 08]

    min_x  f(x) := ∑_{i=1}^N f_i(x_i),
    s.t.   x_i = z_{ij},  x_j = z_{ij},  x_i = z_{ji},  x_j = z_{ji},    ∀ e_{ij} ∈ E.
Review of Distributed Optimization
The ADMM for Network Consensus (cont.)
The above problem is equivalent to

    min_x  f(x) := ∑_{i=1}^N f_i(x_i),   s.t.  Ax + Bz = 0        (1)

where A, B are matrices related to the network topology (a small construction sketch is given below).

Converges with an O(1/r) rate [Wei-Ozdaglar 13]
When the objective is smooth and strongly convex, linear convergence has been shown in [Shi et al 14]
For a star network, convergence to stationary solutions of nonconvex problems (with rate O(1/√r)) [H.-Luo-Razaviyayn 14]
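To illustrate how such topology matrices can be assembled, here is a sketch for an assumed 4-node path graph with scalar x_i, keeping one variable z_ij per directed arc and the two constraints x_i = z_ij, x_j = z_ij; the exact ordering and scaling used in the paper may differ.

```python
import numpy as np

# Sketch: build A and B for  A x + B z = 0  on an assumed 4-node path graph.
N = 4
edges = [(0, 1), (1, 2), (2, 3)]                   # undirected edges of G
arcs = edges + [(j, i) for (i, j) in edges]        # both directions, one z per arc

A = np.zeros((2 * len(arcs), N))                   # selects x_i and x_j
B = np.zeros((2 * len(arcs), len(arcs)))           # selects -z_ij (twice per arc)
for k, (i, j) in enumerate(arcs):
    A[2 * k, i] = 1.0;     B[2 * k, k] = -1.0      # x_i - z_ij = 0
    A[2 * k + 1, j] = 1.0; B[2 * k + 1, k] = -1.0  # x_j - z_ij = 0

# On a connected graph, A x + B z = 0 forces all x_i to agree:
x_consensus, z = np.ones(N), np.ones(len(arcs))
print(np.allclose(A @ x_consensus + B @ z, 0.0))   # True
```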
Review of Distributed Optimization
Comparison of ADMM and DSG
Table: Comparison of ADMM and DSG.

                     DSG                    ADMM
Problem Type         general convex         smooth / smooth + simple nonsmooth
Stepsize             diminishing (a)        constant
Convergence Rate     O(ln(r)/√r)            O(1/r)
Network Topology     dynamic                static (b)
Subproblem           simple                 difficult (c)

(a) Except [Shi et al 14], which uses a constant stepsize
(b) Except [Chang-H.-Wang 14] [Ling et al 15], gradient-type subproblem
(c) Except [Wei-Ozdaglar 13], random graph
Review of Distributed Optimization
Comparison of ADMM and DSG
Connections?
The Proposed Algorithm
Setup
The proposed method is ADMM-based.
We consider

    min_y  f(y) := ∑_{i=1}^N f_i(y) = ∑_{i=1}^N ( g_i(y) + h_i(y) )        (Q)

Each h_i is lower semicontinuous with an easy "prox" operator (sketched below for an ℓ1 example):

    prox_{h_i}^β(u) := argmin_y  h_i(y) + (β/2) ‖y − u‖².

Each g_i has a Lipschitz continuous gradient, i.e., for some P_i > 0,

    ‖∇g_i(y) − ∇g_i(v)‖ ≤ P_i ‖y − v‖,    ∀ y, v ∈ dom(h), ∀ i.
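For instance, with the assumed choice h_i(y) = ν‖y‖_1 (the LASSO regularizer used in the experiments later), the prox operator above reduces to soft-thresholding:

```python
import numpy as np

# prox_h^beta(u) = argmin_y h(y) + (beta/2)*||y - u||^2 for h(y) = nu*||y||_1
# is the soft-thresholding operator with threshold nu/beta (assumed example).
def prox_l1(u, nu, beta):
    t = nu / beta
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

u = np.array([1.5, -0.2, 0.05, -3.0])
print(prox_l1(u, nu=0.1, beta=1.0))   # entries smaller than nu/beta are zeroed
```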
The Proposed Algorithm
Graph Structure
Both static and randomly time-varying graphs are considered.
For a random network we assume:
1. At a given iteration, G^r is a subgraph of a connected graph G
2. Each link e has a probability p_e ∈ (0, 1] of being active
3. A node i is active if an active link connects to it
4. The graph realizations are independent across iterations
The Proposed Algorithm
Gradient Information
Each agent has access to an estimate g̃_i(x_i, ξ_i) of its gradient such that

    E[ g̃_i(x_i, ξ_i) ] = ∇g_i(x_i),
    E[ ‖ g̃_i(x_i, ξ_i) − ∇g_i(x_i) ‖² ] ≤ σ²,    ∀ i.

The framework can be extended to allow only subgradients of the objective.
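One way such an oracle can arise, assuming a local least-squares term g_i(x) = (1/2)‖A_i x − b_i‖² as in the LASSO experiment later, is uniform row sampling with rescaling; this construction is an illustrative assumption, not something specified in the talk.

```python
import numpy as np

# Assumed example of an unbiased stochastic gradient oracle for
# g_i(x) = 0.5 * ||A_i x - b_i||^2: sample one row uniformly and rescale by K,
# so that E[g_tilde(x, xi)] = A_i^T (A_i x - b_i) = grad g_i(x); the variance
# is bounded whenever x stays in a bounded set.
rng = np.random.default_rng(0)
K, M = 200, 100
A_i = rng.normal(size=(K, M)) / np.sqrt(K)
b_i = rng.normal(size=K)

def g_tilde(x):
    k = rng.integers(K)                        # xi_i: the sampled row index
    return K * A_i[k] * (A_i[k] @ x - b_i[k])

x = np.zeros(M)
avg = np.mean([g_tilde(x) for _ in range(20000)], axis=0)
exact = A_i.T @ (A_i @ x - b_i)
# The Monte-Carlo average approaches the exact gradient as the sample count grows.
print(np.linalg.norm(avg - exact) / np.linalg.norm(exact))
```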
The Proposed Algorithm
The Augmented Lagrangian
The problem we solve is still given by

    min_x  f(x) := ∑_{i=1}^N ( g_i(x_i) + h_i(x_i) ),   s.t.  Ax + Bz = 0

The augmented Lagrangian

    L_Γ(x, z, λ) = ∑_{i=1}^N ( g_i(x_i) + h_i(x_i) ) + ⟨λ, Ax + Bz⟩ + (1/2) ‖Ax + Bz‖²_Γ

A diagonal matrix Γ is used as the penalty parameter (one ρ_{ij} per edge):

    Γ := diag{ρ_{ij}}_{e_{ij} ∈ E}
The Proposed Algorithm
The Proposed Algorithms
The DySPGC Algorithm
The proposed algorithm is named DySPGC (Dynamic Stochastic Proximal Gradient Consensus).
It optimizes L_Γ(x, z, λ) using steps similar to those of ADMM.
The x-step is replaced by a proximal gradient step.
The Proposed Algorithm
The Proposed Algorithms
The DySPGC: Static Graph + Exact Gradient
Algorithm 1. PGC Algorithm
At iteration 0, let B^T λ^0 = 0, z^0 = (1/2) M_+^T x^0.
At each iteration r + 1, update the variable blocks by:

    x^{r+1} = argmin_x  ⟨∇g(x^r), x − x^r⟩ + h(x)
              + (1/2) ‖Ax + Bz^r + Γ^{−1} λ^r‖²_Γ + (1/2) ‖x − x^r‖²_Ω

    z^{r+1} = argmin_z  (1/2) ‖Ax^{r+1} + Bz + Γ^{−1} λ^r‖²_Γ

    λ^{r+1} = λ^r + Γ (Ax^{r+1} + Bz^{r+1})
The Proposed Algorithm
The Proposed Algorithms
The DySPGC: Static Graph + Stochastic Gradient
Algorithm 2. SPGC Algorithm
At iteration 0, let B^T λ^0 = 0, z^0 = (1/2) M_+^T x^0.
At each iteration r + 1, update the variable blocks by:

    x^{r+1} = argmin_x  ⟨G̃(x^r, ξ^{r+1}), x − x^r⟩ + h(x)
              + (1/2) ‖Ax + Bz^r + Γ^{−1} λ^r‖²_Γ + (1/2) ‖x − x^r‖²_{Ω + η^{r+1} I_{MN}}

    z^{r+1} = argmin_z  (1/2) ‖Ax^{r+1} + Bz + Γ^{−1} λ^r‖²_Γ

    λ^{r+1} = λ^r + Γ (Ax^{r+1} + Bz^{r+1})
The Proposed Algorithm
The Proposed Algorithms
The DySPGC: Dynamic Graph + Stochastic Gradient
Algorithm 3. DySPGC Algorithm
At iteration 0, let B^T λ^0 = 0, z^0 = (1/2) M_+^T x^0.
At each iteration r + 1, update the variable blocks by:

    x^{r+1} = argmin_x  ⟨G̃^{r+1}(x^r, ξ^{r+1}), x − x^r⟩ + h^{r+1}(x)
              + (1/2) ‖A^{r+1} x + B^{r+1} z^r + Γ^{−1} λ^r‖²_Γ + (1/2) ‖x − x^r‖²_{Ω^{r+1} + η^{r+1} I_{MN}}

    x_i^{r+1} = x_i^r,   if i ∉ V^{r+1}

    z^{r+1} = argmin_z  (1/2) ‖A^{r+1} x^{r+1} + B^{r+1} z + Γ^{−1} λ^r‖²_Γ

    z_{ij}^{r+1} = z_{ij}^r,   if e_{ij} ∉ A^{r+1}

    λ^{r+1} = λ^r + Γ (A^{r+1} x^{r+1} + B^{r+1} z^{r+1})
The Proposed Algorithm
Distributed Implementation
Distributed Implementation
The algorithms admit a distributed implementation.
In particular, PGC admits a single-variable characterization.
The Proposed Algorithm
Distributed Implementation
Implementation of PGC
Define a stepsize parameter as

    β_i := ∑_{j ∈ N_i} (ρ_ij + ρ_ji) + ω_i,    ∀ i.

({ω_i}: proximal parameters; {ρ_ij}: penalty parameters for the constraints)

Define a stepsize matrix Υ := diag([β_1, · · · , β_N]) ≻ 0.

Define a weight matrix W ∈ ℝ^{N×N} (a row-stochastic matrix) as

    W[i, j] = (ρ_ji + ρ_ij) / ( ∑_{ℓ ∈ N_i} (ρ_ℓi + ρ_iℓ) + ω_i ) = (ρ_ji + ρ_ij) / β_i,   if e_ij ∈ E,
    W[i, i] = ω_i / ( ∑_{ℓ ∈ N_i} (ρ_ℓi + ρ_iℓ) + ω_i ) = ω_i / β_i,   ∀ i ∈ V,
    W[i, j] = 0,   otherwise.
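A small sketch of this construction for an assumed 4-node graph, checking that the resulting W is row-stochastic by construction:

```python
import numpy as np

# Build Upsilon and W from assumed penalty parameters {rho_ij} and proximal
# parameters {omega_i}, following the definitions above (rho_ij = rho_ji here).
N = 4
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
rho = {e: 1e-3 for e in edges}                  # one penalty per edge
omega = np.array([2.0, 1.5, 2.5, 1.0])          # proximal parameters omega_i

beta = omega.copy()                             # beta_i = sum_{j in N_i}(rho_ij + rho_ji) + omega_i
for (i, j) in edges:
    beta[i] += 2 * rho[(i, j)]
    beta[j] += 2 * rho[(i, j)]
Upsilon = np.diag(beta)                         # stepsize matrix

W = np.diag(omega / beta)                       # W[i, i] = omega_i / beta_i
for (i, j) in edges:                            # W[i, j] = (rho_ji + rho_ij) / beta_i
    W[i, j] = 2 * rho[(i, j)] / beta[i]
    W[j, i] = 2 * rho[(i, j)] / beta[j]

print(np.allclose(W.sum(axis=1), 1.0))          # row-stochastic, as claimed
```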
The Proposed Algorithm
Distributed Implementation
Implementation of PGC (cont.)
Implementation of PGC
Let ζ^r ∈ ∂h(x^r) be some subgradient vector for the nonsmooth function; then
the PGC algorithm admits the following single-variable characterization:

    x^{r+1} − x^r + Υ^{−1} (ζ^{r+1} − ζ^r)
        = Υ^{−1} ( −∇g(x^r) + ∇g(x^{r−1}) ) + W x^r − (1/2)(I_N + W) x^{r−1}.

In particular, for smooth problems

    x^{r+1} = W x^r − Υ^{−1} ∇g(x^r) + Υ^{−1} ∇g(x^{r−1}) + x^r − (1/2)(I_N + W) x^{r−1}.
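A minimal sketch of this smooth-case recursion on an assumed scalar quadratic example, g_i(y) = (1/2)(y − a_i)² so that P_i = 1. The values of ρ_ij and ω_i and the EXTRA-style first step are assumptions, chosen so that the sufficient condition ω_i > P_i from the convergence analysis holds.

```python
import numpy as np

# Smooth-case PGC recursion (assumed toy instance on a 5-node ring).
N = 5
rng = np.random.default_rng(0)
a = rng.normal(size=N)
grad = lambda x: x - a                       # grad g_i(x_i) = x_i - a_i, so P_i = 1

omega, rho = 2.0, 0.25                       # omega_i > P_i; one rho for every edge
beta = omega + 2 * (rho + rho)               # two neighbors per node -> beta_i = 3
Ups_inv = np.eye(N) / beta                   # Upsilon^{-1}

W = np.zeros((N, N))                         # row-stochastic weights from (omega, rho)
for i in range(N):
    W[i, i] = omega / beta
    W[i, (i - 1) % N] = W[i, (i + 1) % N] = 2 * rho / beta
W_avg = 0.5 * (np.eye(N) + W)                # (I_N + W)/2

x_prev = np.zeros(N)
x = W @ x_prev - Ups_inv @ grad(x_prev)      # assumed EXTRA-style first step
for _ in range(2000):
    # x^{r+1} = W x^r - Ups^{-1} grad g(x^r) + Ups^{-1} grad g(x^{r-1})
    #           + x^r - (1/2)(I_N + W) x^{r-1}
    x_next = W @ x - Ups_inv @ (grad(x) - grad(x_prev)) + x - W_avg @ x_prev
    x_prev, x = x, x_next

print("max deviation from the network average:", np.max(np.abs(x - a.mean())))
```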
The Proposed Algorithm
Convergence Analysis
Convergence Analysis
We analyze the (rate of) convergence of the proposed methods.
Define a matrix of Lipschitz constants

    P̃ = diag([P_1, · · · , P_N]).

Convergence rate is measured by [Gao et al 14, Ouyang et al 14] the objective gap

    | f(x̄^r) − f(x^*) |

and the consensus gap

    ‖A x̄^r + B z̄^r‖.
The Proposed Algorithm
Convergence Analysis
Convergence Analysis
Table: Main Convergence Results.

Algorithm   Network Type   Gradient Type   Convergence Condition   Convergence Rate
PGC         Static         Exact           ΥW + Υ ≻ 2 P̃            O(1/r)
SPGC        Static         Stochastic      ΥW + Υ ≻ 2 P̃            O(1/√r)
DySPGC      Random         Exact           Ω ≻ P̃                   O(1/r)
DySPGC      Random         Stochastic      Ω ≻ P̃                   O(1/√r)

Note: For the exact gradient case, the stepsize β can be halved if only convergence
(rather than the rate) is needed.
Connection to Existing Methods
Comparison with Different Algorithms
Algorithm            Connection with DySPGC   Special Setting
EXTRA [Shi 14]       Special case             Static, h ≡ 0, W = W^T, G̃ = ∇g
DSG [Nedić 09]       Different x-step         Static, g smooth, G̃ = ∇g
IC-ADMM [Chang 14]   Special case             Static, G̃ = ∇g, g composite
DLM [Ling 15]        Special case             Static, G̃ = ∇g, h ≡ 0, β_ij = β, ρ_ij = ρ
PG-EXTRA [Shi 15]    Special case             Static, W = W^T, G̃ = ∇g
Connection to Existing Methods
Comparison with Different Algorithms
Figure: Relationship among different algorithms
Connection to Existing Methods
The EXTRA Related Algorithms
The EXTRA-related algorithms (for both smooth and nonsmooth cases)
[Shi et al 14, 15] are special cases of DySPGC with:
1. Symmetric weight matrix W = W^T
2. Exact gradient
3. Scalar stepsize
4. Static graph
Connection to Existing Methods
The DSG Method
Replacing our x-update by (setting the dual variable λ^r = 0)

    x^{r+1} = argmin_x  ⟨∇g(x^r), x − x^r⟩ + ⟨0, Ax + Bz^r⟩
              + (1/2) ‖Ax + Bz^r‖²_Γ + (1/2) ‖x − x^r‖²_Ω

and letting β_i = β_j = β, the PGC algorithm becomes

    x^{r+1} = −(1/β) ∇g(x^r) + W̃ x^r,   with W̃ = (1/2)(I + W).

This is precisely the DSG iteration.
Convergence in this case is not covered by our results.
Numerical Results
Numerical Results
Some preliminary numerical results, obtained by solving a LASSO problem

    min_x  (1/2) ∑_{i=1}^N ‖A_i x − b_i‖² + ν ‖x‖_1,

where A_i ∈ ℝ^{K×M}, b_i ∈ ℝ^K.

The parameters: N = 16, M = 100, ν = 0.1, K = 200
Data matrices randomly generated
Static graphs, generated according to the method proposed in [Yildiz-Scaglione 08], with a radius parameter set to 0.4.
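A sketch of how a synthetic instance with the stated dimensions might be generated; the sparsity of the ground truth and the noise level are assumptions, since the slides only say the data are randomly generated.

```python
import numpy as np

# Synthetic LASSO instance with the stated sizes: N = 16 agents, K = 200 rows
# per agent, M = 100 variables, nu = 0.1.  Sparsity/noise are assumed.
rng = np.random.default_rng(0)
N, K, M, nu = 16, 200, 100, 0.1

x_true = np.zeros(M)
support = rng.choice(M, size=10, replace=False)
x_true[support] = rng.normal(size=10)               # assumed sparse ground truth

A = [rng.normal(size=(K, M)) / np.sqrt(K) for _ in range(N)]
b = [A_i @ x_true + 0.01 * rng.normal(size=K) for A_i in A]

def lasso_objective(x):
    # (1/2) * sum_i ||A_i x - b_i||^2 + nu * ||x||_1
    return 0.5 * sum(np.linalg.norm(A_i @ x - b_i) ** 2
                     for A_i, b_i in zip(A, b)) + nu * np.linalg.norm(x, 1)

print(lasso_objective(np.zeros(M)), lasso_objective(x_true))
```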
Numerical Results
Comparison between PG-EXTRA and PGC
Stepsize of PG-EXTRA chosen according to the conditions given in [Shi 14]
W is the Metropolis constant edge weight matrix
PGC: ω_i = P_i/2, ρ_ij = 10^{−3}
Figure: Comparison between PG-EXTRA and PGC
Numerical Results
Comparison between DSG and Stochastic PGC
Stepsize of DSG chosen as a small constant
σ² = 0.1
W is the Metropolis constant edge weight matrix
SPGC: ω_i = P_i, ρ_ij = 10^{−3}
Figure: Comparison between DSG and SPGC
Concluding Remarks
Summary
We developed the DySPGC algorithm for multi-agent optimization.
It can deal with:
1. Stochastic gradients
2. Time-varying networks
3. Nonsmooth composite objectives
Convergence rate guarantees are provided for various scenarios.
Concluding Remarks
Future Work/Generalization
We identified the relation between DSG-type and ADMM-type methods.
This allows for significant generalizations:
1. Acceleration [Ouyang et al 15]
2. Variance reduction for the local problem when f_i is a finite sum, f_i(x_i) = ∑_{j=1}^M ℓ_j(x_i)
3. Inexact x-subproblems (using, e.g., conditional gradient)
4. Nonconvex problems [H.-Luo-Razaviyayn 14]
5. ...
Concluding Remarks
Thank You!
Concluding Remarks
Parameter Selection
It is easy to pick the various parameters in different scenarios.
Case A: The weight matrix W is given and symmetric
1. We must have β_i = β_j = β
2. For any fixed β, we can compute (Ω, {ρ_ij})
3. Increase β to satisfy the convergence condition
Case B: The user has the freedom to pick ({ρ_ij}, Ω)
1. For any set ({ρ_ij}, Ω), we can compute W and {β_i}
2. Increase Ω to satisfy the convergence condition
In either case, the convergence condition can be verified by the local agents.
Concluding Remarks
Case 1: Exact Gradient with Static Graph
Convergence for PGC Algorithm
Suppose that problem (Q) has a nonempty set of optimal solutions X* ≠ ∅.
Suppose G^r = G for all r and G is connected. Then the PGC converges to a
primal-dual optimal solution if

    2Ω + M_+ Ξ M_+^T = ΥW + Υ ≻ P̃,

where M_+ Ξ M_+^T is some matrix related to the network topology.

A sufficient condition is Ω ≻ P̃, or ω_i > P_i for all i ∈ V; this can be determined locally.
Concluding Remarks
Case 2: Stochastic Gradient with Static Graph
Convergence for SPGC Algorithm
Assume that dom(h) is a bounded set. Suppose that the following conditions hold:

    η^{r+1} = √(r + 1),    ∀ r,

and the stepsize matrix satisfies

    2Ω + M_+ Ξ M_+^T = ΥW + Υ ≻ 2 P̃.        (8)

Then at a given iteration r, we have

    E[ f(x̄^r) − f(x^*) ] + ρ ‖A x̄^r + B z̄^r‖
        ≤ σ²/√r + d_x²/(2√r) + (1/(2r)) ( d_z² + d_λ²(ρ) + max_i ω_i d_x² ),

where d_λ(ρ) > 0, d_x > 0, d_z > 0 are some problem-dependent constants.
Concluding Remarks
Case 2: Stochastic Gradient with Static Graph (cont.)
Both the objective value and the constraint violation converge with rate O(1/√r)
Easy to extend to the exact gradient case, with rate O(1/r)
Requires a larger proximal parameter Ω than in Case 1
Concluding Remarks
Case 3: Exact Gradient with Time-Varying Graph
Convergence for DySPGC Algorithm
Suppose that problem (Q) has a nonempty set of optimal solutions X* ≠ ∅, and that
G̃(x^r, ξ^{r+1}) = ∇g(x^r) for all r. Suppose the graph is randomly generated.
If we choose the stepsize

    Ω ≻ (1/2) P̃,

then (x^r, z^r, λ^r) converges w.p.1 to a primal-dual optimal solution.

1. The stepsize condition is more restrictive than in Case 1 (but it does not depend on the graph)
2. Convergence is in the sense of with probability 1
Concluding Remarks
Case 4: Stochastic Gradient with Time-Varying Graph
Convergence for DySPGC Algorithm
Suppose {w^t} = {x^t, z^t, λ^t} is a sequence generated by DySPGC, and that

    η^{r+1} = √(r + 1),   ∀ r,    and    Ω ≻ P̃.

Then we have

    E[ f(x̄^r) − f(x^*) + ρ ‖A x̄^r + B z̄^r‖ ]
        ≤ σ²/√r + d_x²/(2√r) + (1/(2r)) ( 2 d_J² + d_z² + d_λ²(ρ) + max_i ω_i d_x² ),

where d_λ(ρ), d_J, d_x, d_z are some positive constants.