Norm-regularized empirical risk minimization Sara van de Geer ETH Z¨ urich

advertisement
Norm-regularized empirical risk minimization
Sara van de Geer
ETH Zürich
The usefulness of `1 -norm regularization in high-dimensional problems is nowadays well recognized. A fundamental property of the `1 -norm that allows for
adaptive estimation and oracle results is its decomposability. The `1 -norm of a
vector β ∈ Rp is
p
X
kβk1 =
|βj |.
j=1
With decomposability of k · k1 we mean that for any set S ⊂ {1, . . . , p},
kβk1 = kβS k1 + kβ−S k1 ,
where βS = {βj : j ∈ S} and β−S := {βj : j ∈
/ S}. In this talk, we review
results for alternative norms Ω on Rp . Fix some β ∈ Rp . We call Ω weakly
decomposable at β if
Ω ≥ Ω+ + Ω− ,
where Ω+ and Ω− are semi-norms on Rp , and where Ω+ (β) = Ω(β) and Ω− (β) =
0. We will show sharp oracle results - depending on the (approximate) weak
decomposability of Ω at certain “oracle” values β - for empirical risk minimizers
with regularization penalty proportional to Ω. We also present an approach
based on the so-called triangle property. We say that Ω has the triangle property
at β if there exists semi-norms Ω+ and Ω− such that for all β 0
max z T (β 0 − β) ≥ Ω− (β 0 ) − Ω+ (β 0 − β).
z∈∂Ω(β)
Here, ∂Ω(β) is the sub-differential of Ω at β. Several examples with various
loss functions (least squares, minus log-likelihood) and penalties (wedge penalty,
nuclear norm penalty) will illustrate the theory.
1
Download