STAT 200C: High-dimensional Statistics
Arash A. Amini
April 14, 2021
1 / 29
Linear regression setup
• The data is $(y, X)$, where $y \in \mathbb{R}^n$ and $X \in \mathbb{R}^{n \times d}$, and the model is
  $$y = X\theta^* + w.$$
• $\theta^* \in \mathbb{R}^d$ is an unknown parameter.
• $w \in \mathbb{R}^n$ is the vector of noise variables.
• Equivalently,
  $$y_i = \langle \theta^*, x_i \rangle + w_i, \qquad i = 1, \dots, n,$$
  where $x_i \in \mathbb{R}^d$ is the $i$th row of $X$:
  $$X = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{bmatrix} \in \mathbb{R}^{n \times d}.$$
• Recall $\langle \theta^*, x_i \rangle = \sum_{j=1}^d \theta^*_j x_{ij}$.
• We are interested in the case where $n \ll d$.
2 / 29
Sparsity models
• When $n < d$, there is no hope of estimating $\theta^*$ (the model is unidentifiable),
• unless we impose some sort of low-dimensional model on $\theta^*$.
• Support of $\theta^*$ (recall $[d] = \{1, \dots, d\}$):
  $$\mathrm{supp}(\theta^*) := S(\theta^*) = \big\{ j \in [d] : \theta^*_j \neq 0 \big\}.$$
• Hard sparsity assumption: $s = |S(\theta^*)| \ll d$.
• Weaker sparsity assumption via $\ell_q$ balls for $q \in [0, 1]$:
  $$B_q(R_q) = \Big\{ \theta \in \mathbb{R}^d : \sum_{j=1}^d |\theta_j|^q \le R_q \Big\}.$$
• $q = 1$ gives the $\ell_1$ ball.
• $q = 0$ gives the $\ell_0$ ball, same as hard sparsity:
  $$\|\theta^*\|_0 := |S(\theta^*)| = \#\big\{ j : \theta^*_j \neq 0 \big\}.$$
  (Note: $\|\cdot\|_0$ is not a norm, though it is subadditive.)
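As a quick illustration, here is a minimal numpy sketch of these sparsity measures; the helper name `lq_cost` is my own, not from the text:

```python
import numpy as np

def lq_cost(theta, q):
    """Return sum_j |theta_j|^q, the quantity bounded by R_q in B_q(R_q)."""
    if q == 0:
        return np.count_nonzero(theta)       # l0 "norm" = support size
    return np.sum(np.abs(theta) ** q)

theta = np.array([0.9, 0.0, -0.4, 0.0, 0.01])
print(lq_cost(theta, 0))    # hard sparsity: |S(theta)| = 3
print(lq_cost(theta, 0.5))  # intermediate l_q sparsity measure
print(lq_cost(theta, 1))    # l1 norm = 1.31
```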
3 / 29
[Figure: unit $\ell_q$ balls in $\mathbb{R}^2$ for $q = 1$, $q = 0.75$, and $q \to 0$ (from HDS book).]
4 / 29
Basis pursuit
• Consider the noiseless case $y = X\theta^*$.
• We assume that $\|\theta^*\|_0$ is small.
• Ideal program to solve:
  $$\min_{\theta \in \mathbb{R}^d} \|\theta\|_0 \quad \text{subject to} \quad y = X\theta.$$
• $\|\cdot\|_0$ is highly non-convex; relax to $\|\cdot\|_1$:
  $$\min_{\theta \in \mathbb{R}^d} \|\theta\|_1 \quad \text{subject to} \quad y = X\theta. \tag{1}$$
  This is called basis pursuit (regression).
• (1) is a convex program.
• In fact, it can be written as a linear program¹.
• Global solutions can be obtained very efficiently.
¹Exercise: Introduce auxiliary variables $s_j \in \mathbb{R}$ and note that minimizing $\sum_j s_j$ subject to $|\theta_j| \le s_j$ gives the $\ell_1$ norm of $\theta$.
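A sketch of the LP reformulation from the footnote, using `scipy.optimize.linprog`; the problem sizes are arbitrary, and with a Gaussian design and $n$ of order $s \log(ed/s)$ recovery typically succeeds, though it is not guaranteed for any fixed draw:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, d, s = 20, 50, 3
X = rng.standard_normal((n, d)) / np.sqrt(n)
theta_star = np.zeros(d)
theta_star[:s] = [1.0, -2.0, 0.5]
y = X @ theta_star                       # noiseless observations

# Variables (theta, t) in R^{2d}; minimize sum_j t_j
# subject to theta_j - t_j <= 0, -theta_j - t_j <= 0, and X theta = y.
c = np.concatenate([np.zeros(d), np.ones(d)])
I = np.eye(d)
A_ub = np.block([[I, -I], [-I, -I]])
b_ub = np.zeros(2 * d)
A_eq = np.hstack([X, np.zeros((n, d))])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
              bounds=[(None, None)] * d + [(0, None)] * d)
theta_hat = res.x[:d]
print(np.max(np.abs(theta_hat - theta_star)))  # ~0: exact recovery
```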
5 / 29
Restricted null space property (RNS)
• Define
  $$\mathbb{C}(S) = \big\{ \Delta \in \mathbb{R}^d : \|\Delta_{S^c}\|_1 \le \|\Delta_S\|_1 \big\}. \tag{2}$$
Theorem 1
The following two are equivalent:
• For any $\theta^* \in \mathbb{R}^d$ with support contained in $S$, the basis pursuit program (1) applied to the data $(y = X\theta^*, X)$ has unique solution $\hat\theta = \theta^*$.
• The restricted null space (RNS) property holds, i.e.,
  $$\mathbb{C}(S) \cap \ker(X) = \{0\}. \tag{3}$$
6 / 29
Proof
• Consider the tangent cone to the $\ell_1$ ball (of radius $\|\theta^*\|_1$) at $\theta^*$:
  $$\mathbb{T}(\theta^*) = \big\{ \Delta \in \mathbb{R}^d : \|\theta^* + t\Delta\|_1 \le \|\theta^*\|_1 \text{ for some } t > 0 \big\},$$
  i.e., the set of descent directions for the $\ell_1$ norm at the point $\theta^*$.
• The feasible set is $\theta^* + \ker(X)$, i.e., $\ker(X)$ is the set of feasible directions $\Delta = \theta - \theta^*$.
• Hence, there is a minimizer other than $\theta^*$ if and only if
  $$\mathbb{T}(\theta^*) \cap \ker(X) \neq \{0\}. \tag{4}$$
• It is enough to show that
  $$\mathbb{C}(S) = \bigcup_{\theta^* \in \mathbb{R}^d :\, \mathrm{supp}(\theta^*) \subseteq S} \mathbb{T}(\theta^*).$$
7 / 29
[Figure: $d = 2$, $[d] = \{1, 2\}$, $S = \{2\}$, $\theta^*_{(1)} = (0, 1)$, $\theta^*_{(2)} = (0, -1)$. Shown are the $\ell_1$ ball $B_1$, the tangent cones $\mathbb{T}(\theta^*_{(1)})$ and $\mathbb{T}(\theta^*_{(2)})$, $\ker(X)$, and the cone $\mathbb{C}(S) = \{ (\Delta_1, \Delta_2) : |\Delta_1| \le |\Delta_2| \}$.]
8 / 29
• It is enough to show that
  $$\mathbb{C}(S) = \bigcup_{\theta^* \in \mathbb{R}^d :\, \mathrm{supp}(\theta^*) \subseteq S} \mathbb{T}(\theta^*). \tag{5}$$
• We have $\Delta \in \mathbb{T}_1(\theta^*)$ iff²
  $$\|\theta^*_S + \Delta_S\|_1 + \|\Delta_{S^c}\|_1 \le \|\theta^*_S\|_1$$
  (using the decomposability of $\ell_1$ over disjoint supports), i.e., iff
  $$\|\Delta_{S^c}\|_1 \le \|\theta^*_S\|_1 - \|\theta^*_S + \Delta_S\|_1.$$
• Hence $\Delta \in \mathbb{T}_1(\theta^*)$ for some $\theta^* \in \mathbb{R}^d$ with $\mathrm{supp}(\theta^*) \subseteq S$ iff
  $$\|\Delta_{S^c}\|_1 \le \sup_{\theta^*_S \in \mathbb{R}^s} \big[ \|\theta^*_S\|_1 - \|\theta^*_S + \Delta_S\|_1 \big] = \|\Delta_S\|_1,$$
  where the supremum equals $\|\Delta_S\|_1$: it is attained at $\theta^*_S = -\Delta_S$, and the triangle inequality gives the matching upper bound.

² Let $\mathbb{T}_1(\theta^*)$ be the subset of $\mathbb{T}(\theta^*)$ where $t = 1$, and argue that w.l.o.g. we can work with this subset.
9 / 29
Sufficient conditions for restricted nullspace
• $[d] := \{1, \dots, d\}$.
• For a matrix $X \in \mathbb{R}^{n \times d}$, let $X_j$ be its $j$th column (for $j \in [d]$).
• The pairwise incoherence of $X$ is defined as
  $$\delta_{\mathrm{PW}}(X) := \max_{i, j \in [d]} \Big| \frac{\langle X_i, X_j \rangle}{n} - \mathbf{1}\{i = j\} \Big|.$$
• Alternative form: $X^T X$ is the Gram matrix of $X$, with $(X^T X)_{ij} = \langle X_i, X_j \rangle$, and
  $$\delta_{\mathrm{PW}}(X) = \Big\| \frac{X^T X}{n} - I_d \Big\|_\infty,$$
  where $\|\cdot\|_\infty$ is the elementwise (vector) $\ell_\infty$ norm of the matrix.
Proposition 1 (HDS Prop. 7.1)
(Uniform) restricted nullspace holds for all $S$ with $|S| \le s$ if
$$\delta_{\mathrm{PW}}(X) \le \frac{1}{3s}.$$
• Proof: Exercise 7.3.
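A small numpy check of the pairwise incoherence (illustrative sizes; for an i.i.d. standard Gaussian design, $\delta_{\mathrm{PW}}$ is of order $\sqrt{\log d / n}$, so the condition of Proposition 1 typically holds at these sizes):

```python
import numpy as np

def delta_pw(X):
    """Pairwise incoherence: max_{i,j} |(X^T X / n - I)_{ij}|."""
    n, d = X.shape
    return np.max(np.abs(X.T @ X / n - np.eye(d)))

rng = np.random.default_rng(1)
n, d, s = 2000, 30, 3
X = rng.standard_normal((n, d))
print(delta_pw(X), "vs threshold", 1 / (3 * s))  # Prop. 1 needs <= 1/(3s)
```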
10 / 29
Restricted isometry property (RIP)
• A more relaxed condition:

Definition 1 (RIP)
$X \in \mathbb{R}^{n \times d}$ satisfies a restricted isometry property (RIP) of order $s$ with constant $\delta_s(X) > 0$ if
$$\Big|\Big|\Big|\frac{X_S^T X_S}{n} - I\Big|\Big|\Big|_{\mathrm{op}} \le \delta_s(X), \quad \text{for all } S \text{ with } |S| \le s.$$

• PW incoherence is close to RIP with $s = 2$;
• for example, when $\|X_j / \sqrt{n}\|_2 = 1$ for all $j$, we have $\delta_2(X) = \delta_{\mathrm{PW}}(X)$.
• In general, for any $s \ge 2$ (Exercise 7.4),
  $$\delta_{\mathrm{PW}}(X) \le \delta_s(X) \le s\, \delta_{\mathrm{PW}}(X).$$
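Computing $\delta_s(X)$ exactly requires a search over all supports, which is feasible only for tiny $d$; the brute-force sketch below also checks the sandwich bound of Exercise 7.4 numerically:

```python
import numpy as np
from itertools import combinations

def delta_s(X, s):
    """RIP constant of order s via exhaustive search (tiny d only).
    The max over |S| <= s is attained at |S| = s, since the operator norm
    of a principal submatrix is dominated by that of the full matrix."""
    n, d = X.shape
    best = 0.0
    for S in combinations(range(d), s):
        M = X[:, S].T @ X[:, S] / n - np.eye(s)
        best = max(best, np.linalg.norm(M, 2))   # spectral (operator) norm
    return best

rng = np.random.default_rng(2)
n, d = 500, 12
X = rng.standard_normal((n, d))
dpw = np.max(np.abs(X.T @ X / n - np.eye(d)))
print(dpw, delta_s(X, 2), 2 * dpw)   # delta_pw <= delta_2 <= 2 * delta_pw
```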
11 / 29
Definition (RIP)
$X \in \mathbb{R}^{n \times d}$ satisfies a restricted isometry property (RIP) of order $s$ with constant $\delta_s(X) > 0$ if
$$\Big|\Big|\Big|\frac{X_S^T X_S}{n} - I\Big|\Big|\Big|_{\mathrm{op}} \le \delta_s(X), \quad \text{for all } S \text{ with } |S| \le s.$$

• Let $x_i^T$ be the $i$th row of $X$. Consider the sample covariance matrix:
  $$\hat\Sigma := \frac{1}{n} X^T X = \frac{1}{n} \sum_{i=1}^n x_i x_i^T \in \mathbb{R}^{d \times d}.$$
• Then $\hat\Sigma_{SS} = \frac{1}{n} X_S^T X_S$; hence, RIP is
  $$\big|\big|\big|\hat\Sigma_{SS} - I\big|\big|\big|_{\mathrm{op}} \le \delta_s < 1,$$
  i.e., $\hat\Sigma_{SS} \approx I_s$. More precisely,
  $$(1 - \delta_s)\|u\|_2 \le \|\hat\Sigma_{SS}\, u\|_2 \le (1 + \delta_s)\|u\|_2, \qquad \forall u \in \mathbb{R}^s.$$
12 / 29
• RIP gives sufficient conditions:

Proposition 2 (HDS Prop. 7.2)
(Uniform) restricted null space holds for all $S$ with $|S| \le s$ if
$$\delta_{2s}(X) \le \frac{1}{3}.$$

• Consider a sub-Gaussian matrix $X$ with i.i.d. entries, $\mathbb{E} X_{ij} = 0$ and $\mathbb{E} X_{ij}^2 = 1$ (Exercise 7.7). We have
  $$n \gtrsim s^2 \log d \implies \delta_{\mathrm{PW}}(X) < \frac{1}{3s} \quad \text{w.h.p.},$$
  $$n \gtrsim s \log\frac{ed}{s} \implies \delta_{2s}(X) < \frac{1}{3} \quad \text{w.h.p.}$$
• Sample complexity requirement for RIP is milder.
13 / 29
Neither RIP nor PWI is necessary
• For more general covariance $\Sigma$, it is harder to satisfy either PWI or RIP.
• Consider $X \in \mathbb{R}^{n \times d}$ with i.i.d. rows $X_i \sim N(0, \Sigma)$.
• Letting $\mathbf{1} \in \mathbb{R}^d$ be the all-ones vector, take
  $$\Sigma := (1 - \mu) I_d + \mu \mathbf{1}\mathbf{1}^T$$
  for $\mu \in [0, 1)$. (A spiked covariance matrix.)
• We have
  $$\lambda_{\max}(\Sigma_{SS}) = 1 + \mu(s - 1) \to \infty \quad \text{as } s \to \infty.$$
• Exercise 7.8:
  (a) PW is violated w.h.p. unless $\mu \ll 1/s$.
  (b) RIP is violated w.h.p. unless $\mu \ll 1/\sqrt{s}$.
  (In fact, $\delta_{2s}$ grows like $\mu s$ for any fixed $\mu \in (0, 1)$.)
• However, for any $\mu \in [0, 1)$, basis pursuit succeeds w.h.p. if
  $$n \gtrsim s \log\frac{ed}{s}.$$
  (A later result shows this.)
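A quick numerical confirmation that $\lambda_{\max}(\Sigma_{SS}) = 1 + \mu(s - 1)$ grows linearly in $s$ for the spiked model, which is what dooms RIP for any fixed $\mu$:

```python
import numpy as np

mu = 0.2
for s in [10, 50, 200]:
    # Sigma_SS = (1 - mu) I_s + mu 1 1^T has top eigenvalue 1 + mu (s - 1)
    Sigma_SS = (1 - mu) * np.eye(s) + mu * np.ones((s, s))
    print(s, np.linalg.eigvalsh(Sigma_SS)[-1], 1 + mu * (s - 1))
```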
14 / 29
Noisy sparse regression
• Model: $y = X\theta^* + w$, with $y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times d}$, and noise $w \in \mathbb{R}^n$.
• A very popular estimator is the $\ell_1$-regularized least-squares (the Lasso):
  $$\hat\theta \in \operatorname*{argmin}_{\theta \in \mathbb{R}^d} \Big[ \frac{1}{2n} \|y - X\theta\|_2^2 + \lambda \|\theta\|_1 \Big]. \tag{6}$$
• The idea: minimizing the $\ell_1$ norm leads to sparse solutions.
• (6) is a convex program; a global solution can be obtained efficiently (e.g., via glmnet; see also the sketch below).
• Other options: the constrained form of the Lasso,
  $$\min_{\|\theta\|_1 \le R}\; \frac{1}{2n} \|y - X\theta\|_2^2. \tag{7}$$
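For computation, scikit-learn's `Lasso` minimizes exactly the objective in (6), with its `alpha` playing the role of $\lambda$. A small simulated example; the sizes and the choice of $\lambda$ are only indicative of the theory's scaling:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
n, d, s, sigma = 200, 500, 5, 0.5
X = rng.standard_normal((n, d))
theta_star = np.zeros(d)
theta_star[:s] = 1.0
y = X @ theta_star + sigma * rng.standard_normal(n)

lam = 2 * sigma * np.sqrt(2 * np.log(d) / n)    # order suggested by the theory
fit = Lasso(alpha=lam, fit_intercept=False).fit(X, y)
theta_hat = fit.coef_
print(np.linalg.norm(theta_hat - theta_star))   # l2 estimation error
print(np.count_nonzero(theta_hat))              # solution is sparse
```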
15 / 29
Restricted eigenvalue condition
• For a constant $\alpha \ge 1$, define
  $$\mathbb{C}_\alpha(S) := \big\{ \Delta \in \mathbb{R}^d : \|\Delta_{S^c}\|_1 \le \alpha \|\Delta_S\|_1 \big\}.$$
• A strengthening of RNS is:

Definition 2 (RE condition)
A matrix $X$ satisfies the restricted eigenvalue (RE) condition over $S$ with parameters $(\kappa, \alpha)$ if
$$\frac{1}{n}\|X\Delta\|_2^2 \ge \kappa \|\Delta\|_2^2 \quad \text{for all } \Delta \in \mathbb{C}_\alpha(S).$$

• (Recall $\lambda_{\min}(A) = \min_{\|u\|_2 = 1} u^T A u$; with $\hat\Sigma = X^T X / n$, the RE condition is a restricted version of $\lambda_{\min}(\hat\Sigma) > 0$, with the minimum taken only over the cone $\mathbb{C}_\alpha(S)$.)
• RNS corresponds to
  $$\frac{1}{n}\|X\Delta\|_2^2 > 0 \quad \text{for all } \Delta \in \mathbb{C}_1(S) \setminus \{0\}.$$
16 / 29
Deviation bounds under RE

Theorem 2 (Deterministic deviations)
Assume that $y = X\theta^* + w$, where $X \in \mathbb{R}^{n \times d}$ and $\theta^* \in \mathbb{R}^d$, and
• $\theta^*$ is supported on $S \subseteq [d]$ with $|S| \le s$;
• $X$ satisfies RE$(\kappa, 3)$ over $S$.
Let us define $z = X^T w / n$. Then, we have the following:
(a) Any solution of the Lasso (6) with $\lambda \ge 2\|z\|_\infty$ satisfies
  $$\|\hat\theta - \theta^*\|_2 \le \frac{3}{\kappa}\sqrt{s}\,\lambda.$$
(b) Any solution of the constrained Lasso (7) with $R = \|\theta^*\|_1$ satisfies
  $$\|\hat\theta - \theta^*\|_2 \le \frac{4}{\kappa}\sqrt{s}\,\|z\|_\infty.$$
17 / 29
Example (fixed design regression)
• Assume $y = X\theta^* + w$ where $w \sim N(0, \sigma^2 I_n)$, and
• $X \in \mathbb{R}^{n \times d}$ is fixed, satisfying the RE condition and the normalization
  $$\max_{j = 1, \dots, d} \frac{\|X_j\|_2}{\sqrt{n}} \le C,$$
  where $X_j$ is the $j$th column of $X$.
• Recall $z = X^T w / n$.
• It is easy to show that w.p. $\ge 1 - 2e^{-n\gamma^2/2}$,
  $$\|z\|_\infty \le C\sigma\Big( \sqrt{\frac{2\log d}{n}} + \gamma \Big).$$
• Thus, setting $\lambda = 2C\sigma\big(\sqrt{2\log d/n} + \gamma\big)$, w.p. at least $1 - 2e^{-n\gamma^2/2}$, the Lasso solution satisfies
  $$\|\hat\theta - \theta^*\|_2 \le \frac{6C\sigma}{\kappa}\sqrt{s}\Big( \sqrt{\frac{2\log d}{n}} + \gamma \Big).$$
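A Monte Carlo sanity check of the bound on $\|z\|_\infty$ (columns normalized so that $C = 1$; the empirical frequency should exceed the guaranteed $1 - 2e^{-n\gamma^2/2}$):

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, sigma, gamma = 400, 1000, 1.0, 0.1
X = rng.standard_normal((n, d))
X *= np.sqrt(n) / np.linalg.norm(X, axis=0)   # normalize: ||X_j||/sqrt(n) = 1

bound = sigma * (np.sqrt(2 * np.log(d) / n) + gamma)
reps, hits = 200, 0
for _ in range(reps):
    w = sigma * rng.standard_normal(n)
    z = X.T @ w / n
    hits += np.max(np.abs(z)) <= bound
print(hits / reps, "vs guaranteed >=", 1 - 2 * np.exp(-n * gamma**2 / 2))
```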
18 / 29
Basic inequality
• [Whiteboard sketch: expand the loss at $\hat\theta = \theta^* + \hat\Delta$, apply the basic inequality, Hölder's inequality, the decomposability of $\ell_1$ over $S$ and $S^c$, and the RE condition; the formal version is the proof of Theorem 2 below.]
• Taking $\lambda = \sqrt{2\log d / n}$ (with $\sigma = 1$ for simplicity), we have
  $$\|\hat\theta - \theta^*\|_2 \lesssim \sqrt{\frac{s \log d}{n}} \quad \text{w.h.p.},$$
  giving consistency if $s \log d / n \to 0$.
• This is the typical high-dimensional scaling in sparse problems.
• Had we known the support $S$ in advance, our rate would be (w.h.p.)
  $$\|\hat\theta - \theta^*\|_2 \lesssim \sqrt{\frac{s}{n}}.$$
• The $\log d$ factor is the price for not knowing the support;
• roughly, the price for searching over the $\binom{d}{s} \le d^s$ collection of candidate supports.
19 / 29
Proof of Theorem 2
• Let us simplify the loss $L(\theta) := \frac{1}{2n}\|X\theta - y\|^2$.
• Setting $\Delta = \theta - \theta^*$,
  $$\begin{aligned}
  L(\theta) &= \frac{1}{2n}\|X(\theta - \theta^*) - w\|^2 \\
  &= \frac{1}{2n}\|X\Delta - w\|^2 \\
  &= \frac{1}{2n}\|X\Delta\|^2 - \frac{1}{n}\langle X\Delta, w\rangle + \text{const.} \\
  &= \frac{1}{2n}\|X\Delta\|^2 - \frac{1}{n}\langle \Delta, X^T w\rangle + \text{const.} \\
  &= \frac{1}{2n}\|X\Delta\|^2 - \langle \Delta, z\rangle + \text{const.},
  \end{aligned}$$
  where $z = X^T w / n$. Hence,
  $$L(\theta) - L(\theta^*) = \frac{1}{2n}\|X\Delta\|^2 - \langle \Delta, z\rangle. \tag{8}$$
• Exercise: Show that (8) is the Taylor expansion of $L$ around $\theta^*$.
20 / 29
Proof (constrained version)
• Error vector $\hat\Delta := \hat\theta - \theta^*$. By optimality of $\hat\theta$ and feasibility of $\theta^*$, $L(\hat\theta) \le L(\theta^*)$, so $\hat\Delta$ satisfies the basic inequality
  $$\frac{1}{2n}\|X\hat\Delta\|_2^2 \le \langle z, \hat\Delta\rangle.$$
• Using Hölder's inequality,
  $$\frac{1}{2n}\|X\hat\Delta\|_2^2 \le \|z\|_\infty \|\hat\Delta\|_1.$$
• Since $\|\hat\theta\|_1 \le \|\theta^*\|_1$, we have $\hat\Delta = \hat\theta - \theta^* \in \mathbb{C}_1(S)$, hence
  $$\|\hat\Delta\|_1 = \|\hat\Delta_S\|_1 + \|\hat\Delta_{S^c}\|_1 \le 2\|\hat\Delta_S\|_1 \le 2\sqrt{s}\,\|\hat\Delta\|_2.$$
• Combined with the RE condition ($\hat\Delta \in \mathbb{C}_3(S)$ as well),
  $$\frac{\kappa}{2}\|\hat\Delta\|_2^2 \le 2\sqrt{s}\,\|z\|_\infty\|\hat\Delta\|_2,$$
  which gives the desired result.
21 / 29
Proof (Lagrangian version)
• Let $L_\lambda(\theta) := L(\theta) + \lambda\|\theta\|_1$ be the regularized loss.
• The basic inequality is
  $$L(\hat\theta) + \lambda\|\hat\theta\|_1 \le L(\theta^*) + \lambda\|\theta^*\|_1.$$
• Rearranging,
  $$\frac{1}{2n}\|X\hat\Delta\|_2^2 \le \langle z, \hat\Delta\rangle + \lambda\big(\|\theta^*\|_1 - \|\hat\theta\|_1\big).$$
• We have
  $$\|\theta^*\|_1 - \|\hat\theta\|_1 = \|\theta^*_S\|_1 - \|\theta^*_S + \hat\Delta_S\|_1 - \|\hat\Delta_{S^c}\|_1 \le \|\hat\Delta_S\|_1 - \|\hat\Delta_{S^c}\|_1.$$
• Since $\lambda \ge 2\|z\|_\infty$,
  $$\frac{1}{n}\|X\hat\Delta\|_2^2 \le \lambda\|\hat\Delta\|_1 + 2\lambda\big(\|\hat\Delta_S\|_1 - \|\hat\Delta_{S^c}\|_1\big) = \lambda\big(3\|\hat\Delta_S\|_1 - \|\hat\Delta_{S^c}\|_1\big).$$
• It follows that $\hat\Delta \in \mathbb{C}_3(S)$, and the rest of the proof follows.
22 / 29
RE condition for anisotropic design
• For a PSD matrix $\Sigma$, let $\rho^2(\Sigma) = \max_i \Sigma_{ii}$.

Theorem 3
Let $X \in \mathbb{R}^{n \times d}$ with rows i.i.d. from $N(0, \Sigma)$. Then, there exist universal constants $c_1 < 1 < c_2$ such that
$$\frac{\|X\theta\|_2^2}{n} \ge c_1 \|\sqrt{\Sigma}\,\theta\|_2^2 - c_2\, \rho^2(\Sigma)\, \frac{\log d}{n}\, \|\theta\|_1^2, \quad \text{for all } \theta \in \mathbb{R}^d, \tag{9}$$
with probability at least $1 - e^{-n/32}/(1 - e^{-n/32})$.

• Exercise 7.11: (9) implies the RE condition over $\mathbb{C}_3(S)$, uniformly over all subsets of cardinality
  $$|S| \le \frac{c_1\, \lambda_{\min}(\Sigma)}{32\, c_2\, \rho^2(\Sigma)} \cdot \frac{n}{\log d}$$
  (assuming the smallest eigenvalue $\lambda_{\min}(\Sigma) > 0$; the RE constant $\kappa$ depends on $\lambda_{\min}(\Sigma)$).
• In other words, $n \gtrsim s \log d \implies$ RE condition over $\mathbb{C}_3(S)$ for all $|S| \le s$.
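A crude Monte Carlo illustration of how the restricted eigenvalue behaves as $n$ grows, for the spiked covariance. Note that this probes random $s$-sparse directions only, a proxy for (not a certificate of) the infimum over the cone $\mathbb{C}_3(S)$:

```python
import numpy as np

rng = np.random.default_rng(5)
d, s = 200, 5
Sigma = 0.8 * np.eye(d) + 0.2 * np.ones((d, d))   # spiked model, mu = 0.2
L = np.linalg.cholesky(Sigma)

for n in [50, 200, 800]:                          # around s log d and beyond
    X = rng.standard_normal((n, d)) @ L.T         # rows ~ N(0, Sigma)
    ratios = []
    for _ in range(500):                          # random s-sparse directions
        theta = np.zeros(d)
        S = rng.choice(d, size=s, replace=False)
        theta[S] = rng.standard_normal(s)
        ratios.append((np.linalg.norm(X @ theta) ** 2 / n)
                      / (theta @ Sigma @ theta))  # vs population value
    print(n, min(ratios))  # bounded away from 0 once n >> s log d
```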
23 / 29
Proof sketch of Theorem 3 (whiteboard notes)
• By homogeneity, we look at all $\theta$ for which $\|\sqrt{\Sigma}\,\theta\|_2 = 1$, and lower-bound $\inf \|X\theta\|_2/\sqrt{n}$ over this set.
• $\theta \mapsto X\theta$ is a Gaussian process indexed by $\theta$; suprema (and infima) of Gaussian processes are widely studied, and the key tool here is a Gaussian comparison inequality.
• Gordon's inequality (HDS, Ch. 6): let $(Y_{u,v})$ and $(Z_{u,v})$ be zero-mean Gaussian processes indexed by pairs $(u, v)$ such that
  $$\mathbb{E}(Y_{u,v} - Y_{u',v'})^2 \le \mathbb{E}(Z_{u,v} - Z_{u',v'})^2 \quad \text{for all pairs},$$
  with equality whenever $u = u'$; then $\mathbb{E}\min_u \max_v Y_{u,v} \le \mathbb{E}\min_u \max_v Z_{u,v}$ (see HDS, Ch. 6 for the precise statement).
• Application to our problem: write $\|Xv\|_2 = \max_{\|u\|_2 = 1} \langle u, Xv\rangle$, and set
  $$Z_{u,v} = \langle u, Xv\rangle, \qquad Y_{u,v} = \|v\|_2 \langle g, u\rangle + \langle h, v\rangle,$$
  where $g \in \mathbb{R}^n$ and $h \in \mathbb{R}^d$ have i.i.d. $N(0, 1)$ entries.
• Covariance computation (for $X$ with i.i.d. $N(0,1)$ entries):
  $$\mathrm{Cov}(Z_{u,v}, Z_{u',v'}) = \langle u, u'\rangle \langle v, v'\rangle, \qquad \mathrm{Cov}(Y_{u,v}, Y_{u',v'}) = \|v\|_2\|v'\|_2 \langle u, u'\rangle + \langle v, v'\rangle,$$
  from which one verifies the conditions of Gordon's inequality.
• The comparison process is easy to analyze: $\max_{\|u\|_2 = 1} Y_{u,v} = \|v\|_2\|g\|_2 + \langle h, v\rangle$, with $\mathbb{E}\|g\|_2 \approx \sqrt{n}$, while $|\langle h, v\rangle| \le \|h\|_\infty \|v\|_1 \lesssim \sqrt{\log d}\,\|v\|_1$ w.h.p.; this term is the source of the $\|\theta\|_1^2$ slack in (9).
• Gaussian concentration (of Lipschitz functions) transfers the bound from expectation to the stated high-probability statement.
Comments
• Note that
  $$\mathbb{E}\,\frac{\|X\theta\|_2^2}{n} = \|\sqrt{\Sigma}\,\theta\|_2^2.$$
• Bound (9) says that $\|X\theta\|_2^2/n$ is lower-bounded by a multiple of its expectation, minus a slack $\propto \|\theta\|_1^2$.
• Why is the slack needed?
• In the high-dimensional setting, $\|X\theta\|_2 = 0$ for any $\theta \in \ker(X)$, while $\|\sqrt{\Sigma}\,\theta\|_2 > 0$ for any $\theta \neq 0$, assuming $\Sigma$ is non-singular.
• In fact, $\|\sqrt{\Sigma}\,\theta\|_2$ is uniformly bounded below in that case:
  $$\|\sqrt{\Sigma}\,\theta\|_2^2 \ge \lambda_{\min}(\Sigma)\,\|\theta\|_2^2,$$
  showing that the population-level version of $X/\sqrt{n}$, that is, $\sqrt{\Sigma}$, satisfies RE, and in fact a global eigenvalue condition.
• (9) gives a nontrivial/good lower bound for $\theta$ for which $\|\theta\|_1$ is small compared to $\|\theta\|_2$, i.e., sparse vectors.
• Recall that if $\theta$ is $s$-sparse, then $\|\theta\|_1 \le \sqrt{s}\,\|\theta\|_2$,
• while for general $\theta \in \mathbb{R}^d$, we have $\|\theta\|_1 \le \sqrt{d}\,\|\theta\|_2$.
24 / 29
Examples
• Toeplitz family: $\Sigma_{ij} = \nu^{|i-j|}$, for which
  $$\rho^2(\Sigma) = 1, \qquad \lambda_{\min}(\Sigma) \ge (1 - \nu)^2 > 0.$$
• Spiked model: $\Sigma := (1 - \mu) I_d + \mu \mathbf{1}\mathbf{1}^T$, for which
  $$\rho^2(\Sigma) = 1, \qquad \lambda_{\min}(\Sigma) = 1 - \mu.$$
• For future applications, note that (9) implies
  $$\frac{\|X\theta\|_2^2}{n} \ge \alpha_1 \|\theta\|_2^2 - \alpha_2 \|\theta\|_1^2, \qquad \forall \theta \in \mathbb{R}^d,$$
  where $\alpha_1 = c_1 \lambda_{\min}(\Sigma)$ and $\alpha_2 = c_2\, \rho^2(\Sigma)\, \frac{\log d}{n}$.
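Numerically checking $\rho^2(\Sigma)$ and $\lambda_{\min}(\Sigma)$ for the two example families (sizes and parameter values arbitrary):

```python
import numpy as np

d, nu, mu = 100, 0.5, 0.3
idx = np.arange(d)
Sigma_toep = nu ** np.abs(idx[:, None] - idx[None, :])  # Sigma_ij = nu^{|i-j|}
Sigma_spike = (1 - mu) * np.eye(d) + mu * np.ones((d, d))

for name, S in [("toeplitz", Sigma_toep), ("spiked", Sigma_spike)]:
    rho2 = np.max(np.diag(S))                 # rho^2(Sigma) = max_i Sigma_ii
    lam_min = np.linalg.eigvalsh(S)[0]        # smallest eigenvalue
    print(name, rho2, lam_min)
# toeplitz: rho2 = 1, lam_min >= (1 - nu)^2 = 0.25
# spiked:   rho2 = 1, lam_min  = 1 - mu      = 0.7
```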
25 / 29
Lasso oracle inequality
• For simplicity, let $\bar\kappa = \lambda_{\min}(\Sigma)$ and $\rho^2 = \rho^2(\Sigma) = \max_i \Sigma_{ii}$.

Theorem 4
Under condition (9), consider the Lagrangian Lasso with regularization parameter $\lambda \ge 2\|z\|_\infty$, where $z = X^T w / n$. For any $\theta^* \in \mathbb{R}^d$, any optimal solution $\hat\theta$ satisfies the bound
$$\|\hat\theta - \theta^*\|_2^2 \le \underbrace{\frac{144}{c_1^2 \bar\kappa^2}\, \lambda^2 |S|}_{\text{estimation error}} + \underbrace{\frac{16}{c_1 \bar\kappa}\, \lambda \|\theta^*_{S^c}\|_1 + \frac{32\, c_2 \rho^2 \log d}{c_1 \bar\kappa\, n}\, \|\theta^*_{S^c}\|_1^2}_{\text{approximation error}} \tag{10}$$
valid for any subset $S$ with cardinality
$$|S| \le \frac{c_1 \bar\kappa\, n}{64\, c_2 \rho^2 \log d}.$$
• If $\theta^*$ is $s$-sparse, take $S = \mathrm{supp}(\theta^*)$, so that the approximation error vanishes.
26 / 29
• Simplifying the bound:
  $$\|\hat\theta - \theta^*\|_2^2 \le \kappa_1 \lambda^2 |S| + \kappa_2 \lambda \|\theta^*_{S^c}\|_1 + \kappa_3 \frac{\log d}{n} \|\theta^*_{S^c}\|_1^2,$$
  where $\kappa_1, \kappa_2, \kappa_3$ are constants dependent on $\Sigma$.
• Assume $\sigma = 1$ (noise variance) for simplicity.
• Since $\|z\|_\infty \lesssim \sqrt{\log d / n}$ w.h.p., we can take $\lambda$ of this order:
  $$\|\hat\theta - \theta^*\|_2^2 \lesssim \frac{\log d}{n} |S| + \sqrt{\frac{\log d}{n}}\, \|\theta^*_{S^c}\|_1 + \frac{\log d}{n} \|\theta^*_{S^c}\|_1^2.$$
• Optimizing the bound:
  $$\|\hat\theta - \theta^*\|_2^2 \lesssim \inf_{|S| \lesssim n/\log d} \bigg[ \frac{\log d}{n} |S| + \sqrt{\frac{\log d}{n}}\, \|\theta^*_{S^c}\|_1 + \frac{\log d}{n} \|\theta^*_{S^c}\|_1^2 \bigg].$$
• An oracle that knows $\theta^*$ can choose the optimal $S$.
27 / 29
Example: $\ell_q$-ball sparsity
• Assume that $\theta^* \in B_q$, i.e., $\sum_{j=1}^d |\theta^*_j|^q \le 1$, for some $q \in [0, 1]$.
• Then, assuming $\sigma^2 = 1$, we have the rate (Exercise 7.12)
  $$\|\hat\theta - \theta^*\|_2^2 \lesssim \Big( \frac{\log d}{n} \Big)^{1 - q/2}.$$
Sketch:
• Trick: take $S = \{ i : |\theta^*_i| > \tau \}$ and find a good threshold $\tau$ later.
• Show that $\|\theta^*_{S^c}\|_1 \le \tau^{1-q}$ and $|S| \le \tau^{-q}$.
• The bound would be of the form ($\varepsilon := \sqrt{\log d / n}$)
  $$\varepsilon^2 \tau^{-q} + \varepsilon \tau^{1-q} + \big(\varepsilon \tau^{1-q}\big)^2.$$
• Ignore the last term (assuming $\varepsilon \tau^{1-q} \lesssim 1$, it is not dominant); balancing the first two terms gives $\tau = \varepsilon$, hence the rate $\varepsilon^{2-q} = (\log d / n)^{1 - q/2}$.
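A numerical check of the two facts in the sketch, $|S| \le \tau^{-q}$ and $\|\theta^*_{S^c}\|_1 \le \tau^{1-q}$, for a polynomially decaying $\theta^*$ rescaled to the boundary of $B_q(1)$:

```python
import numpy as np

d, q = 1000, 0.5
j = np.arange(1, d + 1)
theta = j ** (-2.0)
theta /= np.sum(np.abs(theta) ** q) ** (1 / q)  # rescale: sum_j |theta_j|^q = 1

for tau in [0.02, 0.01, 0.001]:
    S = np.abs(theta) > tau
    tail = np.sum(np.abs(theta[~S]))            # ||theta_{S^c}||_1
    print(tau, S.sum(), tau ** (-q), tail, tau ** (1 - q))
    assert S.sum() <= tau ** (-q) and tail <= tau ** (1 - q)
```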
28 / 29
Other results in Chapter 7
• Bounds for the prediction error: so far we cared about bounding $\|\hat\theta - \theta^*\|_2$; for prediction, we instead care about
  $$\frac{1}{n}\|X(\hat\theta - \theta^*)\|_2^2 = \frac{1}{n}\sum_{i=1}^n \big( \langle x_i, \hat\theta\rangle - \langle x_i, \theta^*\rangle \big)^2.$$
• Model selection consistency (support recovery): the solution of the Lasso is generally sparse; in the hard sparsity model, does the Lasso do model selection automatically? That is, if $\theta^*$ is actually $s$-sparse, is $\mathrm{supp}(\hat\theta) = \mathrm{supp}(\theta^*)$?
• Support recovery is the hardest of these problems, requiring the most stringent conditions: additional constraints on the design (an irrepresentability condition) and a minimum-signal condition ($\min_{j \in S} |\theta^*_j|$ large enough).
Proof of Theorem 4 (Compact)
• The basic inequality argument and the assumption $\lambda \ge 2\|z\|_\infty$ give
  $$\frac{1}{n}\|X\hat\Delta\|_2^2 \le \lambda\big( 3\|\hat\Delta_S\|_1 - \|\hat\Delta_{S^c}\|_1 + b \big),$$
  where $b := 2\|\theta^*_{S^c}\|_1$.
• The error satisfies $\|\hat\Delta_{S^c}\|_1 \le 3\|\hat\Delta_S\|_1 + b$,
• hence $\|\hat\Delta\|_1^2 \le 32\, s\, \|\hat\Delta\|_2^2 + 2b^2$ ($\hat\Delta$ behaves almost like an $s$-sparse vector).
• Bound (9) can be written as ($\alpha_1, \alpha_2 > 0$)
  $$\frac{1}{n}\|X\Delta\|_2^2 \ge \alpha_1 \|\Delta\|_2^2 - \alpha_2 \|\Delta\|_1^2, \qquad \forall \Delta \in \mathbb{R}^d.$$
• Combining, rearranging, dropping, etc., we get
  $$(\alpha_1 - 32\alpha_2 s)\,\|\hat\Delta\|_2^2 \le \lambda\big( 3\sqrt{s}\,\|\hat\Delta\|_2 + b \big) + 2\alpha_2 b^2.$$
• Assume $\alpha_1 - 32\alpha_2 s \ge \alpha_1/2 > 0$, so that the quadratic inequality enforces an upper bound on $\|\hat\Delta\|_2$.
• Hint: $ax^2 \le bx + c$, $x \ge 0 \implies x \le \frac{b}{a} + \sqrt{\frac{c}{a}} \implies x^2 \le \frac{2b^2}{a^2} + \frac{2c}{a}$.
29 / 29