STAT 200C: High-dimensional Statistics
Arash A. Amini
April 14, 2021
1 / 29
Linear regression setup
• The data is $(y, X)$ where $y \in \mathbb{R}^n$ and $X \in \mathbb{R}^{n \times d}$, and the model is
      $y = X\theta^* + w.$
• $\theta^* \in \mathbb{R}^d$ is an unknown parameter.
• $w \in \mathbb{R}^n$ is the vector of noise variables.
• Equivalently,
      $y_i = \langle \theta^*, x_i \rangle + w_i, \quad i = 1, \dots, n,$
  where $x_i \in \mathbb{R}^d$ is the $i$-th row of $X$:
      $X = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{bmatrix} \in \mathbb{R}^{n \times d}.$
• Recall $\langle \theta^*, x_i \rangle = \sum_{j=1}^d \theta_j^* x_{ij}$.
• We are interested in the case where $n \ll d$.
2 / 29
Sparsity models
• When $n < d$, there is no hope of estimating $\theta^*$ (the system $y = X\theta$ has many solutions in general),
• unless we impose some sort of low-dimensional model on $\theta^*$.
• Support of $\theta^*$ (recall $[d] = \{1, \dots, d\}$):
      $\operatorname{supp}(\theta^*) := S(\theta^*) = \{\, j \in [d] : \theta_j^* \neq 0 \,\}.$
• Hard sparsity assumption: $s = |S(\theta^*)| \ll d$.
• Weaker sparsity assumption via $\ell_q$ balls for $q \in [0, 1]$:
      $B_q(R_q) = \Big\{\, \theta \in \mathbb{R}^d : \sum_{j=1}^d |\theta_j|^q \le R_q \,\Big\}.$
• $q = 1$ gives the $\ell_1$ ball.
• $q = 0$ gives the $\ell_0$ ball, the same as hard sparsity:
      $\|\theta^*\|_0 := |S(\theta^*)| = \#\{\, j : \theta_j^* \neq 0 \,\}.$
  (Annotation: $\|\cdot\|_0$ is not a norm, though it is subadditive: $\|\theta + \theta'\|_0 \le \|\theta\|_0 + \|\theta'\|_0$.)
  A small numerical illustration of these sparsity measures follows below.
3 / 29
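A minimal numpy sketch (my own illustration, not from the slides; the vector and sizes are hypothetical) of the sparsity measures above: the $\ell_0$ count, the $\ell_1$ norm, and the quantity $\sum_j |\theta_j|^q$ that defines membership in $B_q(R_q)$.

```python
import numpy as np

d, s = 1000, 10
rng = np.random.default_rng(0)
theta = np.zeros(d)
support = rng.choice(d, size=s, replace=False)
theta[support] = rng.normal(size=s)              # a hard s-sparse vector

def lq_mass(theta, q):
    """Return sum_j |theta_j|^q, the quantity bounded by R_q in B_q(R_q)."""
    if q == 0:
        return np.count_nonzero(theta)           # l0: size of the support
    return np.sum(np.abs(theta) ** q)

print("||theta||_0       =", lq_mass(theta, 0))  # equals s
print("||theta||_1       =", lq_mass(theta, 1))
print("sum |theta_j|^0.5 =", lq_mass(theta, 0.5))
```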
[Figure: unit $\ell_q$ balls for several values of $q$, illustrating how the ball shrinks toward the coordinate axes and becomes non-convex as $q$ decreases from 1 toward 0. (From the HDS book.)]
4 / 29
Basis pursuit
• Consider the noiseless case $y = X\theta^*$.
• We assume that $\|\theta^*\|_0$ is small.
• Ideal program to solve:
      $\min_{\theta \in \mathbb{R}^d} \|\theta\|_0 \quad \text{subject to} \quad y = X\theta.$
  (Annotation: if $\theta_1^*$ and $\theta_2^*$ are both $s$-sparse solutions, then $X(\theta_1^* - \theta_2^*) = 0$ and $\|\theta_1^* - \theta_2^*\|_0 \le 2s$; so if every $2s$ columns of $X$ are linearly independent, the $\ell_0$ program recovers $\theta^*$ uniquely.)
• $\|\cdot\|_0$ is highly non-convex; relax it to $\|\cdot\|_1$:
      $\min_{\theta \in \mathbb{R}^d} \|\theta\|_1 \quad \text{subject to} \quad y = X\theta. \qquad (1)$
  This is called basis pursuit (regression).
• (1) is a convex program.
• In fact, it can be written as a linear program¹ (see the sketch below).
• Global solutions can be obtained very efficiently.

¹ Exercise: Introduce auxiliary variables $s_j \in \mathbb{R}$ and note that minimizing $\sum_j s_j$ subject to $|\theta_j| \le s_j$ gives the $\ell_1$ norm of $\theta$.
5 / 29
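A minimal sketch (my own, assuming numpy and scipy are available; sizes and seed are illustrative) of the LP reformulation described in the footnote: minimize $\sum_j s_j$ subject to $|\theta_j| \le s_j$ and $X\theta = y$.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, d, s = 80, 200, 5
X = rng.normal(size=(n, d))
theta_star = np.zeros(d)
theta_star[rng.choice(d, s, replace=False)] = rng.normal(size=s)
y = X @ theta_star                                 # noiseless observations

I = np.eye(d)
c = np.concatenate([np.zeros(d), np.ones(d)])      # objective: sum of slacks s_j
A_ub = np.block([[I, -I], [-I, -I]])               # theta_j - s_j <= 0 and -theta_j - s_j <= 0
b_ub = np.zeros(2 * d)
A_eq = np.hstack([X, np.zeros((n, d))])            # equality constraint X theta = y
bounds = [(None, None)] * d + [(0, None)] * d      # theta free, slacks nonnegative

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y, bounds=bounds)
theta_hat = res.x[:d]
print("basis pursuit recovery error:", np.linalg.norm(theta_hat - theta_star))
```

With this many observations relative to the sparsity level, the recovered $\hat\theta$ typically matches $\theta^*$ up to solver tolerance, illustrating the exact recovery discussed on the next slides.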
Restricted null space property (RNS)
• Define
      $\mathbb{C}(S) = \{\, \Delta \in \mathbb{R}^d : \|\Delta_{S^c}\|_1 \le \|\Delta_S\|_1 \,\}. \qquad (2)$

Theorem 1
The following two are equivalent:
• For any $\theta^* \in \mathbb{R}^d$ with support contained in $S$, the basis pursuit program (1) applied to the data $(y = X\theta^*, X)$ has unique solution $\hat\theta = \theta^*$.
• The restricted null space (RNS) property holds, i.e.,
      $\mathbb{C}(S) \cap \ker(X) = \{0\}. \qquad (3)$
6 / 29
Proof

• Consider the tangent cone to the $\ell_1$ ball (of radius $\|\theta^*\|_1$) at $\theta^*$:
      $\mathbb{T}(\theta^*) = \{\, \Delta \in \mathbb{R}^d : \|\theta^* + t\Delta\|_1 \le \|\theta^*\|_1 \ \text{for some } t > 0 \,\},$
  i.e., the set of descent directions for the $\ell_1$ norm at the point $\theta^*$.
• The feasible set is $\theta^* + \ker(X)$, i.e., $\ker(X)$ is the set of feasible directions $\Delta = \theta - \theta^*$.
• Hence, there is a minimizer other than $\theta^*$ if and only if
      $\mathbb{T}(\theta^*) \cap \ker(X) \neq \{0\}. \qquad (4)$
• It is enough to show that
      $\mathbb{C}(S) = \bigcup_{\theta^* \in \mathbb{R}^d :\, \operatorname{supp}(\theta^*) \subseteq S} \mathbb{T}(\theta^*).$
7 / 29
[Figure: the $\ell_1$ ball $B_1$ in $d = 2$, the subspace $\ker(X)$, and the tangent cones $\mathbb{T}(\theta^*_{(1)})$ and $\mathbb{T}(\theta^*_{(2)})$ at the two points below.]

• $d = 2$, $[d] = \{1, 2\}$, $S = \{2\}$, $\theta^*_{(1)} = (0, 1)$, $\theta^*_{(2)} = (0, -1)$.
• $\mathbb{C}(S) = \{\, (\Delta_1, \Delta_2) : |\Delta_1| \le |\Delta_2| \,\}.$
8 / 29
• It is enough to show that
      $\mathbb{C}(S) = \bigcup_{\theta^* \in \mathbb{R}^d :\, \operatorname{supp}(\theta^*) \subseteq S} \mathbb{T}(\theta^*). \qquad (5)$
• We have $\Delta \in \mathbb{T}_1(\theta^*)$ iff²
      $\|\theta^*_S + \Delta_S\|_1 + \|\Delta_{S^c}\|_1 \le \|\theta^*_S\|_1,$
  using the decomposability of the $\ell_1$ norm over disjoint supports.
• Hence $\Delta \in \mathbb{T}_1(\theta^*)$ for some $\theta^* \in \mathbb{R}^d$ with $\operatorname{supp}(\theta^*) \subseteq S$ iff
      $\|\Delta_{S^c}\|_1 \le \sup_{\theta^*_S \in \mathbb{R}^s} \big[\, \|\theta^*_S\|_1 - \|\theta^*_S + \Delta_S\|_1 \,\big] = \|\Delta_S\|_1,$
  where the supremum equals $\|\Delta_S\|_1$ by the triangle inequality (and is attained, e.g., at $\theta^*_S = -\Delta_S$).

² Let $\mathbb{T}_1(\theta^*)$ be the subset of $\mathbb{T}(\theta^*)$ where $t = 1$, and argue that w.l.o.g. we can work with this subset.
9 / 29
Sufficient conditions for restricted nullspace

• $[d] := \{1, \dots, d\}$.
• For a matrix $X \in \mathbb{R}^{n \times d}$, let $X_j$ be its $j$-th column (for $j \in [d]$).
• The pairwise incoherence of $X$ is defined as
      $\delta_{\mathrm{PW}}(X) := \max_{i, j \in [d]} \Big|\, \frac{\langle X_i, X_j \rangle}{n} - \mathbf{1}\{i = j\} \,\Big|.$
• Alternative form: $X^T X$ is the Gram matrix of $X$, with $(X^T X)_{ij} = \langle X_i, X_j \rangle$, so
      $\delta_{\mathrm{PW}}(X) = \Big\|\, \frac{X^T X}{n} - I_d \,\Big\|_\infty,$
  where $\|\cdot\|_\infty$ is the elementwise (vector) $\ell_\infty$ norm of the matrix. (A short numerical sketch follows.)

Proposition 1 (HDS Prop. 7.1)
(Uniform) restricted nullspace holds for all $S$ with $|S| \le s$ if
      $\delta_{\mathrm{PW}}(X) \le \frac{1}{3s}.$

• Proof: Exercise 7.3.
10 / 29
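A small numpy sketch (my own; the design and sizes are illustrative) computing the pairwise incoherence via the Gram-matrix form $\delta_{\mathrm{PW}}(X) = \|X^T X / n - I_d\|_\infty$.

```python
import numpy as np

def pairwise_incoherence(X):
    """Elementwise sup-norm of X^T X / n - I_d."""
    n, d = X.shape
    return np.max(np.abs(X.T @ X / n - np.eye(d)))

# With i.i.d. N(0, 1) entries, X^T X / n is close to I_d, so delta_PW is small
# (it shrinks roughly like sqrt(log d / n) as n grows).
rng = np.random.default_rng(0)
n, d = 2000, 50
X = rng.normal(size=(n, d))
print("delta_PW(X) =", pairwise_incoherence(X))
```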
Restricted isometry property (RIP)
• A more relaxed condition:

Definition 1 (RIP)
$X \in \mathbb{R}^{n \times d}$ satisfies a restricted isometry property (RIP) of order $s$ with constant $\delta_s(X) > 0$ if
      $\Big\|\, \frac{X_S^T X_S}{n} - I \,\Big\|_{\mathrm{op}} \le \delta_s(X), \quad \text{for all } S \text{ with } |S| \le s.$

• PW incoherence is close to RIP with $s = 2$;
• for example, when $\|X_j / \sqrt{n}\|_2 = 1$ for all $j$, we have $\delta_2(X) = \delta_{\mathrm{PW}}(X)$.
• In general, for any $s \ge 2$ (Exercise 7.4),
      $\delta_{\mathrm{PW}}(X) \le \delta_s(X) \le s\, \delta_{\mathrm{PW}}(X).$
11 / 29
Definition (RIP)
$X \in \mathbb{R}^{n \times d}$ satisfies a restricted isometry property (RIP) of order $s$ with constant $\delta_s(X) > 0$ if
      $\Big\|\, \frac{X_S^T X_S}{n} - I \,\Big\|_{\mathrm{op}} \le \delta_s(X), \quad \text{for all } S \text{ with } |S| \le s.$

• Let $x_i^T$ be the $i$-th row of $X$. Consider the sample covariance matrix:
      $\hat\Sigma := \frac{1}{n} X^T X = \frac{1}{n} \sum_{i=1}^n x_i x_i^T \in \mathbb{R}^{d \times d}.$
• Then $\hat\Sigma_{SS} = \frac{1}{n} X_S^T X_S$; hence, RIP is
      $\|\hat\Sigma_{SS} - I\|_{\mathrm{op}} \le \delta_s < 1,$
  i.e., $\hat\Sigma_{SS} \approx I_s$. More precisely,
      $(1 - \delta_s)\|u\|_2 \le \|\hat\Sigma_{SS}\, u\|_2 \le (1 + \delta_s)\|u\|_2, \quad \forall u \in \mathbb{R}^s.$
  (A brute-force computation of $\delta_s(X)$ for small problems is sketched below.)
12 / 29
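A brute-force sketch (my own, feasible only for small $d$ and $s$; the sizes are illustrative) of the RIP constant by enumerating all supports of size at most $s$.

```python
import numpy as np
from itertools import combinations

def rip_constant(X, s):
    """delta_s(X): max over |S| <= s of the operator norm of X_S^T X_S / n - I."""
    n, d = X.shape
    delta = 0.0
    for k in range(1, s + 1):
        for S in combinations(range(d), k):
            G = X[:, S].T @ X[:, S] / n
            delta = max(delta, np.linalg.norm(G - np.eye(k), ord=2))
    return delta

rng = np.random.default_rng(0)
n, d, s = 500, 20, 3
X = rng.normal(size=(n, d))
print("delta_s(X) =", rip_constant(X, s))
```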
• RIP gives sufficient conditions:

Proposition 2 (HDS Prop. 7.2)
(Uniform) restricted null space holds for all $S$ with $|S| \le s$ if
      $\delta_{2s}(X) \le \frac{1}{3}.$

• Consider a sub-Gaussian matrix $X$ with i.i.d. entries, $\mathbb{E}X_{ij} = 0$ and $\mathbb{E}X_{ij}^2 = 1$ (Exercise 7.7). We have
      $n \gtrsim s^2 \log d \;\implies\; \delta_{\mathrm{PW}}(X) < \frac{1}{3s}, \quad \text{w.h.p.}$
      $n \gtrsim s \log\frac{ed}{s} \;\implies\; \delta_{2s}(X) < \frac{1}{3}, \quad \text{w.h.p.}$
• The sample complexity requirement for RIP is milder.
13 / 29
Neither RIP nor PWI is necessary

• For a more general covariance $\Sigma$, it is harder to satisfy either PWI or RIP.
• Consider $X \in \mathbb{R}^{n \times d}$ with i.i.d. rows $X_i \sim N(0, \Sigma)$.
• Letting $\mathbf{1} \in \mathbb{R}^d$ be the all-ones vector, take
      $\Sigma := (1 - \mu) I_d + \mu \mathbf{1}\mathbf{1}^T$
  for $\mu \in [0, 1)$. (A spiked covariance matrix.)
• We have
      $\lambda_{\max}(\Sigma_{SS}) = 1 + \mu(s - 1) \to \infty \quad \text{as } s \to \infty.$
• Exercise 7.8:
  (a) PW incoherence is violated w.h.p. unless $\mu \ll 1/s$.
  (b) RIP is violated w.h.p. unless $\mu \ll 1/\sqrt{s}$.
  (Annotation: $\lambda_{\max}(\hat\Sigma_{SS})$ grows like $\mu s$ for any fixed $\mu \in (0, 1)$, so $\hat\Sigma_{SS}$ cannot stay close to the identity. See the numerical illustration below.)
• However, for any $\mu \in [0, 1)$, basis pursuit succeeds w.h.p. if
      $n \gtrsim s \log\frac{ed}{s}.$
  (A later result shows this.)
14 / 29
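A numpy illustration (my own, with illustrative choices of $d$, $\mu$, and $n$) of the spiked covariance $\Sigma = (1-\mu)I_d + \mu\mathbf{1}\mathbf{1}^T$: $\lambda_{\max}(\Sigma_{SS}) = 1 + \mu(s-1)$ grows linearly in $s$, while the empirical pairwise incoherence of a Gaussian design with this covariance sits near $\mu$.

```python
import numpy as np

d, mu = 100, 0.3
Sigma = (1 - mu) * np.eye(d) + mu * np.ones((d, d))

for s in [5, 20, 80]:
    S = np.arange(s)                               # any subset of size s works by symmetry
    lam_max = np.linalg.eigvalsh(Sigma[np.ix_(S, S)]).max()
    print(f"s = {s:3d}: lambda_max(Sigma_SS) = {lam_max:.2f}  (theory: {1 + mu * (s - 1):.2f})")

# Empirical pairwise incoherence of a Gaussian design with covariance Sigma:
rng = np.random.default_rng(0)
n = 2000
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
delta_pw = np.max(np.abs(X.T @ X / n - np.eye(d)))
print("delta_PW(X) ~", round(delta_pw, 3), " (compare with mu =", mu, ")")
```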
Noisy sparse regression

• We now return to the noisy model $y = X\theta^* + w$ with $X \in \mathbb{R}^{n \times d}$.
• A very popular estimator is the $\ell_1$-regularized least-squares (Lagrangian Lasso):
      $\hat\theta \in \operatorname*{argmin}_{\theta \in \mathbb{R}^d} \Big[\, \frac{1}{2n}\|y - X\theta\|_2^2 + \lambda\|\theta\|_1 \,\Big] \qquad (6)$
• The idea: minimizing the $\ell_1$ norm leads to sparse solutions.
• (6) is a convex program; the global solution can be obtained efficiently (e.g., by coordinate descent, as in glmnet). A short sketch follows below.
• (Annotation: the Lasso estimate is not unbiased.)
• Other option: the constrained form of the Lasso,
      $\min_{\|\theta\|_1 \le R} \frac{1}{2n}\|y - X\theta\|_2^2. \qquad (7)$
15 / 29
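A short sketch of the Lagrangian Lasso (6) using scikit-learn, whose Lasso objective is exactly $\frac{1}{2n}\|y - X\theta\|_2^2 + \alpha\|\theta\|_1$; the data-generating choices below are my own illustration, and the regularization level anticipates the order $\sigma\sqrt{\log d / n}$ suggested by the theory on the following slides.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d, s, sigma = 200, 500, 5, 0.5
X = rng.normal(size=(n, d))
theta_star = np.zeros(d)
theta_star[:s] = 1.0
y = X @ theta_star + sigma * rng.normal(size=n)

lam = 2 * sigma * np.sqrt(2 * np.log(d) / n)          # order sigma * sqrt(log d / n)
model = Lasso(alpha=lam, fit_intercept=False, max_iter=10000).fit(X, y)
theta_hat = model.coef_

print("l2 estimation error:", np.linalg.norm(theta_hat - theta_star))
print("nonzeros in theta_hat:", np.count_nonzero(theta_hat))
```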
(Handwritten aside: recall the variational characterization of the extreme eigenvalues of a symmetric matrix $A$:
      $\lambda_{\max}(A) = \max_{\|u\|_2 = 1} u^T A u, \qquad \lambda_{\min}(A) = \min_{\|u\|_2 = 1} u^T A u = \min_{u \neq 0} \frac{u^T A u}{\|u\|_2^2}.$)
Restricted eigenvalue condition
• For a constant $\alpha \ge 1$, define
      $\mathbb{C}_\alpha(S) := \{\, \Delta \in \mathbb{R}^d \mid \|\Delta_{S^c}\|_1 \le \alpha \|\Delta_S\|_1 \,\}.$
• A strengthening of RNS is:

Definition 2 (RE condition)
A matrix $X$ satisfies the restricted eigenvalue (RE) condition over $S$ with parameters $(\kappa, \alpha)$ if
      $\frac{1}{n}\|X\Delta\|_2^2 \ge \kappa \|\Delta\|_2^2 \quad \text{for all } \Delta \in \mathbb{C}_\alpha(S).$

• RNS corresponds to
      $\frac{1}{n}\|X\Delta\|_2^2 > 0 \quad \text{for all } \Delta \in \mathbb{C}_1(S) \setminus \{0\}.$
  (A Monte Carlo sketch probing the RE constant follows.)
16 / 29
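A Monte Carlo heuristic (my own sketch, not a certificate, with illustrative sizes): sample random directions $\Delta$ in the cone $\mathbb{C}_\alpha(S)$ and record the smallest observed value of $\|X\Delta\|_2^2 / (n\|\Delta\|_2^2)$. This only upper-bounds the true restricted eigenvalue, but serves as a quick sanity check.

```python
import numpy as np

def sample_cone(d, S, alpha, rng):
    """Draw a random Delta satisfying ||Delta_{S^c}||_1 <= alpha * ||Delta_S||_1."""
    delta = np.zeros(d)
    delta[S] = rng.normal(size=len(S))
    off = rng.normal(size=d - len(S))
    budget = alpha * np.abs(delta[S]).sum() * rng.uniform()   # l1 budget for S^c
    off *= budget / np.abs(off).sum()
    delta[np.setdiff1d(np.arange(d), S)] = off
    return delta

rng = np.random.default_rng(0)
n, d, s, alpha = 300, 100, 5, 3
X = rng.normal(size=(n, d))
S = np.arange(s)

ratios = []
for _ in range(5000):
    delta = sample_cone(d, S, alpha, rng)
    ratios.append(np.sum((X @ delta) ** 2) / (n * np.sum(delta ** 2)))
print("smallest observed ratio (an upper bound on kappa):", min(ratios))
```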
Deviation bounds under RE
Theorem 2 (Deterministic deviations)
Assume that $y = X\theta^* + w$, where $X \in \mathbb{R}^{n \times d}$ and $\theta^* \in \mathbb{R}^d$, and
• $\theta^*$ is supported on $S \subset [d]$ with $|S| \le s$,
• $X$ satisfies RE($\kappa$, 3) over $S$.
Let us define $z = \frac{X^T w}{n}$. Then, we have the following:
(a) Any solution of the Lasso (6) with $\lambda \ge 2\|z\|_\infty$ satisfies
      $\|\hat\theta - \theta^*\|_2 \le \frac{3}{\kappa}\sqrt{s}\,\lambda.$
(b) Any solution of the constrained Lasso (7) with $R = \|\theta^*\|_1$ satisfies
      $\|\hat\theta - \theta^*\|_2 \le \frac{4}{\kappa}\sqrt{s}\,\|z\|_\infty.$
17 / 29
Example (fixed design regression)

• Assume $y = X\theta^* + w$ where $w \sim N(0, \sigma^2 I_n)$, and
• $X \in \mathbb{R}^{n \times d}$ is fixed, satisfying the RE condition and the column normalization
      $\max_{j = 1, \dots, d} \frac{\|X_j\|_2}{\sqrt{n}} \le C,$
  where $X_j$ is the $j$-th column of $X$.
• Recall $z = X^T w / n$.
• It is easy to show that w.p. at least $1 - 2e^{-n\delta^2/2}$,
      $\|z\|_\infty \le C\sigma\Big( \sqrt{\frac{2\log d}{n}} + \delta \Big).$
• Thus, setting
      $\lambda = 2C\sigma\Big( \sqrt{\frac{2\log d}{n}} + \delta \Big),$
  w.p. at least $1 - 2e^{-n\delta^2/2}$ the Lasso solution satisfies
      $\|\hat\theta - \theta^*\|_2 \le \frac{6C\sigma}{\kappa}\sqrt{s}\,\Big( \sqrt{\frac{2\log d}{n}} + \delta \Big).$
  (A small simulation of this scaling follows below.)
18 / 29
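A small simulation (my own sketch; design, noise level, and sizes are illustrative assumptions) checking the scaling in the example above: with $\lambda$ of order $\sigma\sqrt{2\log d / n}$, the Lasso error $\|\hat\theta - \theta^*\|_2$ should shrink roughly like $\sqrt{s\log d / n}$ as $n$ grows.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
d, s, sigma = 400, 5, 1.0
theta_star = np.zeros(d)
theta_star[:s] = 1.0

for n in [200, 800, 3200]:
    X = rng.normal(size=(n, d))                    # columns have norm ~ sqrt(n), so C ~ 1
    y = X @ theta_star + sigma * rng.normal(size=n)
    lam = 2 * sigma * np.sqrt(2 * np.log(d) / n)
    theta_hat = Lasso(alpha=lam, fit_intercept=False, max_iter=10000).fit(X, y).coef_
    err = np.linalg.norm(theta_hat - theta_star)
    print(f"n = {n:5d}: error = {err:.3f},  sqrt(s log d / n) = {np.sqrt(s * np.log(d) / n):.3f}")
```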
(Handwritten derivation: starting from the basic inequality for the Lagrangian Lasso, expand the quadratic, bound $\langle \hat\Delta, z\rangle$ by Hölder's and the triangle inequality, use the decomposability of the $\ell_1$ norm together with the choice $\lambda \ge 2\|z\|_\infty$ to show $\hat\Delta \in \mathbb{C}_3(S)$, and finish with the RE condition. The same argument appears in the printed proof of Theorem 2 on the following slides.)
• Taking $\lambda \asymp \sqrt{2\log d / n}$, we have, w.h.p.,
      $\|\hat\theta - \theta^*\|_2 \lesssim \sqrt{\frac{s \log d}{n}}.$
  (Annotation: this gives consistency if $s \log d / n \to 0$.)
• This is the typical high-dimensional scaling in sparse problems.
• Had we known the support $S$ in advance, our rate would be (w.h.p.)
      $\|\hat\theta - \theta^*\|_2 \lesssim \sqrt{\frac{s}{n}}.$
• The $\log d$ factor is the price for not knowing the support;
• roughly, the price for searching over the $\binom{d}{s} \le d^s$ collection of candidate supports.
19 / 29
Proof of Theorem 2
• Let us simplify the loss $L(\theta) := \frac{1}{2n}\|X\theta - y\|^2$.
• Setting $\Delta = \theta - \theta^*$,
      $L(\theta) = \frac{1}{2n}\|X(\theta - \theta^*) - w\|^2$
      $\phantom{L(\theta)} = \frac{1}{2n}\|X\Delta - w\|^2$
      $\phantom{L(\theta)} = \frac{1}{2n}\|X\Delta\|^2 - \frac{1}{n}\langle X\Delta, w\rangle + \text{const.}$
      $\phantom{L(\theta)} = \frac{1}{2n}\|X\Delta\|^2 - \frac{1}{n}\langle \Delta, X^T w\rangle + \text{const.}$
      $\phantom{L(\theta)} = \frac{1}{2n}\|X\Delta\|^2 - \langle \Delta, z\rangle + \text{const.}$
  where $z = X^T w / n$. Hence,
      $L(\theta) - L(\theta^*) = \frac{1}{2n}\|X\Delta\|^2 - \langle \Delta, z\rangle. \qquad (8)$
• Exercise: Show that (8) is the Taylor expansion of $L$ around $\theta^*$.
20 / 29
Proof (constrained version)
• By optimality of $\hat\theta$ and feasibility of $\theta^*$:
      $L(\hat\theta) \le L(\theta^*).$
• The error vector $\hat\Delta := \hat\theta - \theta^*$ satisfies the basic inequality
      $\frac{1}{2n}\|X\hat\Delta\|_2^2 \le \langle z, \hat\Delta\rangle.$
• Using Hölder's inequality,
      $\frac{1}{2n}\|X\hat\Delta\|_2^2 \le \|z\|_\infty \|\hat\Delta\|_1.$
• Since $\|\hat\theta\|_1 \le \|\theta^*\|_1$, we have $\hat\Delta = \hat\theta - \theta^* \in \mathbb{C}_1(S)$, hence
      $\|\hat\Delta\|_1 = \|\hat\Delta_S\|_1 + \|\hat\Delta_{S^c}\|_1 \le 2\|\hat\Delta_S\|_1 \le 2\sqrt{s}\,\|\hat\Delta\|_2.$
• Combined with the RE condition ($\hat\Delta \in \mathbb{C}_3(S)$ as well),
      $\frac{\kappa}{2}\|\hat\Delta\|_2^2 \le 2\sqrt{s}\,\|z\|_\infty\|\hat\Delta\|_2,$
  which gives the desired result.
21 / 29
Proof (Lagrangian version)
• Let $L_\lambda(\theta) := L(\theta) + \lambda\|\theta\|_1$ be the regularized loss.
• The basic inequality is
      $L(\hat\theta) + \lambda\|\hat\theta\|_1 \le L(\theta^*) + \lambda\|\theta^*\|_1.$
• Rearranging,
      $\frac{1}{2n}\|X\hat\Delta\|_2^2 \le \langle z, \hat\Delta\rangle + \lambda\big(\|\theta^*\|_1 - \|\hat\theta\|_1\big).$
• We have
      $\|\theta^*\|_1 - \|\hat\theta\|_1 = \|\theta^*_S\|_1 - \|\theta^*_S + \hat\Delta_S\|_1 - \|\hat\Delta_{S^c}\|_1 \le \|\hat\Delta_S\|_1 - \|\hat\Delta_{S^c}\|_1.$
• Since $\lambda \ge 2\|z\|_\infty$,
      $\frac{1}{n}\|X\hat\Delta\|_2^2 \le \lambda\|\hat\Delta\|_1 + 2\lambda\big(\|\hat\Delta_S\|_1 - \|\hat\Delta_{S^c}\|_1\big) = \lambda\big(3\|\hat\Delta_S\|_1 - \|\hat\Delta_{S^c}\|_1\big).$
• It follows that $\hat\Delta \in \mathbb{C}_3(S)$, and the rest of the proof follows.
22 / 29
RE condition for anisotropic design
• For a PSD matrix $\Sigma$, let $\rho^2(\Sigma) = \max_i \Sigma_{ii}$.

Theorem 3
Let $X \in \mathbb{R}^{n \times d}$ have rows i.i.d. from $N(0, \Sigma)$. Then, there exist universal constants $c_1 < 1 < c_2$ such that
      $\frac{\|X\theta\|_2^2}{n} \ge c_1 \|\sqrt{\Sigma}\,\theta\|_2^2 - c_2\, \rho^2(\Sigma)\, \frac{\log d}{n}\, \|\theta\|_1^2 \quad \text{for all } \theta \in \mathbb{R}^d \qquad (9)$
with probability at least $1 - e^{-n/32}/(1 - e^{-n/32})$.

• Exercise 7.11: (9) implies the RE condition over $\mathbb{C}_3(S)$, uniformly over all subsets of cardinality
      $|S| \le \frac{c_1\, \lambda_{\min}(\Sigma)\, n}{32\, c_2\, \rho^2(\Sigma) \log d}.$
  (Annotation: here we assume the smallest eigenvalue $\lambda_{\min}(\Sigma) > 0$; the RE constant depends on it.)
• In other words, $n \gtrsim s \log d \implies$ the RE condition over $\mathbb{C}_3(S)$ for all $|S| \le s$.
23 / 29
Proof of Theorem 3 (handwritten sketch)

(The handwritten notes here are only partly legible. The visible outline: restrict attention to $\theta$ with $\|\sqrt{\Sigma}\,\theta\|_2 = 1$; view $\theta \mapsto X\theta$ as a Gaussian process, whose suprema and infima are widely studied; control $\inf_\theta \|X\theta\|_2/\sqrt{n}$ via a Gaussian comparison inequality, namely Gordon's inequality (HDS 6.65), after verifying the required covariance conditions for a suitable auxiliary process indexed by $(u, v)$; a final peeling step yields the bound (9) uniformly over $\theta$, as in the proof of the corresponding theorem in HDS, Chapter 7.)
Comments
• Note that
      $\mathbb{E}\Big[\frac{\|X\theta\|_2^2}{n}\Big] = \|\sqrt{\Sigma}\,\theta\|_2^2.$
• Bound (9) says that $\|X\theta\|_2^2/n$ is lower-bounded by a multiple of its expectation minus a slack $\propto \|\theta\|_1^2$.
• Why is the slack needed?
• In the high-dimensional setting, $\|X\theta\|_2 = 0$ for any $\theta \in \ker(X)$, while $\|\sqrt{\Sigma}\,\theta\|_2 > 0$ for any $\theta \neq 0$, assuming $\Sigma$ is non-singular.
• In fact, $\|\sqrt{\Sigma}\,\theta\|_2$ is uniformly bounded below in that case:
      $\|\sqrt{\Sigma}\,\theta\|_2^2 \ge \lambda_{\min}(\Sigma)\|\theta\|_2^2,$
  showing that the population-level version of $X/\sqrt{n}$, that is, $\sqrt{\Sigma}$, satisfies RE, and in fact a global eigenvalue condition.
• (9) gives a nontrivial/good lower bound for $\theta$ for which $\|\theta\|_1$ is small compared to $\|\theta\|_2$, i.e., sparse vectors.
• Recall that if $\theta$ is $s$-sparse, then $\|\theta\|_1 \le \sqrt{s}\,\|\theta\|_2$,
• while for general $\theta \in \mathbb{R}^d$, we have $\|\theta\|_1 \le \sqrt{d}\,\|\theta\|_2$.
24 / 29
Examples
• Toeplitz family: $\Sigma_{ij} = \nu^{|i - j|}$, with
      $\rho^2(\Sigma) = 1, \qquad \lambda_{\min}(\Sigma) \ge (1 - \nu)^2 > 0.$
• Spiked model: $\Sigma := (1 - \mu)I_d + \mu \mathbf{1}\mathbf{1}^T$, with
      $\rho^2(\Sigma) = 1, \qquad \lambda_{\min}(\Sigma) = 1 - \mu.$
• For future applications, note that (9) implies
      $\frac{\|X\theta\|_2^2}{n} \ge \alpha_1 \|\theta\|_2^2 - \alpha_2 \|\theta\|_1^2, \quad \forall \theta \in \mathbb{R}^d,$
  where $\alpha_1 = c_1 \lambda_{\min}(\Sigma)$ and $\alpha_2 = c_2\, \rho^2(\Sigma)\, \frac{\log d}{n}$.
  (A quick numerical check of these two families follows.)
25 / 29
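A quick numpy check (my own, with illustrative $d$, $\nu$, $\mu$) of the two covariance families above: their maximal diagonal entry $\rho^2(\Sigma)$ and smallest eigenvalue $\lambda_{\min}(\Sigma)$, compared with the bounds quoted on the slide.

```python
import numpy as np

d = 100
nu, mu = 0.5, 0.3

idx = np.arange(d)
Sigma_toeplitz = nu ** np.abs(idx[:, None] - idx[None, :])    # Sigma_ij = nu^|i-j|
Sigma_spiked = (1 - mu) * np.eye(d) + mu * np.ones((d, d))    # (1-mu) I + mu 11^T

for name, Sigma, lower in [("Toeplitz", Sigma_toeplitz, (1 - nu) ** 2),
                           ("Spiked  ", Sigma_spiked, 1 - mu)]:
    rho2 = Sigma.diagonal().max()
    lam_min = np.linalg.eigvalsh(Sigma).min()
    print(f"{name}: rho^2 = {rho2:.2f}, lambda_min = {lam_min:.3f} (slide bound: {lower:.3f})")
```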
Lasso oracle inequality
• For simplicity, let $\bar\gamma = \lambda_{\min}(\Sigma)$ and $\rho^2 = \rho^2(\Sigma) = \max_i \Sigma_{ii}$.

Theorem 4
Under condition (9), consider the Lagrangian Lasso with regularization parameter $\lambda \ge 2\|z\|_\infty$ where $z = X^T w / n$. For any $\theta^* \in \mathbb{R}^d$, any optimal solution $\hat\theta$ satisfies the bound
      $\|\hat\theta - \theta^*\|_2^2 \le \underbrace{\frac{144\,\lambda^2}{c_1^2 \bar\gamma^2}\,|S|}_{\text{Estimation error}} + \underbrace{\frac{16\,\lambda}{c_1 \bar\gamma}\,\|\theta^*_{S^c}\|_1 + \frac{32\, c_2 \rho^2}{c_1 \bar\gamma}\,\frac{\log d}{n}\,\|\theta^*_{S^c}\|_1^2}_{\text{Approximation error}} \qquad (10)$
valid for any subset $S$ with cardinality
      $|S| \le \frac{c_1 \bar\gamma\, n}{64\, c_2 \rho^2 \log d}.$

• (Annotation: if $\theta^*$ is sparse, take $S = \operatorname{supp}(\theta^*)$, so the approximation error vanishes.)
26 / 29
• Simplifying the bound:
      $\|\hat\theta - \theta^*\|_2^2 \le \kappa_1 \lambda^2 |S| + \kappa_2 \lambda \|\theta^*_{S^c}\|_1 + \kappa_3 \frac{\log d}{n}\|\theta^*_{S^c}\|_1^2,$
  where $\kappa_1, \kappa_2, \kappa_3$ are constants depending on $\Sigma$.
• Assume $\sigma = 1$ (noise variance) for simplicity.
• Since $\|z\|_\infty \lesssim \sqrt{\log d / n}$ w.h.p., we can take $\lambda$ of this order:
      $\|\hat\theta - \theta^*\|_2^2 \lesssim \frac{\log d}{n}|S| + \sqrt{\frac{\log d}{n}}\,\|\theta^*_{S^c}\|_1 + \frac{\log d}{n}\|\theta^*_{S^c}\|_1^2.$
• Optimizing the bound:
      $\|\hat\theta - \theta^*\|_2^2 \lesssim \inf_{S :\, |S| \lesssim n/\log d} \bigg[\, \frac{\log d}{n}|S| + \sqrt{\frac{\log d}{n}}\,\|\theta^*_{S^c}\|_1 + \frac{\log d}{n}\|\theta^*_{S^c}\|_1^2 \,\bigg].$
• An oracle that knows $\theta^*$ can choose the optimal $S$.
27 / 29
Example: $\ell_q$-ball sparsity

• Assume that $\theta^* \in B_q(1)$, i.e., $\sum_{j=1}^d |\theta_j^*|^q \le 1$, for some $q \in [0, 1]$.
• Then, assuming $\sigma^2 = 1$, we have the rate (Exercise 7.12)
      $\|\hat\theta - \theta^*\|_2^2 \lesssim \Big(\frac{\log d}{n}\Big)^{1 - q/2}.$
Sketch:
• Trick: take $S = \{\, i : |\theta_i^*| > \tau \,\}$ and find a good threshold $\tau$ later.
• Show that $\|\theta^*_{S^c}\|_1 \le \tau^{1 - q}$ and $|S| \le \tau^{-q}$.
• The bound would be of the form (with $\varepsilon := \sqrt{\log d / n}$)
      $\varepsilon^2 \tau^{-q} + \varepsilon\, \tau^{1 - q} + \big(\varepsilon\, \tau^{1 - q}\big)^2.$
• Ignore the last term (assuming $\varepsilon\, \tau^{1 - q} \le 1$, it is not dominant) and balance the remaining two, as worked out below.
28 / 29
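A short worked completion of the sketch (my own, under the slide's assumption that the last term is negligible): balancing the two remaining terms over $\tau$.

```latex
\[
  \varepsilon^2 \tau^{-q} \;=\; \varepsilon\,\tau^{1-q}
  \;\Longleftrightarrow\; \tau = \varepsilon,
  \qquad\text{so choosing } \tau = \varepsilon \text{ gives}\qquad
  \varepsilon^2 \tau^{-q} + \varepsilon\,\tau^{1-q} \;=\; 2\,\varepsilon^{\,2-q},
\]
\[
  \text{and with } \varepsilon = \sqrt{\log d / n}:\qquad
  \|\hat\theta - \theta^*\|_2^2 \;\lesssim\; \varepsilon^{\,2-q}
  \;=\; \Big(\tfrac{\log d}{n}\Big)^{1 - q/2}.
\]
```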
Other results in Chapter 7

• Bounds on the prediction error: so far we cared about estimating $\theta^*$; for prediction we care about
      $\frac{1}{n}\sum_{i=1}^n \big( \langle x_i, \hat\theta\rangle - \langle x_i, \theta^*\rangle \big)^2 = \frac{1}{n}\|X(\hat\theta - \theta^*)\|_2^2.$
• Model selection consistency (support recovery): the Lasso solution is generally sparse, so in the hard-sparsity case it performs model selection automatically; one asks whether $\operatorname{supp}(\hat\theta) = \operatorname{supp}(\theta^*)$ when $\theta^*$ is actually $s$-sparse.
• This is the hardest problem: recovering the support requires the most stringent conditions, namely additional constraints on the design (an irrepresentability condition) and a minimum signal strength ($\min_{j \in S} |\theta_j^*|$ large enough).
Proof of Theorem 4 (Compact)
• The basic inequality argument and the assumption $\lambda \ge 2\|z\|_\infty$ give
      $\frac{1}{n}\|X\hat\Delta\|_2^2 \le \lambda\big(3\|\hat\Delta_S\|_1 - \|\hat\Delta_{S^c}\|_1 + b\big),$
  where $b := 2\|\theta^*_{S^c}\|_1$.
• The error satisfies $\|\hat\Delta_{S^c}\|_1 \le 3\|\hat\Delta_S\|_1 + b$,
• hence $\|\hat\Delta\|_1^2 \le 32\, s\, \|\hat\Delta\|_2^2 + 2b^2$ ($\hat\Delta$ behaves almost like an $s$-sparse vector).
• Bound (9) can be written as ($\alpha_1, \alpha_2 > 0$)
      $\frac{1}{n}\|X\Delta\|_2^2 \ge \alpha_1\|\Delta\|_2^2 - \alpha_2\|\Delta\|_1^2, \quad \forall \Delta \in \mathbb{R}^d.$
• Combining, rearranging, dropping terms, etc., we get
      $(\alpha_1 - 32\alpha_2 s)\|\hat\Delta\|_2^2 \le \lambda\big(3\sqrt{s}\,\|\hat\Delta\|_2 + b\big) + 2\alpha_2 b^2.$
• Assume $\alpha_1 - 32\alpha_2 s \ge \alpha_1/2 > 0$ so that the quadratic inequality enforces an upper bound on $\|\hat\Delta\|_2$.
• Hint: $ax^2 \le bx + c$, $x \ge 0 \implies x \le \frac{b}{a} + \sqrt{\frac{c}{a}} \implies x^2 \le \frac{2b^2}{a^2} + \frac{2c}{a}$.
29 / 29