Learning and Testing Submodular Functions

Grigory Yaroslavtsev
With Sofya Raskhodnikova (SODA'13) + work in progress with Rocco Servedio
Columbia University, October 26, 2012

Submodularity

• Discrete analog of convexity/concavity; law of diminishing returns
• Applications: optimization, algorithmic game theory
• Let f: 2^X → [0, R]
• Discrete derivative: ∂_x f(S) = f(S ∪ {x}) − f(S), for S ⊆ X, x ∉ S
• Submodular function: ∂_x f(S) ≥ ∂_x f(T), ∀ S ⊆ T ⊆ X, x ∉ T

Exact learning

• Q: Reconstruct a submodular f: 2^X → ℝ with poly(|X|) queries (for all arguments)?
• A: Only a Θ(√|X|)-approximation (multiplicative) is possible [Goemans, Harvey, Iwata, Mirrokni, SODA'09]
• Q: Only for a (1 − ε)-fraction of points (PAC-style membership queries under the uniform distribution)?
  Pr_{coins of A} [ Pr_{S ∼ U(2^X)} [ A(S) = f(S) ] ≥ 1 − ε ] ≥ 1/2
• A: Almost as hard [Balcan, Harvey, STOC'11]

Approximate learning

• PMAC-learning (multiplicative), with poly(|X|) queries and running time, over arbitrary distributions [BH'11]:
  Pr_{coins of A} [ Pr_{S ∼ D} [ f(S) ≤ A(S) ≤ C · f(S) ] ≥ 1 − ε ] ≥ 1/2, where Ω(|X|^{1/3}) ≤ C ≤ √|X|
• PAAC-learning (additive):
  Pr_{coins of A} [ Pr_{S ∼ U(2^X)} [ |f(S) − A(S)| ≤ α ] ≥ 1 − ε ] ≥ 1/2
  Running time: |X|^{O((1/α²) · log(1/ε))} [Gupta, Hardt, Roth, Ullman, STOC'11]; poly(|X|^{1/α²}, log(1/ε)) [Cheraghchi, Klivans, Kothari, Lee, SODA'12]

Learning f: 2^X → [0, R]

• For all algorithms below, ε = const.
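The submodularity condition above can be checked mechanically on small ground sets. Below is a minimal illustrative sketch (not from the talk; the helper names and the coverage-function example are ours): it brute-forces the condition ∂_x f(S) ≥ ∂_x f(T) over all pairs S ⊆ T ⊆ X and x ∉ T.

```python
# Brute-force submodularity check via discrete derivatives (illustrative sketch).
from itertools import chain, combinations

def subsets(xs):
    # All subsets of the iterable xs, as tuples.
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

def is_submodular(f, ground):
    sets = [frozenset(s) for s in subsets(ground)]
    for S in sets:
        for T in sets:
            if not S <= T:
                continue
            for x in ground - T:
                # Submodularity: d_x f(S) >= d_x f(T) for all S <= T, x not in T.
                if f(S | {x}) - f(S) < f(T | {x}) - f(T):
                    return False
    return True

ground = frozenset(range(4))
cover_sets = {0: {1, 2}, 1: {2, 3}, 2: {1}, 3: {3}}

def coverage(S):
    # f(S) = size of the union of the sets indexed by S.
    covered = set()
    for i in S:
        covered |= cover_sets[i]
    return len(covered)

print(is_submodular(coverage, ground))  # True
```

Coverage functions are a standard example of monotone submodular functions, so the check accepts one; a supermodular function such as S ↦ |S|² is rejected.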
• Goemans, Harvey, Iwata, Mirrokni — Learning: √|X|-approximation, everywhere; Time: poly(|X|)
• Balcan, Harvey — Learning: PMAC (multiplicative, C = √|X|); Time: poly(|X|); Extra: under arbitrary distributions
• Gupta, Hardt, Roth, Ullman — Learning: PAAC (additive α); Time: |X|^{O(1/α²)}; Extra: tolerant queries
• Cheraghchi, Klivans, Kothari, Lee — Learning: PAAC (additive α); Time: |X|^{1/α²}; Extra: SQ queries, agnostic
• Our result (with Sofya) — Learning: PAC, f: 2^X → {0, …, R} (bounded integral range, R ≤ |X|); Time: poly(|X|) · R^{O(R · log(R/ε))}

Learning: Bigger picture

• Additive (linear) ⊆ OXS ⊆ Gross substitutes ⊆ Submodular ⊆ XOS (= fractionally subadditive) ⊆ Subadditive [Badanidiyuru, Dobzinski, Fu, Kleinberg, Nisan, Roughgarden, SODA'12]
• Other positive results:
  – Learning valuation functions [Balcan, Constantin, Iwata, Wang, COLT'12]
  – PMAC-learning (sketching) valuation functions [BDFKNR'12]
  – PMAC-learning Lipschitz submodular functions [BH'10] (concentration around the average via Talagrand)

Discrete convexity

• Monotone convex f: {1, …, n} → {0, …, R}
  (figure: a monotone convex function attains new values only on the first ≤ R arguments)
• Convex f: {1, …, n} → {0, …, R}
  (figure: a convex function attains new values only on the first ≤ R and the last ≥ n − R arguments)

Discrete submodularity

• f: 2^X → {0, …, R}
• Case study R = 1 (Boolean submodular functions f: {0,1}^n → {0,1}):
  – Monotone submodular = x_{i1} ∨ x_{i2} ∨ ⋯ ∨ x_{ik} (monomial)
  – Submodular = (x_{i1} ∨ ⋯ ∨ x_{ik}) ∧ (x_{j1} ∨ ⋯ ∨ x_{jl}) (2-term CNF)
  (figure: hypercube with the levels |S| ≤ R marked, annotated f(S) ≥ L − R, ∅ at the bottom)

Discrete monotone submodularity

• Monotone submodular f: 2^X → {0, …, R}
  (figure: hypercube with levels |S| ≤ R; values satisfy ≥ min(f(S₁), f(S₂)) ≥ f(S₁), f(S₂))
• Theorem: for monotone submodular f: 2^X → {0, …, R}, for all S:
  f(S) = max_{T ⊆ S, |T| ≤ R} f(T)
• f(S) ≥ max_{T ⊆ S, |T| ≤ R} f(T) (by monotonicity)
• f(S) ≤ max_{T ⊆ S, |T| ≤ R} f(T):
  – Let S′ be a smallest subset of S such that f(S) = f(S′)
  – ∀ x ∈ S′ we have ∂_x f(S′ ∖ {x}) > 0 ⇒ the restriction of f to 2^{S′} is monotone strictly increasing ⇒ |S′| ≤ R

Representation by a formula

• Theorem: for monotone submodular f: 2^X → {0,
…, R}, for all S: f(S) = max_{T ⊆ S, |T| ≤ R} f(T)
• Notation switch: X → n, 2^X → (x_1, …, x_n)
• (Monotone) pseudo-Boolean k-DNF (∨ → max; A_i = 1 → A_i ∈ ℕ):
  max_i [ A_i · x_{i1} ∧ x_{i2} ∧ ⋯ ∧ x_{ik} ] (no negations)
• (Monotone) submodular f: (x_1, …, x_n) → {0, …, R} can be represented as a (monotone) pseudo-Boolean 2R-DNF with constants A_i ∈ {0, …, R}.

Discrete submodularity

• Submodular f: (x_1, …, x_n) → {0, …, R} can be represented as a pseudo-Boolean 2R-DNF with constants A_i ∈ {0, …, R}.
• Hint [Lovász] (submodular monotonization): given submodular f, define
  f_mon(S) = min_{T: S ⊆ T} f(T)
  Then f_mon is monotone and submodular.
  (figure: hypercube with levels |S| ≤ R marked, annotated f(S) ≥ L − R, ∅ at the bottom)

Learning pB-formulas and k-DNF

• DNF_{k,R} = class of pB-DNFs of width k with A_i ∈ {0, …, R}
• i-slice f_i: (x_1, …, x_n) → {0,1}, defined as f_i(x_1, …, x_n) = 1 iff f(x_1, …, x_n) ≥ i
• If f ∈ DNF_{k,R}, its i-slices f_i are k-DNFs and
  f(x_1, …, x_n) = max_{1 ≤ i ≤ R} ( i · f_i(x_1, …, x_n) )
• PAC-learning:
  Pr_{coins(A)} [ Pr_{S ∼ U({0,1}^n)} [ A(S) = f(S) ] ≥ 1 − ε ] ≥ 1/2

Learning pB-formulas and k-DNF

• Learn every i-slice f_i on a 1 − ε′ = 1 − ε/R fraction of arguments
• Learning k-DNF (DNF_{k,R}) (let the Fourier sparsity be s_R = R^{O(k · log(k/ε))}):
  – Kushilevitz-Mansour (Goldreich-Levin): poly(n, s_R) queries/time
  – "Attribute-efficient learning": polylog(n) · poly(s_R) queries
  – Lower bound: Ω(2^k) queries to learn a random k-junta (∈ k-DNF) up to constant precision
• Optimizations:
  – Slightly better than KM/GL by looking at the Fourier spectrum of DNF_{k,R} (see the SODA paper: switching lemma ⇒ L_1 bound)
  – Do all R iterations of KM/GL in parallel by reusing queries

Property testing

• Let C be the class of submodular f: {0,1}^n → {0, …, R}
• How to (approximately) test whether a given f is in C?
• Property tester: (randomized) algorithm for distinguishing:
  1. f ∈ C
  2.
(ε-far): min_{g ∈ C} |f − g| ≥ ε · 2^n
• Key idea: k-DNFs have small representations:
  – [Gopalan, Meka, Reingold, CCC'12] (using quasi-sunflowers [Rossman '10]): ∀ ε > 0 and every k-DNF formula F there exists a k-DNF formula F′ of size ≤ (k · log(1/ε))^{O(k)} such that |F − F′| ≤ ε · 2^n

Testing by implicit learning

• Good approximation by juntas ⇒ efficient property testing [surveys: Ron; Servedio]
  – ε-approximation by a J(ε)-junta
  – Good dependence on ε: J(ε) = (k · log(1/ε))^{O(k)}
• [Blais, Onak]: sunflowers for submodular functions
  – Query complexity: (1/ε) · (k · log(1/ε))^{O(k)} (independent of n)
  – Running time: exponential in J(ε) (we think it can be reduced to O(J(ε)))
  – We have an Ω(k) lower bound for testing k-DNF (reduction from Gap Set Intersection: distinguishing a random k-junta from a (k + O(log k))-junta requires Ω(k) queries)

Previous work on testing submodularity

• For f: {0,1}^n → [0, R] [Parnas, Ron, Rubinfeld '03; Seshadhri, Vondrák, ICS'11]:
  – Upper bound: (1/ε)^{O(√n)}
  – Lower bound: √n
  (gap in query complexity)
• Special case: coverage functions [Chakrabarty, Huang, ICALP'12]

Thanks!
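As a companion to the ε-far notion used above, here is a small illustrative sketch (ours, not from the talk): it estimates the normalized Hamming distance Pr_x[f(x) ≠ g(x)] between two functions on {0,1}^n by sampling. A real tester must handle the minimum of this distance over all g ∈ C; the sketch only measures distance to one fixed g.

```python
# Sampling estimate of dist(f, g) = Pr_{x ~ U({0,1}^n)} [f(x) != g(x)] (illustrative sketch).
import random

def dist_estimate(f, g, n, samples=10000, seed=0):
    # Fraction of uniformly random points on which f and g disagree.
    rng = random.Random(seed)
    disagree = 0
    for _ in range(samples):
        x = tuple(rng.randint(0, 1) for _ in range(n))
        disagree += int(f(x) != g(x))
    return disagree / samples

n = 10
f = lambda x: max(x)   # OR of the bits: a monotone submodular {0,1}-valued function
g = lambda x: 0        # the all-zero function
print(dist_estimate(f, g, n))  # roughly 1 - 2**(-n), i.e. about 0.999
```

With ~1/ε² samples the estimate is accurate to within ε with constant probability (Chernoff), which is the usual accounting behind the "independent of n" query bounds above.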