>> Eyal Lubetzky: Okay. Hi everyone. Today we are very happy to have Ilias Diakonikolas from
Columbia University, a Ph.D. student of Mihalis Yannakakis. He's going to give a talk on threshold
functions, approximation, learning, and pseudorandomness.
>> Ilias Diakonikolas: Thanks, Eyal. Can you hear me? It's like a small room, I guess.
Thank you for inviting me. So this talk -- okay. You see the title. It's joint work with a large
number of people. I will mention them, I guess, during the talk.
Okay. So let me start. So first I will start with regression. I will talk about multi-objective
optimization for like five minutes. This is my thesis topic. But I decided to spend my time here on
like an equally important part of my research regarding threshold functions and applied
probability.
So let me spend a few minutes on that. And I would be happy to discuss more about it with
everyone interested in the one-on-one meetings. So what is multi-objective optimization?
Usually we are used to having some space, having one objective function and trying to find a
solution efficiently that optimizes this function.
But in life, things are more complicated. In practice we have more than one criterion.
So what do we do in this case? Usually in computer science, traditionally, we choose somehow
some combined function: we combine the objectives somehow and then optimize this function. So
essentially we reduce a multi-objective optimization problem to a single-objective problem.
However, in general, this approach, you know, doesn't really make sense, because in many
practical settings we don't really know what the combining function should be.
And this is the usual approach in multi-objective optimization. What do we do? What are we
interested in? We're interested in a more complicated object, the set of Pareto optimal solutions.
So all the solutions in the objective space that are undominated in some sense.
However, this approach of computing the Pareto set is practically intractable, even for the most
simple problems, say shortest path with two criteria, since the Pareto set can be exponentially
large in the size of the input. So we cannot really compute it. So the underlying goal in this area is to
somehow efficiently find some approximation, some succinct approximation to the solution space;
that is, to the Pareto set. And do this efficiently.
Okay. So this has been the focus of my Ph.D. thesis. And this is an example of a multi-objective
problem. So we have a graph, a source, a sink, and every edge has two different weights,
like length and cost. I want to find the path that's as short as possible and as cheap as possible.
However, these two goals are conflicting. There's no single path that optimizes both of them
simultaneously. So in the objective space this set of points is the Pareto set. And somehow I
want to efficiently approximate it.
So the important observation here is that this set is not given to us explicitly. Otherwise things
would be a bit easier. It's given implicitly through
the instance. So we have to find a way to approximate it in polynomial time without explicitly
constructing the entire set.
Okay. I think this is it. And I'll be happy to discuss in person.
Okay. So now let me move to the actual talk. The talk will be about linear threshold functions
and, more generally, low-degree polynomial threshold functions. And let me start
by defining these functions.
What is a linear threshold function? It's a boolean function. The domain is the hypercube, the range
is {0,1} or {-1,1}, and it's expressible in this form, the sign of an affine form. So essentially what is
it? It's a partition of the hypercube by a hyperplane. So one side of the hyperplane is the plus
points, the other side is the minus points. So these functions, you know, even though they might
look very simple, even very basic questions about them are really challenging
to answer. And they have been studied in various fields because they are related to important
problems in these fields.
In particular, perhaps the most important such field, and the one most appealing
to you, might be machine learning, where we have the Perceptron algorithm, the Winnow
algorithm, support vector machines and boosting. So all these algorithms and notions
are intimately related to linear threshold functions. And perhaps the problem of learning an
unknown such function has been one of the most influential problems in machine learning, both in
theory and in practice.
In complexity theory, I mean, they appear in many different settings. One of
the most embarrassing open problems in circuit complexity is this. I give you a depth-two
circuit of polynomial size whose gates are linear threshold functions.
Try to find a lower bound against this class of circuits, depth-two threshold circuits. We know no
lower bound. It's conceivable that this class contains all of NP. Of course, this is ridiculous; it
shouldn't. But we don't know how to prove a lower bound.
>>: So a lower bound on the --
>> Ilias Diakonikolas: So we need to find some function in NP that is not computable by such a
circuit, by a polynomial-size threshold circuit. We don't know of any such function. So there are
conjectures. We know how to solve special cases of this problem but we don't know how to solve
the general problem.
So this is one. This class is called TC0. But even some other results you might be familiar with,
like hardness of approximation, the Majority Is Stablest theorem, all this stuff, they have
intimate connections to linear threshold functions. And also, you know, a field outside computer
science, or I guess at the intersection: social choice theory, where these functions are viewed as
voting schemes. So these x_i's here, the variables, are the voters. So everyone votes
plus one or minus one, and the weights essentially represent the influence of a voter.
Okay. So, okay, this is the definition. These functions are important. So now let me mention some
very basic facts. I care about these functions over the hypercube for the purpose of this talk.
Some very basic facts: since the domain is finite, et cetera, we can assume the
weights, the w_i and theta, are integers. But, unfortunately, they have to be exponentially
large to be able to represent every such function.
Many common functions you have seen in your life belong to this class;
perhaps the most common one is the majority function.
Okay. So a generalization of linear threshold functions is the class of polynomial threshold
functions. So these are functions that are again over the cube but are expressible as the sign of a
low-degree polynomial. For the purpose of this talk, d will denote the degree of this polynomial; it
will be assumed to be some absolute constant.
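To fix notation, here is how I would write the two definitions in symbols (a sketch in my notation; the talk uses w_i, theta and p informally):

```latex
% linear threshold function (halfspace) on the hypercube
f(x) = \mathrm{sign}\bigl(w_1 x_1 + \dots + w_n x_n - \theta\bigr), \qquad x \in \{-1,1\}^n,
% degree-d polynomial threshold function: the sign of a degree-d multilinear polynomial
f(x) = \mathrm{sign}\bigl(p(x)\bigr), \qquad \deg(p) \le d, \quad d = O(1).
```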
Okay. So now let me move to my first set of results: analyzing the sensitivity of low-degree
polynomial threshold functions. So I need to define the notion of the influence of a variable in a
boolean function. I'm sure most of you in this group have seen this before.
Okay. So what is the sensitivity of an input in a boolean function? We
take x and we see how many of its neighbors get a different value under f. Okay. So
how many edges of the hypercube at x are bichromatic. Then we take the expectation of this over
all x, where x is uniformly distributed.
Okay. For every function f this notion of the average sensitivity is between zero and n. An
alternate, equivalent way to view this is via the edges in the i-th direction: the influence of
the i-th variable in the function f is the probability, over a random x, that if we flip the i-th bit of x
the value of the function changes. And the total influence, the sum of the individual influences, is
easily seen to be equal to the average sensitivity, the notion I defined before. So this notion is well
understood for linear threshold functions.
It's actually very easy to show that every linear threshold function has average sensitivity at most
square root of n. And this is actually true for every monotone function, even for every unate
function. And the majority function is the unique maximizer. So every linear threshold function has
average sensitivity at most square root of n, and the majority function attains this. So the bound is tight.
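As a concrete illustration (a small brute-force sketch of my own, not from the talk), one can check the square-root-of-n behavior for small n directly:

```python
# Minimal brute-force sketch: compute the average sensitivity
# AS(f) = sum_i Pr_x[f(x) != f(x with bit i flipped)] for the majority
# function on n bits, and compare it with sqrt(n).
from itertools import product
from math import sqrt

def majority(x):
    # x is a tuple of +1/-1 values; n is odd, so there are no ties
    return 1 if sum(x) > 0 else -1

def average_sensitivity(f, n):
    total = 0
    for x in product([-1, 1], repeat=n):
        fx = f(x)
        for i in range(n):
            y = list(x)
            y[i] = -y[i]
            if f(tuple(y)) != fx:
                total += 1
    # averaging the number of sensitive coordinates over the 2^n inputs
    # is exactly the sum-of-influences definition
    return total / 2 ** n

for n in [3, 5, 7, 9]:
    print(n, average_sensitivity(majority, n), sqrt(n))
```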
So I care about this question for higher degrees. So what is the average sensitivity of degree-d
polynomial threshold functions?
So until very recently nothing was known about this, nothing nontrivial, nothing beyond the trivial
upper bound of n. And this question has an obvious combinatorial interpretation: it is the
number of edges of the hypercube that are sliced by the polynomial surface p = 0.
So let us do a simple example. As we said, the majority function essentially
attains the worst case, at least for the linear case, and this is
an example of a degree-two threshold function.
Okay. And, as I said, for every linear threshold function this upper bound is tight.
You know, the corresponding question for degree d is actually still open. We
have made some progress, which I will describe. So Gotsman and Linial conjectured in 1994 that
essentially the average sensitivity of any degree-d PTF is at most d times square root of n, and in
particular that the symmetric function that slices the middle d layers of the cube is the worst case.
So this function actually attains this upper bound. This question is still open even for d equals 2,
but we are able to show something. We were able to make the first progress, in joint work with
Raghavendra, Servedio and Tan. We are able to show an upper bound of n to the
1 minus 1 over 5d. So this is nontrivial for every constant d, and in fact even for d up to, I don't
know, some power of log n.
And we also have a very different proof that gives a better bound when d is small; for example,
for degree two we can get an upper bound of n to the three-quarters.
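In symbols, the conjecture and the bound just described read roughly as follows (my paraphrase; constants are suppressed, and the exponent is as stated in the talk):

```latex
% Gotsman-Linial conjecture (1994): for every degree-d PTF f on n variables,
\mathrm{AS}(f) \;\le\; O(d)\cdot\sqrt{n},
% with the symmetric PTF slicing the middle d layers of the cube as the conjectured extremal example.
% Bound from the talk (with Raghavendra, Servedio and Tan; independently Harsha, Klivans and Meka):
\mathrm{AS}(f) \;\le\; 2^{O(d)} \cdot n^{\,1 - 1/(5d)}.
```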
>>: So does that constant get bigger as d goes to infinity?
>> Ilias Diakonikolas: Yes. So it's like a hundred to the d. It's a hundred to the d.
Okay. I want to mention that similar results were proved independently by these guys, Harsha,
Klivans and Meka, using relatively similar techniques. So, okay, we don't prove the conjecture,
but we make the first progress on this open problem, and actually this upper bound
suffices to get some nice learning applications, which I will mention later.
So the first application is to noise sensitivity. Eyal is here, so I don't know what I should say.
Okay. What is the noise sensitivity of a boolean function? I take x and flip each bit of the input
independently with probability epsilon. So this corresponds to adding some random noise to the
input. And I want to calculate the probability that the value of the function changes. Okay. So
x is uniform here, and y is obtained from x by flipping each bit with probability epsilon.
Okay. This probability is the noise sensitivity of f at noise rate epsilon. So this has been a
very influential notion in several areas of mathematics. I guess most of you are experts
in some of these areas.
So I won't say more. So, okay, so this is the definition. Now, one basic observation is
that if we have an upper bound on the noise sensitivity of a boolean function, then this
translates relatively easily to an upper bound on the average sensitivity. So in particular it is
essentially straightforward that the average sensitivity of f, for any boolean function, is at most
order of n times the noise sensitivity at rate 1 over n. This is essentially trivial and true for every
boolean function. So what we're able to show is a converse for the class of degree-d PTFs,
which is obviously not true in general. So we prove that any upper bound on the average
sensitivity of degree-d PTFs translates to a similar upper bound on the noise sensitivity. And the
proof is essentially inspired by Peres's proof for linear threshold functions; it's actually very similar.
So this is a reduction. So what does it say? That if AS(n, d) is the maximum average
sensitivity of a degree-d PTF on n variables, then the noise sensitivity of any degree-d PTF at
noise rate epsilon is at most epsilon times the average sensitivity where n is 1 over epsilon.
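Written out, roughly (a paraphrase of the statement, with AS(n, d) as just defined):

```latex
% noise sensitivity at rate \epsilon: x uniform on \{-1,1\}^n, y obtained from x by flipping each bit
% independently with probability \epsilon
\mathrm{NS}_\epsilon(f) \;=\; \Pr_{x,y}\bigl[f(x) \ne f(y)\bigr],
% the trivial direction, valid for every boolean function f:
\mathrm{AS}(f) \;\le\; O(n)\cdot \mathrm{NS}_{1/n}(f),
% and the converse for degree-d PTFs (the Peres-style reduction from the talk):
\mathrm{NS}_\epsilon(f) \;\le\; O(\epsilon)\cdot \mathrm{AS}\bigl(O(1/\epsilon),\, d\bigr).
```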
Okay. And so this yields an upper bound of essentially epsilon to the 1 over 5d on the noise
sensitivity of degree-d PTFs, the first nontrivial upper bound on this quantity, too, and note it is
independent of n. And in particular for me this is interesting mostly because it gives two learning
applications, essentially immediately.
So one of them -- so I don't know if you're familiar with these things, with learning,
but it's like straightforward that an upper bound on the noise sensitivity implies Fourier
concentration, implies that the Fourier spectrum of the boolean function has very little mass above
some level.
And this kind of condition, Fourier concentration, is related to learning
algorithms: if we can prove concentration for a class of functions, then we can learn it, roughly speaking.
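For intuition, the standard connection, as I recall it from the low-degree-algorithm literature (e.g. Klivans-O'Donnell-Servedio and Kalai-Klivans-Mansour-Servedio; constants are only indicative), is:

```latex
% noise sensitivity controls the Fourier tail:
\sum_{|S| \ge k} \hat{f}(S)^2 \;\le\; O\bigl(\mathrm{NS}_{1/k}(f)\bigr),
% so a bound \mathrm{NS}_\epsilon(f) \le \epsilon^{\Omega(1/d)} gives concentration at level
% k = (1/\epsilon)^{O(d)}, and L1 polynomial regression over the monomials of degree at most k
% then yields agnostic learning in time roughly n^{O(k)}.
```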
Okay. And in particular our result implies that the class of degree-d PTFs, for any constant d, is
learnable in polynomial time in the agnostic model. The agnostic model is one of the most
realistic models of learning; it incorporates adversarial noise.
Okay. If there is no noise, it was known that degree-d PTFs are learnable. But in practice there is
noise, and based on the bound on noise sensitivity we can get this learning application. This is
the first application. I can say more about it if you're interested, but it is essentially
straightforward from known techniques given the upper bound.
The second application is PAC learning a class of circuits, in particular constant-depth circuits
with a super-constant number of threshold gates. I don't know, perhaps this problem seems a bit
obscure to you, but it has been an open problem in learning for a while, since at least 2000, to do
this. It was known how to learn this class of functions when there was one majority gate at the top,
but it wasn't known for arbitrary threshold gates. And actually there is an intrinsic difference
between these two cases.
Okay. So this is the result.
>>: Can you go back to the previous slide?
>> Ilias Diakonikolas: So it's not polynomial. It's exponential. But it's the best we know. The
previous was trivial.
So the idea here to learn these circuits is to hit them with a random restriction; what you get will
be low degree with high probability, and then Fourier concentration, elementary machinery and the
upper bound give the result. It's not immediate. And actually I was not involved with this work; I
just mention it as an application.
>>: This is by --
>> Ilias Diakonikolas: Gopalan and Servedio. The slides were made in haste. I shall continue.
I'll give you some rough idea of the proof. So basically the main open problem here is to actually
prove the Gotsman-Linial conjecture, to prove that for any degree-d PTF the average sensitivity is at
most d times square root of n. It would be nice; it's a very natural conjecture and I would like to
solve it.
It actually has some other implications, but I don't know if I have the time to go into it. I can
discuss it later.
Okay. So let me give you a very rough idea of how the proof goes. And actually this pattern
is a recipe that applies to essentially all the problems I'm talking about here related to degree-d
PTFs. So what is a degree-d PTF? It's the sign of a degree-d multilinear polynomial. So I
want to analyze it. How will I analyze it? Well, I don't know. So what is natural to do? If
the x_i's, the random variables in the polynomial, were Gaussian, I would be able to say
something. I know Gaussians are nice. I know many things about low-degree polynomials
over Gaussians. Unfortunately, this is not the case. So what we do is break it into two cases.
First, consider the case of regular degree-d threshold functions, where regular means it essentially
behaves as if we were in the Gaussian setting, approximately, and then I try to reduce the general
case to the regular case. So this reduction will lose something, but that's life. So this is the
general recipe. It applies to this problem and also to many other problems,
in particular to the problem we will discuss today. So again: first solve the regular case,
and I will say in a moment what regular means; then reduce the general case to the regular case.
In particular, what is regular here? Regular means that I look at the influences. So I want to -- let
me use a board for a bit. How much time do I have?
>>: Still have time.
>> Ilias Diakonikolas: Okay. I just want to -- there's lots of material. That's why I want to -- so I
have the function f, okay, the boolean function. Can you see this? That is the degree-d PTF,
the sign of a degree-d polynomial. And I look at the influences of the variables in this
polynomial,
the influences of the variables in p. Okay. So if all these influences are small, then by using the
invariance principle of [indiscernible] I can essentially relate the distribution of p(x), where x is
Bernoulli, to the distribution of p(Z), where Z is Gaussian. So these distributions are close to
each other, up to an error that depends on this tau, which is the regularity parameter.
Since I have this, these two distributions are close, I can deduce that p(x), okay, is actually
anti-concentrated. So the probability that it puts substantial mass in a small interval is small.
The reason I know this is because I know this is true for low-degree polynomials over Gaussians.
So this is the main intuition that we use. And this essentially suffices to prove the result
for the regular case. I won't go over the definition of the influence; I'm sure all of you know it
already. Okay. So under this regularity constraint the argument is simple. We can
easily show, by a combination of a degree-d Chernoff bound, which follows from hypercontractivity,
and an anti-concentration bound, which follows from invariance plus the same property for
low-degree polynomials over Gaussians, that the average sensitivity of low-influence degree-d PTFs
is small. It's a very simple, very simple argument.
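Schematically, the two ingredients look like this (my paraphrase; the second is the Carbery-Wright-type Gaussian anti-concentration inequality, and all constants are only indicative):

```latex
% (1) concentration for a degree-d polynomial p, via hypercontractivity:
\Pr_x\bigl[\,|p(x)| \ge t\,\|p\|_2\,\bigr] \;\le\; e^{-\Omega(t^{2/d})} \quad (t \ge C_d),
% (2) Gaussian anti-concentration (Carbery-Wright type), for \|p\|_2 = 1 and any point t:
\Pr_{Z \sim N(0,I_n)}\bigl[\,|p(Z) - t| \le \epsilon\,\bigr] \;\le\; O(d)\,\epsilon^{1/d},
% and the invariance principle transfers (2) from Gaussian Z to uniform x in \{-1,1\}^n,
% up to an error that goes to 0 with the regularity parameter \tau.
```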
The difficult thing is to actually do it in general. To actually reduce the general case to the regular
case, and this requires like the new machinery. So how do we do this?
>>: I think you went too fast. So how would you -- go back up one more slide. So everybody
understands. One more slide.
>> Ilias Diakonikolas: So are you interested in this? I think I can give it to you. I just don't think
that I will have the time to do it.
>>: So maybe off --
>> Ilias Diakonikolas: Okay. I can go through it. So basically we break up -- so I want to upper
bound the influence of the first variable, okay, the influence of the first variable in the threshold
function f equals sign of p. What do we do? We write the polynomial like this, and
essentially by degree-d Chernoff bounds we know that the probability that this part
is large is small, and the reason is that its L2 norm is small,
essentially bounded by the influence.
So the probability that this is big is small. That's the first term; and the second term, the
probability that the polynomial is very close to the threshold, is also small.
So by taking a union bound we upper bound the influence of the first variable, and we multiply by n
and get the overall bound. Make sense?
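Here is my reading of that argument in symbols (a sketch, up to constant factors; t is a threshold to be optimized):

```latex
% write p(x) = x_1 q(x_2,\dots,x_n) + r(x_2,\dots,x_n); flipping x_1 changes p by 2 x_1 q, so the
% sign can only change when |q| is large or when p is close to the threshold:
\mathrm{Inf}_1\bigl(\mathrm{sign}(p)\bigr) \;\le\; \Pr\bigl[\,|q(x)| \ge t\,\bigr] \;+\; \Pr\bigl[\,|p(x)| \le 2t\,\bigr],
% the first term is small by the concentration bound, since \|q\|_2^2 = \mathrm{Inf}_1(p) is small
% in the regular case, and the second term is small by the anti-concentration bound above.
```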
Okay. So now let me say a few words about the reduction, which I think is the most interesting
part, which is why I skipped the previous things.
So what do we do? Okay. So I have a general degree-d PTF, and it's not regular. What does this
mean? It means there's some variable in the polynomial p that defines it that has large
influence. So the structural lemma that we prove is this, roughly: there is a way to restrict
a small number of variables such that the restricted function is sufficiently regular with
constant probability.
In particular, there is a small set of variables, roughly log n over tau where tau is the regularity
parameter, such that a random restriction of this set of variables is tau times (log n) to the d
regular for at least a 1 over 2 to the d fraction of the restrictions.
And to do this we use some new tool, what we call the critical index of a degree-d
polynomial, which essentially quantifies how fast the influences decrease. This is a bit
technical to give you on the board in one sentence. This is the main structural result.
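As a rough illustration, in the degree-1 (LTF) case, with the weights sorted so that |w_1| >= |w_2| >= ..., one common way to define it is the following (my recollection of the definition; for higher degree one uses the influences Inf_i(p) in place of the w_i^2):

```latex
% the \tau-critical index of w is the least j \ge 0 such that the remaining tail is \tau-regular:
|w_{j+1}| \;\le\; \tau \cdot \Bigl(\sum_{i > j} w_i^2\Bigr)^{1/2},
% i.e. after fixing the j most influential variables, no remaining variable carries a large share
% of the remaining weight.
```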
So based on this, okay, we can just do a case analysis and get the upper bound
on the average sensitivity.
Okay. I think that's roughly it. Any questions?
>>: In the random restriction, do we know what order -- what order for the variables?
>> Ilias Diakonikolas: Yes. So I order the variables according to their influence, in decreasing
order of influence. And then, you know, roughly, I randomly restrict a subset of the most influential
ones. And, you know, for most of the restrictions, what I get will be relatively regular. What is
difficult here? What is difficult is that when I restrict variables, the influences change. But because
the polynomial has low degree, they don't change by a lot.
That's rough intuition. That's not exact -- I'm lying a bit here. There is a case where this cannot be
done, but it can be handled.
Okay. So this sort of summarizes this work, upper bounding the average sensitivity of
degree-d PTFs. And I guess I mentioned the applications to learning. Now, a strengthening of
these results, a strengthening of this average sensitivity result, yields some kind of regularity
lemma for PTFs, which I think is useful probabilistically, but it's also crucial for the second
result which I'm going to talk about, which is pseudorandomness for PTFs. So basically the result,
what I tell you, is this. This is what we proved with Poisso Logan and Miank [phonetic]:
for any n-variable degree-d PTF we can restrict a small set of variables such that
for a constant fraction of restrictions we get a regular PTF, where the regularity parameter is worse
than tau; it's like tau times polylog n. Note that there's a dependence on n in this bound, and
this is not good enough for some of the applications, in particular for the pseudorandomness
applications. However, we can strengthen it with a more careful analysis and get something like
this.
So we can restrict a set of d over tau many variables, okay, with no dependence on n, and get for a
constant fraction of restrictions a regularity of tau times (log 1 over tau) to the d. Actually, it's
not, you know, easy to go from here to here. But it can be done.
Okay. So based on this stronger version of the structural lemma we can get a regularity lemma for
degree-d PTFs, which essentially says what? That any degree-d PTF can be decomposed as a
decision tree of this depth, only a function of tau, okay, and such that essentially most of the
leaves of this tree are regular degree-d PTFs. So this is the statement. I should explain it a
bit because I think it's important for the rest. So what do we do? We start from a degree-d PTF and
we carefully, you know, we carefully restrict variables. So we get a decision tree. We can do this
for every boolean function. The important thing is that degree-d PTFs have some structure.
This allows the depth of the tree to be a constant, only a function of tau, and the leaves, which
correspond to the restrictions of the variables on the corresponding path, are regular.
So this essentially allows us to reduce, you know, several questions on degree-d PTFs from the
general case to the regular case, so this is essentially a reduction.
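In one sentence, my paraphrase of the statement:

```latex
% Regularity lemma for PTFs (paraphrase): every degree-d PTF f = sign(p) can be computed by a
% decision tree, querying single variables, whose depth is a function of d and \tau only
% (independent of n), such that for all but a \tau fraction of random root-to-leaf paths the
% restricted function at the leaf is either \tau-close to a constant or a \tau-regular degree-d PTF.
```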
Yes.
>>: Do you construct the decision tree with the greedy algorithm, each time choosing the variable
that has the highest influence and restricting that?
>> Ilias Diakonikolas: We don't construct it exactly like this. We construct -- we construct it
recursively. We want this statement to be true with high probability. However, the previous
lemma here holds with constant probability, 1 over 2 to the d.
So unfortunately what we have to do is apply the lemma recursively; for the good leaves we are
okay, and for the others we recurse until the probability becomes very high. So it's not -- we don't
take the most influential variables and do it just once. We do it many times. So when we
apply this lemma at a leaf, the influences have changed.
>>: What I'm asking is just to compare this decision tree to the one where you first choose the
most influential variable, fix it and recompute the influences, get the next most influential variable
and so on.
So that's also a way to make a decision tree, which would give you a new function. I'm asking how
that would perform compared to your decision tree.
>> Ilias Diakonikolas: Yeah, I don't -- I'm not exactly sure how that would perform. I really
think in the worst case this would not work, because essentially here we crucially use the fact that
we restrict a specifically defined set of variables in every step.
I will discuss this more individually with you if you're interested. I can tell you more how this
works. So this lemma here allows us to do several things. So one thing is -- this is like a picture.
So one thing it allows us to do is get low-weight approximators for degree-d PTFs. We can
approximate every degree-d PTF over the uniform distribution by another degree-d PTF whose
weight is some function of epsilon times n to the d. This dependence on n,
the n to the d, is optimal for this problem.
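Stated a bit more explicitly (my paraphrase of the slide; W(d, epsilon) is just shorthand for the weight bound):

```latex
% Low-weight approximation (paraphrase): for every degree-d PTF f and every \epsilon > 0 there is a
% degree-d PTF f' = \mathrm{sign}(q), with q having integer coefficients of total magnitude at most
% W(d,\epsilon)\cdot n^{d}, such that
\Pr_{x \sim \{-1,1\}^n}\bigl[f(x) \ne f'(x)\bigr] \;\le\; \epsilon,
% and the n^d dependence cannot be improved.
```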
Now, this problem is motivated by learning, because low-weight threshold functions are nice for
several reasons. For example, algorithms like Perceptron and many heuristics work better,
and actually have provable guarantees, for low-weight PTFs as opposed to general
ones.
Okay. And now let me move to the last part of my talk, which is about pseudorandom generators
against PTFs. In particular, I'll show you something that I think might be interesting for you,
seeing as you're doing probability. I'll show you a derandomization of the central limit theorem via
bounded independence, at least a special case of the theorem.
So actually let me go straight to this and then come back. Okay. So here is a version of the
Berry-Esseen theorem: I have a linear combination of independent Bernoulli random variables, and I
normalize the coefficients so that the sum of the squares is 1. If every weight is at most epsilon
in absolute value, then I know that the Kolmogorov distance between the CDF of X, which is this
linear combination of the Bernoullis, and the CDF of the Gaussian is at most epsilon.
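For reference, one standard form of the statement being derandomized (my phrasing):

```latex
% Y_1,\dots,Y_n independent uniform \pm 1, weights normalized so that \sum_i w_i^2 = 1 and
% \max_i |w_i| \le \epsilon; write X = \sum_i w_i Y_i and \Phi for the standard normal CDF.  Then
\sup_{t \in \mathbb{R}} \bigl|\Pr[X \le t] - \Phi(t)\bigr| \;\le\; O(\epsilon).
```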
Okay. This is the Berry-Esseen theorem. Now, a special case of what I will show you
says that this theorem remains true if we only assume that the bits in the linear
combination of Bernoullis have sufficient independence, 1 over epsilon squared independence. So
for any joint distribution on the Y_i's that has this amount of
independence, the Berry-Esseen theorem still holds. I don't know, I hope this is not known, and
I believe it's not known in probability. Actually, I think a very special case of this was given by
Benjamini, Gurel-Gurevich, who is here, and Peled. That's right, a very special case of this was
given in your paper.
But this is more general. So let me go back now and tell you how this central limit theorem
is related to linear threshold functions. The motivation for me is:
how important is randomness in computation, and do we need randomness?
In general, randomness is very useful, but actually getting perfect randomness is very hard.
So many times we have faulty randomness, or we want to be able to say something
about the running time or the performance of randomized algorithms,
assuming the randomness is faulty. There's an entire area in the field of computation called
derandomization. I care about a specific subclass of this theory, in particular what's the power
of bounded independence against natural classes of functions, and in particular
against threshold functions.
Okay. So a distribution is called k-wise independent if its projection onto any subset of k
of the variables is uniform.
There are many explicit constructions of such
distributions with support of size n to the k, and this is optimal. So this corresponds to a number
of random bits of k log n, as opposed to the n bits that the uniform distribution requires.
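As a small illustration (my own sketch, not from the talk slides) of how few random bits such distributions need, here is the classic construction for k = 2: from m truly random bits, the XORs over all nonempty subsets give 2^m - 1 pairwise independent uniform bits, so the seed length is logarithmic in the number of output bits.

```python
# Pairwise (2-wise) independent bits from a short seed: output bit S is the XOR of the
# seed bits indexed by the nonempty subset S.  Any two distinct outputs are jointly uniform.
from itertools import product, combinations

m = 3
subsets = [S for r in range(1, m + 1) for S in combinations(range(m), r)]

def outputs(seed):
    # XOR (sum mod 2) of the seed bits indexed by each nonempty subset
    return [sum(seed[j] for j in S) % 2 for S in subsets]

# brute-force check of pairwise independence over the uniform seed
for i in range(len(subsets)):
    for j in range(i + 1, len(subsets)):
        counts = {}
        for seed in product([0, 1], repeat=m):
            y = outputs(seed)
            counts[(y[i], y[j])] = counts.get((y[i], y[j]), 0) + 1
        # each of the 4 value pairs should appear equally often
        assert all(c == 2 ** m // 4 for c in counts.values()), (i, j)

print("all", len(subsets), "output bits are pairwise independent")
```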
And the theorem we proved recently with Gopalan, Jaiswal, Servedio and Viola is that any distribution
on the cube that has sufficient independence, independence at least roughly 1 over epsilon squared,
fools the class of linear threshold functions, meaning that if we take the expectation of any
such function under this distribution D, which is k-wise independent, and the expectation of this
function under the uniform distribution, then these are epsilon-close.
So this class of functions cannot distinguish, up to epsilon, the fully uniform distribution from a
k-wise independent one if k is large enough. And note that k is completely independent of n here,
which is something that we would expect for [indiscernible] in this case.
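In symbols (my paraphrase, with the tilde hiding the log factors mentioned next):

```latex
% Bounded independence fools halfspaces (paraphrase): if D is any k-wise independent distribution
% on \{-1,1\}^n with k = \tilde{O}(1/\epsilon^2), then for every linear threshold function f,
\bigl|\,\mathbb{E}_{x \sim D}[f(x)] \;-\; \mathbb{E}_{x \sim U}[f(x)]\,\bigr| \;\le\; \epsilon,
% with k independent of n.
```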
So this theorem is optimal up to the log squared factor. In particular, something stronger is known:
even for the majority function one essentially needs 1 over epsilon squared independence; this was
proved by Benjamini, Gurel-Gurevich and Peled.
From this theorem, I claim, we immediately get the derandomization of the central limit theorem
I mentioned. Because what is a linear threshold function? It's the sign of a linear
form. So the CDF of this random variable X at the point t is exactly the probability that the
corresponding halfspace evaluates to minus 1.
So this statement about fooling halfspaces is equivalent to the derandomization of the central limit
theorem.
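Concretely, with X the normalized linear combination from before and ignoring ties at the boundary:

```latex
\Pr[X \le t] \;=\; \Pr\Bigl[\mathrm{sign}\Bigl(\textstyle\sum_i w_i Y_i - t\Bigr) = -1\Bigr],
% so fooling the halfspace \mathrm{sign}(w \cdot y - t) for every t is the same as matching the CDF
% of X pointwise up to \epsilon, i.e. the derandomized Berry-Esseen statement above.
```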
>>: Does this convergence in distribution have a variant for densities?
>> Ilias Diakonikolas: For the densities? No.
>>: [indiscernible].
>> Ilias Diakonikolas: The thing is that with k-wise independence we have so much smaller support.
The density won't be uniform everywhere; it will be zero outside the support. For densities, I don't
know what to say.
Okay. Now, I guess I hope this is interesting for you. It is definitely interesting in
derandomization, because it gives the first explicit pseudorandom generator for this class of
function. This has been an open problem for a while. Previously only special cases were known.
And there are two main open questions. One question is what happens for degree greater
than 1: does constant independence suffice, meaning only a function of d and epsilon? That's
the main open question from this. And also, you know, in the context of
derandomization, it would actually be interesting to get a generator with support polynomial in n
and 1 over epsilon. The k-wise independent distribution has support n to the k, and therefore here
we fool halfspaces with support n to the 1 over epsilon squared. It would be interesting to actually
get support polynomial in n and 1 over epsilon, too.
And I mean this should be possible; if it's not possible, then BPP is not equal to P, and in
complexity theory it's a standard conjecture that they are equal.
Okay. So let me briefly describe some recent progress I have on this question, on the first
question, with fellow students Daniel Kane and Jelani Nelson. So Daniel is at
Harvard and Jelani is at MIT.
Part of this work happened at IBM Almaden last summer. So we were actually able to
prove that bounded independence suffices for the case of degree-two PTFs, in particular with the
same kind of dependence, polynomial in 1 over epsilon, and we actually introduced some interesting
analytic techniques in this paper that I think will have other applications.
So degree two might seem a bit specialized, and in fact we don't know how to
prove the statement for general degree, but what we do know how to
prove is this: for any degree d, to prove that bounded independence suffices to fool degree-d
PTFs, it suffices to prove the statement for regular degree-d PTFs. So, because of the regularity
lemma I described before, the general question can be reduced to the regular
case. And in fact the most challenging part of this derandomization problem is actually to solve
the regular case; it's essentially as hard as the general case.
Okay. And in order to be able to do this, I mean, there is a standard way to approach this
question. You need to construct sandwiching polynomials for these functions, in a technical
sense.
Okay. I don't know if you're familiar with this condition, but it's a standard duality argument.
To prove that a boolean function F is fooled by k-wise independence, it suffices, and is in fact
necessary, to find two polynomials Q upper and Q lower that have small degree, degree at most
k, that sandwich F from above and below on the domain, on the cube, and such that the
expectation of their difference under the uniform distribution is at most epsilon.
So the one direction of this equivalence, so here I guess I should say if and only if, is
straightforward; it follows just by taking expectations. The other direction uses LP duality. We
don't even need the other direction for the proof; we just need the straightforward one. Okay.
So the main challenge for all these problems is to be able to construct these low-degree
polynomials with these properties for degree-d PTFs.
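The easy direction, written out (this is the standard argument; the key point is that a k-wise independent distribution matches the uniform distribution on every polynomial of degree at most k):

```latex
% suppose Q_\ell \le F \le Q_u pointwise on \{-1,1\}^n, with \deg(Q_\ell), \deg(Q_u) \le k and
% \mathbb{E}_U[Q_u - Q_\ell] \le \epsilon.  Then for any k-wise independent D,
\mathbb{E}_D[F] \;\le\; \mathbb{E}_D[Q_u] \;=\; \mathbb{E}_U[Q_u] \;\le\; \mathbb{E}_U[F] + \epsilon,
% and symmetrically \mathbb{E}_D[F] \ge \mathbb{E}_U[F] - \epsilon, so D fools F to within \epsilon.
```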
Okay. And in particular, for the d equals 1 case, the case of linear threshold functions,
we do this: we construct an approximation to the univariate sign function. The sandwiching
polynomials are n-variable, degree-k polynomials, and I want to construct them somehow. So I do this
using symmetrization. So I construct a good approximation to the univariate sign function under
the Gaussian distribution, and then I just plug in w.x. For the regular case, I know w.x
behaves approximately like a Gaussian. So this essentially reduces the problem from n
dimensions to 1 dimension, and it actually works in the linear case.
I mean, there are many [indiscernible] to be worked out, and we use approximation theory to
construct these univariate approximations to the sign function. But this is the rough idea.
Now for d equals 2, this univariate approach does not work. We cannot just construct univariate
approximations to the sign function; approximating any function of one variable does not suffice,
and we need to do much more. We need to treat the problem in higher dimensions, and this essentially
is what we do in the paper with Kane and Nelson.
This I think gives you like a very rough idea of the proof and let me give you some developments
afterward. And, you know, this paper was essentially published, made available, in February. So
since then there have been a bunch of results on pseudorandomness for linear
threshold functions and higher-degree threshold functions.
In particular, Meka and Zuckerman were able to actually construct PRGs, pseudorandom
generators, for degree-d PTFs. However, the generators are not based on bounded independence;
they use some other specialized distributions. So the problem of whether bounded independence
fools higher degrees is still open, and I think it's quite challenging. Gopalan, O'Donnell, Wu and
Zuckerman generalized our results for the linear case to product distributions. For example,
they get a more general derandomization of the Berry-Esseen theorem, where the x_i's are
not necessarily plus-minus-one Bernoulli but belong to some product distribution with some
general moment assumptions.
Harsha, Klivans and Meka have results on intersections of halfspaces; actually the paper with Kane
and Nelson, my paper, had results for such functions, too. And finally [indiscernible]
were able to say something about a special case of degree-d PTFs, and I don't know much
about the result.
Okay. Again, so I think the message is, again, that in this case, too, what we do is we first solve
the regular case and then we reduce the general case to the regular case, so this is also an
example of the recipe for approaching problems about low-degree PTFs.
Do I have time? Five minutes, okay. So I guess let me give you a very rough idea of how
this univariate polynomial for the sign function works. Again, the context is that I want to fool
linear threshold functions, and this is equivalent to constructing some univariate polynomial that's
a good approximation to the sign function under the Gaussian distribution. So how do I do this and
what are the properties of this function? So imagine, imagine the sign function on the real line,
with the Gaussian distribution on it.
So I want my polynomial to look sort of like this. Okay. So what are the properties of the
polynomial? I have a decomposition of the line into regions. First of all, this is the standard
picture; this is what's important. So the first thing is that in the region where the Gaussian
distribution puts most of its mass, the error between the sign function and my approximation is
small, at most epsilon. Okay. I can guarantee this.
The second is that in the region close to the origin, where the sign function is discontinuous, you
know, the error is at most a constant. This is not that hard. However, I have to make -- I have to
make this region narrow enough so that it doesn't have lots of probability mass. But this is not a
problem, because I know the Gaussian distribution is anti-concentrated: it puts
mass at most order epsilon in any interval of length epsilon. And therefore if I make this
interval narrow, I'm okay. And the third region is the region where the polynomial diverges and
the error between the polynomial and the sign function is big. However, for that region I'm also
okay, because since the polynomial has low degree it cannot increase very fast, and I know that
the decay of the Gaussian tail is very fast. So these two things, you know, can be balanced.
So essentially the contribution to the expectation from this region is also small.
Okay. And these are the properties of the polynomial. The way to construct it is using some
combination of theorems from approximation theory.
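Roughly, the error accounting over the three regions looks like this (my paraphrase; delta and T are the widths of the middle and bulk regions):

```latex
% P is the univariate approximator to \mathrm{sign}, and g \sim N(0,1):
\mathbb{E}\bigl[\,|P(g) - \mathrm{sign}(g)|\,\bigr]
  \;\le\; O(1)\cdot\Pr[\,|g| \le \delta\,]
  \;+\; \epsilon \cdot \Pr[\,\delta \le |g| \le T\,]
  \;+\; \mathbb{E}\bigl[(|P(g)| + 1)\,\mathbf{1}_{\{|g| > T\}}\bigr],
% the first term is O(\delta) since the Gaussian density is bounded, the middle term is at most
% \epsilon, and the last term is small because a low-degree polynomial grows only polynomially
% while the Gaussian tail decays like e^{-T^2/2}.
```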
[applause]
>>: I guess the point of k-wise independence is that the first k moments are as if the bits were
completely independent. So in your setting you get something just by matching moments, and you're
plugging into whatever is known about how close you are to normal when the first k moments match
those of a normal. So I guess I would ask to what extent you are doing something different, whether
you're getting a better result than what you get just by looking at the moments.
>> Ilias Diakonikolas: Yeah, I think so. I think with the approach of just trying to use the moments
[indiscernible], I don't know how to make this work for the general case.
>> Eyal Lubetzky: Thank the speaker.
[applause]