Tuesday, July 18, 2006

Average-case complexity

What do we do with an NP-complete (optimization) problem of practical relevance? We strongly believe that there is no polynomial time exact algorithm, but what are the next best alternatives? Garey and Johnson, in 1978, list various possible ways to "cope" with NP-completeness, including looking for approximation algorithms and for algorithms that are efficient on average. Of course, when a problem is really hard, even these options may be intractable, and it would be good to know when this is the case.

Before the development of the PCP machinery, complexity theorists had almost no tools to study the complexity of approximation. When it comes to studying the average-case complexity of natural problems in NP, unfortunately, we are still at the stage of pre-PCP studies of approximation.

Indeed, it has been quite difficult even to lay out the ground rules of the study of average-case complexity: defining such basic notions as "computational problem," "efficient algorithm," "reduction," and "complete problem" in the average-case setting leads to surprisingly subtle difficulties.

Much of this foundational work was laid out in a famously succinct paper of Levin's, with important extensions appearing in a paper of Ben-David et al. and in a paper of Impagliazzo and Levin.

The main results and insights in this field as of 1995 are summarized in a paper by Russell Impagliazzo that is perhaps the best survey paper in theoretical computer science ever written.

Andrej Bogdanov and I have written a more recent survey on the subject. We are in the process of revising it for journal publication, and we would be very grateful to anybody who points out errors and omissions (especially in the references).

So, given that we have no tools yet to study the average-case complexity of natural problems, what exactly do we talk about for 58 pages?

There are, indeed, a number of interesting results and insights.

  1. To begin with, if we have a language L and a distribution D over inputs, how do we define the notion that an algorithm A is "efficient on average" for L with respect to D? The simplest definition is to require the expected running time to be polynomial. Indeed, all expected polynomial time algorithms satisfy the intuitive notion of being "efficient on average," but there are algorithms of expected superpolynomial running time that are also intuitively "typically efficient." Suppose for example that, for every n, our algorithm takes time n^2 on all inputs of length n, except one exceptional input on which the algorithm takes time 2^(2n). Then the expected running time over a uniformly random input of length n is about 2^n, even though the algorithm is fast on all but a single input. Levin's definition better captures the intuitive notion and includes such "pathological" cases. In Levin's definition, an algorithm is "polynomial on average" if, after running for t(n) steps, the algorithm can solve all but a poly(n)/(t(n))^c fraction of inputs of length n, for some c>0. (A small numerical sketch of this example appears after this list.)

  2. Now that we have a notion of efficiency, the next step is to have the average-case analog of an NP-complete problem. Consider the "bounded halting" problem BH where, on input a pair (M,x) where M is a non-deterministic Turing machine and x is a string of length n, we ask whether M can accept x in at most, say, n^2 steps. It's clear that BH is NP-complete. If L is a language in NP, and M is a non-deterministic machine that decides L in polynomial time, then the reduction from L to BH essentially maps x into (M,x). (We have to add some padding if M takes more than quadratic time to decide L, but the details are trivial.)

    Now let D be a distribution over inputs. Consider the reduction from L to BH that maps x into (M',C(x)), where C is an encoding function and M' is a machine that, on input y, first finds the x such that C(x)=y and then simulates M on input x. As long as C is injective and polynomial time computable, this is still a valid reduction. Levin shows that, if D comes from a certain class of "polynomial-time computable distributions," then we can find a C such that C(x) is "almost uniformly" distributed when x is sampled from D. This means that the reduction maps a random input from D into a pair (M',C(x)) that is "almost uniformly" distributed. (Technically, the resulting distribution has very high min-entropy or, equivalently, is "dominated" by the uniform distribution.) As a consequence, an algorithm that is good-on-average for BH under the uniform distribution yields an algorithm that is good-on-average for L under the distribution D. Recall that L was an arbitrary problem in NP and D was an arbitrary "polynomial time computable" distribution. (A toy sketch of the compression step behind C appears after this list.)

  3. If there is a good-on-average algorithm for BH under the uniform distribution, then there is a good-on-average algorithm for the search version of BH as well. (In the search version, we want to find an accepting computation of M if one exists.) The equivalence is trivial in the worst-case setting (see the sketch after this list), but in the average-case setting it requires an ingenious proof due to Ben-David et al.

  4. If there is a good-on-average algorithm for BH under the uniform distribution, then there is a good-on-average algorithm for every problem L in NP under any polynomial time samplable distribution. This is a result of Impagliazzo and Levin, and it properly generalizes the completeness result above. Samplable distributions are more general than computable distributions and capture essentially all "natural" distributions.

  5. If there is a good-on-average algorithm for BH under the uniform distribution, can we say that there are worst-case efficient algorithms for all problems in NP? Not if the connection is made via a non-adaptive reduction. This is a result Andrej and I proved, generalizing earlier work of Feigenbaum and Fortnow. Can we at least say that there are good-on-average algorithms for all L in NP and all distributions D? Such a conclusion would imply worst-case efficient algorithms for NP, so the answer is again probably not.

  6. So it is probably impossible to equate the worst-case complexity and the average-case complexity of NP-complete problems via reductions. What about different "degrees" of average-case complexity? If we define good-on-average algorithms to always run in polynomial time and to make a bounded number of mistakes, we can quantify average-case complexity by the fraction of inputs on which the best polynomial time algorithm makes mistakes. O'Donnell shows the equivalence of weak and strong assumptions about the existence of average-case-hard problems in NP. Ryan's proof brilliantly generalizes Yao's XOR Lemma.
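
To make the definition in item 1 concrete, here is a tiny numerical sketch of the pathological example, assuming the running times are exactly as described: n^2 steps on every input of length n except a single one, which takes 2^(2n) steps. The expected running time (over a uniformly random input) grows like 2^n, yet after only n^3 steps all but a 2^(-n) fraction of the inputs are solved, comfortably within Levin's poly(n)/(t(n))^c bound.

```python
# Toy model of the "pathological" algorithm from item 1 (an assumption made
# purely for illustration): n^2 steps on every length-n input except one,
# which takes 2^(2n) steps.

def expected_time(n):
    """Expected running time over a uniformly random input of length n."""
    return (2 ** n - 1) / 2 ** n * n ** 2 + (1 / 2 ** n) * 2 ** (2 * n)

def unsolved_fraction(n, t):
    """Fraction of length-n inputs not yet solved after t steps."""
    if t < n ** 2:
        return 1.0            # no input has finished yet
    if t < 2 ** (2 * n):
        return 1 / 2 ** n     # only the single bad input remains
    return 0.0

for n in (10, 20, 30):
    # The expectation blows up like 2^n, while after t = n^3 steps the
    # unsolved fraction 2^(-n) is far below, say, n^2/t = 1/n.
    print(n, round(expected_time(n)), unsolved_fraction(n, n ** 3))
```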
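
The heart of item 2 is the compression function C. The sketch below is a toy stand-in, not Levin's exact construction: for one concrete polynomial-time computable distribution (i.i.d. bits that are 1 with probability 3/4), it builds an injective code of length roughly min(|x|, log2(1/Pr[x])) from the cumulative distribution function, Shannon-Fano-Elias style. No output is hit with probability much larger than uniform strings of the same length, which is the kind of "domination" property the reduction needs.

```python
# Toy compression step for item 2 (a stand-in for the encoding C, not Levin's
# exact construction): inputs are length-N bit strings whose bits are i.i.d.
# 1 with probability 3/4, a simple "polynomial-time computable" distribution.

from fractions import Fraction

N = 8
P_ONE = Fraction(3, 4)

def prob(x):
    """Pr[x] under the toy distribution."""
    p = Fraction(1)
    for b in x:
        p *= P_ONE if b == "1" else 1 - P_ONE
    return p

def cdf(x):
    """Pr[X < x] in lexicographic order, computed bit by bit."""
    total, prefix_prob = Fraction(0), Fraction(1)
    for b in x:
        if b == "1":
            total += prefix_prob * (1 - P_ONE)   # mass of strings branching to 0 here
        prefix_prob *= P_ONE if b == "1" else 1 - P_ONE
    return total

def first_bits(value, nbits):
    """First nbits of the binary expansion of a Fraction in [0, 1)."""
    bits = []
    for _ in range(nbits):
        value *= 2
        bit = 1 if value >= 1 else 0
        bits.append(str(bit))
        value -= bit
    return "".join(bits)

def code_length(p):
    """Smallest k with 2^(-k) <= p (i.e. ceil(log2(1/p))), plus one."""
    k = 0
    while Fraction(1, 2 ** k) > p:
        k += 1
    return k + 1

def C(x):
    """Injective encoding of length about min(|x|, log2(1/Pr[x])) + 2."""
    p = prob(x)
    if p <= Fraction(1, 2 ** len(x)):
        return "0" + x        # x is no more likely than uniform: keep it as is
    # x is too likely: replace it with a short code for a point inside its CDF interval
    return "1" + first_bits(cdf(x) + p / 2, code_length(p))

xs = [f"{i:0{N}b}" for i in range(2 ** N)]
codes = [C(x) for x in xs]
print(len(set(codes)) == len(xs))          # True: the encoding is injective
print(max(len(c) for c in codes))          # codes are never much longer than the input
```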
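
Finally, for item 3, here is why the search-to-decision equivalence is trivial in the worst case. Given a decision oracle for questions of the form "does some accepting computation extend this prefix of non-deterministic choices?" (for BH such a question is itself just another BH-type instance, with the prefix hard-wired into the machine), one recovers an accepting computation bit by bit. The verifier and oracle below are toy stand-ins, with the witness length fixed for simplicity. On average the same argument breaks down, roughly because the queried instances are no longer distributed like the original ones, which is what the proof of Ben-David et al. has to overcome.

```python
# Worst-case search-to-decision, sketched: recover an accepting sequence of
# non-deterministic choices using only a decision oracle for
# "does some accepting witness extend this prefix?".

from itertools import product

def find_witness(exists_witness, length):
    """Return an accepting witness of the given length, or None if none exists."""
    if not exists_witness(()):
        return None
    prefix = ()
    for _ in range(length):
        if exists_witness(prefix + (0,)):
            prefix += (0,)
        else:
            prefix += (1,)    # some witness extends prefix, and none starts with 0
    return prefix

# Toy stand-ins: the "machine" accepts witness (x0,x1,x2) iff it satisfies
# (x0 or x1) and (not x0 or x2); the oracle is implemented by brute force.
def accepts(w):
    x0, x1, x2 = w
    return bool(x0 or x1) and bool(not x0 or x2)

def exists_witness(prefix):
    rest = 3 - len(prefix)
    return any(accepts(prefix + tail) for tail in product((0, 1), repeat=rest))

print(find_witness(exists_witness, 3))    # prints an accepting witness; here (0, 1, 0)
```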

Sunday, July 16, 2006

Empathy Ennui

Martin Kuz reports in the SF Weekly on the state of activism in San Francisco.