Friday, November 09, 2007

The December Issue of the Notices of the AMS

The December issue of the Notices of the AMS is now available online, and it includes letters written by Oded Goldreich, Boaz Barak, Jonathan Katz, and Hugo Krawczyk in response to Neal Koblitz's article which appeared in the September issue.

Despite this, the readers of the Notices remain the losers in this "controversy." Koblitz's petty personal attacks and straw man arguments appeared in the same space that is usually reserved, in the Notices, for expository articles and obituaries of mathematicians. It is from those pages that I learned about the Kakeya problem and about the life of Grothendieck (who, I should clarify, is not dead, except perhaps in Erdos' use of the word).

I find it strange enough that Koblitz would submit his piece to such a venue, but I find it as mind-boggling that the editors would run his piece as if they had commissioned Grothendieck's biographical article to a disgruntled ex-lover, who would focus most of the article on fabricated claims about his personal hygiene.

I can only hope that the editors will soon run on those pages one or more expository articles on modern cryptography, not as rebuttals to Koblitz's piece (which has already been discussed more than enough), but as a service to the readers.

And while I am on the subject of Notices article, let me move on to this article on how to write papers.

All beginning graduate students find the process of doing research mystifying, and I do remember feeling that way. (Not that such feelings have changed much in the intervening years.) One begins with a sense of hopelessness, how am I going to solve a problem that people who know much more than I do and who are smarter than me have not been able to solve?; then a breakthrough comes, out of nowhere, and one wonders, how is this ever going to happen again? Finally it's time to write up the results, and mathematical proofs definitely don't write themselves, not to mention coherent and compelling introductory sections. I think it's great when more experienced scholars take time to write advice pieces that can help students navigate these difficulties. And the number of atrociously badly written papers in circulation suggests that such pieces are good not just for students, but for many other scholars as well.

But I find that advice on "how to publish," rather than "how to write well" (like advice on "how to get a job" rather than "how to do research") misses the point (I am thinking of one of the few times I thought Lance Fortnow gave bad advice). For this reason, I found the first section of the Notices article jarring, and the following line (even if it was meant as a joke) made me cringe

I have written more than 150 articles myself. (...) I have never written an article and then been unable to publish it.

I think that this calls for an Umeshism in response.

Tuesday, November 06, 2007

The Impagliazzo Hard-Core-Set Theorem

The Impagliazzo hard-core set theorem is one of the bits of magic of complexity theory. Say you have a function $g:\{ 0, 1 \}^n \rightarrow \{ 0,1\}$ such that every efficient algorithm makes errors at least $1%$ of the times when computing $g$ on a random input. (We'll think of $g$ as exhibiting a weak form of average-case complexity.) Clearly, different algorithms will fail on a different $1%$ of the inputs, and it seems that, intuitively, there should be functions for which no particular input is harder than any particular other input, per se. It's just that whenever you try to come up with an algorithm, some set of mistakes, dependent on the algorithmic technique, will arise.

As a good example, think of the process of generating $g$ at random, by deciding for every input $x$ to set $g(x)=1$ with probability $99%$ and $g(x)=0$ with probability $1%$. (Make the choices independently for different inputs.) With very high probability, every efficient algorithm fails with probability at least about $1%$, but, if we look at every efficiently recognizable large set $H$, we see that $g$ takes the value 1 on approximately $99%$ of the elements of $H$, and so the trivial algorithm that always outputs 1 has a pretty good success probability.

Consider, however, the set $H$ of size $\frac {2}{100} 2^n$ that you get by taking the $\approx \frac{1}{100} 2^n$ inputs $x$ such that $g(x)=0$ plus a random sample of $\frac{1}{100} 2^n$ inputs $x$ such that $g(x)=1$. Then we can see that no efficient algorithm can compute $g$ on much better than $50%$ of the inputs of $H$. This is the highest form of average-case complexity for a boolean function: on such a set $H$ no algorithm does much better in computing $g$ than an algorithm that makes a random guess.

The Impagliazzo hard-core theorem states that it is always possible to find such a set $H$ where the average-case hardness is "concentrated." Specifically, it states that if every efficient algorithm fails to compute $g$ on a $\geq \delta$ fraction of inputs, then there is a set $H$ of size $\geq \delta 2^n$ such that every efficient algorithm fails to compute $g$ on at least a $\frac 12 - \epsilon$ fraction of the elements of $H$. This is true for every $\epsilon,\delta$, and if "efficient" is quantified as "circuits of size $s$" in the premise, then "efficient" is quantified as "circuits of size $poly(\epsilon,\delta) \cdot s$" in the conclusion.

The example of the biased random function given above implies that, if one wants to prove the theorem for arbitrary $g$, then the set $H$ cannot be efficiently computable itself. (The example does not forbid, however, that $H$ be efficiently computable given oracle access to $g$, or that a random element of $H$ be samplable given a sampler for the distribution $(x,g(x))$ for uniform $x$.)

A number of proofs of the hard core theorem are known, and connections have been found with the process of boosting in learning theory and with the construction and the decoding of certain error-correcting codes. Here is a precise statement.

Impagliazzo Hard-Core Set Theorem
Let $g:\{0,1\}^n \rightarrow \{0,1\}$ be a boolean function, $s$ be a size parameter, $\epsilon,\delta>0$ be given. Then there is a $c(\epsilon,\delta) = poly(1/\epsilon,1/\delta)$ such that the following happens.

Suppose that for every function $f:\{0,1\}^n \rightarrow \{0,1\}$ computable by a circuit of size $\leq c\cdot s$ we have

$Pr_{x \in \{0,1\}^n} [ f(x) = g(x) ] \leq 1-\delta$

Then there is a set $H$ of size $\geq \delta 2^n$ such that for every function $f$ computable by a circuit of size $\leq s$ we have

$Pr_{x\in H} [ f(x) = g(x) ] \leq \frac 12 + \epsilon$

Using the "finitary ergodic theoretic" approach of iterative partitioning, we (Omer Reingold, Madhur Tulsiani, Salil Vadhan and I) are able to prove the following variant.

Impagliazzo Hard-Core Set Theorem, "Constructive Version"
Let $g:\{0,1\}^n \rightarrow \{0,1\}$ be a boolean function, $s$ be a size parameter, $\epsilon,\delta>0$ be given. Then there is a $c(\epsilon,\delta) = exp(poly(1/\epsilon,1/\delta))$ such that the following happens.

Suppose that for every function $f:\{0,1\}^n \rightarrow \{0,1\}$ computable by a circuit of size $\leq c\cdot s$ we have

$Pr_{x \in \{0,1\}^n} [ f(x) = g(x) ] \leq 1-\delta$

Then there is a set $H$ such that: (i) $H$ is recognizable by circuits of size $\leq c\cdot s$; (ii) $|H| \geq \delta 2^n$, and in fact the number of $x$ in $H$ such that $g(x)=0$ is at least $\frac 12 \delta 2^n$, and so is the number of $x$ in $H$ such that $g(x)=1$; and (iii) for every $f$ computable by a circuit of size $\leq s$,

$Pr_{x\in H} [ g(x) = f(x) ] \leq max \{ Pr_{x\in H}[ g(x) = 0] , Pr_{x\in H} [g(x)=1] \} + \epsilon$

The difference is that $H$ is now an efficiently recognizable set (which is good), but we are not able to derive the same strong average-case complexity of $g$ in $H$ (which, as discussed as the beginning, is impossible in general). Instead of proving that a "random guess algorithm" is near-optimal on $H$, we prove that a "fixed answer algorithm" is near-optimal on $H$. That is, instead of saying that no algorithm can do better than a random guess, we say that no algorithm can do better than either always outputting 0 or always outputting 1. Note that this conclusion is meaningless if $g$ is, say, always equal to 1 on $H$, but in our construction we have that $g$ is not exceedingly biased on $H$, and if $\epsilon < \delta/2$, say, then the conclusion is quite non-trivial.

One can also find a set $H'$ with the same type of average-case complexity as in the original Impagliazzo result by putting into $H'$ a $\frac 12 \delta 2^n$ size sample of elements $x$ of $H$ such that $g(x)=0$ and an equal size sample of elements of $H$ such that $g$ equals 1. (Alternatively, put in $H'$ all the elements of $H$ on which $g$ achieves the minority value of $g$ in $H$, then add a random sample of as many elements achieving the majority value.) Then we recover the original statement except that $c(\epsilon,\delta)$ is exponential instead of polynomial.

Coming up next, the proof of the "constructive hard core set theorem" and my attempt at explaining what the techniques have to do with "finitary ergodic theory."