Thursday, December 14, 2006

Zeilberger on why P differs from NP

Scott Aaronson, Lance Fortnow and Bill Gasarch have discussed the reasons why they believe P differs from NP here, here and here.

Doron Zeilberger, motivated by Avi Wigderson's talk in Madrid, devotes his latest opinion to the subject. (Via Luca A.)

His dismissal of the "creativity cannot be mechanized" argument is based on his long-standing belief (which I share) that human creativity, especially human mathematical creativity, can in fact be emulated by algorithms and that, in the long run, algorithms will end up being superior. I think this is a misunderstanding of the argument, whose point is rather that creating something good (by humans and by algorithms) seems to require much more effort than appreciating something good, and that there are levels of "genius" which we would be able to recognize if we saw them, but which we are prepared to consider infeasible to achieve. In the end, this is the same as the "a fool can ask more questions than a wise man can answer" argument that Zeilberger himself proposes.

Then there is the issue of whether it is fair to say that "P $\neq$ NP is a statement that affirms its own intractability." Indeed, the P versus NP question is a statement about asymptotics, while proving it is a problem of finite size.

I have two observations.

One is that the "natural proofs" results show that, assuming strong one-way functions exist (an assumption in the "ballpark" of P $\neq$ NP), there are Boolean functions that are efficiently computable but have all the efficiently computable properties of random functions. This means that any circuit lower bound proof must work in a way that either would fail when applied to random functions (and there are reasons why it is difficult to come up with such proofs) or would rely on hard-to-compute properties of the function in question. So, although the proof is a finite object, it does define an "algorithm" (the one that describes the properties of the function that are used in the proof), and such an algorithm cannot be asymptotically efficient.

The other is that, however much cleaner our theories are when formulated asymptotically, we should not lose sight of the fact that the ultimate goals of complexity theory are finite results. It will be a historic milestone when we prove that P $\neq$ NP by showing that SAT requires time at least $2^{-300} \cdot n^{(\log n) / 100 }$, but the real deal is to show that there is no circuit of size less than $2^{300}$ that solves SAT on all formulae having 10,000 clauses or fewer. The statements that we care about are indeed finite.

Wednesday, December 13, 2006

Russian humor

The January issue of the Notices of the AMS is out, and it contains an article by Anatoly Vershik on the Clay Millennium prize. You may remember a discussion we had on the wisdom of awards in general, in the context of the AMS idea of creating a fellows program and of the pros and cons of having a best paper award at STOC and FOCS.

Vershik returns to some of the standard points in this debate, and makes a few new ones. Although the tone of the article is completely serious, there are hints of deadpan humor (especially in the way he characterizes the opinions of others).

There is really no connection, but I was reminded of Closing the collapse gap, the hilarious presentation by Dmitry Orlov (found via the Peking Duck), in which he argues that when the U.S. economy collapses, we will be much less prepared than the Russians were, and we will be in much deeper trouble. As stated in the comments at the Peking Duck, when Russians are deadly serious about something, they deal with it through dark humor.

Tuesday, December 12, 2006

Eigenvalues and expansion

Before trying to describe how one reasons about the eigenvalues of the adjacency matrix of a Cayley graph, I would like to describe the "easy direction" of the relationships between eigenvalues and expansion.

First, a one-minute summary of linear algebra. If $A$ is an $n\times n$ matrix, then a non-zero (column) vector $x$ is an eigenvector of $A$ provided that, for some scalar $\lambda$, we have $Ax= \lambda x$. If so, then $\lambda$ is an eigenvalue of $A$. A couple of immediate observations:

  • The equation $Ax=\lambda x$ can be rewritten as $(A-\lambda I)x = 0$, so if $\lambda$ is an eigenvalue of $A$ then the columns of the matrix $A-\lambda I$ are not linearly independent, and so the determinant of $A-\lambda I$ is zero. But we can write $det(A-\lambda I)$ as a degree-$n$ polynomial in the unknown $\lambda$, and a degree-$n$ polynomial cannot have more than $n$ roots.

    So, an $n\times n$ matrix has at most $n$ eigenvalues.

  • If $\lambda$ is an eigenvalue of $A$, then the set of vectors $x$ such that $Ax=\lambda x$ is a linear space. If $\lambda_1$ and $\lambda_2$ are different eigenvalues, then their respective spaces of eigenvectors are independent (they only intersect at $0$).
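To see the first observation in action, here is a minimal numpy sketch (the library and the small test matrix are choices made purely for illustration, not part of the discussion above): the eigenvalues that numpy reports are exactly the roots of the degree-$n$ characteristic polynomial.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

# Eigenvalues and eigenvectors of a general square matrix.
eigenvalues, eigenvectors = np.linalg.eig(A)

# Coefficients of the characteristic polynomial (same roots as det(A - lambda I)),
# and its roots; they should match the eigenvalues (here: 2 and 3).
char_poly = np.poly(A)
roots = np.roots(char_poly)

print(np.sort(eigenvalues), np.sort(roots))

# Check the defining equation A x = lambda x for each eigenpair.
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ v, lam * v)
```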


If $A$ is a symmetric real-valued matrix, then a series of miracles (or obvious things, depending on whether you have studied this material or not) happens:


  • All the roots of the polynomial $det(A-\lambda I)$ are real; so, counting multiplicities, $A$ has $n$ real eigenvalues.
  • If $\lambda$ is an eigenvalue of $A$, and it is a root of multiplicity $k$ of $det(A-\lambda I)$, then the space of vectors $x$ such that $Ax=\lambda x$ has dimension $k$.
  • If $\lambda_1 \neq \lambda_2$ are two different eigenvalues, then their respective spaces of eigenvectors are orthogonal.


From these facts, it is easy to conclude that, starting from a symmetric real matrix $A$, we can find a sequence of $n$ numbers

$\lambda_1\geq \ldots \geq \lambda_n$

and a sequence of $n$ unit vectors $v_1,\ldots,v_n$ such that the $v_i$ are orthogonal to each other and such that $A v_i = \lambda_i v_i$.
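Here is a minimal numpy sketch of this statement (the random symmetric matrix is just an arbitrary test case): np.linalg.eigh, the routine for symmetric matrices, returns exactly such a list of real eigenvalues together with an orthonormal set of eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                      # an arbitrary symmetric real matrix

# eigh is the routine for symmetric (Hermitian) matrices: the eigenvalues come
# out real and in ascending order, and the columns of V are unit-length
# eigenvectors v_1, ..., v_n.
eigenvalues, V = np.linalg.eigh(A)

assert np.allclose(V.T @ V, np.eye(5))         # the v_i are orthonormal
for lam, v in zip(eigenvalues, V.T):
    assert np.allclose(A @ v, lam * v)         # A v_i = lambda_i v_i
```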

Now the claim is that these $n$ numbers and $n$ vectors completely describe $A$. Indeed, suppose that we are given a vector $x$ and that we want to compute $Ax$. First, we can write $x$ as a linear combination of the $v_i$:

$x= \alpha_1 v_1 + \cdots + \alpha_n v_n$

and since the $v_i$ are an orthonormal basis, we know what the coefficients $\alpha_i$ are: each $\alpha_i$ is just the inner product of $x$ and $v_i$.

Now, by linearity, we have

$Ax = \alpha_1 A v_1 + \cdots + \alpha_n A v_n$

and, since the $v_i$ are eigenvectors,

$= \alpha_1 \lambda_1 v_1 + \cdots + \alpha_n \lambda_n v_n$

So, once we express vectors in the basis defined by the $v_i$, we see that multiplication by $A$ is just the effect of multiplying coordinate $i$ by the value $\lambda_i$.
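In code, this is the following computation (again just a sketch with an arbitrary symmetric test matrix): write $x$ in the eigenvector basis, multiply coordinate $i$ by $\lambda_i$, and translate back; the result agrees with computing $Ax$ directly.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                          # symmetric test matrix
eigenvalues, V = np.linalg.eigh(A)         # columns of V are the eigenvectors v_i

x = rng.standard_normal(5)

alphas = V.T @ x                           # alpha_i = <v_i, x>
Ax_spectral = V @ (eigenvalues * alphas)   # scale coordinate i by lambda_i, go back

assert np.allclose(A @ x, Ax_spectral)
```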

But there is more. Recall that $\alpha_i = v_i^T x$ and so we can also write

$Ax = (\lambda_1 v_1 v_1^T) x + \cdots + (\lambda_n v_n v_n^T) x$

and so, in particular, matrix $A$ is just

$A = \lambda_1 v_1 v_1^T + \cdots + \lambda_n v_n v_n^T $

a sum of $n$ rank-1 matrices, with the coefficients of the sum being the eigenvalues.
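A quick numerical check of this rank-1 decomposition (same kind of arbitrary symmetric test matrix as before): summing the matrices $\lambda_i v_i v_i^T$ reconstructs $A$.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                      # symmetric test matrix

eigenvalues, V = np.linalg.eigh(A)

# Sum of n rank-1 matrices lambda_i * v_i v_i^T, one per eigenpair.
A_rebuilt = sum(lam * np.outer(v, v) for lam, v in zip(eigenvalues, V.T))

assert np.allclose(A, A_rebuilt)
```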

Let now $A$ be the adjacency matrix of a $d$-regular undirected graph (so $A$ is real-valued and symmetric), $\lambda_1\geq \cdots \geq \lambda_n$ be its eigenvalues, and $v_1,\ldots,v_n$ be a corresponding set of orthonormal eigenvectors.

It is not hard to see (no, really, it's two lines, see the bottom of page 2 of these notes) that all eigenvalues are between $d$ and $-d$. Also, $d$ is always an eigenvalue, as shown by the eigenvector $(1,\ldots,1)$. This means that $\lambda_1 = d$ and we can take

$v_1 = \left( \frac 1 {\sqrt n},\cdots, \frac 1 {\sqrt n} \right)$

and we have

$A = d v_1 v_1^T + \lambda_2 v_2 v_2^T + \cdots + \lambda_n v_n v_n^T$

Suppose now that all the other $\lambda_i$, $i=2,\ldots,n$, are much smaller than $d$ in absolute value. Then, intuitively, the sum is going to be dominated by the first term, and $A$ behaves sort of like the matrix $d v_1 v_1^T$, that is, the matrix that has the value $d/n$ in each entry.
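As a quick aside before examining that approximation: the facts that $d$ is the top eigenvalue (with the all-ones eigenvector) and that every eigenvalue lies in $[-d,d]$ are easy to check numerically on a toy example, here the $n$-cycle, a 2-regular graph chosen only because it is trivial to construct.

```python
import numpy as np

n, d = 10, 2
A = np.zeros((n, n))
for i in range(n):                     # the n-cycle: a 2-regular graph
    A[i, (i + 1) % n] = 1.0
    A[i, (i - 1) % n] = 1.0

eigenvalues = np.linalg.eigvalsh(A)    # ascending order

assert np.isclose(eigenvalues[-1], d)             # largest eigenvalue is d
assert np.all(np.abs(eigenvalues) <= d + 1e-9)    # all of them lie in [-d, d]

# The (normalized) all-ones vector is an eigenvector for the eigenvalue d.
v1 = np.ones(n) / np.sqrt(n)
assert np.allclose(A @ v1, d * v1)
```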

You may be skeptical that such an approximation makes sense, but see where this leads. If our graph is $G=(V,E)$, and $(S,V-S)$ is a cut of the set of vertices into two subsets, let $1_S$ be the $n$-dimensional vector that is 1 in coordinates corresponding to $S$ and 0 in the other coordinates, and define $1_{V-S}$ similarly. Then the number of edges crossing the cut is precisely

$1_S^T A 1_{V-S}$

and if we replace $A$ by the matrix that has $d/n$ in each entry we get the approximation $|S|\cdot|V-S|\cdot d /n$, which is the expected number of edges that would cross the cut in a random graph of degree $d$. This is pretty much the right result: if $|\lambda_2|,\ldots, |\lambda_n|$ are all much smaller than $d$, and if $|S|$ and $|V-S|$ are at least a constant times $n$, then the number of edges crossing the cut is indeed very close to $|S|\cdot |V-S| \cdot d/n$. This result is known as the Expander Mixing Lemma, and the proof is just a matter of doing the "obvious" calculations. (Write $1_S$ and $1_{V-S}$ in the eigenvector basis, distribute the product, remember that the $v_i$ are orthogonal, use Cauchy-Schwarz ...)
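Here is a small numerical illustration (a toy example of my own choosing: the complete graph $K_n$, which is $(n-1)$-regular and has all other eigenvalues equal to $-1$, so the non-top eigenvalues are as small compared to $d$ as one could hope): the number of edges crossing a cut comes out very close to $|S|\cdot|V-S|\cdot d/n$.

```python
import numpy as np

n = 100
A = np.ones((n, n)) - np.eye(n)        # adjacency matrix of K_n
d = n - 1                              # K_n is (n-1)-regular

eigenvalues = np.linalg.eigvalsh(A)    # ascending order
print(eigenvalues[-1], eigenvalues[:-1].max())   # approximately 99.0 and -1.0

# A cut (S, V - S) with |S| = 30.
S = np.zeros(n)
S[:30] = 1.0
T = 1.0 - S

crossing = S @ A @ T                   # 1_S^T A 1_{V-S} = edges across the cut
estimate = S.sum() * T.sum() * d / n   # |S| * |V-S| * d / n

print(crossing, estimate)              # 2100.0 versus 2079.0
```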

It is perhaps less intuitive that, provided all the other eigenvalues are at least a little bit smaller than $d$, a reasonably large number of edges crosses the cut. In particular, if $|S| \leq n/2$ and if $\lambda_2 \leq d-\epsilon$, then at least $\epsilon |S|/2$ edges cross the cut. That is, if all eigenvalues except the largest one are at most $d-\epsilon$, then the edge expansion of the graph is at least $\epsilon/2$. (It takes about five lines to prove this, see the beginning of the proof of Thm 1 in these notes.)
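As a sanity check of this bound (on another toy graph of my own choosing: a 4-regular circulant graph on 20 vertices, easy to build but not a particularly good expander), one can verify that sampled cuts with $|S|\le n/2$ all have at least $\epsilon |S|/2$ crossing edges, where $\epsilon = d - \lambda_2$.

```python
import numpy as np

n, d = 20, 4
A = np.zeros((n, n))
for i in range(n):
    for off in (1, 2, n - 1, n - 2):   # join i to i +- 1 and i +- 2 (mod n)
        A[i, (i + off) % n] = 1.0

eigenvalues = np.linalg.eigvalsh(A)    # ascending order
eps = d - eigenvalues[-2]              # d - lambda_2

rng = np.random.default_rng(0)
for _ in range(200):                   # a few hundred random cuts with |S| <= n/2
    size = int(rng.integers(1, n // 2 + 1))
    S = np.zeros(n)
    S[rng.choice(n, size=size, replace=False)] = 1.0
    crossing = S @ A @ (1.0 - S)       # edges crossing the cut (S, V - S)
    assert crossing >= eps * size / 2 - 1e-9
```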

To summarize: the adjacency matrix of an undirected graph is symmetric, and so all eigenvalues are real. If the graph is $d$-regular, then $d$ is the largest eigenvalue, and all others are between $d$ and $-d$. If all the others are between $d-\epsilon$ and $-d$, then the graph has edge-expansion $\epsilon/2$ (or better).

Sunday, December 10, 2006

Madonna and Letterman (1994)

As a break from expanders and groups, here is Madonna's notorious 1994 appearance on Letterman's show.



The incident has its own Wikipedia entry, one more piece of evidence that Wikipedia is superior to the Encyclopaedia Britannica.