Saturday, June 03, 2006

Szemeredi's theorem

Szemeredi's theorem on arithmetic progressions is one of the great triumphs of the "Hungarian approach" to mathematics: pose very difficult problems, and let deep results, connections between different areas of math, and applications, come out as byproducts of the search for a solution.

(Tim Gowers's essay The two cultures of mathematics discusses brilliantly this "problem solving" way of doing math, as distinct from the "theory building" way.)

Some of the math surrounding Szemeredi's theorem has found applications in property testing, in PCP constructions, and in extractor constructions, and more future applications are likely. Besides, it is truly beautiful (and somewhat accessible) math. It can be a lot of fun for a computer scientist to delve into this literature.

In telling the story of Szemeredi's theorem, one usually starts from Van der Waerden's 1927 theorem that, no matter how you color the integers with a finite number of colors, there are arbitrarily long monochromatic arithmetic progressions. (An aritmetic progression is a sequence of the form a,a+b,a+2b,...,a+kb,....) In a more quantitative form, the theorem says that for every constants c,k there is an N(c,k) such that for every N>N(c,k), no matter how we color the integers {1,...,N} with c colors, there will be a monochromatic arithmetic progression of length k.

In 1936, Erdos and Turan conjectured that the coloring in Van der Waerden's theorem was just a "distraction," and that the "true reason" why the theorem is true is that every dense enough set of integers (in particular, the largest color class in Van der Waerden's theorem) must contain long arithmetic progressions. Specifically, they conjectured that for every density 0 < d <1 and every integer k there is an N(d,k) such that every subset A of {1,...,N} of cardinality dN contains a length-k arithmetic progression, provided N> N(d,k).

This conjecture became one of the great unsolved problems in Ramsey theory. In 1953, Roth proved the k=3 case of the Erdos-Turan conjecture. In Roth's proof, N(d,3) is doubly exponential in 1/d. In other words, Roth proves that a subset of {1,...,N} containing at least about N/log log N elements must contain a length-3 progression. The proof is analytical, and uses Fourier analysis. (Or the "circle method" as number theorists say.)

It was only in 1974 that Szemeredi proved the Erdos-Turan conjecture in the general case, and this is the result that is now known as Szemeredi's Theorem. Szemeredi's proof is combinatorial, and works via a reduction to a graph-theoretic problem. The Szemeredi regularity Lemma, that has several applications in computer science, is part of the proof. The value of even N(d,4) is a tower of exponentials in Szemeredi's proof; one only gets that a subset of {1,...,N} of size about N/log* log* n must contain a length-4 progression.

In 1977 Furstemberg found a completely different proof. He proved a "transfer theorem" showing that results about arithmetic progressions (and other related structures) can be derived from statements about certain continous objects, and then he proved the continous statements by "standard methods" in Ergodic Theory. The transfer theorem uses a result that requires the Axiom of Choice and so, while the proof establishes that finite values of N(d,k) exist, it is impossible, even in principle, to extract any quantitative bound.

Quantitative bounds are important because of the long-standing (and recently solved) question of whether the primes contain arbitrarily long arithmetic progressions. If one could show that a subset of {1,...,N} of size about N/log N contains long arithmetic progressions, then the result for the primes follows purely from the fact that the set of primes is dense enough.

There is a simple reduction that shows that, to prove Szemeredi's theorem over the integers, it is enough to prove it for the additive group ZN, with N prime. (That is, it is enough to look for arithmetic progressions mod N in {1,...,N} .) Also, if one looks at Roth's proof, it can be seen that it gives a much stronger bound in the additive group Fn3; in that setting, the proof shows that in every subset A of size about 3n/n there are three points in arithmetic progressions (that is, three points of the form a,a+b,a+b+b). The bound is of the form N/log N, where N is the size of the group. The proof, however, uses the fact that Fn3 is a linear space, that it has subspaces, and so on. Bourgain (1999) was able to "simulate" this proof in ZN by using the notion of "Bohr" set of ZN, and showing that it can play a role analogous to that of "subspace" in Fn3. Bourgain obtains a bound that is about N/sqrt(log N).

The great quantitative breakthrough came with the work of Gowers (1998-2001), who proved bounds of the form N/loglog N (as in Roth's proof) for progressions of any length. His work introduced the notion of Gowers uniformity I discussed two days ago.

So there are now three completely different proofs, Szemeredi's proof using graph theoretic arguments, Furstemberg's proof using Ergodic Theory, and Gowers's analytical proof. Recent work by Tao, however, shows that there are deep similarities between the proofs, although a complete "understanding" of what goes on is probably still to come.

There is also a new (2004-05) fourth proof, that uses a hypergraph regularity lemma, and that is similar in spirit to (but technically very different from, and simpler than) Szemeredi's proof. It also has deep relations with the the other three proofs. (See e.g. this paper.) It is an independent achievement of Nagle, Rödl, Schacht, and Skokan; Gowers; and Tao.

The most famous result on arithmetic progressions is probably the Green-Tao proof that the primes contain arbitrarily long arithmetic progressions.

The structure of the proof is quite amazing, and perhaps it contains ideas that could also be useful in computer science. They begin by defining a notion of pseudorandomness for sets of integers (which requires two conditions: Gowers uniformity and an additional properties). Then they show:

  1. Assuming Szemeredi's theorem as a black-box, the theorem is also true for subsets of pseudorandom sets. That is, if S is a large enough and pseudorandom enough set of integers, and A is a subset of A of constant density, then A must contain long arithmetic progressions. (Roughly speaking, a "distinguisher" that looks for arithmetic progressions cannot distinguish a dense subset of a pseudorandom set from a dense subset of all the integers. This is akin to the notion of "pseudo-entropy" in HILL, but the quantification is different.)

  2. The set of "pseudoprimes" (integers having only few, large, divisors) is pseudorandom in the integers. (This uses new results of Goldston and Yildirim.)

  3. The primes have constant density in the pseudoprimes.



Coming up next: Roth's proof for k=3, the structure of Gowers's proof for general k, the Green-Tao improvement for k=4, and the constellation of wonderful open questions around the k=3 and k=4 cases in finite vector spaces. (After that, back to politics, food, movies, and China.)

Thursday, June 01, 2006

Gowers uniformity

One of the three questions that were more frequently asked to me at STOC was why I did not define Gowers uniformity in my talk on this joint work with Alex Samorodnitsky. Is it because the definition is very complicated? Not at all, it is just that the definition seems very arbitrary unless one has the time to tell some stories about it and to look at a few equivalent versions of the definition and some of its basic properties.

My favorite source is this paper by Green and Tao; the first 6-7 sections are a wonderful review of what is known. Another great source is this survey paper by Green, that presents several open questions that could be of interest to us, especially the question of how large can a subset of F3n be so that no three points are on a line. (The big question is whether the asnwer is (3-o(1))n or not.) There is also a question related to "linearity testing," and I will probably talk about it in another post. Green has also written a more recent survey paper that is more limited in scope but that has more technical details.

In this post, I will tell a story about Gowers uniformity in the computer science language of linearity testing and low degree test for Boolean functions, which may be the gentlest introduction for many readers. I will almost exclusively refer to Boolean functions, but the theory applies more generally to bounded complex-valued functions whose domain is an Abelian group. (Some definitions/results, however, are more simply stated in the special case of Boolean functions.)

Umesh often complains that, during technical talks, the suspense kills him, meaning that the speaker goes on and on with technical definitions and results, without giving any indication of where he or she is going.

To spoil all suspense: the eventual goal is to have a "higher degree" Fourier analysis that allows us to talk about correlations of a given function with low-degree polynomials. (In contrast with Fourier analysis that, in the Boolean case, talks about correlations of a given function with linear functions.) For every fixed k, we will define a "dimension-k" norm for Boolean functions; such norms are conjectured to "approximate" the correlation of the function with the closest degree-(k-1) polynomial. The conjecture is a theorem for k=2 (by Fourier analysis) and for k=3 (by a tour de force), and it is open otherwise. What is known, however, is that functions with small norm have several useful "pseudorandomness" properties, and that functions with noticable large norm have useful "structure." (Low-degree-polynomial-like structure.) Most proofs that make use of Gowers uniformity have a two-part structure: if the norm is small, we get what we want from the pseudorandomness; if the norm is large, we get what we want from the structure. In our work on PCP, for example, when we analyse the soundness, we can argue that if the PCP proof is "pseudorandom" then the verifier is about as likely to accept it as it is to accept a random proof (hence, very low probability), but a PCP proof with "structure" can be "decoded" to a valid witness (hence, this cannot happen in the soundness case).

Let's get started with the definition. If
f:{0,1}n -> {0,1}

is a Boolean function, and y is an element of {0,1}n, let us define
fy (x) := f(x) - f(x+y)

where all operations are mod 2 (or bit-wise mod 2). We think of fy as a sort of "derivative" of f. In particular, if f is a constant then fy is identically 0 for every y, and if f is a degree-d polynomial then fy is a degree-(d-1) polynomial.

We also define fy,z as (fy)z, and so on.

Now, consider the following test of "non-constancy:" pick at random x,y, and check whether fy(x)=0. If f is a constant, then the test accepts with probability 1; it is also possible to argue that the test accepts with probability that is always at least 1/2, and that it accepts with probability 1/2+eps if and only if f has agreement 1/2 + eps' with a constant function.

This may not be too exciting, but consider the next "logical" step, that is, pick x,y,z at random, and check if fy,z(x)=0. If f is a degree-1 polynomial (an affine function), then the test accepts with probability 1. Also, it can be shown that the test accepts with probability at least 1/2. The interesting thing, which is easy to prove using Fourier analysis, is that the test accepts with probability noticeably larger than 1/2 if and only if f has agreement noticeably larger than 1/2 with a degree-1 function. Indeed, the test checks whether
f(x) - f(x+y) -f(x+z) + f(x+y+z)=0

and, up to a change of variables, this is the same as checking if
f(x) + f(y) +f(z) = f(x+y+z)

a slight variant of the Blum-Luby-Rubinfeld linearity test.

So you see where this is going: pick x,x1,...,xk at random and check if fx1,...,xk(x) =0. If f is a degree-(k-1) polynomial then the test accepts with probability 1.

Alon, Kaufman, Krivelevich, Litsyn and Ron proved in Random'03 that if the test accepts with probability 1-eps then f has agreement 1-O(2k eps) with a degree-(k-1) polynomial. Samorodnitsky, in a manuscript that predates our PCP work, proves the more challenging "low-end" result that, for k=3, if the test accepts with probability 1/2 + eps then the function has agreement 1/2 + eps' with a degree-2 polynomial.

The low-end result is open for k=4! Furthermore, for k=3, eps' is exponentially small in eps. It is an open question to have a polynomial dependency.

So, what about these Gowers uniformity norms? Take the probability that the test with x1...xk accepts, subtract 1/2 from the probability, and multiply by 2, so you get a number between 0 and 1. This is the dimension-k Gowers uniformity of f; Alex and I use the notation Uk(f). The 2kth root of the number I just described is the dimension-k Gowers uniformity norm of f, denoted ||f|| with subscript Uk (sorry, HTML does not allow subscripts of subscripts).

What's with this subtracting 1/2 and multiplying by 2 business? The analytical study of Boolean functions is often cleaner if we think of them as functions
f: {0,1}n -> {-1,1}

where bit b is replaced by bit (-1)b in the output. With this notation, we can define
fy (x) := f(x) * f(x+y)

and now the dimension-k uniformity can be given the following equivalent, but much cleaner, definition:
Uk (f) := Ex,x1,...,xk fx1...xk (x)

Now the advantage is that the definition can be applied to any real-valued function (which is often useful), not just Boolean functions. The 2kth root of Uk (f) is a norm for the space of real-valued functions of domain {0,1}n. The definitions extends to functions f: G-> R where G is any Abelian group (we just need to re-interpret the + in the definition) and R are the real numbers. The definition can also be extended to complex-valued functions, with a small change to the definition of fy.

With this notation, Alex proves that, for Boolean f, U3(f) is noticeably large if and only if f has noticeable correlation with a degree-2 polynomial. Green and Tao, in their 72-page paper I referenced above, prove an analogous result for bounded functions f:G->C where G is an Abelian group of order not divisible by 2 or by 3, and C are the complex number. (If the group contains a large cyclic subgroup, then it is quite messy to even state the theorem. If G is of the form Fnq where n is large compared with q, then the statement is simply about noticeable norm versus correlation with polynomials.)

And, to summarize again the hundreds of pages of mathematical knowledge we have about these norms, if the norm is small, the function is kind of pseudorandom; if the norm is not small, then the function is kind of "structured," in a way that sort of resembles a low-degree polynomial. It is open to turn "sort of resembles" into "is correlated with." (And it cannot be done if the domain of the function is a group with a large cyclic subgroup -- in that case, there are more complicated conjectures.)

Sunday, May 28, 2006

Talk talk China

If state's support and Andy's energy do not wane, Andy Yao's institute in Beijing might become, over time, the next Weizmann: one of the best places for theory in the world, the first choice for Chinese nationals who complete their PhD in the US, and an attractive destination for non-Chinese for their post-doc/ sabbatical/ summer. Meanwhile, it is an exciting place to be, with a palpable sense that great things are going to happen. (And I don't remember if I have already mentioned that the food is good.)

As I anticipate many happy returns, I have been reading a bit about China, and I have enjoyed reading talk talk China. It is written by three expats, living in Beijing, China and Hong Kong. Some entries were written when one of them lived in Shanghai. It is mostly devoted to ranting about China, so many posts may be offensive to many people, but, through the ranting, they say very interesting things about China, about foreigners in China, and about the Chinese.

Their claim to fame is this post about the "lao wai death stare," which I experienced even in my very short stay. This , this, this, and this sound familiar too.

Apparently, it is very difficult for a foreigner, or even for a Chinese-born white person, to feel accepted. The comments on that post are particularly interesting, because many posters make a comparison with Taiwan and speculate on how things may change in China. (Or, in the rest of China, if you prefer.)

Indeed, when I arrived at the customs/immigration at Beijing airport, I felt something was strange with the signs pointing to the lines for "Chinese nationals" and for "Foreigners," and then I realized I had never seen the word "foreigner" anywhere else in this context. In Europe it's always "EU Nationals" and "Non-EU Nationals," in the US it is "US Citizens" and "Visitors" (even though, in the paperwork, I am an alien, which is about the worst term they could possibly choose) and so on. My impression is that a lot of this has to do with there being no notion of "politically correct speech" in China (yet). "I like foreigners, with their big noses and sunken eyes" someone told me in Beijing, presumably as a compliment. I am not saying it is a bad thing, I am not a big fan of PC-speech myself. (But this is a subject that would need its own post.)

I had the most fun reading this post about English names. (Warning: even the title of the post is offensive.) In Beijing I met a Beckham and a Gerrard (they are names of football players), and in Taipei I met a Bevis, but this is nothing compared to the examples given by the original poster and the commenters. Of course, now that I am considering "reckless card" as my Chinese name, the joke is on me.