## Monday, June 26, 2006

### Property testing and Szemeredi's Theorem

After discussing Szemeredi's Theorem and the analytical approaches of Roth and Gowers, let's see some ideas and open problems in the combinatorial approach.

The following result is a good starting point.
Triangle Removal Lemma. For every d>0 there is a constant eps(d)>0 such that if G is an n-vertex graph with at most eps(d)n3 triangles, then it is possible to make G triangle-free by removing at most dn2 edges.

This result follows easily from the Szemeredi Regularity Lemma, and it is a prototype of several results in the field of property testing.

Indeed, consider the problem of distinguishing triangle-free graphs from graphs that are not even close to being triangle-free. For the purpose of this example, we think of a graph as "close to being triangle-free" if it can be made triangle-free by removing at most dn2 edges, for some small constant d. Then the Lemma tells us that in a not-even-close graph we have at least eps(d)n3 triangles, and so a sample of O(1/eps(d)) vertices is likely to contain a triangle. So here is an algorithm for the problem: pick at random O(1/eps(d)) vertices and see if they induce a triangle. We note that:

• The algorithm satisfies our requirement;

• The running time is a constant independent of n and dependent only on d;

• this all makes sense only if 1/eps(d) grows moderatly as a function of 1/d.

Let us now see that the Triangle Removal Lemma proves Szemeredi's Theorem for sequences of length 3. As we discussed earlier, it is enough to prove the Theorem in groups of the type ZN, with N prime; we will do better and prove the Theorem in any abelian group. So let G be an abelian group of size N, an let A be a subset of G of size dN. Construct a tri-partite graph (X,Y,Z,E) where X,Y,Z are sets of vertices, each of size N, and each a copy of the group G. The set of edges is defined as follows:

1. for x in X and y in Y, (x,y) is an edge if there is a a in A such that x+a=y
2. for x in X and z in Z, (x,z) is an edge if there is a b in A such that x+b+b=z
3. for y in Y and z in Z, (y,z) is an edge if there is a c in A such that y+c=z

Now we notice that if x,y,z is a triangle, then there are a,b,c in A such that (after some cancelations) a+c = b+b, which means that a,b,c is an arithmetic progression of length 3 in the group G. In fact, we see that the number of triangles in the graph is precisely N times the number of triples (a,b,c) in arithmetic progression in A.

Consider now the N*|A| = dN2 triangles corresponding to the "trivial" progressions of the form a,a,a. (These are the triangles x,x+a,x+a+a.) We can see that these triangles are edge-disjoint, and so in order just to remove such triangles we have to remove from the graph at least dN2 edges. So the Triangle Removal Lemma implies that there are at least eps(d)N3 triangles in the graph, and so at least eps(d)N2 triples in arithmetic progression in A. If N > d/eps(d), then some of those arithmetic progressions must be non-trivial, and we have the length-3 case of Szemeredi's theorem.

We see that both for the "property testing" application and for the Szemeredi Theorem application it is important to have good quantitative bounds on eps(d). We know that, in Szemeredi's theorem, N must be super-polynomial in 1/d, and so, in the Triangle Removal Lemma, 1/eps(d) must be super-polynomial in 1/d.

The known bounds, however, are quite unreasonable: we only know how to bound 1/eps(d) by a tower of exponentials whose height is poly(1/d). Unfortunately, such unreasonable bounds are unavoidable in any proof of the Triangle Removal Lemma based on Szemeredi's Regularity Lemma. This is because, as proved by Gowers, such tower-of-exponentials bounds are necessary in the Regularity Lemma.

Can we find a different proof of the Triangle Removal Lemma that has only a singly- or doubly-exponential eps(d)? That would be wonderful, both for property testing (because the proof would probably apply to other sub-graph homomorphism problems) and for additive combinatorics. Over the last year, I have found at least three such proofs. Unfortunately they were all fatally flawed.

Or, is there a more-than-exponential lower bound for the Triangle Removal Lemma? This would also be a major result: it would show that several results in property testing for dense graphs have no hope of being practical, and that there is a "separation" between the quantitative version of Szemeredi's Theorem provable with analytical methods versus what can be proved with the above reduction. Besides, it's not often that we have super-exponential lower bounds for natural problems, with Gowers's result being a rare exception.

By the way, what about Szemeredi's Theorem for longer sequences? For sequences of length k one needs a "k-clique removal lemma" for (k-1)-uniform hypergraphs, which in turn can be derived from the proper generalization of the Regularity Lemma to hypergraphs. This turns out to be quite complicated, and it has been accomplished only very recently in independent work by Nagle, Rödl and Schacht; and by Gowers. An alternative proof has been given by Tao. The interested reader can find more in expository paper by Gowers and Tao.

What about Szemeredi's own proof? It does use the Regularity Lemma, which was conceived and proved specifically for this application, and it involves a reduction to a graph-theoretic problem. I have to admit that I don't know much else.