### Entropy and the weak Regularity Lemma

If you remember my discussion of Szemeredi's proof of the Szemeredi Regularity Lemma, the proof goes by progressively refining an initial arbitrary partition until one gets an eps-regular partition. A potential function is used to prove that the number of refinement steps is at most poly(1/eps). I said the potential function measures "variance" or "disorder" in the densities of edges between blocks of the partition. I used this terminology because I was aware of an alternate proof by Tao where the Regularity Lemma is reformulated as a statement about random variables and the proof uses conditional entropy (but not in a straightforward way) as a potential function. Tao's proof gives a stronger version of the Regularity Lemma, similar to another strong version that was proved by Alon, Fischer, Krivelevich and Szegedy, motivated by applications to property testing.

I still cannot understand the intuition of the stronger statement, the role of the parameter "F" in Tao's proof, and the role of the "fine" partition. The probabilistic interpretation, however, leads to an amusing three-line proof of the

Before stating the weak Regularity Lemma, let us define the notion of weakly regular partition. Let

We say that the partition

approximates

Finally we can state the weak Regularity Lemma

Note that the number of elements of the partition is singly exponential in 1/eps, instead of being a tower of exponentials. This is a considerably improvement, but the weaker property of the partition is not suitable for many applications, including to property testing and to additive combinatorics.

The observant reader should see how to modify the original proof to fit this new setting. (Start from an arbitrary partition, if it is not good, use the sets

What about the information-theoretic proof? It will be a simplification of Tao's proof to the case of the weak Regularity Lemma. Define the following random variables: X and Y are two independent and uniformly distributed vertices, and E is the 0/1 indicator of the event that (X,Y) is an edge.

We use the notation H(Z) for the entropy of a random variable Z, H(Z|W) as the entropy of Z conditioned on W, I(Z,W):= H(Z)-H(Z|W) for the

Finally, we come to the proof. We ask the question: Are there boolean functions f1,g1:V -> {0,1} such that I(E, (f1(X),g1(Y))) > eps? That is, after someone picks X and Y at random, is there one bit of information we can ask about X and one bit of information we ask about Y that helps us guess whether there is an edge between X and Y? Equivalently, are there sets S and T such that knowing whether X is in S and whether Y is in T helps us guess whether there is an edge between X and Y? If not, it can be checked that the number of edges between any two sets S and T essentially depends only on the size of S and T, and V itself is a weakly poly(eps)-regular partition. (It is a "partition" with only one block.)

Otherwise, we ask whether there are functions f2,g2 such that

If not, then for every two boolean functions f2,g2, once we know f1(X),g1(X),f1(Y),g1(Y), it does not help to know f2(X) and g2(Y) in order to guess whether (X,Y) is an edge. This means that if we partition V into 4 blocks (corresponding to possible values of f1,g1), then for every two sets S,T (corresponding to f2,g2), the number of edges between S and T essentially only depends on the sizes of the intersections of S and T with the four blocks of the partition. Hence, the partition is weakly regular.

Alternatively, we keep going, but, if we increase the mutual information by eps at each step, we cannot go for more than 1/eps steps. So there must be a k < 1/eps and 2k functions f1,...,fk,g1,...,gk such that, for all functions f,g, if we know the 4k bits f1(X),f1(Y),...,gk(X),gk(Y), then we gain less than another eps bits of information about E if we are also told f(X) and g(Y). Someone familiar with inequalities about entropy (this is the step I am missing) should be able to infer that if we partition V into the 2

Maybe this is not really three lines (in fact it is possibly more complicated than the "variance" argument), but I feel that one can

I still cannot understand the intuition of the stronger statement, the role of the parameter "F" in Tao's proof, and the role of the "fine" partition. The probabilistic interpretation, however, leads to an amusing three-line proof of the

*weak*Regularity Lemma. (As we shall see, unfortunately, one of the three lines is too hard for me to figure out.)Before stating the weak Regularity Lemma, let us define the notion of weakly regular partition. Let

*G=(V,E)*be a graph,*(V1,...,Vk)*be a partition of the vertices, and for any two blocks of the partition define the densityd(Vi,Vj) := e(Vi,Vj) / |Vi| * |Vj|where

*e(Vi,Vj)*is the number of edges between*Vi*and*Vj*; it is also convenient to have the notationd(Vi,Vi) := 2*e(Vi) / |Vi|where^{2}

*e(Vi)*is the number of edges within*Vi*.We say that the partition

*(V1,...,Vk)*is weakly eps-regular if for every two disjoint sets*S,T*of vertices we have that the quantitysum_{{i,j}}|V_i intersection S| * |V_j intersection T| * d(Vi,Vj)

approximates

*e(S,T)*up to an additive error of at most*eps |V|*.^{2}Finally we can state the weak Regularity Lemma

Weak Szemeredi Regularity Lemma.For every eps there is a c(eps) such that every graphG=(V,E)admits a weakly eps-regular partition(V1,...,Vk)wherek=2.^{{poly(1/eps)}}

Note that the number of elements of the partition is singly exponential in 1/eps, instead of being a tower of exponentials. This is a considerably improvement, but the weaker property of the partition is not suitable for many applications, including to property testing and to additive combinatorics.

The observant reader should see how to modify the original proof to fit this new setting. (Start from an arbitrary partition, if it is not good, use the sets

*S*and*T*to refine, and so on; now each refinement step increases the size of the partition only by a constant factor, and, as before, we only have poly(1/eps) refinements.)What about the information-theoretic proof? It will be a simplification of Tao's proof to the case of the weak Regularity Lemma. Define the following random variables: X and Y are two independent and uniformly distributed vertices, and E is the 0/1 indicator of the event that (X,Y) is an edge.

We use the notation H(Z) for the entropy of a random variable Z, H(Z|W) as the entropy of Z conditioned on W, I(Z,W):= H(Z)-H(Z|W) for the

*mutual information*of Z and W and I(Z,W|R):= H(Z|R)-H(Z|R,W) for the mutual information of Z and W conditioned on R.Finally, we come to the proof. We ask the question: Are there boolean functions f1,g1:V -> {0,1} such that I(E, (f1(X),g1(Y))) > eps? That is, after someone picks X and Y at random, is there one bit of information we can ask about X and one bit of information we ask about Y that helps us guess whether there is an edge between X and Y? Equivalently, are there sets S and T such that knowing whether X is in S and whether Y is in T helps us guess whether there is an edge between X and Y? If not, it can be checked that the number of edges between any two sets S and T essentially depends only on the size of S and T, and V itself is a weakly poly(eps)-regular partition. (It is a "partition" with only one block.)

Otherwise, we ask whether there are functions f2,g2 such that

I(E,(f1(X),g1(X),f1(Y),g1(Y),f2(X),g2(Y))> 2eps

If not, then for every two boolean functions f2,g2, once we know f1(X),g1(X),f1(Y),g1(Y), it does not help to know f2(X) and g2(Y) in order to guess whether (X,Y) is an edge. This means that if we partition V into 4 blocks (corresponding to possible values of f1,g1), then for every two sets S,T (corresponding to f2,g2), the number of edges between S and T essentially only depends on the sizes of the intersections of S and T with the four blocks of the partition. Hence, the partition is weakly regular.

Alternatively, we keep going, but, if we increase the mutual information by eps at each step, we cannot go for more than 1/eps steps. So there must be a k < 1/eps and 2k functions f1,...,fk,g1,...,gk such that, for all functions f,g, if we know the 4k bits f1(X),f1(Y),...,gk(X),gk(Y), then we gain less than another eps bits of information about E if we are also told f(X) and g(Y). Someone familiar with inequalities about entropy (this is the step I am missing) should be able to infer that if we partition V into the 2

^{2k}subsets corresponding to all possible values of fi and gi, such a partition must be eps'-weakly regular with eps'=poly(eps). (Probably about eps^{1/2}.)Maybe this is not really three lines (in fact it is possibly more complicated than the "variance" argument), but I feel that one can

*see*what is going on much better than in the "variance" argument.