Entropy and the weak Regularity Lemma
If you remember my discussion of Szemeredi's proof of the Szemeredi Regularity Lemma, the proof goes by progressively refining an initial arbitrary partition until one gets an eps-regular partition. A potential function is used to prove that the number of refinement steps is at most poly(1/eps). I said the potential function measures "variance" or "disorder" in the densities of edges between blocks of the partition. I used this terminology because I was aware of an alternate proof by Tao where the Regularity Lemma is reformulated as a statement about random variables and the proof uses conditional entropy (but not in a straightforward way) as a potential function. Tao's proof gives a stronger version of the Regularity Lemma, similar to another strong version that was proved by Alon, Fischer, Krivelevich and Szegedy, motivated by applications to property testing.
I still cannot understand the intuition of the stronger statement, the role of the parameter "F" in Tao's proof, and the role of the "fine" partition. The probabilistic interpretation, however, leads to an amusing three-line proof of the weak Regularity Lemma. (As we shall see, unfortunately, one of the three lines is too hard for me to figure out.)
Before stating the weak Regularity Lemma, let us define the notion of weakly regular partition. Let G=(V,E) be a graph, (V1,...,Vk) be a partition of the vertices, and for any two blocks of the partition define the density
\[ d(V_i,V_j) := \frac{e(V_i,V_j)}{|V_i| \cdot |V_j|} \]

where e(Vi,Vj) is the number of edges between Vi and Vj; it is also convenient to have the notation

\[ d(V_i,V_i) := \frac{2\, e(V_i)}{|V_i|^2} \]

where e(Vi) is the number of edges within Vi.
We say that the partition (V1,...,Vk) is weakly eps-regular if for every two disjoint sets S,T of vertices we have that the quantity
\[ \sum_{i,j} |V_i \cap S| \cdot |V_j \cap T| \cdot d(V_i,V_j) \]
approximates e(S,T) up to an additive error of at most eps |V|^2.
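To make the definition concrete, here is a minimal Python sketch of the quantities involved; the adjacency-set representation of the graph and the names e, density, estimate, regularity_error are my own choices for illustration, not anything from the literature.

from itertools import combinations

def e(adj, A, B):
    # number of edges with one endpoint in A and the other in B (A and B disjoint)
    return sum(1 for u in A for v in B if v in adj[u])

def density(adj, A, B):
    # d(A,B) = e(A,B)/(|A|*|B|) for distinct blocks, d(A,A) = 2*e(A)/|A|^2 within a block
    if A is B:
        inside = sum(1 for u, v in combinations(A, 2) if v in adj[u])
        return 2 * inside / len(A) ** 2
    return e(adj, A, B) / (len(A) * len(B))

def estimate(adj, blocks, S, T):
    # sum over pairs of blocks of |Vi ∩ S| * |Vj ∩ T| * d(Vi,Vj)
    return sum(len(Vi & S) * len(Vj & T) * density(adj, Vi, Vj)
               for Vi in blocks for Vj in blocks)

def regularity_error(adj, blocks, S, T):
    # the partition is weakly eps-regular if this is at most eps*|V|^2
    # for every pair of disjoint sets of vertices S, T
    return abs(estimate(adj, blocks, S, T) - e(adj, S, T))

Here adj maps each vertex to the set of its neighbors, blocks is a list of disjoint sets of vertices covering V, and S, T are disjoint sets of vertices.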
Finally, we can state the weak Regularity Lemma:
Weak Szemeredi Regularity Lemma. For every eps there is a c(eps) = 2^{poly(1/eps)} such that every graph G=(V,E) admits a weakly eps-regular partition (V1,...,Vk) with k <= c(eps).
Note that the number of elements of the partition is singly exponential in 1/eps, instead of being a tower of exponentials. This is a considerable improvement, but the weaker property of the partition is not suitable for many applications, including property testing and additive combinatorics.
The observant reader should see how to modify the original proof to fit this new setting. (Start from an arbitrary partition; if it is not good, use the violating sets S and T to refine it, and so on. Now each refinement step increases the size of the partition only by a constant factor and, as before, there are only poly(1/eps) refinement steps.)
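For what it is worth, here is how that refinement argument looks as code, as a sketch only: the search for a violating pair is done by brute force over all pairs of disjoint sets, so it only runs on toy graphs, and the poly(1/eps) bound on the number of iterations is not tracked here, it is just what the potential-function argument guarantees. It reuses e() and regularity_error() from the previous sketch.

from itertools import chain, combinations

def subsets(xs):
    # all subsets of xs -- exponential, which is fine for a toy example
    xs = list(xs)
    return [set(c) for c in chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))]

def weak_regular_partition(adj, eps):
    V = set(adj)
    blocks = [V]                                   # the trivial one-block partition
    while True:
        # look for a pair of disjoint sets witnessing that the partition is not weakly eps-regular
        bad = next(((S, T) for S in subsets(V) for T in subsets(V - S)
                    if S and T and regularity_error(adj, blocks, S, T) > eps * len(V) ** 2),
                   None)
        if bad is None:
            return blocks                          # weakly eps-regular
        S, T = bad
        # refine: split every block according to membership in S and in T,
        # so the number of blocks grows by at most a factor of 4 per step
        blocks = [piece for B in blocks
                  for piece in (B & S & T, (B & S) - T, (B & T) - S, B - S - T)
                  if piece]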
What about the information-theoretic proof? It will be a simplification of Tao's proof to the case of the weak Regularity Lemma. Define the following random variables: X and Y are two independent and uniformly distributed vertices, and E is the 0/1 indicator of the event that (X,Y) is an edge.
We use the notation H(Z) for the entropy of a random variable Z, H(Z|W) for the entropy of Z conditioned on W, I(Z,W) := H(Z)-H(Z|W) for the mutual information of Z and W, and I(Z,W|R) := H(Z|R)-H(Z|R,W) for the mutual information of Z and W conditioned on R.
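For readers who like to compute, here is a small Python sketch of these quantities, working on an explicit joint distribution given as a dictionary from outcome tuples to probabilities; the interface (lists of coordinate indices picking out Z, W, R) is just a convenient choice of mine.

from math import log2

def H(p):
    # entropy, in bits, of a distribution {outcome: probability}
    return -sum(q * log2(q) for q in p.values() if q > 0)

def marginal(joint, idx):
    # marginal distribution of the coordinates listed in idx
    m = {}
    for outcome, q in joint.items():
        key = tuple(outcome[i] for i in idx)
        m[key] = m.get(key, 0.0) + q
    return m

def I(joint, idx_z, idx_w):
    # I(Z,W) = H(Z) + H(W) - H(Z,W)
    return H(marginal(joint, idx_z)) + H(marginal(joint, idx_w)) - H(marginal(joint, idx_z + idx_w))

def I_cond(joint, idx_z, idx_w, idx_r):
    # I(Z,W|R) = H(Z,R) + H(W,R) - H(Z,W,R) - H(R)
    return (H(marginal(joint, idx_z + idx_r)) + H(marginal(joint, idx_w + idx_r))
            - H(marginal(joint, idx_z + idx_w + idx_r)) - H(marginal(joint, idx_r)))

In our setting the joint distribution is over tuples that record a few bits about X, a few bits about Y, and the edge indicator E, with each ordered pair of vertices contributing probability 1/|V|^2 (an example of building such a table appears further below).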
Finally, we come to the proof. We ask the question: are there boolean functions f1,g1: V -> {0,1} such that I(E, (f1(X),g1(Y))) > eps? That is, after someone picks X and Y at random, is there one bit of information we can ask about X and one bit of information we can ask about Y that helps us guess whether there is an edge between X and Y? Equivalently, are there sets S and T such that knowing whether X is in S and whether Y is in T helps us guess whether there is an edge between X and Y? If not, it can be checked that the number of edges between any two sets S and T essentially depends only on the sizes of S and T, and V itself is a weakly poly(eps)-regular partition. (It is a "partition" with only one block.)
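Here is one way to check that last claim; this is my own reconstruction, using Pinsker's inequality, and not necessarily the intended argument. Write Z = (1_S(X), 1_T(Y)). The mutual information I(E,Z) equals the Kullback-Leibler divergence between the joint distribution of (E,Z) and the product of the marginals, so if I(E,Z) <= eps for every choice of S and T, then Pinsker's inequality bounds the probability gap on any single event:

\[ \bigl| \Pr[E=1,\ X \in S,\ Y \in T] - \Pr[E=1]\cdot\Pr[X \in S,\ Y \in T] \bigr| \;\le\; \sqrt{\tfrac{\ln 2}{2}\,\epsilon}. \]

Since X and Y are independent and uniform, the first probability is e(S,T)/|V|^2 (for disjoint S and T) and the second is d(V,V)|S||T|/|V|^2, so multiplying by |V|^2 gives |e(S,T) - d(V,V)|S||T|| = O(eps^{1/2}) |V|^2, which is exactly weak O(eps^{1/2})-regularity of the one-block partition.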
Otherwise, we ask whether there are functions f2,g2 such that
\[ I(E, (f_1(X), g_1(X), f_1(Y), g_1(Y), f_2(X), g_2(Y))) > 2\epsilon \]
If not, then for every two boolean functions f2,g2, once we know f1(X),g1(X),f1(Y),g1(Y), it does not help to know f2(X) and g2(Y) in order to guess whether (X,Y) is an edge. This means that if we partition V into 4 blocks (corresponding to possible values of f1,g1), then for every two sets S,T (corresponding to f2,g2), the number of edges between S and T essentially only depends on the sizes of the intersections of S and T with the four blocks of the partition. Hence, the partition is weakly regular.
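In terms of the entropy sketch above, the quantity in question is a conditional mutual information, and on a toy graph one can simply evaluate it; the helper joint_bits below (my own naming) builds the required joint table, with f1, g1, f2, g2 given as 0/1-valued functions on the vertices (for instance set indicators).

def joint_bits(adj, f1, g1, f2, g2):
    # joint distribution of (f1(X), g1(X), f1(Y), g1(Y), f2(X), g2(Y), E)
    # for X, Y independent uniform vertices of the graph adj
    V = list(adj)
    n = len(V)
    joint = {}
    for x in V:
        for y in V:
            key = (f1(x), g1(x), f1(y), g1(y), f2(x), g2(y), int(y in adj[x]))
            joint[key] = joint.get(key, 0.0) + 1.0 / n ** 2
    return joint

# the information gained about E from (f2(X), g2(Y)) once the first four bits are known:
# I(E, (f2(X),g2(Y)) | f1(X),g1(X),f1(Y),g1(Y))
# gain = I_cond(joint_bits(adj, f1, g1, f2, g2), [6], [4, 5], [0, 1, 2, 3])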
Otherwise, we keep going; but, if we increase the mutual information by eps at each step, we cannot go on for more than 1/eps steps. So there must be a k < 1/eps and 2k functions f1,...,fk,g1,...,gk such that, for all functions f,g, if we know the 4k bits f1(X),f1(Y),...,gk(X),gk(Y), then we gain less than another eps bits of information about E if we are also told f(X) and g(Y). Someone familiar with inequalities about entropy (this is the step I am missing) should be able to infer that if we partition V into the 2^{2k} subsets corresponding to all possible values of the fi and the gi, such a partition must be weakly eps'-regular with eps' = poly(eps). (Probably about eps^{1/2}.)
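To spell out the counting in the first half of that paragraph: if the process runs for k steps, then, by the way the questions are phrased, the bits revealed so far give more than k*eps bits of information about E, while no collection of random variables can give more information about E than its entire entropy, so

\[ k\,\epsilon \;<\; I\bigl(E, (f_1(X), f_1(Y), \ldots, g_k(X), g_k(Y))\bigr) \;\le\; H(E) \;\le\; 1, \]

since E is a 0/1 random variable; hence k < 1/eps.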
Maybe this is not really three lines (in fact it is possibly more complicated than the "variance" argument), but I feel that one can see what is going on much better than in the "variance" argument.