Saturday, September 09, 2006

Entropy and the weak Regularity Lemma

If you remember my discussion of Szemeredi's proof of the Szemeredi Regularity Lemma, the proof goes by progressively refining an initial arbitrary partition until one gets an eps-regular partition. A potential function is used to prove that the number of refinement steps is at most poly(1/eps). I said the potential function measures "variance" or "disorder" in the densities of edges between blocks of the partition. I used this terminology because I was aware of an alternate proof by Tao where the Regularity Lemma is reformulated as a statement about random variables and the proof uses conditional entropy (but not in a straightforward way) as a potential function. Tao's proof gives a stronger version of the Regularity Lemma, similar to another strong version that was proved by Alon, Fischer, Krivelevich and Szegedy, motivated by applications to property testing.

I still cannot understand the intuition of the stronger statement, the role of the parameter "F" in Tao's proof, and the role of the "fine" partition. The probabilistic interpretation, however, leads to an amusing three-line proof of the weak Regularity Lemma. (As we shall see, unfortunately, one of the three lines is too hard for me to figure out.)

Before stating the weak Regularity Lemma, let us define the notion of weakly regular partition. Let G=(V,E) be a graph, (V1,...,Vk) be a partition of the vertices, and for any two blocks of the partition define the density
d(Vi,Vj) := e(Vi,Vj) / |Vi| * |Vj|
where e(Vi,Vj) is the number of edges between Vi and Vj; it is also convenient to have the notation
d(Vi,Vi) := 2*e(Vi) / |Vi|2
where e(Vi) is the number of edges within Vi.

We say that the partition (V1,...,Vk) is weakly eps-regular if for every two disjoint sets S,T of vertices we have that the quantity
sum{i,j} |V_i intersection S| * |V_j intersection T| * d(Vi,Vj)

approximates e(S,T) up to an additive error of at most eps |V|2.

Finally we can state the weak Regularity Lemma
Weak Szemeredi Regularity Lemma. For every eps there is a c(eps) such that every graph G=(V,E) admits a weakly eps-regular partition (V1,...,Vk) where k=2{poly(1/eps)}.

Note that the number of elements of the partition is singly exponential in 1/eps, instead of being a tower of exponentials. This is a considerably improvement, but the weaker property of the partition is not suitable for many applications, including to property testing and to additive combinatorics.

The observant reader should see how to modify the original proof to fit this new setting. (Start from an arbitrary partition, if it is not good, use the sets S and T to refine, and so on; now each refinement step increases the size of the partition only by a constant factor, and, as before, we only have poly(1/eps) refinements.)

What about the information-theoretic proof? It will be a simplification of Tao's proof to the case of the weak Regularity Lemma. Define the following random variables: X and Y are two independent and uniformly distributed vertices, and E is the 0/1 indicator of the event that (X,Y) is an edge.

We use the notation H(Z) for the entropy of a random variable Z, H(Z|W) as the entropy of Z conditioned on W, I(Z,W):= H(Z)-H(Z|W) for the mutual information of Z and W and I(Z,W|R):= H(Z|R)-H(Z|R,W) for the mutual information of Z and W conditioned on R.

Finally, we come to the proof. We ask the question: Are there boolean functions f1,g1:V -> {0,1} such that I(E, (f1(X),g1(Y))) > eps? That is, after someone picks X and Y at random, is there one bit of information we can ask about X and one bit of information we ask about Y that helps us guess whether there is an edge between X and Y? Equivalently, are there sets S and T such that knowing whether X is in S and whether Y is in T helps us guess whether there is an edge between X and Y? If not, it can be checked that the number of edges between any two sets S and T essentially depends only on the size of S and T, and V itself is a weakly poly(eps)-regular partition. (It is a "partition" with only one block.)

Otherwise, we ask whether there are functions f2,g2 such that
I(E,(f1(X),g1(X),f1(Y),g1(Y),f2(X),g2(Y))> 2eps

If not, then for every two boolean functions f2,g2, once we know f1(X),g1(X),f1(Y),g1(Y), it does not help to know f2(X) and g2(Y) in order to guess whether (X,Y) is an edge. This means that if we partition V into 4 blocks (corresponding to possible values of f1,g1), then for every two sets S,T (corresponding to f2,g2), the number of edges between S and T essentially only depends on the sizes of the intersections of S and T with the four blocks of the partition. Hence, the partition is weakly regular.

Alternatively, we keep going, but, if we increase the mutual information by eps at each step, we cannot go for more than 1/eps steps. So there must be a k < 1/eps and 2k functions f1,...,fk,g1,...,gk such that, for all functions f,g, if we know the 4k bits f1(X),f1(Y),...,gk(X),gk(Y), then we gain less than another eps bits of information about E if we are also told f(X) and g(Y). Someone familiar with inequalities about entropy (this is the step I am missing) should be able to infer that if we partition V into the 22k subsets corresponding to all possible values of fi and gi, such a partition must be eps'-weakly regular with eps'=poly(eps). (Probably about eps1/2.)

Maybe this is not really three lines (in fact it is possibly more complicated than the "variance" argument), but I feel that one can see what is going on much better than in the "variance" argument.

Road trip!

On Wednesday, I took my legally polluting car to a trip to Los Angeles. Driving from San Francisco to Los Angeles involves mostly driving on highway 5, or, more or less, going straight for a few hundred miles. One starts from San Francisco, with its 65 degrees in the summer, its environmentalist car mechanics, and its dramatic hills, and suddenly one finds himself into a plain desert (not desert as in having few people, but desert as in sandy desert) with 105 degrees. Further ahead, highway 5 cuts through more fertile, but still very hot, rural areas. It is impossible to understand California politics, the election of Reagan and Schwarzenegger as governors, the passing of Proposition 22 and Proposition 209, the 45% of votes for Bush in 2004, without realizing that there are a lot of Californians who do not live in big cities or in University towns, and who do not hate our freedoms.

This may not be a novel observation, but Los Angeles is big. In San Francisco, and certainly in Manhattan, the size of a neighborhood is defined, in part, by walking distance. If you cannot walk between two places, then it would not occur to consider them "close." Here an apartment advertised as being close to the beach, say, may easily be farther from the beach than the Mission is far the Marina (these are two neighborhoods in San Francisco that are antipodal in character and quite geographically far as well), or even farther than San Francisco is from Oakland. So far I have driven just between Santa Monica, Westwood, Beverly Hills, West Hollywood and Fairfax, already an exceedingly vast area, and, on an LA map, this is all but a sliver of the city. (In fact, these are all places that are "close" to each other.)

I have decided to live in Santa Monica, near the 3rd Street Promenade, a stretch of a few blocks that is closed to traffic and is pedestrian-only. Pedestrians? In LA? I was surprised too, but I am getting the impression that it's all tourists. By the way, in Santa Monica there is a group of five streets named after famous universities: there is Yale, Harvard, Princeton, Stanford and Berkeley. No MIT.

Tuesday, September 05, 2006

Worst invention ever

In California, all cars must periodically submit to a "smog check" to verify that the catalytic converter works and that emissions are within a certain range.

Today I bring my car in for the check. When I pick it up, I ask "Is the car ok? Does it pollute?" "Of course it pollutes," says the car technician guy who just charged me \$142.99 for ten minutes of work, "This [the car] is the worst invention ever made. It's destroying our planet."