in theory

in theory moves

2008-02-16T23:15:00.000-08:00

We ring in the year of the rat with a move to wordpress, and to its superior handling of latex.

Please update your bookmarks, your RSS readers, and your blogrolls, to

http://lucatrevisan.wordpress.com/

While all old posts and comments are there, the move has broken the latex hacks, the videos, and the cross-links between posts. This will be taken care of in the "near" future.

恭喜发财!

2008-02-06T16:53:00.000-08:00

Overheard in San Francisco

2008-01-27T23:23:00.000-08:00

Young Homeless Guy is sitting on the floor with a cardboard sign. Another guy walks by, holding what look like large leftover bags from a restaurant.

Guy With Bags: [stops and offers the bags] would you like something to eat?
Young Homeless Guy: is there garlic or avocado in it?
GWB: I don't think so, why?
YHG: I am allergic to both. Especially avocado: when I eat it, my throat gets all scratchy.

An Unusual Recruiting Pitch

2008-01-26T18:56:00.000-08:00

Women in their sophomore or junior year of college who are thinking about doing research and going to graduate school should read this article (via Andrew Sullivan). Living the life of the mind is very rewarding, and, apparently, the chances of dating male models are not bad either. (If the author could get some mileage out of being an undergrad at Harvard, just imagine what it can do for you to be a grad student at Berkeley!)

Finally!

2008-01-24T17:58:00.000-08:00

After a hiatus of almost four years, the graduate computational complexity course returns to Berkeley.

To get started, I proved Cook's non-deterministic hierarchy theorem, a 1970s result with a beautifully clever proof, which I first learned from Sanjeev Arora and which is not very well known.

Though the full result is more general, say we want to prove that there is a language in NP that cannot be solved by non-deterministic Turing machines in time $o(n^3)$.

(If one does not want to talk about non-deterministic Turing machines, the same proof will apply to other quantitative restrictions on NP, such as bounding the length of the witness and the running time of the verification.)

In the deterministic case, where we want to find a language in P not solvable in time $o(n^3)$, it's very simple. We define the language $L$ that contains all pairs $(\langle T\rangle,x)$ where: (i) $T$ is a Turing machine, (ii) $x$ is a binary string, (iii) $T$ rejects the input $(\langle T\rangle,x)$ within $|(\langle T\rangle,x)|^3$ steps, where $|z|$ denotes the length of a string $z$.

It's easy to see that $L$ is in P, and it is also easy to see that if a machine $M$ could decide this problem in time $\leq n^3$ on all sufficiently large inputs, then the behavior of $M$ on input $\langle M\rangle,x$, for every $x$ long enough, leads to a contradiction.

We could try the same with NP, and define $L$ to contain pairs $(\langle T\rangle,x)$ such that $T$ is a non-deterministic Turing machine that has no accepting path of length $\leq |\langle T\rangle,x|^3$ on input $(\langle T\rangle,x)$. It would be easy to see that $L$ cannot be solved non-deterministically in time $o(n^3)$, but it's hopeless to prove that $L$ is in NP, because in order to solve $L$ we need to decide whether a given non-deterministic Turing machine rejects, which is, in general, a coNP-complete problem.

Here is Cook's argument. Define the function $f(k)$ as follows: $f(1):=2$, $f(k):= 2^{(1+f(k-1))^3}$. Hence, $f(k)$ is a tower of exponentials of height $k$. Now define the language $L$ as follows.

$L$ contains all pairs $\langle T\rangle,0^t$ where $T$ is a non-deterministic Turing machine and $0^t$ is a sequence of $t$ zeroes such that one of the following conditions is satisfied:

There is a $k$ such that $f(k)=t$, and $T$ has no accepting computation on input $\langle T\rangle,0^{1+f(k-1)}$ of running time $\leq (1+f(k-1))^3$;
$t$ is not of the form $f(k)$ for any $k$, and $T$ has an accepting computation on input $\langle T\rangle,0^{1+t}$ of running time $\leq (t+1)^3$.

Now let's see that $L$ is in NP. When we are given an input $\langle T\rangle,0^t$ we can first check if there is a $k$ such that $f(k)=t$.

If there is, we can compute $t':=1+f(k-1)$ and deterministically simulate all computations of $T$ on input $\langle T\rangle,0^{t'}$ up to running time $t'^3$. This takes time $2^{O(t'^3)}$, which is polynomial in $t = 2^{t'^3}$.
Otherwise, we non-deterministically simulate $T$ on input $\langle T\rangle,0^{t+1}$ for up to $(t+1)^3$ steps. (And reject after time-out.)

In either case, we are correctly deciding the language.
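The first step of the algorithm, checking whether $t$ has the form $f(k)$, can be sketched as follows. This is only an illustration under my own naming (`f` and `find_k` are hypothetical helpers, not part of Cook's proof); it relies on the observation that every $f(k)$ is a power of two, so it is enough to compare exponents without ever building a number larger than $t$.

```python
def f(k):
    """f(1) = 2 and f(k) = 2 ** ((1 + f(k-1)) ** 3); only feasible for k <= 2."""
    value = 2
    for _ in range(k - 1):
        value = 2 ** ((1 + value) ** 3)
    return value


def find_k(t):
    """Return the k with f(k) == t, or None if t is not of that form."""
    if t == 2:
        return 1
    if t < 2 or t & (t - 1):  # every f(k) is a power of two
        return None
    exponent = t.bit_length() - 1  # t == 2 ** exponent
    prev, k = 2, 2                 # prev holds f(k-1)
    while True:
        target = (1 + prev) ** 3   # f(k) == 2 ** target
        if target == exponent:
            return k
        if target > exponent:
            return None
        prev, k = 2 ** target, k + 1  # safe: 2 ** target < t at this point


print(f(2))             # 2 ** 27 = 134217728
print(find_k(2 ** 27))  # 2
print(find_k(10 ** 6))  # None
```

Since the input contains $0^t$ written in unary, the handful of arithmetic operations above is easily polynomial in the input length.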

Finally, suppose that $L$ could be decided by a non-deterministic Turing machine $M$ running in time $o(n^3)$. In particular, for all sufficiently large $t$, the machine runs in time $\leq t^3$ on input $\langle M\rangle,0^t$.

Choose $k$ to be sufficiently large so that for every $t$ in the interval $1+f(k-1),...,f(k)$ the above property is true.

Now we can see that $M$ accepts $(\langle M\rangle,0^{f(k-1)+1})$ if and only if $M$ accepts $(\langle M\rangle,0^{f(k-1)+2})$ if and only if ... if and only if $M$ accepts $(\langle M\rangle,0^{f(k)})$ if and only if $M$ rejects $(\langle M\rangle,0^{f(k-1)+1})$, and we have our contradiction.

Please, no pigs in the subway

2008-01-19T01:46:00.000-08:00

And that includes you!

I could not figure out what's the item on the bottom left.

Incidentally, the recent spike in the price of pork was a major news item.

Mmmm... Dangerously Delicious...

2008-01-13T06:29:00.001-08:00

Pseudorandomness for Polynomials

2008-01-10T22:29:00.000-08:00

I am currently in Hong Kong for my second annual winter break visit to the Chinese University of Hong Kong. If you are around, come to CUHK on Tuesday afternoon for a series of back-to-back talks by Andrej Bogdanov and me.

First, I'd like to link to this article by Gloria Steinem. (It's old but I have been behind with my reading.) I believe this presidential campaign will bring up serious reflections on issues of gender and race, and I look forward to the rest of it.

Secondly, I'd like to talk about pseudorandomness against low-degree polynomials.

Naor and Naor constructed in 1990 a pseudorandom generator whose output is pseudorandom against tests that compute affine functions over $F_2$. Their construction maps a seed of length $O(\log (n/\epsilon))$ into an $n$-bit string in $F_2^n$ such that if $L: F_2^n \to F_2$ is an arbitrary affine function, $X$ is the distribution of outputs of the generator, and $U$ is the uniform distribution over $F_2^n$, we have

(1) $ | Pr [ L(X)=1] - Pr [ L(U)=1] | \leq \epsilon $

This has numerous applications, and it is related to other problems. For example, if $C\subseteq F_2^m$ is a linear error-correcting code with $2^k$ codewords, and if it is such that any two codewords differ in at least a $\frac 12 - \epsilon$ fraction of coordinates, and in at most a $\frac 12 + \epsilon$ fraction, then one can derive from the code a Naor-Naor generator mapping a seed of length $\log m$ into an output of length $k$. (It is a very interesting exercise to figure out how.) Here is another connection: Let $S$ be the (multi)set of outputs of a Naor-Naor generator over all possible seeds, and consider the Cayley graph constructed over the additive group of $F_2^n$ using $S$ as a set of generators. (That is, take the graph that has a vertex for every element of $\{0,1\}^n$, and edge between $u$ and $u+s$ for every $s\in S$, where operations are mod 2 and componentwise.) Then this graph is an expander: the largest eigenvalue is $|S|$, the degree, and all other eigenvalues are at most $\epsilon |S|$ in absolute value. (Here too it's worth figuring out the details by oneself. The hint is that in a Cayley graph the eigenvectors are always the characters, regardless of what generators are chosen.) In turn this means that if we pick $X$ uniformly and $Y$ according to a Naor-Naor distribution, and if $A\subseteq F_2^n$ is a reasonably large set, then the events $X\in A$ and $X+Y \in A$ are nearly independent. This wouldn't be easy to argue directly from the definition (1), and it is an example of the advantages of this connection.
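For very small $n$ one can check condition (1) by brute force. The sketch below is not the Naor-Naor construction; it only measures how far a given multiset of samples is from satisfying (1), and the names `dot`, `max_bias` and `cayley_eigenvalue` are mine. It uses two standard facts: adding a constant to a linear function only flips the probability around $\frac 12$, so it is enough to test non-zero linear functions; and the character indexed by $a$ is an eigenvector of the Cayley graph with eigenvalue $\sum_{s\in S} (-1)^{\langle a,s\rangle}$.

```python
from fractions import Fraction
from itertools import product


def dot(a, x):
    """Inner product of two 0/1 tuples over F_2."""
    return sum(ai & xi for ai, xi in zip(a, x)) % 2


def max_bias(samples, n):
    """Largest |Pr[<a,X> = 1] - 1/2| over non-zero a, for X uniform in samples."""
    worst = Fraction(0)
    for a in product((0, 1), repeat=n):
        if any(a):
            p = Fraction(sum(dot(a, x) for x in samples), len(samples))
            worst = max(worst, abs(p - Fraction(1, 2)))
    return worst


def cayley_eigenvalue(samples, a):
    """Eigenvalue of the Cayley graph generated by samples, for the character a."""
    return sum((-1) ** dot(a, s) for s in samples)


cube = list(product((0, 1), repeat=3))
even = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]  # even-weight strings
print(max_bias(cube, 3))                  # 0: the whole cube is perfectly unbiased
print(max_bias(even, 3))                  # 1/2: x1 + x2 + x3 is constant on this set
print(cayley_eigenvalue(even, (1, 0, 0)))  # 0
```

The two quantities are tied together by the identity `cayley_eigenvalue(S, a) == len(S) * (1 - 2 * p)`, where `p` is the probability that $\langle a,X\rangle = 1$, so a small bias for every non-zero $a$ translates into small non-trivial eigenvalues (with a constant that depends on how $\epsilon$ is normalized).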

There is more. If $f: \{0,1\}^n \rightarrow \{0,1\}$ is such that the sum of the absolute values of the Fourier coefficients is $t$, $X$ is a Naor-Naor distribution, and $U$ is uniform, we have
$ | Pr [ f(X)=1] - Pr [ f(U)=1] | \leq t \epsilon $
and so a Naor-Naor distribution is pseudorandom against $f$ too, if $t$ is not too large. This has a number of applications: Naor-Naor distributions are pseudorandom against tests that look only at a bounded number of bits, they are pseudorandom against functions computable by read-once branching programs of width 2, and so on.
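Here is one way to derive the bound, written out as a sketch. I am assuming that the Fourier coefficients in question are those of the $\pm 1$-valued function $g = (-1)^f$ (with a different normalization the constant changes), and $\chi_a$ denotes the character $x \mapsto (-1)^{\langle a,x\rangle}$.

```latex
\begin{align*}
g(x) &= (-1)^{f(x)} = \sum_{a} \hat g(a)\,\chi_a(x),
  \qquad t = \sum_a |\hat g(a)| \\
\Pr[f(X)=1] - \Pr[f(U)=1]
  &= \tfrac12\bigl(\mathbb{E}\,g(U) - \mathbb{E}\,g(X)\bigr)
   = -\tfrac12 \sum_{a\neq 0} \hat g(a)\,\mathbb{E}\,\chi_a(X) \\
|\mathbb{E}\,\chi_a(X)| &= \bigl|1 - 2\Pr[\langle a,X\rangle = 1]\bigr| \leq 2\epsilon
  \qquad (a \neq 0,\ \text{by (1)}) \\
\bigl|\Pr[f(X)=1] - \Pr[f(U)=1]\bigr|
  &\leq \tfrac12 \cdot 2\epsilon \sum_{a\neq 0} |\hat g(a)| \leq t\,\epsilon
\end{align*}
```

The second line uses that $\mathbb{E}\,\chi_a(U) = 0$ for every $a \neq 0$, so the only term that survives under the uniform distribution is the one for $a = 0$, which is the same for $X$ and $U$.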

Given all these wonderful properties, it is natural to ask whether we can construct generators that are pseudorandom against quadratic polynomials over $F_2^n$, and, in general, low-degree polynomials. This question has been open for a long time. Luby, Velickovic, and Wigderson constructed such a generator with seed length $2^{(\log n)^{1/2}}$, using the Nisan-Wigderson methodology, and this was not improved upon for more than ten years.

When dealing with polynomials, several difficulties arise that are not present when dealing with linear functions. One is the correspondence between pseudorandomness against linear functions and Fourier analysis; until the development of Gowers uniformity there was no analogous analytical tool to reason about pseudorandomness against polynomials (and even Gowers uniformity is unsuitable to reason about very small sets). Another difference is that, in Equation (1), we know that $Pr [L(U)=1] = \frac 12$, except for the constant function (against which, pseudorandomness is trivial). This means that in order to prove (1) it suffices to show that $Pr[L(X)=1] \approx \frac 12$ for every non-constant $L$. When we deal with a quadratic polynomial $p$, the value $Pr [p(U)=1]$ can be all over the place between $1/4$ and $3/4$ (for non-constant polynomials), and so we cannot simply prove that $Pr[p(X)=1]$ is close to a certain known value.

A first breakthrough with this problem came with the work of Bogdanov on the case of large fields. (Above I stated the problem for $F_2$, but it is well defined for every finite field.) I don't completely understand his paper, but one of the ideas is that if $p$ is an absolutely irreducible polynomial (meaning it does not factor even in the algebraic closure of $F$), then $p(U)$ is close to uniform over the field $F$; so to analyze his generator construction in this setting one "just" has to show that $p(X)$ is nearly uniform, where $X$ is the output of his generator. If $p$ factors then somehow one can analyze the construction "factor by factor," or something to this effect. This approach, however, is not promising for the case of small fields, where the absolutely irreducible polynomial $x_1 + x_2 x_3$ has noticeable bias.

The breakthrough for the boolean case came with the recent work of Bogdanov and Viola. Their starting point is the proof that if $X$ and $Y$ are two independent Naor-Naor generators, then $X+Y$ is pseudorandom for quadratic polynomials. To get around the unknown bias problem, they divide the analysis into two cases. First, it is known that, up to affine transformations, a quadratic polynomial can be written as $x_1x_2 + x_3x_4 + \cdots + x_{k-1} x_k$, so, since applying an affine transformation to a Naor-Naor generator gives a Naor-Naor generator, we may assume our polynomial is in this form.

Case 1: if $k$ is small, then the polynomial depends on few variables, and so even just one Naor-Naor distribution is going to be pseudorandom against it;
Case 2: if $k$ is large, then the polynomial has very low bias, that is, $Pr[p(U)=1] \approx \frac 12$. This means that it is enough to prove that $Pr[p(X+Y)=1] \approx \frac 12$, which can be done using (i) Cauchy-Schwarz, (ii) the fact that $U$ and $U+X$ are nearly independent if $U$ is uniform and $X$ is Naor-Naor, and (iii) the fact that for fixed $x$ the function $y \rightarrow p(x+y) - p(y)$ is affine.
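To see how the bias of the canonical form depends on $k$, one can simply enumerate. The sketch below (function names are mine) computes $Pr[p(U)=1]$ exactly for $p = x_1x_2 + \cdots + x_{k-1}x_k$ and compares it with the closed form $\frac 12 - 2^{-(k/2+1)}$, which follows from the fact that each product $x_{2i-1}x_{2i}$ contributes a factor $\frac 12$ to the expectation of $(-1)^{p(U)}$.

```python
from fractions import Fraction
from itertools import product


def quadratic(x):
    """p(x) = x1*x2 + x3*x4 + ... over F_2, for an even-length 0/1 tuple x."""
    return sum(x[i] & x[i + 1] for i in range(0, len(x), 2)) % 2


def prob_one(k):
    """Exact Pr[p(U) = 1] for U uniform over F_2^k, by enumeration."""
    hits = sum(quadratic(x) for x in product((0, 1), repeat=k))
    return Fraction(hits, 2 ** k)


def closed_form(k):
    """1/2 - 2^{-(k/2 + 1)} for even k."""
    return Fraction(1, 2) - Fraction(1, 2 ** (k // 2 + 1))


for k in (2, 4, 6, 8):
    print(k, prob_one(k), closed_form(k))
```

For $k=2$ the probability is $\frac 14$, the most biased a non-constant quadratic polynomial can be, and it approaches $\frac 12$ exponentially fast in $k$, which is why the two cases are split according to the size of $k$.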

Now, it would be nice if every degree-3 polynomial could be written, up to affine transformations, as $x_1x_2 x_3 + x_4x_5x_6 + \cdots$, but there is no such characterization, so one has to find the right way to generalize the argument.

In the Bogdanov-Viola paper, they prove

Case 1: if $p$ of degree $d$ is correlated with a degree $d-1$ polynomial, and if $R$ is a distribution that is pseudorandom against degree $d-1$ polynomials, then $R$ is also pseudorandom against $p$;
Case 2: if $p$ of degree $d$ has small Gowers uniformity norm of dimension $d$, then $Pr [p(U)=1] \approx \frac 12$, which was known, and if $R$ is pseudorandom for degree $d-1$ and $X$ is a Naor-Naor distribution, then $Pr[p(R+X)=1] \approx \frac 12$ too.

There is a gap between the two cases, because Case 1 requires correlation with a polynomial of degree $d-1$ and Case 2 requires small Gowers uniformity $U^d$. The Gowers norm inverse conjecture of Green and Tao is that a noticeably large $U^d$ norm implies a noticeable correlation with a degree $d-1$ polynomial, and so it fills the gap. The conjecture was proved by Samorodnitsky for $d=3$ in the boolean case and for larger fields and $d=3$ by Green and Tao. Assuming the conjecture, the two cases combine to give an inductive proof that if $X_1,\ldots,X_d$ are $d$ independent Naor-Naor distributions then $X_1+\ldots+X_d$ is pseudorandom for every degree-$d$ polynomial.

Unfortunately, Green and Tao and Lovett, Meshulam, and Samorodnitsky prove that the Gowers inverse conjecture fails (as stated above) for $d\geq 4$ in the boolean case.

Lovett has given a different argument to prove that the sum of Naor-Naor generators is pseudorandom for low-degree polynomials. His analysis also breaks down in two cases, but the cases are defined based on the largest Fourier coefficient of the polynomial, rather than based on its Gowers uniformity. (Thus, his analysis does not differ from the Bogdanov-Viola analysis for quadratic polynomials, because the dimension-2 Gowers uniformity measures the largest Fourier coefficient, but it differs when $d\geq 3$.) Lovett's analysis only shows that $X_1 +\cdots + X_{2^{d-1}}$ is pseudorandom for degree-$d$ polynomials, where $X_1,\ldots,X_{2^{d-1}}$ are $2^{d-1}$ independent Naor-Naor generators, compared to the $d$ that would have sufficed in the conjectural analysis of Bogdanov and Viola.

The last word on this problem (for now) is this paper by Viola, where he shows that the sum of $d$ independent Naor-Naor generators is indeed pseudorandom for degree-$d$ polynomials.

Again, there is a case analysis, but this time the cases depend on whether or not $Pr [p(U)=1] \approx \frac 12$.

If $p(U)$ is noticeably biased (this corresponds to a small $k$ in the quadratic model case), then it follows from the previous Bogdanov-Viola analysis that a distribution that is pseudorandom against degree $d-1$ polynomials will also be pseudorandom against $p$.

The other case is when $p(U)$ is nearly unbiased, and we want to show that $p(X_1+\ldots +X_d)$ is nearly unbiased. Note how weak the assumption is, compared to the assumption that $p$ has small dimension-$d$ Gowers norm (in Bogdanov-Viola) or that all Fourier coefficients of $p$ are small (in Lovett). The same three tools that work in the quadratic case, however, work here too, in a surprisingly short proof.

Don Knuth is 70

2008-01-10T01:00:00.000-08:00

Alonzo Church and Alan Turing imagined programming languages and computing machines, and studied their limitations, in the 1930s; computers started appearing in the 1940s; but it took until the 1960s for computer science to become its own discipline, and to provide a common place for the logicians, combinatorialists, electrical engineers, operations researchers, and others, who had been studying the uses and limitations of computers. That was a time when giants were roaming the Earth, and when results that we now see as timeless classics were discovered.

Don Knuth is one of the most revered of the great researchers of that time, and a sort of pop-culture icon to a certain geek set (see for example these two xkcd comics here and here, and this story). Beyond his monumental accomplishments, his eccentricities and humor are the stuff of legends. (Like, say, the fact that he does not use email, or how he optimized the layout of his kitchen.)

As a member of a community whose life is punctuated by twice-yearly conferences, what I find most inspiring about Knuth is his dedication to perfection, whatever time it might take to achieve it.

As the well known story goes, more than forty years ago Knuth was asked to write a book about compilers. As initial drafts started to run into the thousands of pages, it was decided the "book" would become a seven-volume series, The Art of Computer Programming, the first three of which appeared between 1968 and 1973. An unparalleled in-depth treatment of algorithms and data structures, the books defined the field of analysis of algorithms.

At this point Knuth became frustrated with the quality of electronic typesetting systems, and decided he had to take matters into his own hands. In 1977 he started working on what would become TeX and METAFONT, a development that was completed only in 1989. Starting from scratch, he created a complete document preparation system (TeX) which became the universal standard for writing documents with mathematical content, along the way devising new algorithms for formatting paragraphs of text. To generate the fonts to go with it, he created METAFONT, which is a system that converts a geometric description of a character into a bit-map representation usable by TeX. (New algorithmic work arose from METAFONT too.) And since he was not satisfied with the existing tools available to write a large program involving several non-trivial algorithms, he came up with the notion of "literate programming" and wrote an environment to support it. It is really too bad that he was satisfied enough with the operating system he was using.

One now takes TeX for granted, but try to imagine a world without it. One shudders at the thought. We would probably be writing scientific articles in Word, and I would have probably spent the last month reading STOC submissions written in Comic Sans.

Knuth has made mathematical exposition his life's work. We may never see again a work of the breadth, ambition, and success of The Art of Computer Programming, but as theoretical computer science broadens and deepens, it is vital that each generation cherishes the work of accumulating, elaborating, systematizing and synthesizing knowledge, so that we may preserve the unity of our field.

Don Knuth turns 70 tomorrow. I would send him my best wishes by email, but that wouldn't work...

[This post is part of a "blogfest" conceived and coordinated by Jeff Shallit, with posts by Jeff, Scott Aaronson, Mark Chu-Carroll, David Eppstein, Bill Gasarch, Suresh Venkatasubramanian, and Doron Zeilberger.]

Math is for boys, but not in Italy

2007-12-29T12:35:00.000-08:00

Why are women so under-represented in computer science research in the United States? And what can we do about it?

The conventional wisdom is that most of the damage is done in kindergarten or earlier, when parents teach their young sons to play chess, but not their young daughters, when a competitive and aggressive attitude is encouraged in boys and repressed in girls, and so on.

I do subscribe to this theory, but how do I reconcile it with the fact that, as observed by Luca Aceto, women are well represented in the Italian computer science academia? It's not like Italy is a post-gendered feminist utopia, after all.

As someone who has not lived in Italy in 11 years, and who has no training in social sciences, I'd like to offer my uninformed opinions.

For starters, although Italian society can appear shockingly sexist to one used to American political correctness, in practice things are more complex. I have heard Italian women in positions of authority complain that they are not treated with the same respect as their male colleagues (an issue that is not very critical in hierarchy-free academia), but I have rarely, if ever, heard an Italian woman say that men are afraid of highly educated, smart women, an issue that seems to come up a lot here in the US. That is, although it may not be considered "feminine" in Italy to be a manager, it is ok to be smart and have a PhD (to the extent that people have any idea what a PhD is).

I'd like my people to take credit for this, but there is actually a "darker" side to this attitude. In Italy, academic research is choked by a perennial funding crisis. Salaries are very low, and promotions are slow and unpredictable, because of frequent hiring freezes. It is common for a prospective academic to be in his or her mid-30s and still not be in the equivalent of a tenure-track position.

And so, I suspect, academia is something of a "woman's job," because it is ok for a woman to be in a career that is uncertain and does not pay well, but that moves on slowly, allows for maternity leaves, and is personally fulfilling. It is a bit like being an artist, or a writer. A man, however, has to provide for the family and so this is not so good for him.

My spaghetti-sociology may be completely off, but I think it's possible that the representation of women in computer science (and math) in Italy is indeed happening for all the wrong reasons. (A case of two wrongs making a right.)

If I am right, what lessons could we take about attracting more talented women to math, science and engineering in the short term, without having to wait for the revolution to come and for gender roles to be abolished or at least more fairly re-shuffled? Decreasing salaries and abolishing tenure could work, but I would rather not advocate such steps. Some of the proposals that have been around for a while, however, seem entirely reasonable: make the tenure clock more flexible, allow for longer parental leaves, and recognize that the current system, which puts a lot of pressure on people when they are in their late 20s to mid-30s, puts a great strain on people who want to have, and actively rear, children before they are in their late 30s. (And that, in the current pre-revolutionary times, this is a concern that hits women disproportionately more than men.) In addition, whatever can be done to decrease a perception of math, science and engineering as "boys' subjects" should be done. I understand that CMU's spectacularly successful initiative to increase women's representation in undergraduate computer science education started from a similar, if more sophisticated, premise.

The Princeton Workshop on Women in Theory

2007-12-20T07:09:00.000-08:00

[I'd like to pass along the following announcement from Tal Rabin. Spread the word. -L.]

We will have a "Women in Theory" student workshop in Princeton on June 14-18, 2008. The goal is to have a great technical program and a chance for the (far too few) women in TCS to get together. Female graduate students are encouraged to apply - we also have a few slots for outstanding CS/math undergraduates and may be able to offer travel support. See http://www.cs.princeton.edu/theory/index.php/Main/WIT08 for more details and a list of confirmed speakers.

Italian Professors to Blockade Highways Next Year

2007-12-20T06:52:00.000-08:00

It's never a good sign when the New York Times has an article about Italy. Though they rarely get as bad as the one about the Lady Chatterley of Calitri, there is always a sense that one would get more acute social analysis from a Lonely Planet guide.

Last week's article by Ian Fisher on the Italian malaise was not bad. It starts, inauspiciously, with "[Italy] is the place [...] where people still debate [...] what, really, the red in a stoplight might mean," while, ever since the point system for driver's licenses was introduced, everybody stops at red lights. It is what a stop sign means at an intersection which is a matter of debate. (The debate being on whether or not one should slow down before cutting into incoming traffic.) But the rest of the article competently reports on a series of worrying signs about Italian society, economy, and politics.

In an embarrassing show of provincialism, this has been enough to create the mother of all media storms. For the past seven days, talk shows, newspapers, politicians, and "intellectuals" have done little more than discuss and rebut what "The New York Times Says" about Italy's supposed funk.

I do get myself into a funk when I come to Italy and read newspapers every day. Most of the stories, apart from the one about What The New York Times Says, are too complicated for me to try and summarize, but there is one that has great symbolic value. For several days last week, truck drivers were on strike, blocking highways and stopping the delivery of gasoline and some food items. In the last round of shuffling of the budget law before it was to be voted on by the House (which, amusingly, is called the Room of Representatives in Italy) and the Senate, the government added 30 million euros for provisions that benefited truck drivers. This, and a few other last-minute expenses, were compensated by a series of cuts. Research and universities lost 90 million euros. This despite the fact that the Italian government signed a European agreement that sets for all states a goal of spending 3% of their GDP on universities and research, and Italy is currently spending around 1%. This is why, next year, Italian professors should take to highways on their scooters and do a blockade.

On the positive side, the European Research Council has started operations. This is an NSF-like grant-making institution that is going through its first round of funding this year. Italy, at the time of Berlusconi's government, was one of the states who opposed the creation of the ERC, on the grounds that, if I may rephrase, the ERC was going to take money from member states and assign it on the basis of quality, which is something for which the Italian government would not stand. Italian researchers, meanwhile, did very well on this first round of funding, showing that despite all the best efforts of governments of both political sides, quality has not yet been eradicated from Italian universities.

New York in Grainy Pictures

2007-12-14T13:28:00.000-08:00

My three months in New York are over, so no more xiaolongbao at Yeah Shanghai, long rides on the New Jersey Transit trains, seminars on additive combinatorics, and hot pot at Minni's Shabu Shabu for me.

The other night, the traffic information board on 110th and Amsterdam was saying "CCCOOORRR... FFFFFF... AAAUUUU" which is how I too felt about the weather.

This is probably meant to lure in Italian tourists

The Apple store on 5th avenue open at 2:40am (and through the night), because it's never too late (or too early) to buy an iPhone

In the Canal stop of the N-Q-R-W. There were no signs in other languages.

I agree, it's good.

Happy Belated Birthday!

2007-12-11T18:33:00.000-08:00

I just discovered, via CNN, that the Commodore 64 turned 25 last summer.

I received a Commodore 64 as a much appreciated gift for my confirmation, when I was in my first year of high school (9th grade). It was named after its then remarkable 64kB of memory; its operating system and its Basic interpreter fit into an additional 32kB of ROM. It had a graphic and a music processor that were not bad for the time, and it was endlessly fun to play with it. Its Basic language had instructions to read (peek) and write (poke) directly onto memory locations, and this was how pretty much everything was done. To draw on the screen one would set a certain bit of a certain memory location to a certain value to switch to a graphic mode, and then one would directly write on the bitmap. Similarly one could play a note on a certain channel, with a certain waveform for a certain amount of time by setting certain other memory locations. In 6th to 8th grade (middle school) we studied music, which consisted in part of learning how to play a recorder. The least said about my recorder-playing skills the better, but I left 8th grade with a stack of very simplified music scores of famous songs, which I then proceeded to translate into the numerical codes required by the C=64 music card so that I could make it play the songs. I also amused myself with more complicated projects, usually involving the drawing of 3-dimensional objects on the screen.

People who have met me later in life may be surprised to learn that I spent long hours programming for fun. Not that I need to be defensive or anything, and I certainly did not know so then, but, at the time, programming, even in the very basic Basic that came with the computer, was the closest thing I could do to math. Certainly, it was much closer than the "math" I was getting in school, which consisted in learning how to run certain numerical and algebraic algorithms by hand. Indeed I don't think I encountered anything closer to math than programming until the first year of college, when the whole notion of axioms, theorems, proofs, and "playing a game with meaningless symbols" was unloaded on me in a course innocuously termed "Geometry." (Nominally a course on linear algebra, the course was a parody of Bourbakism as a teaching style. In the first class the professor came in and said, a vector space is a set with two operations that satisfy the following nine axioms. Now I should like to prove the following proposition... I am not joking when I say that the fact that the elements of a $k$-dimensional vector space are $k$-tuples of numbers came as a revelation near the very end of the course.)

The fact that the "type" of a program is similar to a statement and the implementation of a program is similar to a proof should be familiar to anybody who has written both. In both cases, one needs to break down an idea into basic steps and be very precise about how each step is realized; if a sequence of steps is repeated twice in a similar way, one should abstract the similarity, write the abstracted version separately, and then use the abstracted version twice; and so on. The Curry-Howard isomorphism establishes this connection in a formal way, between a certain way of writing proofs (say, Gentzen proof system with no cut in intuitionistic logic) and a certain way of writing programs (say, typed $\lambda$-calculus). I know because I once took a course based on the totally awesome book Proofs and Types by Girard, which is out of print but available for free on the web.
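A proof assistant makes the correspondence concrete. The following is a small sketch in Lean 4 (the theorem names are my own, chosen only for illustration): each statement is a type, and each proof is literally a little functional program of that type.

```lean
-- "A implies (B implies A)": the proof is the function that ignores its second argument.
theorem const_proof (A B : Prop) : A → B → A :=
  fun a _ => a

-- Modus ponens is function application.
theorem apply_proof (A B : Prop) : (A → B) → A → B :=
  fun f a => f a

-- A proof of a conjunction is a pair, so commutativity of "and" is swapping a pair.
theorem swap_proof (A B : Prop) : A ∧ B → B ∧ A :=
  fun ⟨a, b⟩ => ⟨b, a⟩
```

If any of the three function bodies had the wrong type, the checker would reject it in the same way a compiler rejects an ill-typed program, which is the point of the analogy.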

But we were talking about the Commodore 64. There was something amazing about a functional computer with an operating system fitting into a few kilobytes, and many people could understand it inside out. One could buy magazines that were in good part devoted to Basic programs that one could copy, typically video games. Naturally, one would then be able to change the game and to see what a reasonably non-trivial program would look like. The operating system I am using now has a source code that is probably millions of lines long, there is probably no person that has a complete understanding of it, and it sometimes does mysterious things. It is also able to handle more than one program running at a time. It was fun to turn on a computer, instantly get a prompt, type 10 PRINT "HELLO WORLD" and then RUN, while now one has to do this. Of course riding a bike is simpler than driving a car which is simpler than piloting an airplane, but they have different ranges.

Under the Curry-Howard isomorphism, programming in the modern sense is more like Algebraic Geometry. One has to spend a lot of time learning how to use an expansive set of libraries, and in one's lifetime it would be impossible to reconstruct how everything works from first principles, but then one has really powerful tools. I prefer the hands-on ethos of Combinatorics, where the big results are not general theorems, but rather principles, or ways of doing things, that one learns by reading other people's papers, and replicating their arguments to apply them to new settings, changing them as needed.

And before I get distracted once more away from what is nominally the subject of this post, happy birthday to the Commodore 64 and to whoever is turning 30 tomorrow.

Time to go back to California?

2007-12-02T14:06:00.000-08:00

Why Mathematics?

2007-11-28T16:10:00.000-08:00

Terry Tao points to a beautiful article written by Michael Harris for the Princeton Companion to Mathematics, titled Why Mathematics, You Might Ask.

The titular question is the point of departure for a fascinating discussion on the foundations of mathematics, on the philosophy of mathematics, on post-modernism, on the "anthropology" approach to social science studies of mathematics, and on what mathematicians think they are doing, and why.

In general, I find articles on philosophical issues in mathematics to be more readable and enlightening when written by mathematicians. Perhaps it's just that they lack the sophistication of the working philosopher, a sophistication which I mistake for unreadability. But I also find that mathematicians tend to bring up issues that matter more to me.

For example, the metaphysical discussions on the "reality" of mathematical objects and the "truth" of theorems are all well and good, but the really interesting questions seem to be different ones.

The formalist view of mathematics, for example, according to which mathematics is the derivation of theorems from axioms via formal proofs, or as Hilbert apparently put it, "a game played according to certain simple rules with meaningless marks on paper," does not begin to capture what mathematics is, just as "writing one sentence after another" does not capture what poetry is. (The analogy is due to Giancarlo Rota.) Indeed one of the main fallacies that follow by taking the extreme formalist position as anything more than a self-deprecating joke is to consider mathematical work as tautological. That is, to see a mathematical theorem as implicit in the axioms and so its proof as not a discovery. (Some of the comments in this thread relate to this point.) Plus, the view does not account for the difference between "recreational" mathematics and "real" mathematics, a difference that I don't find easy to explain in a few words, probably because I don't have a coherent view of what mathematics really is.

It's not quite related, but I am reminded of a conversation I had a long time ago with Professor X about faculty candidate Y.

[Not an actual transcript, but close enough]

X: so what do you think of theory candidate Y?
Me: he is not a theory candidate.
X: but his results have no conceivable application.
Me: there is more to doing theory than proving useless theorems.
X: that's interesting! Tell me more

I enjoyed Harris's suggestion that "ideas" are the basic units of mathematical work, and his semi-serious discussion of whether ideas "exist" and on their importance.

There are indeed a number of philosophical questions about mathematics that I think are extremely interesting and do not seem to figure prominently in the social studies of mathematics.

For example, and totally randomly:

When are two proofs essentially the same, and when are they genuinely different?
What makes a problem interesting? What is the role of connections in this determination?
What makes a theorem deep?
What does it mean when mathematicians say that a certain proof explains something, or when they say that it does not?

Terminology

2007-11-27T20:51:00.000-08:00

Different communities have different traditions for terminology. Mathematicians appropriate common words, like ring, field, scheme, ideal,... and the modern usage of the term bears no connection with the everyday meaning of the word. Physicists have a lot of fun with their sparticles and their strangeness and charm and so on. Theoretical computer scientists, like the military, and NASA, prefer acronyms.

We have some isolated examples of felicitous naming. Expander, for example, is great: it sounds right and it is suggestive of the technical meaning. Extractor is my favorite, combining a suggestion of the meaning with a vaguely threatening sound. I think it's too bad that seedless extractor has come to pass, because it evokes some kind of device to get grape juice. (I was on the losing side that supported deterministic extractor.)

Unfortunate namings are of course more common. Not only is the complexity class PP embarrassing to pronounce, but its name, derived from Probabilistic Polynomial time, is a poor description of it. By analogy with #P and $\oplus$P, it should be called MajP.

I heard the story of a famous (and famously argumentative) computer scientist complaining to one of the authors of the PCP theorem about the term PCP, which stands for Probabilistically Checkable Proof. "I too can define a probabilistic checker for SAT certificates," he supposedly said, "with probability half check the certificate, with probability half accept without looking at it." The point being that the terminology emphasizes a shortcoming of the construction (the probabilistic verification) instead of the revolutionary feature (the constant query complexity). Personally, I would prefer Locally Testable Proof.

Of course we will never change the name of PP or PCP, and the seedless extractors are here to stay, but there is one terminology change for which I'd like to start a campaign.

Naor and Naor constructed in 1990 a pseudorandom generator whose output is secure against linear tests. They called such a generator $\epsilon$-biased if the distinguishing probability of every linear test is at most $\epsilon$. Such generators have proved to be extremely useful in a variety of applications, most recently in the Bogdanov-Viola construction of pseudorandom generators against degree-2 polynomials.

Shall we start calling such generators $\epsilon$-unbiased? Seeing as it is the near lack of bias, rather than its presence, which is the defining feature of such generators?

(I know the reason for the Naor-Naor terminology: zero-bias generator makes perfect sense, while zero-unbiased makes no sense. But how about the fact that it is technically correct to say that the uniform distribution is $\frac {1}{10}$-biased?)
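
To make the definition concrete, here is a minimal Python sketch that computes the bias of a small distribution by brute force. It assumes one common normalization, in which the bias of a linear test $a$ is $|Pr[\langle a,x\rangle = 0] - Pr[\langle a,x\rangle = 1]|$; other conventions differ by a factor of two. The names `bias` and `is_eps_biased` are made up for this illustration.

```python
from itertools import product

def bias(dist, n):
    """Largest bias of `dist` over all nonzero linear tests on {0,1}^n.

    `dist` maps n-bit tuples to probabilities. The bias of a test a is
    |Pr[<a,x> = 0] - Pr[<a,x> = 1]|, with the inner product taken mod 2
    (an assumed normalization; others differ by a factor of 2).
    """
    worst = 0.0
    for a in product((0, 1), repeat=n):
        if not any(a):
            continue  # skip the trivial all-zero test
        signed = 0.0
        for x, p in dist.items():
            parity = sum(ai * xi for ai, xi in zip(a, x)) % 2
            signed += p if parity == 0 else -p
        worst = max(worst, abs(signed))
    return worst

def is_eps_biased(dist, n, eps):
    """True iff every nonzero linear test has bias at most eps."""
    return bias(dist, n) <= eps

n = 3
uniform = {x: 1 / 2 ** n for x in product((0, 1), repeat=n)}
point_mass = {(1, 0, 1): 1.0}  # a maximally biased toy distribution
```

Under this convention the uniform distribution has bias zero, so `is_eps_biased(uniform, n, 0.1)` holds, which is exactly the technically correct but odd-sounding statement that the uniform distribution is $\frac {1}{10}$-biased.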

[Update: earlier posts on the same topic here and here]

Impagliazzo Hard-Core Sets via "Finitary Ergodic-Theory"

2007-11-12T12:53:00.000-08:00

In the Impagliazzo hard-core set theorem we are given a function $g:\{ 0, 1 \}^n \rightarrow \{ 0,1\}$ such that every algorithm in a certain class makes errors at least a $\delta$ fraction of the times when given a random input. We think of $\delta$ as small, and so of $g$ as exhibiting a weak form of average-case complexity. We want to find a large set $H\subseteq \{ 0,1 \}^n$ such that $g$ is average-case hard in a stronger sense when restricted to $H$. This stronger form of average-case complexity will be that no efficient algorithm can make noticeably fewer errors while computing $g$ on $H$ than a trivial algorithm that always outputs the same value regardless of the input. The formal statement of what we are trying to do (see also the discussion in this previous post) is:

Impagliazzo Hard-Core Set Theorem, "Constructive Version"
Let $g:\{0,1\}^n \rightarrow \{0,1\}$ be a boolean function, $s$ be a size parameter, $\epsilon,\delta>0$ be given. Then there is a size parameter $s' = poly(1/\epsilon,1/\delta) \cdot s + exp(poly(1/\epsilon,1/\delta))$ such that the following happens.

Suppose that for every function $f:\{0,1\}^n \rightarrow \{0,1\}$ computable by a circuit of size $s'$ we have

$Pr_{x \in \{0,1\}^n} [ f(x) = g(x) ] \leq 1-\delta$

Then there is a set $H$ such that: (i) $H$ is recognizable by circuits of size $\leq s'$; (ii) $|H| \geq \delta 2^n$, and in fact the number of $x$ in $H$ such that $g(x)=0$ is at least $\frac 12 \delta 2^n$, and so is the number of $x$ in $H$ such that $g(x)=1$; and (iii) for every $f$ computable by a circuit of size $\leq s$,

$Pr_{x\in H} [ g(x) = f(x) ] \leq max \{ Pr_{x\in H}[ g(x) = 0] , Pr_{x\in H} [g(x)=1] \} + \epsilon$

Our approach will be to look for a "regular partition" of $\{0,1\}^n$. We shall construct a partition $P= (B_1,\ldots,B_m)$ of $\{0,1\}^n$ such that: (i) given $x$, we can efficiently compute what is the block $B_i$ that $x$ belongs to; (ii) the number $m$ of blocks does not depend on $n$; (iii) $g$ restricted to most blocks $B_i$ behaves like a random function of the same density. (By "density" of a function we mean the fraction of inputs on which the function evaluates to one.)

In particular, we will use the following form of (iii): for almost all the blocks $B_i$, no algorithm has advantage more than $\epsilon$ over a constant predictor in computing $g$ in $B_i$.

Let $M_0$ be the union of all majority-0 blocks (that is, of blocks $B_i$ such that $g$ takes the value 0 on a majority of elements of $B_i$) and let $M_1$ be the union of all majority-1 blocks.

I want to claim that no algorithm can do noticeably better on $M_0$ than the constant algorithm that always outputs 0. Indeed, we know that within (almost) all of the blocks that compose $M_0$ no algorithm can do noticeably better than the always-0 algorithm, so this must be true for a stronger reason for the union. The same is true for $M_1$, with reference to the constant algorithm that always outputs 1. Also, if the partition is efficiently computable, then (in a non-uniform setting) $M_0$ and $M_1$ are efficiently recognizable. It remains to argue that either $M_0$ or $M_1$ is large and not completely unbalanced.

Recalling that we are in a non-uniform setting (where by "algorithms" we mean "circuits") and that the partition is efficiently computable, the following is a well defined efficient algorithm for attempting to compute $g$:

Algorithm. Local Majority
On input $x$:
determine the block $B_i$ that $x$ belongs to;
output $1$ if $Pr_{z\in B_i} [g(z)=1] \geq \frac 12$;
otherwise output $0$

(The majority values of $g$ in the various blocks are just a set of $m$ bits that can be hard-wired into the circuit.)
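
As a toy illustration of the Local Majority algorithm, here is a brute-force Python sketch. It enumerates all of $\{0,1\}^n$, so it is only meant for tiny $n$; the target `g` and the partition `block_of` are hypothetical examples, not part of the construction.

```python
from itertools import product
from collections import defaultdict

def local_majority(g, block_of, n):
    """Return the Local Majority predictor for g relative to a partition.

    `block_of(x)` returns a hashable label of the block containing x.
    """
    ones = defaultdict(int)
    size = defaultdict(int)
    for x in product((0, 1), repeat=n):
        b = block_of(x)
        size[b] += 1
        ones[b] += g(x)
    # One majority bit per block (the "hard-wired" advice); ties go to 1,
    # matching the rule "output 1 if Pr[g(z)=1] >= 1/2".
    majority = {b: 1 if 2 * ones[b] >= size[b] else 0 for b in size}
    return lambda x: majority[block_of(x)]

n = 4
g = lambda x: 1 if sum(x) >= 3 else 0   # hypothetical toy target
block_of = lambda x: (x[0], x[1])       # hypothetical partition into 4 blocks
predictor = local_majority(g, block_of, n)

# The mistakes are exactly the "minority inputs" of the blocks.
errors = [x for x in product((0, 1), repeat=n) if predictor(x) != g(x)]
```

In this example `errors` consists of the minority inputs of three of the four blocks; the block with $x_1 = x_2 = 0$ is constant and contributes none.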

We assumed that every efficient algorithm must make at least a $\delta$ fraction of errors. The set of $\geq \delta 2^n$ inputs where the Local Majority algorithm makes mistakes is the union, over all blocks $B_i$, of the "minority inputs" of the block $B_i$. (If $b$ is the majority value of $g$ in a block $B$, then the "minority inputs" of $B$ are the set of inputs $x$ such that $g(x) = 1-b$.)

Let $E_0$ be the set of minority inputs (those where our algorithm makes a mistake) in $M_0$ and $E_1$ be the set of minority inputs in $M_1$. Then at least one of $E_0$ and $E_1$ must have size at least $\frac {\delta}{2} 2^n$, because the size of their union is at least $\delta 2^n$. If $E_b$ has size at least $\frac {\delta}{2} 2^n$, then $M_b$ has all the properties of the set $H$ we are looking for.

It remains to construct the partition. We describe an iterative process to construct it. We begin with the trivial partition $P = (B_1)$ where $B_1 = \{ 0,1\}^n$. At a generic step of the construction, we have a partition $P = (B_1,\ldots,B_m)$, and we consider $M_0, M_1,E_0,E_1$ as above. Let $b$ be such that $|E_b| \geq \frac 12 \delta 2^n$. If there is no algorithm that has noticeable advantage in computing $g$ over $M_b$, we are done. Otherwise, if there is such an algorithm $f$, we refine the partition by splitting each block according to the values that $f$ takes on the elements of the block.

After $k$ steps of this process, the partition has the following form: there are $k$ functions $f_1,\ldots,f_k$ and each of the (at most) $2^k$ blocks of the partition corresponds to a bit string $b_1,\ldots,b_k$ and it contains all inputs $x$ such that $f_1(x)=b_1,\ldots,f_k(x)=b_k$. In particular, the partition is efficiently computable.

We need to argue that this process terminates with $k=poly(1/\epsilon,1/\delta)$. To this end, we define a potential function that measures the "imbalance" of $g$ inside the blocks of the partition

$\Psi(B_1,\ldots,B_m) := \sum_{i=1}^m \frac {|B_i|}{2^n} \left( Pr_{x\in B_i} [g(x) = 1] \right)^2 $

and we can show that this potential function increases by at least $poly(\epsilon,\delta)$ at each step of the iteration. Since the potential function can be at most 1, the bound on the number of iterations follows.
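
The potential function and a single refinement step can be sketched in a few lines of Python. This is a brute-force toy under stated assumptions: `g` and the refining function `f` are hypothetical, and exact rational arithmetic is used so that the values of $\Psi$ can be compared exactly.

```python
from itertools import product
from collections import defaultdict
from fractions import Fraction

def potential(g, block_of, n):
    """Psi = sum_i (|B_i| / 2^n) * (Pr_{x in B_i}[g(x) = 1])^2, computed exactly."""
    ones = defaultdict(int)
    size = defaultdict(int)
    for x in product((0, 1), repeat=n):
        b = block_of(x)
        size[b] += 1
        ones[b] += g(x)
    return sum(Fraction(size[b], 2 ** n) * Fraction(ones[b], size[b]) ** 2
               for b in size)

def refine(block_of, f):
    """Split every block according to the value f takes on its elements."""
    return lambda x: (block_of(x), f(x))

n = 4
g = lambda x: x[0] & x[1]   # hypothetical toy target of density 1/4
trivial = lambda x: ()      # the one-block partition
f = lambda x: x[0]          # hypothetical function used for the refinement

psi0 = potential(g, trivial, n)             # (1/4)^2 = 1/16
psi1 = potential(g, refine(trivial, f), n)  # 1/2 * 0 + 1/2 * (1/2)^2 = 1/8
```

Refining never decreases $\Psi$, by convexity of $t \mapsto t^2$; the content of the argument is that a refinement by an $f$ with noticeable advantage increases it by a noticeable amount.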

A reader familiar with the proof of the Szemeredi Regularity Lemma will recognize the main ideas of iterative partitioning, of using a "counterexample" to the regularity property required of the final partition to do a refinement step, and of using a potential function argument to bound the number of refinement steps.

In which way can we see them as "finitary ergodic theoretic" techniques? As somebody who does not know anything about ergodic theory, I may not be in an ideal position to answer this question. But this kind of difficulty has not stopped me before, so I may attempt to answer this question in a future post.

The December Issue of the Notices of the AMS

2007-11-09T13:53:00.000-08:00

The December issue of the Notices of the AMS is now available online, and it includes letters written by Oded Goldreich, Boaz Barak, Jonathan Katz, and Hugo Krawczyk in response to Neal Koblitz's article which appeared in the September issue.

Despite this, the readers of the Notices remain the losers in this "controversy." Koblitz's petty personal attacks and straw man arguments appeared in the same space that is usually reserved, in the Notices, for expository articles and obituaries of mathematicians. It is from those pages that I learned about the Kakeya problem and about the life of Grothendieck (who, I should clarify, is not dead, except perhaps in Erdos' use of the word).

I find it strange enough that Koblitz would submit his piece to such a venue, but I find it just as mind-boggling that the editors would run it: it is as if they had commissioned Grothendieck's biographical article from a disgruntled ex-lover, who would focus most of the article on fabricated claims about his personal hygiene.

I can only hope that the editors will soon run on those pages one or more expository articles on modern cryptography, not as rebuttals to Koblitz's piece (which has already been discussed more than enough), but as a service to the readers.

And while I am on the subject of Notices articles, let me move on to this article on how to write papers.

All beginning graduate students find the process of doing research mystifying, and I do remember feeling that way. (Not that such feelings have changed much in the intervening years.) One begins with a sense of hopelessness, how am I going to solve a problem that people who know much more than I do and who are smarter than me have not been able to solve?; then a breakthrough comes, out of nowhere, and one wonders, how is this ever going to happen again? Finally it's time to write up the results, and mathematical proofs definitely don't write themselves, not to mention coherent and compelling introductory sections. I think it's great when more experienced scholars take time to write advice pieces that can help students navigate these difficulties. And the number of atrociously badly written papers in circulation suggests that such pieces are good not just for students, but for many other scholars as well.

But I find that advice on "how to publish," rather than "how to write well" (like advice on "how to get a job" rather than "how to do research") misses the point (I am thinking of one of the few times I thought Lance Fortnow gave bad advice). For this reason, I found the first section of the Notices article jarring, and the following line (even if it was meant as a joke) made me cringe

I have written more than 150 articles myself. (...) I have never written an article and then been unable to publish it.

I think that this calls for an Umeshism in response.

The Impagliazzo Hard-Core-Set Theorem

2007-11-06T13:45:00.001-08:00

The Impagliazzo hard-core set theorem is one of the bits of magic of complexity theory. Say you have a function $g:\{ 0, 1 \}^n \rightarrow \{ 0,1\}$ such that every efficient algorithm makes errors at least $1\%$ of the times when computing $g$ on a random input. (We'll think of $g$ as exhibiting a weak form of average-case complexity.) Clearly, different algorithms will fail on a different $1\%$ of the inputs, and it seems that, intuitively, there should be functions for which no particular input is harder than any particular other input, per se. It's just that whenever you try to come up with an algorithm, some set of mistakes, dependent on the algorithmic technique, will arise.

As a good example, think of the process of generating $g$ at random, by deciding for every input $x$ to set $g(x)=1$ with probability $99\%$ and $g(x)=0$ with probability $1\%$. (Make the choices independently for different inputs.) With very high probability, every efficient algorithm fails with probability at least about $1\%$, but, if we look at every efficiently recognizable large set $H$, we see that $g$ takes the value 1 on approximately $99\%$ of the elements of $H$, and so the trivial algorithm that always outputs 1 has a pretty good success probability.

Consider, however, the set $H$ of size $\frac {2}{100} 2^n$ that you get by taking the $\approx \frac{1}{100} 2^n$ inputs $x$ such that $g(x)=0$ plus a random sample of $\frac{1}{100} 2^n$ inputs $x$ such that $g(x)=1$. Then we can see that no efficient algorithm can compute $g$ on much better than $50\%$ of the inputs of $H$. This is the highest form of average-case complexity for a boolean function: on such a set $H$ no algorithm does much better in computing $g$ than an algorithm that makes a random guess.

The Impagliazzo hard-core theorem states that it is always possible to find such a set $H$ where the average-case hardness is "concentrated." Specifically, it states that if every efficient algorithm fails to compute $g$ on a $\geq \delta$ fraction of inputs, then there is a set $H$ of size $\geq \delta 2^n$ such that every efficient algorithm fails to compute $g$ on at least a $\frac 12 - \epsilon$ fraction of the elements of $H$. This is true for every $\epsilon,\delta$, and if "efficient" is quantified as "circuits of size $s$" in the premise, then "efficient" is quantified as "circuits of size $poly(\epsilon,\delta) \cdot s$" in the conclusion.

The example of the biased random function given above implies that, if one wants to prove the theorem for arbitrary $g$, then the set $H$ cannot be efficiently computable itself. (The example does not forbid, however, that $H$ be efficiently computable given oracle access to $g$, or that a random element of $H$ be samplable given a sampler for the distribution $(x,g(x))$ for uniform $x$.)

A number of proofs of the hard core theorem are known, and connections have been found with the process of boosting in learning theory and with the construction and the decoding of certain error-correcting codes. Here is a precise statement.

Impagliazzo Hard-Core Set Theorem
Let $g:\{0,1\}^n \rightarrow \{0,1\}$ be a boolean function, $s$ be a size parameter, $\epsilon,\delta>0$ be given. Then there is a $c(\epsilon,\delta) = poly(1/\epsilon,1/\delta)$ such that the following happens.

Suppose that for every function $f:\{0,1\}^n \rightarrow \{0,1\}$ computable by a circuit of size $\leq c\cdot s$ we have

$Pr_{x \in \{0,1\}^n} [ f(x) = g(x) ] \leq 1-\delta$

Then there is a set $H$ of size $\geq \delta 2^n$ such that for every function $f$ computable by a circuit of size $\leq s$ we have

$Pr_{x\in H} [ f(x) = g(x) ] \leq \frac 12 + \epsilon$

Using the "finitary ergodic theoretic" approach of iterative partitioning, we (Omer Reingold, Madhur Tulsiani, Salil Vadhan and I) are able to prove the following variant.

Impagliazzo Hard-Core Set Theorem, "Constructive Version"
Let $g:\{0,1\}^n \rightarrow \{0,1\}$ be a boolean function, $s$ be a size parameter, $\epsilon,\delta>0$ be given. Then there is a $c(\epsilon,\delta) = exp(poly(1/\epsilon,1/\delta))$ such that the following happens.

Suppose that for every function $f:\{0,1\}^n \rightarrow \{0,1\}$ computable by a circuit of size $\leq c\cdot s$ we have

$Pr_{x \in \{0,1\}^n} [ f(x) = g(x) ] \leq 1-\delta$

Then there is a set $H$ such that: (i) $H$ is recognizable by circuits of size $\leq c\cdot s$; (ii) $|H| \geq \delta 2^n$, and in fact the number of $x$ in $H$ such that $g(x)=0$ is at least $\frac 12 \delta 2^n$, and so is the number of $x$ in $H$ such that $g(x)=1$; and (iii) for every $f$ computable by a circuit of size $\leq s$,

$Pr_{x\in H} [ g(x) = f(x) ] \leq max \{ Pr_{x\in H}[ g(x) = 0] , Pr_{x\in H} [g(x)=1] \} + \epsilon$

The difference is that $H$ is now an efficiently recognizable set (which is good), but we are not able to derive the same strong average-case complexity of $g$ in $H$ (which, as discussed at the beginning, is impossible in general). Instead of proving that a "random guess algorithm" is near-optimal on $H$, we prove that a "fixed answer algorithm" is near-optimal on $H$. That is, instead of saying that no algorithm can do better than a random guess, we say that no algorithm can do better than either always outputting 0 or always outputting 1. Note that this conclusion is meaningless if $g$ is, say, always equal to 1 on $H$, but in our construction we have that $g$ is not exceedingly biased on $H$, and if $\epsilon < \delta/2$, say, then the conclusion is quite non-trivial.

One can also find a set $H'$ with the same type of average-case complexity as in the original Impagliazzo result by putting into $H'$ a $\frac 12 \delta 2^n$ size sample of elements $x$ of $H$ such that $g(x)=0$ and an equal size sample of elements of $H$ such that $g$ equals 1. (Alternatively, put in $H'$ all the elements of $H$ on which $g$ achieves the minority value of $g$ in $H$, then add a random sample of as many elements achieving the majority value.) Then we recover the original statement except that $c(\epsilon,\delta)$ is exponential instead of polynomial.
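
The alternative construction in the parenthetical above is easy to sketch in Python. The set `H` and the function `g` below are hypothetical stand-ins, and `balanced_subset` is a name invented for this illustration.

```python
import random

def balanced_subset(H, g, rng):
    """All minority-value elements of H plus an equal-size random sample
    of the elements on which g takes the majority value."""
    zeros = [x for x in H if g(x) == 0]
    ones = [x for x in H if g(x) == 1]
    minority, majority = (zeros, ones) if len(zeros) <= len(ones) else (ones, zeros)
    return minority + rng.sample(majority, len(minority))

H = list(range(20))                    # hypothetical stand-in for the set H
g = lambda x: 0 if x % 4 == 0 else 1   # takes the value 0 on 5 of the 20 elements
H_prime = balanced_subset(H, g, random.Random(0))
```

By construction $g$ is perfectly balanced on `H_prime`, so the best fixed-answer algorithm succeeds on exactly half of it, which is the success rate of a random guess.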

Coming up next, the proof of the "constructive hard core set theorem" and my attempt at explaining what the techniques have to do with "finitary ergodic theory."

Harder, Better, Faster, Stronger

2007-11-03T14:58:00.000-07:00

An amazing video to Daft Punk's Harder, Better, Faster, Stronger

Don't be discouraged by the slow first minute; it does get better, faster, and harder.

Doing the same with a different Daft Punk song, however, can be less impressive.

The "Complexity Theory" Proof of a Theorem of Green-Tao-Ziegler

2007-11-01T10:51:00.000-07:00

We want to prove that a dense subset of a pseudorandom set is indistinguishable from a truly dense set.

Here is an example of what this implies: take a pseudorandom generator of output length $n$, choose in an arbitrary way a 1% fraction of the possible seeds of the generator, and run the generator on a random seed from this restricted set; then the output of the generator is indistinguishable from being a random element of a set of size $\frac 1 {100} \cdot 2^n$.

(Technically, the theorem states the existence of a distribution of min-entropy $n - \log_2 100$, but one can also get the above statement by standard "rounding" techniques.)

As a slightly more general example, if you have a generator $G$ mapping a length-$t$ seed into an output of length $n$, and $Z$ is a distribution of seeds of min-entropy at least $t-d$, then $G(Z)$ is indistinguishable from a distribution of min-entropy $n-d$. (This, however, works only if $d = O(\log n)$.)

It's time to give a formal statement. Recall that we say that a distribution $D$ is $\delta$-dense in a distribution $R$ if

$\forall x. Pr[R=x] \geq \delta \cdot Pr [D=x]$

(Of course I should say "random variable" instead of "distribution," or write things differently, but we are between friends here.)

We want to say that if $F$ is a class of tests, $R$ is pseudorandom according to a moderately larger class $F'$, and $D$ is $\delta$-dense in $R$, then there is a distribution $M$ that is indistinguishable from $D$ according to $F$ and that is $\delta$-dense in the uniform distribution.

The Green-Tao-Ziegler proof of this result becomes slightly easier in our setting of interest (where $F$ contains boolean functions) and gives the following statement:

Theorem (Green-Tao-Ziegler, Boolean Case)
Let $\Sigma$ be a finite set, $F$ be a class of functions $f:\Sigma \to \{0,1\}$, $R$ be a distribution over $\Sigma$, $D$ be a $\delta$-dense distribution in $R$, $\epsilon>0$ be given.

Suppose that for every $M$ that is $\delta$-dense in $U_\Sigma$ there is an $f\in F$ such that
$| Pr[f(D)=1] - Pr[f(M) = 1] | >\epsilon$

Then there is a function $h:\Sigma \rightarrow \{0,1\}$ of the form $h(x) = g(f_1(x),\ldots,f_k(x))$ where $k = poly(1/\epsilon,1/\delta)$ and $f_i \in F$ such that
$| Pr [h(R)=1] - Pr [ h(U_\Sigma) =1] | > poly(\epsilon,\delta)$

Readers should take a moment to convince themselves that the above statement is indeed saying that if $R$ is pseudorandom then $D$ has a model $M$, by equivalently saying that if no model $M$ exists then $R$ is not pseudorandom.

The problem with the above statement is that $g$ can be arbitrary and, in particular, it can have circuit complexity exponential in $k$, and hence in $1/\epsilon$.

In our proof, instead, $g$ is a linear threshold function, realizable by an $O(k)$ size circuit. Another improvement is that $k=poly(1/\epsilon,\log 1/\delta)$.

Here is the proof by Omer Reingold, Madhur Tulsiani, Salil Vadhan, and me. Assume $F$ is closed under complement (otherwise work with the closure of $F$); then the assumption of the theorem can be restated without absolute values:

for every $M$ that is $\delta$-dense in $U_\Sigma$ there is an $f\in F$ such that
$Pr[f(D)=1] - Pr[f(M) = 1] >\epsilon$

We begin by finding a "universal distinguisher."

Claim
There is a function $\bar f:\Sigma \rightarrow [0,1]$ which is a convex combination of functions from $F$ and such that for every $M$ that is $\delta$-dense in $U_\Sigma$,
$E[\bar f(D)] - E[\bar f(M)] >\epsilon$

This can be proved via the min-max theorem for two-player games, or, equivalently, via duality of linear programming, or, like an analyst would say, via the Hahn-Banach theorem.

Let now $S$ be the set of $\delta |\Sigma|$ elements of $\Sigma$ where $\bar f$ is largest. We must have
(1) $E[\bar f(D)] - E[\bar f(U_S)] >\epsilon$
which implies that there must be a threshold $t$ such that
(2) $Pr[\bar f(D)\geq t] - Pr[\bar f(U_S) \geq t] >\epsilon$
So we have found a boolean distinguisher between $D$ and $U_S$. Next, we claim that the same distinguisher works between $R$ and $U_\Sigma$.

By the density assumption, we have
$Pr[\bar f(R)\geq t] \geq \delta \cdot Pr[\bar f(D) \geq t]$

and since $S$ contains exactly a $\delta$ fraction of $\Sigma$, and since the condition $\bar f(x) \geq t$ always fails outside of $S$ (why?), we then have
$Pr[\bar f(U_\Sigma)\geq t] = \delta \cdot Pr[\bar f(U_S) \geq t]$
and so
(3) $Pr[\bar f(R)\geq t] - Pr[\bar f(U_\Sigma) \geq t] >\delta \epsilon $
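
Spelled out, (3) follows by subtracting the equality for $U_\Sigma$ from the inequality for $R$ and then applying (2):

$Pr[\bar f(R)\geq t] - Pr[\bar f(U_\Sigma) \geq t] \geq \delta \cdot Pr[\bar f(D) \geq t] - \delta \cdot Pr[\bar f(U_S) \geq t] = \delta \cdot \left( Pr[\bar f(D) \geq t] - Pr[\bar f(U_S) \geq t] \right) > \delta \epsilon$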

Now, it's not clear what the complexity of $\bar f$ is: it could be a convex combination involving all the functions in $F$. However, by Chernoff bounds, there must be functions $f_1,\ldots,f_k$ with $k=poly(1/\epsilon,\log 1/\delta)$ such that $\bar f(x)$ is well approximated by $\sum_i f_i(x) / k$ for all $x$ but for an exceptional set having density less than, say, $\delta\epsilon/10$, according to both $R$ and $U_\Sigma$.

Now $R$ and $U_\Sigma$ are distinguished by the predicate $\sum_{i=1}^k f_i(x) \geq tk$, which is just a linear threshold function applied to a small set of functions from $F$, as promised.

Actually I have skipped an important step: outside of the exceptional set, $\sum_i f_i(x)/k$ is going to be close to $\bar f(x)$ but not identical, and this could lead to problems. For example, in (3) $\bar f(R)$ might typically be larger than $t$ only by a tiny amount, and $\sum_i f_i(x)/k$ might consistently underestimate $\bar f$ in $R$. If so, $Pr [ \sum_{i=1}^k f_i(R) \geq tk ]$ could be a completely different quantity from $Pr [\bar f(R)\geq t]$.

To remedy this problem, we note that, from (1), we can also derive the more "robust" distinguishing statement
(2') $Pr[\bar f(D)\geq t+\epsilon/2] - Pr[\bar f(U_S) \geq t] >\epsilon/2$
from which we get
(3') $Pr[\bar f(R)\geq t+\epsilon/2] - Pr[\bar f(U_\Sigma) \geq t] >\delta \epsilon/2 $

And now we can be confident that even replacing $\bar f$ with an approximation we still get a distinguisher.

The statement needed in number-theoretic applications is stronger in a couple of ways. One is that we would like $F$ to contain bounded functions $f:\Sigma \rightarrow [0,1]$ rather than boolean-valued functions. Looking back at our proof, this makes no difference. The other is that we would like $h(x)$ to be a function of the form $h(x) = \Pi_{i=1}^k f_i(x)$ rather than a general composition of functions $f_i$. This we can achieve by approximating a threshold function by a polynomial of degree $poly(1/\epsilon,1/\delta)$ using the Weierstrass theorem, and then choosing the most distinguishing monomial. This gives a proof of the following statement, which is equivalent to Theorem 7.1 in the Tao-Ziegler paper.

Theorem (Green-Tao-Ziegler, General Case)
Let $\Sigma$ be a finite set, $F$ be a class of functions $f:\Sigma \to [0,1]$, $R$ be a distribution over $\Sigma$, $D$ be a $\delta$-dense distribution in $R$, $\epsilon>0$ be given.

Suppose that for every $M$ that is $\delta$-dense in $U_\Sigma$ there is an $f\in F$ such that
$| Pr[f(D)=1] - Pr[f(M) = 1] | >\epsilon$

Then there is a function $h:\Sigma \rightarrow \{0,1\}$ of the form $h(x) = \Pi_{i=1}^k f_i(x)$ where $k = poly(1/\epsilon,1/\delta)$ and $f_i \in F$ such that
$| Pr [h(R)=1] - Pr [ h(U_\Sigma) =1] | > exp(-poly(1/\epsilon,1/\delta))$

In this case, we too lose an exponential factor. Our proof, however, has some interest even in the number-theoretic setting because it is somewhat simpler than and genuinely different from the original one.

Dense Subsets of Pseudorandom Sets

2007-10-30T18:39:00.000-07:00

The Green-Tao theorem states that the primes contain arbitrarily long arithmetic progressions; its proof can be, somewhat inaccurately, broken up into the following two steps:

Thm1: Every constant-density subset of a pseudorandom set of integers contains arbitrarily long arithmetic progressions.

Thm2: The primes have constant density inside a pseudorandom set.

Of those, the main contribution of the paper is the first theorem, a "relative" version of Szemeredi's theorem. In turn, its proof can be (even more inaccurately) broken up as

Thm 1.1: For every constant density subset D of a pseudorandom set there is a "model" set M that has constant density among the integers and is indistinguishable from D.

Thm 1.2 (Szemeredi) Every constant density subset of the integers contains arbitrarily long arithmetic progressions, and many of them.

Thm 1.3 A set with many long arithmetic progressions cannot be indistinguishable from a set with none.

Following this scheme is, of course, easier said than done. One wants to work with a definition of pseudorandomness that is weak enough that (2) is provable, but strong enough that the notion of indistinguishability implied by (1.1) is in turn strong enough that (1.3) holds. From now on I will focus on (1.1), which is a key step in the proof, though not the hardest.

Recently, Tao and Ziegler proved that the primes contain arbitrarily long "polynomial progressions" (progressions where the increments are given by polynomials rather than linear functions, as in the case of arithmetic progressions). Their paper contains a very clean formulation of (1.1), which I will now (accurately, this time) describe. (It is Theorem 7.1 in the paper. The language I use below is very different but equivalent.)

We fix a finite universe $\Sigma$; this could be $\{ 0,1\}^n$ in complexity-theoretic applications or $Z/NZ$ in number-theoretic applications. Instead of working with subsets of $\Sigma$, it will be more convenient to refer to probability distributions over $\Sigma$; if $S$ is a set, then $U_S$ is the uniform distribution over $S$. We also fix a family $F$ of "easy" functions $f: \Sigma \rightarrow [0,1]$. In complexity-theoretic applications, this could be the set of boolean functions computed by circuits of bounded size. We think of two distributions $X,Y$ as being $\epsilon$-indistinguishable according to $F$ if for every function $f\in F$ we have

$| E [f(X)] - E[f(Y)] | \leq \epsilon$

and we think of a distribution as pseudorandom if it is indistinguishable from the uniform distribution $U_\Sigma$. (This is all standard in cryptography and complexity theory.)
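
To make the definition concrete, here is a minimal Python sketch on a toy universe. The universe $Z/8Z$, the two tests (parity and "top half"), and all function names are assumptions chosen purely for illustration; nothing here comes from the Tao-Ziegler paper.

```python
from fractions import Fraction

def uniform(support):
    """Uniform distribution U_S over a finite set S, as {outcome: probability}."""
    support = list(support)
    return {x: Fraction(1, len(support)) for x in support}

def expectation(f, dist):
    """E[f(X)] when X is distributed according to dist."""
    return sum(p * f(x) for x, p in dist.items())

def advantage(tests, dist_x, dist_y):
    """Largest |E[f(X)] - E[f(Y)]| over all tests f in the family."""
    return max(abs(expectation(f, dist_x) - expectation(f, dist_y))
               for f in tests)

# Hypothetical toy setting: universe Z/8Z and a family F with two tests.
universe = range(8)
tests = [
    lambda x: x % 2,              # parity of x
    lambda x: 1 if x >= 4 else 0  # is x in the top half?
]

u_sigma = uniform(universe)
balanced = uniform([0, 3, 5, 6])  # half odd, half in the top half
evens = uniform([0, 2, 4, 6])     # fails the parity test badly

print(advantage(tests, balanced, u_sigma))  # exact value, as a Fraction
print(advantage(tests, evens, u_sigma))
```

In this toy family, `balanced` is $0$-indistinguishable from $U_\Sigma$ (so it is pseudorandom according to these two tests) even though its support is only half the universe, while `evens` is distinguished with advantage $1/2$ by the parity test.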

Now let's define the natural analog of "dense subset" for distributions. We say that a distribution $A$ is $\delta$-dense in $B$ if for every $x\in \Sigma$ we have

$Pr [ B=x] \geq \delta Pr [A=x]$

Note that if $B=U_T$ and $A=U_S$ for some sets $S,T$, then $A$ is $\delta$-dense in $B$ if and only if $S\subseteq T$ and $|S| \geq \delta |T|$.
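
The remark about uniform distributions can be checked by hand on small examples. The following Python sketch does so; the sets and the helper names `uniform` and `is_dense` are hypothetical and only serve the illustration.

```python
from fractions import Fraction

def uniform(support):
    """Uniform distribution U_S over a finite set S, as {outcome: probability}."""
    support = list(support)
    return {x: Fraction(1, len(support)) for x in support}

def is_dense(a, b, delta):
    """True if A is delta-dense in B, i.e. Pr[B=x] >= delta * Pr[A=x] for all x.

    Only points in the support of A need checking: elsewhere Pr[A=x] = 0
    and the inequality holds trivially.
    """
    return all(b.get(x, 0) >= delta * p for x, p in a.items())

t = {0, 1, 2, 3, 4, 5}
s = {0, 1, 2}          # S is a subset of T with |S| = |T| / 2
s_outside = {0, 1, 7}  # same size as S, but not a subset of T

half = Fraction(1, 2)
print(is_dense(uniform(s), uniform(t), half))            # |S| >= |T|/2
print(is_dense(uniform(s), uniform(t), Fraction(2, 3)))  # |S| <  2|T|/3
print(is_dense(uniform(s_outside), uniform(t), half))    # S not inside T
```

The first check succeeds and the other two fail, matching the two conditions $S\subseteq T$ and $|S| \geq \delta |T|$.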

So we want to prove the following:

Theorem (Green, Tao, Ziegler)
Fix a family $F$ of tests and an $\epsilon>0$; then there is a "slightly larger" family $F'$ and an $\epsilon'>0$ such that if $R$ is an $\epsilon'$-pseudorandom distribution according to $F'$ and $D$ is $\delta$-dense in $R$, then there is a distribution $M$ that is $\delta$-dense in $U_\Sigma$ and that is $\epsilon$-indistinguishable from $D$ according to $F$.

[The reader may want to go back to (1.1) and check that this is a meaningful formalization of it, up to working with arbitrary distributions rather than sets. This is in fact the "inaccuracy" that I referred to above.]

In a complexity-theoretic setting, we would like to say that if $F$ is defined as all functions computable by circuits of size at most $s$, then $\epsilon'$ should be $poly (\epsilon,\delta)$ and $F'$ should contain only functions computable by circuits of size $s\cdot poly(1/\epsilon,1/\delta)$. Unfortunately, if one follows the proof and makes some simplifications assuming $F$ contains only boolean functions, one sees that $F'$ contains functions of the form $g(x) = h(f_1(x),\ldots,f_k(x))$, where $f_i \in F$, $k = poly(1/\epsilon,1/\delta)$, and $h$ could be arbitrary and, in general, have circuit complexity exponential in $1/\epsilon$ and $1/\delta$. Alternatively, one may approximate $h()$ as a low-degree polynomial and take the "most distinguishing monomial." This will give a version of the Theorem (which leads to the actual statement of Thm 7.1 in the Tao-Ziegler paper) where $F'$ contains only functions of the form $\prod_{i=1}^k f_i(x)$, but then $\epsilon'$ will be exponentially small in $1/\epsilon$ and $1/\delta$. This means that one cannot apply the theorem to "cryptographically strong" notions of pseudorandomness and indistinguishability, and in general to any setting where $1/\epsilon$ and $1/\delta$ are super-logarithmic (not to mention super-linear).
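
The two shapes of tests in $F'$ can be sketched in Python as follows. This is only an illustration under assumed names: the three bit-extracting tests and the choice of majority for $h$ are hypothetical. Storing an arbitrary $h$ as an explicit truth table takes $2^k$ entries, which gives a feel for where the exponential cost comes from (it is not a proof of a circuit lower bound).

```python
from itertools import product

def compose(h_table, tests):
    """g(x) = h(f_1(x), ..., f_k(x)), with h given as a truth table on {0,1}^k."""
    return lambda x: h_table[tuple(f(x) for f in tests)]

def product_test(tests):
    """g(x) = f_1(x) * ... * f_k(x), the product form of the tests."""
    def g(x):
        result = 1
        for f in tests:
            result *= f(x)
        return result
    return g

# Hypothetical boolean tests on the toy universe {0, ..., 7}: the three bits of x.
tests = [lambda x: x % 2, lambda x: (x >> 1) % 2, lambda x: (x >> 2) % 2]
k = len(tests)

# An arbitrary h is described by one output bit per point of {0,1}^k.
# Majority is used here only as an example of such an h.
h_table = {bits: int(2 * sum(bits) > k) for bits in product((0, 1), repeat=k)}

g_general = compose(h_table, tests)
g_product = product_test(tests)

print(len(h_table))                # number of truth-table entries, 2**k
print([g_general(x) for x in range(8)])
print([g_product(x) for x in range(8)])
```

The product form needs no table at all, which is why it is cheap to compute; the price, as described above, is paid in $\epsilon'$ instead.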

This seems like an unavoidable consequence of the "finitary ergodic theoretic" technique of iterative partitioning and energy increment used in the proof, which always yields at least a singly exponential complexity.

Omer Reingold, Madhur Tulsiani, Salil Vadhan and I have recently come up with a different proof where both $\epsilon'$ and the complexity of $F'$ are polynomial. This gives, for example, a new characterization of the notion of pseudoentropy. Our proof is quite in the spirit of Nisan's proof of Impagliazzo's hard-core set theorem, and it is relatively simple. We can also deduce a version of the theorem where, as in Green-Tao-Ziegler, $F'$ contains only bounded products of functions in $F$. In doing so, however, we too incur an exponential loss, but the proof is somewhat simpler and demonstrates the applicability of complexity-theoretic techniques in arithmetic combinatorics.

Since we can use (ideas from) a proof of the hard-core set theorem to prove the Green-Tao-Ziegler result, one may wonder whether one can use the "finitary ergodic theory" techniques of iterative partitioning and energy increment to prove the hard-core set theorem. Indeed, we do this too. In our proof, the reduction loses a factor that is exponential in certain parameters (while other proofs are polynomial), but one also gets a more "constructive" result.

If readers can stomach it, a forthcoming post will describe the complexity-theory-style proof of the Green-Tao-Ziegler result as well as the ergodic-theory-style proof of the Impagliazzo hard core set theorem.

Discovering the Cyber-Transformations

2007-10-25T10:40:00.000-07:00

If memory serves me well, I have attended all STOC and FOCS conferences since STOC 1997 in El Paso, except STOC 2002 in Montreal (for visa problems), which should add up to 21 conferences. In most of those conferences I have also attended the "business meeting." This is a time when attendees convene after dinner, have beer, the local organizers talk about their local organization, the program committee chair talks about how they put the program together ("papers were submitted, then we reviewed them, finally we accepted some of those. Let me show you twenty slides of meaningless statistics about said papers"), organizers of future conferences talk about their ongoing organizing, David Johnson raises issues to be discussed, and so on. The SODA drinking game gives a good idea of what goes on.

A fixture of business meetings is also a presentation of the state of National Science Foundation (NSF) funding for theory in the US. In the first several conferences I attended, the NSF program director for theory would take the podium, show a series of incomprehensible slides, and go something like "there is no money; you should submit a lot of grant applications; I will reject all applications because there is no money, but low acceptance rates could bring us more money in future years; you should apply to non-theory programs, because there is no money in theory, but don't make it clear you are doing theory, otherwise they'll send your proposal to me, and I have no money. In conclusion, I have no money and we are all doomed."

Things hit rock bottom around 2004, when several issues (DARPA abandoning basic research, the end of the NSF ITR program, a general tightening of the NSF budget at a time of increased student tuition, a change in the NSF accounting system requiring multi-year grants to be funded entirely from the budget of the year of the award, ...) conspired to create a disastrous funding season. At that point several people in the community, with Sanjeev Arora playing a leading role, realized that something had to be done to turn things around. A SIGACT committee was formed to understand what had gone wrong and how to repair it.

I don't know if it is an accurate way of putting it, but my understanding is that our community had done a very bad job in publicizing its results to a broader audience. Indeed I remember, in my job interviews, a conversation that went like "What do you do?" "Complexity theory" "Structural complexity or descriptive complexity?" "??". (I also got a "What complexity classes do you study?") And I understand that whenever people from the SIGACT committee went to talk to NSF higher-ups about theory, everybody was interested and the attitude was almost "why haven't you told us about this stuff before?"

For various reasons, it is easier at NSF to put funding into a new initiative than to increase funding of an existing one, and an idea that came up early on was to fund an initiative on "theory as a lens for the sciences," to explore work in economics, quantum mechanics, biology, statistical physics, etc., where the conceptual tools of theoretical computer science are useful to even phrase the right questions, as well as to work towards their solution. This idea took on a life of its own, grew much more broad than initially envisioned (so that the lens thing is now a small part of it), received an appropriately cringe-inducing name, and is now the Cyber-Enabled Discovery and Innovation (CDI) program, which will soon accept its first round of submissions.

Thanks to the work that Bill Steiger put in as program director in the last year and a half, and to the efforts of the SIGACT committee, the outlook for theory funding is now much more optimistic.

At the FOCS 2007 business meeting last Monday, Bill talked about the increase in funding that happened under his watch, and Sanjeev Arora talked about the work of the committee and the new funding opportunities (of which CDI is only one). In addition, as happened a few times in the last couple of years, Mike Foster from NSF gave his own, generally theory-friendly, presentation. Mike is a mid-level director at NSF (one or two levels above the theory program), and the regular presence of people in his position at STOC and FOCS is, I think, without precedent before 2005. (Or at least between 1997 and 2004.)

The NSF is relatively lean, efficient and competent for being a federal bureaucracy, but it is still a federal bureaucracy, with its quirks.

A few years ago, it introduced a much-loathed requirement to explicitly state the "broader impact" of any proposed grant. I actually don't mind this requirement: it does not ask you to talk about "applications," but rather about all the important research work that is not just establishing technical results. This includes disseminating results (for example, writing notes, expository work, and surveys, and making them available), bringing research-level material to freshmen in a new format, doing outreach, doing something to increase the representation of women and minorities, and so on.

As reported by Sanjeev Arora in his presentation, however, NSF is now requiring proposers to state how the research in a given proposal is "transformative." (I just got a spelling warning after typing it.) I am not sure this makes any sense. The person sitting next to me commented, "Oh no, the goal of my research is always to maintain the status quo."

The Next Viral Videos

2007-10-25T10:30:00.000-07:00

Back in August, Boaz Barak and Moses Charikar organized a two-day course on additive combinatorics for computer scientists in Princeton. Boaz and Avi Wigderson spoke on sum-product theorems and their applications, and I spoke on techniques in the proofs of Szemeredi's theorem and their applications. As an Australian model might say, that's interesting!

Videos of the talks are now online. The quality of the audio and video is quite good; you'll have to decide for yourself on the quality of the lectures. The schedule of the event was grueling, and in my last two lectures (on Gowers uniformity and applications) I am not very lucid. In earlier lectures, however, I am merely sleep deprived -- I can be seen falling asleep in front of the board a few times. Boaz's and Avi's lectures, however, are flawless.