Wednesday, October 25, 2006

The hard science of Sociology

On Monday, the first talk I managed to attend was by Terry Sejnovski, of the Salk Institute, and I only got there towards the end of the talk. Starting from the question of how the brain manages to "do" vision, he talked about fascinating experiments that, among other things, test the rationality of conscious versus unconscious decisions (apparently, in complex decision-making involving lots of variables, "gut feelings" are better than conscious reasoning) and measure the surprisingly good "computational" ability of monkeys.

In the afternoon, Valiant delivered his talk on "accidental algorithms" to a standing-room-only packed audience. His main results are (1) a $\oplus P$-completeness result for $\oplus$-2SAT, even when restricted to planar instances, and (2) a polynomial time algorithm for $mod_7$-2SAT on planar instances. $\oplus P$ is the complexity class of problems that reduce to the question of whether a given non-deterministic Turing machine, on a given input, has an odd number of accepting computations. By the Valiant-Vazirani result, a $\oplus P$-complete problem is also NP-hard under randomized reductions, and, by Toda's theorem, is actually hard for all the levels of the polynomial hierarchy. The $\oplus$-2SAT problem is to determine whether a given 2SAT instance has an odd number of satisfying assignments. Apparently, this is the first natural $\oplus P$-complete problem for which there is a polynomial time algorithm that checks if the number of solutions is zero or non-zero. The $mod_7$-2SAT problem is, given a 2SAT formula, to determine if the number of assignment is or is not a multiple of 7. Valiant's algorithm is based on his earlier work on matchgate computations.

Afterwards, most people moved to Uri Feige's talk in the next room on finding witnesses of unsatisfiability for random 3CNF formulas. Previously, it was known how to find, with high probability, witnesses of unstatisfiability for a given random 3CNF formula with $n$ variables and about $n^{1.5}$ clauses. This new work by Feige, Kim and Ofek shows that witnesses of unsatisfiability exist for most random 3CNF formulas with $n$ variables and about $n^{1.4}$ clauses. (They do not give an efficient algorithm to find such witnesses.)

Nicholas Harvey gave the last talk of the day on his work on reducing the matching problem to matrix multiplication. Previous work had shown that matching can be solved in the same time as matrix multiplication, but the previous reduction was quite complex for general graphs. (And simpler for bipartite graphs.) Harvey gives a considerably more transparent new algorithm that reduces matching in general graphs to matrix multiplication. His talk was very clear and enjoyable, and I am grateful to Moses Charikar for recommending it. (Because of the word "matroid" in the title, I was going to skip to the talk.)

On Tuesday, Jon Kleinberg gave a spectacular invited talk on the kind of radically new results on sociology that one can extract from new datasets (resulting from people using large online retailers, joining online communities such as livejournal and myspace, and so on) and from algorithmic thinking. I did not take notes, and I could not find the slides online, so I can only offer a few snippets that I remember. In one of Jon's most celebrated results, he studied an idealized model of "social networks" made of a mix of "local" connections and of random "long-distance" connections, and he showed that, in the resulting network, people would be able to find short paths among themselves only if the long-distance connections have a probability that is proportional the inverse of the square of the distance between two people. In the model, the local connections are laid out according to a grid, but the same result holds in a more general model where the local connections define a metric space, and the random connections are such that the probability that $i$ and $j$ are connected is proportional to the inverse of the number of people that are closer to $i$ than $j$ is. A study made on livejournal data shows that the "friends" that people make online are distributed in a way that is in spooky agreement with the above distribution.

Why do people make friends with precisely the only distribution that makes it easy to find short paths between themselves? The answer is unclear, but the outstanding thing is that a "law of human nature" seems to have been discovered, which could have never been formulated without thinking in terms of algorithms. Presumably, any explanation of this phenomenon will have to involve an algorithmic model of how people make connections online.

Another interesting question is the effect of "peer pressure" or community involvment. We are more likely to do something if our friends also do it (forgetting for a moment the question of what is the cause and what is the effect, that is whether we make friends with like-minded people, or whether we are influenced by what others do), but what exactly is the dependency between how many friends do something and how likely we are to do the same? Is there a "diminishing returns" curve, the way we are impressed by the first few friends who buy an ipod, and hence more and more likely to buy one ourselves, and then less and less impressed? Or is it a "critical mass" curve, the way skype is useless unless a lot of friends of ours are using it? There has been amusing work studying this question by looking at communities on livejournal and conferences on dblp. One can see how likely is a person to join a livejournal community as a function of the number of friends who are already members of the community, or see how likely is a person to publish for the first time in a conference as a function of former coauthors who have already published in the conference. The two curves are reasonably similar, and both are of "diminishing returns" type.

So one is more likely to, say, start publishing in a conference the more people one knows who are part of the community of that conference. Now, we can ask, given that one knows the same number of people in the community, does it make it more or less likely to start publishing there if the people know each other (say, have published together in the past), or if they do not know each other? To see the same question in other settings, if we hear a rumor, we are more likely to believe it if we hear it from more people. Furthermore, all other things being equal, we are more likely to believe it if we hear it from unrelated people, because it gives some kind of independent confirmation. If we are deciding whether to go to a party, we are also more likely to go the more people we know who are also going. In this case, however, all other things being equal, we are more likely to go if the people that we know who are going also know each other.

In the case of conferences, the former scenario is right. One is more likely to publish in a conference if the former coauthors who are part of the community are unrelated. Similarly, less "densely clustered" communities are the ones that are more likely to grow. There is a cause-and-effect problem here, in the sense that a community where everybody knows everybody else might be a mature and declining one, and being unappealing for such a reason; another explanation is that densely clustered community appear cliquish to outsiders, and are unappealing for such a reason.

There was more and more, in the talk. I have often heard that when Biology will become a "hard science" with a strong mathematical foundation, the mathematics will look familiar to those working in electrical engineering and in algorithms, and that theoretical computer science will be a major contributor to this future theoretical biology. What I did not know is that Sociology itself can become a hard science, and that theoretical computer science can play a major role in this transformation.

Tuesday, October 24, 2006

In theory, it was a good song

After the business meeting, we moved to the adjacent room for the concert of Lady X and the Positive Eigenvalues. We were not supposed to bring alcohol in the conference rooms. There was of course beer at the business meeting, but only because Satish had smuggled it in. But there is a hotel security guy in the hallway, and he should not see us walk out of one conference room and then into another with our beer. And so, following directions given by Satish, a group of distinguished theoreticians, plus me, with their unfinished drinks, takes a back door and tries to get to the next room via the kitchens. This is a worthy start for the rest of the night. (The detour is ultimately unsuccessful, because the back door of the other conference room is locked, so we have to wait for the security guy to leave.)

Lady X is the lead singer, and the band, the Positive Eigenvalues, includes Christos Papadimitriou, Mike Jordan, guest star Anna Karlin, and a number of other Berkeleites (for some reason, all Italians).

They play a number of covers of rock songs that everybody but me recognizes. The crowd of theoreticians sings along, gets warmer and warmer, and starts dancing. The program committee chair surveys the room standing on an eponymous.

Then they play Smells like teen spirit, that even I recognize, a theory pogo breaks out, and those pogoing next to James Lee fly off in all directions. I make my way to the opposite side of the stage.

They also have one original song, titled In Theory. It starts
Picture a carpenter of sorts
he has the flu, he has the blues
he burns his oil every night
all night frustrated and confused

and worst of all he can't complain
in theory his job is plain

At the end, the crowd keeps asking for encores, and, by the time the concert ends, it is almost midnight.

(Pictures courtesy of Madhur Tulsiani.)

Off to a good start

FOCS 2006 is off to a good start with Salil's talk on Sunday morning on realizing statistical Zero-Knowledge arguments for all of NP assuming one-way functions. A zero knowledge proof systems has two important requirements: the proof system must be sound, that is, a cheating prover cannot convince a verifier that a wrong statement is true, and it must be zero knowledge, meaning that a cheating verifier cannot obtain any information by interacting with the prover. Each requirement can come in a statistical version (in which the cheating party can be arbitrarily powerful) or a computational version (in which the requirement holds only for computationally bounded cheating parties).

If both are statistical, we have a Statistical Zero Knowledge proof system, which cannot be achieved for NP-complete problems unless NP is contained in coAM. (Which is about as unlikely as having NP=coNP.) If soundness is statistical but zero knowledge is computational, we have Computational Zero Knowledge proof systems, which exist for all of NP assuming one-way functions.

What if we want computational soundness and statistical zero knowledge? It had been known that one could get such systems for all of NP assuming one-way permutations. Now, with this work of Nguyen, Ong and Vadhan, we finally know that one-way functions suffice for this version as well.

Before lunch, Richard Karp talks about the theme of theoretical computer science as a lens for the sciences. This has been an angle pursued for increased theory funding at NSF, and it is a theme that will recur in all the invited talks. Karp focuses on the case of computational biology. Christos Papadimitriou chairs the session. He says that when he interviewed at MIT, Dertouzos asked him who was his role model, and Christos answered "Dick Karp," an answer that Dertouzos seemed to like. When Karp starts speaking, he says "over time things have switched; now Christos is my role model, and this is why I am wearing black today."

The most enjoyable talk of the day is given by Swastik Kopparty, on joint work with Eli Ben-Sasson and Jaykumar Radhakrishnan. They consider the question of the optimality of the Sudan and Guruswami-Sudan list-decoding algorithms for Reed-Solomon codes. Optimality, that is, in terms of how many errors (or, equivalently, how low agreement) the algorithm can tolerate. The combinatorial question is at which point the size of the list that the algorithm has to compute becomes super-polynomial, precluding any possibility of a polynomial-time algorithm. They vastly improve previous such bounds. In the 20 minutes alloted for the talk, Swastik manages, with contagious enthusiasm, to explain the problem, to fully prove a weaker version of the result, which already slightly improves previous bounds, and to give a good idea of how the general argument looks like.

Xi Chen gives the talk for his paper with Xiaotie Deng on Nash equilibria that won the best paper award. They show that, even for two players, the problem of computing a Nash equilibrium is PPAD-complete, where PPAD includes all search problems in NP where the existence of a solution is guaranteed by a certain "fixed-point argument." In the longer time alloted for the paper award winner, Xi Chen explains the intuition that makes the two-player case seem easier than the case of three or more players, he describes previous results on the complexity of the cases of three or more players, and gives some idea of the new reduction. The talk is really excellent.

After dinner, the business meeting moves along unusually rapidly.

Sanjeev Arora thankfully does without the silly statistics on how many papers were accepted by number of authors and by length of the title, and simply offers the numbers of submissions, acceptance, and attendance. There are only about 220 registered attendants. (Plus a few unregistered ones as well, ehm, ehm.) This is about half the attendance at SODA. Isn't there something wrong going on, suggests Sanjeev? If (now it's me speaking, not Sanjeev) STOC and FOCS have, as a purpose, to be the place where the theory community comes together and we learn abotu results outside our area, but then the community does not come, haven't we lost our purpose?

FOCS 2007 will be in Providence, in a hotel that is currently under renovation, and Alistair Sinclair will be the program committee chair. The bid to hold FOCS 2008 in Philadelphia wins by acclamation against no opponent, despite bidder Sanjeev Khanna pleading not to accept his bid.

Umesh suggests that we move to an online publication of the proceedings and forgo the paper version. After some discussion, we vote for a dizzying array of possibilities. "Would you like to stop publishing proceedings, distribute a CD at the conference, but not do the online publication suggested by Umesh, while continuing the online publication in the IEEE digital library" was one of the options put to vote.