### The hard science of Sociology

On Monday, the first talk I managed to attend was by Terry Sejnovski, of the Salk Institute, and I only got there towards the end of the talk. Starting from the question of how the brain manages to "do" vision, he talked about fascinating experiments that, among other things, test the rationality of conscious versus unconscious decisions (apparently, in complex decision-making involving lots of variables, "gut feelings" are better than conscious reasoning) and measure the surprisingly good "computational" ability of monkeys.

In the afternoon, Valiant delivered his talk on "accidental algorithms" to a standing-room-only packed audience. His main results are (1) a $\oplus P$-completeness result for $\oplus$-2SAT, even when restricted to planar instances, and (2) a polynomial time algorithm for $mod_7$-2SAT on planar instances. $\oplus P$ is the complexity class of problems that reduce to the question of whether a given non-deterministic Turing machine, on a given input, has an odd number of accepting computations. By the Valiant-Vazirani result, a $\oplus P$-complete problem is also NP-hard under randomized reductions, and, by Toda's theorem, is actually hard for all the levels of the polynomial hierarchy. The $\oplus$-2SAT problem is to determine whether a given 2SAT instance has an odd number of satisfying assignments. Apparently, this is the first natural $\oplus P$-complete problem for which there is a polynomial time algorithm that checks if the number of solutions is zero or non-zero. The $mod_7$-2SAT problem is, given a 2SAT formula, to determine if the number of assignment is or is not a multiple of 7. Valiant's algorithm is based on his earlier work on

Afterwards, most people moved to Uri Feige's talk in the next room on finding witnesses of unsatisfiability for random 3CNF formulas. Previously, it was known how to find, with high probability, witnesses of unstatisfiability for a given random 3CNF formula with $n$ variables and about $n^{1.5}$ clauses. This new work by Feige, Kim and Ofek shows that witnesses of unsatisfiability exist for most random 3CNF formulas with $n$ variables and about $n^{1.4}$ clauses. (They do not give an efficient algorithm to find such witnesses.)

Nicholas Harvey gave the last talk of the day on his work on reducing the matching problem to matrix multiplication. Previous work had shown that matching can be solved in the same time as matrix multiplication, but the previous reduction was quite complex for general graphs. (And simpler for bipartite graphs.) Harvey gives a considerably more transparent new algorithm that reduces matching in general graphs to matrix multiplication. His talk was very clear and enjoyable, and I am grateful to Moses Charikar for recommending it. (Because of the word "matroid" in the title, I was going to skip to the talk.)

On Tuesday, Jon Kleinberg gave a spectacular invited talk on the kind of radically new results on sociology that one can extract from new datasets (resulting from people using large online retailers, joining online communities such as livejournal and myspace, and so on) and from algorithmic thinking. I did not take notes, and I could not find the slides online, so I can only offer a few snippets that I remember. In one of Jon's most celebrated results, he studied an idealized model of "social networks" made of a mix of "local" connections and of random "long-distance" connections, and he showed that, in the resulting network, people would be able to find short paths among themselves only if the long-distance connections have a probability that is proportional the inverse of the square of the distance between two people. In the model, the local connections are laid out according to a grid, but the same result holds in a more general model where the local connections define a metric space, and the random connections are such that the probability that $i$ and $j$ are connected is proportional to the inverse of the number of people that are closer to $i$ than $j$ is. A study made on livejournal data shows that the "friends" that people make online are distributed in a way that is in spooky agreement with the above distribution.

Why do people make friends with precisely the only distribution that makes it easy to find short paths between themselves? The answer is unclear, but the outstanding thing is that a "law of human nature" seems to have been discovered, which could have never been formulated without thinking in terms of algorithms. Presumably, any explanation of this phenomenon will have to involve an algorithmic model of how people make connections online.

Another interesting question is the effect of "peer pressure" or community involvment. We are more likely to do something if our friends also do it (forgetting for a moment the question of what is the cause and what is the effect, that is whether we make friends with like-minded people, or whether we are influenced by what others do), but what exactly is the dependency between how many friends do something and how likely we are to do the same? Is there a "diminishing returns" curve, the way we are impressed by the first few friends who buy an ipod, and hence more and more likely to buy one ourselves, and then less and less impressed? Or is it a "critical mass" curve, the way skype is useless unless a lot of friends of ours are using it? There has been amusing work studying this question by looking at communities on livejournal and conferences on dblp. One can see how likely is a person to join a livejournal community as a function of the number of friends who are already members of the community, or see how likely is a person to publish for the first time in a conference as a function of former coauthors who have already published in the conference. The two curves are reasonably similar, and both are of "diminishing returns" type.

So one is more likely to, say, start publishing in a conference the more people one knows who are part of the community of that conference. Now, we can ask, given that one knows the same number of people in the community, does it make it more or less likely to start publishing there if the people know each other (say, have published together in the past), or if they do not know each other? To see the same question in other settings, if we hear a rumor, we are more likely to believe it if we hear it from more people. Furthermore, all other things being equal, we are more likely to believe it if we hear it from

In the case of conferences, the former scenario is right. One is more likely to publish in a conference if the former coauthors who are part of the community are unrelated. Similarly, less "densely clustered" communities are the ones that are more likely to grow. There is a cause-and-effect problem here, in the sense that a community where everybody knows everybody else might be a mature and declining one, and being unappealing for such a reason; another explanation is that densely clustered community appear cliquish to outsiders, and are unappealing for such a reason.

There was more and more, in the talk. I have often heard that when Biology will become a "hard science" with a strong mathematical foundation, the mathematics will look familiar to those working in electrical engineering and in algorithms, and that theoretical computer science will be a major contributor to this future theoretical biology. What I did not know is that Sociology itself can become a hard science, and that theoretical computer science can play a major role in this transformation.

In the afternoon, Valiant delivered his talk on "accidental algorithms" to a standing-room-only packed audience. His main results are (1) a $\oplus P$-completeness result for $\oplus$-2SAT, even when restricted to planar instances, and (2) a polynomial time algorithm for $mod_7$-2SAT on planar instances. $\oplus P$ is the complexity class of problems that reduce to the question of whether a given non-deterministic Turing machine, on a given input, has an odd number of accepting computations. By the Valiant-Vazirani result, a $\oplus P$-complete problem is also NP-hard under randomized reductions, and, by Toda's theorem, is actually hard for all the levels of the polynomial hierarchy. The $\oplus$-2SAT problem is to determine whether a given 2SAT instance has an odd number of satisfying assignments. Apparently, this is the first natural $\oplus P$-complete problem for which there is a polynomial time algorithm that checks if the number of solutions is zero or non-zero. The $mod_7$-2SAT problem is, given a 2SAT formula, to determine if the number of assignment is or is not a multiple of 7. Valiant's algorithm is based on his earlier work on

*matchgate computations*.Afterwards, most people moved to Uri Feige's talk in the next room on finding witnesses of unsatisfiability for random 3CNF formulas. Previously, it was known how to find, with high probability, witnesses of unstatisfiability for a given random 3CNF formula with $n$ variables and about $n^{1.5}$ clauses. This new work by Feige, Kim and Ofek shows that witnesses of unsatisfiability exist for most random 3CNF formulas with $n$ variables and about $n^{1.4}$ clauses. (They do not give an efficient algorithm to find such witnesses.)

Nicholas Harvey gave the last talk of the day on his work on reducing the matching problem to matrix multiplication. Previous work had shown that matching can be solved in the same time as matrix multiplication, but the previous reduction was quite complex for general graphs. (And simpler for bipartite graphs.) Harvey gives a considerably more transparent new algorithm that reduces matching in general graphs to matrix multiplication. His talk was very clear and enjoyable, and I am grateful to Moses Charikar for recommending it. (Because of the word "matroid" in the title, I was going to skip to the talk.)

On Tuesday, Jon Kleinberg gave a spectacular invited talk on the kind of radically new results on sociology that one can extract from new datasets (resulting from people using large online retailers, joining online communities such as livejournal and myspace, and so on) and from algorithmic thinking. I did not take notes, and I could not find the slides online, so I can only offer a few snippets that I remember. In one of Jon's most celebrated results, he studied an idealized model of "social networks" made of a mix of "local" connections and of random "long-distance" connections, and he showed that, in the resulting network, people would be able to find short paths among themselves only if the long-distance connections have a probability that is proportional the inverse of the square of the distance between two people. In the model, the local connections are laid out according to a grid, but the same result holds in a more general model where the local connections define a metric space, and the random connections are such that the probability that $i$ and $j$ are connected is proportional to the inverse of the number of people that are closer to $i$ than $j$ is. A study made on livejournal data shows that the "friends" that people make online are distributed in a way that is in spooky agreement with the above distribution.

Why do people make friends with precisely the only distribution that makes it easy to find short paths between themselves? The answer is unclear, but the outstanding thing is that a "law of human nature" seems to have been discovered, which could have never been formulated without thinking in terms of algorithms. Presumably, any explanation of this phenomenon will have to involve an algorithmic model of how people make connections online.

Another interesting question is the effect of "peer pressure" or community involvment. We are more likely to do something if our friends also do it (forgetting for a moment the question of what is the cause and what is the effect, that is whether we make friends with like-minded people, or whether we are influenced by what others do), but what exactly is the dependency between how many friends do something and how likely we are to do the same? Is there a "diminishing returns" curve, the way we are impressed by the first few friends who buy an ipod, and hence more and more likely to buy one ourselves, and then less and less impressed? Or is it a "critical mass" curve, the way skype is useless unless a lot of friends of ours are using it? There has been amusing work studying this question by looking at communities on livejournal and conferences on dblp. One can see how likely is a person to join a livejournal community as a function of the number of friends who are already members of the community, or see how likely is a person to publish for the first time in a conference as a function of former coauthors who have already published in the conference. The two curves are reasonably similar, and both are of "diminishing returns" type.

So one is more likely to, say, start publishing in a conference the more people one knows who are part of the community of that conference. Now, we can ask, given that one knows the same number of people in the community, does it make it more or less likely to start publishing there if the people know each other (say, have published together in the past), or if they do not know each other? To see the same question in other settings, if we hear a rumor, we are more likely to believe it if we hear it from more people. Furthermore, all other things being equal, we are more likely to believe it if we hear it from

*unrelated*people, because it gives some kind of independent confirmation. If we are deciding whether to go to a party, we are also more likely to go the more people we know who are also going. In this case, however, all other things being equal, we are more likely to go if the people that we know who are going also know each other.In the case of conferences, the former scenario is right. One is more likely to publish in a conference if the former coauthors who are part of the community are unrelated. Similarly, less "densely clustered" communities are the ones that are more likely to grow. There is a cause-and-effect problem here, in the sense that a community where everybody knows everybody else might be a mature and declining one, and being unappealing for such a reason; another explanation is that densely clustered community appear cliquish to outsiders, and are unappealing for such a reason.

There was more and more, in the talk. I have often heard that when Biology will become a "hard science" with a strong mathematical foundation, the mathematics will look familiar to those working in electrical engineering and in algorithms, and that theoretical computer science will be a major contributor to this future theoretical biology. What I did not know is that Sociology itself can become a hard science, and that theoretical computer science can play a major role in this transformation.