What good is O(log n)-approximation?
I have spent a good part of the last two years catching up on recent work on PCP, hardness of approximation, and approximation algorithms. Lately, this has been a spectacularly successful area of theory, from ARV and its many spin-offs, to the fantastically deep work around the unique games conjecture, to Irit Dinur's new approach to PCP, and so on. There is, however, a nagging question about this line of work. Given that, "in practice," one wants (and, often, achieves) approximations within a factor of 1% or 2%, where is all the excitement about having approximation log n versus (log n)1/2, or even 2 versus 1.5? Isn't this an entirely self-referential community that has nothing to offer to the practice of computing?
In his (beautifully written and highly recommended) book, Vijay Vazirani considers this question and gives the objective benchmark response.
In my own words, the point of the objective benchmark response is to observe that the service that theory can give to other communities is the development of ideas and techniques that are so counterintuitive and/or sophisticated that they would never occur to someone just working on a particular problem. Such ideas and techniques are the result of working in an idealized model (simplified problems, looking only for polynomial time algorithms, regardless of efficiency, focusing on worst-case approximation ratio) that, however far from reality, is objectively quantitative. If a new algorithm has a better approximation ratio than previous ones, then there must be something new in the algorithm, and, over time, whatever it is that is new will be understood, simplified, and distilled to the point of becoming a standard technique that can be used in practice. I assume that there are good stories to go with this argument. (Primal-dual algorithms? Graph-partitioning followed by dynamic programming?) Incidentally, this argument suggests that one should be more interested in an algorithm that barely matches known bounds but that works in a radically new way, than in an algorithm that shaves a loglog n factor by a new combination of known methods plus some case analysis.
I think, however, than one can give a slightly different response. Isn't it true that in practice we want 2% approximation? Exactly!, I would say. But if we want to prove that our efficient algorithm achieves such approximation, we can't, because PCP results forbid it. Plus, for specific algorithms, we know how to construct inputs on which the algorithm performs very badly. And there is now an intriguing convergence between the kind of inputs constructed as worst cases of specific algorithms (such as linear and semidefinite programs) and the kind of inputs constructed as gadgets for PCP constructions. So we have an increasingly clear idea of what inputs are hard. This can be an excellent starting point for reasoning about algorithms. Once we know that graphs with property P are very bad for known algorithms and come out of PCP constructions, we can think of algorithms that work on anti-P graphs, and the insights developed in designing algorithms for general graphs, in finding hard instances, and constructing PCPs, might help us understand how algorithms work on anti-P graphs. Those anti-P graphs, in turn, might include most of the inputs that come up in practice. (More later on this point.)
There is at least one story to go with this argument, and it's Sanjeev Arora's lack of success in proving a PCP hardness for TSP in the plane, that led him to try to understand what a PCP gadget for TSP should look like, that led him to conclude that no such gadget could be embedded in the plane, because of a property of geometric instances that was the starting point of his approximation scheme! (See Arora's paper.)
Lately, several researches have been working simultaneously on inapproximability and on approximation algorithms, and I hope that we will see more positive consequences of the understanding gained from lower bounds.
Another standard objection to our way of studying algorithms is the emphasis on worst case, while "in practice" one is happy enough with algorithms that are good on "most" inputs. Here I have never been satisfied with the pessimism response that we don't know what distribution of inputs will be typical in applications, so it is good (and more reliable) to focus on worst-case performance. The recent work on smoothed analysis is certainly a good indication that the theory community can say interesting and rigorous things about random inputs.
There is, however, a different line of work that I would like to see. For example, there is now an understanding that "typical" networks have a power-law degree distribution and there are random models for them which are considered adequate and predictive. Wouldn't it be great to have approximation algorithms for routing and other problems analysed on those distributions? Especially if the analysis had the following modular structure: (1) we understand how worst-case instances look like for important network problem X; (2) by reasoning about networks that are very far from such worst-cases, we develop an algorithm for X that has outstanding performance on graphs with property Y; (3) graphs sampled from model Z of power-law graphs have property Y with very high probability.
Update 8/1/06: David Johnson points out that there is a body of recent work showing discrepancies between the topology of real networks and the topologies generated by popular models of power-law graphs. For example, the graphs in the probabilistic model have smaller vertex-connectivity than real networks (which are designed to be fault-tolerant). Here is one early paper about this.
In his (beautifully written and highly recommended) book, Vijay Vazirani considers this question and gives the objective benchmark response.
In my own words, the point of the objective benchmark response is to observe that the service that theory can give to other communities is the development of ideas and techniques that are so counterintuitive and/or sophisticated that they would never occur to someone just working on a particular problem. Such ideas and techniques are the result of working in an idealized model (simplified problems, looking only for polynomial time algorithms, regardless of efficiency, focusing on worst-case approximation ratio) that, however far from reality, is objectively quantitative. If a new algorithm has a better approximation ratio than previous ones, then there must be something new in the algorithm, and, over time, whatever it is that is new will be understood, simplified, and distilled to the point of becoming a standard technique that can be used in practice. I assume that there are good stories to go with this argument. (Primal-dual algorithms? Graph-partitioning followed by dynamic programming?) Incidentally, this argument suggests that one should be more interested in an algorithm that barely matches known bounds but that works in a radically new way, than in an algorithm that shaves a loglog n factor by a new combination of known methods plus some case analysis.
I think, however, than one can give a slightly different response. Isn't it true that in practice we want 2% approximation? Exactly!, I would say. But if we want to prove that our efficient algorithm achieves such approximation, we can't, because PCP results forbid it. Plus, for specific algorithms, we know how to construct inputs on which the algorithm performs very badly. And there is now an intriguing convergence between the kind of inputs constructed as worst cases of specific algorithms (such as linear and semidefinite programs) and the kind of inputs constructed as gadgets for PCP constructions. So we have an increasingly clear idea of what inputs are hard. This can be an excellent starting point for reasoning about algorithms. Once we know that graphs with property P are very bad for known algorithms and come out of PCP constructions, we can think of algorithms that work on anti-P graphs, and the insights developed in designing algorithms for general graphs, in finding hard instances, and constructing PCPs, might help us understand how algorithms work on anti-P graphs. Those anti-P graphs, in turn, might include most of the inputs that come up in practice. (More later on this point.)
There is at least one story to go with this argument, and it's Sanjeev Arora's lack of success in proving a PCP hardness for TSP in the plane, that led him to try to understand what a PCP gadget for TSP should look like, that led him to conclude that no such gadget could be embedded in the plane, because of a property of geometric instances that was the starting point of his approximation scheme! (See Arora's paper.)
Lately, several researches have been working simultaneously on inapproximability and on approximation algorithms, and I hope that we will see more positive consequences of the understanding gained from lower bounds.
Another standard objection to our way of studying algorithms is the emphasis on worst case, while "in practice" one is happy enough with algorithms that are good on "most" inputs. Here I have never been satisfied with the pessimism response that we don't know what distribution of inputs will be typical in applications, so it is good (and more reliable) to focus on worst-case performance. The recent work on smoothed analysis is certainly a good indication that the theory community can say interesting and rigorous things about random inputs.
There is, however, a different line of work that I would like to see. For example, there is now an understanding that "typical" networks have a power-law degree distribution and there are random models for them which are considered adequate and predictive. Wouldn't it be great to have approximation algorithms for routing and other problems analysed on those distributions? Especially if the analysis had the following modular structure: (1) we understand how worst-case instances look like for important network problem X; (2) by reasoning about networks that are very far from such worst-cases, we develop an algorithm for X that has outstanding performance on graphs with property Y; (3) graphs sampled from model Z of power-law graphs have property Y with very high probability.
Update 8/1/06: David Johnson points out that there is a body of recent work showing discrepancies between the topology of real networks and the topologies generated by popular models of power-law graphs. For example, the graphs in the probabilistic model have smaller vertex-connectivity than real networks (which are designed to be fault-tolerant). Here is one early paper about this.