What good is O(log n)-approximation?
I have spent a good part of the last two years catching up on recent work on PCP, hardness of approximation, and approximation algorithms. Lately, this has been a spectacularly successful area of theory, from ARV and its many spin-offs, to the fantastically deep work around the unique games conjecture, to Irit Dinur's new approach to PCP, and so on. There is, however, a nagging question about this line of work. Given that, "in practice," one wants (and often achieves) approximations within 1% or 2% of the optimum, why all the excitement about an approximation ratio of log n versus (log n)^{1/2}, or even 2 versus 1.5? Isn't this an entirely self-referential community that has nothing to offer to the practice of computing?
In his (beautifully written and highly recommended) book, Vijay Vazirani considers this question and gives the objective benchmark response.
In my own words, the point of the objective benchmark response is to observe that the service that theory can give to other communities is the development of ideas and techniques that are so counterintuitive and/or sophisticated that they would never occur to someone just working on a particular problem. Such ideas and techniques are the result of working in an idealized model (simplified problems, looking only for polynomial-time algorithms, regardless of efficiency, focusing on worst-case approximation ratio) that, however far from reality, is objectively quantitative. If a new algorithm has a better approximation ratio than previous ones, then there must be something new in the algorithm, and, over time, whatever it is that is new will be understood, simplified, and distilled to the point of becoming a standard technique that can be used in practice. I assume that there are good stories to go with this argument. (Primal-dual algorithms? Graph-partitioning followed by dynamic programming?) Incidentally, this argument suggests that one should be more interested in an algorithm that barely matches known bounds but works in a radically new way than in an algorithm that shaves a log log n factor by a new combination of known methods plus some case analysis.
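As one concrete instance of such a distilled technique, here is a minimal sketch of the primal-dual scheme in its simplest setting, the factor-2 approximation for weighted vertex cover. This is only an illustration of the flavor of the technique, not a quote of any particular textbook algorithm; the function name and the 5-cycle test case are made up for the sketch.

```python
def primal_dual_vertex_cover(vertices, edges, weight):
    """Factor-2 primal-dual scheme for weighted vertex cover.

    For each uncovered edge, raise its dual variable until one endpoint
    becomes tight (its weight is fully paid for); tight vertices join the cover.
    """
    slack = {v: weight[v] for v in vertices}   # unpaid portion of each vertex's weight
    cover = set()
    for (u, v) in edges:
        if u in cover or v in cover:
            continue                           # edge already covered
        pay = min(slack[u], slack[v])          # amount by which this edge's dual is raised
        slack[u] -= pay
        slack[v] -= pay
        if slack[u] == 0:
            cover.add(u)
        if slack[v] == 0:
            cover.add(v)
    return cover

# Hypothetical example: a 5-cycle with unit weights (the optimum cover has 3 vertices).
V = list(range(5))
E = [(i, (i + 1) % 5) for i in range(5)]
print(primal_dual_vertex_cover(V, E, {v: 1 for v in V}))
```

The transferable part is the analysis: the cover costs at most twice the total amount paid to the dual variables, and any cover costs at least that total, so the output is within a factor of 2 of the optimum regardless of where the input came from.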
I think, however, that one can give a slightly different response. Isn't it true that in practice we want 2% approximation? Exactly, I would say! But if we want to prove that our efficient algorithm achieves such an approximation, we can't, because PCP results forbid it. Plus, for specific algorithms, we know how to construct inputs on which they perform very badly. And there is now an intriguing convergence between the kind of inputs constructed as worst cases for specific algorithms (such as those based on linear and semidefinite programming) and the kind of inputs constructed as gadgets for PCP constructions. So we have an increasingly clear idea of which inputs are hard. This can be an excellent starting point for reasoning about algorithms. Once we know that graphs with property P are very bad for known algorithms and come out of PCP constructions, we can think of algorithms that work on anti-P graphs, and the insights developed in designing algorithms for general graphs, in finding hard instances, and in constructing PCPs might help us understand how algorithms work on anti-P graphs. Those anti-P graphs, in turn, might include most of the inputs that come up in practice. (More later on this point.)
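To give one small, standard example of the kind of bad input for a specific algorithm (not one of the PCP-related constructions alluded to above): the complete graph K_n is the textbook integrality-gap instance for the vertex cover LP relaxation, where setting every variable to 1/2 is feasible with value n/2 while any actual cover needs n-1 vertices, so the gap tends to 2. The sketch below merely checks this numerically with an off-the-shelf LP solver; the parameters are arbitrary.

```python
from itertools import combinations
from scipy.optimize import linprog

n = 10
edges = list(combinations(range(n), 2))        # the complete graph K_n

# Vertex cover LP: minimize sum_v x_v  s.t.  x_u + x_v >= 1 for every edge, 0 <= x_v <= 1.
c = [1.0] * n
A_ub, b_ub = [], []
for (u, v) in edges:
    row = [0.0] * n
    row[u] = row[v] = -1.0                     # encode x_u + x_v >= 1 as -(x_u + x_v) <= -1
    A_ub.append(row)
    b_ub.append(-1.0)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * n, method="highs")
print(f"LP optimum on K_{n}: {res.fun:.1f}; integral optimum: {n - 1}")
```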
There is at least one story to go with this argument, and it's Sanjeev Arora's lack of success in proving PCP hardness for TSP in the plane, which led him to try to understand what a PCP gadget for TSP would have to look like, which in turn led him to conclude that no such gadget could be embedded in the plane, because of a property of geometric instances that became the starting point of his approximation scheme! (See Arora's paper.)
Lately, several researchers have been working simultaneously on inapproximability and on approximation algorithms, and I hope that we will see more positive consequences of the understanding gained from lower bounds.
Another standard objection to our way of studying algorithms is the emphasis on worst case, while "in practice" one is happy enough with algorithms that are good on "most" inputs. Here I have never been satisfied with the pessimism response that we don't know what distribution of inputs will be typical in applications, so it is good (and more reliable) to focus on worst-case performance. The recent work on smoothed analysis is certainly a good indication that the theory community can say interesting and rigorous things about random inputs.
There is, however, a different line of work that I would like to see. For example, there is now an understanding that "typical" networks have a power-law degree distribution, and there are random models for them which are considered adequate and predictive. Wouldn't it be great to have approximation algorithms for routing and other problems analysed on those distributions? Especially if the analysis had the following modular structure: (1) we understand what worst-case instances look like for an important network problem X; (2) by reasoning about networks that are very far from such worst cases, we develop an algorithm for X that has outstanding performance on graphs with property Y; (3) graphs sampled from model Z of power-law graphs have property Y with very high probability.
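As a toy illustration of what step (3) might look like computationally, the sketch below samples graphs from one popular power-law model (preferential attachment, via networkx) and empirically tests a placeholder property Y, here "connected with small diameter". Both the choice of model and the choice of property Y are stand-ins invented for the example; the real content of step (3) would be a proof that a property useful in step (2) holds with high probability in a credible model Z.

```python
import networkx as nx

def has_property_Y(G, diameter_bound=6):
    """Placeholder property Y: connected with diameter at most `diameter_bound`."""
    return nx.is_connected(G) and nx.diameter(G) <= diameter_bound

trials, hits = 30, 0
for seed in range(trials):
    # Preferential attachment with 3 edges per new node; a stand-in for "model Z".
    G = nx.barabasi_albert_graph(n=500, m=3, seed=seed)
    hits += has_property_Y(G)
print(f"{hits}/{trials} sampled graphs satisfy the candidate property Y")
```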
Update 8/1/06: David Johnson points out that there is a body of recent work showing discrepancies between the topology of real networks and the topologies generated by popular models of power-law graphs. For example, the graphs in the probabilistic model have smaller vertex-connectivity than real networks (which are designed to be fault-tolerant). Here is one early paper about this.
13 Comments:
7/31/2006 11:18:00 PM
I agree with your sentiment regarding work on random models that both predict problem instances and satisfy some "easiness" property. One of the other areas where I'd like to see this is with SAT solvers. We have software like zChaff that works remarkably well "in practice," so the motivation is there.
So far, most of the work I've seen on characterizing SAT hardness focuses on understanding things like how the ratio of clauses to variables affects hardness for a random formula, not on characterizing hardness of instances drawn from predictive models. Of course, there is an ocean of work going on, so probably I just missed it...
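As a toy illustration of the clauses-to-variables effect (and emphatically not how zChaff works), the following sketch generates random 3-CNF formulas at various ratios m/n and checks satisfiability with a bare-bones DPLL procedure; near ratio 4.2-4.3 the formulas flip from mostly satisfiable to mostly unsatisfiable, which is also where naive solvers slow down. The generator, solver, and parameters are all made up for the illustration.

```python
import random

def random_3cnf(n_vars, n_clauses, rng):
    """Random 3-CNF: each clause picks 3 distinct variables, each negated with prob 1/2."""
    clauses = []
    for _ in range(n_clauses):
        vs = rng.sample(range(1, n_vars + 1), 3)
        clauses.append([v if rng.random() < 0.5 else -v for v in vs])
    return clauses

def assign(clauses, lit):
    """Set literal `lit` to true: drop satisfied clauses, delete falsified literals."""
    return [[l for l in c if l != -lit] for c in clauses if lit not in c]

def dpll(clauses):
    """Bare-bones DPLL: unit propagation plus naive branching. Returns True iff satisfiable."""
    while True:
        if any(len(c) == 0 for c in clauses):
            return False                      # empty clause: contradiction
        units = [c[0] for c in clauses if len(c) == 1]
        if not units:
            break
        clauses = assign(clauses, units[0])   # propagate one unit at a time
    if not clauses:
        return True                           # all clauses satisfied
    lit = clauses[0][0]                       # naive branching choice
    return dpll(assign(clauses, lit)) or dpll(assign(clauses, -lit))

rng = random.Random(0)
n = 50
for ratio in (3.0, 3.8, 4.26, 5.0, 6.0):
    sat = sum(dpll(random_3cnf(n, int(ratio * n), rng)) for _ in range(10))
    print(f"m/n = {ratio:4.2f}: {sat}/10 satisfiable")
```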
8/01/2006 10:21:00 AM
The propositional proof complexity community certainly looks at issues like this, although they tend to focus on cooking up hard examples for specific proof systems or analyzing the hardness of the "standard" random k-CNF model for different proof systems. More specific models of "random planning problems with structure BLAH" or "circuit equivalence for these Nice Circuits" have not come up in that literature.
Coming up with a predictive model that explains why things like zChaff are "good in practice" is an interesting point. The most important thing seems to be the method of translating the problem into CNF. Engineering folks churn out a few ideas on this at DAC, ICCAD, CAV, etc., but there is no theoretical understanding that I have yet seen.
Nate
8/01/2006 01:00:00 PM
Regarding power-law networks, I believe the modelling issue is quite controversial. Certainly, typical networks are power-law, and there are a variety of models generating power-law networks, but claims that the instances produced by these models are typical tend to be rather unrigorous. In the networking community, at least, there seems to be a definite trend toward more low-level analysis. Maybe this is a reaction to hype about the universality of power-law behavior - it's even claimed that traditional measurement tools are biased towards observing such behavior.
Rahul.
8/01/2006 01:48:00 PM
Hi Luca,
Just wanted to point out: the two arguments that you mentioned are generic, and relativize to many other areas of research. To be specific:
- the first argument says: measurable scientific progress can lead to discoveries that can be useful, even if optimizing the measure is not, by itself, necessarily useful.
- the second argument says: an impossibility of achieving a solution within a model, even if not very predictive by itself, can lead to a more accurate (and useful) model.
In a sense, the two arguments complement each other: it is good (i) when one can make (measurable) progress and (ii) when one (provably) cannot.
Piotr
8/01/2006 02:45:00 PM
Hi Piotr, I too thought of the two arguments as complementary, but your nice summary makes the complementarity much more clear.
8/02/2006 12:32:00 PM
Re Rahul's comment, I agree that average-case analysis based on models coming from TCS such as the Aiello-Chung-Lu one (essentially random graphs constrained to a fixed degree distribution) is unconvincing. I'm more comfortable with work showing good approximations or fast algorithms with only weak assumptions such as a power law distribution or low diameter.
Re your original question, "what good is O(log n)-approximation" – the post has something of the flavor of begging the question: assuming that the math you find interesting must be useful somehow and then hunting around for stories to convince yourself of what you were already assuming. Wouldn't it be more satisfactory to admit that exploring the mathematical limits of computation can be interesting in its own right? Because face it, if our primary goal were to do something as useful as possible, we'd be physicians or software developers or sanitation engineers or something else rather than TCS researchers.
8/02/2006 06:37:00 PM
Re David's comments: sure, "inherent value" or "mathematical beauty" is another important motivation for TCS work. But I think there is plenty of TCS work that is exciting in itself and has a big practical impact; see the Theory Matters web page for some examples (crypto is a classic one). No need to change jobs ;)
Piotr
8/03/2006 12:42:00 PM
Sort of related to D. Eppstein's comments: there are surely results in math, physics, or statistics that might be useful for CS as well. I don't think Luca's comments by themselves make TCS results any more interesting to (non-TCS) CS researchers than results from those fields.
In other words, is TCS's contribution to CS at large any different from the intrinsic interest any rigorous mathematical exercise might have?
8/03/2006 04:46:00 PM
My guess (hope?) is that TCS works on models that are
(1) abstract enough that we can make incremental and measurable progress, until a big breakthrough can emerge (building on the understanding created by the incremental results) and
(2) close enough to reality that a breakthrough in the model is likely to be useful in reality. (And much more likely to be useful in CS than a breakthrough in a random area of Math.)
If, after accounting for the small size and young age of the TCS community, it were demonstrated that the TCS contribution to non-theoretical CS were of the same order as the contributions from Physics, pure Math, and so on, I would consider such a finding very worrisome.
It would mean either
(i) that we do produce a sustained stream of ideas and techniques that can be used by other computer scientists (and at a higher rate than would be expected from a random rigorous math exercise), but that the "transfer" of such ideas and techniques to practice is not happening, or
(ii) that (God forbid!) we are working on models that don't really capture the limitations of computing, in which case we will eventually change such models.
The story of the PRAM is instructive. A deep understanding of the nature of parallelism was gained by studying PRAMs, and some of the insights are timeless and likely to be useful in other applications (see Trifonov's work on undirected graph connectivity in nearly logarithmic space). And yet people had stopped studying PRAMs before my time.
8/03/2006 06:04:00 PM
Hi, Luca,
Could you please list the body of work mentioned by David Johnson? Thanks.
8/03/2006 06:54:00 PM
My guess (hope?) is that TCS works on models that are abstract enough [so that] a big breakthrough can emerge and close enough to reality that a breakthrough in the model is likely to be useful in reality.
For what it's worth, the computer science department at Waterloo has had five largish spin-offs (plus countless small ones) over the years, with four of them coming from the algorithms and/or combinatorics side of things. Though this is a rather small sample, one can take hope from it that the work we theoreticians do does have relevance in the real world.
Alex Lopez-Ortiz
8/03/2006 07:00:00 PM
Could you please list the body of work mentioned by David Johnson?
He referred me to the work of Walter Willinger. This paper in particular is quite relevant, and searching for Walter Willinger (or Scott Shenker) on citeseer or google scholar should give several references.
8/03/2006 11:58:00 PM
And yet people had stopped studying PRAMs before my time.
My understanding is that parallel algorithms (though maybe not PRAMs per se) are heading for something of a resurgence — the failure of Moore's law means that the only way to get faster PCs these days is to go to dual-core or quad-core machines, and in not too long it may be 256-cpu architectures. A colleague who just came back from SPAA tells me that last year there was a feeling of doom and gloom, that people were thinking they should close the conference down, but this year there was a lot more energy.