Pick a random instance of 3SAT by picking at random $m$ of the possible $8 {n\choose 3}$ clauses that can be constructed over $n$ variables. It is easy to see that if one sets $m=cn$, for a sufficiently large constant $c$, then the formula will be unsatisfiable with very high probability (at least $1-2^n \cdot (7/8)^m$), and it is also possible (but less easy) to see that if $c$ is a sufficiently small constant, then the formula is satisfiable with very high probability.
A number of questions come to mind:
- If I plot, for large $n$, the probability that a random 3SAT formula with $n$ variables and $cn$ clauses is satisfiable, against the density $c$, what does the graph look like? We just said the probability is going to be close to 1 for small $c$ and close to $0$ for large $c$, but does it go down smoothly or sharply?
Here the conjecture, supported by experimental evidence, is that the graph looks like a step function: that there is a constant $c_3$ such that the probability of satisfiability is $1-o_n(1)$ for density $< c_3$ and it is $o_n(1)$ for density $>c_3$. A similar behavior is conjectured for kSAT for all $k$, with the threshold value $c_k$ being dependent on $k$.
Friedgut proved a result that comes quite close to establishing the conjecture.
For, say, 3SAT, the statement of the conjecture is that there is a value $c_3$ such that for every interval size $\epsilon$, every confidence $\delta$ and every sufficiently large $n$, if you pick a 3SAT formula with $(c_3+\epsilon)n$ clauses and $n$ variables, the probability of satisfiability is at most $\delta$, but if you pick a formula with $(c_3-\epsilon)n$ clauses then the probability of satisfiability is at least $1-\delta$.
Friedgut proved that for every $n$ there is a density $c_{3,n}$, such that for every interval size $\epsilon$, every confidence $\delta$ and every sufficiently large $n$, if you pick a 3SAT formula with $(c_{3,n}+\epsilon)n$ clauses and $n$ variables, the probability of satisfiability is at most $\delta$, but if you pick a formula with $(c_{3,n}-\epsilon)n$ clauses then the probability of satisfiability is at least $1-\delta$.
So, for larger and larger $n$, the graph of proability of satisfiability versus density does look more and more like a step function, but Friedgut's proof does not guarantee that the location of the step stays the same. Of course the location is not going to move, but nobody has been able to prove that yet.
Semi-rigorous methods (by which I mean, methods where you make things up as you go along) from statistical physics predict the truth of the conjecture and predict a specific value for $c_3$ (and for $c_k$ for each $k$) that agrees with experiments. It remains a long-term challenge to turn these arguments into a rigorous proof.
For large $k$, work by Achlioptas, Moore, and Peres shows almost matching upper and lower bounds on $c_k$ by a second moment approach. They show that if you pick a random kSAT formula for large $k$ the variance of the number of satisfying assignments of the formula is quite small, and so the formula is likely to be unsatisfiable when the average number of assignments is close to zero (which actually just follows from Markov's inequality), but also the formula is likely to be satisfiable when the average number of assignments is large. Their methods, however, do not improve previous results for 3SAT. Indeed, it is known that the variance is quite large for 3SAT, and the conjectured location of $c_3$ is not the place where the average number of assignments goes from being small to being large. (The conjectured value of $c_3$ is smaller.)
- Pick a random formula with a density that makes it very likely that the formula is satisfiable: is this a distribution of inputs that makes 3SAT hard-on-average?
Before addressing the question we need to better specify what we mean by hard-on-average (and, complementarily, easy-on-average) in this case. For example, the algorithm that always says "satisfiable" works quite well; over the random choice of the formula, the error probability of the algorithm is extremely small. In such settings, however, what one would like from an algorithm is to produce an actual satisfying assignment. So far, all known lower bounds for $c_3$ are algorithmic, so in the density range in which we rigorously know that a random 3SAT formula is likely to be satisfiable we also know how to produce, with high probability, a satisfying assignment in polynomial time. The results for large $k$, however, are non-constructive and it remains an open question to match them with an algorithmic approach.
The statistical physics methods that suggest the existence of sharp thresholds also inspired an algorithm (the survey propagation algorithm) that, in experiments, efficiently finds satisfying assignments in the full range of density in which 3SAT formulas are believed to be satisfiable with high probability. It is an exciting, but very difficult, question to rigorously analyze the behavior of this algorithm.
- Pick a random formula with a density that makes it very likely that the formula is unsatisfiable: is this a distribution of inputs that makes 3SAT hard-on-average?
Again, an algorithm that simply says "unsatisfiable" works with high probability. The interesting question, however, is whether there is an algorithm that efficiently and with high probability delivers certificates of unsatisfiability. (Just like the survey propagation algorithm delivers certificates of satisfiability in the density range in which they exist, or so it is conjectured.) This will be the topic of the next post.