## Tuesday, May 15, 2007

### Proving unsatisfiability of random kSAT

In the previous random kSAT post we saw that for every $k$ there is a constant $c_k$ such that

1. A random kSAT formula with $n$ variables and $m$ clauses is conjectured to be almost surely satisfiable when $m/n < c_k - \epsilon$ and almost surely unsatisfiable when $m/n > c_k + \epsilon$;
2. There is an algorithm that is conjectured to find satisfying assignments with high probability when given a random kSAT formula with $n$ variables and fewer than $(c_k - \epsilon) n$ clauses.

So, conjecturally, the probability of satisfiability of a random kSAT formula has a sudden jump at a certain threshold value of the ratio of clauses to variables, and in the regime where the formula is likely to be satisfiable, the kSAT problem is easy-on-average.
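As a quick illustration of this conjectured threshold behavior, here is a small experiment of my own (not from the post; it assumes the conjectured value $c_3 \approx 4.27$): generate random 3SAT formulas at a density well below the threshold and one well above it, and check satisfiability by brute force on small $n$.

```python
import random

def random_ksat(n, m, k, rng):
    """Sample m clauses, each on k distinct variables with random signs."""
    return [tuple(v if rng.random() < 0.5 else -v
                  for v in rng.sample(range(1, n + 1), k))
            for _ in range(m)]

def brute_force_satisfiable(n, clauses):
    """Decide satisfiability by trying all 2^n assignments (small n only)."""
    for mask in range(1 << n):
        if all(any(((mask >> (abs(l) - 1)) & 1) == (l > 0) for l in c)
               for c in clauses):
            return True
    return False

def fraction_satisfiable(n, density, trials, rng):
    """Empirical probability of satisfiability at a given clause density."""
    m = int(density * n)
    sat = sum(brute_force_satisfiable(n, random_ksat(n, m, 3, rng))
              for _ in range(trials))
    return sat / trials

if __name__ == "__main__":
    rng = random.Random(1)
    # Well below the conjectured threshold c_3 ~ 4.27, then well above it.
    print("density 3:", fraction_satisfiable(10, 3.0, 20, rng))
    print("density 7:", fraction_satisfiable(10, 7.0, 20, rng))
```

At such small $n$ the transition is of course not sharp, but the contrast between the two densities is already clearly visible.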

What about the regime where the formula is likely to be unsatisfiable? Is the problem still easy on average? And what would that exactly mean? The natural question about average-case complexity is: is there an efficient algorithm that, in the unsatisfiable regime, finds with high probability a certifiably correct answer? In other words, is there an algorithm that efficiently delivers a proof of unsatisfiability given a random formula with $m$ clauses and $n$ variables, $m> (c_k + \epsilon) n$?

Some non-trivial algorithms, which I am going to describe shortly, find such unsatisfiability proofs, but only in regimes of fairly high clause density. It is also known that certain broad classes of algorithms fail for all constant densities. It is plausible that finding unsatisfiability proofs for random kSAT formulas at any constant density is an intractable problem. If so, its intractability has a number of interesting consequences, as shown by Feige.

A first observation is that if we have an unsatisfiable 2SAT formula then we can easily prove its unsatisfiability, and so we may try to come up with some kind of reduction from 3SAT to 2SAT. In general, this is of course hopeless. But consider a random 3SAT formula $\phi$ with $n$ variables and $10 n^2$ clauses. Now, set $x_1 \leftarrow 0$ in $\phi$, and consider the resulting formula $\phi'$. The variable $x_1$ occurred in about $30 n$ clauses, positively in about $15 n$ of them (which have now become 2SAT clauses in $\phi'$) and negatively in about $15 n$ clauses, which have now disappeared in $\phi'$. Let's look at the 2SAT clauses of $\phi'$: there are about $15 n$ such clauses, and they are random, so their density is far above the 2SAT satisfiability threshold of one clause per variable; hence they are extremely likely to be unsatisfiable, and, if so, we can easily prove that they are. If the 2SAT subset of $\phi'$ is unsatisfiable, then so is $\phi'$, and so we have a proof of unsatisfiability for $\phi'$.

Now set $x_1 \leftarrow 1$ in $\phi$, thus constructing a new formula $\phi''$. As before, the 2SAT part of $\phi''$ is likely to be unsatisfiable, and, if so, its unsatisfiability is easily provable in polynomial time.

Overall, we can prove that $\phi$ becomes unsatisfiable when we set $x_1 \leftarrow 0$, and also when we set $x_1\leftarrow 1$, and so $\phi$ itself is unsatisfiable.
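The whole refutation procedure can be sketched in code (my own illustrative implementation, not from the post). The 2SAT test is the standard implication-graph algorithm: a 2SAT formula is unsatisfiable exactly when some variable lies in the same strongly connected component as its negation.

```python
import random

def random_ksat(n, m, k, rng):
    """Sample m clauses, each on k distinct variables with random signs."""
    return [tuple(v if rng.random() < 0.5 else -v
                  for v in rng.sample(range(1, n + 1), k))
            for _ in range(m)]

def two_sat_satisfiable(n, clauses):
    """Standard 2SAT test: build the implication graph and check that no
    variable shares a strongly connected component with its negation."""
    node = lambda lit: 2 * (abs(lit) - 1) + (0 if lit > 0 else 1)
    N = 2 * n
    adj = [[] for _ in range(N)]
    radj = [[] for _ in range(N)]
    for a, b in clauses:
        # (a or b) is equivalent to (not a -> b) and (not b -> a).
        adj[node(a) ^ 1].append(node(b)); radj[node(b)].append(node(a) ^ 1)
        adj[node(b) ^ 1].append(node(a)); radj[node(a)].append(node(b) ^ 1)
    # Kosaraju's algorithm, with iterative DFS to compute finishing order.
    order, seen = [], [False] * N
    for s in range(N):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(adj[s]))]
        while stack:
            u, it = stack[-1]
            for v in it:
                if not seen[v]:
                    seen[v] = True
                    stack.append((v, iter(adj[v])))
                    break
            else:
                order.append(u)
                stack.pop()
    comp = [-1] * N
    for c, s in enumerate(reversed(order)):
        if comp[s] != -1:
            continue
        comp[s], frontier = c, [s]
        while frontier:
            u = frontier.pop()
            for v in radj[u]:
                if comp[v] == -1:
                    comp[v] = c
                    frontier.append(v)
    return all(comp[2 * i] != comp[2 * i + 1] for i in range(n))

def refute_dense_3sat(n, clauses):
    """Try to certify unsatisfiability of a dense random 3SAT formula by
    fixing x1 both ways and checking the resulting 2SAT clauses."""
    for falsified in (1, -1):  # the literal made false by x1 = 0, then x1 = 1
        two_clauses = [tuple(l for l in c if l != falsified)
                       for c in clauses if falsified in c]
        if two_sat_satisfiable(n, two_clauses):
            return False  # this method found no refutation
    return True  # unsatisfiable under both settings of x1, hence unsatisfiable
```

On a random 3SAT formula with, say, $n = 30$ variables and $10 n^2$ clauses, each setting of $x_1$ leaves roughly $15 n$ random 2SAT clauses, and the test succeeds with overwhelming probability.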

This works when $m$ is about $n^2$ for 3SAT, and when $m$ is about $n^{k-1}$ for kSAT. By fixing $O(\log n)$ variables at a time, it is possible to shave another polylog factor. This idea is due to Beame, Karp, Pitassi, and Saks.

A limitation of this approach is that it produces polynomial-size resolution proofs of unsatisfiability and, in fact, tree-like resolution proofs. It is known that polynomial-size resolution proofs do not exist for random 3SAT formulas with fewer than $n^{1.5-\epsilon}$ clauses, and tree-like resolution proofs do not exist even when the number of clauses is less than $n^{2-\epsilon}$. This is a limitation that afflicts all backtracking algorithms, and so all approaches of the form "let's fix some variables, then apply the 2SAT algorithm." So something really different is needed to make further progress.

Besides the 2SAT algorithm, what other algorithms do we have to prove that no solution exists for a given problem? There are algorithms for linear and semidefinite programming, and there is Gaussian elimination. We'll see how they can be applied to random kSAT in the next theory post.

1.  Andy D
5/15/2007 08:46:00 PM

Thanks for these posts!

Could random k-SAT also be easy on average above the satisfiability threshold in the sense that:

There exists an algorithm A that, given a random instance I with m/n > c_k + eps, finds a satisfying solution with high probability, *conditioning* on the event that I is satisfiable?

The hope being that the solution will be obvious by virtue of being so constrained. You could apply the partial reduction to 2-SAT for starters.

2.  Anonymous
5/16/2007 06:11:00 AM

What is the reference that "polynomial-size resolution proofs do not exist for random 3SAT formulas with fewer than $n^{1.5-\epsilon}$ clauses"?

3.  Anonymous
5/16/2007 07:44:00 AM

[Beame, Karp, Pitassi, Saks '98] seems to show n^{1.25 - epsilon}, though [Ben-Sasson, Wigderson '99] seems to suggest that BKPS98 gives n^{1.5 - epsilon}. I'm also curious now...

4.  Paul Beame
5/18/2007 07:28:00 PM

Eli and Avi were too generous. Eli Ben-Sasson's PhD Thesis shows n^{1.5-epsilon} for resolution and n^{2-epsilon} for tree resolution/DPLL. The improvement is not a result of using the width-size relationship but rather a small change that involves using the minimality of the set of clauses along with Theorem 2.18 on page 23 of his dissertation.
