Final Project: Solving NP Complete Problems with Cook-Levin Theorem

Chris Tralie


We suspect that NP complete problems are harder than problems in P, but we don't know for sure. Usually, though, the fact that a problem is NP complete is enough for people to throw up their hands and say we can't solve it. But what if we actually do want to solve it for some practical purpose? What options do we have available to us?

In this project, you will explore one strategy for solving NP complete decision problems via polynomial time reductions. To accomplish this, you will devise a reduction from an NP complete problem of your choosing to an instance of SAT. Since SAT is itself an NP-complete problem, you can use it to solve any problem in NP, including other NP-complete problems. Thus, you will use SAT to decide your problem, and as a side effect, you will be able to extract a solution to your problem instance if it is satisfiable.

We know by the Cook-Levin theorem that any problem with a polynomial time verifier can be reduced to SAT, but the proof assumes that we have a Turing machine for the verifier, which is usually not practical. Instead, you will explore a more bespoke but practical reduction to SAT for a particular NP-compelte problem. The steps you will need to do are as follows:

  1. Create a method to generate random instances of your problem of a certain size.
  2. Polynomial time reduce your problem to SAT; that is, generate a CNF boolean statement that, if solved, will provide a solution to your problem. The goal is for this to be much easier and more direct than making a Turing machine verifier!
  3. Convert the SAT solution back into a certificate in the language of your problem, and create a verifier that checks whether the certificate is a valid solution to a problem of your type.

Below are some more details about the three tasks you have to do:

Your Tasks

Click here to download the starter code. Also review the notes at this link, and choose an NP-complete problem to solve. I describe k-clique and vertex cover at that link, but you are not limited to these. Click here to view a longer list of NP complete problems if you're interested. But be sure to check with me first if you want to do something I haven't described just to make sure it's not too tough for this project.

The only problems you can't choose are Hamiltonian path and Hamiltonian cycle, since I've already done Hamiltonian cycle and Hamiltonian path is a trivial modification of that. Have a look at the video below to see me going through all of the tasks for the final project on Hamiltonian cycle, and click here to view the solution

Part 1: Generating And Plotting (Pseudo)Random Solvable Problems (25 Points)

You will have to make a (pseudo)random (seeded random) problem of size N, where N depends on the context of the problem. For instance, in an NP-complete graph such as Hamiltonian path, vertex cover, or k-clique, N is the number of nodes in the graph. Your random problems should be solvable; that is, it should be possibly to satisfy the CNF clauses that you generate for your problem.

You should also create a method to draw an instance of your problem, as well as a certificate for it.

Other Parameters

Aside from N, you may have some extra parameters to set for your problem. For instance, for Hamiltonian path, you have the number of edges, for vertex cover, you have the size K of the number of vertices that are supposed to be part of the cover, and for a K-clique, you will have the size K of the clique we're looking for. You can choose these extra parameters to be whatever you like, but to be competitive, you should think carefully about what a "hard" problem of your type would look like. For example, intuitively, graphs with very few edges might be easier to check for the Hamiltonian path problem since there are fewer possibilities. However, too many edges might be too easy as well, since that increases the number of options in which a Hamiltonian path could occur. Often, the sweet spot of hardest complexity is somewhere in the middle.

Another heuristic you can use is that when you plot or show your problem, it shouldn't be too obvious for a human to pick out the soluton.

Part 2: Polynomial Time Certificate Translation (45 Points)

Perform a reduction from your problem to SAT in polynomial time in the size of the problem. The number of clauses and variables should be a polynomial size as a function of your input size.

Part 3: Certificate Translation And Verification (20 Points)

To make sure the solution is valid, you should convert the certificate back from SAT to your language and re-implement the verifier in your language. In other words, you should re-check everything you expressed in the SAT clauses, but in your language. This consistency check will make sure you actually implemented the SAT clauses properly.

Part 4: Hard Instances (10 Points)

Now that you've finished the entire pipeline, we're going to try to stump the Glucose solver. Explore different random examples with at most 200 variables, and an arbitrary polynomial number of clauses of arbitrary polynomial length, and pick out the example that seems to be the toughest for glucose to solve. Save your CNF formula to disk with save the method in the CNF class. Submit this file, along with an image or text description of the problem. If I have time, I will make a class gallery of the problems, along with some experiments for how long distributions of runtimes are over many experiments.